In this analysis, I scraped Twitter data for all 30 Major League Baseball teams and gave each tweet
a positive or negative rating based on Hu & Liu’s opinion lexicon.
For example, a tweet with 5 positive words and 3 negative words is given a sentiment score of +2.
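The scoring rule above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function name sentiment_score and the two tiny word sets are assumptions standing in for the full Hu & Liu lexicon, which contains thousands of words.

```python
import re

# Tiny stand-in word lists (assumption); the real Hu & Liu lexicon is far larger.
POSITIVE_WORDS = {"great", "win", "love", "awesome", "best"}
NEGATIVE_WORDS = {"bad", "lose", "hate", "awful", "worst"}


def sentiment_score(tweet, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return (number of positive words) minus (number of negative words)."""
    # Lowercase the tweet and keep only runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", tweet.lower())
    pos_hits = sum(word in positive for word in words)
    neg_hits = sum(word in negative for word in words)
    return pos_hits - neg_hits


# A made-up tweet with 5 positive words and 3 negative words.
example = ("Great win tonight, love this team, best fans, awesome park, "
           "but bad umpires, awful weather and the worst traffic")
print(sentiment_score(example))  # 5 positive - 3 negative = 2
```

A tweet with no words from either list gets a score of 0, which is why so many tweets end up in the middle of the distribution.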
Each team was searched for on Twitter using the team's official hashtag followed by the team name. Teams with common names, or
with the same name as a team from another sport, were searched with the addition of the word ‘mlb’.
For this analysis, 1,500 tweets were scraped for each team.
Although the season ended nearly two weeks ago, playoff teams tend to have a higher positive rating than non-playoff
teams. The Royals, which were searched for using both ‘#royals’ and ‘mlb’ in order to filter out unrelated tweets containing
the word ‘royals’, are heavily weighted toward positive tweets. After reading through tweets about the Royals,
there seems to be a lot of optimism already about free agency and the 2015 season.
Teams with more negative tweets may have had a player reject an offer for next year. For example, while I was running
the code, reports came out that Francisco Liriano had rejected an offer from the Pirates, which stirred up some backlash and
negative words in tweets that contained '#pirates'. Another example is the Dodgers, who show a negative trend because of
reports, published before I ran the code, that Hanley Ramirez had rejected an offer.
Although the trends are interesting to see, it is important to note that only 1,500 tweets were scraped for each team over
a specific period of time. This analysis is meant as a demonstration of scraping Twitter data and does not reflect the
general fan base of each team.
The code can be seen here.
The top 6 bar graphs show the Twitter sentiment score on the x-axis and the count of tweets for each team
on the y-axis, broken down by division. For example, roughly 800 of the Blue Jays' 1,500
tweets had a sentiment score of 0, about 500 had a sentiment score of +1, about 50 had a
sentiment score of +2, and so on.
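The bar heights in those graphs are simply tallies of how many tweets received each score. A minimal sketch of that tally, assuming the per-tweet scores for one team are already in a list (the helper name score_counts and the sample scores are made up for illustration and are not the scraped data):

```python
from collections import Counter


def score_counts(scores):
    """Map each sentiment score to the number of tweets with that score."""
    # Sort by score so the result reads left to right like the x-axis.
    return dict(sorted(Counter(scores).items()))


# Made-up scores for ten tweets about one hypothetical team.
sample_scores = [0, 0, 1, 0, -1, 2, 1, 0, 1, 0]
counts = score_counts(sample_scores)
print(counts)  # {-1: 1, 0: 5, 1: 3, 2: 1}
```

Each key is a position on the x-axis and each value is the height of the bar at that position.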
The bottom 6 box plots have the team's Twitter sentiment score on the y-axis and the team name on the x-axis.
There are 1,500 dots for each team, with each dot representing the score given to one tweet. The taller
the box, the wider the spread between positive and negative tweets.
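In a standard box plot, the box runs from the first quartile to the third quartile, so its height is the interquartile range of the scores. The sketch below computes that height for two made-up sets of scores; the helper name box_height is an assumption, and the quartile method used by the plotting code behind the original graphs may differ slightly.

```python
from statistics import quantiles


def box_height(scores):
    """Return the interquartile range (Q3 - Q1), i.e. the height of the box."""
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 - q1


# Made-up scores: one team with nearly all neutral tweets, one with mixed tweets.
mostly_neutral = [0, 0, 0, 0, 1, 0, 0, 0]
mixed = [-2, -1, 0, 1, 2, -2, 2, 0]

print(box_height(mostly_neutral))  # 0.0
print(box_height(mixed))           # 2.5
```

A team whose tweets are almost all scored 0 has a flat box, while a team with many strongly positive and strongly negative tweets has a tall one.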
NOTE: All data was scraped on 11/10/14
By Danny Malter
AriBall.com