Yahoo Labs Targeting Sciences Group Uses Tumblr to Predict NFL Season

Sep 3, 2014

by Nemanja Djuric, Vladan Radosavljevic, and Mihajlo Grbovic

The summer of soccer is behind us, and sports fans across the U.S. can finally turn their attention to real football (that is, American football). After more than seven months of silence, the NFL stage is set for a new season of blood, sweat, and data. Yes, data. Everybody hopes to see his or her favorite team clinch the coveted Vince Lombardi Trophy, but data-driven predictions are another matter. And predictions are all the more fun when you add social media to the mix.

Following the success of our World Cup predictor, where we correctly forecast three out of four semifinalists using specific Tumblr chatter, Yahoo Labs is once again using the power of data science to bring you an answer to the only question that really matters this season: Who will win? Our statistical analysis includes Tumblr posts from May through August, which we used to create a machine learning predictor based on the popularity of each team and its players according to Tumblr’s 200+ million blogs.

[image]

The first step in creating our predictor was to isolate NFL-related Tumblr posts using NFL-related hashtags, including #nfl, #american_football, #offseason, and #football, found through state-of-the-art tag-clustering technology. Then, we counted the number of team mentions in those posts using only the teams' short names (e.g., Eagles or 49ers) as a measure of each team's popularity on the social network. In addition, we searched all Tumblr content for full team names (e.g., Philadelphia Eagles or San Francisco 49ers). The popularity of the teams computed in this way is represented by the following two graphs:

[image: two graphs of team popularity on Tumblr]

Further, we took the players from each team and computed each player’s individual popularity on Tumblr.
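To make the counting step above concrete, here is a minimal sketch of how short-name mentions might be tallied in hashtag-filtered posts. The post structure (a dict with `text` and `tags`), the function name `count_team_mentions`, and the three sample posts are all assumptions made for illustration; the actual pipeline, including the tag-clustering step that found the hashtags, is not described in enough detail to reproduce here.

```python
import re
from collections import Counter

# Hashtags named in the post; the real set came from tag clustering.
NFL_TAGS = {"nfl", "american_football", "offseason", "football"}

# A few short team names for illustration (not the full league).
SHORT_NAMES = ["Eagles", "49ers", "Broncos"]


def count_team_mentions(posts, tags=NFL_TAGS, names=SHORT_NAMES):
    """Count short-name mentions in posts carrying at least one NFL tag.

    `posts` is a hypothetical list of dicts with "text" and "tags" keys.
    """
    # Match whole words only, case-insensitively.
    patterns = {
        name: re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
        for name in names
    }
    counts = Counter()
    for post in posts:
        post_tags = {t.lower().lstrip("#") for t in post["tags"]}
        if not post_tags & tags:
            continue  # skip posts without any NFL-related hashtag
        for name, pattern in patterns.items():
            counts[name] += len(pattern.findall(post["text"]))
    return counts


# Made-up sample posts, for illustration only.
posts = [
    {"text": "Eagles look sharp this preseason. Go Eagles!", "tags": ["#nfl"]},
    {"text": "49ers vs Broncos should be fun", "tags": ["#football", "#offseason"]},
    {"text": "Bald eagles spotted near the lake", "tags": ["#birds"]},
]
counts = count_team_mentions(posts)
print(counts)
```

The third sample post mentions eagles but carries no NFL-related hashtag, so it does not contribute to the count. This is the reason for filtering by hashtag before matching ambiguous short names.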
Finally, we combined the aforementioned calculations with NFL game outcomes from 2013 and trained two statistical models that separately predicted the number of touchdowns and field goals each team would score against its opponent, factoring in whether a team plays at home or away. For more details about the mathematics behind our approach, please see “Goalr! The Science of Predicting the World Cup on Tumblr” and our associated technical paper.

When we put this plethora of data together, we were able to calculate the winner of every game in the 2014 season, as well as the overall Super Bowl champion. And, in answer to the initial question, we determined the Tumblr community believes the Denver Broncos will reign victorious. Don’t agree? Then make your voice heard on Tumblr and you could change the outcome. Let the games begin!

[image]

Week 1 schedule and predicted results:

[image]
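As a rough illustration of the modeling step described above, the sketch below shows one way two separate count predictions (touchdowns and field goals) could be combined into a predicted winner. Everything specific here is an assumption: the log-linear (Poisson-style) form, the weight values, the popularity scores, and the scoring of 7 points per touchdown (touchdown plus extra point) and 3 per field goal. In the post, the models were trained on 2013 game outcomes. Here the weights are simply made up.

```python
import math

# Hypothetical weights; in the post these would be learned from 2013 outcomes.
TD_WEIGHTS = {"bias": 0.8, "popularity_diff": 0.3, "home": 0.1}
FG_WEIGHTS = {"bias": 0.5, "popularity_diff": 0.1, "home": 0.05}


def expected_count(weights, popularity_diff, is_home):
    """Expected number of scores under an assumed log-linear model."""
    z = (
        weights["bias"]
        + weights["popularity_diff"] * popularity_diff
        + weights["home"] * (1.0 if is_home else 0.0)
    )
    return math.exp(z)


def expected_points(team_pop, opp_pop, is_home):
    """Combine the two count predictions into expected points."""
    diff = team_pop - opp_pop
    touchdowns = expected_count(TD_WEIGHTS, diff, is_home)
    field_goals = expected_count(FG_WEIGHTS, diff, is_home)
    # Assumes 7 points per touchdown (with extra point), 3 per field goal.
    return 7.0 * touchdowns + 3.0 * field_goals


def predict_winner(home, away, popularity):
    """Return the team with the higher expected points; ties go to home."""
    home_pts = expected_points(popularity[home], popularity[away], True)
    away_pts = expected_points(popularity[away], popularity[home], False)
    return home if home_pts >= away_pts else away


# Made-up popularity scores, for illustration only.
popularity = {"Eagles": 1.2, "49ers": 0.9}
winner = predict_winner("Eagles", "49ers", popularity)
print(winner)
```

Keeping touchdowns and field goals as separate predictions mirrors the two-model setup mentioned in the post. The home flag is the only place where the home/away distinction enters this sketch.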