Modeling Locations with Social Media

Feb 1, 2013

In this paper, we focus on the locations that are explicit or implicit in users' descriptions of their surroundings. We propose a statistical language modeling approach to identifying locations in arbitrary text, and investigate several ways to estimate the models, based on term frequency and on user frequency. The geotagged public photos in Flickr serve as a convenient ground truth. Our results show that, using only a photo's tags, we can predict its location within a 1 km by 1 km cell with 17 percent accuracy, and within a 3 km radius around such a 1 km cell with 40 percent accuracy. This is significantly better than the state of the art. Further, we examine several estimation strategies that leverage the physical proximity of places, and show that for sparsely represented locations, smoothing from the immediate neighborhood improves results. We also show that estimation strategies based on user frequency are much more reliable than approaches based on raw term frequency.
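To make the contrast between the two estimation strategies concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes each photo has already been assigned to a grid cell given as a `(row, col)` tuple, a uniform prior over cells, and simple additive smoothing with a hypothetical parameter `mu`. The function names `train`, `predict`, and `smooth_with_neighbors`, the neighbor `weight`, and the toy data are all illustrative assumptions.

```python
import math
from collections import defaultdict


def train(photos, use_user_frequency=True):
    """Build per-cell tag counts from (user_id, cell, tags) triples.

    With use_user_frequency=True, a tag counts once per distinct user
    in a cell; otherwise every occurrence counts (raw term frequency).
    """
    seen = defaultdict(set)  # (cell, tag) -> users already counted
    counts = defaultdict(lambda: defaultdict(float))
    for user, cell, tags in photos:
        for tag in tags:
            if use_user_frequency:
                if user in seen[(cell, tag)]:
                    continue
                seen[(cell, tag)].add(user)
            counts[cell][tag] += 1.0
    return counts


def smooth_with_neighbors(counts, weight=0.5):
    """Add weighted counts from the 8 adjacent cells (assumed scheme)."""
    smoothed = defaultdict(lambda: defaultdict(float))
    for cell, tag_counts in counts.items():
        for tag, n in tag_counts.items():
            smoothed[cell][tag] += n
    for row, col in list(counts):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                neighbor = (row + dr, col + dc)
                if (dr, dc) != (0, 0) and neighbor in counts:
                    for tag, n in counts[neighbor].items():
                        smoothed[(row, col)][tag] += weight * n
    return smoothed


def predict(counts, tags, vocab_size, mu=0.1):
    """Return the cell maximizing the smoothed log-likelihood of the tags."""
    best_cell, best_score = None, -math.inf
    for cell, tag_counts in counts.items():
        total = sum(tag_counts.values())
        score = sum(
            math.log((tag_counts.get(t, 0.0) + mu) / (total + mu * vocab_size))
            for t in tags
        )
        if score > best_score:
            best_cell, best_score = cell, score
    return best_cell


# Toy data: one prolific user tags ten photos "beach" in cell A, while
# three different users each tag one photo "beach" and "market" in cell B.
A, B = (0, 0), (5, 5)
photos = [("u1", A, ["beach"])] * 10
photos += [(u, A, ["harbor"]) for u in ("u5", "u6", "u7", "u8")]
photos += [(u, B, ["beach", "market"]) for u in ("u2", "u3", "u4")]

by_term = predict(train(photos, use_user_frequency=False), ["beach"], vocab_size=3)
by_user = predict(train(photos, use_user_frequency=True), ["beach"], vocab_size=3)
```

In this toy setup the term-frequency model places "beach" in cell A because a single bulk uploader dominates the counts, while the user-frequency model places it in cell B, where several independent users agree. This illustrates one plausible reason user frequency could be the more reliable estimate; it is not a reproduction of the reported results.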

  • Information Retrieval