Build Your Own Music Recommender by Modeling Internet Radio Streams

Jan 1, 2012

Abstract: In the Internet music scene, where recommendation technology is key for navigating huge collections, large market players enjoy a considerable advantage. Accessing a wider pool of user feedback leads to an increasingly more accurate analysis of user tastes, effectively creating a ``rich get richer'' effect. This work aims at significantly lowering the entry barrier for creating music recommenders, through a paradigm coupling a public data source and a new collaborative filtering (CF) model. We claim that Internet radio stations form a readily available resource of abundant fresh human signals on music through their playlists, which are essentially cohesive sets of related tracks. In a way, our models rely on the knowledge of a diverse group of experts in lieu of the commonly used wisdom of crowds. Over several weeks, we aggregated publicly available playlists of thousands of Internet radio stations, resulting in a dataset encompassing millions of plays, and hundreds of thousands of tracks and artists. This provides the large scale ground data necessary to mitigate the cold start problem of new items at both mature and emerging services. Furthermore, we developed a new probabilistic CF model, tailored to the Internet radio resource. The success of the model was empirically validated on the collected dataset. Moreover, we tested the model at a cross-source transfer learning manner -- the same model trained on the Internet radio data was used to predict behavior of Yahoo Music users. This demonstrates the ability to tap the Internet radio signals in other music recommendation setups. Based on encouraging empirical results, our hope is that the proposed paradigm will make quality music recommendation accessible to all interested parties in the community. Download: BootstrapMusicRecommendation WWW2012.pdf ACM COPYRIGHT NOTICE. Copyright © 2012 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or

  • WWW'2012, Lyon, France