Webscope Adds 50th Dataset

Feb 24, 2014

Yahoo is one of the largest Internet destinations on the planet. So it goes without saying that we have an immense amount of varied data. At Yahoo Labs, we constantly strive to advance the state of knowledge and understanding in web sciences. We very much believe that one of the best ways to progress is by being collaborative and open. That is why we are very pleased to now share our 50th dataset in our Webscope program.

The Yahoo Webscope Program is a reference library of interesting and scientifically useful datasets for non-commercial use by academics and other scientists. All datasets have been reviewed to conform to Yahoo’s data protection standards, including strict controls on privacy. We offer data in the following categories: Graph and Social Data, Ratings and Classification Data, Advertising and Market Data, Competition Data, Computing Systems Data, Image Data, and Language Data.

Our newest dataset is Yahoo Search Query Log To Entities. With this dataset you can train, test, and benchmark entity linking systems on the task of linking web search queries – within the context of a search session – to entities. Entities are a key enabling component for semantic search, as many information needs can be answered by returning a list of entities, their properties, and/or their relations. A first step in any such scenario is to determine which entities appear in a query – a process commonly referred to as named entity resolution, named entity disambiguation, or semantic linking.

The Yahoo Search Query Log To Entities dataset allows researchers and other practitioners to evaluate their systems for linking web search engine queries to entities. The dataset contains manually identified links to entities in the form of Wikipedia articles and provides the means to train, test, and benchmark such systems using manually created, gold standard data. By releasing this dataset publicly, we aim to foster research into entity linking systems for web search queries.

To date, we have accommodated nearly 12,000 requests for datasets at over 1,300 universities in 94 countries. You can learn more about Webscope and request our datasets here. We hope you find this resource useful.