Announcing a new Webscope dataset

Aug 28, 2020

In our paper “Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph”, presented at the Wiki Workshop at The Web Conference 2020, we described a novel framework for recommending related entities given an input entity.

As part of this work, we generated a dataset with three components: a large, normalized entity graph built from Wikipedia by aggregating hyperlinks between Wikipedia pages across languages (10 million vertices and 998 million edges, each with some extra features); the corresponding entity embeddings trained on the graph using the lg2vec method (10 million vectors of dimension 200); and a labeled dataset of 45k query entities, each with a list of recommended related entities, which can be used as ground truth for training and evaluating related-entity recommendation systems.
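
To give a rough sense of how the embeddings and the labeled lists could be used together, here is a minimal sketch in Python. It is not the method from the paper: it ranks candidates by plain cosine similarity and scores them with precision at k. The function names, the toy entities, and the 3-dimensional vectors are all made up for illustration (the real vectors have dimension 200), and the sketch makes no assumption about the dataset's file format.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_related(query, embeddings, k):
    """Return the k entities most similar to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [
        (cosine(q, vec), name)
        for name, vec in embeddings.items()
        if name != query
    ]
    # Highest similarity first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def precision_at_k(predicted, relevant, k):
    """Fraction of the first k predictions that appear in the ground-truth set."""
    if k == 0:
        return 0.0
    hits = sum(1 for name in predicted[:k] if name in relevant)
    return hits / k


# Hypothetical toy data standing in for the entity vectors and labeled lists.
toy_embeddings = {
    "Paris": [0.9, 0.1, 0.0],
    "France": [0.8, 0.2, 0.1],
    "Eiffel Tower": [0.85, 0.05, 0.05],
    "Tokyo": [0.1, 0.9, 0.0],
    "Sushi": [0.0, 0.8, 0.3],
}
toy_ground_truth = {"Paris": {"France", "Eiffel Tower"}}

recommended = top_k_related("Paris", toy_embeddings, k=2)
score = precision_at_k(recommended, toy_ground_truth["Paris"], k=2)
print(recommended, score)
```

At the scale of 10 million vectors, a brute-force scan like this would be slow, so an approximate nearest-neighbor index would likely be a better fit in practice.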

Today we are making this dataset, the “Wikipedia Graph and Related Entity Recommendation Dataset”, available to academics via our Webscope data-sharing program to further advance research in graph mining and entity recommendation.