The Yahoo! Knowledge Graph

Aug 20, 2014

We present the Yahoo! Knowledge Graph, a platform designed to build, maintain, and serve a unified knowledge graph of all the entities and concepts that Yahoo! cares about. It is designed to support knowledge-based applications across the company: Web Search, Media Verticals, Content Understanding, Personalization, Advertising, etc. The resulting knowledge graph provides key information about entities (i.e. attributes, relationships, features, links to content) as well as interlinking across data sources. Typical uses include searching and displaying information about entities; recognizing entities in context; connecting entities to content and data sources; and discovering and recommending related information. We acquire and extract information about entities from multiple complementary sources on an ongoing basis using simple information extraction techniques. We leverage open data sources such as Wikipedia as well as closed data sources from paid providers. We store this information uniformly in a central knowledge repository, where entities and their attributes and relationships are categorized, normalized, and validated against a common ontology using a generalized and scalable framework. We use machine learning techniques to disambiguate and blend together entities that refer to the same real-world object, eventually turning siloed, incomplete, inconsistent, and possibly inaccurate information into a rich, unified, disambiguated knowledge graph. A plugin system enriches the graph with inferred information useful to the applications we support. We also rely on editorial curation for hot fixes. We provide access to the knowledge graph via APIs, and we generate data exports on an ongoing basis for large-scale offline data processing. The Yahoo! Knowledge platform manages millions of interconnected entities and relationships, and runs on top of distributed storage and data processing systems.
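
To make the blending step concrete, here is a minimal sketch of merging records from different sources that refer to the same real-world entity. It is not the platform's method: the abstract describes machine learning for disambiguation, whereas this sketch substitutes a plain normalized-name match. The `Record` type, the `blocking_key` and `blend` functions, the source names, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    """One source's view of an entity (hypothetical schema)."""
    source: str
    name: str
    entity_type: str
    attributes: dict = field(default_factory=dict)


def blocking_key(record):
    """Group key: entity type plus the lowercased, whitespace-collapsed name.

    Assumption: a stand-in for the learned disambiguation model.
    """
    return (record.entity_type, " ".join(record.name.lower().split()))


def blend(records):
    """Merge records that share a blocking key into one entity each."""
    groups = defaultdict(list)
    for record in records:
        groups[blocking_key(record)].append(record)

    entities = []
    for (entity_type, _), members in groups.items():
        merged = {}
        for record in members:
            for key, value in record.attributes.items():
                # On conflict, the first source seen keeps its value.
                merged.setdefault(key, value)
        entities.append({
            "type": entity_type,
            "name": members[0].name,
            "sources": sorted({r.source for r in members}),
            "attributes": merged,
        })
    return entities


# Illustrative sample data, not taken from the platform.
records = [
    Record("wikipedia", "Brad Pitt", "Person", {"birth_year": 1963}),
    Record("paid_provider", "brad  pitt", "Person", {"occupation": "actor"}),
    Record("wikipedia", "Troy", "Movie", {"release_year": 2004}),
]
entities = blend(records)
```

In this sketch the two "Brad Pitt" records collapse into a single entity carrying attributes from both sources, while "Troy" stays separate. A real system would need a far more robust matcher, since exact name matching merges distinct entities that share a name and misses spelling variants.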

  • Semantic Technology and Business Conference (SemTech 2014)
  • Conference/Workshop Paper