The Y! Knowledge Base: Making Knowledge Reusable at Yahoo!

Jun 2, 2013

We introduce the Yahoo! Knowledge Base, a lightweight, unified knowledge graph of all the concepts and entities we care about at Yahoo! that provides key information about entities and how they relate to each other. We automatically acquire, extract, and mine facts and relationships about entities from multiple complementary sources on an ongoing basis. We use open data sources such as Wikipedia and closed data sources from paid providers. We store this information persistently and uniformly in a central knowledge base. Entities are aligned against a common ontology, and their metadata is mapped and normalized to standard schemas. We use editorial curation and algorithmic reconciliation (heuristics, machine learning) to turn siloed, incomplete, inconsistent, and possibly inaccurate information into a rich, unified, disambiguated knowledge graph. Records are reviewed, matched, deduplicated, and blended together. We make this knowledge accessible via online search APIs and data exports for large-scale offline data processing. The platform manages millions of interconnected entities and runs on top of Hadoop clusters and graph databases. It supports content analysis, user profiling, semantic search, and knowledge-based apps at Yahoo!
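
As a rough illustration of the matching, deduplication, and blending step described above, here is a minimal Python sketch. It is not the Yahoo! implementation: the `normalize_name` key, the `blend` function, the record fields, and the sample data are all hypothetical, and a real system would rely on richer heuristics and machine-learned matching rather than exact key equality.

```python
from collections import defaultdict


def normalize_name(name):
    """Hypothetical matching key: lowercase, keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def blend(records):
    """Group records that share a normalized name and merge their fields.

    The first record to supply a field wins on conflicts, so callers
    should order sources from most to least trusted (an assumption of
    this sketch, not a documented rule).
    """
    groups = defaultdict(dict)
    for rec in records:
        merged = groups[normalize_name(rec["name"])]
        for field, value in rec.items():
            merged.setdefault(field, value)
    return list(groups.values())


# Illustrative records from two hypothetical sources.
records = [
    {"name": "Yahoo!", "type": "Organization", "source": "open"},
    {"name": "yahoo", "founded": 1994, "source": "paid"},
]
entities = blend(records)
```

Both sample records normalize to the same key, so they collapse into a single entity that keeps the first source's name and type and gains the `founded` field from the second.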

  • Semantic Technology and Business Conference (SemTech 2013)
  • Conference/Workshop Paper