New Product Yahoo Recommends Amplifies the Science of Personalization at Mass Scale

Dec 10, 2014

By: Nathan Liu and Suju Rajan

There is no disputing it: When it comes to driving user engagement and retention for media content online, nothing beats a personalized approach. At Yahoo Labs, we’re dedicated to presenting the most relevant and engaging content to each of Yahoo’s users with a robust scientific approach, immediately and at scale. And though we’ve been doing this for some time across Yahoo’s homepage and various properties, this year, we really set out to up our game. Our result–in conjunction with the Recommends Engineering and Product team–is called Yahoo Recommends.

Yahoo Recommends is a recently-released product that generates personalized content recommendations and a native advertising experience for users. The product makes it easier for users to discover a publisher’s content and relevant native advertising units on a publisher’s site. As of today, the product produces recommendations across hundreds of websites from several prominent digital media publishers including CBS, Vox, and Hearst. Our recommendations are resulting in millions of clicks on these partners’ sites every day. At the same time, Yahoo has the opportunity to monetize the space via native ads in the recommendation modules and to better understand user behavior on partner sites.

The most impressive fact about Yahoo Recommends is that it is self-tuning and easy for partners to setup. A publishing partner can create a module any time by filling out a simple form on the portal. They can then deploy the module on their site by including a JavaScript snippet in their code which triggers content crawling and event logging and invokes the visible module on the page. Our machine learning pipeline then starts producing signals and models fully customized to each specific module. Where it previously took multiple teams of data scientists several months to roll out a personalized module on a case-by-case basis, the entire process is now fully automated with no need for custom code. Best of all, the system takes about an hour to start producing relevant recommendations. image So how does the Yahoo Recommends team manage to operate at mass scale? Reflecting on the broad range of personalization applications that have been worked on in the past, we realized there is a common framework underlying most personalization systems that consists of several classes of relevance signals and scoring based on machine-learned ranking. We then designed a rich collection of input signals in an attempt to characterize three types of relevance:
  • Universal relevance: Certain types of content such as breaking news stories (e.g., a virus outbreak) are things that most users want to know about regardless of their topical interests. We try to measure this by the trendiness and popularity of a news item. We track dozens of time series on how users are engaging with each piece of content in the module, on a publisher site, on social media, and on search engines.

  • Contextual relevance: Users see our recommendations in different contexts. For example, if a user is currently reading an iPhone 6 review, it would certainly be meaningful to recommend other articles about smartphones or Apple products. In order to identify contextually-relevant content, we leverage our internal knowledge graph to measure content relatedness as well as a collaborative filtering style approach to uncover novel patterns about “people who read X who would also read Y.”

  • Personal relevance: As one of the top destinations on the Web, there is significant overlap in the user base between Yahoo and our partner sites. This puts us in a unique position of not suffering much from the cold start problem for user understanding. However, we also need to address a new challenge of catering to changing user interests as people move about from one site to another. For example, a person is interested in smartphones; for gadget reviews her top choice is, but when it comes to news about the mobile industry, becomes her go-to destination. In order to show the right content at the right place, we need to understand both her interests, and when and where she should be served a particular item. To do that, we leverage a type of machine-learned model known as matrix/tensor factorization to incorporate context awareness into user understanding.

In the end, nearly hundreds of millions of raw signals are produced, which are blended by a machine-learned ranking (MLR) function. The blending has to have an element of adaptive learning since the same signal may have different effects in different publisher modules. For example, the recency of content matters more for a sports news site than for a food recipe site. Therefore, we developed a powerful MLR platform using a distributed online learning algorithm which incrementally and rapidly self-updates hundreds of ranking models by learning from the logged events for each Recommends module.

Yahoo Recommends tests the limits of context-aware recommendations. The wealth of signals and data, the dynamic nature of the content that needs to be recommended, and the plethora of publisher modules takes personalization to a whole new level.