Congratulations 2019 Faculty and Research Engagement Program (FREP) Recipients!

Yahoo Research is excited to announce the 2019 Faculty and Research Engagement Program (FREP) recipients.

Yahoo Faculty Research and Engagement Program FAQs

Here are the answers to some common questions you might have.

Yahoo FREP Preview of the proposal submission form (as text)

Here’s the information you’ll need to gather prior to your FREP proposal submission.

Yahoo Faculty Research and Engagement Program 2019 - OPEN FOR SUBMISSIONS!

We’re excited to launch the 2019 Yahoo Faculty Research and Engagement Program (FREP). This is a call for proposals for unrestricted research grants in a variety of research areas.

Introducing the Yahoo News Ranked Multi-label Corpus, a Novel Dataset to Improve Multi-label Learning

Today we announced the availability of the Yahoo News Ranked Multi-label Corpus (YNMLC), a novel dataset to improve multi-label learning (MLL). YNMLC is the latest of more than 60 datasets that we make available to academic researchers through our Webscope data-sharing program. The corpus provides raw text, so researchers can extract whichever features work best for their algorithms. In addition, to the best of our knowledge, it is the only corpus that ranks the labels of each document by importance. YNMLC is also one of the few large-scale MLL datasets labeled manually by experts, in this case Yahoo News editors.
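Because each YNMLC document comes with labels ranked by importance, an evaluation can reward a model for getting the most important labels first. The sketch below shows one possible way to use that ranking. It is an assumption, not a metric prescribed by the dataset. It scores predicted labels with NDCG@k, giving each gold label a gain that grows with its editorial rank. The `RankedDocument` record is a hypothetical in-memory layout, not the Webscope file format.

```python
import math
from dataclasses import dataclass


@dataclass
class RankedDocument:
    # Hypothetical in-memory record (not the Webscope file format):
    # raw article text plus labels ordered from most to least important.
    text: str
    ranked_labels: list[str]


def ndcg_at_k(predicted: list[str], gold_ranked: list[str], k: int) -> float:
    """NDCG@k where a gold label's gain grows with its editorial importance."""
    n = len(gold_ranked)
    # The top-ranked gold label gets gain n, the next n - 1, and so on; others get 0.
    gain = {label: n - i for i, label in enumerate(gold_ranked)}
    dcg = sum(gain.get(label, 0) / math.log2(pos + 2)
              for pos, label in enumerate(predicted[:k]))
    # Ideal DCG: the gold labels emitted in exactly their editorial order.
    ideal = sorted(gain.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


doc = RankedDocument(text="Example article body.",
                     ranked_labels=["politics", "economy", "world"])
print(ndcg_at_k(["economy", "politics", "sports"], doc.ranked_labels, k=3))
```

A prediction that lists the gold labels in editorial order scores 1.0. Swapping the top two labels and adding an irrelevant one, as in the example, scores about 0.82.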

HBase Goes Fast and Lean with the Accordion Algorithm

The Scalable Systems research team at Yahoo is contributing a significant enhancement to the forthcoming release of Apache HBase 2.0. The enhancement, called Accordion, is an algorithm that simultaneously improves speed and disk-space efficiency, two goals typically considered at odds. Accordion makes better use of RAM by reorganizing in-memory data into more efficient data structures and eliminating redundancies. The HBase server’s memory footprint then periodically expands and contracts (like an accordion), so data stays in memory longer, I/O drops, and overall performance improves.
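The following toy Python model makes the expand-and-contract idea concrete. It is not HBase's actual code, and it relies on several simplifying assumptions. Cells are plain (key, value) pairs, sizes are counted in cells rather than bytes, and the class and its limits are hypothetical. Writes append to a mutable active segment. When that segment fills, it is frozen into a pipeline, and the pipeline is merged in memory so that only the newest version of each key survives. Data is flushed to disk only if the compacted data still exceeds the memory budget.

```python
class ToyCompactingMemStore:
    """Hypothetical, simplified model of in-memory compaction (not HBase code)."""

    def __init__(self, active_limit, memory_limit):
        self.active_limit = active_limit    # max cells in the mutable active segment
        self.memory_limit = memory_limit    # max cells kept in memory before a disk flush
        self.active = []                    # (key, value) cells; overwrites are appended too
        self.pipeline = []                  # frozen segments, oldest first
        self.disk_flushes = 0
        self.cells_written_to_disk = 0

    def put(self, key, value):
        self.active.append((key, value))
        if len(self.active) >= self.active_limit:
            self._in_memory_flush()

    def get(self, key):
        # Newest data wins: search the active segment, then the pipeline newest first.
        for segment in [self.active] + self.pipeline[::-1]:
            for k, v in reversed(segment):
                if k == key:
                    return v
        return None

    def cells_in_memory(self):
        return len(self.active) + sum(len(s) for s in self.pipeline)

    def _in_memory_flush(self):
        # Freeze the active segment, then shrink the pipeline in memory.
        self.pipeline.append(self.active)
        self.active = []
        self._compact_pipeline()
        if self.cells_in_memory() > self.memory_limit:
            self._flush_to_disk()

    def _compact_pipeline(self):
        # Merge all frozen segments, keeping only the latest version of each key.
        latest = {}
        for segment in self.pipeline:       # oldest first, so newer writes overwrite
            for k, v in segment:
                latest[k] = v
        self.pipeline = [sorted(latest.items())]

    def _flush_to_disk(self):
        self.disk_flushes += 1
        self.cells_written_to_disk += sum(len(s) for s in self.pipeline)
        self.pipeline = []


store = ToyCompactingMemStore(active_limit=4, memory_limit=10)
for i in range(100):
    store.put(f"row{i % 5}", str(i))        # 100 writes that keep overwriting 5 hot rows
print(store.disk_flushes, store.cells_in_memory(), store.get("row4"))  # → 0 5 99
```

With a workload that keeps overwriting a few hot rows, compaction keeps the in-memory footprint small, and this toy store never flushes to disk. A workload of all-distinct keys has nothing redundant to drop, so it still flushes periodically.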

Researching the Definition of Good Online Conversations and How They Should Rank with the Yahoo News Annotated Comments Corpus

In recent statistical experiments on online comment threads, Yahoo Research found that automatically identifying good conversations and ranking them at the top can cultivate a more civil and constructive atmosphere in online communities and potentially encourage more users to participate. To foster more respectful online discussions and encourage more academic research on comments, we are releasing the Yahoo News Annotated Comments Corpus (YNACC) through our data-sharing program, Webscope.

Adapting to the Evolution of Email with Machine-Generated Mail Mining

Recent research shows that machines generate 90% of all mail traffic. These machine-generated messages, whether purchase receipts, flight reservations, or something else, contain a wealth of personal information. At Yahoo Research, we have developed a new classification technology that distinguishes between human- and machine-generated mail, which has allowed us to unpack that data in meaningful ways and improve the mail experience for our users.
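The post does not describe how the classifier works, so the following is only an illustrative sketch of the general idea, not Yahoo's method. It computes a hand-weighted linear score over a few hypothetical signals: automated sender addresses, unsubscribe links, transactional subject words, and conversational greetings. A production system would learn its features and weights from labeled mail rather than hard-coding them.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    subject: str
    body: str


# Hypothetical signals and hand-picked weights, for illustration only.
SIGNALS = [
    # Automated sender addresses such as no-reply@ or notifications@.
    (lambda m: re.search(r"(no-?reply|notifications?|mailer)@", m.sender, re.I) is not None, 2.0),
    # Bulk and transactional mail usually carries an unsubscribe link.
    (lambda m: "unsubscribe" in m.body.lower(), 1.5),
    # Transactional vocabulary in the subject line.
    (lambda m: re.search(r"\b(order|receipt|confirmation|itinerary)\b", m.subject, re.I) is not None, 1.0),
    # Conversational greetings push the score toward "human".
    (lambda m: re.search(r"\b(hi|hey|dear)\b", m.body, re.I) is not None, -1.0),
]


def machine_score(msg: Message) -> float:
    """Sum the weights of every signal that fires for this message."""
    return sum(weight for fires, weight in SIGNALS if fires(msg))


def is_machine_generated(msg: Message, threshold: float = 2.0) -> bool:
    return machine_score(msg) >= threshold


receipt = Message("no-reply@shop.example", "Your order receipt",
                  "Thanks for your purchase. Unsubscribe here.")
note = Message("alice@example.com", "Lunch?", "Hi Bob, want to grab lunch tomorrow?")
print(is_machine_generated(receipt), is_machine_generated(note))  # → True False
```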

Reinventing Mail Search and Solving the Catch-22 Dilemma Between Precision and Recall

How many times have you tried to search for an email and not been able to find it? You might think that getting the right email search result is as easy as getting a good search result on the Web, where you often find what you’re looking for on the first try. Unfortunately, experience tells us otherwise. The reason? Mail search is, perhaps surprisingly, an entirely different animal. At Yahoo Research, we’ve been working hard to change that.
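The standard definitions illustrate the precision-versus-recall tension named in the title. Precision is the fraction of returned messages that are relevant, and recall is the fraction of relevant messages that are returned. The message ids and relevance judgments below are made up for illustration.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of one retrieved result list against the relevant set."""
    hits = sum(1 for msg_id in retrieved if msg_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked results for one mail query; m2 and m4 are the messages the user wanted.
ranked_results = ["m7", "m2", "m9", "m4", "m1", "m8"]
relevant = {"m2", "m4"}

for cutoff in (1, 2, 4, 6):
    p, r = precision_recall(ranked_results[:cutoff], relevant)
    print(f"top {cutoff}: precision={p:.2f} recall={r:.2f}")
```

Growing the result list from two to six messages raises recall from 0.50 to 1.00, while precision falls from 0.50 to 0.33. A mail search ranker has to balance exactly this trade-off.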

Introducing Similarity Search at Flickr

We're introducing large-scale visual similarity search, powered by deep neural networks, on our photo-sharing community Flickr. Our Computer Vision team uses a state-of-the-art approximate nearest neighbor algorithm, Locally Optimized Product Quantization (LOPQ), to store compressed high-dimensional floating-point feature vectors and compute distances between them across an index of billions of photos.
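LOPQ builds on product quantization (PQ). The pure-Python code below is a rough sketch of the core PQ idea only. It implements plain PQ, not LOPQ, which roughly speaking also partitions the space with a coarse quantizer and learns a local rotation and codebooks per cell. The sketch splits each vector into sub-vectors, learns a small k-means codebook per subspace, and stores each vector as a few centroid ids. It then ranks candidates using precomputed query-to-centroid lookup tables. The dimensions, codebook sizes, and random data are illustrative assumptions, and a real system would use optimized numeric libraries.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


class ProductQuantizer:
    def __init__(self, num_subspaces, codebook_size):
        self.m = num_subspaces
        self.k = codebook_size
        self.codebooks = []

    def _split(self, v):
        # Assumes the vector dimension is divisible by the number of subspaces.
        step = len(v) // self.m
        return [v[i * step:(i + 1) * step] for i in range(self.m)]

    def fit(self, vectors):
        parts = [self._split(v) for v in vectors]
        # One independent k-means codebook per subspace.
        self.codebooks = [kmeans([p[i] for p in parts], self.k, seed=i) for i in range(self.m)]
        return self

    def encode(self, v):
        # A vector is stored as m small integers, one centroid id per subspace.
        return [nearest(sub, cb) for sub, cb in zip(self._split(v), self.codebooks)]

    def search(self, query, codes, top_n):
        # Asymmetric distance computation: the query stays uncompressed. Distances
        # from each query sub-vector to every centroid are computed once, after
        # which each database vector costs only m table lookups.
        tables = [[sq_dist(sub, c) for c in cb]
                  for sub, cb in zip(self._split(query), self.codebooks)]
        scored = [(sum(tables[i][c] for i, c in enumerate(code)), idx)
                  for idx, code in enumerate(codes)]
        return [idx for _, idx in sorted(scored)[:top_n]]


rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
pq = ProductQuantizer(num_subspaces=4, codebook_size=16).fit(data)
codes = [pq.encode(v) for v in data]
print(pq.search(data[17], codes, top_n=5))
```

With 16 centroids per subspace, each 8-dimensional vector is stored as four ids in the range 0 to 15, or 16 bits in total. Distances are then estimated from those compact codes instead of from the original floats.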