Reinventing Mail Search and Solving the Catch-22 Dilemma Between Precision and Recall

How many times have you tried to search for an email and not been able to find it? You might think that getting the right email search result is as easy as getting a good search result on the Web, where you often find what you’re looking for on the first try. Unfortunately, experience tells us otherwise. The reason? Mail search is, perhaps surprisingly, an entirely different animal. At Yahoo Research, we’ve been working hard at changing this situation.
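
The tension named in the title is easy to see with a toy example. Precision is the fraction of returned messages that are relevant; recall is the fraction of relevant messages that are returned. The mailbox, message IDs, and both queries below are hypothetical. The example illustrates only the two metrics, not Yahoo's ranking approach.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical mailbox: the user is looking for messages 3 and 7.
relevant = {3, 7}
strict = [3]                        # exact-match query: precise, but misses message 7
loose = [1, 2, 3, 4, 5, 6, 7, 8]    # fuzzy query: finds both, plus a lot of noise

print(precision_recall(strict, relevant))  # (1.0, 0.5)
print(precision_recall(loose, relevant))   # (0.25, 1.0)
```

Tightening the query raises precision at the cost of recall, and loosening it does the reverse. A mail search engine has to make that trade-off without knowing which single message the user has in mind.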

Adapting to the Evolution of Email with Machine-Generated Mail Mining

Recent research shows that machines generate 90% of all mail traffic. These machine-generated messages, whether purchase receipts, flight reservations, or something else, contain loads of personal information. At Yahoo Research, we have developed a new classification technology that distinguishes between human- and machine-generated mail. This has allowed us to unpack that data in meaningful ways and improve the mail experience for our users.

Introducing Similarity Search at Flickr

We're introducing large-scale visual similarity search, a feature powered by deep neural networks, on our photo-sharing community Flickr. Our Computer Vision team uses a state-of-the-art approximate nearest neighbor algorithm, Locally Optimized Product Quantization (LOPQ), to compactly store high-dimensional floating-point feature vectors for an index of billions of photos and to compute distances between them.
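
LOPQ builds on product quantization (PQ). The sketch below shows plain PQ only, under simplifying assumptions. The codebooks are random samples rather than k-means centroids, and LOPQ's coarse quantizer and per-cluster local rotations are omitted. The sizes `dim`, `m` (number of sub-spaces), and `k` (centroids per sub-space) are arbitrary illustration values. It shows the two ideas behind the savings: each vector is stored as a few small centroid indices, and distances to a query are computed by summing precomputed table lookups ("asymmetric" distance).

```python
import random

def split(vec, m):
    """Split a vector into m equal-length sub-vectors (len(vec) must be divisible by m)."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebooks(data, m, k, rng):
    """Toy stand-in for training: k randomly sampled sub-vectors per sub-space.
    Real PQ (and LOPQ) would run k-means in each sub-space instead."""
    return [[split(v, m)[j] for v in rng.sample(data, k)] for j in range(m)]

def encode(vec, codebooks):
    """Compress a vector to one nearest-centroid index per sub-space."""
    codes = []
    for sub, cb in zip(split(vec, len(codebooks)), codebooks):
        codes.append(min(range(len(cb)), key=lambda i: sq_dist(sub, cb[i])))
    return codes

def distance_tables(query, codebooks):
    """Per sub-space: squared distance from the query's sub-vector to every centroid."""
    return [[sq_dist(sub, c) for c in cb]
            for sub, cb in zip(split(query, len(codebooks)), codebooks)]

def asymmetric_distance(code, tables):
    """Approximate squared distance to an encoded vector: m lookups, no decoding."""
    return sum(tables[j][i] for j, i in enumerate(code))

# Usage on random data.
rng = random.Random(0)
dim, m, k = 8, 4, 16
data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
codebooks = train_codebooks(data, m, k, rng)
codes = [encode(v, codebooks) for v in data]  # 4 small ints per vector instead of 8 floats

query = data[42]
tables = distance_tables(query, codebooks)
nearest = sorted(range(len(data)), key=lambda n: asymmetric_distance(codes[n], tables))[:5]
print(nearest)
```

The tables are built once per query (m × k distances), after which scoring each stored photo costs only m additions, which is what makes scanning very large encoded collections feasible.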

Yoelle Maarek to Deliver Keynote at WWW 2017

Yoelle Maarek, Vice President of Research at Yahoo, will deliver a keynote address at the 26th International World Wide Web Conference, taking place in Perth, Australia, from 3-7 April 2017.

Researching the Future of Automated Question-Answering

Our Yahoo Research Text Mining team in Haifa is working to advance the state of the art in question answering, which, among other things, will help conversational bots seem more human. Guess who came out on top in their most recent experiment: humans or algorithms?

Presenting an Open Source Toolkit for Lightweight Multilingual Entity Linking

We just open-sourced Fast Entity Linker, our unsupervised, accurate, and extensible multilingual named entity recognition and linking system, along with datapacks for English, Spanish, and Chinese. Fast Entity Linker is one of only three freely available multilingual named entity recognition and linking systems. The system achieves a low memory footprint and fast execution times by using compressed data structures and aggressive hashing functions.
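
The pairing of hashing with compact storage can be illustrated with a toy alias table. This is a hypothetical sketch, not Fast Entity Linker's code or data format. Alias strings are replaced by 64-bit hashes, accepting a tiny collision risk in exchange for not storing the strings. Candidate lists are packed into typed arrays, and the most frequent candidate is returned as a simple "commonness" baseline rather than FEL's actual scoring model. The class name, entity IDs, and counts are made up.

```python
import hashlib
from array import array

def alias_key(alias: str) -> int:
    """Map a lowercased alias to a 64-bit integer; the string itself is never stored."""
    digest = hashlib.blake2b(alias.lower().encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")

class HashedAliasTable:
    """Hypothetical alias -> candidate-entity table keyed by 64-bit hashes.

    A dict is used for brevity; a production system would likely use a more
    compact structure. Candidate IDs and counts are packed into typed arrays.
    """

    def __init__(self):
        self._index = {}  # alias hash -> (array of entity ids, array of link counts)

    def add(self, alias: str, entity_id: int, count: int) -> None:
        key = alias_key(alias)
        if key not in self._index:
            self._index[key] = (array("q"), array("q"))
        ids, counts = self._index[key]
        ids.append(entity_id)
        counts.append(count)

    def link(self, mention: str):
        """Return the candidate most often linked from this alias, or None if unknown."""
        entry = self._index.get(alias_key(mention))
        if entry is None:
            return None
        ids, counts = entry
        best = max(range(len(ids)), key=counts.__getitem__)
        return ids[best]

table = HashedAliasTable()
table.add("Paris", 1, 900)   # hypothetical id 1: Paris, France
table.add("Paris", 2, 40)    # hypothetical id 2: Paris Hilton
table.add("Jaguar", 3, 300)  # hypothetical id 3: the animal
print(table.link("paris"))   # 1
print(table.link("Hadoop"))  # None
```

A real linker would also use the surrounding context to disambiguate. The point here is only that lookups never touch the original strings.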

10 Years of Hadoop and Its Pioneering Israeli Researchers

The Yahoo Research Scalable Platforms team in Haifa brings many years of combined experience in distributed computing research and development to Yahoo and the Hadoop community. The team specializes in scalability and high availability, arguably the biggest challenges in big data platforms. We profile the team and its impact on the Israeli research and engineering community.

Open Sourcing a Deep Learning Solution for Detecting NSFW Images

To the best of our knowledge, there is no open-source model or algorithm for identifying NSFW images. In the spirit of collaboration, and in the hope of advancing this endeavor, we are releasing our deep learning model, which allows developers to experiment with a classifier for NSFW detection and give us feedback on ways to improve it.
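
When experimenting with a classifier like this, each application has to choose its own operating point. A minimal sketch, assuming the model outputs a probability-like NSFW score per image and that you have a small set of your own images scored by the model and labeled by hand (the sample below is made up): pick the lowest threshold that meets a target precision on that set. Because recall can only fall as the threshold rises, the lowest qualifying threshold also gives the best recall among them. The function name and data are hypothetical.

```python
def pick_threshold(scored, min_precision):
    """Return (threshold, precision, recall) for the lowest threshold whose precision
    on the labeled sample is at least min_precision, or None if none qualifies.

    scored: list of (score, is_nsfw) pairs, where score is the classifier's output
    for one image and is_nsfw is a human label.
    """
    positives = sum(1 for _, y in scored if y)
    for t in sorted({s for s, _ in scored}):
        flagged = [y for s, y in scored if s >= t]  # never empty: t is one of the scores
        tp = sum(flagged)
        precision = tp / len(flagged)
        if precision >= min_precision:
            recall = tp / positives if positives else 0.0
            return t, precision, recall
    return None

sample = [(0.95, True), (0.90, True), (0.85, False), (0.70, True),
          (0.40, False), (0.30, True), (0.10, False), (0.05, False)]
print(pick_threshold(sample, 0.75))  # (0.7, 0.75, 0.75)
```

A site that must never show NSFW content might instead fix a minimum recall and accept more false positives; the same labeled sample supports either choice.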

Bart Thomée Receives 2016 ACM SIGMM Rising Star Award

The ACM Special Interest Group on Multimedia (SIGMM) has announced that research scientist Bart Thomée has received its 2016 Rising Star Award for his "significant contributions in the areas of geo-multimedia computing, media evaluation, and open research datasets."

Creating Animated GIFs Automatically from Video

Various websites offer easy-to-use tools to manually generate GIFs from portions of a video. In collaboration with ETH Zürich, we’ve gone one step further by developing a system that automatically generates animated GIFs from the most “GIFable” video segments.
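
Once a model has assigned each candidate segment a "GIFability" score, the scores still have to be turned into GIFs. A minimal sketch of that last step, assuming the scores are already available: keep the highest-scoring segments that do not overlap in time. The scoring model, segment boundaries, and greedy non-overlap rule are illustrative assumptions, not the system's documented behavior.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (start_sec, end_sec, gifability_score)

def select_gif_segments(segments: List[Segment], k: int) -> List[Segment]:
    """Greedily keep up to k of the highest-scoring segments that do not overlap in time."""
    if k <= 0:
        return []
    chosen: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        start, end, _ = seg
        if all(end <= c_start or start >= c_end for c_start, c_end, _ in chosen):
            chosen.append(seg)
            if len(chosen) == k:
                break
    return sorted(chosen)  # return in playback order

candidates = [(0.0, 3.0, 0.2), (2.0, 5.0, 0.9), (4.0, 7.0, 0.8), (8.0, 11.0, 0.6)]
print(select_gif_segments(candidates, 2))  # [(2.0, 5.0, 0.9), (8.0, 11.0, 0.6)]
```

The 0.8 segment is skipped because it overlaps the top-scoring one, so two GIFs from nearly the same moment are avoided.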
