Researching the Future of Automated Question-Answering

Our Yahoo Research Text Mining team in Haifa is working to advance the state of the art in question-answering, which, among other things, will lead to conversational bots appearing more human. Guess who came out on top in their most recent experiment: humans or algorithms?

Presenting an Open Source Toolkit for Lightweight Multilingual Entity Linking

We just open sourced Fast Entity Linker, our unsupervised, accurate, and extensible multilingual named entity recognition and linking system, along with datapacks for English, Spanish, and Chinese. Fast Entity Linker is one of only three freely available multilingual named entity recognition and linking systems. The system achieves a low memory footprint and fast execution times by using compressed data structures and aggressive hashing functions.

10 Years of Hadoop and its Israeli Pioneering Researchers

The Yahoo Research Scalable Platforms team in Haifa collectively brings many years of experience in distributed computing research and development to Yahoo and the Hadoop community. The team specializes in scalability and high availability, arguably the biggest challenges in big data platforms. We profile the team and their impact on the Israeli research and engineering community.

Open Sourcing a Deep Learning Solution for Detecting NSFW Images

To the best of our knowledge, there is no open source model or algorithm for identifying NSFW images. In the spirit of collaboration and with the hope of advancing this endeavor, we are releasing our deep learning model that will allow developers to experiment with a classifier for NSFW detection, and provide feedback to us on ways to improve the classifier.

Bart Thomée Receives 2016 ACM SIGMM Rising Star Award

The ACM Special Interest Group on Multimedia (SIGMM) has announced that research scientist Bart Thomée has received its 2016 Rising Star Award for his "significant contributions in the areas of geo-multimedia computing, media evaluation, and open research datasets."

Creating Animated GIFs Automatically from Video

Various websites offer easy-to-use tools to manually generate GIFs from portions of a video. In collaboration with ETH Zürich, we’ve gone one step further by developing a system that automatically generates animated GIFs from the most “GIFable” video segments.

Open Sourcing SparkADMM: a Massively-parallel Framework for Solving Big Data Problems

We've just published SparkADMM – a massively parallel abstract programming framework for solving big data optimization problems through ADMM over Apache Spark – to the Open Source community on GitHub. The implementation allows for the quick deployment of ADMM solvers over Spark, without any prior knowledge about ADMM for consensus.

Science Powering Product: Large-scale Query-to-Ad Matching in Sponsored Search

In a yearlong effort, a team of our research scientists and engineers who specialize in machine learning created an advanced query-to-ad matching model for sponsored search. Today, search2vec accounts for more than 30% of all broad match impressions and revenue on Yahoo Search! Their new blog post provides highlights of their SIGIR 2016 paper describing search2vec and presents a new dataset available to researchers.

EURO 2016 According to the Science of Tumblr

The EURO 2016 football tournament begins this week and we thought it would be fun to try to predict the results through a combination of big data, science, and social media. We used our expertise and unique access to data from Tumblr – as well as Yahoo Sports – to draw conclusions on who will be crowned the champions.

Promoting a Culture of Learning with Research

At Yahoo, we encourage a culture of learning, both personally and professionally, internally and externally. Yahoo Research, in particular, often takes observations and shares them with the academic community. At the same time, we look to the academic community for revelations to share amongst Yahoos. It is in this open spirit that we present a Big Thinkers talk with Dr. Marti Hearst, a luminary in the fields of Natural Language Processing (NLP) and Search, covering fascinating new insights on learning in Massive Open Online Courses (MOOCs).