Big Thinker Andrew McCallum at Yahoo

Andrew McCallum

Dr. Andrew McCallum, Professor and Director of the Information Extraction and Synthesis Laboratory in the School of Computer Science at the University of Massachusetts Amherst, is coming to Yahoo. His talk is titled "Probabilistic Databases for Large-scale Knowledge-base Construction." Watch LIVE on the Yahoo Labs website homepage, and ask questions or comment on the Yahoo Labs Facebook and Twitter (#BigThinkers) pages.


When building large-scale knowledge bases we want to account for uncertainty in order to perform joint inference and accurately integrate new evidence. However, reasoning about data at this scale quickly involves more random variables than can fit in machine memory. For this reason we have become interested in probabilistic databases, which we use not only for storing and querying the results of an information extraction (IE) system, but also for aiding the performance of IE joint inference itself, by managing the many random variables and intermediate results of IE. In this approach, only raw textual and tabular evidence is presented to the database, and IE inference is performed "inside the database." Thus we have taken to calling this an Epistemological Database, indicating that the database doesn’t directly observe the truth about entities and relations; it must infer the truth from available evidence [VLDB 2010; AKBC 2012]. After describing these ideas I will present two pieces of recent work: first, large-scale, non-greedy, Monte Carlo entity resolution running with distributed processing, which also supports probabilistic reasoning about crowd-sourced human edits; and second, an approach to "schema-less" relation extraction based on tensor factorization, which we call "universal schema." All of the above are implemented on top of our probabilistic programming framework FACTORIE, a Scala library for factor graphs and natural language processing.
Joint work with Michael Wick, Sameer Singh, Karl Schultz, Sebastian Riedel, Limin Yao, Ari Kobren, Luke Vilnis and Gerome Miklau.


Andrew McCallum is a Professor and Director of the Information Extraction and Synthesis Laboratory in the School of Computer Science at the University of Massachusetts Amherst. He has published over 250 papers in many areas of AI, including natural language processing, machine learning, data mining and reinforcement learning, and his work has received over 38,000 citations. He obtained his PhD from the University of Rochester in 1995 with Dana Ballard and completed a postdoctoral fellowship at CMU with Tom Mitchell and Sebastian Thrun. In the early 2000s he was Vice President of Research and Development at WhizBang Labs, a 170-person start-up company that used machine learning for information extraction from the Web. He is a AAAI Fellow, the recipient of the UMass Chancellor's Award for Research and Creative Activity, the UMass NSM Distinguished Research Award, the UMass Lilly Teaching Fellowship, and research awards from Google, IBM, Yahoo, and Microsoft. He was the General Chair for the International Conference on Machine Learning (ICML) 2012, and is the current president of the International Machine Learning Society, as well as a member of the editorial board of the Journal of Machine Learning Research. For the past ten years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, entity resolution, semi-supervised learning, topic models, and social network analysis. His work on open peer review can be found at McCallum's web page is

YAHOO LABS BIG THINKERS SPEAKER SERIES

Yahoo Labs is proud to bring you its 2015 Big Thinkers Speaker Series. Each year, some of the most influential, accomplished experts from the research community visit our campus to share their insights on topics that are significant to Yahoo. These distinctive speakers are shaping the future of the new sciences underlying the Web and are guaranteed to inform, enlighten, and inspire.