Ihab Ilyas presents "Data Cleaning from Theory to Practice"

Ihab Ilyas

Title: "Data Cleaning from Theory to Practice"

ABSTRACT
          
Decades of research on the various aspects of data cleaning have tackled multiple technical challenges, and interesting results have been published in many research papers. Example quality problems include missing values, functional dependency violations, and duplicate records. Unfortunately, very little success can be claimed in adopting any of these results in practice. Businesses and enterprises are building silos of home-grown data curation solutions under various names, often referred to as ETL layers in the business intelligence stack. The impedance mismatch between the challenges faced in industry and the challenges tackled in research papers explains to a large extent the growing gap between the two worlds. In this talk I claim that being pragmatic in developing data cleaning solutions does not necessarily mean being unprincipled or ad hoc. I discuss a subset of these practical challenges, including data ownership, human involvement, and holistic data quality concerns. These new sets of challenges often hinder current research proposals from being adopted in the real world. I also give a quick overview of the approach we use at Tamr (a data curation startup) to tackle these challenges.

BIOGRAPHICAL NOTE

Ihab Ilyas is a Professor of Computer Science at the University of Waterloo. He received his PhD in computer science from Purdue University, West Lafayette in 2004. He holds BS and MS degrees in computer science from Alexandria University. His main research is in the area of database systems, with special interest in data quality, managing uncertain data, rank-aware query processing, and information extraction. From 2011 to 2013 he was on leave leading the Data Analytics Group at the Qatar Computing Research Institute. He spent two summers with IBM Almaden Research Center and has been an IBM CAS Faculty Fellow since January 2006. Ihab received the Ontario Early Researcher Award in 2008, the David R. Cheriton Faculty Fellowship in 2013, and the NSERC Discovery Accelerator Award in 2014. Ihab is a co-founder of Tamr, a startup focusing on large-scale data integration and curation.