Dr. Jiawei Han Discusses How "Big Data Needs Big Structure"

May 23, 2014

On Thursday, we were honored to have Dr. Jiawei Han, Abel Bliss Professor of Computer Science at the University of Illinois at Urbana-Champaign, present a Big Thinkers talk at Yahoo entitled "Construction, Exploration and Mining of Semi-Structured, Heterogeneous Information Networks." Dr. Han's presentation focused on the necessity of mining typed, heterogeneous information networks in order to uncover considerable knowledge from interconnected data. He made three points: when mining heterogeneous information networks, one needs to treat each term as a "first-class citizen" (just of different types); in heterogeneous information networks, different meta-paths carry rather different semantics (and so give different results); and meta-path relationships among similar-typed links share similar semantics and are comparable and inferable.

The event was broadcast live on our labs.yahoo.com homepage, and viewers had the opportunity to ask questions and comment on our Twitter stream @YahooLabs as well as our Facebook page. You can view Dr. Han's full presentation here:

ABSTRACT

People and informational objects are interconnected, forming gigantic, interconnected, integrated information networks. By structuring these data objects into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks, medical information systems, online e-commerce systems, or database systems, can be structured into typed, semi-structured, heterogeneous information networks. For example, in a medical care network, objects of multiple types, such as patients, doctors, diseases, medication, and links such as visits, diagnosis, and treatments are intertwined together, providing rich information and forming heterogeneous information networks.
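
To make the medical care example concrete, here is a minimal Python sketch of a typed network and of how two different meta-paths from the same patient reach different neighbors. It is an illustration written for this post, not code from Dr. Han's talk: the patient, doctor, and disease names, the `LINKS` data, and the helper `meta_path_counts` are all hypothetical, and a meta-path is simplified here to a sequence of node types.

```python
from collections import defaultdict

# Hypothetical toy data: typed links as (source, link_type, target) triples.
# Node names carry their type as a prefix, e.g. "patient:ann".
LINKS = [
    ("patient:ann", "visits", "doctor:lee"),
    ("patient:bob", "visits", "doctor:lee"),
    ("patient:cat", "visits", "doctor:kim"),
    ("patient:ann", "diagnosed_with", "disease:flu"),
    ("patient:cat", "diagnosed_with", "disease:flu"),
    ("patient:bob", "diagnosed_with", "disease:asthma"),
]


def node_type(node):
    """Return the type prefix of a node name ("patient:ann" -> "patient")."""
    return node.split(":", 1)[0]


def build_adjacency(links):
    """Build an undirected adjacency map; link types are implied by node types here."""
    adjacency = defaultdict(set)
    for source, _link_type, target in links:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def meta_path_counts(adjacency, start, meta_path):
    """Count path instances from `start` that follow `meta_path`.

    `meta_path` is a list of node types, e.g. ["patient", "doctor", "patient"].
    Returns a dict mapping each reachable end node to its number of path instances.
    """
    if node_type(start) != meta_path[0]:
        raise ValueError("start node does not match the first type in the meta-path")
    counts = {start: 1}
    for wanted_type in meta_path[1:]:
        next_counts = defaultdict(int)
        for node, n_paths in counts.items():
            for neighbor in adjacency[node]:
                if node_type(neighbor) == wanted_type:
                    next_counts[neighbor] += n_paths
        counts = dict(next_counts)
    return counts


adjacency = build_adjacency(LINKS)
same_doctor = meta_path_counts(adjacency, "patient:ann", ["patient", "doctor", "patient"])
same_disease = meta_path_counts(adjacency, "patient:ann", ["patient", "disease", "patient"])
```

In this toy data, the patient-doctor-patient path connects Ann to Bob (they share a doctor), while the patient-disease-patient path connects Ann to Cat (they share a diagnosis). The same start node yields different related patients depending on which meta-path is followed, which is the sense in which different meta-paths carry different semantics.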

Effective construction, exploration and analysis of large-scale heterogeneous information networks pose an interesting but critical challenge. In this talk, we first present a set of data mining scenarios in heterogeneous social and information networks and show that mining typed, heterogeneous networks is a new and promising research frontier in data mining research. Departing from many existing network models that view data as homogeneous graphs or networks, the semi-structured heterogeneous information network model leverages the rich semantics of typed nodes and links in a network and can uncover surprisingly rich knowledge from interconnected data. This heterogeneous network modeling will lead to the discovery of a set of new principles and methodologies for mining and exploring interconnected data, such as rank-based clustering and classification, meta path-based similarity search, and meta path-based link/relationship prediction. Then we discuss our recent progress on the construction of quality semi-structured heterogeneous information networks from unstructured data. We will also point out some promising research directions in this domain.

BIOGRAPHICAL NOTE

Jiawei Han is an Abel Bliss Professor of Computer Science at the University of Illinois at Urbana-Champaign. He has been researching data mining, information network analysis, database systems, and data warehousing, with over 600 journal and conference publications. He has chaired or served on many program committees of international conferences, including PC co-chair for KDD, SDM, and ICDM conferences, and Americas Coordinator for VLDB conferences. He also served as the founding Editor-In-Chief of ACM Transactions on Knowledge Discovery from Data and as the Director of Information Network Academic Research Center supported by the U.S. Army Research Lab.

He is a Fellow of the ACM and Fellow of the IEEE, and received the 2004 ACM SIGKDD Innovations Award, 2005 IEEE Computer Society Technical Achievement Award, 2009 IEEE Computer Society W. Wallace McDowell Award, and the 2011 Daniel C. Drucker Eminent Faculty Award at UIUC. His book, "Data Mining: Concepts and Techniques," has been widely used as a textbook worldwide.

YAHOO LABS BIG THINKERS SPEAKER SERIES

Yahoo Labs is proud to bring you its 2014 Big Thinkers Speaker Series. Each year, some of the most influential, accomplished experts from the research community visit our campus to share their insights on topics that are significant to Yahoo. These distinctive speakers are shaping the future of the new sciences underlying the Web and are guaranteed to inform, enlighten, and inspire.