Campus Seminar Series
One of the many ways Yahoo! Academic Relations (AR) interacts with top universities across the U.S. is by sponsoring seminar series in academic areas of high interest to Yahoo! scientists, engineers, and business people. Talks in these series are given by world-class faculty members, PhD students, and industry experts.
With an eye towards maximizing the value of these series, AR is piloting a program under which we have arranged to capture the seminar talks we sponsor in several of the premier CS programs in the country (Carnegie Mellon University, MIT, and the University of Illinois at Urbana-Champaign) as well as in the Kellogg Business School at Northwestern University and archive the talks on this site so they may be accessed at any time. Future plans call for expansion of the pilot program to include all seminars at all the institutions with which AR maintains strategic relationships.
We hope you find these talks of value and are very interested in hearing any feedback and/or suggestions you have. Please submit any and all comments to academicrelations@yahoo-inc.com.
Carnegie Mellon University Series - Language Technologies Institute Weekly Seminar
Description: The Language Technologies Institute (LTI) at Carnegie Mellon University (CMU) conducts extensive research on Computational Linguistics, Machine Translation, Speech Recognition and Synthesis, Information Retrieval, Computational Biology, Machine Learning, Text Mining, Knowledge Representation, Computer-Assisted Language Learning and Intelligent Language Tutoring. Seminar talks are given by faculty and grad students from the LTI as well as by outside speakers from academia and industry.
-
Talk Title: A model of turn-taking in task-oriented dialogue
Date: February 20, 2009
Abstract: As interactive voice response systems spread at a rapid pace, providing increasingly complex functionality, it is becoming clear that the challenges of such systems are not solely associated with their synthesis and recognition capabilities. Rather, issues such as the coordination of turn exchanges between system and user appear to play an important role in system usability. This study explores that issue in the Columbia Games Corpus, a collection of spontaneous task-oriented dialogues in Standard American English. We provide evidence of the existence of seven turn-yielding cues -- prosodic, acoustic and syntactic events strongly associated with conversational turn endings -- and show that the likelihood of a turn-taking attempt from the interlocutor increases linearly with the number of cues conjointly displayed by the speaker. We present similar results related to six backchannel-inviting cues -- events that invite the interlocutor to produce a short utterance conveying continued attention.
Speaker: Agustin Gravano, Columbia University
Bio: Agustin Gravano recently defended his Ph.D. thesis in the Computer Science Department at Columbia University, New York. He earned his B.S. in Computer Science from the University of Buenos Aires, Argentina, in 2001, and will be returning there in July 2009. His main research topic is prosodic variation in spoken dialogue, aimed at improving the models used in interactive voice response systems, both for understanding the user's input and for generating natural responses.
Watch the video of the talk here
Massachusetts Institute of Technology - Yahoo!/MIT EECS HCI-IR Seminar Series
Description: The Yahoo/MIT EECS HCI-IR Seminar Series is a monthly series of speakers on topics at the intersection of human-computer interaction and information retrieval. Topics of interest include novel interaction techniques, interactive information retrieval, exploratory search, information visualization, and field studies and user studies of information retrieval needs. Seminars are generally held on the first Tuesday morning of each month.
-
Talk Title: Set Retrieval 2.0
Date: November 4, 2008
Abstract: The earliest information retrieval systems were set retrieval systems, also known as Boolean retrieval systems because they expected users to enter queries as Boolean expressions. While set retrieval still survives in professional search applications, it has been largely supplanted by best-match or ranked retrieval familiar to anyone who has used web search. Best-match retrieval offers several advantages, the most salient being that it does not require users to be professionally trained. But one of its significant disadvantages is a loss of transparency. Users make a leap of faith that the ranking algorithm works, and then resign themselves to trying again when they are not satisfied with their search results. What we need is a retrieval approach that combines the best of both worlds, providing transparency but not requiring professional training. We find such an approach in the emerging field of human-computer information retrieval (HCIR), which conceives information seeking as a dialogue between the user and the system. This presentation will outline the principles of information seeking as a dialogue and walk through concrete examples that illustrate the principles of HCIR. The foundation is an interactive set retrieval approach that responds to queries with an overview of the user's current context and an organized set of options for incremental exploration. Contextual summaries of document sets optimize the system's communication with the user, while query refinement options optimize the user's communication with the system. By enabling bidirectional communication between the user and the system, we can address the inherent limitations of best-match approaches.
Speaker: Daniel Tunkelang, Endeca
Bio: Daniel Tunkelang is co-founder and Chief Scientist of Endeca, a provider of enterprise information access solutions. He leads Endeca's efforts to develop features and capabilities that emphasize user interaction and is a leading industry advocate of dialog-oriented approaches to information retrieval. He publishes The Noisy Channel, a blog about HCIR and related issues.
Watch the video of the talk here
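To make the contrast drawn in the "Set Retrieval 2.0" abstract above concrete, here is a minimal Python sketch that puts Boolean set retrieval next to a naive best-match ranking, and adds a simple count of refinement options over a result set. The toy documents, the function names, and the term-overlap scoring are all assumptions made for illustration; none of them come from the talk or from any Endeca product.

```python
from collections import defaultdict

# Hypothetical toy collection (illustrative only, not from the talk).
DOCS = {
    1: "boolean retrieval for professional search",
    2: "ranked retrieval for web search",
    3: "interactive set retrieval with query refinement",
}


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Set retrieval: return exactly the documents containing every term."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def best_match(index, terms):
    """Naive ranked retrieval: order documents by number of matching terms."""
    scores = defaultdict(int)
    for t in terms:
        for doc_id in index.get(t, set()):
            scores[doc_id] += 1
    # Descending score, then ascending id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def refinement_options(index, result_set, used_terms):
    """For each unused term, count the documents in the result set that
    contain it. The counts act as a crude summary of the current set and
    as candidate terms for narrowing it further."""
    counts = {}
    for term, ids in index.items():
        if term in used_terms:
            continue
        overlap = len(ids & result_set)
        if overlap:
            counts[term] = overlap
    return counts
```

With this toy data, `boolean_and(index, ["retrieval", "search"])` returns the set `{1, 2}`, and the user can see exactly why each document is there. The ranked version returns an ordering whose rationale is hidden inside the scoring function, which is the transparency gap the abstract points to.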
-
Talk Title: Augmented Social Cognition: Using Web2.0 technology to enhance the ability of groups to remember, think, and reason
Date: February 3, 2009
Abstract: We are experiencing the new Social Web, where people share, communicate, commiserate, and conflict with each other. As evidenced by Wikipedia and del.icio.us, Web 2.0 environments are turning people into social information foragers and sharers. Users interact to resolve conflicts and jointly make sense of topic areas from "Obama vs. Clinton" to "Islam." PARC's Augmented Social Cognition researchers -- who come from cognitive psychology, computer science, HCI, sociology, and other disciplines -- focus on understanding how to "enhance a group of people's ability to remember, think, and reason". Through Web 2.0 systems like social tagging, blogs, Wikis, and more, we can finally study, in detail, these types of enhancements on a very large scale. In this talk, we summarize recent PARC work and early findings on: (1) how conflict and coordination have played out in Wikipedia, and how social transparency might affect reader trust; (2) how decreasing interaction costs might change participation in social tagging systems; and (3) how computation can help organize user-generated content and metadata.
Speaker: Ed Chi, PARC
Bio: Ed H. Chi is area manager and senior research scientist at Palo Alto Research Center's Augmented Social Cognition Group. He leads the group in understanding how Web2.0 and Social Computing systems help groups of people to remember, think and reason. Ed completed his three degrees (B.S., M.S., and Ph.D.) in 6.5 years at the University of Minnesota, and has been doing research on user interface software systems since 1993. He has been featured and quoted in the press, including the Economist, Time Magazine, LA Times, and the Associated Press. With 19 patents and over 50 research articles, his best-known past project is the study of Information Scent --- understanding how users navigate and understand the Web and information environments. He has also worked on computational molecular biology, ubicomp, and recommendation/search engines. He has won awards for both teaching and research. In his spare time, Ed is an avid Taekwondo martial artist, photographer, and snowboarder.
Watch the video of the talk here
-
Talk Title: Geo-Sensitive Queries in Web Search Ranking
Date: March 3, 2009
Abstract: With the fast penetration of the Web throughout the world, the number of search users in many geographic locations has increased dramatically. Search engines are now facing the problem of 'localization' of search results, i.e. displaying local results at higher rank when the query is geographically sensitive. Identifying geo-sensitive queries is important to meet users' needs. With the recent surge in the volume of search queries that explicitly or implicitly express users' geographical interests, accurately inferring users' locality preferences has become an increasingly important yet challenging issue. In this talk, I present some models to identify the distribution of such geographical interests by mining the user click stream data in the search engine logs, and address important issues in spatial Web search. Finally, the model is adapted to generate meaningful relevance features for search ranking. We evaluated our proposals on a large dataset from the Yahoo! Search query logs, and report our findings.
Speaker: Belle Tseng, Yahoo Labs
Bio: Dr. Belle Tseng is a Senior Manager in the Web Search Ranking Department of Yahoo. She leads an R&D team of researchers with strong backgrounds in information retrieval and machine learning to improve the search relevance of Yahoo search engines across the world. Belle is an alumna of MIT, where she received her B.S. and M.S. in Mathematics and Electrical Engineering in 1992. She then received her Ph.D. in Electrical Engineering from Columbia University in 1996. Before joining Yahoo, Belle spent four years as a Senior Research Staff Member at NEC Laboratories America, where she managed research projects on relational data mining and social network analysis. Prior to joining NEC, she spent seven years as a Research Staff Member at IBM T. J. Watson Research Center working on multimedia retrieval, personalization, and summarization.
Dr. Tseng has published over 100 technical papers, holds five US patents, and has more than 10 patents pending in the areas of web search, multimedia understanding, stereoscopic systems, and social information analysis. She serves on the Organizing Committees of ICME 2009, ICWSM 2008, ICDE 2008, and on several Technical Program Committees. She is a recipient of the NSF Fellowship, the IBM Invention Achievement Awards, and the NEC Technology Commercialization Award.
Watch the video of the talk here
-
Talk Title: Putting our digital information in its place: Lessons learned from fieldwork and prototyping in the Keeping Found Things Found project
Date: May 5, 2009
Abstract: Does place matter for digital information? If so, how? Research points to the importance of "place-like" senses of direction, context, connection and control when managing digital information. Support for place in the Personal Project Planner prototype begins with the idea that relevant information can be located with reference to a simple planning document. This document works as a light-weight, editable overlay to existing applications and the stores of information managed by these applications. A basic premise of the Planner is that effective management of personal information can leverage and emerge from informal planning and other everyday activities.
Speaker: William Jones, University of Washington
Bio: William Jones is a Research Associate Professor in the Information School at the University of Washington, where he manages the Keeping Found Things Found group (kftf.ischool.washington.edu). He has published in the areas of personal information management (PIM), human-computer interaction, information retrieval and cognitive psychology. Prof. Jones wrote the book "Keeping Found Things Found: The Study and Practice of Personal Information Management" and also edited the book "Personal Information Management" (with co-editor Jaime Teevan). He holds several patents relating to search and PIM from his work as a program manager at Microsoft in Office and in MSN Search. Prof. Jones received his doctorate from Carnegie Mellon University for research into human memory.
Watch the video of the talk here
University of Illinois at Urbana-Champaign Series - The Yahoo!-Data and Information Systems (DAIS) Seminar
Description: The Yahoo!-DAIS Seminar will be held on Tuesdays at 4 PM in 3403 SC. As in other semesters, we will have a few visiting speakers who must be scheduled at a different day or time, due to their travel schedules. Students who take the DAIS Seminar for credit can miss up to two seminars. Speakers are announced on the DAIS mailing list (as are other items of interest to the DAIS community).
-
Talk Title: Towards Contextual Text Mining
Date: February 10, 2009
Abstract: Text is generally associated with all kinds of contextual information. Contextual information can be explicit, such as the time and the location where a blog article is written, and the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when he/she wrote a product review; there may also be complex context such as the social network of the authors. Many applications require analysis of patterns of topics over different contexts. For instance, analysis of search logs in the context of users can reveal how we can improve the quality of a commercial search engine by optimizing the search results according to particular users, while analysis of text in the context of a social network can facilitate discovery of more meaningful topical communities. Since contextual information affects significantly the choices of topics and words made by authors, in general, it is very important to incorporate it in analyzing and mining text data. In this talk, I will present a new paradigm of text mining, called contextual text mining, where context is treated as a "first-class citizen." I will introduce general ways of modeling and analyzing various kinds of context in text, including simple context, implicit context, and complex context, in the framework of probabilistic language models. I will show the effectiveness of these general contextual text mining techniques with a few sample applications in web search and information retrieval.
Speaker: Qiaozhu Mei
Bio: Qiaozhu Mei is a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He has broad research interests in text information management, especially text mining and information retrieval with probabilistic models. He has published extensively in these areas, and has received the Best Student Paper Runner-Up Awards of ACM KDD 2006 and ACM KDD 2007. He is also one of the five recipients of the inaugural Yahoo! Ph.D. Student Fellowship.
Watch the video of the talk here
-
Talk Title: Leveraging Code Comments to Improve Software Reliability
Date: February 24, 2009
Abstract: Software reliability is critically important. This work focuses on addressing fundamental challenges of software reliability: obtaining accurate program specifications and discovering the limitations of development tools and languages. In this talk, I will show that comments provide a great data source for obtaining important information, including specifications and problems of current tools/languages. I will present two new approaches, iComment and cComment, to take advantage of underutilized comments to improve software reliability. iComment automatically extracts specifications from comments to detect comment-code inconsistencies, i.e., software bugs and bad comments. Our evaluation on large real-world software, including the Linux kernel, Mozilla, and Apache, and on 2 types of comments shows that iComment effectively extracted 1832 specifications and detected 60 new bugs and bad comments. cComment studies comment semantics and characteristics to further understand what other comments can be utilized, how we can utilize them, and what important problems and limitations they reveal. We discovered many interesting findings that can guide the design of new languages and tools for improving reliability, programmer productivity, software evolution, etc. iComment and cComment combine techniques from different areas, including natural language processing (NLP), machine learning, information retrieval, program analysis and statistics.
Speaker: Lin Tan
Watch the video of the talk here
-
Talk Title: Information Technology and Intelligent Transportation - A Marriage Made in Heaven
Date: March 5, 2009
Abstract: I will describe our NSF-sponsored IGERT PhD program in Computational Transportation Science. Computational transportation scientists will develop the next generation of intelligent transportation systems, aimed at addressing inefficiencies that cause excessive environmental pollution, fuel consumption, risk to public safety, and congestion. The trainees investigate information technologies in which millions of sensors, mobile devices such as PDAs, in-vehicle computers, and computers in the static infrastructure are integrated into a collaborative environment. Basic research in information management, communications, software architectures, modeling tools, human factors, traffic prediction, and transportation planning is being conducted to found the new discipline of Computational Transportation Science (CTS).
Speaker: Ouri Wolfson, University of Illinois, Chicago
Bio: Ouri Wolfson's main research interests are in database systems, distributed systems, and mobile/pervasive computing. He is currently the Richard and Loan Hill Professor of Computer Science at the University of Illinois at Chicago, and an affiliate professor at the University of Illinois at Urbana-Champaign. He is also the founder and former Chief Scientist of Mobitrac, a high-tech startup company that had about forty employees before being acquired. Before joining the University of Illinois he was on the computer science faculty at the Technion and Columbia University, and he was a Member of Technical Staff at Bell Laboratories. Ouri Wolfson has authored over 150 publications and holds six patents. He is a Fellow of the Association for Computing Machinery, and serves on the editorial boards of the IEEE Transactions on Mobile Computing, J. Ross Publishing Transportation Letters: The International Journal of Transportation Research, and the Springer's Wireless Networks Journal. He received the best paper award for "Opportunistic Resource Exchange in Inter-vehicle Ad Hoc Networks", at the 2004 Mobile Data Management Conference.
Watch the video of the talk here
-
Talk Title: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis
Date: March 10, 2009
Abstract: As information networks become ubiquitous, extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) in one huge cluster without distinction is dull as well. In this paper, we address the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters in a multi-typed (i.e., heterogeneous) information network. A novel clustering framework called RankClus is proposed that directly generates clusters integrated with ranking. Based on initial K clusters, ranking is applied separately, which serves as a good measure for each cluster. Then, we use a mixture model to decompose each object into a K-dimensional vector, where each dimension is a component coefficient with respect to a cluster, which is measured by rank distribution. Objects are then reassigned to the nearest cluster under the new measure space to improve clustering. As a result, the quality of clustering and ranking are mutually enhanced, which means that the clusters become more accurate and the ranking becomes more meaningful. This progressive refinement process iterates until little change can be made. Our experimental results show that RankClus can generate more accurate clusters, and do so more efficiently, than state-of-the-art link-based clustering methods. Moreover, the clustering results with ranks can provide more informative views of data compared with traditional clustering.
Speaker: Yizhou Sun
Watch the video of the talk here
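The loop described in the RankClus abstract above (rank within each cluster, map each object to a K-dimensional vector, reassign to the nearest cluster, repeat) can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the authors' algorithm: it assumes a bipartite network given as a hypothetical weight matrix (rows are target objects such as conferences, columns are attribute objects such as authors), it uses a plain link-share ranking, and it replaces the mixture-model step with a direct normalized score. All function names are made up for this example.

```python
def simple_rank(weights, members):
    """Rank attribute objects (columns) by their share of link weight to the
    target objects (rows) in one cluster. Returns a distribution summing to 1."""
    n_attr = len(weights[0])
    totals = [sum(weights[i][j] for i in members) for j in range(n_attr)]
    s = sum(totals)
    return [t / s for t in totals] if s else [1.0 / n_attr] * n_attr


def coefficients(row, rankings):
    """K-dimensional vector for one target object: how strongly each cluster's
    rank distribution accounts for its links, normalized to sum to 1.
    (Stand-in for the mixture-model estimate named in the abstract.)"""
    scores = [sum(w * r for w, r in zip(row, rank)) for rank in rankings]
    s = sum(scores)
    return [x / s for x in scores] if s else [1.0 / len(scores)] * len(scores)


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(assignment, k):
    """Group target-object indices by their current cluster label."""
    return [[i for i, c in enumerate(assignment) if c == j] for j in range(k)]


def rank_cluster(weights, assignment, k, max_iter=20):
    """Alternate ranking and reassignment until the clustering stops changing."""
    assignment = list(assignment)
    for _ in range(max_iter):
        clusters = split(assignment, k)
        rankings = [simple_rank(weights, m) for m in clusters]
        vectors = [coefficients(row, rankings) for row in weights]
        # Cluster centers in the K-dimensional coefficient space.
        centers = [
            [sum(vectors[i][d] for i in m) / len(m) for d in range(k)]
            if m else [1.0 / k] * k
            for m in clusters
        ]
        updated = [
            min(range(k), key=lambda c, v=v: sq_dist(v, centers[c]))
            for v in vectors
        ]
        if updated == assignment:
            break
        assignment = updated
    # Recompute rankings so they match the final clustering.
    rankings = [simple_rank(weights, m) for m in split(assignment, k)]
    return assignment, rankings
```

On a small hypothetical matrix with two obvious groups, for example `[[5, 4, 0, 0], [4, 5, 1, 0], [0, 1, 5, 4], [0, 0, 4, 5]]` started from the poor assignment `[0, 1, 0, 1]`, the loop settles on `[0, 0, 1, 1]`, and each cluster's ranking then concentrates on its own columns.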
-
Talk Title: Text Information Management: Challenges and Opportunities
Date: March 31, 2009
Abstract: Recent years have seen an explosive growth of text data in multiple domains, notably on the Web, demanding powerful tools for managing and exploiting text information. While relatively mature technologies have been developed for managing structured data by the database community, there are still many challenges to be solved in managing the unstructured text data even though a lot of research progress has been made by the information retrieval community in the past decades. Due to the difficulty in precisely understanding natural language and users' information needs, text information management poses significant challenges and requires collaborative research by multiple communities especially information retrieval, natural language processing, databases, machine learning, and data mining. In this talk, I will review the state of the art of text information management and discuss the major challenges in developing general frameworks, algorithms, and systems for managing text information effectively and efficiently. I will present several interdisciplinary research directions where multiple communities can be expected to collaborate with each other to generate high impact research results.
Speaker: ChengXiang Zhai
Bio: ChengXiang Zhai is an Associate Professor of Computer Science at the University of Illinois at Urbana-Champaign, where he also holds a joint appointment at the Institute for Genomic Biology, Statistics, and the Graduate School of Library and Information Science. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and, later, a Senior Research Scientist from 1997 to 2000. His research interests include information retrieval, text mining, natural language processing, machine learning, and bioinformatics. He serves on the editorial boards of ACM Transactions on Information Systems and Information Retrieval Journal, and is a program co-chair of ACM CIKM 2004, NAACL HLT 2007, and ACM SIGIR 2009. He received the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE), the ACM SIGIR 2004 Best Paper Award, and an Alfred P. Sloan Research Fellowship in 2008.
Watch the video of the talk here
-
Talk Title: Privacy - from accessing databases to location based services
Date: April 7, 2009
Abstract: Over recent years it has become apparent that privacy issues are becoming more and more important when accessing data sources either on the Web or by database management systems. That is, the user does not only want to hide the query, but also the result of that query from others. In the past the problem of querying a database privately was solved by organizational rather than by technical means. In this talk we describe the problem of querying databases privately more formally and discuss existing solutions from the area of private information retrieval (PIR). The lack of efficiency and scalability motivated us to look for alternative approaches using a so-called "secure co-processor" (built by IBM). We introduce a set of algorithms that take advantage of the (physical) properties of the co-processor and show which algorithms are necessary to guarantee privacy for database queries. In the last part of my talk I briefly describe our vision of how to extend the current privacy approach to location-based services, in particular to moving objects such as vehicles (cars).
Speaker: Johann-Christoph Freytag (Humboldt-University)
Bio: Johann-Christoph Freytag is a full professor for databases and information systems (DBIS) at the Computer Science Department of Humboldt University in Berlin, Germany. Before joining the department in 1994 he was a research staff member at the IBM Almaden Research Center (1985-1987), a researcher at the European Computer-Industry-Research Centre (ECRC, in Munich, Germany, 1987-1989), and the head of Digital's (DEC) Database Technology Center (also in Munich, 1990-1993). He holds a Ph.D. in Applied Mathematics/Computer Science from Harvard University, MA. Dr. Freytag's research interests include all aspects of query processing and query optimization in object-relational database systems, new developments in the database area (such as semi-structured data, data quality, databases and security, Semantic Web), privacy in database systems, mobile systems and mobility, and applying database technology to applications such as GIS, genomics, and bioinformatics/life science. Dr. Freytag spent two sabbaticals at IBM Research and IBM Development (1997, 2001) and was a regular visitor of Microsoft Research and the SQL Server group, Redmond, as a research scientist (2002, 2005, 2007, 2008). In recent years he has received the IBM Faculty Award four times for collaborative work in the areas of databases, middleware, and bioinformatics/life science. He was a member of the VLDB Endowment until 2007, organizing VLDB 2003 in Berlin. He has headed the German database interest group of the GI (Gesellschaft für Informatik) since 2007.
Watch the video of the talk here
-
Talk Title: Private Queries in Location Based Services: Anonymizers are not Necessary
Date: April 21, 2009
Abstract: Mobile devices equipped with positioning capabilities (e.g., GPS) can ask location-dependent queries to Location Based Services (LBS). To protect privacy, the user location must not be disclosed. Existing solutions utilize a trusted anonymizer between the users and the LBS. This approach has several drawbacks: (i) All users must trust the third party anonymizer, which is a single point of attack. (ii) A large number of cooperating, trustworthy users is needed. (iii) Privacy is guaranteed only for a single snapshot of user locations; users are not protected against correlation attacks (e.g., history of user movement). We propose a novel framework to support private location-dependent queries, based on the theoretical work on Private Information Retrieval (PIR). Our framework does not require a trusted third party, since privacy is achieved via cryptographic techniques. Compared to existing work, our approach achieves stronger privacy for snapshots of user locations; moreover, it is the first to provide provable privacy guarantees against correlation attacks. We use our framework to implement approximate and exact algorithms for nearest-neighbor search. We optimize query execution by employing data mining techniques, which identify redundant computations. Contrary to common belief, the experimental results suggest that PIR approaches incur reasonable overhead and are applicable in practice.
Speaker: Gabriel Ghinita, Purdue University
Bio: Gabriel Ghinita is currently a Postdoctoral Research Associate with the Dept. of Computer Science, Purdue University. He holds a PhD degree in Computer Science from the National University of Singapore. Gabriel's research interests focus on access control for collaborative environments, and privacy for spatial and relational data. In the past, he held visiting scientist appointments with the Hong Kong University and the Chinese University of Hong Kong. Gabriel has served as an invited reviewer for prestigious conferences and journals, such as VLDB, ICDE, TKDE and ACM GIS.
Watch the video of the talk here
-
Talk Title: Fake Picassos, Tampered History, and Digital Forgery: Protecting the Genealogy of Bits with Secure Provenance
Date: May 5, 2009
Abstract: As increasing amounts of valuable information are produced and persist digitally, the ability to determine the origin of data becomes important. In science, medicine, commerce, and government, data provenance tracking is essential for rights protection, regulatory compliance, management of intelligence and medical data, and authentication of information as it flows through workplace tasks. While significant research has been conducted in this area, the associated security and privacy issues have not been explored, leaving provenance information vulnerable to illicit alteration as it passes through untrusted environments. In this talk, we show how to provide strong integrity and confidentiality assurances for data provenance information in an untrusted distributed environment. We describe our provenance-aware system prototype that implements provenance tracking of data writes at the application layer, which makes it extremely easy to deploy. We present empirical results that show that, for typical real-life workloads, the run-time overhead of our approach to recording provenance with confidentiality and integrity guarantees ranges from 1% to 13%. For more details, please refer to http://dais.cs.uiuc.edu/provenance
Speaker: Ragib Hasan
Bio: I am a PhD candidate at the Department of Computer Science, University of Illinois at Urbana-Champaign. My advisor is Prof. Marianne Winslett. I am also co-advised by Prof. Radu Sion of SUNY-StonyBrook. I am primarily interested in Computer Security, Secure Provenance, Trust Management, and Storage Systems research. My PhD dissertation topic is related to protecting the past (secure provenance) and present (term-immutability) of data. I received a Bachelor of Computer Science and Engineering from Bangladesh University of Engineering and Technology (BUET) in 2003. I graduated summa cum laude in my class and received the Chancellor Award, given by the Prime Minister of Bangladesh, for securing the highest CGPA in my year. I am an active Wikipedia editor and have been an administrator there since August 2005. I also write technical articles, one of which, on the History of Linux, has been translated into 8 languages and published in mainstream magazines.
Watch the video of the talk here
-
Talk Title: Web-scale Integration, Web-scale Inspiration
Date: September 8, 2009
Speaker: Kevin Chen-Chuan Chang
Bio: I am an Associate Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. I received a Ph.D. in Electrical Engineering (and an M.S. in Computer Science) from Stanford University in 2001, and a B.S. in Electrical Engineering from National Taiwan University. My research focuses on Web-scale information integration and data retrieval: 1) information integration: dynamic integration of myriad Web sources, and 2) data retrieval: ad-hoc retrieval of structured data for Web-based databases.
Watch the video of the talk here
-
Talk Title: Data-oriented Content Query System: Searching for Data into Text on the Web
Date: January 19, 2010
Speaker: MianWei Zhou
Bio: MianWei Zhou is a graduate student of Professor Kevin Chang.
Watch the video of the talk here
Yahoo! Hadoop Training (UIUC, February 13, 2009)
Instructors Milind A Bhandarkar and Viraj Bhat from Yahoo’s Grid Computing Group present an overview of Hadoop, PIG, and practical problem solving using Hadoop.
Watch the video of the talk here