Hadoop Summit - March 25, 2008
The Hadoop Summit brought together leaders from the Hadoop developer and user communities for the first time. Apache Hadoop, an open-source distributed computing project of the Apache Software Foundation, is a distributed file system and parallel execution environment that enables its users to process massive amounts of data. Slides and video from the presentations are available below.
Hadoop Overview: Doug Cutting / Eric Baldeschwieler, Yahoo! - Slides - Video
Pig: Chris Olston, Yahoo! - Slides
JAQL: Kevin Beyer, IBM - Slides - Video
DryadLINQ: Michael Isard, Microsoft - Slides - Video
Monitoring Hadoop using X-Trace: Andy Konwinski, UC Berkeley - Slides - Video
ZooKeeper: Ben Reed, Yahoo! - Slides - Video
HBase: Michael Stack, Powerset - Slides - Video
HBase at Rapleaf: Bryan Duxbury, Rapleaf - Slides - Video
Hive: Joydeep Sen Sarma / Ashish Thusoo, Facebook - Slides - Video
GrepTheWeb - Hadoop on AWS: Jinesh Varia, Amazon - Slides - Video
Building Ground Models of Southern California: Steve Schlosser / David O'Hallaron, Intel / CMU - Slides - Video
Online search for engineering design content: Mike Haley, Autodesk - Slides - Video
Yahoo – Webmap: Christian Kunz, Yahoo! - Slides - Video
Natural Language Processing: Jimmy Lin, U of Maryland / Christophe Bisciglia, Google - Slides - Video
Panel on future directions: Sameer Paranjpye, Sanjay Radia, Owen O’Malley (Yahoo), Chad Walters (Powerset), Jeff Eastman (Mahout) - Video
Data-Intensive Computing Symposium - March 26, 2008
Hosted by Yahoo! and the CCC, the Data-Intensive Computing Symposium brought together experts in system design, programming, parallel algorithms, data management, scientific applications, and information-based applications to better understand existing capabilities in the development and application of large-scale computing systems, and to explore future opportunities. Slides and video from the presentations are available below.
Data-Intensive Scalable Computing: Randy Bryant, Carnegie Mellon - Slides - Video
Text Information Management: Challenges and Opportunities: ChengXiang Zhai, University of Illinois at Urbana-Champaign - Slides - Video
Clouds and ManyCore: The Revolution: Dan Reed, Microsoft Research - Slides - Video
Computational Paradigms for Genomic Medicine: Jill Mesirov, Broad Institute of MIT and Harvard - Video
Simplicity and Complexity in Data Systems at Scale: Garth Gibson, Carnegie Mellon - Slides - Video
Handling Large Datasets at Google: Current Systems and Future Directions: Jeff Dean, Google - Slides - Video
Algorithmic Perspectives on Large-Scale Social Network Data: Jon Kleinberg, Cornell - Slides - Video
Mining the Web Graph: Marc Najork, Microsoft Research - Slides - Video
"What" Goes Around: Joe Hellerstein, U.C. Berkeley - Video
Sherpa: Hosted Data Serving: Raghu Ramakrishnan, Yahoo! Research - Slides - Video
Scientific Applications of Large Databases: Alex Szalay, Johns Hopkins - Slides - Video
Data-Rich Computing: Where It's At: Phil Gibbons, Intel Research - Slides - Video
NSF Plans for Supporting Data Intensive Computing: Jeannette Wing, NSF - Slides - Video
The Google/IBM data center: Christophe Bisciglia, Google - Video
The Computing Community Consortium: Stimulating Bigger Thinking: Ed Lazowska, University of Washington and CCC - Slides