Gearing for Exabyte Storage with Hadoop Distributed Filesystem

Dec 31, 1969

Expectations for the scale of Hadoop grids keep growing, and systems managing exabytes of storage are expected in the near future. However, this growth is challenged by the scalability limits of Hadoop’s namenode, the distributed filesystem's metadata service. We present recent efforts by the Hadoop community to re-architect the namenode for next-generation scalability requirements. We focus on (1) splitting the namenode service into distributed filesystem and block management tiers, (2) scaling the new services in a workload-optimized way, and (3) redesigning concurrency control for high performance. Implementing these changes while guaranteeing consistent, high-speed access to all storage metadata is a nontrivial task. We present the algorithmic principles behind the design and demonstrate performance results.