Distributed computing is sort of like herding cats. If not properly managed, chaos results.
In a distributed computing environment, different parts of an application run simultaneously across thousands of machines. Without someone or something in charge to manage all these machines, utter confusion can ensue.
A small group of Yahoo! Research scientists led by Benjamin Reed is trying to bring order to this chaos with a new service they developed called, appropriately enough, Zookeeper.
Zookeeper’s goal is to become an easy-to-use service that any developer can plug into his or her distributed computing project.
Large distributed computing systems typically require a master server, or manager, that coordinates and directs all the other machines. The manager makes sure that each instance of the application has the right configuration, and that if a particular machine fails, there is another one waiting to pick up the slack.
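To make the manager's two jobs concrete, here is a minimal in-memory sketch in Python. It is not Zookeeper's interface. The `Manager` class, its methods, and the machine names are hypothetical and chosen only for illustration. It assumes a single process holds the shared configuration, pushes changes to every active machine, and promotes a standby machine when an active one is reported as failed.

```python
class Manager:
    """Toy manager (hypothetical): keeps configs consistent and handles failover."""

    def __init__(self, config, workers, standbys):
        self.config = dict(config)                          # the authoritative configuration
        self.active = {w: dict(config) for w in workers}    # machine name -> its config copy
        self.standbys = list(standbys)                      # idle machines waiting to take over

    def update_config(self, key, value):
        # Change the authoritative config and push it to every active machine.
        self.config[key] = value
        for cfg in self.active.values():
            cfg[key] = value

    def report_failure(self, worker):
        # Drop the failed machine and promote a standby, if one is available.
        self.active.pop(worker, None)
        if not self.standbys:
            return None
        replacement = self.standbys.pop(0)
        self.active[replacement] = dict(self.config)
        return replacement


manager = Manager({"port": 8080}, ["node-a", "node-b"], ["spare-1"])
manager.update_config("port", 9090)
replacement = manager.report_failure("node-a")
```

In this sketch the standby picks up the current configuration at the moment it is promoted, so it starts with the same settings as the machines that were already running. A real manager would also have to survive its own failure, which this toy version does not attempt.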
The problem is that developers often don't put much thought into writing a manager because it is not the main goal of their projects. One of two things usually happens: either they take the time to build a good manager and get pulled away from their primary goals, or they build a manager so simple that it lacks important reliability and scalability features.
“We thought, why don’t we make life easier for developers by creating a service that does this managing task really well and can be used over and over again,” says Reed, who worked on the project along with Yahoo! Research scientists Brian Cooper and Flavio Junqueira. Zookeeper has a supporting cast of engineers including Lawrence Ramontianu, who worked on an early prototype, as well as Mahadev Konar and Runping Qi, who are part of a group managed by Pete Wyckoff.
Zookeeper, which is packaged as a simple interface, is highly available and can tolerate failures of individual servers. Currently, the Yahoo! message broker is using the service for its configuration needs as well as for failure detection and recovery. If a particular machine fails, Zookeeper alerts interested parties about the failure so that they can recover.
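The failure-detection idea can be illustrated with a short Python sketch. This is an assumption about one common way to do it (heartbeats plus callbacks), not a description of how Zookeeper itself works. The `FailureDetector` class, its methods, and the broker names are hypothetical. Timestamps are passed in explicitly so the example behaves the same on every run.

```python
class FailureDetector:
    """Toy coordinator (hypothetical): tracks heartbeats and alerts watchers."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a machine counts as failed
        self.last_seen = {}      # machine name -> time of its last heartbeat
        self.watchers = []       # callbacks invoked with the failed machine's name

    def heartbeat(self, machine, now):
        # A machine checks in to show that it is still alive.
        self.last_seen[machine] = now

    def watch(self, callback):
        # An interested party registers to be told about failures.
        self.watchers.append(callback)

    def check(self, now):
        # Find machines that have been silent too long and alert every watcher.
        failed = [m for m, t in self.last_seen.items() if now - t > self.timeout]
        for machine in failed:
            del self.last_seen[machine]
            for callback in self.watchers:
                callback(machine)
        return failed


detector = FailureDetector(timeout=5)
alerts = []
detector.watch(alerts.append)          # the watcher simply records each failure
detector.heartbeat("broker-1", now=0)
detector.heartbeat("broker-2", now=0)
detector.heartbeat("broker-2", now=4)  # broker-2 keeps checking in; broker-1 goes quiet
failed = detector.check(now=7)
```

Here `broker-1` has been silent for seven seconds, longer than the five-second timeout, so it is reported to the watcher, while `broker-2` checked in three seconds ago and is left alone. What the watcher does next, such as taking over the failed machine's work, is up to the application.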
Reed and his team also have plans to open source Zookeeper so that applications outside of Yahoo! can benefit from the service. “We really designed it to be a very general service for any distributed system with a large number of machines,” Reed explains. “I’d like to think Zookeeper is applicable to any project.”