Muhammad Anis Uddin Nasir presents "Load Balancing in Stream Processing Systems"

Muhammad Anis Uddin Nasir
Title: "Load Balancing in Stream Processing Systems"

ABSTRACT

Load balancing in Distributed Stream Processing Engines (DSPEs) is a significant problem, as it directly affects the hardware utilization and throughput of the system. However, current solutions, e.g., key grouping and shuffle grouping, do not provide sufficient load-balancing guarantees for DSPEs. Therefore, we introduce Partial Key Grouping (PKG), a new stream partitioning strategy that adapts the classical "power of two choices" to a distributed streaming setting by leveraging two novel techniques: key splitting and local load estimation. In so doing, it achieves better load balancing than key grouping while being more scalable than shuffle grouping. Key splitting leverages both choices by relaxing the atomicity constraint of key grouping, so that messages with the same key may be sent to either of two candidate workers rather than always to the same one. Local load estimation lets each source gauge the load of downstream servers without any communication overhead. We test PKG on several large datasets, both real-world and synthetic. Compared to standard hashing, PKG reduces the load imbalance by up to seven orders of magnitude, and often achieves nearly perfect load balance. This result translates into an improvement of up to 60% in throughput and up to 45% in latency when deployed on a real Storm cluster.

BIOGRAPHICAL NOTE

Anis is currently a PhD student at KTH Royal Institute of Technology, working under the Marie Curie Initial Training Network project iSocial. Prior to his PhD studies, he completed a European Master in Distributed Computing at KTH Royal Institute of Technology and UPC, Polytechnic University of Catalunya. He holds a Bachelor's degree in Computer Engineering from National University of Science and Technology, Pakistan.
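
ILLUSTRATIVE SKETCH

The routing idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's implementation. The names `hash_to_worker`, `PartialKeyGrouper`, `key_grouping_loads` and `imbalance` are hypothetical, the seeded SHA-256 hash is just one convenient way to obtain two hash functions, and imbalance is assumed here to mean the maximum worker load minus the average load.

```python
import hashlib
from typing import List, Tuple


def hash_to_worker(key: str, seed: int, num_workers: int) -> int:
    """Map a key to a worker index with a seeded hash (illustrative choice)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


class PartialKeyGrouper:
    """Hypothetical per-source router sketching the PKG idea."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        # Local load estimation: count only the messages this source has
        # sent to each worker, so no communication with workers is needed.
        self.local_load: List[int] = [0] * num_workers

    def candidates(self, key: str) -> Tuple[int, int]:
        """Return the two candidate workers for a key (two hash functions)."""
        return (hash_to_worker(key, 1, self.num_workers),
                hash_to_worker(key, 2, self.num_workers))

    def route(self, key: str) -> int:
        """Key splitting: send each message to the less loaded candidate."""
        first, second = self.candidates(key)
        if self.local_load[first] <= self.local_load[second]:
            target = first
        else:
            target = second
        self.local_load[target] += 1
        return target


def key_grouping_loads(stream: List[str], num_workers: int) -> List[int]:
    """Baseline: plain key grouping, every key goes to one hashed worker."""
    loads = [0] * num_workers
    for key in stream:
        loads[hash_to_worker(key, 1, num_workers)] += 1
    return loads


def imbalance(loads: List[int]) -> float:
    """Assumed metric: maximum worker load minus the average load."""
    return max(loads) - sum(loads) / len(loads)


if __name__ == "__main__":
    # A skewed toy stream: one very frequent key plus many rare keys.
    stream = ["hot"] * 1000 + [f"k{i}" for i in range(1000)]
    grouper = PartialKeyGrouper(num_workers=5)
    for key in stream:
        grouper.route(key)
    print("key grouping imbalance:", imbalance(key_grouping_loads(stream, 5)))
    print("partial key grouping imbalance:", imbalance(grouper.local_load))
```

Because a key may now reach either of its two candidate workers, any per-key state is split across them, so a downstream step would need to merge the two partial results for each key. In a deployment with several sources, each source would keep its own `PartialKeyGrouper` instance and therefore its own local load estimate.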