Project

Graph Partitioning



What would be a good way to split this graph into clusters?

Yahoo! Research is hard at work creating innovative and scalable graph partitioning algorithms that make it possible to efficiently extract latent structures from large complex data networks.

A graph is a representation of a set of objects, or “nodes,” and the relationships between them, known as “links.” An example of a graph is a social network, with the nodes representing the people and the links representing some form of relationship or interaction between them. A graph can also represent the spending patterns in sponsored search, where the nodes represent query phrases and advertisers, and the link between advertisers and phrases indicates that the advertiser spends money on the phrase.

Graph clustering is the partitioning of nodes into groups that share a common trait. In a social network, a cluster might be a smaller community, or set of people, that interacts more within itself than with other communities. In sponsored search, each cluster is a submarket that represents a specific group of advertisers that do most of their spending on a group of query phrases.

Knowing the submarkets in a sponsored search graph could help in several ways: suggesting phrases to advertisers that could potentially be more profitable; rewriting queries at run time to other phrases in the same submarket; and allowing economists and statisticians to test
economic theories and proposed policy changes in relatively self-contained
submarkets.



The node colors show the clusters selected by the Research team’s algorithm

The design of efficient algorithms for graph clustering is a long-standing and active research topic in computer science. Yahoo! researcher Kevin Lang, together with Reid Anderson, Fan Chung, Anirban Dasgupa, Jure Leskovec, Michael Mahoney, Lorenzo Orecchia and Satish Rao, is leading the effort in finding the clusters of Yahoo!’s sponsored search spending graph.

Initially, using expensive partitioning algorithms, the team at Yahoo! Research discovered that even when the graph is large, containing 100 million nodes, the clusters that are typically found are much smaller, with a size range of one hundred to one million nodes – implying that the time it takes to calculate the clusters should be much faster. (“Statistical Properties of Community Structure in Large Social and Information Networks”)

This discovery led the team to study a new class of improved algorithms, namely “local partitioning” algorithms, which have the unusual property that their run time is proportional to the size of their resulting clusters (the output), and not the size of the original graph (the input).
(“Local Graph Partitioning using PageRank Vectors”)

“The conceptual shift from global to local partitioning algorithms has permitted much larger graphs to be processed than before,” said Lang. “We now have the world’s most scalable graph partitioning algorithms that are especially applicable to large complex networks.”