Mining large networks with subgraph counting
Source:
International Conference on Data Mining (ICDM) (2008)
Abstract:
The problem of mining frequent patterns in networks has many
applications, including analysis of complex networks, clustering of
graphs, finding communities in social networks, and indexing of
graphical and biological databases.
Despite this wealth of applications, the current state of the art
lacks algorithmic tools for counting the number of subgraphs contained
in a large network.
In this paper we develop data-stream algorithms that approximate the
number of all subgraphs of three and four vertices in directed and
undirected networks. We use the frequency of occurrence of all subgraphs to prove
their significance in order to characterize different kinds of networks:
we achieve very good precision in clustering networks with
similar structure.
The significance of our method is supported by the fact that such high
precision cannot be achieved when performing clustering based on simpler
topological properties, such as degree, assortativity, and eigenvector
distributions.
We have also tested our techniques using swap randomization.
Download: