Fast Reliability Search in Uncertain Graphs

Publication
Mar 24, 2014
Abstract

Uncertain, or probabilistic, graphs have been increasingly used to represent noisy linked data in many emerging application scenarios, and have recently attracted the attention of the database research community. A fundamental problem on uncertain graphs is reliability, which deals with the probability of nodes being reachable one from another. Existing literature has exclusively focused on reliability detection, which asks to compute the probability that two given nodes are connected. In this paper we study reliability search on uncertain graphs, which we define as the problem of computing all nodes reachable from a set of query nodes with probability no less than a given threshold. Existing reliability-detection approaches are not well-suited to efficiently handle the reliability-search problem. We propose RQ-tree, a novel index which is based on a hierarchical clustering of the nodes in the graph, and further optimized using a balanced-minimum-cut criterion. Based on RQ-tree, we define a fast filtering-and-verification online query-evaluation strategy that relies on a maximum-flow-based candidate-generation phase, followed by a verification phase consisting of either a lower-bounding method or a sampling technique. The first verification method returns no incorrect nodes, thus guaranteeing perfect precision, completely avoids sampling, and is more efficient. The second verification method ensures instead better recall. Extensive experiments on real-world uncertain graphs show that our methods are very efficient—over state-of-the-art reliability-detection methods, we obtain a speed-up up to five orders of magnitude; as well as accurate—our techniques achieve precision > 0.95 and recall usually higher than 0.75.

  • Proceedings of the International Conference on Extending Database Technology (EDBT ’14)
  • Conference/Workshop Paper

BibTeX