Temporal evolution of the UK Web
Source:
Workshop on Analysis of Dynamic Networks (ICDM-ADN'08), Pisa, Italy (2008)
Abstract:
Recently, a new temporal dataset has been made public: it consists of a series
of twelve 100M-page snapshots of the .uk domain. The Web graphs
of the twelve snapshots have been merged into a single time-aware graph that provides
constant-time access to temporal information. In this paper we present the first
statistical analysis performed on this graph, with the goal of checking whether
the information contained in the graph is reliable (i.e., whether it depends essentially
on the appearance and disappearance of pages and links, or on the crawler's behaviour). We
perform a number of tests that show that the graph is actually reliable, and provide the first
public data on the evolution of the Web that combine a large scale with a significant diversity
in the sites considered.