Link-Based Characterization and Detection of Web Spam
Source:
Second International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), Seattle, USA (2006)
URL:
http://www.dcc.uchile.cl/%7Eccastill/papers/becchetti_2006_link_based_characterization_detection_web_spam.pdf
Keywords:
adversarial-ir
Abstract:
We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic Web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. Using this approach, we are able to detect 80.4% of the Web spam in our sample, with an error rate of 1.1% false positives.