Webspam Identification Through Content and Hyperlinks
Source:
Fourth International Workshop on Adversarial Information Retrieval on the Web, ACM Press, Beijing, China (2008)
ISBN:
978-1-60558-159-0
URL:
http://airweb.cse.lehigh.edu/2008/submissions/abernethy_2008_witch_content_links.pdf
Abstract:
We present an algorithm, WITCH, that learns to detect spam
hosts or pages on the Web. Unlike most other approaches,
it simultaneously exploits the structure of the Web graph
as well as page contents and features. The method is
efficient and scalable, and it provides state-of-the-art
accuracy on a standard Web spam benchmark.