A Bayesian Network for XML Information Retrieval: Searching and Learning
with the INEX Collection
Source:
Information Retrieval, Volume 8, Number 4, p.655-681 (2005)
URL:
http://www.ingentaconnect.com/content/klu/inrt/2005/00000008/00000004/00000751
Keywords:
selection, piwowarski
Abstract:
Most recent document standards like XML rely on structured representations.
On the other hand, current information retrieval systems have been
developed for flat document representations and cannot be easily
extended to cope with more complex document types. The design of
such systems is still an open problem. We present a new model for
structured document retrieval which allows computing scores of document
parts. This model is based on Bayesian networks whose conditional
probabilities are learnt from a labelled collection of structured
documents, which is composed of documents, queries and their
associated assessments. Training these models is a complex and
non-standard machine learning task. This is the focus of the paper:
we propose here to train the structured Bayesian Network model using
a cross-entropy training criterion. Results are presented on the
INEX corpus of XML documents.
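
As a rough illustration of what a cross-entropy training criterion means for a conditional probability, the following minimal sketch fits a single logistic conditional P(element relevant | local evidence, parent state) by gradient descent. It is not the model or training procedure from the paper: the logistic parametrization, the feature layout (bias, text-match score, parent relevance), the toy data and all function names are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """P(element relevant | features) under an assumed logistic conditional."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def cross_entropy(weights, examples):
    """Mean cross-entropy between 0/1 assessments and predicted probabilities."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for features, label in examples:
        p = predict(weights, features)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(examples)

def train(examples, steps=500, lr=0.5):
    """Plain batch gradient descent on the mean cross-entropy."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for features, label in examples:
            err = predict(weights, features) - label
            for i, x in enumerate(features):
                grad[i] += err * x
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy data: [bias, text-match score, parent relevant?] -> assessment
EXAMPLES = [
    ([1.0, 0.9, 1.0], 1),
    ([1.0, 0.8, 0.0], 1),
    ([1.0, 0.2, 1.0], 0),
    ([1.0, 0.1, 0.0], 0),
]

if __name__ == "__main__":
    initial = cross_entropy([0.0, 0.0, 0.0], EXAMPLES)
    learned = train(EXAMPLES)
    print("loss before:", initial, "loss after:", cross_entropy(learned, EXAMPLES))
```

In a full Bayesian network, one such conditional would be attached to each node of the document tree and scores for document parts would come from inference over the whole network; the sketch covers only the fitting of one conditional against labelled assessments.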