A Machine Learning Model for Information Retrieval with Structured Documents
Source:
Machine Learning and Data Mining in Pattern Recognition, Springer-Verlag, Leipzig, Germany, pp. 425–438 (2003)
Keywords:
piwowarski
Abstract:
Most recent document standards rely on structured representations.
On the other hand, current information retrieval systems have been
developed for flat document representations and cannot be easily
extended to cope with more complex document types. Only a few models
have been proposed for handling structured documents, and the design
of such systems is still an open problem. We present here a new model
for structured document retrieval that computes and combines the
scores of document parts. It is based on Bayesian networks
and allows for learning the model parameters in the presence of incomplete
data. We present an application of this model to ad-hoc retrieval
and evaluate its performance on a small structured collection. The
model can also be extended to cope with other tasks such as interactive
navigation in structured documents or corpus