Semi-Parametric and Non-parametric Term Weighting for Information Retrieval
Source:
International Conference on the Theory of Information Retrieval (ICTIR) (2009)
Abstract:
Most of the previous research on term weighting for information retrieval has focused on developing specialized parametric
term weighting functions. Examples include TF.IDF vector-space formulations, BM25, and language modeling weighting. Each
of these term weighting functions takes on a specific parametric form. While these weighting functions have proven to be
highly effective, they impose strict constraints on the functional form of the term weights. Such constraints
may degrade retrieval effectiveness. In this paper we propose two new classes of term weighting schemes that we
call semi-parametric and non-parametric weighting. These weighting schemes make fewer assumptions about the underlying term
weights and allow the data to speak for itself. We argue that these robust weighting schemes have the potential to be
significantly more effective than existing parametric schemes, especially as the amount of available training data
grows.
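
As an illustration of what "a specific parametric form" means, here is a minimal sketch of a common textbook BM25 term weight. It is not taken from the paper. The function name bm25_weight, its argument names, the defaults k1=1.2 and b=0.75, and the "+1" smoothed IDF variant are all assumptions chosen for the example. The point is that the shape of the curve is fixed in advance and only k1 and b can be tuned.

```python
import math


def bm25_weight(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """Textbook BM25 weight of one term in one document (illustrative only).

    tf: term frequency in the document
    df: number of documents containing the term
    k1, b: the only free parameters; the functional form itself is fixed.
    """
    # Smoothed IDF variant (assumption): adding 1 inside the log keeps it positive.
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Document length normalization, controlled by b.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Saturating term-frequency component, controlled by k1.
    return idf * tf * (k1 + 1.0) / (tf + norm)


if __name__ == "__main__":
    # The weight grows with tf but saturates, whatever the data look like.
    for tf in (1, 2, 5, 20):
        print(tf, bm25_weight(tf, df=10, doc_len=100, avg_doc_len=100, num_docs=1000))
```

However much training data is available, this weight can only move within the family of curves that k1 and b describe, which is the kind of constraint the abstract refers to.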