Saturday, February 15, 2014

Week 6 Readings


Hiemstra and de Vries
This paper presents the statistical language modeling approach to retrieval.  It's a great text to introduce us to the model, because it is framed as a comparison to the three retrieval models we have already learned about.  It seems that this model will work much better on longer queries; for a short query of only one or two words, it will essentially perform just like tf-idf weighting in the vector space model.
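To make the comparison concrete, here is a minimal sketch of query-likelihood scoring over a hypothetical toy corpus (the documents and function name are my own, not from the paper): each document is treated as its own language model P(term | doc), and documents are ranked by how likely they are to "generate" the query.

```python
# Sketch of unsmoothed query-likelihood scoring over a toy corpus.
# Each document is treated as a tiny language model P(term | doc).
from collections import Counter
import math

docs = {
    "d1": "information retrieval with language models".split(),
    "d2": "vector space model and tf idf weighting".split(),
}

def query_likelihood(query, doc_terms):
    """Sum of log P(term | doc) over the query terms (log space
    avoids underflow).  Unsmoothed, so one missing term zeroes the
    whole score -- which is exactly why smoothing matters later."""
    counts = Counter(doc_terms)
    n = len(doc_terms)
    score = 0.0
    for term in query:
        p = counts[term] / n
        if p == 0:
            return float("-inf")  # document cannot generate the query
        score += math.log(p)
    return score

query = ["language", "models"]
ranked = sorted(docs, key=lambda d: query_likelihood(query, docs[d]),
                reverse=True)
# d1 contains both query terms, so it ranks first
```

With a one-word query the ranking depends only on the relative frequency of that word in each document, which is where the resemblance to tf-idf-style term weighting shows up.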
It's really interesting that a Boolean query may not be valid when using language modeling retrieval.  An AND joined with an OR would weigh a two-term query against a one-term query, and this does not work in the language model.  Instead of stemming, the query can join all the surface variants of a stem with OR to perform the same function.  This is pretty cool, and not just useful in the language model, but it seems like it would be slower and use more resources.
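One way to read that stemming trick (my interpretation, sketched with a hand-written variant list): instead of conflating forms at index time, expand a stem into its surface variants at query time and treat them as a single disjunct, summing their probabilities under the document model.

```python
# Hypothetical sketch: expand a stem into surface variants and treat
# them as one OR-disjunct, summing P(variant | doc).  The variant
# table is a hand-written assumption, not output of a real stemmer.
from collections import Counter

variants = {"retriev": ["retrieval", "retrieve", "retrieving"]}

def disjunct_prob(stem, doc_terms):
    """Probability that the document model emits any variant of the stem."""
    counts = Counter(doc_terms)
    n = len(doc_terms)
    return sum(counts[v] for v in variants.get(stem, [stem])) / n

doc = "retrieving documents is what retrieval systems do".split()
p = disjunct_prob("retriev", doc)  # counts both "retrieving" and "retrieval"
```

The extra cost is visible here: every query term fans out into several index lookups, which is the slowdown noted above.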

IIR Chapter 11
Probabilistic retrieval is possible because a user's query is only an estimate of their information need, and the system's document representation is only an estimate of the document's content.  This means that matching the two can itself be treated as an estimate, and ranked by probability.

IIR Chapter 12
The language modeling retrieval approaches are all based on building a language model and asking how likely the words on the other side of the match are under that model.  The models most likely to have generated the words are ranked highest.  There are three basic ways of doing the comparison: build a language model from each document, build one from the query, or build both and compare the two models directly.  Smoothing is important here too, because a word that does not appear in the small sample behind the model (the document) may still be part of the language.  A real-world analogy would be hearing someone speak and trying to guess which language the sounds belong to.
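The smoothing idea can be sketched with linear interpolation (Jelinek-Mercer style): mix the document model with a background collection model so that unseen words keep a nonzero probability.  The corpus and the mixing weight of 0.5 are arbitrary assumptions for illustration.

```python
# Sketch of Jelinek-Mercer smoothing: interpolate the document model
# with a background collection model.  lambda_ = 0.5 is an arbitrary
# choice here; real systems tune it.
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
]
collection = Counter(t for d in docs for t in d)
coll_len = sum(collection.values())

def smoothed_prob(term, doc_terms, lambda_=0.5):
    """lambda * P(term | doc) + (1 - lambda) * P(term | collection)."""
    counts = Counter(doc_terms)
    p_doc = counts[term] / len(doc_terms)
    p_coll = collection[term] / coll_len
    return lambda_ * p_doc + (1 - lambda_) * p_coll

# "dogs" never appears in docs[0], yet its smoothed probability is
# nonzero, so a query containing it is not scored to zero outright.
p_unseen = smoothed_prob("dogs", docs[0])
p_seen = smoothed_prob("cat", docs[0])
```

This matches the analogy above: even if you haven't heard a particular word from this speaker, it may still belong to the language, so it should keep some probability mass.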
