Friday, February 21, 2014

Week 8 Readings

IIR 9
Synonymy is a problem that I am really interested in!  I look forward to reading this chapter, especially the sections on query expansion.
Relevance feedback poses a fundamental problem for me: how can you separate query reformulation from an evolving information need?  I think either one ends up improved by relevance feedback, but it still seems theoretically unsound.

I'm really glad the chapter begins with the Rocchio algorithm, because my first thought was that relevance feedback is an extension of statistical language models.  This would work by computing a new language model that incorporates the relevant documents into the query's language model.  Rocchio's algorithm was designed for vector space retrieval, and relevance feedback can also be applied fairly simply in a probabilistic retrieval model.  However, one major practical difficulty with relevance feedback in general is that users do not want to spend much time interacting with their search engine.  Also, if a user does perform relevance feedback and the newly generated result set still contains some poor matches, most users will see that as a big failure.  Global query expansion, by contrast, works with a thesaurus to expand concepts, but each kind of thesaurus has major challenges, so this is not a very strong expansion method.
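To make the Rocchio idea concrete, here is a minimal sketch of the update rule in the vector space model. The alpha/beta/gamma values below are the common textbook defaults, not numbers taken from the chapter, and vectors are plain lists of floats for simplicity.

```python
from typing import List

def rocchio(query: List[float],
            relevant: List[List[float]],
            nonrelevant: List[List[float]],
            alpha: float = 1.0,
            beta: float = 0.75,
            gamma: float = 0.15) -> List[float]:
    """q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    dim = len(query)
    new_q = [alpha * w for w in query]
    if relevant:
        for d in relevant:
            for i in range(dim):
                new_q[i] += beta * d[i] / len(relevant)
    if nonrelevant:
        for d in nonrelevant:
            for i in range(dim):
                new_q[i] -= gamma * d[i] / len(nonrelevant)
    # Negative term weights are usually clipped to zero.
    return [max(w, 0.0) for w in new_q]
```

The asymmetric beta/gamma weighting reflects the usual intuition that positive feedback is more trustworthy than negative feedback.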

Xu and Croft
The term mismatch problem has been recognized as an issue for a very long time.  The two standard solutions, global and local analysis, each have drawbacks.  For this reason, the authors propose a new technique called local context analysis.  It builds on the co-occurrence of terms in the query with terms in the returned documents: if a term is infrequent in the collection but co-occurs frequently with the query terms, it is a good term with which to expand the query.
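The core scoring idea can be sketched roughly like this: rank candidate expansion terms by how strongly they co-occur with the query terms in the top-ranked documents, discounted by how common they are in the whole collection (an idf-style factor). Note this is a simplified paraphrase of the intuition, not the exact formula from Xu and Croft.

```python
import math
from collections import Counter

def lca_candidates(query_terms, top_docs, collection_doc_freq, n_docs):
    """Score expansion candidates from the top-ranked documents.

    query_terms: set of query tokens
    top_docs: list of documents, each a list of tokens
    collection_doc_freq: {term: number of docs in collection containing it}
    n_docs: total documents in the collection
    """
    scores = Counter()
    for doc in top_docs:
        doc_counts = Counter(doc)
        for term, tf in doc_counts.items():
            if term in query_terms:
                continue
            # co-occurrence strength with the query terms in this document
            cooc = sum(tf * doc_counts[q] for q in query_terms)
            if cooc:
                # idf-style discount: frequent collection terms score lower
                idf = math.log(n_docs / (1 + collection_doc_freq.get(term, 0)))
                scores[term] += cooc * idf
    return scores.most_common()
```

A term that appears in many top documents alongside the query terms, but rarely elsewhere, rises to the top of this ranking.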

Wang, Fang, and Zhai
Using negative feedback (results with no clickthroughs) to produce better results on page 2 and beyond is such an awesome idea!  Four different negative feedback models are presented, along with two strategies for making TREC collections contain a good number of queries with mostly non-relevant results.  This lets future researchers perform similar experiments and compare against these initial results.
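The simplest vector-space version of this idea can be sketched as follows: re-rank the remaining results by penalizing similarity to the centroid of the non-clicked (assumed non-relevant) documents. This is my paraphrase of a single-negative-model approach, not the paper's exact formulation; `beta` is an assumed penalty weight.

```python
def rerank_with_negatives(results, negatives, beta=0.5):
    """results: list of (doc_id, score, vector); negatives: list of vectors.

    Returns results re-sorted after subtracting a similarity penalty
    toward the centroid of the negative examples.
    """
    if not negatives:
        return results
    dim = len(negatives[0])
    centroid = [sum(v[i] for v in negatives) / len(negatives)
                for i in range(dim)]
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    adjusted = [(doc_id, score - beta * dot(vec, centroid), vec)
                for doc_id, score, vec in results]
    return sorted(adjusted, key=lambda t: t[1], reverse=True)
```

Documents resembling what the user already skipped get pushed down, which is exactly the behavior you want on page 2.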

Harman
Harman identifies three issues in relevance feedback.
1- The probabilistic model could not originally incorporate relevance judgments; by modifying the Sparck Jones weighting, this becomes possible.
2- How to decide which terms to add from documents judged relevant.  Harman recommends the 20 highest-ranked terms, though exactly how to weight them is left open.
3- Diminishing returns over multiple feedback passes.  Harman actually argues against this concern and, in a very pre-WWW mindset, recommends looking through many non-relevant documents for thoroughness.
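Issues 1 and 2 can be sketched together: score terms from the judged-relevant documents with the standard Robertson/Sparck Jones relevance weight, then keep the top 20 as expansion terms (Harman's recommended cutoff). The 0.5 smoothing constants follow the common textbook form of the weight; the exact variant Harman uses may differ.

```python
import math

def rsj_weight(r, R, n, N):
    """Robertson/Sparck Jones relevance weight.

    r: relevant docs containing the term, R: total relevant docs,
    n: collection docs containing the term, N: collection size.
    """
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))

def top_expansion_terms(term_stats, R, N, k=20):
    """term_stats: {term: (r, n)} gathered from the judged documents."""
    scored = {t: rsj_weight(r, R, n, N) for t, (r, n) in term_stats.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

A term that occurs in all the relevant documents but rarely in the collection gets a large positive weight, while a term as common in the collection as in the relevant set scores near zero.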
