TY  - JOUR
TI  - The Probabilistic Relevance Framework: BM25 and Beyond
AU  - Robertson, Stephen
AU  - Zaragoza, Hugo
T2  - Foundations and Trends® in Information Retrieval
AB  - The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970–1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.
DA  - 2009/12/17/
PY  - 2009
DO  - 10.1561/1500000019
DP  - www.nowpublishers.com
VL  - 3
IS  - 4
SP  - 333
EP  - 389
J2  - INR
LA  - en
SN  - 1554-0669, 1554-0677
ST  - The Probabilistic Relevance Framework
UR  - https://www.nowpublishers.com/article/Details/INR-019
Y2  - 2019/01/18/20:09:44
ER  - 

TY  - CONF
TI  - Diversifying Search Results
AU  - Agrawal, Rakesh
AU  - Gollapudi, Sreenivas
AU  - Halverson, Alan
AU  - Ieong, Samuel
T3  - WSDM '09
AB  - We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information, and that both queries and documents may belong to more than one category according to this taxonomy. We present a systematic approach to diversifying results that aims to minimize the risk of dissatisfaction of the average user. We propose an algorithm that well approximates this objective in general, and is provably optimal for a natural special case. Furthermore, we generalize several classical IR metrics, including NDCG, MRR, and MAP, to explicitly account for the value of diversification. We demonstrate empirically that our algorithm scores higher in these generalized metrics compared to results produced by commercial search engines.
C1  - New York, NY, USA
C3  - Proceedings of the Second ACM International Conference on Web Search and Data Mining
DA  - 2009///
PY  - 2009
DO  - 10.1145/1498759.1498766
DP  - ACM Digital Library
SP  - 5
EP  - 14
LA  - en
PB  - ACM
SN  - 978-1-60558-390-7
UR  - http://doi.acm.org/10.1145/1498759.1498766
Y2  - 2019/01/27/21:41:12
ER  - 

TY  - CONF
TI  - Optimizing search engines using clickthrough data
AU  - Joachims, Thorsten
T2  - KDD '02
AB  - This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
C1  - Edmonton, Alberta, Canada
C3  - Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
DA  - 2002/07/23/
PY  - 2002
DO  - 10.1145/775047.775067
DP  - dl.acm.org
SP  - 133
EP  - 142
LA  - en
PB  - ACM
SN  - 978-1-58113-567-1
UR  - http://dl.acm.org/citation.cfm?id=775047.775067
Y2  - 2019/01/18/20:54:23
ER  -