TY  - JOUR
TI  - The Probabilistic Relevance Framework: BM25 and Beyond
AU  - Robertson, Stephen
AU  - Zaragoza, Hugo
T2  - Foundations and Trends® in Information Retrieval
AB  - The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970–1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.
DA  - 2009/12/17/
PY  - 2009
DO  - 10.1561/1500000019
DP  - www.nowpublishers.com
VL  - 3
IS  - 4
SP  - 333
EP  - 389
J2  - INR
LA  - en
SN  - 1554-0669, 1554-0677
ST  - The Probabilistic Relevance Framework
UR  - https://www.nowpublishers.com/article/Details/INR-019
Y2  - 2019/01/18/20:09:44
ER  - 

TY  - CONF
TI  - Diversifying Search Results
AU  - Agrawal, Rakesh
AU  - Gollapudi, Sreenivas
AU  - Halverson, Alan
AU  - Ieong, Samuel
T3  - WSDM '09
AB  - We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information, and that both queries and documents may belong to more than one category according to this taxonomy. We present a systematic approach to diversifying results that aims to minimize the risk of dissatisfaction of the average user. We propose an algorithm that well approximates this objective in general, and is provably optimal for a natural special case. Furthermore, we generalize several classical IR metrics, including NDCG, MRR, and MAP, to explicitly account for the value of diversification. We demonstrate empirically that our algorithm scores higher in these generalized metrics compared to results produced by commercial search engines.
C1  - New York, NY, USA
C3  - Proceedings of the Second ACM International Conference on Web Search and Data Mining
DA  - 2009///
PY  - 2009
DO  - 10.1145/1498759.1498766
DP  - ACM Digital Library
SP  - 5
EP  - 14
LA  - en
PB  - ACM
SN  - 978-1-60558-390-7
UR  - http://doi.acm.org/10.1145/1498759.1498766
Y2  - 2019/01/27/21:41:12
ER  - 

TY  - CONF
TI  - Optimizing search engines using clickthrough data
AU  - Joachims, Thorsten
T2  - KDD '02
AB  - This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
C1  - Edmonton, Alberta, Canada
C3  - Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
DA  - 2002/07/23/
PY  - 2002
DO  - 10.1145/775047.775067
DP  - dl.acm.org
SP  - 133
EP  - 142
LA  - en
PB  - ACM
SN  - 978-1-58113-567-1
UR  - http://dl.acm.org/citation.cfm?id=775047.775067
Y2  - 2019/01/18/20:54:23
ER  - 