TY  - JOUR
TI  - On Relevance, Probabilistic Indexing and Information Retrieval
AU  - Maron, M. E.
AU  - Kuhns, J. L.
T2  - Journal of the ACM
AB  - This paper reports on a novel technique for literature indexing and searching in a mechanized library system. The notion of relevance is taken as the key concept in the theory of information retrieval and a comparative concept of relevance is explicated in terms of the theory of probability. The resulting technique called “Probabilistic Indexing,” allows a computing machine, given a request for information, to make a statistical inference and derive a number (called the “relevance number”) for each document, which is a measure of the probability that the document will satisfy the given request. The result of a search is an ordered list of those documents which satisfy the request ranked according to their probable relevance. The paper goes on to show that whereas in a conventional library system the cross-referencing (“see” and “see also”) is based solely on the “semantical closeness” between index terms, statistical measures of closeness between index terms can be defined and computed. Thus, given an arbitrary request consisting of one (or many) index term(s), a machine can elaborate on it to increase the probability of selecting relevant documents that would not otherwise have been selected. Finally, the paper suggests an interpretation of the whole library problem as one where the request is considered as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as an output an ordered list of those documents which most probably satisfy the information needs of the user.
DA  - 1960/07//
PY  - 1960
DO  - 10.1145/321033.321035
DP  - ACM Digital Library
VL  - 7
IS  - 3
SP  - 216
EP  - 244
LA  - en
SN  - 0004-5411
UR  - http://doi.acm.org/10.1145/321033.321035
Y2  - 2019/01/27/23:02:51
ER  - 

TY  - JOUR
TI  - Relevance: The whole history
AU  - Mizzaro, Stefano
T2  - Journal of the American Society for Information Science
AB  - Relevance is a fundamental, though not completely understood, concept for documentation, information science, and information retrieval. This article presents the history of relevance through an exhaustive review of the literature. Such history being very complex (about 160 papers are discussed), it is not simple to describe it in a comprehensible way. Thus, first of all a framework for establishing a common ground is defined, and then the history itself is illustrated via the presentation in chronological order of the papers on relevance. The history is divided into three periods (“Before 1958,” “1959–1976,” and “1977–present”) and, inside each period, the papers on relevance are analyzed under seven different aspects (methodological foundations, different kinds of relevance, beyond-topical criteria adopted by users, modes for expression of the relevance judgment, dynamic nature of relevance, types of document representation, and agreement among different judges). © 1997 John Wiley & Sons, Inc.
DA  - 1997///
PY  - 1997
DO  - 10.1002/(SICI)1097-4571(199709)48:9<810::AID-ASI6>3.0.CO;2-U
DP  - Wiley Online Library
VL  - 48
IS  - 9
SP  - 810
EP  - 832
LA  - en
SN  - 1097-4571
ST  - Relevance
UR  - https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291097-4571%28199709%2948%3A9%3C810%3A%3AAID-ASI6%3E3.0.CO%3B2-U
Y2  - 2019/01/27/23:04:52
ER  - 

TY  - CONF
TI  - Optimizing search engines using clickthrough data
AU  - Joachims, Thorsten
T2  - KDD '02
AB  - This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data.
Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
C1  - Edmonton, Alberta, Canada
C3  - Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
DA  - 2002/07/23/
PY  - 2002
DO  - 10.1145/775047.775067
DP  - dl.acm.org
SP  - 133
EP  - 142
LA  - en
PB  - ACM
SN  - 978-1-58113-567-1
UR  - http://dl.acm.org/citation.cfm?id=775047.775067
Y2  - 2019/01/18/20:54:23
ER  - 

TY  - CONF
TI  - Accurately Interpreting Clickthrough Data As Implicit Feedback
AU  - Joachims, Thorsten
AU  - Granka, Laura
AU  - Pan, Bing
AU  - Hembrooke, Helene
AU  - Gay, Geri
T2  - SIGIR'05
AB  - This paper examines the reliability of implicit feedback generated from clickthrough data in WWW search.
Analyzing the users' decision process using eyetracking and comparing implicit feedback against manual relevance judgments, we conclude that clicks are informative but biased. While this makes the interpretation of clicks as absolute relevance judgments difficult, we show that relative preferences derived from clicks are reasonably accurate on average.
C3  - Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, 2005
DA  - 2005///
PY  - 2005
DP  - ACM Digital Library
SP  - 154
EP  - 161
LA  - en
Y2  - 2019/01/18/20:45:44
ER  - 

TY  - JOUR
TI  - Relevance: A review of the literature and a framework for thinking on the notion in information science. Part III: Behavior and effects of relevance
AU  - Saracevic, Tefko
T2  - Journal of the American Society for Information Science and Technology
AB  - All is flux. —Plato on Knowledge in the Theaetetus (about 369 BC) Relevance is a, if not even the, key notion in information science in general and information retrieval in particular. This two-part critical review traces and synthesizes the scholarship on relevance over the past 30 years or so and provides an updated framework within which the still widely dissonant ideas and works about relevance might be interpreted and related. It is a continuation and update of a similar review that appeared in 1975 under the same title, considered here as being Part I. The present review is organized in two parts: Part II addresses the questions related to nature and manifestations of relevance, and Part III addresses questions related to relevance behavior and effects. In Part II, the nature of relevance is discussed in terms of meaning ascribed to relevance, theories used or proposed, and models that have been developed. The manifestations of relevance are classified as to several kinds of relevance that form an interdependent system of relevancies.
In Part III, relevance behavior and effects are synthesized using experimental and observational works that incorporated data. In both parts, each section concludes with a summary that in effect provides an interpretation and synthesis of contemporary thinking on the topic treated or suggests hypotheses for future research. Analyses of some of the major trends that shape relevance work are offered in conclusions.
DA  - 2007/11/01/
PY  - 2007
DO  - 10.1002/asi.20681
DP  - Wiley Online Library
VL  - 58
IS  - 13
SP  - 2126
EP  - 2144
LA  - en
SN  - 1532-2890
ST  - Relevance
UR  - https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.20681
Y2  - 2018/04/25/22:24:47
ER  - 

TY  - JOUR
TI  - Redundancy, diversity and interdependent document relevance
AU  - Radlinski, Filip
AU  - Bennett, Paul N.
AU  - Carterette, Ben
AU  - Joachims, Thorsten
T2  - ACM SIGIR Forum
AB  - The goal of the Redundancy, Diversity, and Interdependent Document Relevance workshop was to explore how ranking, performance assessment and learning to rank can move beyond the assumption that the relevance of a document is independent of other documents. In particular, the workshop focussed on three themes: the effect of redundancy on information retrieval utility (for example, minimizing the wasted effort of users who must skip redundant information), the role of diversity (for example, for mitigating the risk of misinterpreting ambiguous queries), and algorithms for set-level optimization (where the quality of a set of retrieved documents is not simply the sum of its parts). This workshop built directly upon the Beyond Binary Relevance: Preferences, Diversity and Set-Level Judgments workshop at SIGIR 2008 [3], shifting focus to address the questions left open by the discussions and results from that workshop.
As such, it was the first workshop to explicitly focus on the related research challenges of redundancy, diversity, and interdependent relevance – all of which require novel performance measures, learning methods, and evaluation techniques. The workshop program committee consisted of 15 researchers from academia and industry, with experience in IR evaluation, machine learning, and IR algorithmic design. Over 40 people attended the workshop. This report aims to summarize the workshop, and also to systematize common themes and key concepts so as to encourage research in the three workshop themes. It contains our attempt to summarize and organize the topics that came up in presentations as well as in discussions, pulling out common elements. Many audience members contributed, yet due to the free-flowing discussion, attributing all the observations to particular audience members is unfortunately impossible. Not all audience members would necessarily agree with the views presented, but we do attempt to present a consensus view as far as possible.
DA  - 2009/12/14/
PY  - 2009
DO  - 10.1145/1670564.1670572
DP  - dl.acm.org
VL  - 43
IS  - 2
SP  - 46
EP  - 52
LA  - en
SN  - 0163-5840
UR  - http://dl.acm.org/citation.cfm?id=1670564.1670572
Y2  - 2019/01/27/19:48:40
ER  - 

TY  - JOUR
TI  - The Probabilistic Relevance Framework: BM25 and Beyond
AU  - Robertson, Stephen
AU  - Zaragoza, Hugo
T2  - Foundations and Trends® in Information Retrieval
AB  - The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970–1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F.
This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.
DA  - 2009/12/17/
PY  - 2009
DO  - 10.1561/1500000019
DP  - www.nowpublishers.com
VL  - 3
IS  - 4
SP  - 333
EP  - 389
J2  - INR
LA  - en
SN  - 1554-0669, 1554-0677
ST  - The Probabilistic Relevance Framework
UR  - https://www.nowpublishers.com/article/Details/INR-019
Y2  - 2019/01/18/20:09:44
ER  - 

TY  - JOUR
TI  - The foundation of the concept of relevance
AU  - Hjørland, Birger
T2  - Journal of the American Society for Information Science and Technology
AB  - In 1975 Tefko Saracevic declared “the subject knowledge view” to be the most fundamental perspective of relevance. This paper examines the assumptions in different views of relevance, including “the system's view” and “the user's view” and offers a reinterpretation of these views. The paper finds that what was regarded as the most fundamental view by Saracevic in 1975 has not since been considered (with very few exceptions). Other views, which are based on less fruitful assumptions, have dominated the discourse on relevance in information retrieval and information science. Many authors have reexamined the concept of relevance in information science, but have neglected the subject knowledge view, hence basic theoretical assumptions seem not to have been properly addressed. It is as urgent now as it was in 1975 seriously to consider “the subject knowledge view” of relevance (which may also be termed “the epistemological view”).
The concept of relevance, like other basic concepts, is influenced by overall approaches to information science, such as the cognitive view and the domain-analytic view. There is today a trend toward a social paradigm for information science. This paper offers an understanding of relevance from such a social point of view.
DA  - 2010/02/01/
PY  - 2010
DO  - 10.1002/asi.21261
DP  - Wiley Online Library
VL  - 61
IS  - 2
SP  - 217
EP  - 237
LA  - en
SN  - 1532-2890
UR  - https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.21261
Y2  - 2018/04/25/22:21:19
ER  - 