TY  - JOUR
TI  - Analysis of a Very Large Web Search Engine Query Log
AU  - Silverstein, Craig
AU  - Marais, Hannes
AU  - Henzinger, Monika
AU  - Moricz, Michael
T2  - SIGIR Forum
AB  - In this paper we present an analysis of an AltaVista Search Engine query log consisting of approximately 1 billion entries for search requests over a period of six weeks. This represents almost 285 million user sessions, each an attempt to fill a single information need. We present an analysis of individual queries, query duplication, and query sessions. We also present results of a correlation analysis of the log entries, studying the interaction of terms within queries. Our data supports the conjecture that web users differ significantly from the user assumed in the standard information retrieval literature. Specifically, we show that web users type in short queries, mostly look at the first 10 results only, and seldom modify the query. This suggests that traditional information retrieval techniques may not work well for answering web search requests. The correlation analysis showed that the most highly correlated items are constituents of phrases. This result indicates it may be useful for search engines to consider search terms as parts of phrases even if the user did not explicitly specify them as such.
DA  - 1999/09//
PY  - 1999
DO  - 10.1145/331403.331405
DP  - ACM Digital Library
VL  - 33
IS  - 1
SP  - 6
EP  - 12
LA  - en
SN  - 0163-5840
UR  - http://doi.acm.org/10.1145/331403.331405
Y2  - 2018/03/29/00:38:59
ER  - 

TY  - BOOK
TI  - Modern Information Retrieval
AU  - Baeza-Yates, Ricardo
AU  - Ribeiro-Neto, Berthier
AB  - This is a rigorous and complete textbook for a first course on information retrieval from the computer science perspective. It provides an up-to-date, student-oriented treatment of information retrieval including extensive coverage of new topics such as web retrieval, web crawling, open source search engines and user interfaces.
From parsing to indexing, clustering to classification, retrieval to ranking, and user feedback to retrieval evaluation, all of the most important concepts are carefully introduced and exemplified. The contents and structure of the book have been carefully designed by the two main authors, with individual contributions coming from leading international authorities in the field, including Yoelle Maarek, Senior Director of Yahoo! Research Israel; Dulce Ponceleón, IBM Research; and Malcolm Slaney, Yahoo Research USA. This completely reorganized, revised and enlarged second edition of Modern Information Retrieval contains many new chapters and double the number of pages and bibliographic references of the first edition, and a companion website www.mir2ed.org with teaching material. It will prove invaluable to students, professors, researchers, practitioners, and scholars of this fascinating field of information retrieval.
DA  - 1999///
PY  - 1999
DP  - Google Books
SP  - 548
LA  - en
PB  - ACM Press
SN  - 978-0-201-39829-8
ER  - 

TY  - JOUR
TI  - Relevance: The whole history
AU  - Mizzaro, Stefano
T2  - Journal of the American Society for Information Science
AB  - Relevance is a fundamental, though not completely understood, concept for documentation, information science, and information retrieval. This article presents the history of relevance through an exhaustive review of the literature. Such history being very complex (about 160 papers are discussed), it is not simple to describe it in a comprehensible way. Thus, first of all a framework for establishing a common ground is defined, and then the history itself is illustrated via the presentation in chronological order of the papers on relevance.
The history is divided into three periods (“Before 1958,” “1959–1976,” and “1977–present”) and, inside each period, the papers on relevance are analyzed under seven different aspects (methodological foundations, different kinds of relevance, beyond-topical criteria adopted by users, modes for expression of the relevance judgment, dynamic nature of relevance, types of document representation, and agreement among different judges). © 1997 John Wiley & Sons, Inc.
DA  - 1997///
PY  - 1997
DO  - 10.1002/(SICI)1097-4571(199709)48:9<810::AID-ASI6>3.0.CO;2-U
DP  - Wiley Online Library
VL  - 48
IS  - 9
SP  - 810
EP  - 832
LA  - en
SN  - 1097-4571
ST  - Relevance
UR  - https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291097-4571%28199709%2948%3A9%3C810%3A%3AAID-ASI6%3E3.0.CO%3B2-U
Y2  - 2019/01/27/23:04:52
ER  - 

TY  - JOUR
TI  - Ranganathan in the Perspective of Advanced Information Retrieval
AU  - Ingwersen, Peter
AU  - Wormell, Irene
T2  - Libri
DA  - 1992///
PY  - 1992
DP  - ProQuest
VL  - 42
IS  - 3
LA  - en
SN  - 0024-2667
UR  - https://search.proquest.com/docview/1304366227?pq-origsite=gscholar
Y2  - 2018/03/20/19:51:13
ER  - 

TY  - JOUR
TI  - The design of browsing and berrypicking techniques for the online search interface
AU  - Bates, Marcia J.
T2  - Online Review
AB  - First, a new model of searching in online and other information systems, called ‘berrypicking’, is discussed. This model, it is argued, is much closer to the real behavior of information searchers than the traditional model of information retrieval is, and, consequently, will guide our thinking better in the design of effective interfaces. Second, the research literature of manual information seeking behavior is drawn on for suggestions of capabilities that users might like to have in online systems. Third, based on the new model and the research on information seeking, suggestions are made for how new search capabilities could be incorporated into the design of search interfaces.
Particular attention is given to the nature and types of browsing that can be facilitated.
DA  - 1989/05/01/
PY  - 1989
DP  - emeraldinsight.com (Atypon)
VL  - 13
IS  - 5
SP  - 407
EP  - 424
J2  - Online Review
LA  - en
SN  - 0309-314X
UR  - http://www.emeraldinsight.com/doi/abs/10.1108/eb024320
Y2  - 2017/04/06/17:54:48
ER  - 

TY  - JOUR
TI  - Term-weighting approaches in automatic text retrieval
AU  - Salton, Gerard
AU  - Buckley, Christopher
T2  - Information Processing & Management
AB  - The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term-weighting systems. This article summarizes the insights gained in automatic term weighting, and provides baseline single-term-indexing models with which other more elaborate content analysis procedures can be compared.
DA  - 1988/01/01/
PY  - 1988
DO  - 10.1016/0306-4573(88)90021-0
DP  - ScienceDirect
VL  - 24
IS  - 5
SP  - 513
EP  - 523
J2  - Information Processing & Management
LA  - en
SN  - 0306-4573
UR  - http://www.sciencedirect.com/science/article/pii/0306457388900210
Y2  - 2016/10/15/20:58:32
ER  - 

TY  - JOUR
TI  - A study of information seeking and retrieving. III. Searchers, searches, and overlap
AU  - Saracevic, Tefko
AU  - Kantor, Paul
T2  - Journal of the American Society for Information Science
AB  - The objectives of the study were to conduct a series of observations and experiments under as real-life a situation as possible related to: (1) user context of questions in information retrieval; (2) the structure and classification of questions; (3) cognitive traits and decision making of searchers; and (4) different searches of the same question.
The study is presented in three parts: Part I presents the background of the study and describes the models, measures, methods, procedures and statistical analyses used. Part II is devoted to results related to users, questions and effectiveness measures, and Part III to results related to searchers, searches and overlap studies. A concluding summary of all results is presented in Part III. © 1988 John Wiley & Sons, Inc.
DA  - 1988///
PY  - 1988
DO  - 10.1002/(SICI)1097-4571(198805)39:3<197::AID-ASI4>3.0.CO;2-A
DP  - Wiley Online Library
VL  - 39
IS  - 3
SP  - 197
EP  - 216
LA  - en
SN  - 1097-4571
UR  - https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291097-4571%28198805%2939%3A3%3C197%3A%3AAID-ASI4%3E3.0.CO%3B2-A
Y2  - 2019/01/21/23:49:43
ER  - 

TY  - JOUR
TI  - A Vector Space Model for Automatic Indexing
AU  - Salton, G.
AU  - Wong, A.
AU  - Yang, C. S.
T2  - Commun. ACM
AB  - In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; in these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.
DA  - 1975/11//
PY  - 1975
DO  - 10.1145/361219.361220
DP  - ACM Digital Library
VL  - 18
IS  - 11
SP  - 613
EP  - 620
LA  - en
SN  - 0001-0782
UR  - http://doi.acm.org/10.1145/361219.361220
Y2  - 2017/11/08/22:43:01
ER  - 

TY  - JOUR
TI  - On Relevance, Probabilistic Indexing and Information Retrieval
AU  - Maron, M. E.
AU  - Kuhns, J. L.
T2  - Journal of the ACM
AB  - This paper reports on a novel technique for literature indexing and searching in a mechanized library system. The notion of relevance is taken as the key concept in the theory of information retrieval and a comparative concept of relevance is explicated in terms of the theory of probability. The resulting technique called “Probabilistic Indexing,” allows a computing machine, given a request for information, to make a statistical inference and derive a number (called the “relevance number”) for each document, which is a measure of the probability that the document will satisfy the given request. The result of a search is an ordered list of those documents which satisfy the request ranked according to their probable relevance. The paper goes on to show that whereas in a conventional library system the cross-referencing (“see” and “see also”) is based solely on the “semantical closeness” between index terms, statistical measures of closeness between index terms can be defined and computed. Thus, given an arbitrary request consisting of one (or many) index term(s), a machine can elaborate on it to increase the probability of selecting relevant documents that would not otherwise have been selected. Finally, the paper suggests an interpretation of the whole library problem as one where the request is considered as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as an output an ordered list of those documents which most probably satisfy the information needs of the user.
DA  - 1960/07//
PY  - 1960
DO  - 10.1145/321033.321035
DP  - ACM Digital Library
VL  - 7
IS  - 3
SP  - 216
EP  - 244
LA  - en
SN  - 0004-5411
UR  - http://doi.acm.org/10.1145/321033.321035
Y2  - 2019/01/27/23:02:51
ER  - 