Full bibliography (63 resources)
Yee, K.-P., Swearingen, K., Li, K., & Hearst, M. (2003). Faceted Metadata for Image Search and Browsing. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 401–408. https://doi.org/10.1145/642611.642681
There are currently two dominant interface types for searching and browsing large image collections: keyword-based search, and searching by overall similarity to sample images. We present an alternative based on enabling users to navigate along conceptual dimensions that describe the images. The interface makes use of hierarchical faceted metadata and dynamically generated query previews. A usability study, in which 32 art history students explored a collection of 35,000 fine arts images, compares this approach to a standard image search interface. Despite the unfamiliarity and power of the interface (attributes that often lead to rejection of new search interfaces), the study results show that 90% of the participants preferred the metadata approach overall, 97% said that it helped them learn more about the collection, 75% found it more flexible, and 72% found it easier to use than a standard baseline system. These results indicate that a category-based approach is a successful way to provide access to image collections.
Broder, A. (2002). A Taxonomy of Web Search. SIGIR Forum, 36(2), 3–10. https://doi.org/10.1145/792550.792552
Classic IR (information retrieval) is inherently predicated on users searching for information, the so-called "information need". But the need behind a web search is often not informational -- it might be navigational (give me the url of the site I want to reach) or transactional (show me sites where I can perform a certain transaction, e.g. shop, download a file, or find a map). We explore this taxonomy of web searches and discuss how global search engines evolved to deal with web-specific needs.
Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., & Yee, K.-P. (2002). Finding the flow in web site search. Communications of the ACM, 45(9), 42–49. https://doi.org/10.1145/567498.567525
Designing a search system and interface may best be served (and executed) by scrutinizing usability studies.
Joachims, T. (2002). Optimizing search engines using clickthrough data. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 133–142. https://doi.org/10.1145/775047.775067
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
English, J., Hearst, M., Sinha, R., Swearingen, K., & Yee, K.-P. (2002). Hierarchical Faceted Metadata in Site Search Interfaces. CHI ’02 Extended Abstracts on Human Factors in Computing Systems, 628–639. https://doi.org/10.1145/506443.506517
One of the most pressing usability issues in the design of large web sites is that of the organization of search results. A previous study on a moderate-sized web site indicated that users understood and preferred dynamically organized faceted metadata over standard search. We are now examining how to scale this approach to very large collections, since it is difficult to present hierarchical faceted metadata in a manner appealing and understandable to general users. We have iteratively designed and tested interfaces that address these design challenges; the most recent version is receiving enthusiastic responses in ongoing usability studies.
Spink, A., Wolfram, D., Jansen, M. B. J., & Saracevic, T. (2001). Searching the web: The public and their queries. Journal of the American Society for Information Science and Technology, 52(3), 226–234. https://doi.org/10.1002/1097-4571(2000)9999:9999<::AID-ASI1591>3.0.CO;2-R
In studying actual Web searching by the public at large, we analyzed over one million Web queries by users of the Excite search engine. We found that most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. A small number of search terms are used with high frequency, and a great many terms are unique; the language of Web queries is distinctive. Queries about recreation and entertainment rank highest. Findings are compared to data from two other large studies of Web queries. This study provides an insight into the public practices and choices in Web searching.
Jansen, B. J., Spink, A., & Saracevic, T. (2000). Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing & Management, 36(2), 207–227. https://doi.org/10.1016/S0306-4573(99)00056-4
We analyzed transaction logs containing 51,473 queries posed by 18,113 users of Excite, a major Internet search service. We provide data on: (i) sessions — changes in queries during a session, number of pages viewed, and use of relevance feedback; (ii) queries — the number of search terms, and the use of logic and modifiers; and (iii) terms — their rank/frequency distribution and the most highly used search terms. We then shift the focus of analysis from the query to the user to gain insight to the characteristics of the Web user. With these characteristics as a basis, we then conducted a failure analysis, identifying trends among user mistakes. We conclude with a summary of findings and a discussion of the implications of these findings.
Silverstein, C., Marais, H., Henzinger, M., & Moricz, M. (1999). Analysis of a Very Large Web Search Engine Query Log. SIGIR Forum, 33(1), 6–12. https://doi.org/10.1145/331403.331405
In this paper we present an analysis of an AltaVista Search Engine query log consisting of approximately 1 billion entries for search requests over a period of six weeks. This represents almost 285 million user sessions, each an attempt to fill a single information need. We present an analysis of individual queries, query duplication, and query sessions. We also present results of a correlation analysis of the log entries, studying the interaction of terms within queries. Our data supports the conjecture that web users differ significantly from the user assumed in the standard information retrieval literature. Specifically, we show that web users type in short queries, mostly look at the first 10 results only, and seldom modify the query. This suggests that traditional information retrieval techniques may not work well for answering web search requests. The correlation analysis showed that the most highly correlated items are constituents of phrases. This result indicates it may be useful for search engines to consider search terms as parts of phrases even if the user did not explicitly specify them as such.
This is a rigorous and complete textbook for a first course on information retrieval from the computer science perspective. It provides an up-to-date student oriented treatment of information retrieval including extensive coverage of new topics such as web retrieval, web crawling, open source search engines and user interfaces. From parsing to indexing, clustering to classification, retrieval to ranking, and user feedback to retrieval evaluation, all of the most important concepts are carefully introduced and exemplified. The contents and structure of the book have been carefully designed by the two main authors, with individual contributions coming from leading international authorities in the field, including Yoelle Maarek, Senior Director of Yahoo! Research Israel; Dulce Ponceleón, IBM Research; and Malcolm Slaney, Yahoo Research USA. This completely reorganized, revised and enlarged second edition of Modern Information Retrieval contains many new chapters and double the number of pages and bibliographic references of the first edition, and a companion website www.mir2ed.org with teaching material. It will prove invaluable to students, professors, researchers, practitioners, and scholars of this fascinating field of information retrieval.
Maniez, J. (1999). Des classifications aux thésaurus : du bon usage des facettes. Documentaliste, 36(4/5), 249–264.
The term "facet" is well established in the vocabulary of information science, but its meanings vary so widely from one author to another that grasping its content has become problematic. The author shows that these difficulties go back to the founder of facet theory, Ranganathan, who unfortunately chose a metaphorical term from everyday vocabulary that was already laden with meaning, and whose facet theory has always remained ambiguous. The author exposes its inconsistencies using the linguistic model of the two axes of language, then identifies the main stages of the evolution that led proponents of facets from Ranganathan's analytico-synthetic scheme to the analytic scheme of the faceted thesaurus. Finally, he argues for a more rigorous use of the term and the tool, one that clearly separates the classification of concepts from the arrangement of subjects.
Pirolli, P., & Card, S. (1999). Information foraging. Psychological Review, 106(4), 643–675. https://doi.org/10.1037/0033-295X.106.4.643
Information foraging theory is an approach to understanding how strategies and technologies for information seeking, gathering, and consumption are adapted to the flux of information in the environment. The theory assumes that people, when possible, will modify their strategies or the structure of the environment to maximize their rate of gaining valuable information. The theory is developed by (a) adaptation (rational) analysis of information foraging problems and (b) a detailed process model (adaptive control of thought in information foraging [ACT-IF]). The adaptation analysis develops (a) information patch models, which deal with time allocation and information filtering and enrichment activities in environments in which information is encountered in clusters; (b) information scent models, which address the identification of information value from proximal cues; and (c) information diet models, which address decisions about the selection and pursuit of information items. ACT-IF is instantiated as a production system model of people interacting with complex information technology.
Wilson, T. D. (1999). Models in information behaviour research. Journal of Documentation, 55(3), 249–270. https://doi.org/10.1108/EUM0000000007145
This paper presents an outline of models of information seeking and other aspects of information behaviour, showing the relationship between communication and information behaviour in general with information seeking and information searching in information retrieval systems. It is suggested that these models address issues at various levels of information behaviour and that they can be related by envisaging a ‘nesting’ of models. It is also suggested that, within both information seeking research and information searching research, alternative models address similar issues in related ways and that the models are complementary rather than conflicting. Finally, an alternative, problem-solving model is presented, which, it is suggested, provides a basis for relating the models in appropriate research strategies.
Mizzaro, S. (1997). Relevance: The whole history. Journal of the American Society for Information Science, 48(9), 810–832. https://doi.org/10.1002/(SICI)1097-4571(199709)48:9<810::AID-ASI6>3.0.CO;2-U
Relevance is a fundamental, though not completely understood, concept for documentation, information science, and information retrieval. This article presents the history of relevance through an exhaustive review of the literature. Such history being very complex (about 160 papers are discussed), it is not simple to describe it in a comprehensible way. Thus, first of all a framework for establishing a common ground is defined, and then the history itself is illustrated via the presentation in chronological order of the papers on relevance. The history is divided into three periods (“Before 1958,” “1959–1976,” and “1977–present”) and, inside each period, the papers on relevance are analyzed under seven different aspects (methodological foundations, different kinds of relevance, beyond-topical criteria adopted by users, modes for expression of the relevance judgment, dynamic nature of relevance, types of document representation, and agreement among different judges). © 1997 John Wiley & Sons, Inc.
Kuhlthau, C. C. (1993). A principle of uncertainty for information seeking. Journal of Documentation, 49(4), 339–355. https://doi.org/10.1108/eb026918
Ingwersen, P., & Wormell, I. (1992). Ranganathan in the Perspective of Advanced Information Retrieval. Libri, 42(3). https://search.proquest.com/docview/1304366227?pq-origsite=gscholar
Kuhlthau, C. C. (1991). Inside the Search Process: Information Seeking from the User’s Perspective. Journal of the American Society for Information Science, 42(5). http://search.proquest.com/docview/1301244250/citation/2FBBEAD901A4984PQ/1
Bates, M. J. (1989). The design of browsing and berrypicking techniques for the online search interface. Online Review, 13(5), 407–424. http://www.emeraldinsight.com/doi/abs/10.1108/eb024320
First, a new model of searching in online and other information systems, called ‘berrypicking’, is discussed. This model, it is argued, is much closer to the real behavior of information searchers than the traditional model of information retrieval is, and, consequently, will guide our thinking better in the design of effective interfaces. Second, the research literature of manual information seeking behavior is drawn on for suggestions of capabilities that users might like to have in online systems. Third, based on the new model and the research on information seeking, suggestions are made for how new search capabilities could be incorporated into the design of search interfaces. Particular attention is given to the nature and types of browsing that can be facilitated.
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513–523. https://doi.org/10.1016/0306-4573(88)90021-0
The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term-weighting systems. This article summarizes the insights gained in automatic term weighting, and provides baseline single-term-indexing models with which other more elaborate content analysis procedures can be compared.
Saracevic, T., & Kantor, P. (1988). A study of information seeking and retrieving. II. Users, questions, and effectiveness. Journal of the American Society for Information Science, 39(3), 177–196. https://doi.org/10.1002/(SICI)1097-4571(198805)39:3<177::AID-ASI3>3.0.CO;2-F
The objectives of the study were to conduct a series of observations and experiments under as real-life a situation as possible related to: (1) user context of questions in information retrieval; (2) the structure and classification of questions; (3) cognitive traits and decision making of searchers; and (4) different searches of the same question. The study is presented in three parts: Part I presents the background of the study and describes the models, measures, methods, procedures and statistical analyses used. Part II is devoted to results related to users, questions and effectiveness measures, and Part III to results related to searchers, searches and overlap studies. A concluding summary of all results is presented in Part III. © 1988 John Wiley & Sons, Inc.