Full bibliography 63 resources
Yee, K.-P., Swearingen, K., Li, K., & Hearst, M. (2003). Faceted Metadata for Image Search and Browsing. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 401–408. https://doi.org/10.1145/642611.642681
There are currently two dominant interface types for searching and browsing large image collections: keyword-based search, and searching by overall similarity to sample images. We present an alternative based on enabling users to navigate along conceptual dimensions that describe the images. The interface makes use of hierarchical faceted metadata and dynamically generated query previews. A usability study, in which 32 art history students explored a collection of 35,000 fine arts images, compares this approach to a standard image search interface. Despite the unfamiliarity and power of the interface (attributes that often lead to rejection of new search interfaces), the study results show that 90% of the participants preferred the metadata approach overall, 97% said that it helped them learn more about the collection, 75% found it more flexible, and 72% found it easier to use than a standard baseline system. These results indicate that a category-based approach is a successful way to provide access to image collections.
Wilson, T. D. (1999). Models in information behaviour research. Journal of Documentation, 55(3), 249–270. https://doi.org/10.1108/EUM0000000007145
This paper presents an outline of models of information seeking and other aspects of information behaviour, showing the relationship between communication and information behaviour in general with information seeking and information searching in information retrieval systems. It is suggested that these models address issues at various levels of information behaviour and that they can be related by envisaging a ‘nesting’ of models. It is also suggested that, within both information seeking research and information searching research, alternative models address similar issues in related ways and that the models are complementary rather than conflicting. Finally, an alternative, problem-solving model is presented, which, it is suggested, provides a basis for relating the models in appropriate research strategies.
White, R. W., & Roth, R. A. (2009). Exploratory Search: Beyond the Query-Response Paradigm. Synthesis Lectures on Information Concepts, Retrieval, and Services, 1(1), 1–98. https://doi.org/10.2200/S00174ED1V01Y200901ICR003
As information becomes more ubiquitous and the demands that searchers have on search systems grow, there is a need to support search behaviors beyond simple lookup. Information seeking is the process or activity of attempting to obtain information in both human and technological contexts. Exploratory search describes an information-seeking problem context that is open-ended, persistent, and multifaceted, and information-seeking processes that are opportunistic, iterative, and multitactical. Exploratory searchers aim to solve complex problems and develop enhanced mental capacities. Exploratory search systems support this through symbiotic human-machine relationships that provide guidance in exploring unfamiliar information landscapes. Exploratory search has gained prominence in recent years. There is an increased interest from the information retrieval, information science, and human-computer interaction communities in moving beyond the traditional turn-taking interaction model supported by major Web search engines, and toward support for human intelligence amplification and information use. In this lecture, we introduce exploratory search, relate it to relevant extant research, outline the features of exploratory search systems, discuss the evaluation of these systems, and suggest some future directions for supporting exploratory search. Exploratory search is a new frontier in the search domain and is becoming increasingly important in shaping our future world.
Vieira, M. R., Razente, H. L., Barioni, M. C. N., Hadjieleftheriou, M., Srivastava, D., Traina, C., & Tsotras, V. J. (2011). On Query Result Diversification. Proceedings of the 2011 IEEE 27th International Conference on Data Engineering, 1163–1174. https://doi.org/10.1109/ICDE.2011.5767846
In this paper we describe a general framework for evaluation and optimization of methods for diversifying query results. In these methods, an initial ranking candidate set produced by a query is used to construct a result set, where elements are ranked with respect to relevance and diversity features, i.e., the retrieved elements should be as relevant as possible to the query, and, at the same time, the result set should be as diverse as possible. While addressing relevance is relatively simple and has been heavily studied, diversity is a harder problem to solve. One major contribution of this paper is that, using the above framework, we adapt, implement and evaluate several existing methods for diversifying query results. We also propose two new approaches, namely the Greedy with Marginal Contribution (GMC) and the Greedy Randomized with Neighborhood Expansion (GNE) methods. Another major contribution of this paper is that we present the first thorough experimental evaluation of the various diversification techniques implemented in a common framework. We examine the methods' performance with respect to precision, running time and quality of the result. Our experimental results show that while the proposed methods have higher running times, they achieve precision very close to the optimal, while also providing the best result quality. While GMC is deterministic, the randomized approach (GNE) can achieve better result quality if the user is willing to tradeoff running time.
Vickery, B. (2008). Faceted Classification for the Web. Axiomathes, 18(2), 145–160. https://doi.org/10.1007/s10516-007-9025-9
The article describes the nature of a faceted classification, and its application in document retrieval. The kinds of facet used are illustrated. Procedures are then discussed for identifying facets in a subject field, populating the facets with individual subject terms, arranging these in helpful sequences, using the scheme to classify documents, and searching the resultant classified index, with particular reference to Internet search.
Tunkelang, D. (2009). Faceted search. Synthesis Lectures on Information Concepts, Retrieval, and Services, 1(1), 1–80. https://doi.org/10.2200/S00190ED1V01Y200904ICR005
We live in an information age that requires us, more than ever, to represent, access, and use information. Over the last several decades, we have developed a modern science and technology for information retrieval, relentlessly pursuing the vision of a "memex" that Vannevar Bush proposed in his seminal article, "As We May Think." Faceted search plays a key role in this program. Faceted search addresses weaknesses of conventional search approaches and has emerged as a foundation for interactive information retrieval. User studies demonstrate that faceted search provides more effective information-seeking support to users than best-first search. Indeed, faceted search has become increasingly prevalent in online information access systems, particularly for e-commerce and site search. In this lecture, we explore the history, theory, and practice of faceted search. Although we cannot hope to be exhaustive, our aim is to provide sufficient depth and breadth to offer a useful resource to both researchers and practitioners. Because faceted search is an area of interest to computer scientists, information scientists, interface designers, and usability researchers, we do not assume that the reader is a specialist in any of these fields. Rather, we offer a self-contained treatment of the topic, with an extensive bibliography for those who would like to pursue particular aspects in more depth.
Spink, A., Wolfram, D., Jansen, M. B. J., & Saracevic, T. (2001). Searching the web: The public and their queries. Journal of the American Society for Information Science and Technology, 52(3), 226–234. https://doi.org/10.1002/1097-4571(2000)9999:9999<::AID-ASI1591>3.0.CO;2-R
In studying actual Web searching by the public at large, we analyzed over one million Web queries by users of the Excite search engine. We found that most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. A small number of search terms are used with high frequency, and a great many terms are unique; the language of Web queries is distinctive. Queries about recreation and entertainment rank highest. Findings are compared to data from two other large studies of Web queries. This study provides an insight into the public practices and choices in Web searching.
Silverstein, C., Marais, H., Henzinger, M., & Moricz, M. (1999). Analysis of a Very Large Web Search Engine Query Log. SIGIR Forum, 33(1), 6–12. https://doi.org/10.1145/331403.331405
In this paper we present an analysis of an AltaVista Search Engine query log consisting of approximately 1 billion entries for search requests over a period of six weeks. This represents almost 285 million user sessions, each an attempt to fill a single information need. We present an analysis of individual queries, query duplication, and query sessions. We also present results of a correlation analysis of the log entries, studying the interaction of terms within queries. Our data supports the conjecture that web users differ significantly from the user assumed in the standard information retrieval literature. Specifically, we show that web users type in short queries, mostly look at the first 10 results only, and seldom modify the query. This suggests that traditional information retrieval techniques may not work well for answering web search requests. The correlation analysis showed that the most highly correlated items are constituents of phrases. This result indicates it may be useful for search engines to consider search terms as parts of phrases even if the user did not explicitly specify them as such.
Saracevic, T., & Kantor, P. (1988). A study of information seeking and retrieving. II. Users, questions, and effectiveness. Journal of the American Society for Information Science, 39(3), 177–196. https://doi.org/10.1002/(SICI)1097-4571(198805)39:3<177::AID-ASI3>3.0.CO;2-F
The objectives of the study were to conduct a series of observations and experiments under as real-life a situation as possible related to: (1) user context of questions in information retrieval; (2) the structure and classification of questions; (3) cognitive traits and decision making of searchers; and (4) different searches of the same question. The study is presented in three parts: Part I presents the background of the study and describes the models, measures, methods, procedures and statistical analyses used. Part II is devoted to results related to users, questions and effectiveness measures, and Part III to results related to searchers, searches and overlap studies. A concluding summary of all results is presented in Part III. © 1988 John Wiley & Sons, Inc.
Saracevic, T., & Kantor, P. (1988). A study of information seeking and retrieving. III. Searchers, searches, and overlap. Journal of the American Society for Information Science, 39(3), 197–216. https://doi.org/10.1002/(SICI)1097-4571(198805)39:3<197::AID-ASI4>3.0.CO;2-A
The objectives of the study were to conduct a series of observations and experiments under as real-life a situation as possible related to: (1) user context of questions in information retrieval; (2) the structure and classification of questions; (3) cognitive traits and decision making of searchers; and (4) different searches of the same question. The study is presented in three parts: Part I presents the background of the study and describes the models, measures, methods, procedures and statistical analyses used. Part II is devoted to results related to users, questions and effectiveness measures, and Part III to results related to searchers, searches and overlap studies. A concluding summary of all results is presented in Part III. © 1988 John Wiley & Sons, Inc.
Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part III: Behavior and effects of relevance. Journal of the American Society for Information Science and Technology, 58(13), 2126–2144. https://doi.org/10.1002/asi.20681
All is flux. —Plato on Knowledge in the Theaetetus (about 369 BC) Relevance is a, if not even the, key notion in information science in general and information retrieval in particular. This two-part critical review traces and synthesizes the scholarship on relevance over the past 30 years or so and provides an updated framework within which the still widely dissonant ideas and works about relevance might be interpreted and related. It is a continuation and update of a similar review that appeared in 1975 under the same title, considered here as being Part I. The present review is organized in two parts: Part II addresses the questions related to nature and manifestations of relevance, and Part III addresses questions related to relevance behavior and effects. In Part II, the nature of relevance is discussed in terms of meaning ascribed to relevance, theories used or proposed, and models that have been developed. The manifestations of relevance are classified as to several kinds of relevance that form an interdependent system of relevancies. In Part III, relevance behavior and effects are synthesized using experimental and observational works that incorporated data. In both parts, each section concludes with a summary that in effect provides an interpretation and synthesis of contemporary thinking on the topic treated or suggests hypotheses for future research. Analyses of some of the major trends that shape relevance work are offered in conclusions.
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513–523. https://doi.org/10.1016/0306-4573(88)90021-0
The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term-weighting systems. This article summarizes the insights gained in automatic term weighting, and provides baseline single-term-indexing models with which other more elaborate content analysis procedures can be compared.
Salton, G., Wong, A., & Yang, C. S. (1975). A Vector Space Model for Automatic Indexing. Commun. ACM, 18(11), 613–620. https://doi.org/10.1145/361219.361220
In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; in these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.
Rose, D. E., & Levinson, D. (2004). Understanding User Goals in Web Search. Proceedings of the 13th International Conference on World Wide Web, 13–19. https://doi.org/10.1145/988672.988675
Previous work on understanding user web search behavior has focused on how people search and what they are searching for, but not why they are searching. In this paper, we describe a framework for understanding the underlying goals of user searches, and our experience in using the framework to manually classify queries from a web search engine. Our analysis suggests that so-called "navigational" searches are less prevalent than generally believed while a previously unexplored "resource-seeking" goal may account for a large fraction of web searches. We also illustrate how this knowledge of user search goals might be used to improve future web search engines.
Robertson, S., & Zaragoza, H. (2009). The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends® in Information Retrieval, 3(4), 333–389. https://doi.org/10.1561/1500000019
The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970–1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.
Radlinski, F., Bennett, P. N., Carterette, B., & Joachims, T. (2009). Redundancy, diversity and interdependent document relevance. ACM SIGIR Forum, 43(2), 46–52. https://doi.org/10.1145/1670564.1670572
The goal of the Redundancy, Diversity, and Interdependent Document Relevance workshop was to explore how ranking, performance assessment and learning to rank can move beyond the assumption that the relevance of a document is independent of other documents. In particular, the workshop focussed on three themes: the effect of redundancy on information retrieval utility (for example, minimizing the wasted effort of users who must skip redundant information), the role of diversity (for example, for mitigating the risk of misinterpreting ambiguous queries), and algorithms for set-level optimization (where the quality of a set of retrieved documents is not simply the sum of its parts). This workshop built directly upon the Beyond Binary Relevance: Preferences, Diversity and Set-Level Judgments workshop at SIGIR 2008, shifting focus to address the questions left open by the discussions and results from that workshop. As such, it was the first workshop to explicitly focus on the related research challenges of redundancy, diversity, and interdependent relevance – all of which require novel performance measures, learning methods, and evaluation techniques. The workshop program committee consisted of 15 researchers from academia and industry, with experience in IR evaluation, machine learning, and IR algorithmic design. Over 40 people attended the workshop. This report aims to summarize the workshop, and also to systematize common themes and key concepts so as to encourage research in the three workshop themes. It contains our attempt to summarize and organize the topics that came up in presentations as well as in discussions, pulling out common elements. Many audience members contributed, yet due to the free-flowing discussion, attributing all the observations to particular audience members is unfortunately impossible. Not all audience members would necessarily agree with the views presented, but we do attempt to present a consensus view as far as possible.
Pirolli, P., & Card, S. (1999). Information foraging. Psychological Review, 106(4), 643–675. https://doi.org/10.1037/0033-295X.106.4.643
Information foraging theory is an approach to understanding how strategies and technologies for information seeking, gathering, and consumption are adapted to the flux of information in the environment. The theory assumes that people, when possible, will modify their strategies or the structure of the environment to maximize their rate of gaining valuable information. The theory is developed by (a) adaptation (rational) analysis of information foraging problems and (b) a detailed process model (adaptive control of thought in information foraging [ACT-IF]). The adaptation analysis develops (a) information patch models, which deal with time allocation and information filtering and enrichment activities in environments in which information is encountered in clusters; (b) information scent models, which address the identification of information value from proximal cues; and (c) information diet models, which address decisions about the selection and pursuit of information items. ACT-IF is instantiated as a production system model of people interacting with complex information technology.
Nasir Uddin, M., & Janecek, P. (2007). Performance and usability testing of multidimensional taxonomy in web site search and navigation. Performance Measurement and Metrics, 8(1), 18–33. https://doi.org/10.1108/14678040710748058
Purpose – Development of an effective search system and interface largely depends on usability studies. The aim of this paper is to present the results of an empirical evaluation of a prototype web site search and browsing tool based on multidimensional taxonomies derived from the use of faceted classification. Design/methodology/approach – A prototype Faceted Classification System (FCS), which classifies and organizes web documents under different facets (orthogonal sets of categories), was implemented on the domain of an academic institute. Facets are created from content-oriented metadata, and then assembled into multiple taxonomies that describe alternative classifications of the web site content, such as by subject and location. The search and browsing interfaces use these taxonomies to enable users to access information in multiple ways. The paper compares the FCS interfaces to the existing single‐classification system to evaluate the usability of the facets in typical navigation and searching tasks. Findings – The findings suggest that performance and usability are significantly better with the FCS in the areas of efficient access, search success, flexibility, understanding of content, relevant search result, and satisfaction. These results are especially promising since unfamiliarity often leads users to reject new search interfaces. Originality/value – The results of the study in this paper can significantly contribute to interface research in the IR community, emphasizing the advantages of multidimensional taxonomies in online information collections.
Mizzaro, S. (1997). Relevance: The whole history. Journal of the American Society for Information Science, 48(9), 810–832. https://doi.org/10.1002/(SICI)1097-4571(199709)48:9<810::AID-ASI6>3.0.CO;2-U
Relevance is a fundamental, though not completely understood, concept for documentation, information science, and information retrieval. This article presents the history of relevance through an exhaustive review of the literature. Such history being very complex (about 160 papers are discussed), it is not simple to describe it in a comprehensible way. Thus, first of all a framework for establishing a common ground is defined, and then the history itself is illustrated via the presentation in chronological order of the papers on relevance. The history is divided into three periods (“Before 1958,” “1959–1976,” and “1977–present”) and, inside each period, the papers on relevance are analyzed under seven different aspects (methodological foundations, different kinds of relevance, beyond-topical criteria adopted by users, modes for expression of the relevance judgment, dynamic nature of relevance, types of document representation, and agreement among different judges). © 1997 John Wiley & Sons, Inc.