Search
Full bibliography 63 resources
-
This paper reports on a novel technique for literature indexing and searching in a mechanized library system. The notion of relevance is taken as the key concept in the theory of information retrieval and a comparative concept of relevance is explicated in terms of the theory of probability. The resulting technique called “Probabilistic Indexing,” allows a computing machine, given a request for information, to make a statistical inference and derive a number (called the “relevance number”) for each document, which is a measure of the probability that the document will satisfy the given request. The result of a search is an ordered list of those documents which satisfy the request ranked according to their probable relevance. The paper goes on to show that whereas in a conventional library system the cross-referencing (“see” and “see also”) is based solely on the “semantical closeness” between index terms, statistical measures of closeness between index terms can be defined and computed. Thus, given an arbitrary request consisting of one (or many) index term(s), a machine can elaborate on it to increase the probability of selecting relevant documents that would not otherwise have been selected. Finally, the paper suggests an interpretation of the whole library problem as one where the request is considered as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as an output an ordered list of those documents which most probably satisfy the information needs of the user.
-
L'usage du terme facette est bien intégré au vocabulaire de la science de l'information, mais les acceptions du mot sont si variables selon les auteurs que la perception de son contenu en devient problématique. L'A. montre ici que ces difficultés remontent au fondateur de la théorie des facettes, Ranganathan, qui a malencontreusement choisi un terme métaphorique du vocabulaire courant déjà chargé de sens, et dont la théorie des facettes est toujours restée ambiguë. L'auteur de cette étude en montre les inconsistances à partir du modèle linguistique des deux axes du langage, puis repère les grandes étapes de l'évolution qui a conduit les partisans des facettes du schéma analytico-synthétique de Ranganathan au schéma analytique du thésaurus à facettes. II plaide enfin pour un usage plus rigoureux du terme et de l'outil, qui fasse clairement le partage entre la classification des concepts et le classement des sujets.
-
Learning to rank for Information Retrieval (IR) is a task to automatically construct a ranking model using training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can be potentially enhanced by using learning-to-rank techniques. The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and disadvantages with each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Then the empirical evaluations on typical learning-to-rank methods are shown, with the LETOR collection as a benchmark dataset, which seems to suggest that the listwise approach be the most effective one among all the approaches. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank.
-
This study examined how searchers interacted with a web-based, faceted library catalog when conducting exploratory searches. It applied eye tracking, stimulated recall interviews, and direct observation to investigate important aspects of gaze behavior in a faceted search interface: what components of the interface searchers looked at, for how long, and in what order. It yielded empirical data that will be useful for both practitioners (e.g., for improving search interface designs), and researchers (e.g., to inform models of search behavior). Results of the study show that participants spent about 50 seconds per task looking at (fixating on) the results, about 25 seconds looking at the facets, and only about 6 seconds looking at the query itself. These findings suggest that facets played an important role in the exploratory search process.
-
This study examined how searchers interact with a web-based, faceted library catalog when conducting exploratory searches. It applied multiple methods, including eye tracking and stimulated recall interviews, to investigate important aspects of faceted search interface use, specifically: (a) searcher gaze behavior—what components of the interface searchers look at; (b) how gaze behavior differs when training is and is not provided; (c) how gaze behavior changes as searchers become familiar with the interface; and (d) how gaze behavior differs depending on the stage of the search process. The results confirm previous findings that facets account for approximately 10–30% of interface use. They show that providing a 60-second video demonstration increased searcher use of facets. However, searcher use of the facets did not evolve during the study session, which suggests that searchers may not, on their own, rapidly apply the faceted interfaces. The findings also suggest that searcher use of interface elements varied by the stage of their search during the session, with higher use of facets during decision-making stages. These findings will be of interest to librarians and interface designers who wish to maximize the value of faceted searching for patrons, as well as to researchers who study search behavior.
-
Introduction. This paper examines the continued usefulness of Kuhlthau's Information Search Process as a model of information behaviour in new, technologically rich information environments. Method. A comprehensive review of research that has explored the model in various settings and a study employing qualitative and quantitative methods undertaken in the context of an inquiry project among school students (n=574). Students were interviewed at three stages of the information search process, during which nine feelings were identified and tracked. Results. Findings show individual patterns, but confirm the Information Search Process as a valid model in the changing information environment for describing information behaviour in tasks that require knowledge construction. The findings support the progression of feelings, thoughts and actions as suggested by the search process model. Conclusions. The information search process model remains useful for explaining students' information behaviour. The model was found to have value as a research tool as well as for practical application.
-
With the increasing number and diversity of search tools available, interest in the evaluation of search systems, particularly from a user perspective, has grown among researchers. More researchers are designing and evaluating interactive information retrieval (IIR) systems and beginning to innovate in evaluation methods. Maturation of a research specialty relies on the ability to replicate research, provide standards for measurement and analysis, and understand past endeavors. This article presents a historical overview of 40 years of IIR evaluation studies using the method of systematic review. A total of 2,791 journal and conference units were manually examined and 127 articles were selected for analysis in this study, based on predefined inclusion and exclusion criteria. These articles were systematically coded using features such as author, publication date, sources and references, and properties of the research method used in the articles, such as number of subjects, tasks, corpora, and measures. Results include data describing the growth of IIR studies over time, the most frequently occurring and cited authors and sources, and the most common types of corpora and measures used. An additional product of this research is a bibliography of IIR evaluation research that can be used by students, teachers, and those new to the area. To the authors' knowledge, this is the first historical, systematic characterization of the IIR evaluation literature, including the documentation of methods and measures used by researchers in this specialty.
-
This paper provides overview and instruction regarding the evaluation of interactive information retrieval systems with users. The primary goal of this article is to catalog and compile material related to this topic into a single source. This article (1) provides historical background on the development of user-centered approaches to the evaluation of interactive information retrieval systems; (2) describes the major components of interactive information retrieval system evaluation; (3) describes different experimental designs and sampling strategies; (4) presents core instruments and data collection techniques and measures; (5) explains basic data analysis techniques; and (4) reviews and discusses previous studies. This article also discusses validity and reliability issues with respect to both measures and methods, presents background information on research ethics and discusses some ethical issues which are specific to studies of interactive information retrieval (IIR). Finally, this article concludes with a discussion of outstanding challenges and future research directions.
-
This paper examines the reliability of implicit feedback generated from clickthrough data in WWW search. Analyzing the users' decision process using eyetracking and comparing implicit feedback against manual relevance judgments, we conclude that clicks are informative but biased. While this makes the interpretation of clicks as absolute relevance judgments difficult, we show that relative preferences derived from clicks are reasonably accurate on average.
-
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
-
We analyzed transaction logs containing 51,473 queries posed by 18,113 users of Excite, a major Internet search service. We provide data on: (i) sessions — changes in queries during a session, number of pages viewed, and use of relevance feedback; (ii) queries — the number of search terms, and the use of logic and modifiers; and (iii) terms — their rank/frequency distribution and the most highly used search terms. We then shift the focus of analysis from the query to the user to gain insight to the characteristics of the Web user. With these characteristics as a basis, we then conducted a failure analysis, identifying trends among user mistakes. We conclude with a summary of findings and a discussion of the implications of these findings.
-
In this paper, we define and present a comprehensive classification of user intent for Web searching. The classification consists of three hierarchical levels of informational, navigational, and transactional intent. After deriving attributes of each, we then developed a software application that automatically classified queries using a Web search engine log of over a million and a half queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the results from this manual classification to the results determined by the automated method. This comparison showed that the automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is vague or multi-faceted, pointing to the need for probabilistic classification. We discuss how search engines can use knowledge of user intent to provide more targeted and relevant results in Web searching.
-
The use of data stored in transaction logs of Web search engines, Intranets, and Web sites can provide valuable insight into understanding the information-searching process of online searchers. This understanding can enlighten information system design, interface development, and devising the information architecture for content collections. This article presents a review and foundation for conducting Web search transaction log analysis. A methodology is outlined consisting of three stages, which are collection, preparation, and analysis. The three stages of the methodology are presented in detail with discussions of goals, metrics, and processes at each stage. Critical terms in transaction log analysis for Web searching are defined. The strengths and limitations of transaction log analysis as a research method are presented. An application to log client-side interactions that supplements transaction logs is reported on, and the application is made available for use by the research community. Suggestions are provided on ways to leverage the strengths of, while addressing the limitations of, transaction log analysis for Web-searching research. Finally, a complete flat text transaction log from a commercial search engine is available as supplementary material with this manuscript.
-
The facet-analytic paradigm is probably the most distinct approach to knowledge organization within Library and Information Science, and in many ways it has dominated what has be termed “modern classification theory”. It was mainly developed by S.R. Ranganathan and the British Classification Research Group, but it is mostly based on principles of logical division developed more than two millennia ago. Colon Classification (CC) and Bliss 2 (BC2) are among the most important systems developed on this theoretical basis, but it has also influenced the development of other systems, such as the Dewey Decimal Classification (DDC) and is also applied in many websites. It still has a strong position in the field and it is the most explicit and “pure” theoretical approach to knowledge organization (KO) (but it is not by implication necessarily also the most important one). The strength of this approach is its logical principles and the way it provides structures in knowledge organization systems (KOS). The main weaknesses are (1) its lack of empirical basis and (2) its speculative ordering of knowledge without basis in the development or influence of theories and socio-historical studies. It seems to be based on the problematic assumption that relations between concepts are a priori and not established by the development of models, theories and laws.
Explore
Topic
- Information behavior (16)
-
Information retrieval
(47)
- Faceted search (11)
- Implicit feedback (5)
-
Ranking
(8)
- Diversity (6)
- Relevance (8)
- Search log analysis (5)
-
Knowledge organization
(13)
- Facet analysis (11)
- Ontology (1)
Field of study
- Computer science (23)
- Information science (40)
Contribution
- Algorithm (7)
- Conceptual model (24)
- Empirical study (20)
- Evaluation model (5)
- Literature review (6)
- Methodology (3)
- Primer (7)
Resource type
- Blog Post (1)
- Book (4)
- Conference Paper (13)
- Journal Article (45)
Publication year
-
Between 1900 and 1999
(16)
-
Between 1960 and 1969
(1)
- 1960 (1)
-
Between 1970 and 1979
(1)
- 1975 (1)
- Between 1980 and 1989 (4)
- Between 1990 and 1999 (10)
-
Between 1960 and 1969
(1)
- Between 2000 and 2024 (47)