TY - JOUR
TI - The Search Value Added by Professional Indexing to a Bibliographic Database
AU - Hider, Philip
T2 - Official Journal of the International Society for Knowledge Organization
AB - Gross et al. (2015) have demonstrated that about a quarter of hits would typically be lost to keyword searchers if contemporary academic library catalogs dropped their controlled subject headings. This article reports on an investigation of the search value that subject descriptors and identifiers assigned by professional indexers add to a bibliographic database, namely the Australian Education Index (AEI). First, a similar methodology to that developed by Gross et al. (2015) was applied, with keyword searches representing a range of educational topics run on the AEI database with and without its subject indexing. The results indicated that AEI users would also lose, on average, about a quarter of hits per query. Second, an alternative research design was applied in which an experienced literature searcher was asked to find resources on a set of educational topics on an AEI database stripped of its subject indexing and then asked to search for additional resources on the same topics after the subject indexing had been reinserted. In this study, the proportion of additional resources that would have been lost had it not been for the subject indexing was again found to be about a quarter of the total resources found for each topic, on average.
DA - 2018///
PY - 2018
VL - 45
IS - 1
SP - 23
EP - 32
LA - en
SN - 0943-7444
ER -

TY - JOUR
TI - Influence of training and stage of search on gaze behavior in a library catalog faceted search interface
AU - Kules, Bill
AU - Capra, Robert
T2 - Journal of the American Society for Information Science and Technology
AB - This study examined how searchers interact with a web-based, faceted library catalog when conducting exploratory searches.
It applied multiple methods, including eye tracking and stimulated recall interviews, to investigate important aspects of faceted search interface use, specifically: (a) searcher gaze behavior—what components of the interface searchers look at; (b) how gaze behavior differs when training is and is not provided; (c) how gaze behavior changes as searchers become familiar with the interface; and (d) how gaze behavior differs depending on the stage of the search process. The results confirm previous findings that facets account for approximately 10–30% of interface use. They show that providing a 60-second video demonstration increased searcher use of facets. However, searcher use of the facets did not evolve during the study session, which suggests that searchers may not, on their own, rapidly apply the faceted interfaces. The findings also suggest that searcher use of interface elements varied by the stage of their search during the session, with higher use of facets during decision-making stages. These findings will be of interest to librarians and interface designers who wish to maximize the value of faceted searching for patrons, as well as to researchers who study search behavior.
DA - 2012/01/01/
PY - 2012
DO - 10.1002/asi.21647
DP - Wiley Online Library
VL - 63
IS - 1
SP - 114
EP - 138
LA - en
SN - 1532-2890
UR - https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.21647
Y2 - 2018/08/03/17:38:37
ER -

TY - JOUR
TI - Learning to Rank for Information Retrieval
AU - Liu, Tie-Yan
T2 - Foundations and Trends® in Information Retrieval
AB - Learning to rank for Information Retrieval (IR) is a task to automatically construct a ranking model using training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can be potentially enhanced by using learning-to-rank techniques.
The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and disadvantages with each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Then the empirical evaluations on typical learning-to-rank methods are shown, with the LETOR collection as a benchmark dataset, which seems to suggest that the listwise approach be the most effective one among all the approaches. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank.
DA - 2009/06/27/
PY - 2009
DO - 10.1561/1500000016
DP - www.nowpublishers.com
VL - 3
IS - 3
SP - 225
EP - 331
J2 - INR
LA - en
SN - 1554-0669, 1554-0677
UR - https://www.nowpublishers.com/article/Details/INR-016
Y2 - 2019/01/18/20:05:20
ER -

TY - CONF
TI - Diversifying Search Results
AU - Agrawal, Rakesh
AU - Gollapudi, Sreenivas
AU - Halverson, Alan
AU - Ieong, Samuel
T3 - WSDM '09
AB - We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information, and that both queries and documents may belong to more than one category according to this taxonomy. We present a systematic approach to diversifying results that aims to minimize the risk of dissatisfaction of the average user. We propose an algorithm that well approximates this objective in general, and is provably optimal for a natural special case. Furthermore, we generalize several classical IR metrics, including NDCG, MRR, and MAP, to explicitly account for the value of diversification.
We demonstrate empirically that our algorithm scores higher in these generalized metrics compared to results produced by commercial search engines.
C1 - New York, NY, USA
C3 - Proceedings of the Second ACM International Conference on Web Search and Data Mining
DA - 2009///
PY - 2009
DO - 10.1145/1498759.1498766
DP - ACM Digital Library
SP - 5
EP - 14
LA - en
PB - ACM
SN - 978-1-60558-390-7
UR - http://doi.acm.org/10.1145/1498759.1498766
Y2 - 2019/01/27/21:41:12
ER -

TY - CONF
TI - An Axiomatic Approach for Result Diversification
AU - Gollapudi, Sreenivas
AU - Sharma, Aneesh
T3 - WWW '09
AB - Understanding user intent is key to designing an effective ranking system in a search engine. In the absence of any explicit knowledge of user intent, search engines want to diversify results to improve user satisfaction. In such a setting, the probability ranking principle-based approach of presenting the most relevant results on top can be sub-optimal, and hence the search engine would like to trade-off relevance for diversity in the results. In analogy to prior work on ranking and clustering systems, we use the axiomatic approach to characterize and design diversification systems. We develop a set of natural axioms that a diversification system is expected to satisfy, and show that no diversification function can satisfy all the axioms simultaneously. We illustrate the use of the axiomatic framework by providing three example diversification objectives that satisfy different subsets of the axioms. We also uncover a rich link to the facility dispersion problem that results in algorithms for a number of diversification objectives. Finally, we propose an evaluation methodology to characterize the objectives and the underlying axioms. We conduct a large scale evaluation of our objectives based on two data sets: a data set derived from the Wikipedia disambiguation pages and a product database.
C1 - New York, NY, USA
C3 - Proceedings of the 18th International Conference on World Wide Web
DA - 2009///
PY - 2009
DO - 10.1145/1526709.1526761
DP - ACM Digital Library
SP - 381
EP - 390
LA - en
PB - ACM
SN - 978-1-60558-487-4
UR - http://doi.acm.org/10.1145/1526709.1526761
Y2 - 2019/01/27/22:06:28
ER -

TY - CONF
TI - What Do Exploratory Searchers Look at in a Faceted Search Interface?
AU - Kules, Bill
AU - Capra, Robert
AU - Banta, Matthew
AU - Sierra, Tito
T3 - JCDL '09
AB - This study examined how searchers interacted with a web-based, faceted library catalog when conducting exploratory searches. It applied eye tracking, stimulated recall interviews, and direct observation to investigate important aspects of gaze behavior in a faceted search interface: what components of the interface searchers looked at, for how long, and in what order. It yielded empirical data that will be useful for both practitioners (e.g., for improving search interface designs), and researchers (e.g., to inform models of search behavior). Results of the study show that participants spent about 50 seconds per task looking at (fixating on) the results, about 25 seconds looking at the facets, and only about 6 seconds looking at the query itself. These findings suggest that facets played an important role in the exploratory search process.
C1 - New York, NY, USA
C3 - Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries
DA - 2009///
PY - 2009
DO - 10.1145/1555400.1555452
DP - ACM Digital Library
SP - 313
EP - 322
LA - en
PB - ACM
SN - 978-1-60558-322-8
UR - http://doi.acm.org/10.1145/1555400.1555452
Y2 - 2018/08/07/18:20:12
ER -

TY - JOUR
TI - Determining the informational, navigational, and transactional intent of Web queries
AU - Jansen, Bernard J.
AU - Booth, Danielle L.
AU - Spink, Amanda
T2 - Information Processing & Management
AB - In this paper, we define and present a comprehensive classification of user intent for Web searching.
The classification consists of three hierarchical levels of informational, navigational, and transactional intent. After deriving attributes of each, we then developed a software application that automatically classified queries using a Web search engine log of over a million and a half queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the results from this manual classification to the results determined by the automated method. This comparison showed that the automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is vague or multi-faceted, pointing to the need for probabilistic classification. We discuss how search engines can use knowledge of user intent to provide more targeted and relevant results in Web searching.
DA - 2008/05/01/
PY - 2008
DO - 10.1016/j.ipm.2007.07.015
DP - ScienceDirect
VL - 44
IS - 3
SP - 1251
EP - 1266
J2 - Information Processing & Management
LA - en
SN - 0306-4573
UR - http://www.sciencedirect.com/science/article/pii/S030645730700163X
Y2 - 2018/03/28/23:33:46
ER -

TY - JOUR
TI - Performance and usability testing of multidimensional taxonomy in web site search and navigation
AU - Nasir Uddin, Mohammad
AU - Janecek, Paul
T2 - Performance Measurement and Metrics
AB - Purpose – Development of an effective search system and interface largely depends on usability studies. The aim of this paper is to present the results of an empirical evaluation of a prototype web site search and browsing tool based on multidimensional taxonomies derived from the use of faceted classification.
Design/methodology/approach – A prototype Faceted Classification System (FCS), which classifies and organizes web documents under different facets (orthogonal sets of categories), was implemented on the domain of an academic institute. Facets are created from content oriented metadata, and then assembled into multiple taxonomies that describe alternative classifications of the web site content, such as by subject and location. The search and browsing interfaces use these taxonomies to enable users to access information in multiple ways. The paper compares the FCS interfaces to the existing single‐classification system to evaluate the usability of the facets in typical navigation and searching tasks. Findings – The findings suggest that performance and usability are significantly better with the FCS in the areas of efficient access, search success, flexibility, understanding of content, relevant search result, and satisfaction. These results are especially promising since unfamiliarity often leads users to reject new search interfaces. Originality/value – The results of the study in this paper can significantly contribute to interface research in the IR community, emphasizing the advantages of multidimensional taxonomies in online information collections.
DA - 2007/03/27/
PY - 2007
DO - 10.1108/14678040710748058
DP - emeraldinsight.com (Atypon)
VL - 8
IS - 1
SP - 18
EP - 33
J2 - Performance Measurement Metric
LA - en
SN - 1467-8047
UR - https://www.emeraldinsight.com/doi/full/10.1108/14678040710748058
Y2 - 2018/08/03/17:47:56
ER -

TY - CONF
TI - Improving Web Search Ranking by Incorporating User Behavior Information
AU - Agichtein, Eugene
AU - Brill, Eric
AU - Dumais, Susan
T3 - SIGIR '06
AB - We show that incorporating user behavior data can significantly improve ordering of top results in real web search setting.
We examine alternatives for incorporating feedback into the ranking process and explore the contributions of user feedback compared to other common web search features. We report results of a large scale evaluation over 3,000 queries and 12 million user interactions with a popular web search engine. We show that incorporating implicit feedback can augment other features, improving the accuracy of a competitive web search ranking algorithms by as much as 31% relative to the original performance.
C1 - New York, NY, USA
C3 - Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
DA - 2006///
PY - 2006
DO - 10.1145/1148170.1148177
DP - ACM Digital Library
SP - 19
EP - 26
LA - en
PB - ACM
SN - 978-1-59593-369-0
UR - http://doi.acm.org/10.1145/1148170.1148177
Y2 - 2019/01/18/18:14:12
ER -

TY - JOUR
TI - Evaluating Implicit Measures to Improve Web Search
AU - Fox, Steve
AU - Karnawat, Kuldeep
AU - Mydland, Mark
AU - Dumais, Susan
AU - White, Thomas
T2 - ACM Trans. Inf. Syst.
AB - Of growing interest in the area of improving the search experience is the collection of implicit user behavior measures (implicit measures) as indications of user interest and user satisfaction. Rather than having to submit explicit user feedback, which can be costly in time and resources and alter the pattern of use within the search experience, some research has explored the collection of implicit measures as an efficient and useful alternative to collecting explicit measure of interest from users. This research article describes a recent study with two main objectives. The first was to test whether there is an association between explicit ratings of user satisfaction and implicit measures of user interest. The second was to understand what implicit measures were most strongly associated with user satisfaction. The domain of interest was Web search.
We developed an instrumented browser to collect a variety of measures of user activity and also to ask for explicit judgments of the relevance of individual pages visited and entire search sessions. The data was collected in a workplace setting to improve the generalizability of the results. Results were analyzed using traditional methods (e.g., Bayesian modeling and decision trees) as well as a new usage behavior pattern analysis (“gene analysis”). We found that there was an association between implicit measures of user activity and the user's explicit satisfaction ratings. The best models for individual pages combined clickthrough, time spent on the search result page, and how a user exited a result or ended a search session (exit type/end action). Behavioral patterns (through the gene analysis) can also be used to predict user satisfaction for search sessions.
DA - 2005/04//
PY - 2005
DO - 10.1145/1059981.1059982
DP - ACM Digital Library
VL - 23
IS - 2
SP - 147
EP - 168
LA - en
SN - 1046-8188
UR - http://doi.acm.org/10.1145/1059981.1059982
Y2 - 2019/01/18/19:48:10
ER -

TY - CONF
TI - Accurately Interpreting Clickthrough Data As Implicit Feedback
AU - Joachims, Thorsten
AU - Granka, Laura
AU - Pan, Bing
AU - Hembrooke, Helene
AU - Gay, Geri
T2 - SIGIR'05
AB - This paper examines the reliability of implicit feedback generated from clickthrough data in WWW search. Analyzing the users' decision process using eyetracking and comparing implicit feedback against manual relevance judgments, we conclude that clicks are informative but biased. While this makes the interpretation of clicks as absolute relevance judgments difficult, we show that relative preferences derived from clicks are reasonably accurate on average.
C3 - Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, 2005
DA - 2005///
PY - 2005
DP - ACM Digital Library
SP - 154
EP - 161
LA - en
Y2 - 2019/01/18/20:45:44
ER -

TY - CONF
TI - Faceted Metadata for Image Search and Browsing
AU - Yee, Ka-Ping
AU - Swearingen, Kirsten
AU - Li, Kevin
AU - Hearst, Marti
T3 - CHI '03
AB - There are currently two dominant interface types for searching and browsing large image collections: keyword-based search, and searching by overall similarity to sample images. We present an alternative based on enabling users to navigate along conceptual dimensions that describe the images. The interface makes use of hierarchical faceted metadata and dynamically generated query previews. A usability study, in which 32 art history students explored a collection of 35,000 fine arts images, compares this approach to a standard image search interface. Despite the unfamiliarity and power of the interface (attributes that often lead to rejection of new search interfaces), the study results show that 90% of the participants preferred the metadata approach overall, 97% said that it helped them learn more about the collection, 75% found it more flexible, and 72% found it easier to use than a standard baseline system. These results indicate that a category-based approach is a successful way to provide access to image collections.
C1 - New York, NY, USA
C3 - Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
DA - 2003///
PY - 2003
DO - 10.1145/642611.642681
DP - ACM Digital Library
SP - 401
EP - 408
LA - en
PB - ACM
SN - 978-1-58113-630-2
UR - http://doi.acm.org/10.1145/642611.642681
Y2 - 2018/08/09/19:17:02
ER -

TY - JOUR
TI - Finding the flow in web site search
AU - Hearst, Marti
AU - Elliott, Ame
AU - English, Jennifer
AU - Sinha, Rashmi
AU - Swearingen, Kirsten
AU - Yee, Ka-Ping
T2 - Communications of the ACM
AB - Designing a search system and interface may best be served (and executed) by scrutinizing usability studies.
DA - 2002/09//
PY - 2002
DO - 10.1145/567498.567525
DP - ACM Digital Library
VL - 45
IS - 9
SP - 42
EP - 49
J2 - Communications of the ACM
LA - en
SN - 0001-0782
UR - http://doi.acm.org/10.1145/567498.567525
ER -

TY - CONF
TI - Optimizing search engines using clickthrough data
AU - Joachims, Thorsten
T2 - KDD '02
AB - This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework.
Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
C1 - Edmonton, Alberta, Canada
C3 - Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
DA - 2002/07/23/
PY - 2002
DO - 10.1145/775047.775067
DP - dl.acm.org
SP - 133
EP - 142
LA - en
PB - ACM
SN - 978-1-58113-567-1
UR - http://dl.acm.org/citation.cfm?id=775047.775067
Y2 - 2019/01/18/20:54:23
ER -

TY - CONF
TI - Hierarchical Faceted Metadata in Site Search Interfaces
AU - English, Jennifer
AU - Hearst, Marti
AU - Sinha, Rashmi
AU - Swearingen, Kirsten
AU - Yee, Ka-Ping
T3 - CHI EA '02
AB - One of the most pressing usability issues in the design of large web sites is that of the organization of search results. A previous study on a moderate-sized web site indicated that users understood and preferred dynamically organized faceted metadata over standard search. We are now examining how to scale this approach to very large collections, since it is difficult to present hierarchical faceted metadata in a manner appealing and understandable to general users. We have iteratively designed and tested interfaces that address these design challenges; the most recent version is receiving enthusiastic responses in ongoing usability studies.
C1 - New York, NY, USA
C3 - CHI '02 Extended Abstracts on Human Factors in Computing Systems
DA - 2002///
PY - 2002
DO - 10.1145/506443.506517
DP - ACM Digital Library
SP - 628
EP - 639
LA - en
PB - ACM
SN - 978-1-58113-454-4
UR - http://doi.acm.org/10.1145/506443.506517
Y2 - 2018/07/06/01:46:11
ER -

TY - JOUR
TI - Searching the web: The public and their queries
AU - Spink, Amanda
AU - Wolfram, Dietmar
AU - Jansen, Major B. J.
AU - Saracevic, Tefko
T2 - Journal of the American Society for Information Science and Technology
AB - In studying actual Web searching by the public at large, we analyzed over one million Web queries by users of the Excite search engine. We found that most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. A small number of search terms are used with high frequency, and a great many terms are unique; the language of Web queries is distinctive. Queries about recreation and entertainment rank highest. Findings are compared to data from two other large studies of Web queries. This study provides an insight into the public practices and choices in Web searching.
DA - 2001///
PY - 2001
DO - 10.1002/1097-4571(2000)9999:9999<::AID-ASI1591>3.0.CO;2-R
DP - Wiley Online Library
VL - 52
IS - 3
SP - 226
EP - 234
LA - en
SN - 1532-2890
ST - Searching the web
UR - https://onlinelibrary.wiley.com/doi/abs/10.1002/1097-4571%282000%299999%3A9999%3C%3A%3AAID-ASI1591%3E3.0.CO%3B2-R
Y2 - 2019/01/21/23:57:39
ER -

TY - JOUR
TI - Real life, real users, and real needs: a study and analysis of user queries on the web
AU - Jansen, Bernard J.
AU - Spink, Amanda
AU - Saracevic, Tefko
T2 - Information Processing & Management
AB - We analyzed transaction logs containing 51,473 queries posed by 18,113 users of Excite, a major Internet search service. We provide data on: (i) sessions — changes in queries during a session, number of pages viewed, and use of relevance feedback; (ii) queries — the number of search terms, and the use of logic and modifiers; and (iii) terms — their rank/frequency distribution and the most highly used search terms. We then shift the focus of analysis from the query to the user to gain insight to the characteristics of the Web user. With these characteristics as a basis, we then conducted a failure analysis, identifying trends among user mistakes.
We conclude with a summary of findings and a discussion of the implications of these findings.
DA - 2000/03/01/
PY - 2000
DO - 10.1016/S0306-4573(99)00056-4
DP - ScienceDirect
VL - 36
IS - 2
SP - 207
EP - 227
J2 - Information Processing & Management
LA - en
SN - 0306-4573
ST - Real life, real users, and real needs
UR - http://www.sciencedirect.com/science/article/pii/S0306457399000564
Y2 - 2019/01/27/22:52:32
ER -