TY  - JOUR
TI  - On Relevance, Probabilistic Indexing and Information Retrieval
AU  - Maron, M. E.
AU  - Kuhns, J. L.
T2  - Journal of the ACM
AB  - This paper reports on a novel technique for literature indexing and searching in a mechanized library system. The notion of relevance is taken as the key concept in the theory of information retrieval and a comparative concept of relevance is explicated in terms of the theory of probability. The resulting technique called “Probabilistic Indexing,” allows a computing machine, given a request for information, to make a statistical inference and derive a number (called the “relevance number”) for each document, which is a measure of the probability that the document will satisfy the given request. The result of a search is an ordered list of those documents which satisfy the request ranked according to their probable relevance. The paper goes on to show that whereas in a conventional library system the cross-referencing (“see” and “see also”) is based solely on the “semantical closeness” between index terms, statistical measures of closeness between index terms can be defined and computed. Thus, given an arbitrary request consisting of one (or many) index term(s), a machine can elaborate on it to increase the probability of selecting relevant documents that would not otherwise have been selected. Finally, the paper suggests an interpretation of the whole library problem as one where the request is considered as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as an output an ordered list of those documents which most probably satisfy the information needs of the user.
DA  - 1960/07//
PY  - 1960
DO  - 10.1145/321033.321035
DP  - ACM Digital Library
VL  - 7
IS  - 3
SP  - 216
EP  - 244
LA  - en
SN  - 0004-5411
UR  - http://doi.acm.org/10.1145/321033.321035
Y2  - 2019/01/27/23:02:51
ER  - 

TY  - JOUR
TI  - A Vector Space Model for Automatic Indexing
AU  - Salton, G.
AU  - Wong, A.
AU  - Yang, C. S.
T2  - Commun. ACM
AB  - In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; in these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.
DA  - 1975/11//
PY  - 1975
DO  - 10.1145/361219.361220
DP  - ACM Digital Library
VL  - 18
IS  - 11
SP  - 613
EP  - 620
LA  - en
SN  - 0001-0782
UR  - http://doi.acm.org/10.1145/361219.361220
Y2  - 2017/11/08/22:43:01
ER  - 

TY  - JOUR
TI  - Term-weighting approaches in automatic text retrieval
AU  - Salton, Gerard
AU  - Buckley, Christopher
T2  - Information Processing & Management
AB  - The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term-weighting systems. This article summarizes the insights gained in automatic term weighting, and provides baseline single-term-indexing models with which other more elaborate content analysis procedures can be compared.
DA  - 1988/01/01/
PY  - 1988
DO  - 10.1016/0306-4573(88)90021-0
DP  - ScienceDirect
VL  - 24
IS  - 5
SP  - 513
EP  - 523
J2  - Information Processing & Management
LA  - en
SN  - 0306-4573
UR  - http://www.sciencedirect.com/science/article/pii/0306457388900210
Y2  - 2016/10/15/20:58:32
ER  - 