Lessons Learned from Indexing Close Word Pairs

Abstract

We describe experiments with proximity-aware ranking functions that use indexing of word pairs. Our goal is to evaluate a method of "mild" pruning of proximity information, which would be appropriate for a moderately loaded retrieval system, e.g., an enterprise search engine. We create an index that includes occurrences of close word pairs, where one of the words is frequent. This allows one to efficiently restore relative positional information for all non-stop words within a certain distance. It is also possible to answer phrase queries promptly. We use two functions to evaluate relevance: a modification of a classic proximity-aware function and a logistic function that includes a linear combination of relevance features. Additionally, we use the spam scores provided by the University of Waterloo.
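
The indexing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the whitespace tokenizer, the `max_dist` window, the posting layout, and the feature weights are all assumptions made for illustration. The sketch records each occurrence of a frequent word together with every word within a fixed distance of it, and it scores a document with a logistic function over a linear combination of features.

```python
import math
from collections import defaultdict


def build_pair_index(docs, frequent_words, max_dist=3):
    """Map (frequent_word, nearby_word) to a list of postings.

    Each posting is (doc_id, position_of_frequent_word, position_of_nearby_word).
    Hypothetical layout; the report's actual index format is not given here.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        tokens = text.lower().split()  # assumed tokenizer: lowercase + whitespace
        for i, word in enumerate(tokens):
            if word not in frequent_words:
                continue
            # Window of positions within max_dist of the frequent word.
            lo = max(0, i - max_dist)
            hi = min(len(tokens), i + max_dist + 1)
            for j in range(lo, hi):
                if j != i:
                    index[(word, tokens[j])].append((doc_id, i, j))
    return index


def logistic_score(features, weights, bias=0.0):
    """Logistic function applied to a linear combination of relevance features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


docs = ["the quick brown fox", "search of the index"]
pair_index = build_pair_index(docs, frequent_words={"the"}, max_dist=2)
```

Because each posting keeps both positions, the relative offset between the two words can be recovered directly, which is what makes adjacency checks for phrase queries possible in this sketch.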

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2010
Accession Number
ADA547189

Entities

People

  • Anna Belova
  • Leonid Boytsov

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Computations
  • Computer Science
  • Dictionaries
  • Equations
  • Information Retrieval
  • Information Science
  • Knowledge Management
  • Lessons Learned
  • Monte Carlo Method
  • New York
  • Precision
  • Simplex Method
  • Standards
  • Statistical Analysis
  • Statistics

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Library and Information Science
  • Regression Analysis