Effective Structured Query Formulation for Session Search

Abstract

In this work, we focus on formulating effective structured queries for session search. For a given query, phrase-like text nuggets are identified and formulated into Lemur queries that are fed to the Lemur search engine. Nuggets are substrings of the current query qn, similar to phrases but not necessarily as semantically coherent. We assume that a valid nugget appears frequently in the top returned snippets for qn. Specifically, the longest sequences of words consisting of frequent bigrams within the top returned snippets are identified as nuggets and used to formulate a new query. Formulating a structured query from the nuggets yields substantially higher search accuracy than using qn alone. We experiment with both strict and relaxed forms of structured query formulation. On the TREC 2011 query sets, the strict form improves nDCG at 10 by 13.5% and the relaxed form by 17.8%. We further combine the nuggets generated from all queries q1, ..., qn-1, qn to formulate one structured query for the entire session. Nuggets from each query are weighted to indicate their relation to the current query and their potential contribution to retrieval performance. We experiment with three weighting schemes: uniform (all queries share the same weight), previous vs. current (the previous queries q1, ..., qn-1 share the same weight while qn receives a different, higher weight), and distance-based (weights are distributed according to how far a query's position in the session is from the current query). We find that previous vs. current achieves the best search accuracy. For retrieval, we first retrieve a large pool of documents for qn. We then employ a re-ranking model that considers dwell time as well as the similarity between clicked documents and documents in the pool.
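
The nugget extraction and session weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything named here is an assumption made for illustration: the function names, the whitespace tokenizer, the `min_count` frequency threshold, the `current_weight` and `decay` values, and the mapping of the strict form to Indri's ordered-window operator `#1(...)` and the relaxed form to the unordered-window operator `#uwN(...)`.

```python
from collections import Counter


def extract_nuggets(query, snippets, min_count=2):
    """Split the query into maximal runs of words joined by frequent bigrams.

    A bigram counts as frequent when it occurs at least `min_count` times
    in the snippets (an assumed threshold, not taken from the report).
    """
    bigram_counts = Counter()
    for snippet in snippets:
        tokens = snippet.lower().split()
        bigram_counts.update(zip(tokens, tokens[1:]))

    words = query.lower().split()
    nuggets, current = [], words[:1]
    for prev, word in zip(words, words[1:]):
        if bigram_counts[(prev, word)] >= min_count:
            current.append(word)       # frequent bigram: extend the run
        else:
            nuggets.append(current)    # infrequent bigram: close the run
            current = [word]
    if current:
        nuggets.append(current)
    return nuggets


def to_indri(nuggets, window=None):
    """Render nuggets as an Indri-style query string.

    window=None gives the assumed strict form (#1, exact ordered phrase);
    an integer gives the assumed relaxed form (#uwN, unordered window).
    """
    parts = []
    for nugget in nuggets:
        if len(nugget) == 1:
            parts.append(nugget[0])    # single words stay plain terms
        elif window is None:
            parts.append("#1(" + " ".join(nugget) + ")")
        else:
            parts.append(f"#uw{window}(" + " ".join(nugget) + ")")
    return "#combine(" + " ".join(parts) + ")"


def session_weights(n, scheme, current_weight=2.0, decay=0.5):
    """Return normalized weights for queries q1..qn under one scheme."""
    if scheme == "uniform":
        raw = [1.0] * n
    elif scheme == "previous_vs_current":
        raw = [1.0] * (n - 1) + [current_weight]
    elif scheme == "distance":
        # Weight shrinks geometrically with distance from the current query.
        raw = [decay ** (n - 1 - i) for i in range(n)]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    return [w / total for w in raw]


def session_query(nuggets_per_query, weights, window=None):
    """Combine the per-query structured queries into one weighted query."""
    parts = [f"{w:.2f} {to_indri(n, window)}"
             for w, n in zip(weights, nuggets_per_query)]
    return "#weight(" + " ".join(parts) + ")"
```

For example, with the made-up query "pocono mountains park camelback resort" and snippets in which "pocono mountains" and "camelback resort" each occur twice, `to_indri(extract_nuggets(query, snippets))` produces `#combine(#1(pocono mountains) park #1(camelback resort))`, and passing `window=4` produces the relaxed variant with `#uw4(...)` in place of `#1(...)`.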

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2012
Accession Number
ADA581235

Entities

People

  • Dongyi Guan
  • Hui Yang
  • Nazli Goharian

Organizations

  • Georgetown University

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Accuracy
  • Computer Science
  • Computers
  • Dwell Time
  • Frequency
  • Information Operations
  • Language
  • Leg Injuries
  • Paralysis
  • Real Estate
  • Sequences
  • Spinal Cord
  • Spinal Injuries
  • Standards
  • Test And Evaluation
  • Word Lists

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Information Retrieval
  • Theoretical Analysis