Question Answering on a Case Insensitive Corpus

Abstract

Most question answering (QA) systems rely on both keyword indexing and Named Entity (NE) tagging. The corpus from which a QA system retrieves answers is usually mixed-case text. However, many corpora consist of case-insensitive documents, e.g. speech recognition output. This paper presents a successful approach to QA on a case-insensitive corpus: a preprocessing module restores the case-sensitive form, and the case-restored document pool then feeds the QA system, which remains unchanged. Case restoration is implemented as a Hidden Markov Model trained on a large raw corpus of case-sensitive documents. This approach leads to very limited degradation in QA benchmarking (2.8%), mainly because the degradation in the underlying information extraction support is also limited.
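
The case restoration step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-way tag set (lower, cap, upper), the add-one smoothing, the toy training sentences, and every function name below are assumptions made for illustration. The sketch treats case tags as hidden states and lowercased words as observations, estimates transition and emission counts from case-sensitive text, and decodes with the Viterbi algorithm.

```python
import math
from collections import defaultdict

# Hypothetical tag set; the report's actual state inventory is not given here.
TAGS = ("lower", "cap", "upper")

def case_tag(token):
    """Map a case-sensitive token to a coarse case tag."""
    if len(token) > 1 and token.isupper():
        return "upper"
    if token[:1].isupper():
        return "cap"
    return "lower"

def apply_tag(word, tag):
    """Render a lowercased word in the case form named by the tag."""
    if tag == "upper":
        return word.upper()
    if tag == "cap":
        return word.capitalize()
    return word.lower()

def train(sentences):
    """Count tag transitions and tag-to-word emissions from cased text."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo state
        for tok in sent:
            tag, word = case_tag(tok), tok.lower()
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab

def log_prob(table, given, outcome, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing choice)."""
    row = table.get(given, {})
    return math.log((row.get(outcome, 0) + 1) / (sum(row.values()) + n_outcomes))

def restore_case(words, model):
    """Viterbi decoding: pick the most probable case tag sequence."""
    trans, emit, vocab = model
    words = [w.lower() for w in words]
    if not words:
        return []
    n_words = len(vocab) + 1  # one extra slot for unseen words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{t: (log_prob(trans, "<s>", t, len(TAGS))
                 + log_prob(emit, t, words[0], n_words), None) for t in TAGS}]
    for i in range(1, len(words)):
        column = {}
        for t in TAGS:
            e = log_prob(emit, t, words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans, p, t, len(TAGS)) + e, p)
                for p in TAGS
            )
        best.append(column)
    # Follow back-pointers from the best final state.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    tags = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        tags.append(tag)
    tags.reverse()
    return [apply_tag(w, t) for w, t in zip(words, tags)]

# Toy case-sensitive training data (invented for this example).
corpus = [
    ["The", "report", "cites", "IBM", "and", "Boston", "."],
    ["Boston", "is", "a", "city", "."],
    ["IBM", "sells", "software", "in", "Boston", "."],
]
model = train(corpus)
restored = restore_case("ibm sells software in boston .".split(), model)
print(" ".join(restored))  # prints "IBM sells software in Boston ."
```

In a pipeline like the one the abstract describes, a function such as `restore_case` would run over each case-insensitive document before indexing and NE tagging, so that the downstream QA system needs no modification.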

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2003
Accession Number
ADA457756

Entities

People

  • Cheng Niu
  • Rohini Srihari
  • Wei Li
  • Xiaoge Li

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Artificial Intelligence Software
  • Automated Speech Recognition
  • Computer Languages
  • Hidden Markov Models
  • Identification
  • Language
  • Machine Learning
  • Markov Models
  • Models
  • Named Entity Recognition
  • Natural Language Processing
  • Natural Languages
  • Probability
  • Recognition
  • Standards
  • Supervised Machine Learning
  • Text Processing

Fields of Study

  • Engineering

Readers

  • Database Systems and Applications
  • Distributed Systems and Data Platform Development
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Information Retrieval