Instance-Based Question Answering

Abstract

In recent years, question answering (QA) has grown from simple passage retrieval and information extraction into complex approaches that incorporate deep question and document analysis, reasoning, planning, and sophisticated uses of knowledge resources. Most existing QA systems combine rule-based, knowledge-based, and statistical components, and are highly optimized for a particular style of question in a given language. Typical QA approaches depend on specific ontologies, resources, processing tools, and document sources, and very often rely on expert knowledge and rule-based components. Such systems are also very difficult to retrain and optimize for different domains and languages, requiring considerable time and human effort. We present a fully statistical, data-driven, instance-based approach to question answering (IBQA) that learns how to answer new questions from similar training questions and their known correct answers. We represent training questions as points in a multi-dimensional space and cluster them according to different granularity, scatter, and similarity metrics. From each individual cluster we automatically learn an answering strategy for finding answers to questions. When a new question is covered by several clusters, multiple answering strategies are employed simultaneously. The resulting answer confidence combines elements such as each strategy's estimated probability of success, cluster similarity to the new question, cluster size, and cluster granularity. The IBQA approach obtains good performance on factoid and definitional questions, comparable to that of top systems participating in official question answering evaluations.
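
The cluster-weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation: the bag-of-words representation, the cosine similarity, the `Cluster` class, the `cluster_weight` function, and the choice to multiply the factors together are all assumptions made for illustration, and cluster granularity is left out for brevity.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List


def vectorize(question: str) -> Counter:
    """Bag-of-words vector (an assumed stand-in for the report's question representation)."""
    return Counter(question.lower().rstrip("?").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Cluster:
    """A group of similar training questions with one learned answering strategy."""
    questions: List[str]
    success_prob: float  # hypothetical estimate of the strategy's probability of success

    def centroid(self) -> Counter:
        """Sum of the member question vectors."""
        total = Counter()
        for q in self.questions:
            total.update(vectorize(q))
        return total


def cluster_weight(cluster: Cluster, new_question: str, max_size: int) -> float:
    """Combine strategy success, cluster similarity, and cluster size.

    Multiplying the three factors is an assumption of this sketch.
    """
    similarity = cosine(cluster.centroid(), vectorize(new_question))
    size_factor = len(cluster.questions) / max_size
    return cluster.success_prob * similarity * size_factor


# Toy clusters with made-up success probabilities.
birth_dates = Cluster(["When was Einstein born?", "When was Lincoln born?"], 0.6)
locations = Cluster(["Where is Paris located?", "Where is Cairo located?"], 0.7)

new_question = "When was Mozart born?"
largest = max(len(birth_dates.questions), len(locations.questions))
for name, cluster in [("birth_dates", birth_dates), ("locations", locations)]:
    print(name, round(cluster_weight(cluster, new_question, largest), 3))
```

In this toy setup, a cluster that shares no terms with the new question receives zero weight, so its answering strategy would contribute nothing to the final answer confidence.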

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 2006
Accession Number
ADA462538

Entities

People

  • Lucian V. Lita

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Biomedical
  • C4I
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Data Mining
  • Feature Extraction
  • Hidden Markov Models
  • Information Retrieval
  • Information Science
  • Machine Learning
  • Markov Models
  • Named Entity Recognition
  • Natural Language Processing
  • Ontologies
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence
  • Instructional Design and Training Evaluation
  • Parallel and Distributed Computing

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • Space