Exploring Evidence Aggregation Methods and External Expansion Sources for Medical Record Search

Abstract

This paper describes and analyzes the experiments we performed for the Medical Records track of the 2012 Text REtrieval Conference (TREC). We mainly investigated three research problems: 1. Evidence Aggregation: In last year's track there were, broadly, two methods for obtaining a visit ranking from reports (smaller document units): (A) using reports as indexing and retrieval units and then converting the report ranking into a visit ranking, and (B) using visits as indexing and retrieval units by concatenating reports at the very first stage and then obtaining a visit ranking directly. Method A avoids the potential problem of varying visit document length, while Method B naturally aggregates evidence scattered across multiple reports and yields a visit ranking directly. The reported results do not make clear which method is better. We therefore compared the two approaches, tried various score aggregation methods for (A), and combined both approaches in a way that further improved system performance. 2. Expansion Sources: We tested a variety of external collections for query expansion (ranging from general web datasets to domain-specific thesauri, and from megabyte-scale to terabyte-scale datasets), compared their effectiveness, and obtained useful insights into the data. 3. Retrieval Models: We tested several statistical IR models (proven effective on news and web collections) on this medical collection, and explored ways to combine these models to address different aspects of the task. For instance, we used an MRF model to capture term proximity, since most medical concepts are phrases. We also used a mixture of relevance models to obtain relevant expansion terms from each of the different expansion collections, which is expected to significantly alleviate the vocabulary mismatch between medical terminologies.
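
The evidence aggregation step for Method A, and its combination with Method B, can be illustrated with a minimal sketch. The abstract does not say which aggregation functions or which combination scheme were used, so everything below is an assumption made for illustration: the max, sum, and mean aggregators, the linear interpolation, the default weight, and all function and variable names are hypothetical.

```python
from collections import defaultdict


def aggregate_report_scores(report_scores, report_to_visit, method="max"):
    """Collapse per-report retrieval scores into one score per visit.

    The three aggregators are illustrative choices, not the ones
    reported in the paper.
    """
    by_visit = defaultdict(list)
    for report_id, score in report_scores.items():
        by_visit[report_to_visit[report_id]].append(score)

    aggregators = {
        "max": max,                         # best single report wins
        "sum": sum,                         # rewards visits with many matching reports
        "mean": lambda s: sum(s) / len(s),  # normalizes for report count
    }
    aggregate = aggregators[method]
    return {visit: aggregate(scores) for visit, scores in by_visit.items()}


def combine_visit_scores(scores_a, scores_b, weight=0.5):
    """Linearly interpolate visit scores from Method A and Method B.

    A visit missing from one ranking contributes 0.0 for that ranking
    (an assumption; other back-off values are possible).
    """
    visits = set(scores_a) | set(scores_b)
    return {
        v: weight * scores_a.get(v, 0.0) + (1 - weight) * scores_b.get(v, 0.0)
        for v in visits
    }


def rank_visits(scores):
    """Return visit ids by descending score, ties broken by id."""
    return sorted(scores, key=lambda v: (-scores[v], v))


# Toy data: three reports belonging to two visits (Method A input).
report_scores = {"r1": 2.0, "r2": 1.0, "r3": 1.5}
report_to_visit = {"r1": "v1", "r2": "v1", "r3": "v2"}
method_a = aggregate_report_scores(report_scores, report_to_visit, "max")

# Toy visit-level scores from retrieving over concatenated reports (Method B).
method_b = {"v1": 1.0, "v3": 4.0}

combined = combine_visit_scores(method_a, method_b, weight=0.5)
ranking = rank_visits(combined)
```

In practice the two score sets would likely need to be normalized to a comparable range before interpolation, since report-level and visit-level retrieval scores are not guaranteed to be on the same scale.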

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2012
Accession Number
ADA581308

Entities

People

  • Ben Carterette
  • Dongqing Zhu

Organizations

  • University of Delaware

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Base Lines
  • Classification
  • Computers
  • Data Science
  • Equations
  • Frequency
  • Genomics
  • Hearing
  • Hearing Loss
  • Information Operations
  • Information Science
  • Language
  • Standards
  • Test And Evaluation
  • Test Sets
  • Vocabulary

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation
  • Information Retrieval