UCLA at TREC 2014 Clinical Decision Support Track: Exploring Language Models, Query Expansion, and Boosting

Abstract

For the TREC 2014 Clinical Decision Support track, participants were given a set of 30 patient cases, each in the form of a short natural language description, and a data set of over 700,000 full-text articles from PubMed Central. The task was to retrieve articles relevant to the patient cases and to one of three types of clinical question: diagnosis, test, and treatment. This paper describes the retrieval system developed by the Medical Imaging Informatics group at the University of California, Los Angeles. One manual run and four automatic runs were submitted. For the automatic runs, a variety of retrieval strategies were explored. Two retrieval methods were compared: the vector space model with TF-IDF similarity, and a unigram language model with Jelinek-Mercer smoothing. Retrieval on abstracts alone was compared with retrieval on full text. Finally, a simple set of rules for query expansion and term boosting was developed based on recommendations from domain experts. Submissions from 26 groups were pooled and evaluated by a team of medical librarians and physicians at the National Institute of Standards and Technology. The results showed that 1) the language model outperformed the vector space model for automatically constructed queries, 2) searching full text was more effective than searching abstracts alone, and 3) boosting improved the ranking of retrieved documents for "test" topics, but not "diagnosis" topics. Our best automatic run used the language model, full-text search, query expansion, and no boosting.
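As an illustration of the second retrieval method named in the abstract, the following is a minimal sketch of query-likelihood scoring with a unigram language model and Jelinek-Mercer smoothing. It is not the authors' system: the whitespace tokenizer, the toy documents, the function names `jm_score` and `rank`, and the smoothing weight `lam=0.5` are all assumptions made for this example, and the abstract does not state which parameter values were used.

```python
import math
from collections import Counter


def jm_score(query_terms, doc_terms, collection_tf, collection_len, lam=0.5):
    """Log query likelihood of a document under Jelinek-Mercer smoothing.

    Each term probability interpolates the document model with the
    collection model: p = (1 - lam) * p_doc + lam * p_coll.
    lam is a hypothetical setting, not a value taken from the report.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = collection_tf.get(term, 0) / collection_len
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term never occurs in the collection; skipping it is an
            # assumption made here to keep the log defined.
            continue
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Rank documents (id -> text) for a query, best first."""
    # Naive lowercase whitespace tokenization (an assumption for this sketch).
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    collection_tf = Counter(t for terms in tokenized.values() for t in terms)
    collection_len = sum(collection_tf.values())
    query_terms = query.lower().split()
    scored = [
        (doc_id, jm_score(query_terms, terms, collection_tf, collection_len, lam))
        for doc_id, terms in tokenized.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy collection standing in for article text (invented for illustration).
docs = {
    "a": "chest pain and elevated troponin suggest myocardial infarction",
    "b": "knee pain after running is often a tendon injury",
    "c": "troponin testing in suspected myocardial infarction",
}
ranking = rank("chest pain troponin", docs)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

Because the collection model gives every query term seen anywhere in the collection a nonzero probability, a document that lacks one query term is penalized rather than excluded, which is one reason smoothing matters for long, automatically constructed queries.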

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2014
Accession Number
ADA618620

Entities

People

  • Frank Meng
  • Jean I. Garcia-Gathright
  • William Hsu

Organizations

  • University of California, Los Angeles

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Arteries
  • Automatic
  • Blood Cells
  • California
  • Catheterization
  • Cell Count
  • Diagnostic Imaging
  • Laboratory Procedures
  • Language
  • Leukocytes
  • Pain
  • Relational Databases
  • Standards
  • Universities
  • Vector Spaces

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Information Retrieval
  • Medical or Health Care Field

Technology Areas

  • Space