Identifying Patients for Clinical Studies from Electronic Health Records: TREC 2012 Medical Records Track at OHSU

Abstract

The goal of the TREC 2012 Medical Records Track was to search medical record documents to identify patients as possible candidates for clinical studies based on diagnosis, age, and other attributes. For TREC 2012, the Oregon Health & Science University (OHSU) group experimented with both manual and automated techniques. We used a derivative of Lucene to build an interactive retrieval system that can process queries in one of two ways. Users can manually specify Boolean queries whose terms may include words as well as ICD-9 codes. Alternatively, the system features an automated query parser that transforms free-text queries into structured Boolean queries. The query parser is built on top of MetaMap and the UMLS Metathesaurus. We submitted both automatic runs, which relied solely on the automated query parser, and manual runs, which consisted of queries built by an expert clinician. Overall, our automated query parser performed below the mean of other groups. This uneven performance was due in part to the parser's tendency to over-specify queries, which reduced recall. There were, however, several topics for which the parser performed very well, suggesting that our fundamental approach has merit. In contrast, our manual runs performed very well, scoring second-best among official manual runs. With further modification of the manual queries, we were able to achieve even better performance. Querying electronic health records to identify patients as candidates for clinical studies still requires manual query development, at least until automated methods are developed that outperform manual queries.
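
To make the idea of Boolean queries over words and ICD-9 codes concrete, the following is a minimal sketch only. It is not the OHSU system, which was built on a Lucene derivative; it uses a plain in-memory scan instead. The `Visit` record, the `icd9:` term prefix, the AND-of-ORs query shape, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Visit:
    """Hypothetical record: one visit with free text and ICD-9 codes."""
    visit_id: str
    text: str
    icd9_codes: set = field(default_factory=set)


def matches_term(visit, term):
    """A term is an ICD-9 code (assumed 'icd9:' prefix) or a plain word."""
    if term.startswith("icd9:"):
        return term[len("icd9:"):] in visit.icd9_codes
    return term.lower() in visit.text.lower().split()


def matches_query(visit, query):
    """Evaluate an AND of OR-groups: every group needs one matching term."""
    return all(any(matches_term(visit, t) for t in group) for group in query)


def search(visits, query):
    """Return the ids of all visits that satisfy the Boolean query."""
    return [v.visit_id for v in visits if matches_query(v, query)]


# Illustrative data only; texts and code assignments are made up.
visits = [
    Visit("v1", "patient with hypertension on chemotherapy", {"401.9", "153.9"}),
    Visit("v2", "depression and chronic pain", {"311"}),
    Visit("v3", "colon cancer follow-up", {"153.9"}),
]

# One clause: (word "chemotherapy" OR code 153.9)
broad = [["chemotherapy", "icd9:153.9"]]

# Two clauses: the clause above AND (word "hypertension" OR code 401.9)
narrow = broad + [["hypertension", "icd9:401.9"]]

print(search(visits, broad))   # ['v1', 'v3']
print(search(visits, narrow))  # ['v1']
```

Each added AND clause can only shrink the result set, which is one way to picture how an over-specified query reduces recall.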

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2012
Accession Number
ADA581326

Entities

People

  • Aaron Cohen
  • Steven Bedrick
  • Tracy Edinger
  • William Hersh

Organizations

  • Oregon Health & Science University

Tags

DTIC Thesaurus Topics

  • Cardiovascular Physiological Phenomena
  • Chemotherapy
  • Colon Cancer
  • Depression
  • Dermatologic Agents
  • Drug Abuse
  • Health Services
  • Hematologic Diseases
  • Hypertension
  • Liver Diseases
  • Myocardial Ischemia
  • Pain

Readers

  • Computer Science
  • Information Retrieval
  • Medical or Health Care Field

Technology Areas

  • Microelectronics