PRIS at 2012 TREC Medical Track: Query Expansion, Retrieval and Ranking

Abstract

The official datasets are in XML format, so they must be parsed before indexing. We chose Lucene as our indexing and search tool, and we use Jakarta Commons Digester (hereafter, Digester) to parse the XML documents. Digester converts each XML document into a Java object, from which we obtain the fields we need. In addition, we process the "report_text" tag of each document to extract the patient's age and gender, which are very important fields for the search task.
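The abstract describes the pipeline (XML parsed into a Java object, then age and gender pulled out of `report_text`) but not the extraction rules. The Java sketch below is a hypothetical illustration, not the authors' code. It swaps in the JDK's built-in DOM parser (`javax.xml.parsers`) for Digester so that it runs without extra dependencies. It assumes each report has a `<checksum>` element alongside `<report_text>`. It also uses crude regular-expression heuristics invented for this example: an "N year old" or "N yo" phrase for age, and a majority vote over gendered words for gender.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ReportParser {

    /** Fields extracted from one report (hypothetical value object). */
    public static final class Report {
        public final String checksum;
        public final String text;
        public final Integer age;    // null if no age phrase was found
        public final String gender;  // "male", "female", or null if undecided

        Report(String checksum, String text, Integer age, String gender) {
            this.checksum = checksum;
            this.text = text;
            this.age = age;
            this.gender = gender;
        }
    }

    // Hypothetical heuristics, not the authors' rules: "67 year old", "45-year-old", "72 yo".
    private static final Pattern AGE = Pattern.compile(
            "\\b(\\d{1,3})\\s*-?\\s*(?:years?\\s*-?\\s*old|yo)\\b", Pattern.CASE_INSENSITIVE);
    private static final Pattern MALE = Pattern.compile(
            "\\b(?:male|man|he|him|his)\\b", Pattern.CASE_INSENSITIVE);
    private static final Pattern FEMALE = Pattern.compile(
            "\\b(?:female|woman|she|her|hers)\\b", Pattern.CASE_INSENSITIVE);

    /** Parses one report with the JDK DOM parser (standing in for Digester). */
    public static Report parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String text = textOf(doc, "report_text");
            return new Report(textOf(doc, "checksum"), text, extractAge(text), extractGender(text));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse report XML", e);
        }
    }

    // Trimmed text of the first element named `tag`, or "" if the element is absent.
    private static String textOf(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    // The first age phrase found in the text wins.
    public static Integer extractAge(String text) {
        Matcher m = AGE.matcher(text);
        return m.find() ? Integer.valueOf(m.group(1)) : null;
    }

    // Majority vote over gendered words; a tie (including zero hits) yields null.
    public static String extractGender(String text) {
        int male = count(MALE, text);
        int female = count(FEMALE, text);
        if (male > female) return "male";
        if (female > male) return "female";
        return null;
    }

    private static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        Report r = parse("<report><checksum>abc123</checksum><report_text>"
                + "HISTORY: 67 year old man with chest pain. He denies fever."
                + "</report_text></report>");
        System.out.println(r.checksum + " age=" + r.age + " gender=" + r.gender);
    }
}
```

Running `main` prints `abc123 age=67 gender=male`. In a pipeline like the one described, the extracted `age` and `gender` values could then be added as separate Lucene fields so that searches can use them. Real clinical text would need sturdier rules, for example handling ages given in months or pronouns that refer to family members.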


Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2012
Accession Number
ADA581494

Entities

People

  • Jiayue Zhang
  • Jun Guo
  • Lin Lin
  • Runnan Liu
  • Shudang Diao
  • Weiran Xu
  • Yukun Li

Organizations

  • Beijing University of Posts and Telecommunications

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Cardiovascular Physiological Phenomena
  • Communication Systems
  • Diseases And Disorders
  • Governments
  • Hypertension
  • Information Operations
  • Information Retrieval
  • Language
  • Learning
  • Pain
  • Standards
  • Text Mining
  • World Wide Web

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Information Retrieval