Literature Mining of Pathogenesis-Related Proteins in Human Pathogens for Database Annotation

Abstract

Biomedical literature is the primary source of experimental data and biological knowledge. This project aims to develop a text mining system for pathogens of biodefense relevance, focusing on mining pathogen-host protein-protein interactions (PH-PPI). We developed a Support Vector Machine (SVM)-based system to identify abstracts containing PH-PPI information, using an annotated corpus of 1360 MEDLINE abstracts as the training set. On document classification, it achieved a precision of over 80% among the top 50 ranked abstracts. The SVM-based method is further augmented with other text mining tools (such as PIE) for mining and tagging PPI information. As part of an effort to enable text mining tools for real-world applications, we are developing a basic framework, iProLINK, to connect text mining tools with ontology and systems biology for the biomedical research community. The PH-PPI text mining system developed in the first year will be coupled with the iProXpress proteomic data analysis system to form a "Pathogen Mining System" for the analysis of pathogen proteomics data.
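
The evaluation measure mentioned in the abstract, precision among the top-ranked abstracts, can be illustrated with a minimal sketch. This is not the report's code: the function name `precision_at_k`, the toy scores, and the toy labels are all hypothetical, and the scores merely stand in for whatever ranking values a classifier such as an SVM would assign to each abstract.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scoring documents that are truly relevant.

    scores: classifier ranking scores, one per abstract (assumed: higher = more relevant)
    labels: gold annotations, 1 if the abstract contains PH-PPI information, else 0
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank abstract indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]
    if not top:
        return 0.0
    # Count how many of the top-ranked abstracts are annotated as relevant.
    return sum(labels[i] for i in top) / len(top)


# Hypothetical toy data: six abstracts with made-up scores and labels.
scores = [2.1, -0.4, 1.3, 0.7, -1.2, 0.1]
labels = [1, 0, 1, 0, 0, 1]
print(precision_at_k(scores, labels, k=3))
```

With k set to 50 and scores taken from a trained classifier, the same calculation would give the kind of figure quoted in the abstract.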

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 2008
Accession Number
AD1041380

Entities

People

  • Cathy H. Wu
  • Zhang-zhi Hu

Organizations

  • Georgetown University Medical Center

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Bayesian Networks
  • Computational Biology
  • Computational Science
  • Control Systems
  • Data Analysis
  • Data Curation
  • Data Mining
  • Information Science
  • Machine Learning
  • Network Science
  • Ontologies
  • Protein-Protein Interactions
  • Proteins
  • Supervised Machine Learning
  • Systems Biology
  • Text Mining
  • User Interface

Readers

  • Computational Linguistics
  • Data Mining and Knowledge Discovery
  • Molecular Genetics

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • Biotechnology