Learning to Extract Gene-Protein Names from Weakly-Labeled Text

Abstract

Training a named entity recognizer (NER) has always been a difficult task due to the effort required to generate a significant amount of annotated training data. In this paper, we reduce or eliminate the effort required to create training data by automatically converting other sources of data into annotated training data. The performance of this approach is tested on a gene-protein name extractor by using the mouse and fly data obtained from the BioCreAtIvE challenge. Results show that our methods are effective and that our trained NER system outperforms all of our baseline results.

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2008
Accession Number: ADA531043

Entities

People

Anthony Tomasic
Isaac Simmons
Richard C. Wang
Robert E. Frederking
William W. Cohen

Organizations

Carnegie Mellon University
