How to get a Chinese Name(Entity): Segmentation and Combination Issues

Abstract

When building a Chinese named entity recognition system, one must deal with certain language-specific issues, such as whether the model should be based on characters or words. While there is no unique answer to this question, we discuss in detail the advantages and disadvantages of each model, identify problems in segmentation, and suggest possible solutions, presenting our observations, analysis, and experimental results. The second topic of this paper is classifier combination. We present four classifiers for Chinese named entity recognition and describe various methods for combining their outputs. The results demonstrate that classifier combination is an effective technique for improving system performance: experiments over a large annotated corpus of fine-grained entity types exhibit a 10% relative reduction in F-measure error.

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 2003
Accession Number
ADA457910

Entities

People

  • Abraham Ittycheriah
  • Hongyan Jing
  • Radu Florian
  • Tong Zhang
  • Xiaoqiang Luo

Organizations

  • IBM Thomas J. Watson Research Center

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Abstracts
  • Applied Computer Science
  • Boundaries
  • Classification
  • Computer Vision
  • Errors
  • Information Retrieval
  • Language
  • Machine Learning
  • Named Entity Recognition
  • Natural Language Processing
  • Natural Languages
  • Personality
  • Probability
  • Probability Distributions
  • Recognition
  • Training

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Computer Programming and Software Development
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval