An Interpretative Data Analysis of Chinese Named Entity Subtypes

Abstract

"In assessing the performance of information extraction systems, we are interested in knowing the classes of errors made and the circumstances in which they are made."[1] However, to date the Tipster scoring categories (correct, partial, incorrect, spurious, missing, and noncommittal) have not been applied to classes of data based on structural distinctions in the language, or on semantic subclasses more finely differentiated than the NE types (person, location, organization, time, date, money, and percent). For example, there has been no attempt to score the extraction of transliterated foreign person names, or of short-form aliases of corporation names, or of Julian dates as opposed to Gregorian dates as opposed to dates of the Chinese lunar calendar.

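The idea of applying the scoring categories to finer-grained subtypes can be illustrated with a small tally. The following is only a minimal sketch, not the report's method: the function name `tally_by_subtype`, the subtype labels, and the sample records are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict

# The six scoring categories named in the abstract.
CATEGORIES = ("correct", "partial", "incorrect", "spurious", "missing", "noncommittal")


def tally_by_subtype(scored_items):
    """Count scoring categories separately for each entity subtype.

    scored_items is an iterable of (subtype, category) pairs. The subtype
    labels are hypothetical; the report's own subtype inventory may differ.
    """
    table = defaultdict(Counter)
    for subtype, category in scored_items:
        if category not in CATEGORIES:
            raise ValueError(f"unknown scoring category: {category!r}")
        table[subtype][category] += 1
    return {subtype: dict(counts) for subtype, counts in table.items()}


# Made-up input for illustration only; these are not results from the report.
example = [
    ("person/transliterated", "correct"),
    ("person/transliterated", "missing"),
    ("organization/short_alias", "partial"),
    ("date/lunar", "correct"),
    ("person/transliterated", "correct"),
]
result = tally_by_subtype(example)
```

Keeping one counter per subtype makes it possible to see, for instance, whether missing entities cluster in one subclass rather than being spread across a whole NE type.
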
Document Details

Document Type
Technical Report
Publication Date
May 01, 1996
Accession Number
ADA631330

Entities

People

  • Thomas A. Keenan

Organizations

  • United States Department of Defense

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Corporations
  • Data Analysis
  • Department Of Defense
  • Error Analysis
  • Errors
  • Extraction
  • Frequency
  • Governments
  • Information Operations
  • Language
  • Residuals
  • Syllables

Readers

  • Computational Linguistics
  • Educational Psychology
  • Psychometric Testing or Psychological Assessment

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • Space