Automatic Pattern Acquisition for Japanese Information Extraction

Abstract

One of the central issues in information extraction is the cost of customizing a system from one scenario to another. Research on the automated acquisition of patterns is important for portability and scalability. In this paper, we introduce the Tree-Based Pattern representation, in which a pattern is denoted as a path in the dependency tree of a sentence. We outline the procedure for acquiring Tree-Based Patterns in Japanese from unannotated text. The system extracts relevant sentences from the training data based on TF/IDF scoring, and the paths common to the parse trees of those relevant sentences are taken as the extracted patterns.
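
The two steps named in the abstract (ranking sentences by TF/IDF, then keeping paths shared across dependency trees) can be illustrated with a minimal sketch. It is not the report's implementation: the function names, the keyword-based scoring formula, the `(word, children)` tree encoding, the `min_count` threshold, and the English toy words are all assumptions made for illustration, and the trees are assumed to come from an external dependency parser.

```python
import math
from collections import Counter

def tfidf_rank(sentences, keywords):
    """Return sentence indices ordered by a simple TF-IDF score.

    Assumption: each sentence is a list of tokens and is treated as its
    own "document"; the score sums tf * log(N / df) over scenario keywords.
    """
    n = len(sentences)
    # Document frequency: number of sentences containing each token.
    df = Counter(tok for sent in sentences for tok in set(sent))

    def score(sent):
        tf = Counter(sent)
        return sum(tf[k] * math.log(n / df[k]) for k in keywords if df[k])

    return sorted(range(n), key=lambda i: score(sentences[i]), reverse=True)

def root_paths(tree):
    """Yield every downward path that starts at the root of a tree.

    Assumption: a tree is a (word, [child_tree, ...]) pair.
    """
    word, children = tree
    yield (word,)
    for child in children:
        for path in root_paths(child):
            yield (word,) + path

def common_paths(trees, min_count=2):
    """Count, for each path of two or more nodes, how many trees contain it,
    and keep the paths found in at least min_count trees."""
    counts = Counter()
    for tree in trees:
        # Use a set so a path is counted at most once per tree.
        counts.update({p for p in root_paths(tree) if len(p) >= 2})
    return {path: c for path, c in counts.items() if c >= min_count}
```

For example, given the hypothetical trees `("arrested", [("police", []), ("suspect", [])])` and `("arrested", [("officers", []), ("suspect", [])])`, only the path `("arrested", "suspect")` occurs in both, so it is the only candidate pattern that `common_paths` keeps.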

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2001
Accession Number: ADA460210

Entities

People

  • Kiyoshi Sudo
  • Ralph David Grishman
  • Satoshi Sekine

Organizations

  • New York University

Tags

Communities of Interest

  • Air Platforms
  • Ground and Sea Platforms

DTIC Thesaurus Topics

  • Abstracts
  • Accidents
  • Acquisition
  • Aircrafts
  • Airplanes
  • Automatic
  • Boundaries
  • Computer Science
  • Extraction
  • Information Retrieval
  • Language
  • Motor Vehicle Accidents
  • Natural Languages
  • New York
  • Precision
  • Test Sets
  • Training

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Software Engineering

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval