Customized Information Extraction as a Basis for Resource Discovery

Abstract

Indexing file contents is a powerful means of helping users locate documents, software, and other types of data among large repositories. In environments that contain many different types of data, content indexing requires type-specific processing to extract information effectively. In this paper we present a model for type-specific, user-customizable information extraction, and a system implementation called Essence. This software structure allows users to associate specialized extraction methods with ordinary files, providing the illusion of an object-oriented file system that encapsulates specialized indexing methods within files. By exploiting semantics of common file types, Essence generates compact yet representative file summaries that can be used to improve both browsing and indexing in resource discovery systems. Essence can extract information from most of the types of files found in common file systems, including files with nested structure (such as compressed "tar" files). Essence interoperates with the Wide Area Information Servers (WAIS) system, allowing WAIS users to take advantage of the Essence information extraction methods.
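
The idea of associating an extraction method with each file type, and recursing into nested files, can be illustrated with a minimal Python sketch. This is not Essence's actual interface: the registry, the function names (extractor, identify_type, summarize), and the suffix-based type guessing are all hypothetical simplifications chosen only to show the dispatch pattern the abstract describes.

```python
import io
import re
import tarfile

# Hypothetical registry mapping a file-type name to its extraction function.
EXTRACTORS = {}

def extractor(type_name):
    """Decorator that registers a type-specific extraction function."""
    def register(fn):
        EXTRACTORS[type_name] = fn
        return fn
    return register

def identify_type(name):
    """Guess a file type from the name suffix (an illustrative shortcut only)."""
    if name.endswith((".tar", ".tar.gz", ".tgz")):
        return "archive"
    if name.endswith((".c", ".h")):
        return "c_source"
    return "text"

@extractor("c_source")
def summarize_c(name, data):
    # For C source, keep only the included header names as keywords.
    text = data.decode("utf-8", errors="replace")
    return [(name, re.findall(r'#include\s*[<"]([^>"]+)[>"]', text))]

@extractor("text")
def summarize_text(name, data):
    # For plain text, keep the first few words as a crude summary.
    words = data.decode("utf-8", errors="replace").split()
    return [(name, words[:5])]

@extractor("archive")
def summarize_archive(name, data):
    # Nested structure: unpack in memory and summarize each member by its own type.
    summaries = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile():
                inner = tar.extractfile(member).read()
                summaries.extend(summarize(f"{name}/{member.name}", inner))
    return summaries

def summarize(name, data):
    """Dispatch to the extraction function registered for the file's type."""
    return EXTRACTORS[identify_type(name)](name, data)
```

A user-defined extraction method would be added in this sketch by registering another function under a new type name and teaching identify_type to recognize that type; the summaries returned for each file are what an indexer would consume in place of the full contents.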

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 1994
Accession Number
ADA454616

Entities

People

  • Darren R. Hardy
  • Michael F. Schwartz

Organizations

  • University of Colorado Boulder

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Availability
  • Buildings And Structures
  • Classification
  • Colorado
  • Computer Science
  • Computers
  • Computing Devices
  • Contracts
  • Environment
  • Extraction
  • Information Operations
  • Instructions
  • Monitoring
  • Neurobehavioral Manifestations
  • Security
  • Semantics

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Database Systems and Applications
  • Geospatial Intelligence and Artificial Intelligence Analytics

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval