Ontology-Driven Information Extraction with OntoSyphon

Abstract

The Semantic Web's need for machine understandable content has led researchers to attempt to automatically acquire such content from a number of sources, including the web. To date, such research has focused on "document-driven" systems that individually process a small set of documents, annotating each with respect to a given ontology. This paper introduces OntoSyphon, an alternative that strives to more fully leverage existing ontological content while scaling to extract comparatively shallow content from millions of documents. OntoSyphon operates in an "ontology-driven" manner: taking any ontology as input, OntoSyphon uses the ontology to specify web searches that identify possible semantic instances, relations, and taxonomic information. Redundancy in the web, together with information from the ontology, is then used to automatically verify these candidate instances and relations, enabling OntoSyphon to operate in a fully automated, unsupervised manner. A prototype of OntoSyphon is fully implemented and we present experimental results that demonstrate substantial instance learning in a variety of domains based on independently constructed ontologies. We also introduce new methods for improving instance verification, and demonstrate that they improve upon previously known techniques.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2006
Accession Number
ADA480453

Entities

People

  • Luke K. Mcdowell
  • Michael Cafarella

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Accuracy
  • Animals
  • Computer Science
  • Data Sets
  • Extraction
  • Frequency
  • Language
  • Learning
  • Machine Learning
  • Models
  • Natural Languages
  • Ontologies
  • Precision
  • Probability
  • Redundancy
  • Test And Evaluation
  • Verification

Fields of Study

  • Computer science

Readers

  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing.
  • Database Systems and Applications
  • Software Engineering.

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Neural Networks