A Domain Independent Framework for Extracting Linked Semantic Data from Tables

Abstract

Vast amounts of information are encoded in tables found in documents, on the Web, and in spreadsheets or databases. Integrating or searching over this information benefits from understanding its intended meaning and making it explicit in a semantic representation language like RDF. Most current approaches to generating Semantic Web representations from tables require human input to create schemas and often result in graphs that do not follow best practices for linked data. Evidence for a table's meaning can be found in its column headers, cell values, implicit relations between columns, caption, and surrounding text, but interpreting that evidence also requires general and domain-specific background knowledge. Approaches that work well for one domain may not necessarily work well for others. We describe a domain-independent framework for interpreting the intended meaning of tables and representing it as Linked Data. At the core of the framework are techniques grounded in graphical models and probabilistic reasoning to infer the meaning associated with a table. Using background knowledge from resources in the Linked Open Data cloud, we jointly infer the semantics of column headers, table cell values (e.g., strings and numbers), and relations between columns, and represent the inferred meaning as a graph of RDF triples. A table's meaning is thus captured by mapping columns to classes in an appropriate ontology, linking cell values to literal constants, implied measurements, or entities in the linked data cloud (existing or new), and discovering or identifying relations between columns.
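
As a rough illustration of the output representation described in the abstract, the sketch below turns a small table into RDF triples once column classes, cell entities, and a column relation are known. It does not implement the report's graphical-model inference. The mappings are supplied by hand, and the table, the function names (`table_to_triples`, `to_ntriples`), and the choice of DBpedia terms are all assumptions made for this example.

```python
# Minimal sketch: emit RDF triples for an already-interpreted table.
# The inference step (choosing classes, entities, relations) is NOT shown;
# its results are hard-coded below as hypothetical inputs.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO = "http://dbpedia.org/ontology/"   # illustrative ontology namespace
DBR = "http://dbpedia.org/resource/"   # illustrative entity namespace

# Hypothetical input table: one header row plus data rows.
header = ["City", "State"]
rows = [["Baltimore", "Maryland"], ["Boston", "Massachusetts"]]

# Assumed results of interpretation, written by hand for this example.
column_class = {"City": DBO + "City", "State": DBO + "AdministrativeRegion"}
column_relation = ("City", DBO + "isPartOf", "State")
cell_entity = {
    "Baltimore": DBR + "Baltimore",
    "Maryland": DBR + "Maryland",
    "Boston": DBR + "Boston",
    "Massachusetts": DBR + "Massachusetts",
}


def table_to_triples(header, rows, column_class, column_relation, cell_entity):
    """Return (subject, predicate, object) IRI triples for the table."""
    subj_col, predicate, obj_col = column_relation
    si, oi = header.index(subj_col), header.index(obj_col)
    triples = []
    for row in rows:
        # Each linked cell is typed with the class assigned to its column.
        for name, value in zip(header, row):
            triples.append((cell_entity[value], RDF_TYPE, column_class[name]))
        # The relation between the two columns links the row's entities.
        triples.append((cell_entity[row[si]], predicate, cell_entity[row[oi]]))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples in N-Triples syntax, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(
    table_to_triples(header, rows, column_class, column_relation, cell_entity)
))
```

Each row contributes one type triple per linked cell plus one triple for the relation between the two columns. Literal values and newly minted entities, both of which the abstract mentions, are left out to keep the sketch short.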

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2012
Accession Number
ADA563941

Entities

People

  • Anupam Joshi
  • Tim Finin
  • Varish Mulwad

Organizations

  • University of Maryland, Baltimore

Tags

Communities of Interest

  • Biomedical
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Best Practices
  • Computer Science
  • Databases
  • Information Science
  • Language
  • Machine Learning
  • Models
  • New York
  • Ontologies
  • Probability
  • Reasoning
  • Semantic Models
  • Semantics
  • Statistical Analysis
  • United States

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Database Systems and Applications
  • Distributed Systems and Data Platform Development

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval