Populating the Semantic Web

Abstract

The vision of the Semantic Web is that a vast store of online information "meaningful to computers will unleash a revolution of new possibilities". Unfortunately, the vast majority of information on the Web is formatted to be easily read by human users, not by computer applications. To make the vision of the Semantic Web a reality, tools for automatically annotating Web content with semantic labels will be required. We describe the ADEL system, which automatically extracts records from Web sites and semantically labels their fields. The system exploits similarities in the layout of Web pages to learn the grammar that generated these pages, and then uses this grammar to extract structured records from them. ADEL also exploits the fact that sites in the same domain provide the same or similar data. By collecting labeled examples of data during the training stage, we learn structural descriptions of data fields and later use these descriptions to semantically label new data fields. We show that on a used-car shopping domain, ADEL achieves a precision of 64% and a recall of 89% on extracting and labeling data columns.
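To make the field-labeling idea in the abstract more concrete, here is a minimal, hypothetical sketch in Python. It is not ADEL's algorithm. It assumes that a field's "structural description" can be approximated by the sequence of coarse token classes in its values (numbers, capitalized words, literal punctuation). A new extracted column receives the label whose learned descriptions cover the largest share of its values. The function names (token_type, signature, learn, label_column), the token classes, the sample used-car values and the 0.5 coverage threshold are all illustrative assumptions.

```python
import re
from typing import Optional

# Hypothetical token classes: a stand-in for ADEL's real field descriptions.
def token_type(tok: str) -> str:
    """Map one token to a coarse syntactic class."""
    if tok.isdigit():
        return "NUM"
    if tok.isalpha():
        return "CAPS" if tok[0].isupper() else "LOWER"
    if tok.isalnum():
        return "ALNUM"
    return tok  # punctuation such as "$" or "," is kept literally


def signature(value: str) -> tuple[str, ...]:
    """Structural description of one value: the sequence of its token classes."""
    tokens = re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", value)
    return tuple(token_type(t) for t in tokens)


def learn(labeled: dict[str, list[str]]) -> dict[str, set[tuple[str, ...]]]:
    """Training stage: collect the signatures observed for each semantic label."""
    return {label: {signature(v) for v in values} for label, values in labeled.items()}


def label_column(column: list[str], models: dict[str, set[tuple[str, ...]]],
                 threshold: float = 0.5) -> Optional[str]:
    """Labeling stage: pick the label whose signatures cover the most values of a
    new column; return None if even the best label covers less than `threshold`."""
    if not column or not models:
        return None
    scores = {
        label: sum(signature(v) in sigs for v in column) / len(column)
        for label, sigs in models.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Illustrative training data (made up, not from the report).
models = learn({
    "Price": ["$12,500", "$8,995", "$21,000"],
    "Mileage": ["45,210 miles", "102,000 miles"],
    "Make": ["Toyota", "Honda", "Ford"],
})
print(label_column(["$15,750", "$9,400"], models))  # Price
print(label_column(["2004-07-01"], models))         # None
```

Returning None instead of forcing a label on an unfamiliar column is one way to trade recall for precision; in this sketch the threshold parameter is the knob that controls that trade-off.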


Document Details

Document Type: Technical Report
Publication Date: Jul 01, 2004
Accession Number: ADA457907

Entities

People

Cenk Gazen
Craig Knoblock
Kristina Lerman
Steven Minton

Organizations

University of Southern California
