Preprocessing and Integration of Data from Multiple Sources for Knowledge Discovery
Abstract
The explosive growth in the generation and collection of data has created an urgent need for a new generation of techniques and tools that can help transform these data intelligently and automatically into useful knowledge. Knowledge discovery is an emerging multidisciplinary field that attempts to fulfill this need. It is a multistep process that includes data selection, cleaning, preprocessing, integration, transformation and reduction, data mining, model selection, evaluation and interpretation, and finally consolidation and use of the extracted knowledge. This paper addresses data cleaning and integration for knowledge discovery by proposing a systematic approach to resolving the semantic conflicts encountered when integrating data from multiple sources. Illustrated with examples derived from military databases, the paper presents a heuristics-based algorithm for identifying and resolving semantic conflicts at different levels of information granularity.
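To make the idea of a semantic conflict concrete, the following minimal sketch shows two records that describe the same entity but disagree on attribute names and units, and a simple heuristic pass that reconciles them before comparison. It is an illustration only and is not the algorithm presented in the report: the synonym table, the converter table, the function names, and the sample records are all hypothetical assumptions.

```python
# Hypothetical illustration only; not the algorithm from the report.
KM_PER_NMI = 1.852  # international nautical mile, in kilometers

# Assumed synonym table: source-specific attribute name -> shared name.
SYNONYMS = {
    "ship_name": "name",
    "vessel": "name",
    "max_range_nmi": "max_range_km",
    "range_km": "max_range_km",
}

# Assumed unit conversions, keyed by the source attribute name.
CONVERTERS = {"max_range_nmi": lambda v: v * KM_PER_NMI}


def normalize(record):
    """Rename attributes to shared names and convert units where known."""
    out = {}
    for attr, value in record.items():
        target = SYNONYMS.get(attr, attr)
        convert = CONVERTERS.get(attr)
        out[target] = convert(value) if convert else value
    return out


def find_conflicts(a, b, tolerance=0.01):
    """Return shared attributes whose values still disagree."""
    conflicts = []
    for attr in sorted(a.keys() & b.keys()):
        x, y = a[attr], b[attr]
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            # Numeric values: allow a small relative difference.
            if abs(x - y) > tolerance * max(abs(x), abs(y), 1.0):
                conflicts.append(attr)
        elif x != y:
            conflicts.append(attr)
    return conflicts


# Made-up sample records from two hypothetical sources.
source_a = {"ship_name": "Example", "max_range_nmi": 1000.0}
source_b = {"vessel": "Example", "range_km": 1852.0}

norm_a = normalize(source_a)
norm_b = normalize(source_b)
remaining = find_conflicts(norm_a, norm_b)
print(remaining)
```

In this sketch the naming conflict (different attribute names) and the scaling conflict (nautical miles versus kilometers) are both resolved by lookup tables, so any attribute still reported by `find_conflicts` is a genuine disagreement in the data that needs further handling.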
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 1999
- Accession Number: ADA364414
Entities
People
- Magdi N. Kamel
- Marion G. Ceruti
Organizations
- Naval Information Warfare Systems Command