Parameter-Free Spatial and Stream Mining
Abstract
Data mining is the extraction of knowledge from large amounts of data, and it brings together the fields of databases, machine learning, and statistics. From a database perspective, the emphasis is often placed on scalability and efficiency. Recently, in addition to the data warehouse model, where data from multiple sources are integrated into a large store, the streaming model has been emerging as an alternative data processing paradigm. In this thesis we develop spatial and stream mining tools for the discovery of interesting patterns. These patterns summarize the data and enable forecasting of future trends and spotting of anomalies or outliers. Beyond efficiency and scalability, we focus on simplifying or eliminating user intervention. Data mining algorithms must make the discovery task easy for average users, and eliminating the requirement for user intervention should be a top priority in designing data mining methods. We show that multi-resolution analysis (i.e., examining the data at multiple resolutions or scales) is a powerful tool for achieving these goals. In particular, for spatial data we employ the correlation integral, and for time series streams we use the wavelet transform and related techniques. Furthermore, we leverage tools from signal processing (wavelets again, as well as subspace tracking algorithms) to extract patterns from streams. Finally, we employ compression principles coupled with multi-level partitionings to cluster spatial data automatically.
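As a rough illustration of the multi-resolution idea mentioned in the abstract, the sketch below evaluates a textbook form of the correlation integral (the fraction of point pairs lying within a given radius) at several scales. It is not taken from the thesis: the function name `correlation_integral`, the toy data set, and the chosen radii are all assumptions made for illustration, and the brute-force pairwise loop ignores the efficiency concerns the thesis emphasizes.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def correlation_integral(points, radius):
    """Fraction of distinct point pairs whose Euclidean distance is <= radius.

    Illustrative brute-force version (quadratic in the number of points).
    """
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    close = sum(1 for p, q in pairs if dist(p, q) <= radius)
    return close / len(pairs)


# Hypothetical toy data: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Evaluate the same quantity at several scales (the multi-resolution view).
profile = {r: correlation_integral(square, r) for r in (0.5, 1.0, 1.5)}
```

Here `profile` maps each radius to the fraction of pairs within that distance, so the way the value grows with the radius summarizes how the points are spread across scales.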
Document Details
- Document Type: Technical Report
- Publication Date: Sep 01, 2005
- Accession Number: ADA457141
Entities
People
- Spiros Papadimitriou
Organizations
- Carnegie Mellon University