Tri-Plots: Scalable Tools for Multidimensional Data Mining
Abstract
We focus on the problem of finding patterns across two large, multidimensional datasets. For example, given feature vectors of healthy and of non-healthy patients, we want to answer the following questions: Are the two clouds of points separable? What is the smallest/largest pair-wise distance across the two datasets? Which of the two clouds does a new point (feature vector) come from? We propose a new tool, the tri-plot, and its generalization, the pq-plot, which help us answer the above questions. We provide a set of rules on how to interpret a tri-plot, and we apply these rules on synthetic and real datasets. We also show how to use our tool for classification, when traditional methods (nearest neighbor, classification trees) may fail.
Document Details
- Document Type
- Technical Report
- Publication Date
- Jan 01, 2001
- Accession Number
- ADA459873
Entities
People
- Agma Traina
- Caetano Traina
- Christos Faloutsos
- Spiros Papadimitriou
Organizations
- Carnegie Mellon University