Tri-Plots: Scalable Tools for Multidimensional Data Mining

Abstract

We focus on the problem of finding patterns across two large, multidimensional datasets. For example, given feature vectors of healthy and unhealthy patients, we want to answer the following questions: Are the two clouds of points separable? What is the smallest/largest pair-wise distance across the two datasets? Which of the two clouds does a new point (feature vector) come from? We propose a new tool, the tri-plot, and its generalization, the pq-plot, which help us answer these questions. We provide a set of rules for interpreting a tri-plot, and we apply these rules to synthetic and real datasets. We also show how to use our tool for classification in cases where traditional methods (nearest neighbor, classification trees) may fail.
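To make the second question concrete, here is a minimal brute-force sketch of the smallest and largest pair-wise distance across two point clouds. It is not the tri-plot or the pq-plot, which this record does not define; it is only a quadratic-time baseline under assumed conditions (Euclidean distance, small in-memory datasets). The function name `cross_distance_extremes` and the sample points are hypothetical.

```python
import math
from itertools import product


def cross_distance_extremes(cloud_a, cloud_b):
    """Return (smallest, largest) Euclidean distance over all cross pairs.

    Brute force: compares every point of cloud_a with every point of
    cloud_b, so the cost grows as len(cloud_a) * len(cloud_b).
    Assumes both clouds are non-empty and share the same dimensionality.
    """
    distances = [math.dist(a, b) for a, b in product(cloud_a, cloud_b)]
    return min(distances), max(distances)


# Hypothetical 2-D feature vectors, for illustration only.
healthy = [(0.0, 0.0), (1.0, 0.0)]
unhealthy = [(4.0, 0.0), (4.0, 3.0)]

smallest, largest = cross_distance_extremes(healthy, unhealthy)
print(smallest, largest)
```

Because every cross pair is examined, this approach becomes impractical on large datasets, which is the setting the abstract targets.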

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2001
Accession Number: ADA459873

Entities

People

  • Agma Traina
  • Caetano Traina
  • Christos Faloutsos
  • Spiros Papadimitriou

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Air Platforms
  • Biomedical
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • California
  • Classification
  • Clustering
  • Computer Science
  • Computers
  • Data Mining
  • Data Science
  • Data Sets
  • Databases
  • Engineering
  • Grids
  • Information Science
  • Machine Learning
  • New York
  • Two Dimensional
  • United States

Fields of Study

  • Computer science

Readers

  • Educational Psychology
  • Neural Network Machine Learning

Technology Areas

  • AI & ML