Visualizing Mixed Variable-Type Multidimensional Data Using Tree Distances

Abstract

This research explores the use of the tree distances of Buttrey and Whitaker to visualize multidimensional data of mixed variable types, that is, data containing both numerical and categorical variables. Tree distances measure dissimilarities among observations in a data set while exploiting desirable properties of classification and regression trees: ease of handling most variable types, indifference to variable scaling, resistance to noise and outliers, accommodation of missing values, and computational ease. We map the dissimilarities to a lower-dimensional Euclidean space using Classical Multidimensional Scaling, giving the analyst a familiar framework whose visual cues help reveal patterns and insights in the data. In this thesis, we offer several algorithms for coloring observations in the lower-dimensional mappings in order to focus the analyst's attention on the most important and interesting relationships in the data set. In addition, through our visualization, we gain a deeper understanding of the properties of tree distances and propose a modification. Our framework can be used on any military data set that involves mixed or non-mixed variables and is valuable for analysts who wish to shed light on data during the exploratory phase of analysis.
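
The mapping step named in the abstract, Classical Multidimensional Scaling, can be illustrated with a minimal sketch. It assumes a symmetric dissimilarity matrix is already available; the function name `classical_mds`, the use of NumPy, and the toy matrix (plain Euclidean distances standing in for tree distances, which are not computed here) are assumptions for illustration only, not the author's implementation.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed n observations in k dimensions from an n-by-n dissimilarity matrix D.

    Hypothetical helper: standard classical (Torgerson) scaling,
    not code taken from the thesis.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared dissimilarities: B = -1/2 * J * D^2 * J
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:k]
    # Non-Euclidean dissimilarities can give negative eigenvalues; clip them to zero
    lam = np.clip(eigvals[top], 0.0, None)
    return eigvecs[:, top] * np.sqrt(lam)

# Toy stand-in for a tree-distance matrix: Euclidean distances among four points
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D_toy = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(D_toy, k=2)  # one row of 2-D plotting coordinates per observation
```

Each row of `coords` gives the plotting position of one observation. When the input dissimilarities are not exactly Euclidean, the clipped eigenvalues mean the low-dimensional picture only approximates them.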

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2015
Accession Number
AD1009313

Entities

People

  • Yoav Shaham

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Autonomy
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Classification
  • Computer Science
  • Data Mining
  • Data Science
  • Data Sets
  • Dimensionality Reduction
  • Factor Analysis
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Network Science
  • Three Dimensional
  • Two Dimensional
  • Visualizations

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems.
  • Graph Algorithms and Convex Optimization.
  • Regression Analysis.

Technology Areas

  • Space