Modern Approaches and Theoretical Extensions to the Multivariate Kolmogorov-Smirnov Test

Abstract

Most statistical tests are fully developed for univariate data, but when inference is required for multivariate data, applying univariate tests risks losing both information and interpretability. This research 1) derives and extends the multivariate Kolmogorov-Smirnov (KS) test to two dimensions and, more generally, to m dimensions, 2) derives small-sample critical values for the KS test that do not rely on sample-size simulations or on the correlation between variables, 3) extends large-sample estimates and current KS implementations, and 4) provides sample size and power calculations to support experimental design when testing for differences between distributions. Through extensive simulation, we demonstrate that our new multidimensional KS test generally has more power at smaller sample sizes and comparable power at larger sample sizes, while maintaining desirable statistical properties. Furthermore, we improve current implementations of the KS test and extend them to sample sizes upwards of n = 5000. Finally, we demonstrate how to compute critical values for data of any dimension and provide power and sample size criteria for designing studies with two- and three-dimensional distributions. These results enable statistical testing of multidimensional features irrespective of correlation, improving our ability to understand large data sets for rapid and efficient decision making and analysis.
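The abstract does not state the formulas behind the multidimensional test. As a point of reference, the sketch below computes a well-known quadrant-based two-sample KS statistic for two-dimensional data: Peacock's statistic as simplified by Fasano and Franceschini. Each data point serves as an origin, and the statistic records the largest gap between the two samples' fractions of points in any of the four quadrants. This is not necessarily the construction used in this report. The helper names (`quadrant_fractions`, `max_quadrant_gap`, `ks2d_statistic`, `permutation_pvalue`) are hypothetical. The report's contribution is critical values that do not depend on simulation, so the permutation p-value here is only a generic, simulation-based baseline.

```python
import random

# Hypothetical sketch: quadrant-based two-sample 2D KS statistic in the style of
# Peacock / Fasano-Franceschini. This is NOT the method derived in the report.


def quadrant_fractions(points, x0, y0):
    """Fraction of `points` in each quadrant around the origin (x0, y0).

    Points lying on x0 or y0 are counted on the left/lower side, so the four
    quadrants partition the plane and the fractions always sum to 1.
    """
    counts = [0, 0, 0, 0]  # upper-right, upper-left, lower-left, lower-right
    for x, y in points:
        if x > x0:
            counts[0 if y > y0 else 3] += 1
        else:
            counts[1 if y > y0 else 2] += 1
    return [c / len(points) for c in counts]


def max_quadrant_gap(origins, a, b):
    """Largest |F_a - F_b| over all four quadrants at every origin."""
    best = 0.0
    for x0, y0 in origins:
        fa = quadrant_fractions(a, x0, y0)
        fb = quadrant_fractions(b, x0, y0)
        best = max(best, max(abs(p - q) for p, q in zip(fa, fb)))
    return best


def ks2d_statistic(a, b):
    """Average of the maximum gaps found using each sample's points as origins."""
    return 0.5 * (max_quadrant_gap(a, a, b) + max_quadrant_gap(b, a, b))


def permutation_pvalue(a, b, n_perm=199, seed=0):
    """Generic permutation p-value for ks2d_statistic (simulation-based baseline)."""
    rng = random.Random(seed)
    observed = ks2d_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks2d_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Independent N(0, 1) coordinates.
    x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    # Same N(0, 1) marginals, but correlation 0.9 between the coordinates.
    y = []
    for _ in range(40):
        z = rng.gauss(0, 1)
        y.append((z, 0.9 * z + (1 - 0.81) ** 0.5 * rng.gauss(0, 1)))
    print("D =", ks2d_statistic(x, y))
    print("p =", permutation_pvalue(x, y, n_perm=99))
```

The usage block draws two samples with identical N(0, 1) marginals that differ only in their correlation. Per-coordinate univariate KS tests have no power against this difference, which is the kind of information loss the abstract describes. The brute-force quadrant search costs O(n^2) operations per statistic, so samples in the thousands would need a faster counting scheme.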

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2022
Accession Number
AD1181176

Entities

People

  • Gonzalo Hernando

Organizations

  • Air Force Institute of Technology

Tags

DTIC Thesaurus Topics

  • Air Force
  • Algorithms
  • Data Analysis
  • Data Mining
  • Data Science
  • Data Sets
  • Distribution Functions
  • Information Science
  • Knowledge Management
  • Machine Learning
  • Probability Distributions
  • Random Variables
  • Simulations
  • Standards
  • Statistical Tests
  • Statistics
  • Surveys
  • Test Methods
  • Three Dimensional
  • Two Dimensional

Readers

  • Calculus or Mathematical Analysis
  • Regression Analysis
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference