L1-Based Major Component Detection and Analysis (L1 MCDA) for n-Dimensional Data Clouds Based on Linear Conic Programming

Abstract

The objective of this research is to establish new theory and efficient algorithms for identifying the major component(s) of statistically distributed data clouds in n-dimensional Euclidean space. Principal Component Analysis (PCA) is a widely used method for identifying the spread of data in mutually orthogonal directions. Conventional PCA is based on the l2 norm and Gaussian statistics and performs very well as long as the data contain no or extremely few outliers. However, data clouds obtained under any conditions other than benign laboratory conditions often contain significant numbers of outliers (i.e., the error comes mostly from a heavy-tailed distribution), which strongly limits the accuracy of PCA. To remedy this situation, robust variants of PCA that involve the l1 norm have been proposed over the past few years. Most of these l1 reformulations retain l2-based steps (inner products, averages, singular values) and Gaussian-based concepts (averages, variances and covariances), and make only limited use of the l1 norm. A 2D/3D l1-based Major Component Detection and Analysis (l1 MCDA) method that does not use any of the conventional l2 or Gaussian steps or concepts has recently been developed and tested by the PI and his research group. Computational experiments support the superiority of l1 MCDA in detecting and analyzing the major components embedded in statistically distributed data clouds. The effectiveness of l1 MCDA for higher-dimensional data clouds hinges on how quickly certain l1-norm constrained nonlinear optimization problems can be solved. It has recently been shown that a nonlinear l1-norm constrained optimization problem can be converted into a linear optimization problem over an l1-norm based first-order cone, but the resulting problem may become intractable (NP-hard) in theory.

Document Details

Document Type
Technical Report
Publication Date
Mar 31, 2020
Accession Number
AD1110902

Entities

People

  • Shu-cherng Fang

Organizations

  • North Carolina State University

Tags

Communities of Interest

  • Autonomy
  • Cyber
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Computational Science
  • Data Mining
  • Data Science
  • Industrial Engineering
  • Information Science
  • Integer Programming
  • Linear Programming
  • Mathematical Models
  • Operations Research
  • Optimization
  • Quadratic Programming
  • Statistics
  • Supervised Machine Learning
  • Systems Engineering

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems
  • Regression Analysis
  • Systems Analysis and Design

Technology Areas

  • Space