Algorithms that Defy the Gravity of Learning Curve

Abstract

Conventional wisdom posits that the learning behavior of all data mining algorithms follows a typical learning curve, where more data is expected to produce better performing models. We call this behavior the gravity of learning curve, with which all algorithms are assumed to comply. This project provides, for the first time, theoretical analysis and empirical evidence that nearest neighbor anomaly detectors defy the gravity of learning curve, i.e., these gravity-defiant algorithms can learn a better performing model using a small training set than using a large training set. The knowledge we uncovered enables algorithms to be utilized in a new way to meet the challenges of big data without ever-increasing demands for big data infrastructures. This project has spent a significant amount of time perfecting the theory and conducting a rigorous empirical evaluation. As a result, the insight gained is much better than we anticipated. The outcome is a major publication in the Machine Learning journal, published in early 2017. In addition, during this project period, four papers from two previous AOARD-supported projects have been published. These include a major work on mass-based dissimilarity, which was published at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2016. This work has informed one of the investigations in this project.
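
As a rough illustration of the kind of detector the abstract refers to, the sketch below scores a point by its distance to the nearest neighbor in a small random subsample of the training data. It is a minimal sketch under stated assumptions, not the report's implementation: the function names, the subsample size `psi`, and the synthetic Gaussian data are all hypothetical, and the example only shows how such a detector is built from a small training set. It does not reproduce the report's comparison between small and large training sets.

```python
import math
import random


def nn_anomaly_score(x, sample):
    """Distance from x to its nearest neighbor in sample (larger = more anomalous)."""
    return min(math.dist(x, s) for s in sample)


def fit_subsample(train, psi, rng):
    """Keep psi randomly chosen training points (psi is a hypothetical parameter name)."""
    return rng.sample(train, psi)


rng = random.Random(0)

# Hypothetical training data: one dense cluster of normal points around the origin.
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]

# The detector stores only a small subsample rather than the full training set.
small = fit_subsample(train, 16, rng)

normal_point = (0.0, 0.0)
anomaly = (8.0, 8.0)

# Compare the two scores; the far-away point should receive the larger one.
print(nn_anomaly_score(normal_point, small) < nn_anomaly_score(anomaly, small))
```

Scoring cost here grows with the size of the stored subsample rather than with the size of the full training set, which is one reason a small-sample detector is attractive for large data.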

Document Details

Document Type
Technical Report
Publication Date
Apr 28, 2017
Accession Number
AD1037814

Entities

People

  • Kai M. Ting

Organizations

  • Federation University Australia

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Air Force Research Laboratories
  • Anomaly Detection
  • Artificial Intelligence
  • Big Data
  • Change Detection
  • Computer Programs
  • Data Mining
  • Data Sets
  • Detection
  • Detectors
  • Gaussian Distributions
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Pattern Recognition
  • Unsupervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning
  • Seismology

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks