Detecting Patterns of Anomalies

Abstract

An anomaly is an observation that does not conform to the expected normal behavior. With the ever-increasing amount of data being collected, automatic surveillance systems are becoming more popular and are increasingly using data mining methods to detect patterns of anomalies. The diverse nature of real-world datasets and the difficulty of obtaining labeled training data make it challenging to develop a universal framework for anomaly detection. We focus on a key feature of most real-world scenarios: multiple anomalous records are usually generated by a common anomalous process. In this thesis, we develop methods that use the similarity between records in these groups, or patterns, of anomalies to improve detection. We also investigate new methods for detecting individual anomalous records, which we then incorporate into the group detection methods. A recurring feature of our methods is combinatorial search over some space (e.g., over all subsets of attributes, or over all subsets of records). We use a variety of computational speedups and approximation techniques to make these methods scalable to large datasets. Since most of our motivating problems involve datasets with categorical or symbolic values, we focus on categorical datasets. Apart from this, we make few assumptions about the data, so our methods are general and apply to a wide variety of domains. Additionally, we investigate anomaly pattern detection in data structured by space and time. Our method generalizes the popular method of spatio-temporal scan statistics to learn and detect specific, time-varying spatial patterns in the data. Finally, we present an efficient and easily interpretable technique for anomaly detection in multivariate time series data. We evaluate our methods on a variety of real-world datasets containing both real and synthetic anomalies.

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 2009
Accession Number
ADA501930

Entities

People

  • Kaustav Das

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Advanced Electronics
  • C4I
  • Cyber
  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Bayesian Networks
  • Computational Science
  • Data Mining
  • Databases
  • Detectors
  • Health Services
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Network Science
  • Neural Networks
  • Probabilistic Models
  • Probability Distributions
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Computer Vision
  • Distributed Systems and Data Platform Development
  • Sensor Fusion and Tracking Systems

Technology Areas

  • AI & ML
  • Space
  • Space - Space Objects