Use of Mahalanobis Distance for Detecting Outliers and Outlier Clusters in Markedly Non-Normal Data: A Vehicular Traffic Example

Abstract

Modeling the behavior of interacting humans in routine but complex activities poses many challenges, not the least of which is that humans can be both purposive and negligent and can encounter unexpected environmental hazards that require fast action. The challenge is to characterize and model the humdrum routine while also capturing the deviations and anomalies that arise from time to time. Because anomalies (such as accidents) can be disruptive and are important to incorporate in our models, this report focuses on one technique for identifying anomalies in complex behavior patterns, especially when there is no sharp demarcation between routine and unusual activity. The technique we evaluate is Mahalanobis distance, which is known to be useful for identifying outliers when data are multivariate normal. However, the data we use for evaluation are deliberately and markedly not multivariate normal, since that is what we confront in complex human systems. Specifically, we use one year (2008) of hourly traffic-volume data from a single location on a major multi-lane road (I-95) in a major city (New York) with a dense population and several alternate routes. The traffic data are rich, large, and incomplete, and they reflect the effects of bad weather, accidents, routine fluctuations (rush hours versus the dead of night), and one-time social events. The results show that Mahalanobis distance is a useful technique for identifying both single-hour outliers and contiguous-time clusters whose component members are not, in themselves, highly deviant.
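
For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of how a Mahalanobis distance can be computed and used to flag outliers. It is not the report's implementation. The function names, the synthetic six-row data set, and the cutoff of 1.8 are all assumptions made for illustration. The distance of an observation x from the sample mean m, given sample covariance S, is sqrt((x - m)^T S^-1 (x - m)).

```python
import numpy as np


def mahalanobis_distances(X):
    """Return the Mahalanobis distance of each row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Sample covariance with columns treated as variables.
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # The pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    diff = X - mean
    # Quadratic form diff_i^T * cov_inv * diff_i for every row i.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def flag_outliers(distances, threshold):
    """Return the indices of observations whose distance exceeds threshold."""
    return np.flatnonzero(distances > threshold)


# Hypothetical data: each row is one observation of two correlated counts
# (for example, volumes in two adjacent hours). The last row has an ordinary
# first value but breaks the relationship between the two columns.
X = np.array([
    [100, 200],
    [110, 215],
    [90, 185],
    [105, 205],
    [95, 190],
    [100, 120],
])

distances = mahalanobis_distances(X)
flagged = flag_outliers(distances, threshold=1.8)  # illustrative cutoff only
print(distances.round(2))
print(flagged)
```

In this toy data the last row is flagged even though its first value equals the column mean, because the distance accounts for the correlation between the columns. For multivariate normal data a cutoff is often taken from a chi-square quantile; since the abstract stresses that the traffic data are not multivariate normal, any such cutoff should be treated as a heuristic rather than a calibrated test.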

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2011
Accession Number
ADA545834

Entities

People

  • Anne K. Cybenko
  • Rik Warren
  • Robert F. Smith

Organizations

  • SRA International Inc

Tags

Communities of Interest

  • C4I
  • Human Systems
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Accidents
  • Air Force
  • Air Force Research Laboratories
  • Anomaly Detection
  • Change Detection
  • Data Mining
  • Data Science
  • Data Sets
  • Detection
  • Detectors
  • Engineering
  • Information Science
  • New York
  • Normal Distribution
  • Statistical Analysis
  • United States
  • Urban Areas

Readers

  • Aviation Safety Risk Assessment
  • Educational Psychology
  • Regression Analysis