Clustering Systems with Kolmogorov Complexity and MapReduce

Abstract

In the field of value management, an important problem is quantifying the processes and capabilities of an organization's network and the machines within it. When the organization is large, ever-changing, and responding to new demands, it is difficult to know at any given time exactly what is being run on the machines. Accordingly, one could lose track of approved or, worse, unapproved or even malicious software as the machines become employed for various tasks. Moreover, the level of utilization of the machines may affect the maintenance and upkeep of the network. Our goal is to develop a tool that can cluster the machines on a network in a meaningful way, using different attributes or features, and that does so autonomously in an efficient and scalable system. The solution developed implements, at its core, a streaming algorithm that takes meaningful operating data from a network in real time, compresses it, and sends it to a MapReduce clustering algorithm. The clustering algorithm uses a normalized compression distance to measure the similarity of two machines. The goal for this project was to implement the solution and measure the overall effectiveness of the clusters. The implementation was successful in creating a software tool that can compress the data, determine the normalized compression distance, and cluster the machines. More work, however, needs to be done in using our system to extract more quantitative meaning from the clusters generated.
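
As an illustration of the similarity measure named in the abstract, the following is a minimal sketch of the normalized compression distance in its standard form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The choice of zlib as the compressor, the function names, and the sample "machine profiles" are assumptions made for this example only; the record does not state which compressor or data format the report uses, and the streaming and MapReduce stages are not shown.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical operating data for three machines (illustrative only).
web_a = b"GET /index.html 200\n" * 50
web_b = b"GET /index.html 200\n" * 45 + b"GET /about.html 200\n" * 5
db = b"".join(
    b"SELECT %d FROM t%d\n" % (i * 7919 % 1000, i) for i in range(50)
)

if __name__ == "__main__":
    print("web_a vs web_b:", round(ncd(web_a, web_b), 3))
    print("web_a vs db:   ", round(ncd(web_a, db), 3))
```

Under these assumptions, the two web-server profiles should score a smaller distance than the web and database pair, which is the property a clustering step would rely on when grouping machines.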

Document Details

Document Type
Technical Report
Publication Date
Jun 02, 2011
Accession Number
ADA547540

Entities

People

  • Louis R. Troisi

Tags

Communities of Interest

  • C4I
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Applied Computer Science
  • Clustering
  • Compression
  • Computer Languages
  • Computer Programming
  • Computer Programs
  • Computer Science
  • Computers
  • Data Sets
  • Energy Consumption
  • Language
  • Programming Languages
  • Resilience
  • Supervised Machine Learning
  • Trees (Data Structures)
  • United States Naval Academy

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Distributed Systems and Data Platform Development
  • Systems Analysis and Design