Entropy based file type identification and partitioning

Abstract

File identification and partitioning are essential tasks in digital forensics, reverse engineering, and security analysis. In this research, we investigate using the Shannon entropy profile of a file's byte representation to characterize specific file types and to identify file segments based on changes in entropy level. The process consists of two stages. In the first stage, a binary representation of the file is partitioned into fixed-length chunks of bytes and processed to extract the entropy profile. In the second stage, the detrended fluctuation analysis (DFA) method is applied to determine the level of structure in the entropy profile. The Haar continuous wavelet transform (CWT) is then used to partition the files identified as highly structured into regions separated by distinct changes in entropy level. Experimental results show that the proposed approach is effective in identifying file types and in partitioning files into segments of different entropy levels.
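The first stage described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the report: the function names `chunk_entropy` and `entropy_profile`, the default chunk size of 256 bytes, and the choice of non-overlapping chunks are all assumptions made for illustration. The second stage (DFA and the Haar CWT) is not shown.

```python
import math
import os
from collections import Counter
from typing import List


def chunk_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    # H = sum over observed byte values of p * log2(1 / p), with p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_profile(data: bytes, chunk_size: int = 256) -> List[float]:
    """Entropy of each consecutive, non-overlapping fixed-length chunk.

    The chunk size and the non-overlapping layout are illustrative
    assumptions; the final chunk may be shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        chunk_entropy(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


if __name__ == "__main__":
    # Synthetic stand-in for a file: a constant region, then random bytes.
    sample = bytes(1024) + os.urandom(1024)
    profile = entropy_profile(sample, chunk_size=256)
    print([round(h, 2) for h in profile])
```

On such synthetic input, the constant region yields zero entropy per chunk while the random region yields values close to the 8-bit maximum, which is the kind of level change the partitioning stage looks for.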

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2017
Accession Number
AD1046497

Entities

People

  • Calvin B. Paul

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Cyber

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence Software
  • Computer Languages
  • Cryptography
  • Data Sets
  • Detection
  • Electrical Engineering
  • Identification
  • Information Processing
  • Information Science
  • Information Theory
  • Machine Learning
  • Operating Systems
  • Reverse Engineering
  • Signal Processing
  • Supervised Machine Learning
  • Wavelet Transforms

Readers

  • Approximation Theory
  • Computational Modeling and Simulation
  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing