File Fragment Classification - The Case for Specialized Approaches

Abstract

Increasingly advances in file carving, memory analysis and network forensics requires the ability to identify the underlying type of a file given only a file fragment. Work to date on this problem has relied on identification of specific byte sequences in file headers and footers, and the use of statistical analysis and machine learning algorithms taken from the middle of the file. We argue that these approaches are fundamentally flawed because they fail to consider the inherent internal structure in widely used file types such as PDF, DOC, and ZIP. We support our argument with a bottom-up examination of some popular formats and an analysis of TK PDF files. Based on our analysis, we argue that specialized methods targeted to each specific file type will be necessary to make progress in this area.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
May 01, 2009
Accession Number
ADA549423

Entities

People

  • Simson Garfinkel
  • Vassil Roussev

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Algorithms
  • Case Studies
  • Classification
  • Coding
  • Computer Programming
  • Computer Science
  • Computers
  • Identification
  • Information Science
  • Learning
  • Machine Learning
  • Operating Systems
  • Reliability
  • Sequences
  • Statistical Analysis
  • Test Sets
  • Word Processors

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Neural Network Machine Learning.
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks