Automated Metadata Extraction

Abstract

Metadata is data that describes data. Metadata has many uses in computer forensics, and the ability to extract it automatically is valuable to forensic investigations. This thesis presents a new technique for batch processing disk images and automatically extracting metadata from files and file contents. The technique is embodied in a program called fiwalk, which has a plug-in architecture that allows new metadata extractors to be readily incorporated. fiwalk can write its output in multiple formats, including ARFF and text. The plug-ins created for this thesis include one created by Simson Garfinkel for extracting metadata from .jpeg files, two for Microsoft Office documents (one for releases prior to Office 2007 and one for the Office 2007 release), and a default plug-in for extracting metadata from .gif, .pdf, and .mp3 files. To better understand the metadata available in common file formats such as .doc, .docx, .odt, .pdf, .mp3, .mp4, .jpeg, .tiff, and .gif, the thesis also examines these formats.
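To make the plug-in idea concrete, the following is a minimal sketch of how extractors might be registered by file extension and their results written as text. It is not fiwalk's actual interface: the names EXTRACTORS, register, extract, and to_text are hypothetical and chosen only for illustration, and the single example extractor reads just the version and dimensions from a GIF header.

```python
import os

# Hypothetical registry mapping a lowercase file extension to an extractor.
# This illustrates the plug-in idea only; it is not fiwalk's real interface.
EXTRACTORS = {}


def register(*extensions):
    """Decorator that registers an extractor for the given extensions."""
    def wrap(func):
        for ext in extensions:
            EXTRACTORS[ext.lower()] = func
        return func
    return wrap


@register(".gif")
def gif_metadata(data):
    """Read the GIF version and logical screen size from the file header."""
    # A GIF starts with a 6-byte signature, followed by the width and
    # height as little-endian unsigned 16-bit integers.
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        return {}
    return {
        "version": data[3:6].decode("ascii"),
        "width": int.from_bytes(data[6:8], "little"),
        "height": int.from_bytes(data[8:10], "little"),
    }


def extract(filename, data):
    """Dispatch to the extractor registered for the file's extension."""
    ext = os.path.splitext(filename)[1].lower()
    extractor = EXTRACTORS.get(ext)
    return extractor(data) if extractor else {}


def to_text(filename, metadata):
    """Render one file's metadata as simple 'name: value' lines."""
    lines = [f"filename: {filename}"]
    lines += [f"{key}: {value}" for key, value in sorted(metadata.items())]
    return "\n".join(lines)
```

Under these assumptions, supporting a new format means writing one function and decorating it with register; extract and to_text need no changes. A file with an unregistered extension simply yields an empty result.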

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2008
Accession Number
ADA483465

Entities

People

  • James Migletz

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Counter WMD
  • Energy and Power Technologies
  • Engineered Resilient Systems
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Computational Forensics
  • Computer Programming
  • Computer Science
  • Computers
  • Data Mining
  • Databases
  • Domain Specific Programming Languages
  • Html
  • Information Science
  • Information Systems
  • Machine Learning
  • Markup Languages
  • Network Science
  • Operating Systems
  • Relational Database Management Systems
  • Spreadsheet Software
  • Word Processors

Fields of Study

  • Computer science

Readers

  • Aerospace Propulsion Engineering
  • Database Systems and Applications