Using Distinct Sectors in Media Sampling and Full Media Analysis to Detect Presence of Documents from a Corpus

Abstract

Forensic examiners frequently search for known content by comparing each file on the target media against a database of known file hashes. We propose using sector hashing to rapidly identify content of interest. Using this method, we hash 512 B or 4 KiB disk sectors of the target media and compare those hashes to a database of known file block hashes, where a file block is a fixed-size file fragment of the same size as the sector. Sector-level analysis is fast because it can be parallelized and because we can sample a sufficient number of sectors to determine with high probability whether a known file exists on the target. Sector hashing is also file system agnostic and allows us to identify evidence that a file once existed even if it is not fully recoverable. In this thesis we analyze the occurrence of distinct file blocks (blocks that occur only as a copy of the original file) in three multi-million file corpora and show that most files, including documents and legitimate and malicious software, consist of distinct blocks. We also determine the relative performance of several conventional SQL and NoSQL databases with a set of one billion file block hashes.
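
The sector hashing and sampling ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the use of SHA-256, the in-memory set standing in for the hash database, and the uniform sampling-without-replacement model are all assumptions made for illustration; the abstract does not specify them.

```python
import hashlib
import math

BLOCK_SIZE = 4096  # assumption: 4 KiB blocks; the abstract also mentions 512 B


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash every full fixed-size block of `data`; a trailing partial block is skipped."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()  # hash choice is an assumption
        for i in range(0, len(data) - block_size + 1, block_size)
    ]


def matching_sectors(image: bytes, known: set[str],
                     block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indices of sectors in `image` whose hash appears in `known`."""
    return [i for i, h in enumerate(block_hashes(image, block_size)) if h in known]


def detection_probability(total_sectors: int, target_sectors: int,
                          sample_size: int) -> float:
    """Probability that a uniform random sample of `sample_size` sectors, drawn
    without replacement, contains at least one of `target_sectors` sectors of interest."""
    if target_sectors <= 0 or sample_size <= 0:
        return 0.0
    if sample_size > total_sectors - target_sectors:
        return 1.0  # the sample is too large to miss every target sector
    # P(miss) = prod_{i=0}^{n-1} (N - M - i) / (N - i), accumulated in log space
    log_miss = sum(
        math.log1p(-target_sectors / (total_sectors - i))
        for i in range(sample_size)
    )
    return 1.0 - math.exp(log_miss)


if __name__ == "__main__":
    # Hypothetical data: a known "file" of one block embedded in a three-sector "image".
    known_file = b"B" * BLOCK_SIZE
    image = b"A" * BLOCK_SIZE + known_file + b"C" * BLOCK_SIZE
    known_db = set(block_hashes(known_file))
    print(matching_sectors(image, known_db))
    print(detection_probability(total_sectors=10, target_sectors=1, sample_size=5))
```

In this sketch a set lookup stands in for the database query. The database comparison mentioned in the abstract concerns how such lookups perform against one billion block hashes, which is far beyond what this example exercises.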

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2012
Accession Number
ADA570831

Entities

People

  • Kristina Foster

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Cyber
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Computer Science
  • Computers
  • Data Sets
  • Data Storage Systems
  • Database Management Systems
  • Databases
  • Domain Specific Programming Languages
  • Hash Tables
  • Html
  • Malware
  • Operating Systems
  • Relational Database Management Systems
  • Relational Databases
  • Statistical Analysis
  • Throughput
  • Trees (Data Structures)
  • Word Processors

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Cybersecurity
  • Database Systems and Applications