Using Distinct Sectors in Media Sampling and Full Media Analysis to Detect Presence of Documents from a Corpus

Abstract

Forensic examiners frequently search for known content by comparing each file on target media against a database of known file hashes. We propose using sector hashing to rapidly identify content of interest. With this method, we hash the 512 B or 4 KiB disk sectors of the target media and compare those hashes to a database of hashes of known file blocks, which are fixed-size file fragments of the same size as the sectors. Sector-level analysis is fast because it can be parallelized and because we can sample enough sectors to determine with high probability whether a known file exists on the target. Sector hashing is also file system agnostic and allows us to identify evidence that a file once existed even if it is not fully recoverable. In this thesis we analyze the occurrence of distinct file blocks (blocks that occur only as a copy of the original file) in three multi-million file corpora and show that most files, including documents and both legitimate and malicious software, consist of distinct blocks. We also determine the relative performance of several conventional SQL and NoSQL databases with a set of one billion file block hashes.
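
The sector hashing and sampling ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the choice of SHA-256, the in-memory set standing in for the hash database, and the assumption of uniform random sampling without replacement are all illustrative assumptions.

```python
import hashlib
import io


def block_hashes(stream, block_size=4096):
    """Yield (offset, hex digest) for each full block read from stream.

    SHA-256 is an assumption here; the abstract does not name a hash function.
    """
    offset = 0
    while True:
        block = stream.read(block_size)
        if len(block) < block_size:
            break  # skip a trailing partial block
        yield offset, hashlib.sha256(block).hexdigest()
        offset += block_size


def find_known_blocks(media, known_hashes, block_size=4096):
    """Return offsets of media sectors whose hash appears in known_hashes."""
    return [off for off, digest in block_hashes(media, block_size)
            if digest in known_hashes]


def detection_probability(total_sectors, target_sectors, sample_size):
    """Chance that a uniform random sample (without replacement) of
    sample_size sectors includes at least one of target_sectors sectors."""
    if sample_size > total_sectors - target_sectors:
        return 1.0  # not enough non-target sectors to miss every time
    miss = 1.0
    for i in range(sample_size):
        miss *= (total_sectors - target_sectors - i) / (total_sectors - i)
    return 1.0 - miss


if __name__ == "__main__":
    # Hypothetical known file of two distinct 4 KiB blocks.
    known_file = bytes([1]) * 4096 + bytes([2]) * 4096
    known = {digest for _, digest in block_hashes(io.BytesIO(known_file))}

    # Hypothetical media image: the known file between zero-filled sectors.
    media = io.BytesIO(bytes(4096) + known_file + bytes(4096))
    print(find_known_blocks(media, known))

    # Sampling 50,000 of 250 million sectors when 25,000 belong to the file.
    print(detection_probability(250_000_000, 25_000, 50_000))
```

The set lookup stands in for the database comparison, and `detection_probability` shows why a modest sample can suffice when the known file occupies many sectors of the media.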

Document Details

Document Type: Technical Report
Publication Date: Sep 01, 2012
Accession Number: ADA570831

Entities

People

Kristina Foster

Organizations

Naval Postgraduate School
