Using Distinct Sectors in Media Sampling and Full Media Analysis to Detect Presence of Documents from a Corpus
Abstract
Forensic examiners frequently search for known content by comparing each file on the target media against a database of known file hashes. We propose using sector hashing to rapidly identify content of interest. With this method, we hash the 512 B or 4 KiB disk sectors of the target media and compare them against a hash database of known file blocks, which are fixed-size file fragments of the same size. Sector-level analysis is fast because it can be parallelized and because sampling a sufficient number of sectors lets us determine with high probability whether a known file exists on the target. Sector hashing is also file system agnostic and allows us to identify evidence that a file once existed even if it is not fully recoverable. In this thesis we analyze the occurrence of distinct file blocks (blocks that occur only as a copy of the original file) in three multi-million file corpora and show that most files, including documents and both legitimate and malicious software, consist of distinct blocks. We also determine the relative performance of several conventional SQL and NoSQL databases with a set of one billion file block hashes.
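The sector hashing and sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the use of SHA-256 as the block hash, the in-memory `set` standing in for the hash database, and the uniform random sampling without replacement are all assumptions made for illustration. `miss_probability` applies the standard hypergeometric argument for the chance that a sample contains no sector of a known file.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # 4 KiB; the abstract also mentions 512 B sectors


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return the hex digest of every full block in data.

    SHA-256 is an assumption; the abstract does not name a hash function.
    A trailing partial block is ignored in this sketch.
    """
    full_blocks = len(data) // block_size
    return [
        hashlib.sha256(data[i * block_size:(i + 1) * block_size]).hexdigest()
        for i in range(full_blocks)
    ]


def build_block_db(known_files, block_size=BLOCK_SIZE):
    """Collect block hashes of known files into a set (a stand-in database)."""
    db = set()
    for data in known_files:
        db.update(block_hashes(data, block_size))
    return db


def sample_sectors(image, db, n_samples, block_size=BLOCK_SIZE, seed=0):
    """Hash n_samples randomly chosen sectors of image and return the
    sorted indices of the sectors whose hash appears in db."""
    total = len(image) // block_size
    rng = random.Random(seed)
    chosen = rng.sample(range(total), min(n_samples, total))
    hits = []
    for i in chosen:
        sector = image[i * block_size:(i + 1) * block_size]
        if hashlib.sha256(sector).hexdigest() in db:
            hits.append(i)
    return sorted(hits)


def miss_probability(total_sectors, file_sectors, n_samples):
    """Probability that n_samples sectors drawn without replacement contain
    none of the file_sectors sectors occupied by a known file."""
    p = 1.0
    for i in range(min(n_samples, total_sectors)):
        remaining_other = max(total_sectors - file_sectors - i, 0)
        p *= remaining_other / (total_sectors - i)
    return p
```

Calling `build_block_db` on the known files, then `sample_sectors` on a raw image, gives the sampled sector indices that match a known block. `miss_probability` shows how the chance of overlooking a file shrinks as more sectors are sampled. Because each sector is hashed independently, the loop in `sample_sectors` could be split across workers, which is the parallelism the abstract refers to.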
Document Details
- Document Type: Technical Report
- Publication Date: Sep 01, 2012
- Accession Number: ADA570831
Entities
People
- Kristina Foster
Organizations
- Naval Postgraduate School