On-the-fly Cyber Crime Scene Transcribing

Abstract

Problem Statement. Audit logs such as system call logs and network logs are often at a very low level, describing primitive operations such as reads and writes. It is hence very difficult for downstream applications such as attack detection and forensics to extract useful information. We aim to develop a technique to transcribe these low level logs to high level behavior descriptions such as playing a video and composing/sending an email. Note that each such high level behavior corresponds to thousands of low level log events.

Motivation. Situation awareness is critical in cyber warfare. Forensics is an important technique to achieve such awareness. Existing forensics techniques are based on analyzing low level audit logs that record events such as file and network operations and even memory behaviors. Even though they provide abstractions such as provenance graphs that denote causality between these events, information is still represented at a very low level, posing challenges for downstream applications such as on-the-fly attack detection. For example, to detect a ransomware attack, analysts often have to compose complex models describing the attack's low level behavior patterns.

State-of-the-art. Existing techniques have focused on removing irrelevant events from audit logs so that space and processing overheads are substantially reduced and reasoning over the remaining events is more effective. Some techniques focus on determining whether an audit log event is attack relevant. However, there is very limited work on transcribing low level events to high level behavior descriptions.

Proposed Work. We propose to develop an AI technique that can transcribe low level audit logs to high level behavior descriptions. We will first define a universal behavior description language that can describe high level system/user behaviors that are forensics relevant, such as opening a URL, saving an attachment, playing a video, and chatting with a remote agent. This language will be general enough to describe the behaviors of all popular applications. We will then formulate the problem as a machine translation problem that translates audit logs in very low level languages to descriptions in the high level language. Training such a model requires substantial data, and it is unrealistic to label such data manually. Therefore, we propose a semi-supervised learning method that automatically approximates data labels via program analysis. We will then train a machine translation model by extending NLP Transformer techniques with a novel attention mechanism. The transcription models will enable many new forensics capabilities. For example, we will develop on-the-fly attack detection that can be performed by writing rules in our description language, which is much easier than modeling low level attack behaviors. In addition, audit logs can be removed once they are transcribed, saving a huge amount of space (from 10GB per day to less than 10MB).

Novelty Claim and Impacts. The proposed research will explore the synergy between AI techniques and rigorous program analysis and formal reasoning, in order to bridge the gap between high level behaviors and low level audit logs. If it succeeds, it will be the first AI driven cyber crime scene transcription technique that can produce highly comprehensible attack reports. It will substantially reduce the workload of analysts and the response time to imminent cyber threats. It will also enable highly effective downstream applications such as online attack detection and prevention, ensuring our winning position in cyber warfare. Besides the transcription models and the online attack detection methods that can be directly transitioned to related Navy cyber security tasks, a number of the enabling techniques are highly transferable too. For example, the (transcribed) log dataset will substantially facilitate future research on AI based log analysis.
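The idea of detecting attacks by writing rules over transcribed behaviors, rather than over raw audit events, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the description language, so the `Behavior` record, its fields, the `match_sequence` helper, and the `run_program` action are hypothetical; only `open_url`, `save_attachment`, and `play_video` echo behaviors named in the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Behavior:
    """One transcribed high level behavior (hypothetical schema)."""
    timestamp: float  # seconds since the start of the trace
    app: str          # application that performed the behavior
    action: str       # e.g. "open_url", "save_attachment", "play_video"
    target: str       # URL or file path the behavior refers to


def match_sequence(
    behaviors: Iterable[Behavior],
    first_action: str,
    second_action: str,
    window: float,
) -> List[Tuple[Behavior, Behavior]]:
    """Find pairs where `second_action` follows `first_action` on the
    same target within `window` seconds (a hypothetical rule form)."""
    ordered = sorted(behaviors, key=lambda b: b.timestamp)
    matches = []
    for i, first in enumerate(ordered):
        if first.action != first_action:
            continue
        for second in ordered[i + 1:]:
            # Behaviors are time-ordered, so stop once outside the window.
            if second.timestamp - first.timestamp > window:
                break
            if second.action == second_action and second.target == first.target:
                matches.append((first, second))
    return matches


# Made-up trace of transcribed behaviors, for illustration only.
trace = [
    Behavior(0.0, "browser", "open_url", "http://example.com"),
    Behavior(5.0, "mail", "save_attachment", "/tmp/invoice.bin"),
    Behavior(9.0, "shell", "run_program", "/tmp/invoice.bin"),
    Behavior(400.0, "player", "play_video", "/home/u/clip.mp4"),
]

# Rule: flag an attachment that is saved and then run within one minute.
alerts = match_sequence(trace, "save_attachment", "run_program", window=60.0)
for saved, ran in alerts:
    print(f"ALERT: {saved.target} saved by {saved.app}, then run by {ran.app}")
```

A rule of this kind is a few lines over four records, whereas the same check on raw logs would have to correlate the thousands of reads, writes, and network events behind each behavior.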

Document Details

Document Type
DoD Grant Award
Publication Date
Jan 12, 2023
Source ID
N000142312081

Entities

People

  • Xiangyu Zhang

Organizations

  • Office of Naval Research
  • Purdue University
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Cybersecurity
  • Database Systems and Applications
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • Cyber
  • Space