DECoDE: Deriving Equations from Control Device Executables
Abstract
Implementing embedded control software in industrial control systems (ICS) involves compiling mathematical equations into binary code that executes on diverse controller hardware. While this process works well in the forward direction (i.e., from mathematical model to controller code), reversing it is very difficult. Reverse engineering a given binary to recover the original mathematical equations for a controller requires extensive subject matter expertise and manual effort in identifying and clustering different assembly code sequences, associating them with mathematical primitives, and intuiting how blocks are combined. Hence, an automated tool that reverse engineers mathematical expressions from binary code is pivotal. In addition to its base engineering value, for instance in recovering control system dynamics from legacy hardware and software, it will have extensive applications in the cybersecurity domain: (a) reverse engineering adversary control systems without extensive manual effort by experts, (b) determining the nature of changes introduced by malicious third parties after a cyber-attack, and (c) enabling new real-time defensive strategies that continuously monitor control code and system parameters.
The proposed fundamental research develops the DECoDE (Deriving Equations from Control Device Executables) framework to recover mathematical equations from control system binaries. DECoDE addresses the aforementioned challenges by combining ML-based translation with contemporary reverse engineering for analyzing, clustering, and translating binary code to symbolic control equations in an automated end-to-end flow. We leverage our strong background in real-time control implementations and embedded programming for cyber-physical systems (CPSs). DECoDE will focus on controller software in C and the IEC 61131-3 family of languages, which allow the design of complex plant and control models using mathematical constructs destined for embedded PLC platforms.
The first task of DECoDE will produce a corpus of industrial code inspired by proof-of-concept industrial control domains (e.g., power systems, process control) using a template-based engine and adversarial sample generation, in tandem with the development of our ML-based translation models. Then, we will apply binary reverse engineering techniques, including static analysis and symbolic execution, to decompile binaries and extract a control and data flow graph (CDFG) using an NYU-developed framework customized to the idiosyncrasies of ICS code for PLC targets. The CDFG is refined and translated by our ML-based components: clustering and enrichment, mathematical expression recovery, iterative compositional recovery, and correctness/compactness evaluations using static analysis and symbolic execution. Enabling technologies include state-of-the-art neural network architectures (including long short-term memory and graph convolutional networks).
PLCs from Allen-Bradley and other embedded devices are found on Naval systems, from vehicle auto-pilots to fire control systems and unmanned systems. DECoDE can aid defenses during run time or after a cyber-attack by determining which parameters or control dynamics have been modified. It can aid in the recovery of control system dynamics from legacy hardware or recovered adversary control computers without extensive manual effort by experts.
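As a rough illustration of what "mathematical expression recovery" from a data flow graph can mean, the sketch below folds a tiny hand-built graph back into an infix equation. It is not the DECoDE implementation: the `Node` type, the `to_expression` helper, the operation names, and the PI control law used as the toy example are all assumptions made for illustration, and the real framework would derive the graph from a binary and use ML-based components rather than a fixed recursive walk.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical data-flow node: an operation plus its operand node ids."""
    op: str                      # "in", "const", "add", "sub", or "mul"
    args: Tuple[str, ...] = ()   # operand node ids (empty for leaves)
    value: str = ""              # symbol name for leaf nodes


# Infix symbols for the binary operations this toy example supports.
INFIX = {"add": "+", "sub": "-", "mul": "*"}


def to_expression(graph: Dict[str, Node], node_id: str) -> str:
    """Recursively fold a data-flow node into a parenthesized infix string."""
    node = graph[node_id]
    if node.op in ("in", "const"):
        return node.value
    left, right = (to_expression(graph, arg) for arg in node.args)
    return f"({left} {INFIX[node.op]} {right})"


# Toy graph for an assumed PI control law: u = Kp*e + Ki*integ, with e = r - y.
pi_graph = {
    "r": Node("in", value="r"),
    "y": Node("in", value="y"),
    "integ": Node("in", value="integ"),
    "kp": Node("const", value="Kp"),
    "ki": Node("const", value="Ki"),
    "e": Node("sub", ("r", "y")),
    "p": Node("mul", ("kp", "e")),
    "i": Node("mul", ("ki", "integ")),
    "u": Node("add", ("p", "i")),
}

recovered = to_expression(pi_graph, "u")
print(recovered)
```

In this toy setting the graph is already clean and typed. The hard part described in the abstract is getting from raw assembly sequences to such a graph and deciding which clusters of instructions correspond to which mathematical primitives.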
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 08, 2022
- Source ID: N000142212153
Entities
People
- Farshad Khorrami
Organizations
- New York University
- Office of Naval Research
- United States Navy