(NEPTUNE) Operationalizing Machine Learning for Navy Analysts with Data Programming
Abstract
United States Navy (USN) intelligence analysts require automated tools to improve the quality and quantity of their output. Machine learning (ML) can help achieve these goals, but requires labeled data, which is expensive and time-consuming to collect. The problem we address is developing and deploying methods to rapidly create operationally useful ML models with limited labeled data and human resources. We propose a technical solution to this problem based on our research group's work developing data programming, a technique that combines knowledge from domain experts, existing knowledge bases, and other models to create weakly labeled training sets. In practice, data programming has enabled analysts and researchers to create ML models using person-days rather than person-years of labeling resources. We leverage the Hacking for Defense (H4D) process to define a clear plan for translating data programming -- and the award-winning Snorkel software that supports it -- into practice for USN users. We describe the problem curation and customer discovery processes, provide several technical objectives formulated from end user feedback, and describe a series of proposed Minimum Viable Products (MVPs) we plan to test under the NEPTUNE program. We anticipate outcomes that include successful identification of USN use cases where data programming can improve analysis outcomes via rapid training set creation; development of Snorkel software applications to support these use cases; extension of the underlying data programming techniques to analysis contexts that require the use of multi-modal data; and integration of passively collected, observational signal from analysts into the process of supervising ML models.
If successful, we expect that the outputs of this work will enable USN analysts to rapidly create datasets for training ML models using a combination of existing unlabeled data, their own domain expertise, and observational signals passively collected during usual analysis processes. Integrating such ML models into USN analysis and decision support processes would enable analysts to make operationally important assessments both more accurately and more rapidly by leveraging vast amounts of available data.
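As a rough illustration of the weak-labeling idea described in the abstract, the sketch below combines a few noisy labeling sources into training labels. It is not the Snorkel API and not the project's method: every function name, keyword, and example document here is hypothetical, and the sources are combined with a simple majority vote, whereas data programming as implemented in Snorkel learns a model of how accurate each source is.

```python
from collections import Counter

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical stand-in for an existing knowledge base.
KNOWN_VESSELS = {"aurora", "kestrel"}


def lf_known_vessel(text):
    """Knowledge-base source: vote POSITIVE if a known vessel name appears."""
    words = set(text.lower().split())
    return POSITIVE if words & KNOWN_VESSELS else ABSTAIN


def lf_weather_report(text):
    """Heuristic source: vote NEGATIVE for routine weather forecasts."""
    return NEGATIVE if "forecast" in text.lower() else ABSTAIN


def lf_expert_phrase(text):
    """Domain-expert rule (invented for illustration)."""
    return POSITIVE if "departed port" in text.lower() else ABSTAIN


def majority_vote(votes):
    """Combine votes, ignoring abstentions; ties and empty votes abstain."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def weak_label(docs, lfs):
    """Apply every labeling function to every document and combine the votes."""
    return [majority_vote([lf(doc) for lf in lfs]) for doc in docs]


LFS = [lf_known_vessel, lf_weather_report, lf_expert_phrase]
DOCS = [
    "Kestrel departed port at dawn",
    "Weekly forecast for the harbor",
    "Aurora forecast delayed",
    "Nothing to report",
]

if __name__ == "__main__":
    for doc, label in zip(DOCS, weak_label(DOCS, LFS)):
        print(label, doc)
```

Documents on which the sources disagree evenly, or on which all of them abstain, receive no label and would simply be left out of the training set in this simplified setup.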
Document Details
- Document Type: DoD Grant Award
- Publication Date: Apr 29, 2020
- Source ID: N000142012275
Entities
People
- Christopher Ré
Organizations
- Office of Naval Research
- Stanford University
- United States Navy