A3: Audio analytics, Artificial intelligence & Autonomous systems
Abstract
With recent innovations in technology, particularly in artificial intelligence, autonomous systems are slowly becoming a reality that will impact all facets of life, ranging from entertainment, commercial, and healthcare systems to defense capabilities. Unlike traditional systems whose actions are predetermined at inception, an autonomous system has to exhibit "intelligence". It should be able to handle large amounts of data coming from its sensors, parse and interpret the incoming information based on its prior beliefs, and adapt to unknown situations or unexpected events, in order to ultimately make or inform complex decisions. With sound signals arriving from audio sensors, such systems are tasked with interpreting the physical world from sensory streams that represent complex soundscapes with nested information at multiple timescales (from milliseconds to days and years). Translating this nested multiscale information into data-driven analytics (bottom-up) as well as cognitive processes (memory and attentional selection) continues to challenge current systems. Recent advances in artificial intelligence, particularly deep learning, offer an opportunity to inform our thinking about how this multiscale bottom-up/top-down interaction takes place, in order to better understand auditory cognition and to translate that knowledge into better autonomous systems. In the present project, we focus on multiscale analytics of complex soundscapes, considering the interplay between knowns (memory) and unknowns (salience) in the context of target selection (attention). We argue that a distributed scheme in which selectivity and invariance are simultaneously achieved is more in line with cortical-like processes and leads to truly "deep" inference that results in robust performance under novel listening conditions.
Specifically, we leverage recent advances in deep neural networks to develop a parallel architecture with three design elements: the multiscale nature of sound is tackled in a distributed fashion; this distributed representation gives rise to a distributed memory network, also spread across many granularities, that considers different time constants and contexts; and attentional feedback guides processing driven by both knowns and unknowns in the environment. The proposal makes explicit predictions about brain strategies for interpreting auditory scenes and ultimately for integrating audio analytics in autonomous systems. The computational framework develops artificial intelligence capabilities that will seamlessly integrate audio analytics with autonomous system platforms, allowing them to augment complex decision-making by humans. Such systems can in fact compensate for deficiencies in human perception, permitting a partnership of automated and biological systems that is superior to either alone. These systems have direct impact on a broad array of cross-service military applications, including sensing systems for autonomous platforms operating in noisy and hostile environments, robust battlefield sensors, clutter-resistant passive sonar processors, and dynamically reconfigurable sensor webs. Over and above its engineering objective, this proposal also seeks to advance the scientific understanding of auditory cognition, which will be instrumental in guiding new approaches to enhance human performance in challenging settings, in driving new advances in the human-machine interface, and in shaping new visions for adaptive communication aids for the sensory-impaired.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jan 23, 2019
- Source ID: N000141912014
Entities
People
- Mounya Elhilali
Organizations
- Johns Hopkins University
- Office of Naval Research
- United States Navy