Next generation machine vision: Instantiating compositional strategies from biological vision in deep networks

Abstract

RESEARCH PROBLEM: Deep convolutional networks (DCNs) are the current state of the art in machine vision, having surpassed previous performance on standard object recognition benchmarks. However, they remain fragile under challenging visual conditions (occlusion, viewpoint, context, pose, color, texture, and lighting changes), are vulnerable to attack (minimal changes in pixel patterns), transfer poorly to tasks beyond object labeling, and require enormous datasets and extensive training to learn new information or tasks. Biological vision, in contrast, is robust to all of these viewing conditions, not vulnerable to attack through minor image changes, astoundingly flexible in extracting an enormous amount and variety of information from images, and able to learn from one or a few examples. The next generation of machine vision systems will require major advances toward the capabilities of biological vision, which will likely depend on leveraging understanding of how biological vision operates.

OBJECTIVES: We aim to build new machine vision systems that make major advances toward the robustness, invulnerability, flexibility, and learning power of biological vision. We will do this through a combination of (i) first-principles designs, (ii) analyses of operational differences between those designs and biological vision, and (iii) redesign based on those analyses, recursively iterating steps (ii) and (iii). We will demonstrate advances in robustness and flexibility by measuring changes in performance on standard benchmarks and on custom benchmarks designed to probe hazardous conditions (occlusion, etc.), and by testing vulnerabilities to adversarial attacks.

TECHNICAL APPROACHES: (Stage 1) Designing Compositional Deep Networks (CompDNs) from first principles.
We propose to develop novel Compositional Deep Networks by building on fundamental principles derived from the human/primate visual systems and, in particular, recent findings about the primate ventral stream, which is responsible for processing object and object-in-scene information. Our Compositional Deep Networks have some commonalities with existing Deep Networks but differ by having explicit compositional structure (representations of object parts, their 2D and 3D geometry, their spatial relationships and connectivity) and recurrent processing (bottom-up/top-down and lateral). This format for compositional representation is directly based on information coding by neurons and neural populations in ventral pathway visual cortex (area V4 and multiple stages and channels in inferotemporal cortex/IT).

(Stage 2) Comparative Analysis of Artificial and Biological Compositional Networks. We will use empirical evolutionary stimulus strategies based on genetic algorithms to explore and compare object information in successive layers of our CompDNs and in homologous stages of the primate ventral pathway. We will also use multi-node electrical (in visual cortex) and virtual (in CompDNs) stimulation to establish how neurons contribute to compositional coding at higher levels and behavioral decisions (by animals or expressed through CompDN outputs). Finally, we will use these methods to analyze how learning progresses in CompDNs, and how this progression differs from human learning. In this way we will ascertain what kinds of information operations in the ventral pathway underlie the robustness and flexibility of human/primate vision, and we will identify the specific information gaps and coding strategy differences between ventral pathway and CompDNs.

(Stage 3) Refining Compositional Deep Networks Based on Biological Vision.
We will use what we learn from these analyses to re-design and retrain CompDNs, shifting their function and performance toward what we have observed in ventral pathway cortex. We will alter the explicit compositional operations in CompDNs to mirror the precise information patterns observed in visual cortex. We will directly instantiate cortex-like information by
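
Illustrative sketch (not part of the award text): Stage 2 refers to evolutionary stimulus strategies based on genetic algorithms. The abstract gives no implementation details, so the following is only a minimal toy version of the general idea, under stated assumptions: stimuli are short parameter vectors, and the function toy_unit_response is a hypothetical stand-in for the measured response of a cortical neuron or a CompDN unit. All names and parameter values here are invented for illustration.

```python
import math
import random


def toy_unit_response(stimulus, preferred):
    """Hypothetical stand-in for a recorded neuron or network unit:
    Gaussian tuning around a hidden preferred stimulus (an assumption)."""
    dist_sq = sum((s - p) ** 2 for s, p in zip(stimulus, preferred))
    return math.exp(-dist_sq / 2.0)


def evolve_stimuli(response_fn, n_params=4, pop_size=40, n_generations=60,
                   n_parents=10, mutation_sd=0.2, seed=0):
    """Simple genetic algorithm: rank stimuli by response, keep the best
    as parents, and fill the next generation with mutated recombinations."""
    rng = random.Random(seed)
    # Start from random stimulus parameter vectors.
    population = [[rng.uniform(-3.0, 3.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    history = []  # best response seen at the start of each generation
    for _ in range(n_generations):
        ranked = sorted(population, key=response_fn, reverse=True)
        history.append(response_fn(ranked[0]))
        parents = ranked[:n_parents]  # elitism: parents survive unchanged
        children = []
        while len(children) < pop_size - n_parents:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation.
            child = [rng.choice(pair) + rng.gauss(0.0, mutation_sd)
                     for pair in zip(a, b)]
            children.append(child)
        population = parents + children
    best = max(population, key=response_fn)
    return best, history


# Hidden tuning peak of the toy unit (illustrative values only).
preferred = [1.0, -0.5, 2.0, 0.0]
best, history = evolve_stimuli(lambda s: toy_unit_response(s, preferred))
print("best stimulus:", [round(x, 2) for x in best])
print("best response:", round(toy_unit_response(best, preferred), 3))
```

Because the parents are carried over unchanged, the best response in the population never decreases from one generation to the next. In an actual experiment the response function would be a measured firing rate or unit activation rather than a known formula, and the stimulus parameterization would describe object shape rather than an abstract vector.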

Document Details

Document Type: DoD Grant Award
Publication Date: Mar 11, 2020
Source ID: N000142012206

Entities

People

Charles Connor

Organizations

Johns Hopkins University
Office of Naval Research
United States Navy
