Action and Object Semantic Role Inference for Video Understanding

Abstract

The goal of this project is semantic understanding of the contents of a video, which is a fundamental topic in computer vision research. Actions and objects are the obvious semantic entities of highest interest, and considerable progress has been made in detecting these entities in the last few years, but understanding of the relations between them remains limited. There is a need to understand the semantic roles played by objects in an action (such as which object is the actor, which is the subject of the action and which serves as the instrument). Semantic-Role Labels (SRLs) capture the high-level meaning of who did what to whom (and occasionally also how, when and where). Furthermore, in a longer video, there is a need to co-reference the entities participating in a sequence of actions. The computed representations must be formal enough for other programs to act on them and yet communicate in natural language to human users. The converse, humans tasking the analysis programs in natural language, is also desirable. Here, we propose a unified approach to these tasks.

Our approach includes incorporating state-of-the-art object detectors in a sequence-to-sequence framework, as most such frameworks consider temporal relations only. A complementary approach will explore inference of an event graph representation from videos. The graph will include event-nodes and object-nodes referring to actions and objects respectively. The approach will include co-referencing of objects across multiple frames. Finally, the methods will infer relations between events and provide probabilistic predictions of following events.

Automated methods for semantic understanding of entities in videos and natural communication with human users are of obvious and critical relevance and importance to the U.S. Navy and the DoD in general. The applications include video indexing, browsing, forensic analysis and threat monitoring. Advances in automating such tasks, or in providing semi-automated assistance to human analysts, will greatly increase their productivity and allow more timely availability of critical analytic data that may be acted upon.
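
As a rough illustration of the event graph representation mentioned in the abstract, the sketch below models object-nodes, event-nodes with semantic roles, and relations between events as plain Python dataclasses. Every class name, field name, role label ("actor", "target", "instrument") and the sample data are assumptions made for illustration only; the abstract does not specify a data model, and nothing here infers the graph from video.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObjectNode:
    """One entity, co-referenced across the frames in which it appears (hypothetical)."""
    object_id: str
    label: str
    frames: List[int] = field(default_factory=list)


@dataclass
class EventNode:
    """One action; roles map a role name to the id of the object filling it (hypothetical)."""
    event_id: str
    verb: str
    roles: Dict[str, str] = field(default_factory=dict)


@dataclass
class EventGraph:
    objects: Dict[str, ObjectNode] = field(default_factory=dict)
    events: Dict[str, EventNode] = field(default_factory=dict)
    # Each relation is (source event id, destination event id, relation label).
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, obj: ObjectNode) -> None:
        self.objects[obj.object_id] = obj

    def add_event(self, event: EventNode) -> None:
        # Every role must point at an object already in the graph.
        missing = [oid for oid in event.roles.values() if oid not in self.objects]
        if missing:
            raise KeyError(f"unknown object ids: {missing}")
        self.events[event.event_id] = event

    def relate(self, src: str, dst: str, relation: str) -> None:
        if src not in self.events or dst not in self.events:
            raise KeyError("both events must be in the graph")
        self.relations.append((src, dst, relation))

    def events_involving(self, object_id: str) -> List[str]:
        """Ids of events in which the given object fills any role."""
        return [e.event_id for e in self.events.values()
                if object_id in e.roles.values()]

    def describe(self, event_id: str) -> str:
        """Render one event as a 'who did what to whom' string."""
        event = self.events[event_id]
        parts = [f"{role}={self.objects[oid].label}"
                 for role, oid in sorted(event.roles.items())]
        return f"{event.verb}(" + ", ".join(parts) + ")"


# Made-up example: the same person (o1) and box (o2) take part in two actions.
graph = EventGraph()
graph.add_object(ObjectNode("o1", "person", [0, 1, 2, 40, 41]))
graph.add_object(ObjectNode("o2", "box", [1, 2, 40]))
graph.add_object(ObjectNode("o3", "knife", [40, 41]))
graph.add_event(EventNode("e1", "pick up", {"actor": "o1", "target": "o2"}))
graph.add_event(EventNode("e2", "cut", {"actor": "o1", "target": "o2",
                                        "instrument": "o3"}))
graph.relate("e1", "e2", "before")

print(graph.describe("e2"))
print(graph.events_involving("o2"))
```

Because both events refer to the same object ids, co-reference across the action sequence falls out of the shared keys, and the structured roles can be rendered as text for a human reader or queried directly by another program.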

Document Details

Document Type: DoD Grant Award
Publication Date: Aug 20, 2021
Source ID: N000142112802

Entities

People

Ramakant Nevatia

Organizations

Office of Naval Research
United States Navy
University of Southern California
