Modeling Spatiotemporal Contextual Dynamics with Sparse-Coded Transfer Learning
Abstract
An important problem in visual understanding is how to recognize and predict human actions or imminent events from video. An ideal intelligent system should be able to detect and track suspicious subjects, predict actions and events, and raise alarms for emergencies before they happen. In this STIR project, we created a new set of algorithmic tools for modeling spatiotemporal contextual dynamics. For low-level and mid-level visual representation, we proposed a class of Schatten-norm-based discriminative metrics, locality-constrained low-rank coding, discriminative analysis by multiple principal angles, and clustering-based fast low-rank approximation for large-scale analysis. We also proposed a decomposed contour prior and a stub-feature-based level set method for shape recognition in images and videos. For high-level understanding and inference, we proposed the ARMA-HMM model for early recognition of human activity and a complex temporal composition model of actionlets for activity prediction. The effectiveness and efficiency of these methods have been extensively tested on human action and activity recognition and prediction. The evaluation results and outcomes of this research have been published in 8 peer-reviewed conference papers, along with a best paper award, and 1 peer-reviewed journal paper.
Document Details
- Document Type: Technical Report
- Publication Date: Aug 08, 2012
- Accession Number: ADA587078
Entities
People
- Yun Fu
Organizations
- University at Buffalo