Scalable Person Re-Identification: Lessons Learned from Image Search - Research Topic 5.2

Abstract

Person re-identification has important applications in video surveillance, such as cross-camera visual tracking and multi-camera event detection. It spares much of the human effort otherwise spent exhaustively searching for a person in vast amounts of video data. Given a probe image (query), the task is to search a gallery (database) for images that contain the same person. The task is challenging for two reasons. First, images of the same person captured by different cameras often vary significantly in pose, viewpoint, illumination, occlusion, background, and camera settings. Second, the database is often very large, and many pedestrians have similar appearances. Person re-identification algorithms should therefore be both discriminative for each identity and efficient in large-scale applications. Person re-identification and image search have long been studied as separate tasks. However, the effectiveness of local features and the "query-search" mode make person re-identification well suited to image search techniques. In light of recent advances in image search, this project proposes to formulate person re-identification as an image search problem and to design efficient yet effective person re-identification systems by investigating recognition accuracy and speed within the image search framework. More specifically, we aim to accomplish three sub-tasks. First, we will develop descriptive appearance models based on the Bag-of-Words (BoW) representation. Second, we will investigate spatial-temporal models for complementary analysis. Finally, we will explore efficient fusion techniques that combine both models. The method to be employed in this project consists of several modules. First, we propose that person re-identification be addressed in the framework of image search; the use of the BoW model makes fast yet accurate recognition feasible. Second, body parts are detected with strong supervision, which yields a part-to-part matching scheme and a stricter matching criterion. Third, to exploit the spatial-temporal information conveyed in surveillance videos, a video ranking prototype system is developed to capture evidence complementary to the appearance models. Finally, by fusing the complementary cues from the appearance and spatial-temporal models, we believe the recognition accuracy can be further improved.
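
As an illustration of the image search formulation described in the abstract, the following is a minimal sketch of Bag-of-Words retrieval over a gallery. It is not the project's implementation. The function names, the toy two-dimensional codebook, the TF-IDF weighting, the cosine similarity, and the inverted index are all assumptions borrowed from common image search practice; the abstract itself specifies only that a BoW representation is used.

```python
import math
from collections import Counter

def quantize(descriptors, codebook):
    """Map each local descriptor to the index of its nearest visual word."""
    words = []
    for d in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        words.append(dists.index(min(dists)))
    return words

def bow_vector(words, idf):
    """L2-normalised TF-IDF histogram, stored sparsely as {word: weight}."""
    tf = Counter(words)
    vec = {w: n * idf.get(w, 0.0) for w, n in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}

def build_index(gallery_words):
    """Build (idf, inverted index) from {image_id: [visual word, ...]}."""
    n = len(gallery_words)
    df = Counter(w for words in gallery_words.values() for w in set(words))
    # Smoothed IDF (an assumed weighting, not taken from the abstract).
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    index = {}
    for image_id, words in gallery_words.items():
        for w, weight in bow_vector(words, idf).items():
            index.setdefault(w, []).append((image_id, weight))
    return idf, index

def search(query_words, idf, index):
    """Rank gallery images by cosine similarity to the query BoW vector."""
    scores = Counter()
    for w, qw in bow_vector(query_words, idf).items():
        # Only gallery images sharing a visual word with the query are visited.
        for image_id, gw in index.get(w, []):
            scores[image_id] += qw * gw
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy data: four visual words and two gallery images.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
gallery = {
    "cam2_a": quantize([(0.1, 0.0), (0.9, 0.1), (0.1, 0.1)], codebook),
    "cam2_b": quantize([(0.9, 0.9), (0.1, 0.9), (1.0, 1.0)], codebook),
}
idf, index = build_index(gallery)
ranking = search(quantize([(0.0, 0.1), (1.0, 0.0)], codebook), idf, index)
```

Because scoring walks the inverted index, the query cost depends on the number of gallery images that share visual words with the probe, not on the full gallery size, which is what makes this kind of formulation attractive for large databases.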

Document Details

Document Type
DoD Grant Award
Publication Date
Apr 29, 2019
Source ID
W911NF1510290

Entities

People

  • Qi Tian

Organizations

  • Army Contracting Command
  • United States Army
  • University of Texas at San Antonio

Tags

Fields of Study

  • Computer science

Readers

  • Computer Vision
  • Distributed Systems and Data Platform Development