A Spatio-Temporal Appearance Representation for Video-Based Pedestrian Re-Identification
Kan Liu¹, Bingpeng Ma², Wei Zhang¹, and Rui Huang³
¹ School of Control Science and Engineering, Shandong University, China
² School of Computer and Control Engineering, University of Chinese Academy of Sciences, China
³ NEC Laboratories, China
The benefits of our representation are:
1) It describes a person's appearance over a full walking cycle, and hence covers almost the entire variety of poses and shapes;
2) It aligns the appearance of different people both spatially and temporally;
3) The formation of each body-action unit is flexible and can differ from person to person, while Fisher vectors work with any volume topology, so the final representation is always a feature vector of consistent length.
Spatio-temporal body-action model
Walking cycle extraction
(a) A video sequence of a pedestrian (only key frames).
(b) The original FEP (blue curve) and the regulated FEP (red curve).
(c) The pedestrian poses corresponding to the FEP, based on which the walking cycle is extracted.
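The cycle extraction illustrated in (a) to (c) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the FEP is treated as a one-dimensional per-frame signal, the regulation step is replaced by a plain moving average, and a walking cycle is assumed to span from one local minimum of the regulated curve to the second minimum after it (two steps). The function names are hypothetical.

```python
import math


def smooth(signal, window=5):
    """Centered moving average (a stand-in for the regulation step)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def local_minima(signal):
    """Indices whose value is strictly lower than both neighbours."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] < signal[i - 1] and signal[i] < signal[i + 1]
    ]


def extract_walking_cycle(fep, window=5):
    """Return (start, end) frame indices of one assumed walking cycle.

    Assumption: a cycle runs from the first local minimum of the smoothed
    signal to the third one. Returns None if fewer than three minima exist.
    """
    minima = local_minima(smooth(fep, window))
    if len(minima) < 3:
        return None
    return minima[0], minima[2]


# Toy signal: a cosine with a period of 10 frames, standing in for a real FEP.
toy_fep = [math.cos(2 * math.pi * t / 10) for t in range(40)]
cycle = extract_walking_cycle(toy_fep)
```

On real sequences the signal is noisy, so the choice of smoothing and the rule for pairing extrema matter far more than in this toy case.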
Spatio-temporal body-action units
Temporal segmentation combined with a fixed body part model.
Color encodes the body parts.
Intensity encodes the action primitives.
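A minimal sketch of how such units could be enumerated is given below. It assumes a hypothetical three-part layout expressed as fractions of the bounding box and an equal-length temporal split of the walking cycle; the actual part model and the way action primitives are delimited may differ. All names (`PARTS`, `Unit`, `body_action_units`) are illustrative.

```python
from dataclasses import dataclass

# Hypothetical fixed part layout: (top, bottom, left, right) as box fractions.
PARTS = {
    "head": (0.0, 0.2, 0.25, 0.75),
    "torso": (0.2, 0.55, 0.2, 0.8),
    "legs": (0.55, 1.0, 0.1, 0.9),
}


@dataclass(frozen=True)
class Unit:
    part: str      # body part name (the "color" in the figure)
    segment: int   # temporal segment index (the "intensity" in the figure)
    rows: range    # pixel rows covered by the part
    cols: range    # pixel columns covered by the part
    frames: range  # frame indices covered by the temporal segment


def split_frames(start, end, n_segments):
    """Split [start, end) into n_segments contiguous, near-equal ranges."""
    length = end - start
    bounds = [start + (length * k) // n_segments for k in range(n_segments + 1)]
    return [range(bounds[k], bounds[k + 1]) for k in range(n_segments)]


def body_action_units(height, width, start, end, n_segments=4):
    """Cross every temporal segment with every body part."""
    units = []
    for seg_idx, frames in enumerate(split_frames(start, end, n_segments)):
        for name, (top, bottom, left, right) in PARTS.items():
            rows = range(round(top * height), round(bottom * height))
            cols = range(round(left * width), round(right * width))
            units.append(Unit(name, seg_idx, rows, cols, frames))
    return units


# Example: a 128x64 bounding box and a walking cycle covering frames 5..24.
units = body_action_units(128, 64, 5, 25, n_segments=4)
```

Each unit is a small video volume (rows x cols x frames); the number of pixels and frames it contains can vary from person to person.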
Fisher vector learning and extraction
Extract Fisher vectors built upon low-level feature descriptors.
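As a rough illustration of this step, the sketch below computes the standard Fisher vector (gradients with respect to the means and standard deviations of a diagonal-covariance GMM) for a set of local descriptors. The GMM parameters are assumed to be given, the usual power and L2 normalization are omitted, and the function names are hypothetical; it is not the authors' code. It shows why the output length depends only on the GMM size (2 x K x D), not on how many descriptors a unit contains.

```python
import math


def posteriors(x, weights, means, sigmas):
    """Soft assignment of descriptor x to each diagonal Gaussian component."""
    log_p = []
    for w, mu, sd in zip(weights, means, sigmas):
        ll = math.log(w)
        for xi, m, s in zip(x, mu, sd):
            ll += -0.5 * ((xi - m) / s) ** 2 - math.log(s)
            ll -= 0.5 * math.log(2 * math.pi)
        log_p.append(ll)
    top = max(log_p)  # subtract the maximum for numerical stability
    exp_p = [math.exp(v - top) for v in log_p]
    total = sum(exp_p)
    return [v / total for v in exp_p]


def fisher_vector(descriptors, weights, means, sigmas):
    """Return a 2*K*D Fisher vector for a non-empty list of D-dim descriptors."""
    if not descriptors:
        raise ValueError("at least one descriptor is required")
    n, k, d = len(descriptors), len(weights), len(means[0])
    g_mu = [[0.0] * d for _ in range(k)]
    g_sd = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        gamma = posteriors(x, weights, means, sigmas)
        for j in range(k):
            for i in range(d):
                z = (x[i] - means[j][i]) / sigmas[j][i]
                g_mu[j][i] += gamma[j] * z
                g_sd[j][i] += gamma[j] * (z * z - 1.0)
    fv = []
    for j in range(k):  # gradients with respect to the means
        fv.extend(v / (n * math.sqrt(weights[j])) for v in g_mu[j])
    for j in range(k):  # gradients with respect to the standard deviations
        fv.extend(v / (n * math.sqrt(2 * weights[j])) for v in g_sd[j])
    return fv


# Toy GMM with K=2 components over D=2 dimensional descriptors.
gmm = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
fv_small = fisher_vector([[0.1, 0.2]], *gmm)
fv_large = fisher_vector([[0.1, 0.2], [0.9, 1.1], [0.4, 0.5]], *gmm)
```

Both `fv_small` and `fv_large` have the same length even though they were built from one and three descriptors respectively, so units of different sizes remain directly comparable.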
A very concise local descriptor that combines color, texture, and gradient information:
Experimental Results
Evaluation of the low-level descriptor
Comparison to other representations
Comparison to the state of the art
Reference