We address the person re-identification task in a real-world scenario. Finding people in a network of cameras is challenging due to significant variations in lighting conditions, color responses, and camera viewpoints. State-of-the-art algorithms are likely to fail because of severe perspective and pose changes. Most existing approaches try to cope with all these changes by applying metric learning tools to find a transfer function between a camera pair, while ignoring the body-alignment issue. Moreover, this transfer function usually depends on the camera pair and requires labeled training data for each camera, which might be unattainable in a large camera network. We employ three-dimensional scene information to minimize perspective distortions and to estimate the target pose. The estimated pose is then used to split a target trajectory into reliable chunks, each with a uniform pose. These chunks are matched across the camera network using a previously learned metric pool. Instead of learning transfer functions that cope with all appearance variations, we propose to learn a generic metric pool that focuses only on pose changes. This pool consists of metrics, each learned to match a specific pair of poses and not tied to a specific camera pair. Automatically estimated poses determine the proper metric, thus improving matching. We show that metrics learned using only a single camera can significantly improve matching across the whole camera network, providing a scalable solution. We validated our approach on publicly available datasets, demonstrating an increase in re-identification performance.
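To make the metric-pool idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation): one Mahalanobis-style metric per pose pair, selected at match time from the automatically estimated poses of two trajectory chunks. The pose vocabulary, feature dimension, and random matrices are illustrative assumptions; in the paper the metrics would be learned from labeled single-camera data.

```python
import numpy as np

# Assumed pose vocabulary for illustration only.
POSES = ["front", "back", "left", "right"]

def make_metric_pool(dim, rng):
    """Build a toy pool: one positive semi-definite matrix per pose pair.
    Stand-in for metrics learned from labeled single-camera data."""
    pool = {}
    for p in POSES:
        for q in POSES:
            A = rng.standard_normal((dim, dim))
            pool[(p, q)] = A @ A.T  # PSD by construction
    return pool

def chunk_distance(pool, feat_a, pose_a, feat_b, pose_b):
    """Distance between two trajectory chunks: the estimated poses
    select the proper metric from the pool, then a Mahalanobis-style
    distance is computed between the chunk feature vectors."""
    M = pool[(pose_a, pose_b)]
    d = feat_a - feat_b
    return float(np.sqrt(d @ M @ d))

rng = np.random.default_rng(0)
pool = make_metric_pool(8, rng)
a = rng.standard_normal(8)
b = rng.standard_normal(8)
print(chunk_distance(pool, a, "front", b, "left"))
```

Because each metric is keyed by a pose pair rather than a camera pair, the same pool can be reused across every camera in the network, which is the source of the scalability claimed in the abstract.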