The spatial attention mechanism is widely used to extract local features in person re-identification. However, existing multi-stage spatial attention structures lack flexibility and require a complicated training process. In this paper, a plug-and-play LSTM-based Attention Module (LAM) is proposed to enhance the flexibility of the multi-attention mechanism. First, we employ a single-stage multi-attention structure in place of the traditional multi-stage structure. Our structure encapsulates multiple attention units in a single module, so the module can be added directly to any backbone network without modification. Second, correlation is introduced among the spatial attention units through an LSTM. The correlation between different attention units preserves the diversity of the local features and exploits the full capacity of the multi-attention mechanism. Moreover, the LAM is added to the backbone network in the form of a residual, which enables the LAM to be trained synchronously with the backbone network and thus simplifies the training process considerably. Experiments on the CUHK03, Market-1501 and DukeMTMC-ReID datasets demonstrate the advantage of the proposed method.
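The abstract above does not give implementation details, but the core idea (several spatial attention maps generated sequentially by a recurrent cell, with the attended features merged back as a residual) might be sketched in simplified NumPy as follows. All shapes, the single shared LSTM cell over globally pooled features, and the softmax spatial attention are illustrative assumptions, not the authors' exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
C, H, W, K, d = 8, 4, 4, 3, 6  # channels, height, width, attention units, LSTM size

X = rng.standard_normal((C, H, W))          # input feature map from the backbone

# Random parameters for the sketch (a real module would learn these).
Wx = rng.standard_normal((4 * d, C)) * 0.1  # input-to-gates weights
Wh = rng.standard_normal((4 * d, d)) * 0.1  # hidden-to-gates weights
b  = np.zeros(4 * d)                        # gate biases
Wa = rng.standard_normal((H * W, d)) * 0.1  # hidden state -> spatial logits

def lam(X):
    """Single-stage multi-attention: K correlated maps from one LSTM, plus residual."""
    h, c = np.zeros(d), np.zeros(d)
    pooled = X.mean(axis=(1, 2))            # global average pool, shape (C,)
    out = np.zeros_like(X)
    for _ in range(K):
        # One LSTM step: the hidden state correlates successive attention units.
        z = Wx @ pooled + Wh @ h + b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
        # Spatial attention map from the hidden state (softmax over positions).
        logits = Wa @ h
        a = np.exp(logits - logits.max())
        A = (a / a.sum()).reshape(H, W)
        out += X * A                        # attended local features for this unit
    return X + out                          # residual merge with the backbone features

Y = lam(X)
print(Y.shape)  # (8, 4, 4), same as the input, so any backbone can consume it
```

Because the module returns `X + out`, it degenerates to the identity when the attention branch contributes nothing, which is what allows it to be dropped into a pretrained backbone and trained jointly.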
In previous works, the channel attention mechanism has been widely used in person re-identification. However, the channel attention mechanism completely compresses the spatial dimension during its computation, which harms the diversity of channel information across different pixels. In this paper, a channel convolution residual block is proposed for more detailed inter-channel correlation modeling. First, we preserve spatial context information when introducing channel dependencies, which enables pixel-wise inter-channel correlation modeling; at the same time, a bottleneck strategy is used to reduce the number of parameters in the spatial dimension. Second, a channel convolution is employed instead of a fully connected layer to reduce the number of parameters in the channel dimension. In addition, the inter-channel correlation is merged into the backbone network directly in the form of a residual, so the block can be embedded in any deep neural network. Experiments on the Market-1501 and DukeMTMC-ReID datasets demonstrate that the channel convolution residual block effectively improves the accuracy of person re-identification.
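The block described above could look roughly like the following NumPy sketch: a spatial bottleneck (average pooling), a 1-D convolution across the channel axis at every remaining spatial location instead of a fully connected layer, and a residual merge. The pooling ratio, kernel size, nearest-neighbour upsampling, and sigmoid gating are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
r = 2                                      # spatial bottleneck ratio (assumed)
X = rng.standard_normal((C, H, W))         # backbone feature map
kernel = rng.standard_normal(3) * 0.1      # shared 1-D channel-conv kernel (learned in practice)

def channel_conv_residual(X):
    """Pixel-wise inter-channel correlation via channel convolution, plus residual."""
    C, H, W = X.shape
    h, w = H // r, W // r
    # Spatial bottleneck: average-pool, but keep a (reduced) spatial grid
    # rather than collapsing it to a single vector as channel attention does.
    P = X.reshape(C, h, r, w, r).mean(axis=(2, 4))
    # Channel convolution: 1-D conv over the channel axis at each location,
    # far cheaper than a fully connected layer across channels.
    k, pad = len(kernel), len(kernel) // 2
    M = np.empty_like(P)
    for i in range(h):
        for j in range(w):
            v = np.pad(P[:, i, j], pad)
            M[:, i, j] = [v[c:c + k] @ kernel for c in range(C)]
    # Upsample the gate back to full resolution and merge as a residual.
    G = sigmoid(np.repeat(np.repeat(M, r, axis=1), r, axis=2))
    return X + X * G

Y = channel_conv_residual(X)
print(Y.shape)  # (8, 4, 4): output matches the input, so the block embeds anywhere
```

Keeping a reduced spatial grid in `P` is what makes the correlation pixel-wise: different locations receive different channel gates, unlike squeeze-style channel attention, which applies one gate to every pixel.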