Paper
12 October 2022 Spatio-temporal dual-attention network for view-invariant human action recognition
Kumie Gedamu, Getinet Yilma, Maregu Assefa, Melese Ayalew
Proceedings Volume 12342, Fourteenth International Conference on Digital Image Processing (ICDIP 2022); 123420Q (2022) https://doi.org/10.1117/12.2643446
Event: Fourteenth International Conference on Digital Image Processing (ICDIP 2022), 2022, Wuhan, China
Abstract
Due to action occlusion and the information loss caused by view changes, view-invariant human action recognition is challenging in many real-world applications. One possible solution is to minimize the representation discrepancy between different views while learning discriminative feature representations for view-invariant action recognition. To this end, we propose a Spatio-temporal Dual-Attention Network (SDA-Net) for view-invariant human action recognition. The SDA-Net is composed of spatial/temporal self-attention and spatial/temporal cross-attention modules. The self-attention modules capture global long-range dependencies of action features. The cross-attention modules learn view-invariant co-occurrence attention maps and generate discriminative features for a semantic representation of actions across views. We exhaustively evaluate our approach on the NTU-60, NTU-120, and UESTC datasets under multiple evaluation protocols, i.e., Cross-Subject, Cross-View, Cross-Set, and Arbitrary-view. Extensive experimental results demonstrate that our approach exceeds state-of-the-art approaches by a significant margin in view-invariant human action recognition.
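The self- and cross-attention mechanisms described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' SDA-Net implementation; the single-head, unprojected scaled dot-product formulation, the function names, and the feature dimensions are all assumptions made for illustration only:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax for turning similarity scores into weights.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (T, d) per-frame (temporal) or per-region (spatial) features.
    # The (T, T) attention map captures global long-range dependencies.
    d = x.shape[-1]
    attn = softmax(x @ x.T / np.sqrt(d))
    return attn @ x

def cross_attention(x_a, x_b):
    # Features from view A attend to features from view B; the (T_a, T_b)
    # map plays the role of a co-occurrence attention map across views.
    d = x_a.shape[-1]
    attn = softmax(x_a @ x_b.T / np.sqrt(d))
    return attn @ x_b

rng = np.random.default_rng(0)
view_a = rng.standard_normal((8, 16))  # 8 time steps, 16-dim features, view A
view_b = rng.standard_normal((8, 16))  # the same clip seen from view B

z_self = self_attention(view_a)            # refined single-view features
z_cross = cross_attention(view_a, view_b)  # view-aligned features
print(z_self.shape, z_cross.shape)         # (8, 16) (8, 16)
```

In the paper's design, such attention outputs feed the downstream classifier; here the sketch only shows how a self-attention map differs from a cross-view (co-occurrence) attention map in what it compares.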
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Kumie Gedamu, Getinet Yilma, Maregu Assefa, and Melese Ayalew "Spatio-temporal dual-attention network for view-invariant human action recognition", Proc. SPIE 12342, Fourteenth International Conference on Digital Image Processing (ICDIP 2022), 123420Q (12 October 2022); https://doi.org/10.1117/12.2643446
KEYWORDS: Cameras, Convolution, Video, RGB color model, Data modeling, Feature extraction, Matrices