LGST-Drop: label-guided structural dropout for spatial–temporal convolutional neural networks
Hu Cui, Renjing Huang, Ruoyu Zhang, Chuhua Huang
Abstract

Region dropout regularization strategies have proven highly effective at improving the generalization of convolutional neural networks (CNNs) in a variety of computer vision tasks, including image classification, object detection, and semantic segmentation, because they force models to attend to a wider range of image regions. For action recognition, however, a model must extract not only useful spatial information but also important temporal and motion information, a requirement that traditional regularization strategies cannot satisfy. We propose a spatiotemporal dropout strategy to meet the need for regularization in spatial–temporal CNNs. We call it label-guided spatial–temporal drop (LGST-Drop); it not only applies effective structured dropout in the spatial dimension but also regularizes motion information in the temporal dimension. In addition, LGST-Drop's mask is guided by the categories predicted by the model itself, which we call temporary labels. Extensive experiments on several standard action recognition datasets show the usefulness of the proposed technique in comparison with previous methods and their state-of-the-art variants.
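The abstract does not give implementation details, but the core idea can be illustrated with a hedged sketch: a 3D feature map is multiplied by a binary mask derived from a class-activation-style importance map computed from the model's own predicted class (the "temporary label"), so that the most discriminative spatiotemporal regions are suppressed during training. The function name, the use of the final classifier weights to build the map, and the `drop_prob`/`threshold` parameters below are illustrative assumptions, not the published LGST-Drop algorithm.

```python
import torch

def label_guided_st_drop(features, classifier_weights, temp_labels,
                         drop_prob=0.5, threshold=0.7, training=True):
    """Hypothetical sketch of a label-guided spatial-temporal dropout.

    features:           (N, C, T, H, W) feature maps from a 3D CNN stage
    classifier_weights: (num_classes, C) weights of the final linear classifier
    temp_labels:        (N,) classes predicted by the model itself ("temporary labels")
    """
    # Apply only during training, with probability drop_prob.
    if not training or torch.rand(1).item() > drop_prob:
        return features

    n = features.size(0)

    # Class-activation-style importance map guided by the temporary labels:
    # weight each channel by the classifier weights of the predicted class.
    w_cls = classifier_weights[temp_labels]                 # (N, C)
    cam = torch.einsum('nc,ncthw->nthw', w_cls, features)   # (N, T, H, W)

    # Normalize each sample's map to [0, 1].
    cam_min = cam.flatten(1).min(dim=1).values.view(n, 1, 1, 1)
    cam_max = cam.flatten(1).max(dim=1).values.view(n, 1, 1, 1)
    cam = (cam - cam_min) / (cam_max - cam_min + 1e-6)

    # Suppress the most discriminative spatiotemporal regions so the network
    # must rely on complementary spatial and motion cues.
    mask = (cam < threshold).float().unsqueeze(1)           # (N, 1, T, H, W)
    return features * mask
```

Because the mask is computed per frame and per location over the full (T, H, W) volume, spatial and temporal dropout are coupled in a single operation, which is the property the abstract argues ordinary region dropout lacks for video models.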

© 2022 SPIE and IS&T 1017-9909/2022/$28.00
Hu Cui, Renjing Huang, Ruoyu Zhang, and Chuhua Huang "LGST-Drop: label-guided structural dropout for spatial–temporal convolutional neural networks," Journal of Electronic Imaging 31(3), 033036 (17 June 2022). https://doi.org/10.1117/1.JEI.31.3.033036
Received: 27 February 2022; Accepted: 30 May 2022; Published: 17 June 2022
KEYWORDS: Video, Convolutional neural networks, Convolution, Data modeling, Motion models, Visual process modeling, Machine vision