Paper
Mixed 3D-(2+1)D convolution for action recognition
14 August 2019
Bin Yang, Ping Zhou
Proceedings Volume 11179, Eleventh International Conference on Digital Image Processing (ICDIP 2019); 1117949 (2019) https://doi.org/10.1117/12.2540276
Event: Eleventh International Conference on Digital Image Processing (ICDIP 2019), 2019, Guangzhou, China
Abstract
2D CNNs for video-based action modeling ignore temporal information and treat the multiple input frames analogously to channels. In view of this, a mixed convolution structure built on the ResNet-18 residual network is designed for video feature extraction. The 3D convolution and the (2+1)D convolution are interleaved in sequence throughout the network. First, 2D convolution is applied frame by frame to the input video frames in the spatial domain. Then, 1D convolution is applied along the temporal dimension to the output of the 2D convolution. Finally, 3D convolution is performed to model the spatial and temporal dimensions simultaneously. Results show that the mixed convolution structure enhances the transmission of temporal information, improves the ability to extract video features, and noticeably improves action recognition accuracy.
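As an illustration of the factorization described above, the following is a minimal PyTorch sketch of a (2+1)D block: a 2D spatial convolution applied to each frame, followed by a 1D temporal convolution. The class name, intermediate channel count, and layer arrangement are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class Conv2Plus1D(nn.Module):
    """Factorized spatiotemporal convolution: spatial (1, k, k) then temporal (k, 1, 1)."""

    def __init__(self, in_channels, out_channels, mid_channels=None, kernel_size=3):
        super().__init__()
        if mid_channels is None:
            # Hypothetical choice; the paper may size the intermediate channels differently.
            mid_channels = out_channels
        pad = kernel_size // 2
        # 2D convolution over each frame (no temporal extent)
        self.spatial = nn.Conv3d(in_channels, mid_channels,
                                 kernel_size=(1, kernel_size, kernel_size),
                                 padding=(0, pad, pad), bias=False)
        self.bn = nn.BatchNorm3d(mid_channels)
        self.relu = nn.ReLU(inplace=True)
        # 1D convolution along the temporal axis
        self.temporal = nn.Conv3d(mid_channels, out_channels,
                                  kernel_size=(kernel_size, 1, 1),
                                  padding=(pad, 0, 0), bias=False)

    def forward(self, x):
        # x: (batch, channels, frames, height, width)
        return self.temporal(self.relu(self.bn(self.spatial(x))))


# By contrast, a full 3D convolution models space and time jointly in one layer:
# nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1)
clip = torch.randn(2, 3, 8, 112, 112)          # batch of two 8-frame RGB clips
out = Conv2Plus1D(3, 64)(clip)                  # -> (2, 64, 8, 112, 112)
```

In the mixed structure described by the abstract, blocks of this factorized form are interleaved with standard 3D convolution blocks throughout the ResNet-18 backbone.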
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Bin Yang and Ping Zhou "Mixed 3D-(2+1)D convolution for action recognition", Proc. SPIE 11179, Eleventh International Conference on Digital Image Processing (ICDIP 2019), 1117949 (14 August 2019); https://doi.org/10.1117/12.2540276
KEYWORDS
Convolution, Video, RGB color model, Feature extraction, Data modeling, Neural networks, Video processing