Regular Articles

3D SMoSIFT: three-dimensional sparse motion scale invariant feature transform for activity recognition from RGB-D videos

Author Affiliations
Jun Wan, Qiuqi Ruan, Wei Li, Gaoyun An, and Ruizhen Zhao

Beijing Jiaotong University, Institute of Information Science, Beijing 100044, China

Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China

J. Electron. Imaging. 23(2), 023017 (Apr 08, 2014). doi:10.1117/1.JEI.23.2.023017
History: Received November 19, 2013; Revised March 4, 2014; Accepted March 13, 2014

Abstract.  Human activity recognition based on RGB-D data has received increasing attention in recent years. We propose a spatiotemporal feature named three-dimensional (3D) sparse motion scale-invariant feature transform (SIFT) from RGB-D data for activity recognition. First, we build pyramids as a scale space for each RGB and depth frame, and then use the Shi-Tomasi corner detector and sparse optical flow to quickly detect and track robust keypoints around the motion pattern in the scale space. Subsequently, local patches around keypoints, extracted from the RGB-D data, are used to build 3D gradient and motion spaces. SIFT-like descriptors are then calculated on each of the two 3D spaces. The proposed feature is invariant to scale, translation, and partial occlusions. More importantly, the proposed feature can be computed quickly, so it is well-suited for real-time applications. We have evaluated the proposed feature under a bag-of-words model on three public RGB-D datasets: the one-shot learning ChaLearn Gesture Dataset, the Cornell Activity Dataset-60, and the MSR Daily Activity 3D dataset. Experimental results show that the proposed feature outperforms other spatiotemporal features and is comparable to other state-of-the-art approaches, even when there is only one training sample per class.
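The abstract's keypoint-detection step relies on the Shi-Tomasi corner criterion: a pixel is a good feature to track when the smaller eigenvalue of the local 2x2 gradient structure matrix is large. As a rough illustration only (the paper itself operates on scale-space pyramids of RGB and depth frames, not on a single raw patch), the sketch below computes that minimum-eigenvalue response for a small grayscale patch in pure Python; the function name and patch layout are illustrative, not from the paper.

```python
import math

def shi_tomasi_response(patch):
    """Minimum eigenvalue of the 2x2 gradient structure matrix of a patch.

    `patch` is a list of rows of grayscale intensities. Gradients are
    central differences, accumulated over interior pixels only.
    """
    h, w = len(patch), len(patch[0])
    sxx = sxy = syy = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            iy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    # Closed-form smaller eigenvalue of [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2.0
    return mean - math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)

# A patch containing a corner (intensity steps in both x and y) scores
# high; a uniform patch scores zero, so it would not be tracked.
corner = [[0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 9, 9, 9],
          [0, 0, 9, 9, 9],
          [0, 0, 9, 9, 9]]
flat = [[5] * 5 for _ in range(5)]
```

In practice this response is thresholded over the whole frame (OpenCV's `goodFeaturesToTrack` implements the same criterion), and the surviving keypoints are then tracked across frames with sparse optical flow.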

© 2014 SPIE and IS&T


