基于3D ResNet-LSTM的多视角人体动作识别方法
杨思佳,辛山,刘悦,张雷
(北京建筑大学电气与信息工程学院,北京 100044)

摘要:

在基于视频图像的动作识别中，由于固定视角相机所获取的不同动作视频存在视角差异，会造成识别准确率降低等问题。使用多视角视频图像是提高识别准确率的方法之一，提出基于三维残差网络（3D Residual Network，3D ResNet）和长短时记忆（Long Short-term Memory,LSTM）网络的多视角人体动作识别算法，通过3D ResNet学习各视角动作序列的融合时空特征，利用多层LSTM网络继续学习视频流中的长期活动序列表示并深度挖掘视频帧序列之间的时序信息。在NTU RGB+D 120数据集上的实验结果表明，该模型对多视角视频序列动作识别的准确率可达83.2%。

关键词: 多视角动作识别；大姿态人脸识别；三维残差网络；长短时记忆(LSTM)网络

DOI：10.20079/j.issn.1001-893x.220211002

Funding: Beijing Key Laboratory Project (BZ0337); Advanced Innovation Center for Intelligent Robots and Systems Construction Project (00921917001)

A multi-view human action recognition method based on 3D ResNet-LSTM

YANG Sijia, XIN Shan, LIU Yue, ZHANG Lei

(School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China)

Abstract:

In video-based action recognition, videos of different actions captured by fixed-view cameras differ in viewpoint, which lowers recognition accuracy. Using multi-view video is one way to improve accuracy. This paper therefore proposes a multi-view human action recognition algorithm based on a 3D residual network (3D ResNet) and a long short-term memory (LSTM) network. The 3D ResNet learns fused spatio-temporal features from the action sequences of each view, and a multi-layer LSTM then learns long-term activity sequence representations and mines the temporal information between video frame sequences. Experimental results on the NTU RGB+D 120 dataset show that the model reaches an accuracy of up to 83.2% for action recognition on multi-view video sequences.

Key words: multi-view action recognition; large pose face recognition; 3D residual neural network; long short-term memory (LSTM) network
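
The multi-layer LSTM stage described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the feature vectors below are random stand-ins for the per-step features a 3D ResNet would produce, and the names `LSTMCell`, `run_stacked_lstm`, `FEATURE_DIM`, `HIDDEN_DIM`, and `NUM_STEPS`, as well as all sizes and the weight initialization, are assumptions chosen for illustration. The gate equations are the standard LSTM formulation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Multiply a row-major matrix by a vector.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LSTMCell:
    """One LSTM layer with the standard input, forget, candidate and output gates."""

    GATES = ("i", "f", "g", "o")

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size

        def init(rows, cols):
            # Small random weights; an assumption, not a trained model.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_x = {gate: init(hidden_size, input_size) for gate in self.GATES}
        self.w_h = {gate: init(hidden_size, hidden_size) for gate in self.GATES}
        self.b = {gate: [0.0] * hidden_size for gate in self.GATES}

    def step(self, x, h, c):
        # Pre-activation of each gate: W_x x + W_h h + b.
        pre = {}
        for gate in self.GATES:
            from_x = matvec(self.w_x[gate], x)
            from_h = matvec(self.w_h[gate], h)
            pre[gate] = [a + r + bias for a, r, bias in zip(from_x, from_h, self.b[gate])]
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        cand = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        # New cell state keeps part of the old state and adds gated new content.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_stacked_lstm(features, layers):
    """Pass a feature sequence through stacked LSTM layers; return the top layer's final hidden state."""
    sequence = features
    for layer in layers:
        h = [0.0] * layer.hidden_size
        c = [0.0] * layer.hidden_size
        outputs = []
        for x in sequence:
            h, c = layer.step(x, h, c)
            outputs.append(h)
        # The hidden states of this layer become the input sequence of the next layer.
        sequence = outputs
    return sequence[-1]


rng = random.Random(0)
FEATURE_DIM, HIDDEN_DIM, NUM_STEPS = 8, 4, 5  # hypothetical sizes
# Stand-in for the spatio-temporal features a 3D ResNet would emit per time step.
clip_features = [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)] for _ in range(NUM_STEPS)]
layers = [LSTMCell(FEATURE_DIM, HIDDEN_DIM, rng), LSTMCell(HIDDEN_DIM, HIDDEN_DIM, rng)]
final_state = run_stacked_lstm(clip_features, layers)
print(len(final_state))
```

In a full model, the final hidden state would typically feed a classification layer over the action classes. That layer is omitted here because the abstract does not describe it.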