Hardware Acceleration Design and FPGA Implementation of Dynamic Deep Neural Networks
WANG Peng, REN Yiqun, FAN Yuyang, ZHANG Jiacheng
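(Civil Aviation University of China: a. Key Laboratory of Civil Aircraft Airworthiness Technology; b. College of Electronic Information and Automation; c. College of Safety Science and Engineering, Tianjin 300300, China)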

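Abstract:

Convolutional neural networks implemented on field programmable gate arrays (FPGAs) are widely used in edge devices because of their excellent object recognition capability. However, existing neural network deployments are mostly based on static models, which leads to ineffective feature extraction, increased computation, and reduced frame rates. To address this, an implementation method for dynamic deep neural networks is proposed. By introducing fixed-point model compression and a parallel convolution blocking method, combined with a low-latency data scheduling strategy, efficient convolution computation is achieved. In addition, for the cross-entropy loss function introduced by the network's dynamic exit mechanism, a simplified method suited to hardware implementation is proposed and a dedicated acceleration circuit is designed. Using the proposed method, a ResNet110 network with dynamic depth is deployed on the Xilinx xc7z030 platform. The platform performs up to 2.78×10⁴ MOPS (million operations per second) of multiply-accumulate operations and supports 1.25 MOPS of natural exponential operations and 0.125 MOPS of logarithmic operations. The speedup reaches 287% relative to the i7-5960x processor and 145% relative to the NVIDIA TITAN X processor.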
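The fixed-point compression mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the word length or the rounding scheme, so the 16-bit word with 8 fractional bits, the saturation behavior, and the function names `quantize`, `dequantize`, and `fixed_point_mac` are all assumptions made for illustration, not the authors' design.

```python
def quantize(values, frac_bits=8, total_bits=16):
    """Convert floats to signed fixed-point integers, saturating on overflow.

    frac_bits and total_bits are assumed values; the paper's format is not given here.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(ints, frac_bits=8):
    """Convert fixed-point integers back to floats."""
    scale = 1 << frac_bits
    return [q / scale for q in ints]


def fixed_point_mac(weights_q, inputs_q, frac_bits=8):
    """Integer multiply-accumulate, the core operation of a convolution.

    Each product carries 2 * frac_bits fractional bits, so the accumulated
    sum is shifted right once to return to frac_bits.
    """
    acc = sum(w * x for w, x in zip(weights_q, inputs_q))
    return acc >> frac_bits


weights = quantize([0.5, -1.25])
inputs = quantize([1.0, 1.0])
result = dequantize([fixed_point_mac(weights, inputs)])[0]
print(result)  # 0.5 * 1.0 + (-1.25) * 1.0 = -0.75
```

Keeping the accumulator wide and shifting only once after the sum is a common choice because it avoids losing precision on every product; whether the paper does the same is not stated in the abstract.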

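Keywords: edge device; dynamic deep neural network; dynamic exit mechanism; hardware acceleration; acceleration circuit

DOI: 10.20079/j.issn.1001-893x.220819003

Funding: National Key Research and Development Program of China (2021YFB1600600); Fundamental Research Funds for the Central Universities (XJ2021003601)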

Design and FPGA Implementation of Dynamic Deep Neural Network Hardware Acceleration

WANG Peng (c), REN Yiqun (b)

(a. Key Laboratory of Civil Aircraft Airworthiness Technology; b. College of Electronic Information and Automation; c. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China)

Abstract:

Convolutional neural networks (CNNs) implemented on field programmable gate arrays (FPGAs) are widely used in edge devices because of their excellent object recognition capability. However, existing neural network deployments are mostly based on static models, resulting in ineffective feature extraction, increased computation, and reduced frame rates. To solve these problems, the authors propose a deployment method for a dynamic deep neural network. Efficient convolution computation is achieved by introducing fixed-point model compression and a parallel convolution blocking method, combined with a low-latency data scheduling strategy. In addition, a simplified, hardware-friendly method is proposed for the cross-entropy loss function used in the network's dynamic exit mechanism, and a dedicated acceleration circuit is designed. Using the proposed method, a dynamic-depth ResNet110 is deployed on the Xilinx xc7z030 platform, which reaches 2.78×10⁴ million operations per second (MOPS) for multiply-accumulate (MAC) operations, 1.25 MOPS for natural exponential operations, and 0.125 MOPS for logarithmic operations. The acceleration ratio is 287% relative to the i7-5960x processor and 145% relative to the NVIDIA TITAN X processor.
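A dynamic exit mechanism lets an input leave the network at an intermediate branch once the prediction is confident enough, which is why exponential and logarithmic units are needed alongside the MAC array. The abstract does not spell out the exit criterion, so the sketch below swaps in a common stand-in: the Shannon entropy of the softmax output compared against a threshold. The threshold value, the function names, and the use of entropy rather than the paper's simplified cross-entropy form are assumptions for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax; uses the natural exponential."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; uses the natural logarithm."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def run_with_early_exit(exit_logits, threshold=0.5):
    """Pick the first exit branch whose output entropy is below the threshold.

    exit_logits holds one logit vector per exit branch, in depth order.
    The last branch is always accepted. Returns (exit_index, predicted_class).
    """
    if not exit_logits:
        raise ValueError("at least one exit branch is required")
    last = len(exit_logits) - 1
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        if entropy(probs) < threshold or i == last:
            return i, probs.index(max(probs))


# The first branch is nearly uniform (uncertain); the second is confident.
branches = [[1.0, 1.1, 0.9], [8.0, 0.5, 0.2]]
print(run_with_early_exit(branches))  # (1, 0)
```

In this sketch an easy input that is already confident at a shallow branch skips all deeper layers, which is the source of the computation savings a dynamic-depth network aims for.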

Key words: edge device; dynamic deep neural network; dynamic exit mechanism; hardware acceleration; acceleration circuit