FPGA parallel acceleration design and implementation of convolutional neural network
MAN Tao, GUO Zihao, QU Zhijian
(School of Computer Science and Technology, Shandong University of Technology, Zibo 255049, China)
Abstract:
To improve the speed and energy efficiency of convolutional neural networks (CNNs) running on current hardware devices, a pipelined parallel acceleration scheme based on a field-programmable gate array (FPGA) is proposed for mainstream convolutional neural networks. The design optimizes the data storage module, the convolution computation module, the pooling module and the fully connected module, and uses high-level synthesis to build the basic CNN units on the FPGA. To reduce the hardware overhead of the acceleration system, the network parameters are quantized from 32-bit floating-point numbers to 16-bit fixed-point numbers, with only a small loss of CNN accuracy. The system is tested on the MNIST and CIFAR-10 data sets. The experimental results show that the proposed FPGA-based CNN accelerator achieves faster recognition, delivers better performance with fewer resources and lower power consumption, and makes efficient use of the hardware resources on the FPGA.
Key words: convolutional neural network; FPGA; parallel acceleration; high-level synthesis; fixed-point quantization
Fund projects: Natural Science Foundation of Shandong Province (ZR2016FM18, ZR2017LF004); Youth Innovation Team Development Program of Shandong Province Higher Education Institutions (2019KJN48)
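
As a rough illustration of the 32-bit float to 16-bit fixed-point quantization mentioned in the abstract: the abstract does not state the exact fixed-point format or rounding mode, so the Q8.8 split, the function names quantize/dequantize, and the sample weight values below are assumptions for illustration only, not the authors' implementation. A minimal C++ sketch of the conversion:

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// Assumed Q8.8 format: 1 sign bit, 7 integer bits, 8 fractional bits.
// The abstract only states 32-bit float -> 16-bit fixed point; the exact
// split used on the FPGA is not given, so this choice is an assumption.
constexpr int kFracBits = 8;
constexpr float kScale = static_cast<float>(1 << kFracBits);

// Quantize one 32-bit floating-point weight to a saturating 16-bit fixed-point value.
int16_t quantize(float w) {
    float scaled = std::round(w * kScale);
    scaled = std::clamp(scaled, -32768.0f, 32767.0f);  // saturate to int16_t range
    return static_cast<int16_t>(scaled);
}

// Recover an approximate float, e.g. to estimate the accuracy loss offline.
float dequantize(int16_t q) {
    return static_cast<float>(q) / kScale;
}

int main() {
    std::vector<float> weights = {0.7312f, -1.2045f, 0.0052f};  // illustrative values
    for (float w : weights) {
        int16_t q = quantize(w);
        std::cout << w << " -> " << q << " -> " << dequantize(q) << "\n";
    }
    return 0;
}

On the FPGA side the same conversion would typically be expressed with an HLS fixed-point data type rather than manual scaling; the sketch above only shows the underlying arithmetic and the saturation step.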