Abstract:
To address the large storage footprint, high computational complexity, and tight power budget involved in deploying convolutional neural networks (CNNs) on edge devices, an efficient CNN accelerator based on ZYNQ is proposed. First, a line buffer with dynamically configurable depth makes efficient use of on-chip memory resources. Second, to raise computational efficiency, a shared digital signal processor (DSP) computing scheme built on integer quantization is designed, enabling a single DSP to perform two signed INT8 multiplications. Third, a data rearrangement scheme is proposed that improves data transfer efficiency, reduces bandwidth accesses, and lowers the complexity of memory addressing. The VGG16 model was deployed on a ZU5EV device to evaluate the accelerator; the results show a throughput of 133.35 GOPS, a computational density of 0.45 GOPS/DSP, and an energy efficiency of 39.57 GOPS/W.
Keywords: edge device; convolutional neural network (CNN); hardware accelerator; data rearrangement
DOI: 10.20079/j.issn.1001-893x.241009004
|
Funding: General Program of the Natural Science Foundation of Chongqing (CSTB2023NSCQ-MSX0249, CSTB2023NSCQ-MSX0832, CSTB2023NSCQ-LZX0126, CSTB2023NSCQ-LZX0014); Science and Technology Research Program of the Chongqing Municipal Education Commission (KJQN202300615); Chongqing Postgraduate Research and Innovation Project (CYS23454)
|
Design of an Efficient Convolutional Neural Network Accelerator Based on ZYNQ
GONG Guichuan, XIE Liangbo, HUANG Qian, ZHOU Mu
(School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
Abstract:
To address the challenges of high memory requirements, computational complexity, and power limitations when deploying convolutional neural networks (CNNs) on edge devices, an efficient CNN accelerator based on ZYNQ is proposed. Firstly, a depth-configurable line buffer design is introduced to enable efficient use of on-chip memory resources. Secondly, to improve computational efficiency, an integer quantization technique is adopted, and a shared digital signal processor (DSP) computing scheme is designed to allow a single DSP to support two signed INT8 multiplications. Additionally, a data rearrangement scheme is proposed to enhance data transfer efficiency, reduce bandwidth access, and lower memory addressing complexity. The accelerator's performance is evaluated by deploying the VGG16 model on a ZU5EV device. Experimental results demonstrate that the proposed accelerator achieves a throughput of 133.35 GOPS, a computational efficiency of 0.45 GOPS/DSP, and an energy efficiency ratio of 39.57 GOPS/W.
Key words: edge device; convolutional neural network (CNN); hardware accelerator; data rearrangement
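The shared-DSP scheme named in the abstract (one DSP supporting two signed INT8 multiplications) is not detailed on this page. A common way to achieve this on wide FPGA multipliers is to pack two INT8 operands into a single wide input, perform one multiplication, and split the product back into two results. The sketch below is a software emulation of that arithmetic only, not the paper's actual implementation; the helper name `packed_mul` and the 18-bit field width are assumptions.

```python
def packed_mul(a, b, w):
    """Emulate one wide multiplication yielding two signed INT8 products.

    Packs activations a and b into one operand (a << 18) + b and multiplies
    once by the shared weight w, so p = a*w*2^18 + b*w. The low 18 bits then
    hold b*w (after sign extension) and the remaining high bits hold a*w.
    The 18-bit field width is an assumption; it only needs to exceed the
    bits required for a signed INT8 x INT8 product.
    """
    for x in (a, b, w):
        assert -128 <= x <= 127, "operands must be signed INT8"
    p = ((a << 18) + b) * w        # single wide multiplication (one DSP)
    bw = p & 0x3FFFF               # low 18-bit field
    if bw & 0x20000:               # sign-extend the 18-bit product b*w
        bw -= 1 << 18
    aw = (p - bw) >> 18            # high field is a*w once b*w is removed
    return aw, bw
```

For example, `packed_mul(-5, 7, -3)` returns `(15, -21)`, i.e. (-5)x(-3) and 7x(-3); the split is exact for all signed INT8 operand triples because |b*w| is at most 16 384, well inside the 18-bit field.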