Cite this article:
GONG Guichuan, XIE Liangbo, HUANG Qian, et al. Design of an Efficient Convolutional Neural Network Accelerator Based on ZYNQ[J]. Telecommunication Engineering, 2026, 66(2): - .
Design of an Efficient Convolutional Neural Network Accelerator Based on ZYNQ
GONG Guichuan, XIE Liangbo, HUANG Qian, ZHOU Mu
(School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
Abstract:
To address the challenges of high memory requirements, high computational complexity, and limited power budgets when deploying convolutional neural networks (CNNs) on edge devices, an efficient CNN accelerator based on ZYNQ is proposed. First, a line buffer with dynamically configurable depth is designed to make efficient use of on-chip memory resources. Second, to improve computational efficiency, a shared digital signal processor (DSP) computing scheme based on integer quantization is designed, allowing a single DSP to perform two signed INT8 multiplications. Finally, a data rearrangement scheme is proposed that improves data transfer efficiency, reduces bandwidth accesses, and lowers memory addressing complexity. The accelerator is evaluated by deploying the VGG16 model on a ZU5EV device; results show a throughput of 133.35 GOPS, a computational density of 0.45 GOPS/DSP, and an energy efficiency of 39.57 GOPS/W.
Key words: edge device; convolutional neural network (CNN); hardware accelerator; data rearrangement
DOI:10.20079/j.issn.1001-893x.241009004
Funding: General Program of the Natural Science Foundation of Chongqing (CSTB2023NSCQ-MSX0249, CSTB2023NSCQ-MSX0832, CSTB2023NSCQ-LZX0126, CSTB2023NSCQ-LZX0014); Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202300615); Chongqing Postgraduate Research Innovation Project (CYS23454)
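The shared-DSP scheme packs two signed INT8 multiplications by a common operand into one wide multiplication. A minimal behavioral sketch in Python, assuming one operand is shifted into the upper bits with an 18-bit guard band so the lower product can be recovered with a sign correction (the function name and exact operand layout are our illustration, not necessarily the paper's):

```python
def dual_int8_mul(a1, a2, b):
    """Emulate one wide multiply that yields two signed INT8 products.

    packed = a1 * 2^18 + a2, so packed * b = (a1*b) * 2^18 + a2*b.
    Since |a2*b| <= 16384 < 2^17, the low 18 bits hold a2*b exactly
    once they are reinterpreted as a signed 18-bit value.
    """
    packed = (a1 << 18) + a2           # one wide operand, two INT8 values
    p = packed * b                     # single wide multiplication
    low = p & ((1 << 18) - 1)          # lower 18 bits
    if low >= (1 << 17):               # sign correction for a negative a2*b
        low -= 1 << 18
    prod2 = low                        # = a2 * b
    prod1 = (p - low) >> 18            # = a1 * b
    return prod1, prod2
```

In hardware the same idea maps the wide multiplication onto a single DSP block, halving the DSP count for pairs of multiplications that share an operand; the guard band just has to be wider than the low product plus its sign bit.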
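The configurable-depth line buffer can also be modeled behaviorally: k-1 row buffers, each with depth set to the current feature map's width, feed a k x k window of shift registers as pixels stream in row-major order. A Python sketch of this standard structure (parameter names are ours; the paper's exact design may differ):

```python
def line_buffer_windows(stream, width, k=3):
    """Stream pixels row-major and emit every k x k sliding window.

    bufs models k-1 row buffers whose depth equals `width`, so the same
    storage can be reconfigured per layer instead of sized for the
    largest feature map.
    """
    bufs = [[0] * width for _ in range(k - 1)]   # k-1 row buffers
    win = [[0] * k for _ in range(k)]            # k x k window registers
    row, col, out = 0, 0, []
    for px in stream:
        # column leaving the buffers: the k-1 previous rows plus px
        column = [bufs[r][col] for r in range(k - 1)] + [px]
        # shift the column one row buffer up, write px into the newest
        for r in range(k - 2):
            bufs[r][col] = bufs[r + 1][col]
        bufs[k - 2][col] = px
        # shift the window left and insert the new column on the right
        for r in range(k):
            win[r] = win[r][1:] + [column[r]]
        if row >= k - 1 and col >= k - 1:        # window is fully valid
            out.append([w[:] for w in win])
        col += 1
        if col == width:                         # wrap to the next row
            col, row = 0, row + 1
    return out
```

Each input pixel is read from external memory exactly once; all k x k reuse comes from the on-chip row buffers, which is what makes the configurable depth pay off across layers of different widths.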