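Telecommunication Engineering (《电讯技术》), a Peking University Chinese core journal and a Chinese science and technology core journal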


Cite this article:

ZHU Mingda, XUE Jiqing, AI Chunyao. Design of an ARM and FPGA Heterogeneous Accelerator for SpMV Computation[J]. Telecommunication Engineering (电讯技术), 2024, (2): 302-309.


Design of an ARM and FPGA Heterogeneous Accelerator for SpMV Computation
ZHU Mingda (朱明达), XUE Jiqing (薛济擎), AI Chunyao (艾纯瑶)
(College of Information Science and Engineering, China University of Petroleum, Beijing, Beijing 102249)

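Abstract:

To address the low efficiency of sparse matrix-vector multiplication (SpMV) on edge devices, this work studies sparse-matrix storage formats and field programmable gate array (FPGA) acceleration of SpMV, and proposes an acceleration method that combines a multi-port modified compressed sparse row (MCSR) storage format with task-level and data-level hardware optimization on an ARM+FPGA architecture. Multiple ports access data in parallel to raise computational parallelism; dataflow and loop pipelining provide parallel acceleration across and within loops; array partitioning and stream transfer enable fine-grained parallel buffering and computation of data; and in the ARM+FPGA architecture, the ARM controls the system and offloads the computation to the FPGA for parallel acceleration. Experimental results show that the optimized ARM+FPGA scheme achieves up to a 10x speedup over the ARM-only scheme, that the additional resource consumption stays within an acceptable range, and that the speedup becomes more pronounced as the matrix grows and the number of non-zero values increases. The results have some practical value for implementing SpMV computation at the edge.

Keywords: sparse matrix-vector multiplication (SpMV); heterogeneous accelerator; hardware acceleration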
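To make the computation concrete, the following is a minimal host-side sketch of SpMV over the standard compressed sparse row (CSR) layout, with the rows split into contiguous blocks as a rough stand-in for "multiple ports". It is not the paper's MCSR format, whose layout the abstract does not describe; the names `CsrMatrix`, `dense_to_csr`, `spmv_rows`, and `spmv_multiport`, and the contiguous row split, are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Standard CSR storage (assumption: plain CSR, not the paper's MCSR).
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> row_ptr;  // rows + 1 entries; row r spans [row_ptr[r], row_ptr[r+1])
    std::vector<std::size_t> col_idx;  // column index of each stored non-zero
    std::vector<double> values;        // value of each stored non-zero
};

// Build a CSR matrix from a dense row-major matrix, dropping zero entries.
CsrMatrix dense_to_csr(const std::vector<std::vector<double>>& dense) {
    CsrMatrix m;
    m.rows = dense.size();
    m.cols = dense.empty() ? 0 : dense[0].size();
    m.row_ptr.push_back(0);
    for (const auto& row : dense) {
        for (std::size_t j = 0; j < row.size(); ++j) {
            if (row[j] != 0.0) {
                m.col_idx.push_back(j);
                m.values.push_back(row[j]);
            }
        }
        m.row_ptr.push_back(m.values.size());
    }
    return m;
}

// Compute y[r] = sum_k values[k] * x[col_idx[k]] for rows in [row_begin, row_end).
void spmv_rows(const CsrMatrix& a, const std::vector<double>& x,
               std::vector<double>& y, std::size_t row_begin, std::size_t row_end) {
    for (std::size_t r = row_begin; r < row_end; ++r) {
        double acc = 0.0;
        for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
            acc += a.values[k] * x[a.col_idx[k]];
        }
        y[r] = acc;
    }
}

// Hypothetical "multi-port" split: each port owns one contiguous block of rows.
// The blocks are independent, so hardware could process them concurrently;
// here they simply run one after another on the host.
std::vector<double> spmv_multiport(const CsrMatrix& a, const std::vector<double>& x,
                                   std::size_t num_ports) {
    assert(x.size() == a.cols);
    std::vector<double> y(a.rows, 0.0);
    if (num_ports == 0) num_ports = 1;
    const std::size_t chunk = (a.rows + num_ports - 1) / num_ports;  // rows per port, rounded up
    for (std::size_t p = 0; p < num_ports; ++p) {
        const std::size_t begin = p * chunk;
        if (begin >= a.rows) break;
        const std::size_t end = std::min(a.rows, begin + chunk);
        spmv_rows(a, x, y, begin, end);
    }
    return y;
}
```

Because every output row depends only on its own non-zeros and on the shared vector `x`, the result is the same for any value of `num_ports`; a contiguous row split does not balance non-zeros across ports, which a real design would likely need to consider.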

DOI：10.20079/j.issn.1001-893x.220917001

Funding:

Design of an ARM and FPGA Heterogeneous Accelerator for SpMV Computation

ZHU Mingda, XUE Jiqing, AI Chunyao

(College of Information Science and Engineering, China University of Petroleum, Beijing 102249, China)

Abstract:

To address the inefficient implementation of sparse matrix-vector multiplication (SpMV) at the edge, the authors study the storage format of sparse matrices and the field programmable gate array (FPGA) acceleration of SpMV, and propose an acceleration method that combines a multi-port modified compressed sparse row (MCSR) format with task-level and data-level hardware optimization in an ARM+FPGA architecture. Computational parallelism is improved by using multiple ports to access data in parallel. Parallel acceleration between and within loops is achieved using dataflow and loop pipelining. Fine-grained parallel caching and computation of data are achieved using array partitioning and stream transfer. In the ARM+FPGA architecture, the ARM controls the system and offloads the computation to the FPGA for parallel acceleration. Experimental results show that the optimized ARM+FPGA scheme achieves up to a 10x speedup over the ARM-only scheme, and that the increased resource consumption is within an acceptable range. The results also show that the larger the matrix and the more non-zero values it contains, the more pronounced the speedup. The research results are of practical value for implementing SpMV computation at the edge.

Keywords: sparse matrix-vector multiplication (SpMV); heterogeneous accelerator; hardware acceleration
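The dataflow and stream-transfer ideas named in the abstract can be sketched as a three-stage pipeline connected by FIFOs. This is a host-runnable illustration under stated assumptions, not the authors' kernel: `std::queue` stands in for a hardware FIFO such as the `hls::stream` class in Vitis HLS, the stage names are hypothetical, and the HLS pragmas appear only as comments marking where such directives would typically be placed.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Host-side stand-in for a hardware FIFO (assumption: replaces an HLS stream type).
template <typename T>
using Stream = std::queue<T>;

// Stage 1: read the CSR arrays and stream out per-row lengths and element products.
void load_stage(const std::vector<std::size_t>& row_ptr,
                const std::vector<std::size_t>& col_idx,
                const std::vector<double>& values,
                const std::vector<double>& x,
                Stream<std::size_t>& row_len, Stream<double>& products) {
    const std::size_t rows = row_ptr.size() - 1;
    for (std::size_t r = 0; r < rows; ++r) {
        row_len.push(row_ptr[r + 1] - row_ptr[r]);
        for (std::size_t k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
            // In an HLS flow: #pragma HLS PIPELINE II=1
            products.push(values[k] * x[col_idx[k]]);
        }
    }
}

// Stage 2: consume the products of each row and emit one sum per row.
void accumulate_stage(std::size_t rows, Stream<std::size_t>& row_len,
                      Stream<double>& products, Stream<double>& sums) {
    for (std::size_t r = 0; r < rows; ++r) {
        const std::size_t n = row_len.front();
        row_len.pop();
        double acc = 0.0;  // empty rows produce 0
        for (std::size_t k = 0; k < n; ++k) {
            acc += products.front();
            products.pop();
        }
        sums.push(acc);
    }
}

// Stage 3: drain the row sums into the output vector.
void store_stage(std::size_t rows, Stream<double>& sums, std::vector<double>& y) {
    for (std::size_t r = 0; r < rows; ++r) {
        y[r] = sums.front();
        sums.pop();
    }
}

// Top level: the three stages communicate only through streams, which is the
// structure a dataflow directive needs in order to overlap them in hardware.
std::vector<double> spmv_dataflow(const std::vector<std::size_t>& row_ptr,
                                  const std::vector<std::size_t>& col_idx,
                                  const std::vector<double>& values,
                                  const std::vector<double>& x) {
    assert(!row_ptr.empty());
    assert(col_idx.size() == values.size());
    const std::size_t rows = row_ptr.size() - 1;
    Stream<std::size_t> row_len;
    Stream<double> products;
    Stream<double> sums;
    std::vector<double> y(rows, 0.0);
    // In an HLS flow: #pragma HLS DATAFLOW
    load_stage(row_ptr, col_idx, values, x, row_len, products);
    accumulate_stage(rows, row_len, products, sums);
    store_stage(rows, sums, y);
    return y;
}
```

On a CPU the stages run one after another and the queues buffer all intermediate values; in hardware the same structure would let the stages overlap, with the FIFOs holding only a few elements at a time. Sending row lengths on a separate stream is one simple way to handle empty rows without a per-element end-of-row flag.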