| Cite this article: |
- HE Jiaxing, LI Yanwen, LI Rou, et al. Design and Implementation of FFT Operator for Neural Network Processor—Taking Ascend910 as an Example[J]. Telecommunication Engineering, 2025, (12): 2160-2172.
|
|
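| Abstract: |
| To implement a fast Fourier transform (FFT) operator on neural network processors, this paper examines high-performance, high-precision parallel computation of the FFT algorithm on such processors. Taking the Huawei Ascend 910 (Ascend910) neural network processor as an example, and building on Huawei's Compute Architecture for Neural Network (CANN), the authors design a high-performance FFT scheme that combines cache slicing, efficient transposition, and vectorized butterfly computation, and that supports FFTs of complex sequences of arbitrary length in half precision and single precision. Experimental results show that, once the sequence reaches a certain length, both accuracy and performance exceed those of a central processing unit (CPU). Compared with NVIDIA's Compute Unified Device Architecture fast Fourier transform library (cuFFT), performance and accuracy at typical lengths of half-precision data improve by up to 16.5 times and 48%, respectively. |
| Keywords: neural network processor; fast Fourier transform; Ascend DaVinci architecture; domain-specific architecture; Compute Architecture for Neural Network |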
| DOI:10.20079/j.issn.1001-893x.240607004 |
|
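| Funding: National Natural Science Foundation of China (62275222, 61901397); Fundamental Research Funds for the Central Universities (2682020CX87); Sichuan Science and Technology Program (2020YJ0014); Southwest Jiaotong University Seed Fund (2682021GF027) |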
|
| Design and Implementation of FFT Operator for Neural Network Processor—Taking Ascend910 as an Example |
| HE Jiaxing, LI Yanwen, LI Rou, ZHAI Pinghua, LI Yang, ZOU Xihua, PAN Wei, YAN Lianshan |
| (School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China) |
| Abstract: |
| To realize a fast Fourier transform (FFT) operator for neural network processors, the authors discuss high-performance, high-precision parallel computation of the FFT algorithm on such processors. Taking the Huawei Ascend910 neural network processor as an example, and based on Huawei's Compute Architecture for Neural Network (CANN), a high-performance FFT computing solution with cache slicing, efficient transposition, and vectorized butterfly calculations is designed, and FFT calculations for complex sequences of arbitrary length in half precision and single precision are realized. Experimental results show that both accuracy and performance exceed those of a central processing unit (CPU) once the sequences reach a certain length. Furthermore, compared with the NVIDIA Compute Unified Device Architecture fast Fourier transform library (cuFFT), the proposed solution improves performance by up to 16.5 times and accuracy by up to 48% at typical lengths of half-precision data. |
| Key words: neural network processor; fast Fourier transform; Ascend DaVinci architecture; domain-specific architecture; Compute Architecture for Neural Network |
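
The "vectorized butterfly calculation" named in the abstract can be illustrated with a minimal sketch. This is not the authors' Ascend910/CANN implementation, which the abstract does not show. It is a plain-Python radix-2 decimation-in-time FFT in which each stage updates whole slices at once, standing in for vector instructions. The function names `bit_reverse`, `fft_radix2`, and `dft` are hypothetical. The sketch assumes a power-of-two length, whereas the paper reports support for arbitrary lengths.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Iterative radix-2 FFT (assumption: len(x) is a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch only handles power-of-two lengths")
    bits = n.bit_length() - 1
    # Reorder the input so that every stage can work on contiguous slices.
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    half = 1
    while half < n:
        # Twiddle factors for this stage, computed once and shared by all blocks.
        w = [cmath.exp(-1j * cmath.pi * k / half) for k in range(half)]
        for start in range(0, n, 2 * half):
            top = a[start:start + half]
            bot = [w[k] * a[start + half + k] for k in range(half)]
            # Butterfly on whole slices: (top + w*bot, top - w*bot).
            a[start:start + half] = [t + b for t, b in zip(top, bot)]
            a[start + half:start + 2 * half] = [t - b for t, b in zip(top, bot)]
        half *= 2
    return a


def dft(x):
    """Naive O(n^2) DFT, used only as a reference for checking the sketch."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

On a real vector unit, the two slice updates inside the loop would map to element-wise multiply, add, and subtract instructions over contiguous buffers. That mapping is an assumption of this sketch, not a detail taken from the paper.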