Cite this article:
  • HU Shaodong, YUAN Wenhao, SHI Yunlong (胡少东, 袁文浩, 时云龙). Convolutional gated recurrent network speech enhancement by integrating self-attention[J]. Telecommunication Engineering (电讯技术), 2022, (7): - .



Convolutional gated recurrent network speech enhancement by integrating self-attention
HU Shaodong, YUAN Wenhao, SHI Yunlong
(School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China)
DOI:
Funding: National Natural Science Foundation of China (61701286)
Abstract:
Speech time-frequency features exhibit time dependence, local relevance, and global relevance, so traditional neural network structures are not fully suited to time-frequency domain speech enhancement. To address this, the fully connected layers in the gated recurrent unit (GRU) network are first replaced with convolutional layers to build a convolutional gated recurrent network. This lets the network extract the local relevance of the frequency dimension while modeling the time dimension, which a plain GRU network cannot do. Because convolutional layers cannot extract the global relevance of the frequency dimension, the ability of the attention mechanism to focus on global relevance is then used to overcome this limitation of the convolutional gated recurrent network. Finally, a self-attention convolutional gated recurrent network that deeply integrates the self-attention mechanism is proposed. Experiments show that, by attending to these multiple properties of time-frequency features, the network effectively improves speech enhancement performance.
Key words:  speech enhancement  global relevance  gated recurrent unit  self-attention
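
The two ideas in the abstract (convolutions inside the GRU gates for local frequency structure, and self-attention across frequency bins for global structure) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names (`conv1d_same`, `freq_self_attention`, `conv_gru_step`, `run`), the layer sizes, the random weights, and in particular the choice to concatenate the attention output with the input frame before the recurrent step are all assumptions made for illustration; the abstract does not say how the self-attention is fused into the network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_same(x, w):
    """Cross-correlation along the frequency axis with zero 'same' padding.
    x: (c_in, F); w: (c_out, c_in, k) with odd k. Returns (c_out, F)."""
    k = w.shape[2]
    n_bins = x.shape[1]
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    out = np.zeros((w.shape[0], n_bins))
    for i in range(k):
        # Each kernel tap mixes channels at one frequency offset.
        out += w[:, :, i] @ xp[:, i:i + n_bins]
    return out

def freq_self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention in which every frequency bin
    attends to every other bin. x: (c, F); wq, wk, wv: (d, c). Returns (d, F)."""
    q, k, v = wq @ x, wk @ x, wv @ x
    scores = (q.T @ k) / np.sqrt(q.shape[0])      # (F, F)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # each row sums to 1
    return v @ attn.T

def conv_gru_step(x, h, p):
    """One GRU step whose gate transforms are convolutions over frequency
    instead of fully connected layers. x: (c_x, F); h: (c_h, F)."""
    xh = np.concatenate([x, h], axis=0)
    z = sigmoid(conv1d_same(xh, p["wz"]) + p["bz"])   # update gate
    r = sigmoid(conv1d_same(xh, p["wr"]) + p["br"])   # reset gate
    xrh = np.concatenate([x, r * h], axis=0)
    h_new = np.tanh(conv1d_same(xrh, p["wh"]) + p["bh"])
    return (1.0 - z) * h + z * h_new

def run(spec, c_h=4, d=2, k=3, seed=0):
    """Run the sketch over a (T, F) feature matrix with random, untrained
    weights. Returns the hidden states with shape (T, c_h, F)."""
    rng = np.random.default_rng(seed)
    n_bins = spec.shape[1]
    wq, wk, wv = (rng.normal(scale=0.5, size=(d, 1)) for _ in range(3))
    c_x = 1 + d  # input frame plus attention channels (assumed fusion)
    p = {n: rng.normal(scale=0.1, size=(c_h, c_x + c_h, k))
         for n in ("wz", "wr", "wh")}
    p.update({b: np.zeros((c_h, 1)) for b in ("bz", "br", "bh")})
    h = np.zeros((c_h, n_bins))
    states = []
    for frame in spec:
        x = frame[None, :]                            # (1, F)
        a = freq_self_attention(x, wq, wk, wv)        # global frequency context
        h = conv_gru_step(np.concatenate([x, a], axis=0), h, p)
        states.append(h)
    return np.stack(states)
```

Calling `run` on a `(T, F)` array of spectral features returns one hidden state per frame. The recurrence carries the time dependence, the convolution kernel of width `k` sees only neighbouring frequency bins, and the attention matrix of shape `(F, F)` lets each bin draw on the whole spectrum.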