Abstract: |
Characteristics of speech time-frequency features such as temporal dependence, local correlation, and global correlation mean that conventional neural network structures are not fully suited to the time-frequency domain speech enhancement task. To address this, convolutional layers are first used to replace the fully connected layers in the gated recurrent unit (GRU) network, forming a convolutional gated recurrent network and solving the problem that the GRU network cannot extract local correlation along the frequency dimension while modeling the time dimension. Because convolutional layers still cannot extract global correlation along the frequency dimension, the attention mechanism's ability to capture global correlation is further exploited, solving the problem that the convolutional gated recurrent network cannot attend to global correlation along the frequency dimension. Finally, a self-attention convolutional gated recurrent network that deeply fuses the self-attention mechanism is proposed. Experiments show that, by attending to these multiple characteristics of time-frequency features, the network effectively improves speech enhancement performance.
Keywords: speech enhancement; global correlation; gated recurrent unit; self-attention
DOI: |
|
Funding: National Natural Science Foundation of China (61701286)
|
Convolutional gated recurrent network with integrated self-attention for speech enhancement
HU Shaodong, YUAN Wenhao, SHI Yunlong
(School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China)
Abstract: |
The temporal dependence, local correlation, global correlation, and other characteristics of speech time-frequency features make traditional neural network structures unable to fully adapt to the time-frequency domain speech enhancement task. To solve this problem, the convolutional layer is first used to replace the fully connected layer in the gated recurrent unit (GRU) to build a convolutional gated recurrent network, which solves the problem that the GRU cannot extract the local correlation of the frequency dimension when modeling the time dimension. Because the convolutional layer cannot extract the global correlation of the frequency dimension, the ability of the attention mechanism to focus on global correlation is further used to solve the problem that the convolutional gated recurrent network cannot attend to the global correlation of the frequency dimension. Finally, a self-attention convolutional gated recurrent network with deep fusion of the self-attention mechanism is proposed. Experiments show that the network effectively improves speech enhancement performance by attending to multiple characteristics of time-frequency features.
Key words: speech enhancement; global correlation; gated recurrent unit; self-attention
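
As a concrete illustration of the architecture the abstract describes, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation; all class names, layer sizes, and kernel widths are assumptions made for the example): the fully connected gate transforms of a GRU are replaced by 1-D convolutions along the frequency axis, and multi-head self-attention is then applied across frequency bins to capture global frequency correlation.

# Illustrative sketch only: a GRU-style recurrence whose gates are 1-D
# convolutions along the frequency axis, followed by self-attention over
# frequency bins. Shapes and hyperparameters are invented for illustration.
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    """GRU cell with the fully connected gate transforms replaced by Conv1d,
    so local correlation along the frequency dimension is preserved."""

    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Each conv maps the concatenated [input, hidden state] to gate maps.
        self.zr_conv = nn.Conv1d(in_ch + hid_ch, 2 * hid_ch, kernel_size, padding=pad)
        self.h_conv = nn.Conv1d(in_ch + hid_ch, hid_ch, kernel_size, padding=pad)

    def forward(self, x, h):
        # x, h: (batch, channels, freq_bins)
        zr = torch.sigmoid(self.zr_conv(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)                        # update and reset gates
        h_tilde = torch.tanh(self.h_conv(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde                 # new hidden state


class SelfAttnConvGRU(nn.Module):
    """Runs the ConvGRU cell over time frames, then applies self-attention
    across frequency bins to model global frequency correlation."""

    def __init__(self, in_ch=1, hid_ch=32, n_heads=4):
        super().__init__()
        self.cell = ConvGRUCell(in_ch, hid_ch)
        self.attn = nn.MultiheadAttention(hid_ch, n_heads, batch_first=True)
        self.out = nn.Conv1d(hid_ch, 1, kernel_size=1)   # per-bin output (e.g. a mask)

    def forward(self, spec):
        # spec: (batch, time, freq_bins) noisy magnitude spectrogram
        b, t, f = spec.shape
        h = spec.new_zeros(b, self.cell.h_conv.out_channels, f)
        outputs = []
        for step in range(t):                            # recurrence over time frames
            x = spec[:, step, :].unsqueeze(1)            # (batch, 1, freq)
            h = self.cell(x, h)
            q = h.transpose(1, 2)                        # (batch, freq, hid_ch)
            a, _ = self.attn(q, q, q)                    # attention across freq bins
            outputs.append(self.out(a.transpose(1, 2)))  # (batch, 1, freq)
        return torch.stack(outputs, dim=1).squeeze(2)    # (batch, time, freq)


if __name__ == "__main__":
    model = SelfAttnConvGRU()
    noisy = torch.randn(2, 100, 161)                     # 2 clips, 100 frames, 161 bins
    enhanced = model(noisy)
    print(enhanced.shape)                                # torch.Size([2, 100, 161])

Within each time step, the convolutional gates see only a local frequency neighborhood (kernel_size bins), while the attention layer lets every bin attend to every other bin, which is the division of labor between local and global frequency correlation described in the abstract.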