Abstract:
To address the limited performance and poor explainability of reinforcement-learning-based dynamic spectrum access models, a dynamic spectrum access method based on weight analysis is proposed. A reservoir computing (RC) network replaces the traditional deep Q-learning network (DQN) to simplify the network structure and improve computational efficiency. In addition, an explainability method based on weight analysis is introduced: heat maps generated from the network weights reveal the neural network's perception of and preference for different channels, thereby improving the model's explainability. Simulation results show that, in a multi-user environment, the proposed algorithm significantly outperforms traditional reinforcement learning algorithms such as Q-learning on key metrics including average success rate, average collision rate, and average reward. Compared with the DQN+MLP algorithm, the proposed algorithm not only converges faster but also matches its performance on these key metrics, achieving an average success rate of 0.8 and an average collision rate close to 0.
Key words: dynamic spectrum access; explainable artificial intelligence; reservoir computing; deep reinforcement learning
DOI:10.20079/j.issn.1001-893x.240206001 |
|
Foundation Item: National Natural Science Foundation of China (62131005, 62231012, 61971439, U22B2002); Basic Scientific Research Innovation Fund (Stable Support Project) of the National Key Laboratory of Communication Anti-Jamming (IFN20230207)
|
A Dynamic Spectrum Access Method Based on Explainable Deep Reinforcement Learning |
GENG Kai, ZHANG Jianzhao, YAO Changhua
(1. School of Electronic & Information Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China; 2. The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China)
Abstract: |
To address the problems of limited performance and poor explainability in reinforcement-learning-based dynamic spectrum access models, a dynamic spectrum access method based on weight analysis is proposed. A reservoir computing (RC) network is used to replace the traditional deep Q-learning network (DQN), simplifying the network structure and improving computational efficiency. At the same time, an explainability method based on weight analysis is introduced: heat maps generated from the network weights reflect the neural network's cognition of and preference for different channels, thereby improving the model's explainability. Simulation results show that the proposed algorithm is significantly better than traditional reinforcement learning algorithms such as Q-learning on key indicators including average success rate, average collision rate, and average reward in a multi-user environment. Compared with the DQN+MLP algorithm, the proposed algorithm not only converges faster but also performs comparably on these key indicators, with an average success rate of 0.8 and an average collision rate close to 0.
Key words: dynamic spectrum access; explainable artificial intelligence; reservoir computing; deep reinforcement learning
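The core architectural idea stated in the abstract — replacing a DQN's trained hidden layers with a reservoir computing network whose only trained weights are a linear readout — can be sketched as a minimal echo-state network producing per-channel Q-value estimates. All sizes, names, and the occupancy encoding below are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels = 4     # action space: one Q-value per candidate channel (assumption)
n_reservoir = 100  # reservoir size (assumption)

# Fixed random weights: input -> reservoir, and recurrent reservoir -> reservoir.
# Unlike a DQN's hidden layers, these are never trained.
W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_channels))
W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
# Scale the recurrent matrix so its spectral radius is below 1
# (a standard sufficient condition for the echo-state property).
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

# The linear readout is the only trained component (zero-initialized here).
W_out = np.zeros((n_channels, n_reservoir))

def step(state, obs):
    """Advance the reservoir with one channel-occupancy observation."""
    return np.tanh(W_in @ obs + W @ state)

def q_values(state):
    """Linear readout: Q-value estimate for each channel."""
    return W_out @ state

state = np.zeros(n_reservoir)
obs = np.array([1.0, 0.0, 1.0, 0.0])  # e.g., channels 0 and 2 sensed busy
state = step(state, obs)
action = int(np.argmax(q_values(state)))  # greedy channel choice
```

In this setup only `W_out` is updated during Q-learning (e.g., by gradient descent on the temporal-difference error), which is what simplifies training relative to a fully trained DQN+MLP; inspecting the learned rows of `W_out` is also a natural starting point for the weight-analysis heat maps the abstract describes.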