| DOI:10.20079/j.issn.1001-893x.240715002 |
|
Foundation Items: National Natural Science Foundation of China (62201596); Research Program of National University of Defense Technology (ZK22-45)
|
| An Intelligent Communication Anti-interference Decision Algorithm Based on Multiple Reward Value DDQN |
LING Yao, XIE Shijun, LIANG Hao, FENG Jiao, GAO Weijie
(1. School of Electronic and Information Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China; 2. The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China)
| Abstract: |
In satellite communication systems operating in dynamic interference environments, channel quality and interference power vary across channels. Limited spectrum resources and complex interference environments pose challenges for anti-interference communication decisions in terms of resource allocation and service demands: specifically, how to utilize resources efficiently while avoiding interfered frequencies and optimizing transmit power. To address this issue, a deep reinforcement learning-based anti-interference algorithm with multiple reward value functions is proposed. The algorithm models the interaction among the transmitter, receiver, and jammer as a Markov decision process. By optimizing a reward function that accounts for the costs of channel switching and power switching, it introduces frequency-switching and power-switching mechanisms, analyzes the interference characteristics of the spectrum in adjacent time slots, and combines the interference signal features collected during the interaction with channel information to train an anti-interference strategy. This strategy enables joint anti-interference decision-making in both the frequency and power domains. Simulation results demonstrate that the algorithm effectively reduces the probability of being interfered with, accelerates convergence, and improves the utilization efficiency of power resources.
Key words: intelligent communication anti-interference; joint anti-interference decision; deep reinforcement learning; multiple reward value functions
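The abstract describes a reward function that combines transmission success with channel-switching and power costs. The following is a minimal illustrative sketch of such a multi-term reward; the threshold, weights, and function names are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch (assumed parameters, not the paper's exact reward):
# a multi-term reward for joint frequency/power anti-jamming decisions.

def reward(sinr_db, prev_channel, channel, prev_power, power,
           sinr_threshold_db=5.0, switch_cost=0.2, power_weight=0.05):
    """Combine transmission success with channel-switch and power costs."""
    # +1 if the link survives the jamming (SINR above threshold), else -1
    success = 1.0 if sinr_db >= sinr_threshold_db else -1.0
    # Penalize frequency hops to discourage unnecessary channel switching
    hop_penalty = switch_cost if channel != prev_channel else 0.0
    # Penalize transmit power to encourage efficient power utilization
    power_penalty = power_weight * power
    return success - hop_penalty - power_penalty
```

Under this sketch, staying on an uninterfered channel at low power earns a high reward, while a hop onto a jammed channel at high power is penalized on all three terms, which is the trade-off the joint frequency-power decision must learn.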