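Abstract:
In resource-constrained underwater acoustic networks, soft frequency reuse and adaptive resource allocation can improve network capacity and energy efficiency. However, the long propagation delay and time-varying nature of the underwater acoustic channel make the feedback channel state information (CSI) used by adaptive techniques both time-varying and outdated, and such imperfect feedback CSI degrades the performance of adaptive systems. To address this problem, a multi-agent deep Q network based resource allocation (MADQN-RA) method is proposed. The method treats the underwater acoustic soft frequency reuse network as a multi-agent system and uses sequences of outdated feedback CSI as the system state. With an effective reward expression, the agents can track the variation of the time-varying, delayed underwater acoustic channel and make resource allocation decisions accordingly. To further improve the agents' decision accuracy while avoiding part of the learning cost incurred as the state-space dimension grows, MADQN-RA is improved with a dynamic state length method. Simulation results show that the proposed methods achieve better system performance than other learning-based methods and channel-prediction-based methods, and come closer to the theoretical optimum.
Key words: underwater acoustic networks; resource allocation; feedback channel state information; multi-agent deep Q network; dynamic state length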
DOI:10.20079/j.issn.1001-893x.231117001

Funding: National Natural Science Foundation of China (61801372); Scientific Research Program of the Shaanxi Provincial Department of Education (22JK0454)
|
Multi-agent Deep Reinforcement Learning Based Resources Allocation for Underwater Acoustic Networks |
LI Mengfan,ZHANG Yuzhi,HAN Xiang,FENG Xiaomei |
(School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China)
Abstract: |
In resource-limited underwater acoustic networks, network capacity and energy efficiency can be improved by using soft frequency reuse and adaptive resource allocation. However, the underwater acoustic channel has long propagation delays and time-varying characteristics, so the feedback channel state information (CSI) used by adaptive techniques is time-varying and outdated. Imperfect feedback CSI reduces the performance of adaptive systems. To address this issue, a multi-agent deep Q network based resource allocation (MADQN-RA) method is proposed. The method treats the underwater acoustic soft frequency reuse network as a multi-agent system and employs outdated feedback CSI sequences as the system states. By establishing an effective reward expression, the agents can track the changing properties of the time-varying, delayed underwater acoustic channel and make corresponding resource allocation decisions. To further improve the decision-making accuracy of the agents while avoiding part of the learning cost that comes with a larger state space, MADQN-RA is improved with a dynamic state length method. Simulation results show that the system performance achieved by the proposed methods surpasses that of other learning-based and channel-prediction-based methods and is closer to the theoretical optimum.
Key words: underwater acoustic networks; resource allocation; feedback channel state information; multi-agent deep Q network; dynamic state length