Resource Allocation Based on Deep Reinforcement Learning in SWIPT-D2D Communication
LIU Xingxin, LI Jun, LI Zhengquan
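(1. School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China; 2. School of Electronic Information Engineering, Wuxi University, Wuxi 214105, China; 3. Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214122, China; 4. State Key Laboratory of Network and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China)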

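Abstract:

To address inter-device signal interference and device energy loss in a SWIPT-D2D (Simultaneous Wireless Information and Power Transfer, Device-to-Device) wireless communication network where channel state information is unknown, a method based on the Proximal Policy Optimization (PPO) algorithm is proposed. It jointly optimizes the resource block, transmit power, and power splitting ratio of D2D users while satisfying the communication quality requirements of cellular users. Simulation results show that, compared with other algorithms, the proposed algorithm produces a better resource allocation scheme for D2D users, giving them higher energy efficiency while keeping the communication rate of cellular users high. Moreover, as the number of users in the environment increases, the proposed algorithm improves D2D energy efficiency by 15.95% and 23.59% on average compared with the Dueling Double DQN (Deep Q-Network) and DQN algorithms, respectively, and it is more robust as the communication network grows.

Keywords: SWIPT-D2D; resource allocation; deep reinforcement learning; joint optimization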

DOI: 10.20079/j.issn.1001-893x.230202003

Funding: Future Network Scientific Research Fund Project (FNSRFP-2021-YB-11)

Resource Allocation Based on Deep Reinforcement Learning in SWIPT-D2D Communication

LIU Xingxin, LI Jun, LI Zhengquan

(1. School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China; 2. School of Electronic Information Engineering, Wuxi University, Wuxi 214105, China; 3. Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214122, China; 4. State Key Laboratory of Network and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China)

Abstract:

To address inter-device signal interference and device energy loss in a simultaneous wireless information and power transfer device-to-device (SWIPT-D2D) wireless communication network with unknown channel state information, the authors propose using the Proximal Policy Optimization (PPO) algorithm to jointly optimize the resource block, transmit power, and power splitting ratio of D2D users while satisfying the communication quality requirements of cellular users. Simulation results show that the proposed algorithm produces a better resource allocation scheme for D2D users than other algorithms, maintaining a high communication rate for cellular users while achieving higher energy efficiency for D2D users. Furthermore, as the number of users in the environment increases, the proposed algorithm improves D2D energy efficiency by 15.95% and 23.59% on average compared with the Dueling Double DQN (Deep Q-Network) and DQN algorithms, respectively, and it is more robust as the communication network grows.

Keywords: SWIPT-D2D; resource allocation; deep reinforcement learning; joint optimization
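
For readers unfamiliar with PPO, the following minimal Python sketch shows the standard clipped surrogate objective that the algorithm maximizes. It is an illustration only and is not taken from the paper: the function name `ppo_clip_objective`, the clipping range of 0.2, and the sample numbers are assumptions, and the paper's actual state, action (resource block, transmit power, power splitting ratio), and reward definitions are not given in the abstract.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. All names are hypothetical.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Made-up batch of three samples, for illustration only.
    new_logp = [-0.9, -1.2, -0.4]
    old_logp = [-1.0, -1.0, -1.0]
    advantages = [0.5, -0.3, 1.0]
    print(ppo_clip_objective(new_logp, old_logp, advantages))
```

The clipping keeps each policy update close to the policy that collected the data, which is one commonly cited reason PPO tends to train more stably than value-based methods such as DQN when the action space is large.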