
DOI：

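Foundation items: National Natural Science Foundation of China (61771084); Natural Science Foundation of Chongqing Science and Technology Commission (KJQN201800834)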

Backscatter network resource allocation algorithm based on deep reinforcement learning

JIANG Wei, ZHU Jiang

(a. Engineering Research Center of Mobile Communications of the Ministry of Education; b. Chongqing Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)

Abstract:

To improve the average throughput of Internet of Things (IoT) devices in backscatter networks, a resource allocation mechanism is proposed, built on a model that jointly optimizes user pairing and time slot allocation. Solving this joint model directly with deep reinforcement learning (DRL) leads to a high-dimensional action space and a complex neural network, so the model is decomposed into two sub-problems, one per layer, to reduce the dimensionality of the action space. First, a DRL algorithm infers the current channel state from historical channel information and uses it to select the optimal user pairing. Then, with the user pairing fixed, a convex optimization algorithm allocates time slots optimally so as to maximize the total throughput of the IoT devices. Simulation results show that, compared with other resource allocation methods, the proposed method effectively improves system throughput and offers better channel adaptability and convergence.
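The two-layer decomposition described above can be illustrated with a small sketch. Everything in it is a hypothetical toy setup, not the authors' model. Part 1 assumes an even number of IoT devices grouped into unordered pairs and, for the joint case only, a frame discretized into `S` identical slots; it counts how many discrete actions a single DRL agent would face if it had to choose both the pairing and the slot split, versus the pairing alone. Part 2 assumes that, for a fixed pairing, pair k's throughput over a time share t_k is t_k·log2(1 + a_k/t_k) with a hypothetical effective gain a_k (a concave form that appears in harvest-then-transmit models), and maximizes the sum subject to Σ t_k = T by bisection on the Lagrange multiplier. The function names and gain values are illustrative only.

```python
import math

# --- Part 1: action-space size, joint vs. pairing-only (hypothetical counting) ---

def num_pairings(num_devices: int) -> int:
    """Ways to split an even number of devices into unordered pairs: (2M)! / (2^M * M!)."""
    if num_devices % 2:
        raise ValueError("num_devices must be even")
    m = num_devices // 2
    return math.factorial(num_devices) // (2 ** m * math.factorial(m))


def num_slot_splits(num_slots: int, num_pairs: int) -> int:
    """Ways to give num_slots identical slots to num_pairs pairs (a pair may get none)."""
    return math.comb(num_slots + num_pairs - 1, num_pairs - 1)


# --- Part 2: convex time allocation for a FIXED pairing (toy model, not the paper's) ---

def marginal_gain(x: float) -> float:
    """Derivative of t*log2(1 + a/t) with respect to t, written through x = a/t."""
    return (math.log1p(x) - x / (1.0 + x)) / math.log(2.0)


def solve_ratio(lam: float, tol: float = 1e-12) -> float:
    """Find x > 0 with marginal_gain(x) == lam; marginal_gain is increasing in x."""
    lo, hi = 0.0, 1.0
    while marginal_gain(hi) < lam:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if marginal_gain(mid) < lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def allocate_time(gains, frame_len=1.0, iters=200):
    """Maximize sum_k t_k*log2(1 + a_k/t_k) s.t. sum_k t_k = frame_len, t_k >= 0.

    At the optimum every pair has the same marginal gain lam (KKT), so
    t_k = a_k / x(lam); lam is found by bisection on the time budget.
    """
    def time_used(lam):
        x = solve_ratio(lam)
        return sum(a / x for a in gains)

    lam_lo, lam_hi = 0.0, 1.0
    while time_used(lam_hi) > frame_len:  # larger lam -> less total time used
        lam_hi *= 2.0
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        if time_used(lam) > frame_len:
            lam_lo = lam
        else:
            lam_hi = lam
    x = solve_ratio(0.5 * (lam_lo + lam_hi))
    return [a / x for a in gains]


if __name__ == "__main__":
    devices, slots = 6, 10
    joint = num_pairings(devices) * num_slot_splits(slots, devices // 2)
    print("joint action space:", joint)                          # 15 * 66 = 990
    print("pairing-only action space:", num_pairings(devices))   # 15
    shares = allocate_time([1.0, 2.0, 3.0])
    print([round(v, 4) for v in shares])                         # [0.1667, 0.3333, 0.5]
```

Running it prints a joint action space of 990 versus 15 for the pairing-only agent, and time shares of about [0.1667, 0.3333, 0.5]. For this particular toy objective the optimality conditions force a_k/t_k to be equal across pairs, so the result coincides with the closed form t_k = T·a_k/Σ_j a_j; the multiplier bisection is shown because the same pattern applies to other concave per-pair throughput functions.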

Key words: backscatter network; Internet of Things (IoT) device; resource allocation; deep reinforcement learning; throughput maximization