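A Deep Reinforcement Learning Algorithm for Multi-agent Collaborative Task Offloading in MEC
ZHANG Qian, SU Dongdong, ZHANG Cong, LI Runchuan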
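(1. Zhongyuan University of Technology, a. School of Artificial Intelligence; b. School of Computer Science, Zhengzhou 450007, China; 2. Jiangxing Intelligence Inc., Shenzhen 518100, Guangdong, China)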

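Abstract:

For multi-user collaborative task offloading in mobile edge computing (MEC), a deep reinforcement learning-based multi-agent collaborative task offloading algorithm (MCTO-DRL) is proposed. A network model for multi-user collaborative task offloading is constructed that accounts for user mobility, collaboration, dynamic task priority, and resource constraints. On this basis, an end-to-end optimization objective function is established, and the multi-task collaborative offloading problem is formalized as a Markov decision process (MDP). A bidirectional long short-term memory (Bi-LSTM) network extracts features of the dynamic temporal dependencies in the state vector; combined with reinforcement learning, it establishes a mapping between high-dimensional states and actions. A dynamic priority collaborative sampling algorithm is designed to improve cooperation among the agents. Experimental analysis shows that, in the multi-agent collaborative task offloading scenario, the optimal offloading probability of MCTO-DRL exceeds 86%, the time-slot cumulative reward improves by about 20.0%, 16.23%, 22.0%, and 9.44% over the four baseline algorithms, respectively, and the algorithm adapts to offloading tasks of differing complexity and requirements.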

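Keywords: mobile edge computing; deep reinforcement learning; collaborative offloading; bidirectional long short-term memory (Bi-LSTM) network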

DOI: 10.20079/j.issn.1001-893x.240107001

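Funding: Henan Province Science and Technology Research Program (242102211046); Key Scientific Research Projects of Higher Education Institutions of Henan Province (25A520039, 24B520048); Zhongyuan University of Technology Advantageous Discipline Strength Enhancement Program (SD202230); Zhongyuan University of Technology Graduate Education and Teaching Reform Research Project (JG202424, JG202328); Zhongyuan University of Technology Fundamental Research Funds Special Project (K2022QN021)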

A Deep Reinforcement Learning Algorithm for Multi-agent Collaborative Task Offloading in MEC

ZHANG Qian(a), SU Dongdong(b), ZHANG Cong, LI Runchuan(b)

(1a. School of Artificial Intelligence; 1b. School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007, China; 2. Jiangxing Intelligence Inc., Shenzhen 518100, China)

Abstract:

A deep reinforcement learning-based multi-agent collaborative task offloading algorithm (MCTO-DRL) is proposed for the multi-user collaborative task offloading scenario in mobile edge computing. Considering user mobility, collaboration, dynamic task priority, and resource constraints, a multi-user collaborative task offloading network model is constructed. On this basis, an end-to-end optimization objective function is established, and the multi-task collaborative offloading problem is formalized as a Markov decision process (MDP). A bidirectional long short-term memory (Bi-LSTM) network is used to extract features of the dynamic time-series dependencies in the state vector. Combined with reinforcement learning, it establishes a mapping between high-dimensional states and actions, and a dynamic priority collaborative sampling algorithm is designed to improve collaboration among the agents. Experimental analysis shows that, in the multi-agent collaborative task offloading scenario, the optimal offloading probability of the MCTO-DRL algorithm exceeds 86%. Compared with the four baseline algorithms, the time-slot cumulative reward increases by about 20.0%, 16.23%, 22.0%, and 9.44%, respectively. The algorithm also adapts to offloading tasks of differing complexity and requirements.

Key words: mobile edge computing; deep reinforcement learning; collaborative offloading; bidirectional long short-term memory (Bi-LSTM) network
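
The abstract mentions a dynamic priority collaborative sampling algorithm but does not specify it. As a rough illustration of the general idea of priority-driven sampling, the following is a minimal sketch of a shared replay buffer that draws transitions with probability proportional to priority raised to a power. It is a generic proportional prioritized replay, not the paper's method; the names `Transition`, `PriorityReplayBuffer`, the exponent `alpha`, and the first-in-first-out eviction rule are all assumptions made for this example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Transition:
    """One experience tuple from one agent (field names are hypothetical)."""
    agent_id: int
    state: tuple
    action: int
    reward: float
    next_state: tuple


@dataclass
class PriorityReplayBuffer:
    """Shared buffer sampled in proportion to priority ** alpha (assumed scheme)."""
    capacity: int
    alpha: float = 0.6  # assumed exponent; 0 gives uniform sampling
    items: list = field(default_factory=list)
    priorities: list = field(default_factory=list)

    def add(self, transition: Transition, priority: float) -> None:
        # Evict the oldest entry once the buffer is full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        # Clamp so that every stored transition keeps a nonzero chance.
        self.priorities.append(max(priority, 1e-6))

    def sample(self, batch_size: int, rng: random.Random):
        # Draw indices with replacement, weighted by priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update(self, indices, new_priorities) -> None:
        # Priorities are "dynamic": callers overwrite them after each learning step.
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = max(p, 1e-6)
```

In a multi-agent setting, each agent would push its transitions into the same buffer and the priorities would be refreshed after every update, for example from the magnitude of the learning error or from task urgency. Which signal the paper actually uses is not stated in the abstract.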