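An intelligent decision-making method based on deep reinforcement learning
XIONG Rongling, DUAN Chunyi, RAN Huaming, YANG Meng, FENG Yanghe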
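(Southwest China Institute of Electronic Technology, Chengdu 610036; School of Mathematics, Southwest Jiaotong University, Chengdu 611756; College of System Engineering, National University of Defense Technology, Changsha 410003)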

Abstract:

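Traditional deep reinforcement learning algorithms have difficulty quickly solving complex long-horizon tasks. To address this, a deep reinforcement learning method that incorporates historical information and human knowledge is proposed. It improves the classic Proximal Policy Optimization (PPO) algorithm in two ways: historical states are added to the state space so that it reflects how the environment changes over time, and an invalid-action mask based on human cognition is added to the policy model to prevent the agent from exploring invalid actions. This raises exploration efficiency and thereby improves the training performance of the model. Simulation results show that the proposed method effectively solves intelligent decision-making problems in complex long-horizon tasks and significantly improves model convergence compared with traditional deep reinforcement learning algorithms.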
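The abstract does not say how historical states are combined with the current observation. One common approach, shown here only as a minimal sketch, is to keep a fixed-length window of the last `k` observations and concatenate them into a single state vector for the policy. The class name `HistoryStacker`, the window length `k`, flat lists of floats as observations, and padding the window with the first observation after a reset are all illustrative assumptions, not details taken from the paper.

```python
from collections import deque
from typing import Deque, List, Sequence


class HistoryStacker:
    """Hypothetical helper: keeps the last `k` observations as one flat state.

    Assumes every observation is a flat sequence of floats of the same length.
    """

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        # deque(maxlen=k) silently drops the oldest entry when a new one arrives.
        self.frames: Deque[List[float]] = deque(maxlen=k)

    def reset(self, first_obs: Sequence[float]) -> List[float]:
        # Pad the whole window with the initial observation so the state size is constant.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(list(first_obs))
        return self.state()

    def step(self, obs: Sequence[float]) -> List[float]:
        # Append the newest observation; the oldest one falls out of the window.
        self.frames.append(list(obs))
        return self.state()

    def state(self) -> List[float]:
        # Concatenate observations, oldest first and newest last.
        return [x for frame in self.frames for x in frame]
```

For example, with `k=3`, `reset([0.0, 1.0])` yields `[0.0, 1.0, 0.0, 1.0, 0.0, 1.0]`, and a following `step([2.0, 3.0])` yields `[0.0, 1.0, 0.0, 1.0, 2.0, 3.0]`. Concatenation keeps the policy network a plain feed-forward model; a recurrent network is an alternative way to expose temporal structure.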

Keywords: intelligent decision making; deep reinforcement learning; proximal policy optimization; action mask

DOI: 10.20079/j.issn.1001-893x.211123005


An intelligent decision-making method based on deep reinforcement learning

XIONG Rongling, DUAN Chunyi, RAN Huaming, YANG Meng, FENG Yanghe

(Southwest China Institute of Electronic Technology, Chengdu 610036, China; School of Mathematics, Southwest Jiaotong University, Chengdu 611756, China; College of System Engineering, National University of Defense Technology, Changsha 410003, China)

Abstract:

Traditional deep reinforcement learning (DRL) algorithms struggle to solve complex long-horizon tasks quickly. A DRL method that introduces historical information and human knowledge is proposed. The classical proximal policy optimization (PPO) algorithm is improved by adding historical information to the state space so that it reflects how the environment changes over time. In addition, an invalid-action mask based on human cognition is added to the policy model to prevent the agent from exploring invalid actions and to improve exploration efficiency, thereby improving the training performance of the model. Simulation results show that the proposed method efficiently solves intelligent decision-making problems in complex long-horizon tasks and significantly improves the convergence performance of the model compared with traditional DRL algorithms.
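Invalid-action masking is commonly implemented by giving invalid actions a logit of negative infinity before the softmax, so that they receive zero probability and are never sampled during exploration. The minimal sketch below assumes a discrete action space and a boolean validity mask produced by some hypothetical rule that encodes human knowledge. It shows such a masked softmax together with the standard PPO clipped surrogate term for one sampled action. The function names, the clipping parameter `eps=0.2`, and the per-sample formulation are illustrative assumptions; the abstract describes neither the network architecture nor the masking rules.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], valid: Sequence[bool]) -> List[float]:
    """Softmax over action logits in which invalid actions get probability 0.

    Equivalent to replacing every invalid logit with -inf before a normal softmax.
    """
    if len(logits) != len(valid):
        raise ValueError("logits and mask must have the same length")
    if not any(valid):
        raise ValueError("at least one action must be valid")
    # Subtract the largest *valid* logit for numerical stability.
    m = max(z for z, ok in zip(logits, valid) if ok)
    exps = [math.exp(z - m) if ok else 0.0 for z, ok in zip(logits, valid)]
    total = sum(exps)
    return [e / total for e in exps]


def ppo_clip_objective(
    new_probs: Sequence[float],
    old_probs: Sequence[float],
    action: int,
    advantage: float,
    eps: float = 0.2,
) -> float:
    """PPO clipped surrogate term for one sampled action (to be maximised)."""
    # Probability ratio r = pi_new(a|s) / pi_old(a|s). old_probs[action] > 0
    # because a masked (zero-probability) action is never sampled.
    ratio = new_probs[action] / old_probs[action]
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Pessimistic bound: take the smaller of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)
```

For example, `masked_softmax([2.0, 1.0, 0.5], [True, False, True])` assigns probability 0.0 to the second action and renormalizes over the other two. With `new_probs=[0.9, 0.0, 0.1]`, `old_probs=[0.6, 0.0, 0.4]`, `action=0` and a positive advantage of 1.0, the ratio of 1.5 is clipped to 1.2. Masking inside the policy, rather than penalizing invalid actions through the reward, means the agent spends no samples on actions already known to be invalid.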

Key words: intelligent decision making; deep reinforcement learning; proximal policy optimization; action mask