摘要: |
针对传统深度强化学习算法难以快速解决长时序复杂任务的问题,提出了一种引入历史信息和人类知识的深度强化学习方法,对经典近端策略优化(Proximal Policy Optimization,PPO)强化学习算法进行改进,在状态空间引入历史状态以反映环境的时序变化特征,在策略模型中基于人类认知增加无效动作掩膜,禁止智能体进行无效探索,提高探索效率,从而提升模型的训练性能。仿真结果表明,所提方法能够有效解决长时序复杂任务的智能决策问题,相比传统的深度强化学习算法可显著提高模型收敛效果。 |
关键词: 智能决策 深度强化学习 近端策略优化 动作掩膜 |
DOI:10.20079/j.issn.1001-893x.211123005 |
|
An intelligent decision making method based on deep reinforcement learning |
XIONG Rongling, DUAN Chunyi, RAN Huaming, YANG Meng, FENG Yanghe |
(Southwest China Institute of Electronic Technology, Chengdu 610036, China; School of Mathematics, Southwest Jiaotong University, Chengdu 611756, China; College of System Engineering, National University of Defense Technology, Changsha 410003, China) |
Abstract: |
Traditional deep reinforcement learning (DRL) algorithms struggle to solve complex long time-series tasks quickly. To address this problem, a DRL method that incorporates historical information and human knowledge is proposed. The classical proximal policy optimization (PPO) algorithm is improved in two ways. First, historical states are introduced into the state space to reflect the temporal dynamics of the environment. Second, an invalid-action mask based on human knowledge is added to the policy model, which prevents the agent from exploring invalid actions and so improves exploration efficiency and training performance. Simulation results show that the proposed method effectively solves intelligent decision-making problems in complex long time-series tasks and significantly improves model convergence compared with traditional DRL algorithms. |
Key words: intelligent decision making; deep reinforcement learning; proximal policy optimization; action mask |
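
The two modifications named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how historical states are combined or how the mask enters the policy, so this sketch assumes that the last k observations are concatenated into one state vector and that the mask is applied to the policy logits before the softmax. The names `masked_softmax` and `HistoryState` are hypothetical.

```python
import math
from collections import deque


def masked_softmax(logits, valid_mask):
    """Softmax over policy logits in which invalid actions get probability 0.

    Assumption: masking is done by setting invalid logits to -inf, a common
    way to implement invalid-action masking in policy-gradient methods.
    """
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid_mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(m - peak) for m in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


class HistoryState:
    """Keeps the last k observations and concatenates them into one state.

    Assumption: history is zero-padded until k observations have been seen.
    """

    def __init__(self, k, obs_dim):
        self.frames = deque([[0.0] * obs_dim for _ in range(k)], maxlen=k)

    def push(self, obs):
        # Appending to a full deque drops the oldest observation.
        self.frames.append(list(obs))
        return [x for frame in self.frames for x in frame]


if __name__ == "__main__":
    history = HistoryState(k=2, obs_dim=2)
    state = history.push([1.0, 2.0])  # oldest slot is still zero padding
    probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
    print(state, probs)
```

Because the masked action receives exactly zero probability, the agent never samples it. This is one way to realize the "prohibit invalid exploration" idea, and in a PPO setting the same masked distribution would normally be used both when sampling actions and when computing the probability ratio.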