DDPG-Based Optimization Method for Multi-Partition Task Allocation in Integrated Modular Avionics Systems
赵长啸,李道俊,汪鹏辉,田毅
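(1. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300; 2. Key Laboratory of Civil Aircraft Airworthiness Technology, Tianjin 300300)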

Abstract:

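Integrated Modular Avionics (IMA) uses a spatio-temporal partitioning mechanism to integrate multiple avionics functions on a shared resource platform, and the quality of the method that allocates tasks among partitions determines the overall effectiveness of the avionics system. To address the allocation and scheduling of avionics task sets across multiple partitions, an optimization method based on deep reinforcement learning is proposed. An avionics system model and a task model are constructed. With system resource limits and task real-time requirements as constraints, and with higher system resource utilization as the optimization objective, the task allocation process is described as a sequential decision problem. A Markov decision model is introduced, an IMA task allocation model based on the Deep Deterministic Policy Gradient (DDPG) method is established, and a general allocation architecture is proposed. Policy training techniques such as state normalization and behavior noise are introduced to improve the learning performance and training capability of the DDPG algorithm. Simulation results show that the proposed optimization algorithm begins to converge at 500 iterations, and that after 800 iterations the scheme for tasks resident in the partitions satisfies the constraints while raising the minimum processing efficiency by 20.55%. Compared with the traditional allocation scheme and the Actor-Critic (AC) algorithm, the proposed DDPG algorithm has significant advantages in convergence ability, optimization performance, and stability.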

Keywords: integrated modular avionics (IMA); task allocation and scheduling; deep reinforcement learning; DDPG algorithm

DOI: 10.20079/j.issn.1001-893x.230103001

Funding: National Key Research and Development Program of China (2021YFB1600601); Natural Science Foundation of Tianjin (21JCQNJC00900)

A DDPG-based Optimization Method for Multi-partition Task Assignment of IMA

ZHAO Changxiao, LI Daojun, WANG Penghui, TIAN Yi

(1. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China; 2. Key Laboratory of Civil Aircraft Airworthiness Technology, Tianjin 300300, China)

Abstract:

The integrated modular avionics (IMA) system integrates multiple avionics functions on a shared resource platform through a spatio-temporal partitioning mechanism. The quality of the method used to allocate tasks among partitions determines the overall effectiveness of the IMA system. An optimization method based on deep reinforcement learning (DRL) is proposed for the allocation and scheduling of avionics task sets across multiple partitions. The IMA system model and task model are constructed; with system resource limits and task real-time requirements as constraints and improved system resource utilization as the optimization objective, the task allocation process is described as a sequential decision problem. A Markov decision model is introduced, an IMA task allocation model based on the deep deterministic policy gradient (DDPG) algorithm is developed, and a generic allocation architecture is proposed. Policy training techniques such as state normalization and behavior noise are introduced to improve the learning performance and training capability of the DDPG algorithm. Simulation results show that the proposed optimization algorithm starts to converge after 500 iterations, and that after 800 iterations the allocation scheme satisfies the constraint requirements while improving the minimum processing efficiency by 20.55%. Compared with the traditional assignment scheme and the Actor-Critic (AC) algorithm, the proposed DDPG algorithm has significant advantages in convergence ability, optimization performance, and stability.

Key words: integrated modular avionics (IMA); task allocation and scheduling; deep reinforcement learning; DDPG algorithm
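
The abstract names two training techniques, state normalization and behavior noise, without giving their form. The sketch below illustrates one common way such techniques are realized and is not taken from the paper: the running mean/variance normalizer, the Gaussian noise model, the action bounds of [-1, 1], and the names `RunningNormalizer` and `noisy_action` are all assumptions made for illustration.

```python
import math
import random


class RunningNormalizer:
    """Hypothetical helper: per-dimension running mean/variance (Welford's method)."""

    def __init__(self, dim, eps=1e-8):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, state):
        """Fold one observed state vector into the running statistics."""
        self.n += 1
        for i, x in enumerate(state):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, state):
        """Return the state shifted to zero mean and scaled to unit variance."""
        if self.n < 2:
            # Too few samples for a variance estimate: only center the state.
            return [x - m for x, m in zip(state, self.mean)]
        out = []
        for x, m, m2 in zip(state, self.mean, self.m2):
            std = math.sqrt(m2 / self.n)
            out.append((x - m) / (std + self.eps))
        return out


def noisy_action(action, sigma, low=-1.0, high=1.0, rng=random):
    """Assumed noise model: add zero-mean Gaussian noise, then clip to the bounds."""
    return [min(high, max(low, a + rng.gauss(0.0, sigma))) for a in action]
```

In a DDPG training loop, a normalizer of this kind would typically be updated with each observed state before the state is passed to the actor and critic networks, and the noise would be applied only during training, not during evaluation.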