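| Abstract: |
| Disaster recovery systems are essential to the business continuity of critical information systems, and fault prediction is a core challenge within them. Traditional supervised learning is constrained in two ways on this task: scarce fault samples produce extreme class imbalance, and common classification metrics fail to reflect the asymmetric business costs of missed faults (false negatives) and false alarms (false positives). To address this, a deep reinforcement learning fault prediction method based on decision optimization, Decision-Optimized Fault Prediction (DOFP), is proposed. The method models fault prediction as a Markov Decision Process (MDP): an agent learns the dynamic evolution patterns in a state space built from log features and issues an "at risk" or "not at risk" judgment at each time step. To handle class imbalance, DOFP introduces multiple handling mechanisms at the data, batch, and algorithm levels. It also uses an asymmetric weighted hybrid reward function, tightly coupled to business logic, that steers the model toward the critical minority-class fault samples under a long-term optimization objective. DOFP is systematically validated at both the sample level and the machine level on a fault prediction dataset built from logs of a public cloud computing platform. The experimental results show that DOFP significantly outperforms the baseline methods in prediction accuracy, with the machine-level F1 score improving by 18.9%. |
| Keywords: disaster recovery system; fault prediction; deep reinforcement learning; asymmetric reward |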
| DOI:10.20079/j.issn.1001-893x.250923002 |
|
| Funding: Science and Technology Project of State Grid Sichuan Electric Power Company (521947240003) |
|
| A Decision-Optimization-Driven Fault Prediction Method for Disaster Recovery Systems |
| DAI Rui, LIU Cheng, PENG Xiaoqiang, ZENG Yu, LI Jiazhou, LUO Jiaqing |
| (1. Information and Communication Company of State Grid Sichuan Electric Power Company, Chengdu 610095, China; 2. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China) |
| Abstract: |
| Disaster recovery systems play a crucial role in safeguarding the business continuity of critical information systems, and fault prediction is a core challenge in operating them. Traditional supervised learning methods face two major limitations in this task: the extreme class imbalance caused by scarce fault samples, and the inability of conventional classification metrics to capture the asymmetric business costs of false negatives and false positives. To address these issues, a deep reinforcement learning-based method named Decision-Optimized Fault Prediction (DOFP) is proposed. Fault prediction is formulated as a Markov Decision Process (MDP), in which an agent learns the dynamic evolution patterns within a state space built from log features and makes a “risky” or “non-risky” decision at each time step. To mitigate class imbalance, DOFP incorporates handling mechanisms at three levels: data, batch, and algorithm. In addition, a business-aware asymmetric weighted hybrid reward function is designed to guide the model to focus on the critical minority-class fault samples under a long-term optimization objective. DOFP is evaluated at both the sample level and the machine level on a fault prediction dataset constructed from the logs of a public cloud computing platform. The results show that DOFP significantly outperforms the baseline methods in predictive accuracy, achieving an 18.9% improvement in machine-level F1-score. |
| Key words: disaster recovery system; fault prediction; deep reinforcement learning; asymmetric reward |
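
The abstract describes a reward that treats missed faults and false alarms differently, applied at each step of an MDP. As a rough illustration only, the sketch below shows one way such an asymmetric per-step reward and its discounted sum could look. The abstract does not give the actual DOFP reward, so the names `RewardWeights`, `asymmetric_reward`, and `episode_return`, every numeric weight, and the assumption that a missed fault costs more than a false alarm are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; none of these values come from the paper."""
    true_positive: float = 5.0     # fault correctly flagged as risky
    true_negative: float = 1.0     # healthy state correctly left alone
    false_positive: float = -1.0   # false alarm: mild penalty
    false_negative: float = -10.0  # missed fault: heaviest penalty (assumed)


def asymmetric_reward(action: int, label: int,
                      w: RewardWeights = RewardWeights()) -> float:
    """Reward for one time step.

    action: 1 = "risky", 0 = "non-risky" (the agent's decision)
    label:  1 = a fault follows, 0 = no fault (ground truth)
    """
    if label == 1:
        return w.true_positive if action == 1 else w.false_negative
    return w.false_positive if action == 1 else w.true_negative


def episode_return(actions: Sequence[int], labels: Sequence[int],
                   gamma: float = 0.99,
                   w: RewardWeights = RewardWeights()) -> float:
    """Discounted sum of per-step rewards over one log sequence."""
    total, discount = 0.0, 1.0
    for action, label in zip(actions, labels):
        total += discount * asymmetric_reward(action, label, w)
        discount *= gamma
    return total
```

Under these assumed weights, a policy that always answers "non-risky" scores well on the many healthy steps but takes the largest penalty on each rare fault, which is the kind of pressure toward minority-class samples that the abstract attributes to the reward design.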