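Abstract: |
Oblique-split decision tree algorithms generally suffer from low time efficiency, and some of them apply only to binary classification. To address these problems, a clustering decision tree algorithm based on weighted distance is proposed. The Relief-F algorithm computes a weight for each predictive attribute, the weights are used when clustering the data in a tree node, and the resulting clusters split the node multiple ways, yielding a decision tree that applies directly to multi-class problems. Theoretical analysis and experimental results show that, compared with classical axis-parallel decision trees, the algorithm generalizes better at a similar time complexity; compared with most oblique decision trees, it achieves comparable accuracy and model simplicity at a lower computational cost. |
Key words: machine learning; decision tree; clustering; attribute weighting; multi-way split |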
DOI: |
|
Funding: Supported by the National Natural Science Foundation of China, Young Scientists Fund (61602075) |
|
A Weighted Clustering Splitting Decision Tree Algorithm |
LIU Zhenyu, CHU Na |
(1. School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China;
2. College of Computer and Software, Dalian Neusoft Information University, Dalian 116023, China) |
Abstract: |
Oblique decision tree algorithms generally have low time efficiency, and some of them apply only to binary classification. To address these problems, a clustering decision tree algorithm based on weighted distance (WCDT) is proposed. Relief-F computes a weight for each prediction attribute, and the weights are applied when clustering the data in each tree node. The clustering results are used for multi-way splits of the node, so that the resulting decision tree applies directly to multi-class problems. Theoretical analysis and experimental results show that, compared with classical axis-parallel decision trees, the proposed algorithm has better generalization ability and similar time complexity; compared with most oblique decision trees, it obtains comparable accuracy and model simplicity at a lower computational cost. |
Key words: machine learning; decision tree; clustering; attribute weighting; multi-way split |
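
The pipeline summarized in the abstract (attribute weights, weighted-distance clustering in each node, multi-way split on the clusters) can be illustrated with a minimal Python sketch. It is not the authors' WCDT implementation. All function names are hypothetical, the weighting is a simplified Relief-style estimate (one nearest hit and one nearest miss per sample) rather than full Relief-F, and seeding one cluster per class with the class centroids is an assumption made here to keep the example deterministic.

```python
import numpy as np

def relief_weights(X, y):
    """Simplified Relief-style attribute weights (assumption: not full Relief-F)."""
    n, d = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0                      # avoid division by zero on constant attributes
    Xn = (X - lo) / span                       # rescale every attribute to [0, 1]
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(Xn - Xn[i])              # per-attribute differences to sample i
        dist = diff.sum(axis=1)
        dist[i] = np.inf                       # exclude the sample itself
        hit_d = np.where(y == y[i], dist, np.inf)
        miss_d = np.where(y != y[i], dist, np.inf)
        if np.isfinite(hit_d.min()) and np.isfinite(miss_d.min()):
            # reward attributes that differ on the nearest miss, penalize those that differ on the nearest hit
            w += diff[miss_d.argmin()] - diff[hit_d.argmin()]
    w = np.clip(w, 0.0, None)                  # negative scores are treated as irrelevant
    return w / w.sum() if w.sum() > 0 else np.full(d, 1.0 / d)

def cluster_split(X, y, w, iters=20):
    """k-means under the weighted distance sum_j w_j (x_j - c_j)^2, one cluster per class."""
    Xs = X * np.sqrt(w)                        # scaling by sqrt(w) turns plain distance into the weighted one
    centers = np.array([Xs[y == c].mean(axis=0) for c in np.unique(y)])
    for _ in range(iters):
        assign = ((Xs[:, None, :] - centers) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([Xs[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    assign = ((Xs[:, None, :] - centers) ** 2).sum(axis=2).argmin(axis=1)
    return assign, centers

def build(X, y):
    """Recursively split each node into one child per non-empty cluster."""
    classes, counts = np.unique(y, return_counts=True)
    leaf = {"label": classes[counts.argmax()]}
    if len(classes) == 1:
        return leaf
    w = relief_weights(X, y)
    assign, centers = cluster_split(X, y, w)
    used = [j for j in range(len(centers)) if np.any(assign == j)]
    if len(used) < 2:                          # clustering could not separate the node
        return leaf
    return {"w": w, "centers": centers[used],
            "children": [build(X[assign == j], y[assign == j]) for j in used]}

def predict(node, x):
    """Route x to the nearest weighted cluster center until a leaf is reached."""
    while "children" in node:
        d2 = ((x * np.sqrt(node["w"]) - node["centers"]) ** 2).sum(axis=1)
        node = node["children"][int(d2.argmin())]
    return node["label"]
```

Calling `build(X, y)` on a float feature matrix `X` and an integer label vector `y` returns a nested dictionary, and `predict(tree, x)` classifies a single sample. Because each internal node may have as many children as there are classes, the sketch handles multi-class data without decomposing it into binary problems.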