Cite this article:

ZHAO Yan, SUN Jun. A Parallelized Improved Gray Wolf Clustering Algorithm[J]. Telecommunication Engineering (电讯技术), 2020, 60(10): - .

A Parallelized Improved Gray Wolf Clustering Algorithm

ZHAO Yan, SUN Jun

(1. Internet of Things Engineering College, Jiangsu Vocational College of Information Technology, Wuxi 214153, China; 2. International Joint Laboratory of Pattern Recognition and Artificial Intelligence, Jiangnan University, Wuxi 214122, China)

DOI:

Funding: National Natural Science Foundation of China (61672263); Natural Science Foundation of Jiangsu Province (BK20131097); Jiangsu Higher Vocational College Teachers' Professional Leader Advanced Training (Individual Visiting Study) Fund (2019GRGDYX015); 2017 "Qinglan Project" of Jiangsu Higher Education Institutions (苏教师〔2017〕15号); Fifth-Phase "333 Project" of Jiangsu Province, third-level trainee funding (苏人才办〔2018〕6号); College Teaching Team Project (苏信院教〔2020〕4号); College Scientific Research Project (JSITKY201804)

Abstract:

Traditional clustering algorithms lack efficient time and space complexity on very large-scale data sets and tend to become trapped in local optima. To address this, an improved gray wolf clustering algorithm (IGWCA) is proposed that combines gray wolf behavior rules with gray wolf hunting strategies and introduces a Dirichlet distribution to provide the prior. A comparative analysis of IGWCA against other clustering algorithms on benchmark data sets shows that IGWCA has strong exploration and exploitation capabilities as well as low dispersion. IGWCA is then parallelized with the MapReduce model of the Hadoop framework (IGWCA on MapReduce, IGWCA-MR). The clustering quality of IGWCA-MR is verified by F-Measure and average running time, and its running time and speedup are verified on real data sets. The experimental results show that IGWCA-MR can effectively solve the clustering problem for very large-scale data sets and is an efficient alternative algorithm.

Key words: big data analysis; data mining; clustering algorithm; gray wolf algorithm; Dirichlet distribution
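
The abstract names F-Measure and acceleration (speedup) as evaluation measures but does not give their formulas. The following is a minimal sketch in Python, assuming one common class-weighted definition of the clustering F-Measure and the usual speedup ratio of single-node time to multi-node time. The function names `clustering_f_measure` and `speedup`, the toy labels, and the timing values are hypothetical illustrations and are not taken from the paper, whose exact definitions may differ.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted clustering F-Measure (one common definition, assumed here).

    For each true class, take the best F1 score over all clusters, then
    average those best scores weighted by class size.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have equal length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("need at least one labelled point")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # overlap[(cls, clu)] = number of points of class cls placed in cluster clu
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            n_ij = overlap.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no shared points, so F1 is zero for this pair
            precision = n_ij / clu_size
            recall = n_ij / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (cls_size / n) * best
    return total


def speedup(single_node_seconds, multi_node_seconds):
    """Speedup as T_1 / T_p: single-node time over multi-node time."""
    if multi_node_seconds <= 0:
        raise ValueError("multi-node running time must be positive")
    return single_node_seconds / multi_node_seconds


# Hypothetical toy data: two true classes, two predicted clusters.
true_labels = ["a", "a", "a", "b", "b", "b"]
cluster_ids = [0, 0, 1, 1, 1, 1]
f_score = clustering_f_measure(true_labels, cluster_ids)

# Hypothetical timings, in seconds, for one node versus several nodes.
ratio = speedup(120.0, 30.0)
```

A score of 1.0 means every class is recovered exactly by some cluster, and a speedup close to the number of nodes indicates near-linear scaling.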