DOI:10.20079/j.issn.1001-893x.240618002
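Funding: National Natural Science Foundation of China (62206094); Huzhou Public Welfare Application Research Project (2021GZ05); Open Project of the Jiangsu Province Engineering Laboratory of Cyberspace Security (SDGC2237); Huzhou University Graduate Research and Innovation Project (2024KYCX62)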
Scalable Subspace Learning for Clustering Data Streams
YIN Hongwei, NI Yuzhou, HU Wenjun
(1. School of Information Engineering, Huzhou University, Huzhou 313000, China; 2. Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, Huzhou 313000, China; 3. Huzhou Key Laboratory of Aquatic Robot Technology, Huzhou 313000, China)
Abstract:
Traditional data stream clustering methods lack online dimensionality reduction for high-dimensional data, which limits their clustering performance. To address this issue, a method called Scalable Subspace Learning for Clustering Data Streams (S2LCStream) is proposed. First, the method establishes a projection relationship between historical data and newly arrived data through scalable subspace learning, projecting the new data into the subspace spanned by the historical data to obtain its cluster assignment in real time. Second, to keep cluster assignments accurate over time, the method checks the continuously arriving stream for consistency of data distribution, captures any concept drift, and adjusts the cluster assignments through a backtracking mechanism to adapt to the changing distribution. Finally, tests on multiple real-world datasets verify the method's effectiveness on high-dimensional data streams. Specifically, S2LCStream maintains high clustering accuracy while requiring markedly less processing time than EmCStream when handling concept drift.
Key words: data stream clustering; subspace learning; scalable subspace learning; concept drift detection
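
The pipeline outlined in the abstract (project new data into a subspace learned from historical data, assign clusters there, and check incoming batches for distribution changes) can be illustrated with a minimal sketch. This is not the S2LCStream algorithm, whose details the abstract does not give. As stated assumptions, plain PCA stands in for scalable subspace learning, a reconstruction-residual threshold stands in for the consistency detection, and refitting on the drifted batch stands in for the backtracking mechanism. All function names and the `drift_factor` parameter are hypothetical.

```python
import numpy as np

def fit_subspace(X_hist, k_dims):
    """Assumption: PCA basis as a stand-in for the learned subspace."""
    mean = X_hist.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_hist - mean, full_matrices=False)
    return mean, Vt[:k_dims].T  # basis shape: (n_features, k_dims)

def project(X, mean, basis):
    """Coordinates of X in the subspace spanned by the historical data."""
    return (X - mean) @ basis

def residual(X, mean, basis):
    """Mean distance between the samples and their subspace reconstruction."""
    centered = X - mean
    return np.linalg.norm(centered - centered @ basis @ basis.T, axis=1).mean()

def assign(Z, centroids):
    """Label each projected sample with the index of its nearest centroid."""
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(Z, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd iterations, run in the low-dimensional space."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = assign(Z, centroids)
        for c in range(n_clusters):
            if np.any(labels == c):
                centroids[c] = Z[labels == c].mean(axis=0)
    return centroids

def fit_model(X, k_dims, n_clusters):
    """Learn subspace, centroids, and a baseline residual from one batch."""
    mean, basis = fit_subspace(X, k_dims)
    centroids = kmeans(project(X, mean, basis), n_clusters)
    return mean, basis, centroids, residual(X, mean, basis)

def process_batch(X_new, model, drift_factor=3.0):
    """Cluster a new batch; refit if it no longer fits the old subspace."""
    mean, basis, centroids, base_res = model
    # Assumption: a batch whose residual far exceeds the historical one
    # is treated as evidence of concept drift.
    drifted = bool(residual(X_new, mean, basis) > drift_factor * base_res)
    if drifted:
        # Assumption: refitting on the new batch replaces backtracking.
        model = fit_model(X_new, basis.shape[1], len(centroids))
        mean, basis, centroids, _ = model
    labels = assign(project(X_new, mean, basis), centroids)
    return labels, drifted, model
```

In use, `fit_model` would be called once on an initial batch and `process_batch` once per arriving batch, with the returned model passed on to the next call. The multiplier `drift_factor` trades sensitivity against false alarms and would need tuning on real data.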