Research on Large Cluster Analysis Based on Optimized X-means Algorithm under Hadoop Platform

Affiliation: Guangdong AIB Polytechnic

Fund Project: Key Field Special Project of Guangdong Provincial Regular Colleges and Universities (New Generation Information Technology) (2021ZDZX1138)

    Abstract:

    Existing clustering methods are limited in the scale of data they can process and often cluster poorly. To address both problems, this paper proposes a big data clustering method based on an optimized X-means algorithm running on the Hadoop platform. Big data samples are collected using the Hadoop platform architecture and its functions, and the initial samples are preprocessed through missing-value compensation, noise filtering, and normalization. Cluster centers are then selected, features are extracted from the center data and from all other samples, and the feature similarity between each sample and each cluster center is computed. With this similarity measure as the clustering criterion, the optimized X-means algorithm determines the class to which each sample belongs, completing the clustering of the big data. Clustering tests show that, under both experimental conditions, the proposed method improves recall and precision by 4.75% and 4.5%, respectively, over traditional clustering methods, and that the data it produces has a higher utilization rate.
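
    The preprocessing and similarity-based assignment steps named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the abstract does not say how missing values are compensated or which similarity measure is used, so column-mean filling and the similarity 1 / (1 + Euclidean distance) are assumptions, and all function names are hypothetical. Noise filtering, the Hadoop/MapReduce distribution, and the cluster-splitting step of X-means are omitted.

```python
import math

def fill_missing(rows):
    # Missing-value compensation (assumed strategy): replace None
    # with the mean of the observed values in the same column.
    filled_cols = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        mean = sum(observed) / len(observed)
        filled_cols.append([mean if v is None else v for v in col])
    return [list(r) for r in zip(*filled_cols)]

def min_max_normalize(rows):
    # Scale every column to [0, 1]; a constant column maps to 0.0.
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]

def similarity(a, b):
    # Assumed similarity measure: 1 / (1 + Euclidean distance), in (0, 1].
    return 1.0 / (1.0 + math.dist(a, b))

def assign(rows, centers):
    # Each sample takes the index of its most similar cluster center.
    return [
        max(range(len(centers)), key=lambda k: similarity(row, centers[k]))
        for row in rows
    ]

# Toy data with one missing entry (illustrative values only).
samples = [[1.0, 2.0], [1.2, None], [9.0, 10.0], [8.8, 9.6]]
clean = min_max_normalize(fill_missing(samples))
centers = [clean[0], clean[2]]  # two hand-picked initial centers
labels = assign(clean, centers)
print(labels)
```

    In a MapReduce setting the assignment step would typically run in the mappers and the center update in the reducers, but that mapping is a common design choice rather than something the abstract specifies.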

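Cite this article

Zhang Pengfei, Jiang An, Xiong Nian. Research on Large Cluster Analysis Based on Optimized X-means Algorithm under Hadoop Platform[J]. Computer Measurement & Control, 2023, 31(12): 284-289.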

History
  • Received: 2023-06-13
  • Last revised: 2023-07-05
  • Accepted: 2023-07-06
  • Published online: 2023-12-27