Highly Efficient Parallel K-Medoids Algorithm in a Cloud Environment
Affiliation:

School of Information Science and Engineering, Changzhou University

CLC number:

TP311

Funding:

National Natural Science Foundation of China (11271057, 51176016); Natural Science Foundation of Jiangsu Province (BK2009535)

    Abstract:

    The traditional K-Medoids clustering algorithm selects its initial points at random and therefore tends to converge to a local optimum. Its global sequential replacement strategy for swapping cluster centers lowers execution efficiency, and the algorithm scales poorly to massive data sets. To address these problems, an improved K-Medoids algorithm for cloud environments is proposed. The algorithm combines a density method with the max-min principle to obtain optimized initial cluster centers, replaces centers only within Canopy regions, adopts an optimized criterion function, and is parallelized by sequentially composing MapReduce jobs. Experimental results show that, compared with the traditional algorithm, the improved algorithm depends less on the initial centers, improves clustering accuracy, and reduces both the number of iterations and the clustering time.
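The initialization idea in the abstract (a density method combined with the max-min principle) can be illustrated with a minimal single-machine sketch. The abstract does not give the paper's density definition, threshold, or criterion function, so everything specific here is an assumption made for illustration: density is taken as the number of neighbors within a hypothetical `radius`, candidates are points whose density is at least the mean density, and the names `local_density` and `select_initial_medoids` are invented. The Canopy-restricted replacement and the MapReduce parallelization are not shown.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_density(points: List[Point], radius: float) -> List[int]:
    """Assumed density measure: count of other points within `radius`."""
    return [
        sum(1 for j, q in enumerate(points)
            if j != i and euclidean(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def select_initial_medoids(points: List[Point], k: int, radius: float) -> List[int]:
    """Return indices of up to k initial medoids.

    Step 1 (density): keep only points whose density is at least the mean
    density, so isolated outliers cannot become seeds (assumed threshold).
    Step 2 (max-min): start from the densest candidate, then repeatedly add
    the candidate whose distance to its nearest chosen medoid is largest.
    """
    density = local_density(points, radius)
    mean_density = sum(density) / len(density)
    candidates = [i for i, d in enumerate(density) if d >= mean_density]

    medoids = [max(candidates, key=lambda i: density[i])]
    while len(medoids) < k:
        remaining = [i for i in candidates if i not in medoids]
        if not remaining:
            break  # fewer dense candidates than requested medoids
        farthest = max(
            remaining,
            key=lambda i: min(euclidean(points[i], points[m]) for m in medoids),
        )
        medoids.append(farthest)
    return medoids


# Two compact groups plus one distant outlier at index 8.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
medoids = select_initial_medoids(points, k=2, radius=2.0)
```

In this toy data set the outlier has no neighbors within the radius, so the density filter excludes it and the two seeds fall in different groups. A purely random initialization gives no such guarantee, which is the dependence on initial points that the abstract describes.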

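Cite this article

Li Yuanyuan, Sun Yuqiang, Chao Ya, Liu Yang. Highly efficient parallel algorithm of K-Medoids in cloud environment[J]. Computer Measurement & Control, 2016, 24(12): 58.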

History
  • Received: 2016-07-20
  • Last revised: 2016-08-02
  • Accepted: 2016-08-03
  • Published online: 2017-02-06