Research on Load Balancing of MapReduce Laboratory System Based on Two-Tier Partition
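Authors: ZHENG Wenli, XIONG Beibei, CHENG Lixun, CAI Yina, BAO Xianyu
Affiliations:

1. Shenzhen Academy of Inspection and Quarantine; 2. Shenzhen Academy of Inspection and Quarantine, Shenzhen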

CLC number:

TP301.6

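Fund project:

National Key Research and Development Program of China (2019YFC1605401); Research Project of the General Administration of Customs of China (2020HK109).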



    Abstract:

    When a laboratory system processes massive volumes of raw data, real-world scenarios include special cases with a high sampling rate and high skewness. A two-tier partitioning algorithm cannot handle these cases effectively when it is used to balance Reducer node load in a homogeneous environment. To address this, MapReduce parallel processing is introduced to improve the utilization of sampled data in the laboratory system. The ICSC (Improved Cluster Split Combination) partition scheduling algorithm is also adopted to deal with high data skewness and high sampling rates. Experiments show that the MapReduce load balancing algorithm based on two-tier partitioning effectively reduces the idle time of Mapper and Reducer nodes. As data skewness increases, the algorithm's execution time remains essentially unchanged, so skewness has little effect on its running time. In addition, as the sampling rate increases, the ICSC partition scheduling algorithm keeps the lowest time cost among the compared models. The algorithm therefore weakens the dependency between Reducer nodes and improves the execution efficiency and fault tolerance of MapReduce tasks. As a result, it achieves efficient load balancing of data processing for laboratory systems under the MapReduce framework.
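The abstract does not describe how ICSC splits and combines clusters. The Python below is a purely illustrative sketch of the general split-and-combine idea behind skew-aware Reducer partitioning. It is not the authors' ICSC algorithm.

The sketch assumes that per-key record counts have already been estimated by sampling the Map output. The function name `split_and_combine` and its parameters are hypothetical. The sketch works in two stages:

1. Any key cluster larger than the ideal per-Reducer load is cut into fragments no larger than that load.
2. Fragments are placed largest-first onto the currently least-loaded Reducer.

```python
import heapq
from collections import Counter


def split_and_combine(key_counts, num_reducers):
    """Hypothetical two-stage assignment of key clusters to Reducers.

    Illustrative only; NOT the paper's ICSC algorithm.
    key_counts: mapping key -> estimated record count (assumed to come
    from sampling the Map output). Returns (loads, assignment), where
    assignment maps reducer id -> list of (key, fragment_size).
    """
    total = sum(key_counts.values())
    target = -(-total // num_reducers)  # ceiling of the ideal average load

    # Stage 1 (split): cut any cluster larger than `target` into
    # fragments of at most `target` records each.
    fragments = []
    for key, count in key_counts.items():
        while count > target:
            fragments.append((target, key))
            count -= target
        if count > 0:
            fragments.append((count, key))

    # Stage 2 (combine): place fragments largest-first onto the currently
    # least-loaded Reducer (greedy longest-processing-time heuristic).
    fragments.sort(reverse=True)
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    assignment = {r: [] for r in range(num_reducers)}
    for size, key in fragments:
        load, r = heapq.heappop(heap)
        assignment[r].append((key, size))
        heapq.heappush(heap, (load + size, r))

    loads = [sum(size for _, size in assignment[r]) for r in range(num_reducers)]
    return loads, assignment


if __name__ == "__main__":
    # Skewed sample: "a" is a hot key holding 70 of 100 records.
    sampled = Counter({"a": 70, "b": 10, "c": 10, "d": 5, "e": 5})
    loads, assignment = split_and_combine(sampled, 4)
    print(loads)  # -> [25, 25, 25, 25]
```

With plain hash partitioning, every record of the hot key `a` would go to a single Reducer. In this sketch, each of the four Reducers receives 25 records. The trade-off is that fragments of one key end up on different Reducers and need a later merge step. The abstract does not say how the paper handles this.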

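Cite this article:

ZHENG Wenli, XIONG Beibei, CHENG Lixun, CAI Yina, BAO Xianyu. Research on load balancing of MapReduce laboratory system based on two-tier partition[J]. Computer Measurement & Control, 2023, 31(4): 252-257.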

History
  • Received: 2022-11-11
  • Last revised: 2022-12-19
  • Accepted: 2023-01-03
  • Published online: 2023-04-24