Research on hardware acceleration technology of deep reinforcement learning based on FPGA

Affiliation: School of Electronics and Information Engineering, Harbin Institute of Technology

CLC number: TP3
    Abstract:

    Deep reinforcement learning (DRL) is an important branch of machine learning for solving sequential decision-making problems, with broad application prospects in fields such as autonomous driving and the industrial Internet of Things. Because DRL is computationally intensive, it is difficult to deploy on embedded platforms with limited computing resources and strict power budgets. To address this limitation, an FPGA accelerator for DRL is designed using hardware/software co-design, and a design space exploration method is proposed. The online decision-making task of the Cartpole application is completed on the ZYNQ7100 heterogeneous computing platform. Experimental results show that, when training a typical DRL algorithm, the design has clear advantages over CPU and GPU platforms in computing speed and power consumption: it achieves a speedup of 12.03 over the CPU and 28.08 over the GPU, with a running power consumption of only 7.748 W, which meets the requirements of online DRL decision-making tasks in embedded applications.

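Cite this article

Feng Lei, Wang Bintao, Liu Bing, Li Xipeng. Research on hardware acceleration technology of deep reinforcement learning based on FPGA[J]. Computer Measurement & Control (计算机测量与控制), 2022, 30(6): 242-247.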

History
  • Received: 2021-12-20
  • Last revised: 2022-01-04
  • Accepted: 2022-01-05
  • Published online: 2022-06-21