Abstract: In recent years, convolutional neural networks have been widely used in fields such as image recognition, speech recognition and translation, and autonomous driving, owing to their excellent performance. However, traditional Convolutional Neural Networks (CNNs) suffer from large parameter counts, heavy computation, slow inference, and high power consumption when deployed on CPUs and GPUs. To address these problems, Quantization-Aware Training (QAT) is used to compress the total number of network parameters to 1/4 of the original while preserving image-classification accuracy. All network weights are stored in on-chip FPGA resources, which removes the off-chip storage bandwidth bottleneck and reduces the power consumed by off-chip memory accesses. A cooperative pipeline structure is proposed within the layers of the MobileNetV2 network and between adjacent pointwise convolutional layers, greatly improving the real-time performance of the network. An optimization strategy for memory and data reading is proposed that adjusts the data storage arrangement and read order according to the degree of parallelism, further saving on-chip BRAM resources. Finally, a lightweight MobileNetV2 recognition system with high performance and low power consumption was implemented on Xilinx's Virtex-7 VC707 development board. At a 200 MHz clock, it reached a throughput of 170.06 GOP/s with a power consumption of only 6.13 W, yielding an energy efficiency of 27.74 GOP/s/W: 92 times that of a CPU and 25 times that of a GPU. This performance compares favorably with other implementations.
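The 4x parameter compression claimed above follows from quantizing 32-bit floating-point weights to 8-bit integers (4 bytes down to 1 byte per weight). A minimal NumPy sketch of the symmetric per-tensor fake-quantization step that underlies QAT is shown below; this is an illustrative assumption about the scheme, not the paper's actual implementation, and `fake_quantize` is a hypothetical helper name.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Symmetric uniform quantization: map float32 weights to int8 codes."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax          # one scale factor per tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale           # integer codes + dequant scale

weights = np.random.randn(32, 32).astype(np.float32)
q, scale = fake_quantize(weights)

# Storage shrinks from 4 bytes to 1 byte per weight: 1/4 of the original.
print(weights.nbytes, q.nbytes)  # prints "4096 1024"

# During QAT the forward pass uses the dequantized weights q * scale, so the
# network learns to tolerate the rounding error (at most scale/2 per weight).
recovered = q.astype(np.float32) * scale
```

In deployment, only the int8 codes and per-tensor scales need to reside in BRAM, which is what makes an entirely on-chip weight store feasible.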