Abstract: In recent years, convolutional neural networks have been widely used in fields such as image recognition, speech recognition and translation, and autonomous driving, owing to their excellent performance. However, traditional Convolutional Neural Networks (CNNs) suffer from large parameter counts, heavy computation, slow inference, and high power consumption when deployed on CPUs and GPUs. To address these problems, Quantization-Aware Training (QAT) is used to compress the total number of network parameters to 1/4 of the original while preserving image-classification accuracy. All network weights are stored in on-chip FPGA resources, which removes the off-chip storage bandwidth bottleneck and reduces the power consumed by off-chip memory accesses. A cooperative pipeline structure is proposed within the layers of the MobileNetV2 network and between adjacent pointwise convolutional layers, greatly improving the real-time performance of the network. An optimization strategy for memory and data reading is proposed that adjusts the data storage arrangement and read order according to the degree of parallelism, further saving on-chip BRAM resources. Finally, a lightweight MobileNetV2 recognition system with high performance and low power consumption was implemented on Xilinx's Virtex-7 VC707 development board. At a 200 MHz clock, it reached a throughput of 170.06 GOP/s with a power consumption of only 6.13 W, yielding an energy efficiency of 27.74 GOP/s/W: 92 times that of a CPU and 25 times that of a GPU. This performance compares favorably with other implementations.
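The 4x parameter compression claimed above follows from quantizing 32-bit floating-point weights to 8-bit integers (4 bytes down to 1 byte per weight). A minimal NumPy sketch of the symmetric per-tensor fake-quantization step that underlies QAT is shown below; this is an illustrative assumption about the scheme, not the paper's actual implementation, and `fake_quantize` is a hypothetical helper name.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Symmetric uniform quantization: map float32 weights to int8 codes."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax          # one scale factor per tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale           # integer codes + dequant scale

weights = np.random.randn(32, 32).astype(np.float32)
q, scale = fake_quantize(weights)

# Storage shrinks from 4 bytes to 1 byte per weight: 1/4 of the original.
print(weights.nbytes, q.nbytes)  # prints "4096 1024"

# During QAT the forward pass uses the dequantized weights q * scale, so the
# network learns to tolerate the rounding error (at most scale/2 per weight).
recovered = q.astype(np.float32) * scale
```

In deployment, only the int8 codes and per-tensor scales need to reside in BRAM, which is what makes an entirely on-chip weight store feasible.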