Abstract: To address the low efficiency of weight-gradient computation in neural-network training accelerators, this paper presents a floating-point optimization architecture for a high-performance convolutional neural network (CNN) training processor. Building on an analysis of the basic principles of CNN training architectures, we propose training designs with 32-bit, 24-bit, 16-bit, and mixed-precision arithmetic in order to identify the floating-point format best suited to low-power, small-footprint edge devices. A field-programmable gate array (FPGA) prototype verifies that the accelerator engine supports both inference and training on the MNIST handwritten-digit data set. A hybrid 24-bit convolution format, which combines a custom 24-bit floating-point format with the 16-bit brain floating-point (bfloat16) format, achieves an accuracy above 93%. The optimized mixed-precision accelerator is implemented in a TSMC 55 nm process and consumes 8.51 μJ per image.
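The abstract's reduced-precision formats can be illustrated in software by truncating a float32's 23-bit mantissa. The sketch below is an assumption for illustration only: the paper does not specify the bit layout of its custom 24-bit format, so here it is modeled as 1 sign, 8 exponent, and 15 mantissa bits (the same exponent width as float32 and bfloat16), and the helper names are hypothetical.

```python
import struct

def quantize_float32(x: float, mantissa_bits: int) -> float:
    """Truncate a float32's 23-bit mantissa to `mantissa_bits`,
    simulating a narrower format that keeps the 8-bit exponent."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    drop = 23 - mantissa_bits
    bits &= ~((1 << drop) - 1)  # zero out the discarded low mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def to_bf16(x: float) -> float:
    # bfloat16: 1 sign, 8 exponent, 7 mantissa bits
    return quantize_float32(x, 7)

def to_fp24(x: float) -> float:
    # assumed custom 24-bit format: 1 sign, 8 exponent, 15 mantissa bits
    return quantize_float32(x, 15)
```

Because both narrow formats keep float32's 8-bit exponent, they preserve dynamic range and trade only mantissa precision, which is why mixed-precision training can retain accuracy while cutting datapath width.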