Abstract: A real-time object detection algorithm based on an attention mechanism and multi-spatial pyramid pooling is proposed to address three shortcomings of the YOLOv4 algorithm: its spatial pyramid pooling module enhances the representational power of only the deep feature maps of the feature fusion network, its computational complexity is high, and the feature maps of its detection head network have difficulty highlighting important channel features. Multi-spatial pyramid pooling extracts multi-scale information and fuses multiple receptive fields, which strengthens the representational power of the shallow, middle and deep feature maps of the feature fusion network. A squeeze-and-excitation channel attention mechanism models the interdependencies between channels and adaptively recalibrates the weight of each channel, so that the network pays more attention to important features. Moreover, depthwise separable convolution is used to reduce the number of parameters in the feature fusion and detection head networks. The experimental results show that the mean average precision of the proposed algorithm is higher than that of the compared state-of-the-art algorithms, while its average speed reaches 33.70 FPS, which meets real-time requirements. Compared with YOLOv4, the number of parameters and the model size are reduced by 27.85 M and 106.25 MB, respectively. The proposed algorithm therefore both improves detection accuracy and reduces computational complexity relative to the baseline.
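
To illustrate why depthwise separable convolution reduces the number of parameters, the following minimal Python sketch compares the weight count of a standard convolution with that of its depthwise separable counterpart. The function names and the layer size (3×3 kernel, 512 input channels, 1024 output channels) are assumptions chosen for illustration and are not taken from the proposed network; bias terms and normalization layers are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer size, chosen only for illustration.
    k, c_in, c_out = 3, 512, 1024
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard:            {std} weights")
    print(f"depthwise separable: {sep} weights")
    print(f"ratio:               {sep / std:.4f}")
```

For a k×k kernel with C_out output channels, the ratio of the two counts is 1/C_out + 1/k², so a 3×3 layer with many output channels keeps roughly one ninth of the original weights.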