Abstract: Spiking Neural Networks (SNNs) are attracting significant attention in extremely energy-constrained applications because of their biological interpretability and energy efficiency. However, existing SNN-based algorithms often show limited recognition capability relative to state-of-the-art (SOTA) conventional recognition algorithms, and they neglect temporal correlations between input frames. To address these issues, this paper proposes TS-SpikeYOLO, an enhanced algorithm based on SpikeYOLO. The proposed method introduces a multi-scale dilated group convolution (MSGDC) and a residual temporal shift (TS) module after the feature extraction structure of SpikeYOLO. By shifting a portion of the channels from the previous frame into the current frame for feature fusion, the algorithm improves information extraction and makes use of temporal correlation with only a minimal increase in parameters. In addition, the multi-scale dilated group convolutions and the residual architecture integrate the refined feature maps while mitigating the performance degradation and training instability caused by incorporating non-local features. Experimental results show that TS-SpikeYOLO achieves 72.0% mAP50 on a random static dataset constructed from COCO 2017 and 80.1% mAP50 on a temporally correlated dataset derived from KITTI and MOT15. Compared with the baseline algorithm, these are improvements of 1.7% and 1.0%, respectively, with only a 2% increase in parameter count. This work may help extend the use of SNNs to video-related object recognition tasks.
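The partial channel shift described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `temporal_shift` and `residual_temporal_shift`, the `shift_fraction` parameter and its default value, the choice to leave the first frame unchanged, and the plain element-wise addition used for the residual path are all assumptions made for illustration. Feature maps are represented as nested Python lists (frames, then channels, then flattened values) to keep the example dependency-free.

```python
from typing import List

Channel = List[float]  # one flattened feature map
Frame = List[Channel]  # all channels of one frame


def temporal_shift(frames: List[Frame], shift_fraction: float = 0.25) -> List[Frame]:
    """Replace the first k channels of each frame with those of the previous frame."""
    if not 0.0 <= shift_fraction <= 1.0:
        raise ValueError("shift_fraction must lie in [0, 1]")
    shifted: List[Frame] = []
    for t, frame in enumerate(frames):
        k = int(len(frame) * shift_fraction)  # number of channels to shift
        if t == 0:
            # Assumption: the first frame has no predecessor and is copied unchanged.
            shifted.append([list(ch) for ch in frame])
            continue
        previous = frames[t - 1]  # always read from the original, unshifted input
        shifted.append(
            [list(ch) for ch in previous[:k]] + [list(ch) for ch in frame[k:]]
        )
    return shifted


def residual_temporal_shift(
    frames: List[Frame], shift_fraction: float = 0.25
) -> List[Frame]:
    """Add the shifted features back onto the unshifted input (identity path)."""
    shifted = temporal_shift(frames, shift_fraction)
    return [
        [[a + b for a, b in zip(ch_in, ch_sh)] for ch_in, ch_sh in zip(f_in, f_sh)]
        for f_in, f_sh in zip(frames, shifted)
    ]
```

The shift itself adds no learnable parameters, which is consistent with the small parameter overhead reported; in this sketch the identity path simply keeps the current frame's own features available alongside the shifted ones.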