TS-SpikeYOLO: An Improved Video Object Detection Method Based on SpikeYOLO

Affiliation: The 54th Research Institute of China Electronics Technology Group Corporation

CLC number: TP391.41

    Abstract:

    Spiking neural networks (SNNs) are attracting growing attention in applications with extreme energy constraints because of their biological interpretability and energy efficiency. However, existing SNN-based algorithms generally have weaker recognition capability than conventional algorithms and do not fully exploit the temporal correlation of their input data. To address these issues, this paper proposes TS-SpikeYOLO, an improved algorithm based on SpikeYOLO. The method introduces a multi-scale dilated group convolution (MSGDC) module and a residual temporal shift (TS) module after the feature extraction structure of SpikeYOLO. By shifting part of the channels of adjacent past frames into the current frame and fusing the features, the algorithm improves information extraction and the use of temporal correlation with only a small increase in parameters. The multi-scale dilated convolution and the residual structure allow the algorithm to fuse the adjusted feature maps effectively, and they reduce the loss of expressive power and the training instability caused by introducing features that do not belong to the current frame. Experimental results show that TS-SpikeYOLO reaches 72.0% mAP50 on a static dataset built from the person and vehicle classes of COCO 2017, and 80.1% mAP50 on a similarly constructed dynamic dataset derived from KITTI and MOT15. Compared with the baseline algorithm, these are improvements of 1.7% and 1.0%, respectively, while the parameter count increases by only 2%. The results should help improve the prospects of SNNs in video-related object recognition tasks.
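The channel-shifting idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `residual_temporal_shift`, the `shift_ratio` parameter, the `(T, C, H, W)` layout, the zero-fill for the first frame, and the use of only frame t-1 are all assumptions made for illustration. The learned MSGDC fusion step and the spiking-neuron time dimension are omitted; the residual is shown as a plain addition.

```python
import numpy as np


def residual_temporal_shift(frames: np.ndarray, shift_ratio: float = 0.25) -> np.ndarray:
    """Mix a fraction of the previous frame's channels into each frame.

    frames: feature maps of shape (T, C, H, W), one per video frame.
    shift_ratio: fraction of channels taken from frame t-1
                 (hypothetical parameter, not a value from the paper).
    """
    if frames.ndim != 4:
        raise ValueError("expected features of shape (T, C, H, W)")
    num_channels = frames.shape[1]
    num_shifted = int(num_channels * shift_ratio)

    shifted = frames.copy()
    # Channels [0, num_shifted) of frame t are replaced by the same
    # channels of frame t-1.
    shifted[1:, :num_shifted] = frames[:-1, :num_shifted]
    # The first frame has no predecessor, so its shifted channels are zeroed
    # (an assumption of this sketch).
    shifted[0, :num_shifted] = 0.0

    # Residual connection: add the unshifted features back so that
    # current-frame information is never fully replaced.
    return frames + shifted


# Two frames, four channels, 1x1 spatial size; shift half of the channels.
features = np.arange(8, dtype=np.float32).reshape(2, 4, 1, 1)
fused = residual_temporal_shift(features, shift_ratio=0.5)
print(fused[:, :, 0, 0])
```

The shift itself adds no parameters, which is consistent with the small parameter increase reported in the abstract; in this sketch any extra parameters would come from the fusion convolution that is left out here.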

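Cite this article

Zhang Zhaoran (张昭然). An Improved Video Object Detection Algorithm Based on SpikeYOLO [J]. Computer Measurement & Control (计算机测量与控制), 2026, 34(3): 171-176.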

History
  • Received: 2025-12-30
  • Last revised: 2026-01-20
  • Accepted: 2026-01-23
  • Published online: 2026-03-24