Design of a Dynamic Transformer-based Surveillance Video Summarization System
Authors: RUAN Zhijian, PENG Li
Affiliation: School of Internet of Things Engineering, Jiangnan University

CLC number: TP391.4

Fund project: National Natural Science Foundation of China (61873112, 61802107)



    Abstract:

    A surveillance video summarization system is an important tool for extracting key information from large, complex surveillance videos, providing effective support for security management and event analysis. With the proliferation of surveillance devices and the rapid growth of surveillance video data, traditional manual summarization can no longer meet the need for rapid processing and accurate extraction of the required information, while modern deep learning methods commonly suffer from high computational complexity and large numbers of parameters. To address this problem, a dynamic Transformer-based surveillance video summarization model is proposed. The model automatically assigns an appropriate number of tokens to each input video frame: multiple Transformer models are cascaded with a progressively increasing number of tokens so that they are activated in an adaptive order, and inference terminates as soon as a sufficiently confident prediction is produced. Feature reuse and attention reuse are employed to reduce redundant computation. Experiments show that, compared with traditional models, the dynamic Transformer model improves accuracy, raising the score metric by 3.7% and 0.9% on two public datasets, respectively, while reducing computational complexity by 40%. The model meets both the accuracy and the surveillance requirements, demonstrating good generalization.
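The abstract only outlines the inference scheme. The following is a minimal Python sketch of the general cascade-with-early-exit idea it describes, not the authors' implementation. Each hypothetical stage stands in for one Transformer with a fixed per-frame token budget (49 and 196 tokens, i.e. 7x7 and 14x14 patch grids, are assumed here), stages run from cheapest to most expensive, and a frame exits at the first stage whose top probability reaches an assumed threshold of 0.9. Feature reuse is modeled by passing each stage's output feature into the next stage; attention reuse is not shown. The stage signature, the per-frame label distribution (assumed: key frame versus non-key frame), and the toy stages are all illustrative assumptions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stage signature: (frame, reused_feature) -> (probs, feature).
# `probs` is a distribution over per-frame labels (assumed: key / non-key);
# `feature` is passed on to the next stage (feature reuse).
Stage = Callable[[Sequence[float], Optional[List[float]]],
                 Tuple[List[float], List[float]]]


def cascade_predict(frame: Sequence[float], stages: List[Stage],
                    threshold: float = 0.9) -> Tuple[List[float], int]:
    """Run stages from fewest to most tokens and exit at the first stage
    whose top probability reaches `threshold`.

    Returns (probs, index of the stage that produced them).
    """
    if not stages:
        raise ValueError("need at least one stage")
    reused: Optional[List[float]] = None
    probs: List[float] = []
    for i, stage in enumerate(stages):
        probs, reused = stage(frame, reused)
        if max(probs) >= threshold:
            return probs, i          # confident enough: stop early
    return probs, len(stages) - 1    # no early exit: keep the last stage's output


def toy_stage(num_tokens: int, max_tokens: int = 196) -> Stage:
    """Toy stand-in for one Transformer: more tokens give stronger evidence,
    and the previous stage's feature is added on top (feature reuse)."""
    def run(frame: Sequence[float],
            reused: Optional[List[float]]) -> Tuple[List[float], List[float]]:
        evidence = sum(frame) / len(frame) * num_tokens / max_tokens
        if reused is not None:
            evidence += reused[0]
        p = min(max(0.5 + evidence, 0.0), 1.0)  # clamp to a valid probability
        return [p, 1.0 - p], [evidence]
    return run


if __name__ == "__main__":
    token_budgets = [49, 196]                 # assumed 7x7 and 14x14 grids
    stages = [toy_stage(n) for n in token_budgets]
    frames = [[2.0, 2.0], [0.4, 0.4]]         # an "easy" and a "hard" toy frame
    spent = 0
    for frame in frames:
        probs, k = cascade_predict(frame, stages)
        spent += sum(token_budgets[: k + 1])  # all stages up to the exit ran
        print(f"exit at stage {k}, probs = {[round(p, 2) for p in probs]}")
    print(f"tokens spent: {spent} vs {len(frames) * token_budgets[-1]} without early exit")
```

In this toy run the "easy" frame exits after the 49-token stage while the "hard" frame needs both stages, so 294 tokens are processed instead of the 392 needed to always run the largest stage. Any real saving depends on how many frames exit early; these toy numbers say nothing about the reduction reported in the abstract.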

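Cite this article:

RUAN Zhijian, PENG Li. Design of a dynamic Transformer-based surveillance video summarization system[J]. Computer Measurement & Control (计算机测量与控制), 2024, 32(8): 201-208.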

History
  • Received: 2024-01-16
  • Last revised: 2024-02-23
  • Accepted: 2024-02-28
  • Published online: 2024-09-02