Research on Text Detection and Recognition Algorithm Based on Improved MTSv2

Affiliation:

School of Internet of Things Engineering, Jiangnan University

Fund Project:

National Natural Science Foundation of China, Youth Program (6170185); National Natural Science Foundation of China (61901206)

    Abstract:

    Natural scene images often contain rich text, and reading it is important for understanding the scene as a whole. Scene text images pose several difficulties: complex backgrounds, touching text instances, and text at multiple orientations. To address them, this paper proposes a text detection and recognition algorithm based on an improved MTSv2. With MTSv2 as the base network, the method first applies the Convolutional Block Attention Module (CBAM) to increase the weight of small text in the feature maps, so that key image features are captured more reliably. Second, it integrates the Channel Enhancement Feature Pyramid Network (CE-FPN) structure to reduce the feature aliasing introduced by multi-scale fusion. Finally, it introduces the focal loss function to reduce the effect of the imbalance between positive and negative samples on recognition accuracy, which makes the network focus on hard-to-classify samples and improves the model's generalization. The model is trained on multiple text datasets and validated on ICDAR2015, where the improved model reaches a precision of 89.3%, a recall of 87.6%, and an F1 score of 88.5% for scene text detection and recognition, each an improvement over the original model.
    Keywords: Scene Text; Text Detection; Text Recognition; CBAM; CE-FPN; Attention Mechanism
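
The abstract states that focal loss makes the network concentrate on hard-to-classify samples. The snippet below is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), to illustrate that effect. It is not the authors' implementation: the function names, the values alpha = 0.25 and gamma = 2.0, and the example probabilities are all assumptions chosen for illustration, since the abstract gives no hyperparameters.

```python
import math


def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy for one prediction (reference baseline)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p if y == 1 else 1.0 - p)


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class
    y: ground-truth label, 0 or 1
    alpha, gamma: assumed illustrative defaults, not values from the paper
    """
    p = min(max(p, eps), 1.0 - eps)             # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t)^gamma shrinks the loss of well-classified samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical positive samples: one classified confidently, one poorly.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.30, 1)

# Share of the total loss carried by the hard sample under each loss.
focal_share = hard / (easy + hard)
ce_share = cross_entropy(0.30, 1) / (cross_entropy(0.95, 1) + cross_entropy(0.30, 1))
print(f"hard-sample share: focal={focal_share:.3f}, cross-entropy={ce_share:.3f}")
```

With gamma set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is a convenient sanity check when wiring such a loss into a training loop.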

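Cite this article

Wang Yanyuan (王艳媛), Mao Zhengchong (茅正冲), Yang Yuhan (杨雨涵). Research on Text Detection and Recognition Algorithm Based on Improved MTSv2 [J]. Computer Measurement & Control (计算机测量与控制), 2024, 32(9): 256-261.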

History
  • Received: 2023-09-05
  • Last revised: 2023-10-14
  • Accepted: 2023-10-16
  • Published online: 2024-10-08