Text Detection Based on Pixel to Box Assignment
DOI:
Author:
Affiliation:

Jiangnan University

Author biography:

Corresponding author:

CLC number:

Fund project:

National Natural Science Foundation of China (General Program, Key Program, Major Program)


Abstract:

To address the shortcomings of existing scene text detection methods, this paper proposes a scene text detection method based on pixel-to-box assignment, in which a cross-attention module and a multi-scale feature adaptive module optimize feature extraction along the spatial and channel dimensions, respectively. To enrich feature representations across scales, the multi-scale feature adaptive module automatically assigns weights to features of different scales. To capture contextual information effectively, the features extracted by the backbone network are fed into the cross-attention module: each pixel collects context along its horizontal and vertical paths, and by applying this operation recurrently, every pixel can gather context from the entire image. A fully convolutional network trained under a multi-task learning framework learns the geometric attributes of text instances; the multi-task outputs are then combined to assign each pixel to a text box, and after simple post-processing the polygonal bounding box of each text instance is reconstructed. On the arbitrary-shape benchmark Total-Text, the method achieves a recall of 75.71%, a precision of 89.15%, and an F-measure of 81.89%; it also performs well on the multi-oriented benchmark ICDAR2015, with a recall of 79.06%, a precision of 89.24%, and an F-measure of 83.84%, demonstrating its effectiveness.
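The row/column path description with a recurrent pass matches criss-cross-style attention. The abstract does not give the module's exact formulation, so the following is only a minimal NumPy sketch of the idea: each pixel attends over the pixels on its own row and column via softmax-normalized dot-product affinities, and two recurrent passes let context propagate to every position in the map. Function names and the residual connection are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def criss_cross_attention(feat):
    """One pass of criss-cross-style attention over an (H, W, C) feature map.

    Each pixel aggregates values from every pixel on its own row and column,
    weighted by softmax-normalized scaled dot-product affinities.
    (The center pixel appears on both paths; the sketch ignores that detail.)
    """
    H, W, C = feat.shape
    out = np.zeros_like(feat)
    for i in range(H):
        for j in range(W):
            q = feat[i, j]                                   # query at this pixel
            # keys/values on the horizontal and vertical paths: (W + H, C)
            path = np.concatenate([feat[i, :, :], feat[:, j, :]], axis=0)
            logits = path @ q / np.sqrt(C)                   # affinities
            weights = np.exp(logits - logits.max())
            weights /= weights.sum()                         # softmax over the paths
            out[i, j] = weights @ path                       # context aggregation
    return out

def full_image_context(feat, passes=2):
    """Two recurrent passes: the row/column paths of a pixel's row/column
    neighbours cover all positions, so every pixel sees the whole image."""
    for _ in range(passes):
        feat = feat + criss_cross_attention(feat)            # residual connection
    return feat
```

A real implementation would use separate learned query/key/value projections and run on GPU; the double loop here only makes the path-wise aggregation explicit.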

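The multi-scale feature adaptive module is described only as automatically assigning weights to features of different scales. A common realization of that idea is a softmax-weighted sum over same-resolution feature maps; the sketch below assumes that scheme, with fixed logits standing in for the learned per-scale parameters.

```python
import numpy as np

def adaptive_fuse(features, scale_logits):
    """Fuse a list of same-shape feature maps with softmax-normalized
    per-scale weights.

    `scale_logits` stands in for learned parameters; in a trained network
    these would be produced by a small weighting branch, not hand-set.
    """
    logits = np.asarray(scale_logits, dtype=float)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                   # weights sum to 1
    return sum(wi * fi for wi, fi in zip(w, features))
```

With equal logits this reduces to plain averaging; a strongly dominant logit selects a single scale, so the same module can interpolate between averaging and hard scale selection.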

Cite this article:

Ji Xunsheng, Yu Zhi, Xu Xiaoxiang. Text Detection Based on Pixel to Box Assignment [J]. Computer Measurement & Control, 2023, 31(7): 21-27.

History
  • Received: 2023-02-08
  • Last revised: 2023-03-06
  • Accepted: 2023-03-06
  • Published online: 2023-07-12
  • Publication date: