Abstract: In natural scene images, rich text content is essential for a comprehensive understanding of the scene. To address complex backgrounds, touching text, and multi-oriented text in natural scene text images, a text detection and recognition algorithm based on an improved MTSv2 is proposed. The detection algorithm uses MTSv2 as the base network. First, the Convolutional Block Attention Module (CBAM) is used to increase the weight of small text in the feature map, so that the key features in the image are captured more effectively. Second, the Channel Enhancement Feature Pyramid Network (CE-FPN) structure is used to alleviate the feature aliasing caused by multi-scale fusion. Finally, the focal loss function is introduced to reduce the impact of the imbalanced distribution of positive and negative samples on recognition accuracy; it makes the network focus on hard-to-classify samples and improves the generalization ability of the model. Trained on multiple text datasets and validated on the ICDAR2015 dataset, the improved model achieves a precision of 89.3%, a recall of 87.6%, and an F1 score of 88.5% for scene text detection and recognition, all higher than those of the original MTSv2 model.
Keywords: Scene Text; Text Detection; Text Recognition; CBAM; CE-FPN; Attention Mechanism
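The focal loss referred to above is usually written as FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The modulating factor (1 - p_t)^γ shrinks the loss of easy, well-classified samples, so that hard samples dominate the gradient. The Python sketch below is illustrative only. It assumes a binary text/background setting and the commonly used defaults α = 0.25 and γ = 2; the abstract does not give the values used in the improved MTSv2 model, and the function name `binary_focal_loss` is hypothetical.

```python
import math
from typing import Sequence


def binary_focal_loss(probs: Sequence[float], labels: Sequence[int],
                      alpha: float = 0.25, gamma: float = 2.0,
                      eps: float = 1e-7) -> float:
    """Mean binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  -- predicted probability of the positive (text) class per sample
    labels -- 1 for positive (text), 0 for negative (background)
    alpha, gamma -- assumed common defaults, not values taken from the paper
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p             # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        # (1 - p_t)**gamma down-weights easy samples where p_t is close to 1
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# An easy negative (background predicted with 95% confidence) versus a hard positive.
easy = binary_focal_loss([0.05], [0])
hard = binary_focal_loss([0.3], [1])
print(f"easy negative: {easy:.6f}, hard positive: {hard:.6f}")
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check; with γ = 2 the easy negative contributes orders of magnitude less loss than the hard positive.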