Abstract: A detection and segmentation method based on an improved Mask R-CNN is proposed for metal surface defects. The ResNet-50 backbone is replaced with the more advanced ConvNeXt-T to improve feature extraction, interleaved sparse self-attention modules are added to the feature pyramid to strengthen the model's global modeling ability, and multi-level regional feature fusion is used to strengthen its representation of contextual information. Comparison and validation experiments were conducted on a steel surface defect dataset. The results show that the backbone replacement yields the largest improvement, raising mAP_bbox and mAP_mask by 8.2% and 6.3%, respectively. Among the compared methods, the proposed method achieves the highest detection and segmentation accuracy for steel surface defects, with mAP_bbox and mAP_mask reaching 0.690 and 0.662, respectively.
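The grouping idea behind interleaved sparse self-attention can be illustrated with a minimal 1-D sketch. It assumes the module follows the common two-step scheme: a long-range step that attends within groups of positions sampled at a fixed stride, followed by a short-range step that attends within contiguous blocks. The group size `q` and all function names are hypothetical, and the code only tracks which positions exchange information. It is not the authors' implementation and computes no attention weights.

```python
def long_range_groups(n, q):
    """Group positions sharing the same residue modulo q (stride-q sampling)."""
    assert n % q == 0, "sketch assumes n is divisible by q"
    return [list(range(r, n, q)) for r in range(q)]


def short_range_groups(n, q):
    """Group each run of q contiguous positions into one local block."""
    assert n % q == 0, "sketch assumes n is divisible by q"
    return [list(range(s, s + q)) for s in range(0, n, q)]


def reachable_after_two_steps(n, q):
    """For each position, the set of positions whose information reaches it
    after the long-range step followed by the short-range step."""
    seen = {i: {i} for i in range(n)}
    for groups in (long_range_groups(n, q), short_range_groups(n, q)):
        new_seen = {}
        for group in groups:
            # Attention inside a group mixes everything its members have seen.
            merged = set().union(*(seen[i] for i in group))
            for i in group:
                new_seen[i] = merged
        seen = new_seen
    return seen


def attended_pairs(n, q):
    """Number of query-key pairs evaluated by the two sparse steps combined."""
    p = n // q
    return q * p * p + p * q * q
```

With 12 positions and `q = 4`, the first long-range group is `[0, 4, 8]` and the first short-range block is `[0, 1, 2, 3]`. After both steps every position has received information from all 12 positions, while the two steps evaluate 84 query-key pairs instead of the 144 needed by dense self-attention.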