Abstract: To meet the practical need for accurate recognition and efficient annotation of regular-shape targets in controlled visible-light scenes, this work designs and builds a lightweight integrated framework for object detection and automatic annotation. YOLOv8 serves as the baseline detection model, and the Convolutional Block Attention Module (CBAM) is integrated into it to improve feature extraction. The model is trained and validated on a self-built dataset of 16 regular-shape categories with solid-color backgrounds. An end-to-end pipeline converts detection results into YOLO-format annotation files, so that regular-shape targets are detected and annotated automatically. The model is compressed with INT8 quantization, deployed at the edge together with an RK1828 AI coprocessor, and benchmarked for inference performance on multiple hardware platforms. Across 3 independent runs with an 8:2 training/validation split, the framework reaches an mAP50 of 0.992 ± 0.3%, with a single-frame inference time of about 6.9 ms on an NVIDIA RTX 3060 GPU. After quantization the model occupies 5.1 MB, its inference latency on the RK1828 is 12.8 ms, and its mAP50 drops by only 0.5 percentage points, to 0.987. A model trained on the automatically generated annotation files reaches an mAP50 of 0.976, only 0.5 percentage points below the 0.981 of a model trained on manual annotations, which indicates that the automatic annotations are good enough for model retraining. The integrated framework therefore offers high precision and real-time performance for regular-shape detection and automatic annotation and can be deployed on both desktop and edge platforms; its generalization to complex industrial and security scenarios, however, still requires further validation and optimization.
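The abstract does not specify how detection results become YOLO-format annotation files. The following is a minimal sketch of that conversion step, not the authors' implementation. It assumes the detector returns pixel-space corner boxes (x1, y1, x2, y2) with a class ID and a confidence score. The `Detection` structure, the function names `to_yolo_lines` and `write_label_file`, and the 0.5 confidence threshold are all hypothetical. Each output line follows the standard YOLO label layout `<class> <x_center> <y_center> <width> <height>`, with coordinates normalized by image width and height.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Detection:
    """One detector output in pixel coordinates (hypothetical structure)."""
    class_id: int
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def to_yolo_lines(detections: Iterable[Detection], img_w: int, img_h: int,
                  conf_threshold: float = 0.5) -> List[str]:
    """Convert pixel-space corner boxes to YOLO label lines.

    Each line is '<class> <x_center> <y_center> <width> <height>',
    with all coordinates normalized to [0, 1] by image size.
    The default threshold of 0.5 is an assumption, not from the paper.
    """
    lines = []
    for d in detections:
        if d.confidence < conf_threshold:
            continue  # drop low-confidence boxes so they never become labels
        # Clip to image bounds so normalized values stay within [0, 1]
        x1 = min(max(d.x1, 0.0), img_w)
        x2 = min(max(d.x2, 0.0), img_w)
        y1 = min(max(d.y1, 0.0), img_h)
        y2 = min(max(d.y2, 0.0), img_h)
        w, h = x2 - x1, y2 - y1
        if w <= 0 or h <= 0:
            continue  # box is degenerate after clipping
        xc = (x1 + x2) / 2 / img_w
        yc = (y1 + y2) / 2 / img_h
        lines.append(f"{d.class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}")
    return lines


def write_label_file(path: str, lines: List[str]) -> None:
    """Write one .txt label file per image (YOLO convention)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + ("\n" if lines else ""))


if __name__ == "__main__":
    dets = [Detection(3, 100, 50, 300, 250, 0.93),  # kept
            Detection(1, 10, 10, 40, 40, 0.20)]     # filtered out by threshold
    print(to_yolo_lines(dets, img_w=640, img_h=480))
```

For a 640 × 480 image, the first detection becomes `3 0.312500 0.312500 0.312500 0.416667`. The second is dropped because its confidence falls below the threshold. Filtering and clipping before writing matter in an annotation pipeline like the one described, because a single spurious or out-of-range box in a label file would be passed directly into retraining.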