Abstract: Traditional visual SLAM systems lose localization accuracy and robustness in dynamic indoor scenes because their static-environment assumption no longer holds. To address this problem, this paper presents YEL-SLAM, a YOLO- and ELSED-based SLAM system for RGB-D cameras that combines object detection with point-line feature fusion. Line segments are extracted with ELSED, augmented with adaptive length suppression and angle filtering, and serve as a geometric supplement to ORB point features. The system further fuses the YOLOv11n object detection model with a depth residual consistency check to build a joint semantic-geometric screening mechanism that identifies and removes dynamic point and line features. In addition, a joint point-line optimization model with adaptive weight adjustment is established. Experiments on the TUM dataset show that YEL-SLAM reduces absolute trajectory error by up to 96% compared with ORB-SLAM3. Compared with other semantic SLAM methods, YEL-SLAM also achieves better robustness and accuracy on highly dynamic sequences such as Walking_xyz.
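To illustrate the joint semantic-geometric screening idea, the following minimal sketch flags a point feature as dynamic only when it lies inside a detection box of a potentially moving object and its depth residual exceeds a threshold. This is not the paper's implementation: the function name `screen_dynamic_points`, the conjunctive (AND) decision rule, the box format `(x1, y1, x2, y2)`, and the default threshold of 0.1 m are all assumptions made for illustration.

```python
import numpy as np


def screen_dynamic_points(points_uv, measured_depth, predicted_depth,
                          boxes, residual_thresh=0.1):
    """Return a boolean mask marking point features judged dynamic.

    Hypothetical rule (an assumption, not taken from the paper): a point is
    dynamic if it falls inside any detection box AND the absolute difference
    between its measured depth and the depth predicted from the previous
    frame exceeds residual_thresh (same unit as the depths, e.g. metres).
    """
    points_uv = np.asarray(points_uv, dtype=float).reshape(-1, 2)
    measured = np.asarray(measured_depth, dtype=float)
    predicted = np.asarray(predicted_depth, dtype=float)

    # Geometric cue: depth residual between measurement and prediction.
    residual = np.abs(measured - predicted)

    # Semantic cue: membership in any detector box (x1, y1, x2, y2).
    u, v = points_uv[:, 0], points_uv[:, 1]
    in_box = np.zeros(len(points_uv), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        in_box |= (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)

    return in_box & (residual > residual_thresh)


# Example with made-up values: three features, one detection box.
points = [[50, 50], [60, 60], [200, 200]]
measured = [1.0, 2.0, 1.0]
predicted = [1.5, 2.02, 2.0]
boxes = [(40, 40, 100, 100)]
mask = screen_dynamic_points(points, measured, predicted, boxes)
```

Under this assumed rule, a feature inside a detection box but with a consistent depth (for example, on a seated, motionless person) is retained, while a large depth residual outside any box is not by itself sufficient for removal. A line feature could be screened the same way by testing points sampled along the segment, which is again an assumption rather than the method described here.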