Abstract: When UAVs perform advanced tasks such as search and rescue, they often need to determine their own location and acquire information about the surrounding environment. Inspired by the way humans perceive their environment through vision, visual SLAM (visual simultaneous localization and mapping, VSLAM) is a cutting-edge technology in computer vision that uses visual sensors to perceive the environment, track the sensor's own position in real time, and build a map of the surroundings. This article first explains the main components of VSLAM: front-end processing (the feature-point method and the direct method), data association, back-end optimization (filtering-based and optimization-based methods), and mapping. It then summarizes typical VSLAM algorithms that have been successfully applied to UAVs, along with the many outstanding systems and research institutions that have emerged over more than 30 years of VSLAM development. Next, it discusses several key issues in the development of UAV VSLAM: collaborative multi-UAV SLAM (C-SLAM), the application of deep learning and semantic segmentation to SLAM, and multi-sensor fusion SLAM, of which visual-inertial navigation is a representative example. Finally, the VSLAM methods are summarized and future research directions are outlined, in the hope of providing guidance for follow-up research.