Abstract: RGBT visual tracking is an emerging research topic that fuses visible and thermal infrared image information for visual tracking; fusing the complementary information in the two modalities appropriately can improve tracker performance and robustness. Advances in artificial intelligence have promoted the development of RGBT multimodal visual tracking, and deep learning methods, which offer advantages in accuracy and speed, are gradually replacing traditional target tracking methods. This paper traces the development history of RGBT multimodal visual tracking and summarizes and discusses the related algorithms, covering both correlation filtering-based and deep learning-based methods. It also reviews the development of evaluation datasets, introduces algorithm performance evaluation metrics, analyzes the performance of different algorithms on the evaluation datasets, and discusses future research trends for deep learning-based RGBT multimodal visual tracking. The aim is to provide a comprehensive overview and reference for researchers and to promote research and development in RGBT multimodal visual tracking.