Abstract: Existing rigid-body pose estimation methods suffer from data scarcity, poor robustness in complex scenes, and limited real-time performance. To address these issues, a rigid object pose tracking network based on synthetic data is proposed. Temporal and spatial feature fusion is used to capture temporal and spatial information and to generate feature maps sensitive to both. Residual connections are employed to learn richer, more abstract features and thereby improve tracking accuracy. Data augmentation is applied to the scarce real data to generate complex synthetic data that conforms to real physical characteristics; this synthetic data is used to train the deep learning model and improve its generalization. Seven objects from the YCB-Video dataset are selected for real-time pose tracking experiments, and the results show that, compared with similar methods, the proposed approach estimates rigid-body poses more accurately in complex scenes and achieves the best real-time efficiency.