Abstract: [Objective] To enhance the real-time performance, interactivity, and environmental adaptability of remote monitoring for coal mine excavation equipment, and to overcome the limitations of traditional detection methods in terms of three-dimensional perception and dynamic response. [Method] The SICK Visionary-T Mini AP camera was employed to collect RGB and depth data, and high-precision virtual-real fusion was achieved through point cloud rendering and three-dimensional reconstruction. Combined with the r700 AR device and a multi-branch convolutional gated recurrent neural network (Multi-branch CNN-GRU) based on channel state information (CSI), remote gesture interaction control was realized. [Results] Experiments were conducted under dust concentrations of 50–150 mg/m³, light intensities of 50–500 lux, and both static and dynamic working conditions. The results indicate that the AR reconstruction RMSE ranges from 1.67 mm to 2.71 mm under ideal static conditions, and remains below 6.63 mm under complex dynamic conditions. In dynamic environments, the gesture recognition rate exceeds 99.6%, the operation error is controlled within 0.05–0.5 cm, and the end-to-end operation delay is 85–120 ms. [Conclusion] The proposed model exhibits strong robustness and good engineering practicality, providing reliable technical support for remote real-time monitoring and interactive control of coal mine excavation equipment.
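
The abstract reports reconstruction accuracy as an RMSE in millimetres but does not state how it is computed. As a minimal sketch, assuming the RMSE is taken over Euclidean distances between reconstructed points and known reference points with one-to-one correspondence, the metric could be evaluated as follows. The function name `rmse_mm` and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math


def rmse_mm(reconstructed, reference):
    """Root-mean-square of per-point Euclidean distances.

    Assumes both inputs are equal-length sequences of (x, y, z) tuples in
    millimetres, where point i in one list corresponds to point i in the other.
    """
    if len(reconstructed) != len(reference) or not reconstructed:
        raise ValueError("point lists must be non-empty and of equal length")
    squared_distances = [
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(reconstructed, reference)
    ]
    return math.sqrt(sum(squared_distances) / len(squared_distances))


# Illustrative data only: each reconstructed point is offset by 2 mm along one axis.
reference_points = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 100.0, 0.0)]
reconstructed_points = [(2.0, 0.0, 0.0), (100.0, 2.0, 0.0), (0.0, 100.0, 2.0)]
error = rmse_mm(reconstructed_points, reference_points)
print(f"RMSE = {error:.2f} mm")
```

If point correspondences are not known in advance, a nearest-neighbour match between the two point clouds would be needed before this calculation; that step is omitted here.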