Abstract: To make use of the text records written after failures of high-speed railway signal equipment, a multi-level fault classification model based on text mining is proposed. A feature representation method that combines Term Frequency-Inverse Document Frequency (TF-IDF) word weights with a word dictionary is used to extract features from the fault text data. Within the multi-level model, the classifier for each single level follows the Stacking ensemble learning idea: two recurrent neural networks, the Bidirectional Gated Recurrent Unit (BiGRU) and the Bidirectional Long Short-Term Memory (BiLSTM), serve as primary learners, and a weighted-combination calculation serves as the secondary learner. The multi-level classification task is decomposed into one single-level classification task per layer, and K-fold cross-validation is used to train the Stacking model. After training with K = 5, the evaluation indexes of BiGRU are higher than those of BiLSTM, and the weights assigned to BiGRU and BiLSTM are 0.7 and 0.3, respectively. Combining the outputs of the two networks with these weights raises the accuracy to 0.8814 and the recall to 0.8642. Using signal switch machine failure data covering the decade since a high-speed railway opened, two-level classification of fault location and fault cause is realized by analyzing the fault-cause text. Experiments show that the multi-level classification model effectively improves the evaluation indexes of the multi-level classification task for signal equipment failures and ensures that the classification results respect the subordinate relations between levels.
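As an illustration of the weighted combination step, the following is a minimal sketch rather than the authors' implementation. It assumes that each primary learner outputs a class-probability vector and that the secondary learner is a plain linear weighted average using the reported weights of 0.7 (BiGRU) and 0.3 (BiLSTM). The function names `combine` and `predict` and the example probabilities are hypothetical.

```python
# Hypothetical sketch: weighted combination of two primary learners' outputs.
# Assumption: the secondary learner is a linear weighted average of
# class-probability vectors; the abstract does not spell out the exact form.

W_BIGRU = 0.7   # weight reported for BiGRU
W_BILSTM = 0.3  # weight reported for BiLSTM


def combine(p_bigru, p_bilstm, w_bigru=W_BIGRU, w_bilstm=W_BILSTM):
    """Return the element-wise weighted sum of two probability vectors."""
    if len(p_bigru) != len(p_bilstm):
        raise ValueError("both learners must score the same set of classes")
    return [w_bigru * a + w_bilstm * b for a, b in zip(p_bigru, p_bilstm)]


def predict(p_bigru, p_bilstm):
    """Return the index of the class with the highest combined score."""
    combined = combine(p_bigru, p_bilstm)
    return max(range(len(combined)), key=combined.__getitem__)


# Made-up probabilities over three fault classes, for illustration only.
example_bigru = [0.2, 0.5, 0.3]
example_bilstm = [0.6, 0.1, 0.3]
example_class = predict(example_bigru, example_bilstm)
```

In this made-up example the two learners disagree (BiGRU favors class 1, BiLSTM favors class 0), and the larger BiGRU weight decides the combined prediction. In a multi-level setting, one such combiner would be applied per level, for example fault location first and fault cause second.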