Abstract: To meet the needs of human action recognition in complex environments, a dual-stream network recognition structure based on scene understanding is proposed. Scene information is added as auxiliary information to the human action recognition network to improve its recognition accuracy. Different fusion modes of the scene recognition network and the human action recognition network are studied, and the optimal recognition structure is determined. By analyzing the influence of different parameters on recognition accuracy, the structural parameters of the dual-stream network are determined. In experiments on public data sets such as UCF50 and UCF101, accuracies of 95% and 93% were obtained, respectively, which are higher than the results of typical recognition networks. Several other typical recognition networks were also studied with the same scene information added, and the experimental results show that this method can effectively improve recognition accuracy.
Key words: Dual-stream network structure; Scene recognition; Human action recognition
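
As an illustration of how scene information can act as auxiliary input to an action classifier, the following is a minimal sketch of one possible fusion mode: a weighted average of the per-class probabilities produced by the two streams. It is not the fusion structure selected in this work. The function names, the `scene_weight` parameter and its default value are hypothetical, and the sketch assumes the scene stream already outputs one score per action class.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(action_logits, scene_logits, scene_weight=0.3):
    """Weighted average of class probabilities from the two streams.

    Assumption: both streams score the same action classes in the same order.
    scene_weight is a hypothetical hyperparameter in [0, 1].
    """
    if len(action_logits) != len(scene_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= scene_weight <= 1.0:
        raise ValueError("scene_weight must lie in [0, 1]")
    p_action = softmax(action_logits)
    p_scene = softmax(scene_logits)
    return [
        (1.0 - scene_weight) * a + scene_weight * s
        for a, s in zip(p_action, p_scene)
    ]


def predict(action_logits, scene_logits, scene_weight=0.3):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_scores(action_logits, scene_logits, scene_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

With `scene_weight=0.0` the prediction depends only on the action stream, and with `scene_weight=1.0` only on the scene stream; intermediate values let the scene context shift the decision when the action stream alone is ambiguous.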