Abstract: Human action feature recognition is an important research direction in computer vision and has wide applications in real life. Its research methods can be divided into traditional methods and deep learning-based methods. The two-stream convolutional network, a classic architecture among deep learning-based methods, has given researchers a rich set of ideas by decomposing video sequences into two types of features: temporal and spatial. This paper surveys the current state of research on two-stream convolutional networks for human action feature recognition from four aspects: two-stream convolutional networks based on 3D convolutional neural networks, two-stream convolutional networks integrated with long short-term memory, two-stream convolutional networks based on graph convolutional networks, and two-stream convolutional networks incorporating attention mechanisms. It analyzes the limitations of each method, reviews the key milestones in the development of two-stream convolutional networks, and summarizes the advantages, disadvantages, and application scenarios of each method. The paper also lists commonly used datasets and gives an overview of the practical applications of human action feature recognition. Finally, it identifies the challenges the field currently faces and offers an outlook on future research.