Abstract: A multimodal human behavior recognition algorithm based on an attention mechanism is proposed. To address the problem of effectively fusing multimodal features, a two-stream feature-fusion convolutional network, TAM3DNet (Two-stream Attention Mechanism 3D Network), is designed. The backbone network is AM3DNet (Attention Mechanism 3D Network), which incorporates the attention mechanism: the feature map is weighted by the attention map to obtain attention-weighted behavior features, so that the network focuses on the regions of limb movement and the influence of the background and of static body regions is reduced. The color and depth modalities of the RGB-D data are fed separately into the two streams of the network, color and depth behavior features are extracted by the two branch networks and fused, and the fused features are then classified to obtain the human behavior recognition result.
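The pipeline described above (attention-weighted features per stream, followed by fusion and classification) can be illustrated with a minimal NumPy sketch. This is not the TAM3DNet implementation: the abstract does not specify how the attention map is computed or how the two streams are fused, so the sigmoid-of-channel-mean attention, the global average pooling, the concatenation-based fusion, and the linear softmax classifier below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_weight(features):
    """Weight a 3D feature map of shape (C, T, H, W) by an attention map.

    Assumption: the attention map is a sigmoid of the centered channel-mean
    activation; the paper's actual attention module may differ.
    """
    saliency = features.mean(axis=0, keepdims=True)   # (1, T, H, W)
    attention = sigmoid(saliency - saliency.mean())   # values in (0, 1)
    return features * attention, attention            # broadcast over channels


def stream_descriptor(features):
    """Attention-weight one stream, then global-average-pool to a (C,) vector."""
    weighted, _ = attention_weight(features)
    return weighted.mean(axis=(1, 2, 3))


def fuse_and_classify(rgb_features, depth_features, weights, bias):
    """Fuse the two stream descriptors by concatenation (an assumption) and
    classify them with a linear layer followed by a softmax."""
    fused = np.concatenate(
        [stream_descriptor(rgb_features), stream_descriptor(depth_features)]
    )
    logits = weights @ fused + bias
    exp = np.exp(logits - logits.max())               # numerically stable softmax
    return exp / exp.sum()


# Toy usage with random stand-ins for backbone outputs: 8 channels,
# 4 frames, 6x6 spatial resolution, 5 behavior classes.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 4, 6, 6))
depth = rng.standard_normal((8, 4, 6, 6))
weights = rng.standard_normal((5, 16))
bias = np.zeros(5)
probs = fuse_and_classify(rgb, depth, weights, bias)  # shape (5,), sums to 1
```

In this sketch, positions with above-average activation receive attention values closer to 1 and low-activation positions are suppressed, which mirrors the stated goal of emphasizing moving limb regions over the background; in a trained network the attention map would be learned rather than derived from a fixed rule.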