Abstract: To help developers test for and fix bugs, software defect prediction techniques are used to locate defective code snippets in programs. Traditional defect prediction relies on manually designed static code metrics based on software scale, software complexity, and language characteristics. However, these features cannot capture defect information from the program context, which degrades prediction performance. To take full advantage of the syntactic and semantic features in the program context, we propose a method called Defect Prediction via Mixed Attention Mechanism (DP-MHA). Specifically, DP-MHA first extracts syntactic and semantic sequences from the abstract syntax tree (AST) of each program and applies word embedding and positional encoding. It then learns contextual syntactic and semantic information with a multi-head attention mechanism. Finally, it uses a global attention mechanism to extract the key syntactic and semantic features, which are used to build a software defect prediction model and identify code snippets with potential defects. To verify the effectiveness of DP-MHA, we select six Apache open-source Java projects and compare it with state-of-the-art methods: a classical static code metric method based on random forest (RF), unsupervised learning methods based on a restricted Boltzmann machine (RBM+RF) and a deep belief network (DBN+RF), and deep learning methods based on CNN and RNN. The experimental results show that DP-MHA improves F1-Measure over these five baselines by 16.6%, 34.3%, 26.4%, 7.1%, and 4.9%, respectively.
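
The pipeline described above (embedding with positional encoding, multi-head attention, then global attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The sinusoidal positional encoding, the additive form of the global attention pooling, the sigmoid output layer, and every name and hyperparameter (`init_params`, `predict_defect_probability`, `d_model`, `num_heads`) are assumptions made for illustration, and the weights are random rather than trained. AST nodes are assumed to have already been mapped to integer token ids.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (assumed form; requires even d_model)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, params, num_heads):
    """Scaled dot-product self-attention with num_heads parallel heads."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(m):  # (seq_len, d_model) -> (heads, seq_len, d_head)
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = (split(x @ params[name]) for name in ("wq", "wk", "wv"))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    context = softmax(scores) @ v                        # (heads, seq, d_head)
    merged = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ params["wo"]

def global_attention_pool(h, w, u):
    """Additive attention pooling (assumed form): one weight per AST token."""
    weights = softmax(np.tanh(h @ w) @ u)  # (seq_len,)
    return weights @ h, weights            # pooled vector, attention weights

def init_params(vocab_size, d_model, seed=0):
    """Random, untrained weights; hypothetical helper for this sketch only."""
    rng = np.random.default_rng(seed)
    mat = lambda *shape: rng.normal(0.0, 0.1, size=shape)
    return {
        "embedding": mat(vocab_size, d_model),
        "wq": mat(d_model, d_model), "wk": mat(d_model, d_model),
        "wv": mat(d_model, d_model), "wo": mat(d_model, d_model),
        "w_att": mat(d_model, d_model), "u_att": mat(d_model),
        "w_out": mat(d_model), "b_out": 0.0,
    }

def predict_defect_probability(token_ids, params, num_heads=4):
    """Map a sequence of AST token ids to a defect probability in (0, 1)."""
    x = params["embedding"][token_ids]
    x = x + positional_encoding(len(token_ids), x.shape[1])
    h = multi_head_self_attention(x, params, num_heads)
    pooled, weights = global_attention_pool(h, params["w_att"], params["u_att"])
    logit = pooled @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit)), weights

params = init_params(vocab_size=50, d_model=16)
prob, weights = predict_defect_probability(np.array([3, 7, 7, 1, 42]), params)
```

Here `weights` holds one attention weight per input token, which is how a global attention layer can indicate which AST nodes contribute most to a prediction. With untrained weights the returned probability carries no meaning; the sketch only shows how data flows through the three stages.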