Abstract: Static analysis (SA) tools can help developers detect critical errors in software to some degree. However, challenges such as scalability and undecidability limit their precision and performance, which prevents these tools from being widely adopted in practice. Recently, researchers have begun to use artificial intelligence techniques to improve the usability of these tools by automatically classifying false positive alarms, whose manual identification is laborious and time-consuming during software development. Traditional approaches mainly rely on hand-engineered features to represent defective code snippets, which makes it hard to extract the deep semantic information of reported alarms. To overcome this limitation, this paper proposes a novel feature extraction approach. The approach captures the fine-grained semantic and syntactic information contained in the instructions involved in the state transitions of fault pattern state machine instances, and combines it with an effective deep learning framework to achieve automatic cross-project defect identification. Experiments are conducted on an alarm dataset drawn from five open-source projects. Compared with the traditional metrics-based method, the AUC increases by 1.83% to 31.81%. The experimental results show that the proposed method is effective and yields a significant improvement in cross-project defect identification.
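For readers unfamiliar with the AUC indicator reported above, the following is a minimal illustrative sketch of how it can be computed for an alarm classifier. It is not the authors' code: the function name `auc`, the label convention (1 = true defect, 0 = false positive alarm), and the sample scores are assumptions made only for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Assumed convention (not from the paper): label 1 marks a true
    defect, label 0 marks a false positive alarm. The result is the
    probability that a randomly chosen true defect receives a higher
    score than a randomly chosen false positive, with ties counted
    as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one alarm of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical alarms from a held-out target project.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auc(labels, scores))  # 0.75
```

In a cross-project setting, a model would be trained on alarms from some projects and scored on alarms from a different project, and a value like this would be computed separately for each target project. The pairwise form is quadratic in the number of alarms, which is acceptable for a sketch but not for large datasets.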