Abstract: Building a database of spacecraft materials and devices requires drawing on a large volume of domestic and international test reports. Tables are the most common way these reports store data, and they hold most of it. Identifying and extracting table data by hand is tedious and error-prone, so this paper focuses on tables in PDF documents and proposes an OCR-based data identification system for spacecraft material and device tests. The system uses a browser/server (B/S) architecture and is developed with EXT, Java, Python and related technologies. It provides PDF document conversion, table recognition, data extraction and data editing. Following the system design, layout analysis and PDFPlumber-based table detection are applied as the key techniques for identifying tables in PDF documents accurately and efficiently, and the extracted data are displayed in an EXT table control. Tests show that the system performs batch identification and data extraction for regular tables in PDF documents. This verifies that the design is feasible and that the test data identification system offers high recognition accuracy and fast recognition.
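As a rough illustration of the PDFPlumber table-detection and extraction step, the sketch below pulls every table from a PDF with the `pdfplumber` library's `open` and `Page.extract_tables` calls (default table settings) and turns each table into row records that a web grid could display. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes "regular" tables whose first row is the header, and the helper names `table_to_records`, `extract_pdf_tables` and `to_grid_json`, as well as the `{"fields": ..., "data": ...}` JSON shape for the EXT front end, are hypothetical.

```python
import json
from typing import Dict, List, Optional

Row = List[Optional[str]]


def clean_cell(cell: Optional[str]) -> str:
    # pdfplumber returns None for empty cells; normalize to a trimmed string
    # and collapse line breaks inside wrapped cells into single spaces.
    if cell is None:
        return ""
    return " ".join(cell.split())


def table_to_records(table: List[Row]) -> List[Dict[str, str]]:
    """Assumption: the first row is the header; remaining rows become dicts."""
    if not table:
        return []
    header: List[str] = []
    for i, name in enumerate(table[0]):
        key = clean_cell(name) or f"col_{i}"  # placeholder for blank headers
        while key in header:  # disambiguate repeated header names
            key = f"{key}_{i}"
        header.append(key)
    records = []
    for row in table[1:]:
        cells = [clean_cell(c) for c in row]
        if not any(cells):
            continue  # skip blank separator rows
        # Pad or truncate so every row matches the header width.
        cells = (cells + [""] * len(header))[: len(header)]
        records.append(dict(zip(header, cells)))
    return records


def extract_pdf_tables(path: str) -> List[Dict[str, object]]:
    """Extract every table on every page using pdfplumber's default settings."""
    import pdfplumber  # third-party dependency: pip install pdfplumber

    results = []
    with pdfplumber.open(path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            for table in page.extract_tables():
                results.append({"page": page_no, "rows": table_to_records(table)})
    return results


def to_grid_json(records: List[Dict[str, str]]) -> str:
    # Hypothetical payload for a grid store on the EXT side: field names + rows.
    fields = list(records[0].keys()) if records else []
    return json.dumps({"fields": fields, "data": records}, ensure_ascii=False)
```

Keeping the row-to-record conversion separate from the PDF reading means the cleanup rules (blank cells, wrapped text, ragged rows) can be checked without a PDF at hand, and the output can be fed to whatever editing and storage layer the system uses.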