Spacecraft Material and Device Test Data Identification System Based on OCR Technology

Affiliation: 中船重工奥蓝托无锡软件技术有限公司

    Abstract:

    Databases of spacecraft materials and devices depend on large volumes of test-report data from domestic and international sources. Tables are the most common form in which these reports store data and hold the largest share of it, yet extracting table data by hand is tedious and error-prone. Taking tables in PDF documents as the research object, this paper proposes an OCR-based system for identifying spacecraft material and device test data. The system adopts a B/S (browser/server) architecture, is developed with EXT, Java, and Python, and provides PDF document conversion, table recognition, data extraction, and data editing. Following the system design, layout analysis and PDFPlumber table detection are used as the key techniques for recognizing tables in PDF documents accurately and effectively, and the extracted data are presented in an EXT table control. In experimental tests, the system performed batch recognition and data extraction for regularly structured tables in PDF documents. The results verify the feasibility of the design and show that the system meets its requirements for high recognition accuracy and fast recognition.
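The table-extraction step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only assumes that tables are regular (one header row, no merged cells) and that the PDF pages carry a text layer that pdfplumber can read; scanned pages would first need a separate OCR pass. The helper names `clean_cell`, `rows_to_records`, and `extract_records` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional

Row = List[Optional[str]]


def clean_cell(cell: Optional[str]) -> str:
    """Collapse line breaks and extra whitespace; empty (None) cells become ''."""
    if cell is None:
        return ""
    return " ".join(cell.split())


def rows_to_records(rows: List[Row]) -> List[Dict[str, str]]:
    """Treat the first row as the header and map each later row to a dict.

    Assumption: a regular table with one header row and no merged cells.
    """
    if not rows:
        return []
    header = [clean_cell(c) for c in rows[0]]
    records = []
    for row in rows[1:]:
        cells = [clean_cell(c) for c in row]
        if not any(cells):  # skip rows that are entirely empty
            continue
        records.append(dict(zip(header, cells)))
    return records


def extract_records(pdf_path: str) -> List[Dict[str, str]]:
    """Collect records from every table pdfplumber detects in the PDF."""
    # Third-party dependency, imported lazily so the helpers above
    # can be used and tested without it installed.
    import pdfplumber

    records: List[Dict[str, str]] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns one list of rows per detected table;
            # each row is a list of cell strings (or None for empty cells).
            for table in page.extract_tables():
                records.extend(rows_to_records(table))
    return records
```

Records in this shape (one dictionary per table row) map naturally onto a grid control for display and manual editing, which is the role the abstract assigns to the EXT table control.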

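Cite this article

陆俊杰, 魏亚东, 李晓峰, 王成, 李洪普, 李锋. Spacecraft Material and Device Test Data Identification System Based on OCR Technology [J]. 计算机测量与控制 (Computer Measurement & Control), 2023, 31(1): 282-288.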

History
  • Received: 2022-06-17
  • Last revised: 2022-07-12
  • Accepted: 2022-07-13
  • Published online: 2023-01-16