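Research on a File Desensitization System Based on the Chinese National Cryptographic Algorithm SM4 and Format-Preserving Encryption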
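Affiliations:

1. Data Security Technology R&D Center, The Third Research Institute of the Ministry of Public Security; 2. Department of Medical Imaging, Naval Medical University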

Funding:

Supported by the Key R&D Program of the Ministry of Science and Technology (No. 2021YFB3102002)


Abstract:

Government, financial, and similar organizations impose strict confidentiality requirements on internal documents. When traditional desensitization methods are applied to files stored on removable media, the large volume and varied types of data lead to low efficiency and incomplete desensitization. To address these problems, this paper proposes a file desensitization system for mixed data types that combines the SM4 and FF1 algorithms. The system desensitizes data of arbitrary type by segmenting the file content, which improves the scope, accuracy, and efficiency of file desensitization. To further reduce the memory consumed by the system at run time, an index conversion algorithm based on a Chinese character dictionary library is proposed. It constructs a relative index relationship between the plaintext under inspection and the Chinese character encoding library, replacing the hash-table-based key-value mapping on which traditional desensitization systems rely. In a desensitization test on 1000 randomly generated files, the unrecognizability rate for mixed-type text reached 99.8%, and the accuracy of desensitization and content recovery reached 99.9%. In a test on 10 randomly generated files with a total size of about 10 MB, the desensitization rate for plain text averaged 2500 characters per second.
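The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: content segmentation by data type and index conversion for Chinese characters. It assumes the "Chinese character encoding library" is the contiguous CJK Unified Ideographs block U+4E00 to U+9FFF, so that a character's index is simply its code point offset and no hash table is needed. The names `segment`, `hanzi_to_indices`, and `indices_to_hanzi` are hypothetical, and the authors' actual dictionary and segmentation rules may differ. The resulting integer indices are the kind of fixed-radix numeral string that a format-preserving cipher such as FF1 takes as input; the encryption step itself is not shown.

```python
import re

# Assumption: the code library is the CJK Unified Ideographs block.
CJK_BASE = 0x4E00
CJK_LAST = 0x9FFF
CJK_RADIX = CJK_LAST - CJK_BASE + 1  # number of symbols in the assumed library


def hanzi_to_indices(text):
    """Convert Chinese characters to indices relative to CJK_BASE.

    The index is computed arithmetically from the code point, so no
    character-to-index hash table has to be built or kept in memory.
    """
    indices = []
    for ch in text:
        cp = ord(ch)
        if not CJK_BASE <= cp <= CJK_LAST:
            raise ValueError(f"{ch!r} is outside the assumed code library")
        indices.append(cp - CJK_BASE)
    return indices


def indices_to_hanzi(indices):
    """Inverse of hanzi_to_indices: map indices back to characters."""
    chars = []
    for i in indices:
        if not 0 <= i < CJK_RADIX:
            raise ValueError(f"index {i} is outside the assumed code library")
        chars.append(chr(CJK_BASE + i))
    return "".join(chars)


# Hypothetical segmentation rule: split mixed content into runs of one type
# so that each run can be handled with its own alphabet (radix).
_SEGMENT_RE = re.compile(
    r"(?P<hanzi>[\u4e00-\u9fff]+)"
    r"|(?P<digit>[0-9]+)"
    r"|(?P<alpha>[A-Za-z]+)"
    r"|(?P<other>[^\u4e00-\u9fff0-9A-Za-z]+)"
)


def segment(text):
    """Split text into (type, run) pairs; concatenating the runs restores text."""
    return [(m.lastgroup, m.group()) for m in _SEGMENT_RE.finditer(text)]
```

Because the mapping is a pure offset, it is trivially invertible, which is what allows desensitized content to be restored after decryption. A library that is not contiguous in code point order would need a sorted table plus binary search instead of a single subtraction.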

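Cite this article

Huang Jun (黄俊), Liu Jiafu (刘家甫), Cao Zhiwei (曹志威). Research on a File Desensitization System Based on the Chinese National Cryptographic Algorithm SM4 and Format-Preserving Encryption[J]. Computer Measurement & Control (计算机测量与控制), 2024, 32(11): 315-321.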

History
  • Received: 2024-09-29
  • Last revised: 2024-10-23
  • Accepted: 2024-10-28
  • Published online: 2024-11-19