Abstract: This paper presents a hybrid data-type file desensitization system based on the SM4 and FF1 algorithms, which desensitizes data of arbitrary types by segmenting file content. Government and financial institutions store large volumes of data of diverse types on mobile media, which makes desensitization slow and often incomplete. The proposed system addresses these problems and improves the scope, accuracy and efficiency of file desensitization. An index conversion algorithm based on a Chinese character dictionary library replaces the hash-table-based key-value mapping used in traditional desensitization systems: it builds a relative index relationship between the plaintext to be detected and the Chinese character encoding library, which further reduces the system's runtime memory consumption. In tests on 1000 randomly generated files, the unrecognizability rate of desensitized mixed-type text reaches 99.8%, and the accuracy of desensitization and content recovery reaches 99.9%. In tests on 10 randomly generated files totaling about 10 MB, the average desensitization speed for plain-text files reaches 2500 characters/s.