Efficient Speech Biological Hashing Secure Retrieval Algorithm Based on Double Hash Index

HUANG Yibo; CHEN Dehuai; ZHANG Qiuyu

Cite this article:

HUANG Yibo, CHEN Dehuai, ZHANG Qiuyu. Efficient Speech Biological Hashing Secure Retrieval Algorithm Based on Double Hash Index[J]. Journal of Cyber Security, 2024, 9(2): 69-83

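HUANG Yibo¹, CHEN Dehuai¹, ZHANG Qiuyu²
(1. College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, China; 2. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China)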

Abstract:

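To address the security of speech data during channel transmission and cloud storage, together with the retrieval-efficiency problems caused by the large volume, high dimensionality, and high space complexity of speech data, this paper proposes an efficient speech biological hashing secure retrieval algorithm based on a double hash index. First, on the server side, the spectral flux and kurtosis factor features of the speech signal are extracted and fused; a Bagging classifier classifies the differential hashes of the speech signals, and a key distribution index table is built from the classification results. Next, a biometric template with a single mapping key is established according to the key distribution index table and quantized to construct the biological hash, which yields the hash index. Meanwhile, a mixed-domain scrambling encryption algorithm encrypts the original speech to build the encrypted speech database. Finally, the hash index and the encrypted speech database are uploaded to the cloud, where the cloud biological hash index table is built. On the mobile side, normalized Hamming distance is used for matching retrieval. Experimental results show that the matching threshold interval of the algorithm is (0.2694, 0.4173), so the matching threshold can be chosen flexibly and the algorithm has good robustness and discrimination. The average retrieval time for a single speech clip is only 9.4957×10⁻⁴ s, and both recall and precision remain 100% after 15 kinds of content-preserving operations, which indicates good retrieval performance that can meet speech retrieval needs in a variety of environments. The key space of the proposed encryption algorithm is 10⁶⁰, which is large enough to resist exhaustive key search and protect the speech data. In addition, the constructed biometric templates show good diversity, security, and revocability.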
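The abstract names the two per-frame features, spectral flux and kurtosis factor, but does not give their exact definitions or the fusion rule. The following is a minimal sketch under stated assumptions: the frame length, hop size, Hann window, and the z-score-and-add fusion in `fused_features` are all illustrative choices, not the authors' method. Spectral flux is taken as the squared change in magnitude spectrum between consecutive frames, and kurtosis factor as the fourth moment divided by the squared second moment.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectral_flux(frames):
    """Squared change in magnitude spectrum between consecutive frames."""
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    flux = np.sum(np.diff(mag, axis=0) ** 2, axis=1)
    # Prepend 0 so that every frame has a value (the first has no predecessor).
    return np.concatenate(([0.0], flux))

def kurtosis_factor(frames):
    """Per-frame fourth moment divided by the squared second moment."""
    m2 = np.mean(frames ** 2, axis=1)
    m4 = np.mean(frames ** 4, axis=1)
    return m4 / np.maximum(m2 ** 2, 1e-12)

def fused_features(x, frame_len=256, hop=128):
    """Assumed fusion rule: z-score each feature, then add frame by frame."""
    frames = frame_signal(x, frame_len, hop)

    def zscore(v):
        return (v - v.mean()) / (v.std() + 1e-12)

    return zscore(spectral_flux(frames)) + zscore(kurtosis_factor(frames))
```

For a stationary sine wave the frames are identical, so the flux is zero and the kurtosis factor equals 1.5; real speech varies from frame to frame, which is what makes the fused sequence usable as input to a hash.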

Keywords: secure speech retrieval; double hash index; biometric template; biological hashing; encrypted speech

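DOI: 10.19363/J.cnki.cn10-1380/tn.2024.03.06

Received: 2022-04-05; Revised: 2022-12-21

Funding: This work was supported by the Science and Technology Program of Gansu Province, the Natural Science Foundation of Gansu Province (No. 21JR7RA120), and the National Natural Science Foundation of China (No. 61862041).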

Efficient Speech Biological Hashing Secure Retrieval Algorithm Based on Double Hash Index

HUANG Yibo¹, CHEN Dehuai¹, ZHANG Qiuyu²

(1. College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, China; 2. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China)

Abstract:

To address the security of speech data in channel transmission and cloud storage, as well as the retrieval-efficiency problems caused by the large volume, high dimensionality, and high space complexity of speech data, an efficient speech biological hashing secure retrieval algorithm based on a double hash index is proposed. First, the spectral flux and kurtosis factor features of the speech signal are extracted on the server side and fused; Bagging classification is applied to the differential hashes of the speech signals, and the key distribution index table is constructed from the classification results. Then, according to the key distribution index table, a biometric template with a single mapping key is established and quantized to construct the biological hash, giving the hash index. At the same time, mixed-domain scrambling encryption is used to encrypt the original speech and construct the encrypted speech database. Finally, the hash index and the encrypted speech database are uploaded to the cloud, and the cloud biological hash index table is constructed. On the mobile terminal, normalized Hamming distance is used for matching retrieval. The experimental results show that the matching threshold interval obtained by the algorithm is (0.2694, 0.4173), so the retrieval algorithm can flexibly select the matching threshold and has good robustness and discrimination. The average retrieval time for a single speech clip is only 9.4957 × 10⁻⁴ s, and recall and precision are both 100% after 15 kinds of content-preserving operations, which shows that the algorithm has good retrieval performance and can meet the needs of speech retrieval in various environments. The key space of the encryption algorithm is 10⁶⁰, which shows that it can resist exhaustive key attacks and ensure the security of speech data. In addition, the constructed biometric templates have good diversity, security, and revocability.
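The hashing and matching steps can be illustrated with a short sketch. It assumes the classical BioHashing construction (a key-seeded random projection followed by thresholding), which the abstract does not confirm as the authors' exact method. The function names, the 64-bit hash length, the median threshold used for quantization, and the matching threshold of 0.35 (picked from inside the reported interval (0.2694, 0.4173)) are all illustrative assumptions.

```python
import numpy as np

def biohash(features, key, n_bits=64):
    """Project features with a key-seeded random matrix, then binarize.

    Assumption: `key` is a non-negative integer used as the RNG seed, so
    issuing a new key produces a new, unrelated hash for the same features.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(key)
    projection = rng.standard_normal((n_bits, features.size))
    scores = projection @ features
    # Quantize: bits above the median score become 1, the rest 0.
    return (scores > np.median(scores)).astype(np.uint8)

def normalized_hamming(h1, h2):
    """Fraction of bit positions at which two equal-length hashes differ."""
    h1, h2 = np.asarray(h1), np.asarray(h2)
    if h1.shape != h2.shape:
        raise ValueError("hashes must have the same length")
    return np.count_nonzero(h1 != h2) / h1.size

def retrieve(query_hash, index_table, threshold=0.35):
    """Return (id, distance) of the closest stored hash, or (None, distance)
    when even the closest entry is not below the matching threshold."""
    best_id, best_dist = None, float("inf")
    for speech_id, stored_hash in index_table.items():
        dist = normalized_hamming(query_hash, stored_hash)
        if dist < best_dist:
            best_id, best_dist = speech_id, dist
    if best_dist < threshold:
        return best_id, best_dist
    return None, best_dist
```

In this sketch, a query whose hash differs from a stored entry in a few bits still falls below the threshold and matches, which is how robustness to content-preserving operations is expressed. Re-hashing the same features under a different key yields a different bit string, the property that makes a template revocable.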

Key words: secure speech retrieval; double hash index; biometric template; biological hashing; encrypted speech