This application relates to the computer field, and in particular to a method and device for encrypted data storage.
Private data often needs to be stored in encrypted form, and the encryption key is usually held separately by each user. Therefore, even if the original data is the same, the encrypted data is completely different, which makes it impossible to identify and delete duplicate encrypted data. At the same time, encryption algorithms are constantly evolving. As technology develops and hardware continues to iterate, the security of some encryption and hashing algorithms may be threatened, so the encryption algorithms that users rely on may change. How to identify and delete duplicate encrypted data while encryption algorithms evolve remains a problem to be solved.
Realizing data encryption and ciphertext search in data deduplication scenarios is a research hotspot in the field of cloud storage security. To achieve data deduplication when user data is encrypted, cloud storage systems often adopt convergent encryption or introduce additional independent servers. However, these mechanisms have defects such as exposure to offline brute-force attacks and cost limitations.
Furthermore, a server-side data deduplication scheme based on the password-authenticated key exchange (PAKE) protocol has been proposed. In this scheme, users compare private information with each other and share keys, so cross-user data deduplication is achieved without additional servers. The advantage of the scheme is that it not only allows users to encrypt data locally, but also prevents brute-force attacks from malicious users or the server. To resist brute-force attacks, Bellare et al. introduced a key management server into the server-side data deduplication scheme. Puzio et al. designed a block-level data deduplication scheme for cloud storage systems, and introduced additional encryption operations and access control mechanisms on top of convergent encryption to resist dictionary attacks. However, the general problem with these methods is that they are implemented for specific scenarios and require a dedicated management server. They are not universal and lack native file system support. In addition, the industry increasingly values data security and privacy protection, and it has become the norm for users to upload encrypted data. In current implementations, encryption is usually configured by each user to achieve privacy protection. As a result, even if the original data is the same, the encrypted data is completely different, and data deduplication becomes a problem.
On the other hand, cryptography is constantly evolving and encryption technology is constantly improving. The industry needs a scalable and universal encryption structure. Academic mechanisms for data encryption fall mainly into two categories: identity-based encryption mechanisms and attribute-based encryption mechanisms. With an identity-based encryption mechanism, any two users can achieve secure communication and identity authentication without exchanging private or public keys. The attribute-based encryption mechanism uses the user identity determined by the user's attribute set to generate the key, and users with the same attributes can decrypt the ciphertext. The problems with the attribute-based encryption mechanism are the complexity of user authority control and the leakage of identity privacy. As a result, the sharing granularity of encrypted data is too coarse, so the key must frequently be uploaded to a third party, which makes the mechanism difficult to apply in data outsourcing environments.
Ciphertext search is a key technology that accompanies data encryption. Searches based on plaintext keywords require users to decrypt their stored data directly or to decrypt it after downloading, which makes it easy for malicious users or service providers to steal users' private information, so such searches are not suitable for encrypted storage systems. Current research builds multi-keyword ranked ciphertext search mechanisms on top of single-keyword or Boolean-keyword ciphertext search. In these mechanisms, the data owner uploads the encrypted file and its encrypted searchable index to the storage server, and the data user obtains the retrieval trapdoor corresponding to its multiple keywords through the search control mechanism and then sends it to the storage server. After the server receives the request, it searches, ranks, and finally returns the search results. During the search, the correlation between a file and the query keywords is calculated using the k-nearest-neighbor (kNN) technique based on inner-product similarity, random variables are added to the request vector, and a fake keyword is added to the binary vector of the file data. Therefore, when a server that holds only ciphertext data receives a retrieval trapdoor, it is more difficult for it to analyze the correlation. However, if background knowledge such as the correlation between two retrieval trapdoors is known, the cloud server can obtain private information such as keywords through scale analysis. Adding multiple fake keywords to the binary vector of the file data can therefore protect the privacy of the keywords used when searching for files. Adding some blank words to the keyword dictionary, that is, setting the corresponding positions of the binary vector to 0, supports dynamic operations such as adding, modifying, and deleting files.
One area related to ciphertext search is content addressing, such as IPFS (Interplanetary File System) storage network, which calculates a hash value for each data block and compares the hash values to see if the content has been stored; and if the content has been stored, the existing data can be directly used and retrieved. There is currently no research on content addressing based on encrypted data content.
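As a rough illustration of the hash-based content addressing described above, the following Python sketch keys each data block by its digest and skips blocks that are already present. The class name `ContentStore` and the choice of SHA-256 are assumptions made for illustration; this is not the actual block format or API of IPFS.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: each block is keyed by its SHA-256 digest."""

    def __init__(self):
        self._blocks = {}

    def put(self, block: bytes):
        """Store a block and return (address, was_new)."""
        address = hashlib.sha256(block).hexdigest()
        if address in self._blocks:
            # Same hash means same content: reuse the existing block.
            return address, False
        self._blocks[address] = block
        return address, True

    def get(self, address: str) -> bytes:
        """Retrieve a block by its content address."""
        return self._blocks[address]

    def __len__(self) -> int:
        return len(self._blocks)
```

Storing the same bytes twice returns the same address and leaves a single copy in the store, which is the deduplication behavior the paragraph refers to.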
In summary, many existing storage systems have implemented data deduplication and content addressing (search), but these systems have not dealt with the following problems:
1. This kind of data deduplication and content addressing (search) can only be performed on unencrypted data. If data with the same content is encrypted and the ciphertext is different, data deduplication and content addressing cannot be implemented.
2. This method of data deduplication and encryption must compare user content. In most cases, the content belongs to different users, and the storage system must compute over the content of different users, which breaches user privacy protection.
3. Existing encryption systems and encryption applications are customized and dedicated systems, which are not resolved by protocol mechanisms, and lack scalability and backward compatibility.
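Problem 1 can be made concrete with a small sketch. It assumes a toy SHA-256 keystream cipher (`toy_encrypt`, a hypothetical stand-in for whatever cipher each user applies, not suitable for real use) and shows that two users who encrypt identical data under independent keys produce ciphertexts whose hashes differ, so hash comparison no longer detects the duplicate.

```python
import hashlib
import secrets


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream; applying it twice decrypts.

    Illustrative stand-in only, not a vetted cipher.
    """
    out = bytearray()
    for index in range(0, len(data), 32):
        pad = hashlib.sha256(key + index.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[index:index + 32], pad))
    return bytes(out)


data = b"identical document uploaded by two different users"

# Each user holds an independent random key, as in conventional encrypted storage.
alice_key = secrets.token_bytes(32)
bob_key = secrets.token_bytes(32)
alice_ct = toy_encrypt(alice_key, data)
bob_ct = toy_encrypt(bob_key, data)

# The storage system only sees ciphertext, so it compares ciphertext hashes.
duplicate_detected = hashlib.sha256(alice_ct).digest() == hashlib.sha256(bob_ct).digest()
```

With independent keys, `duplicate_detected` is expected to be false even though both users stored the same plaintext.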
One purpose of this application is to provide a method and device for encrypted data storage, which solves the problems in the prior art that the encrypted duplicate data cannot be identified and deleted, and the existing system lacks scalability and backward compatibility.
According to one aspect of the present application, there is provided a method for encrypted data storage, the method including:
obtaining original data to be encrypted, using a first hash function to hash the original data to be encrypted, and generating an encryption key;
performing hash calculation on the original data to be encrypted using a second hash function to obtain first authentication metadata;
encrypting an original file for storing the original data based on the encryption key, generating an encrypted file, and performing hash calculation on the encrypted file based on the second hash function to obtain second authentication metadata;
generating a content descriptor based on the first hash function, the first authentication metadata, and the second authentication metadata; and
storing the encrypted file, the first authentication metadata and the second authentication metadata in a file using the content descriptor as identification information to obtain an encrypted storage file.
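The five steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation of the application: SHA-256 stands in for the first hash function, SHA3-256 for the second, `toy_encrypt` is a hypothetical keystream cipher standing in for the encryption function, the colon-separated descriptor layout is invented for the example, and a plain dictionary plays the role of the storage system.

```python
import hashlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (illustrative stand-in cipher)."""
    out = bytearray()
    for index in range(0, len(data), 32):
        pad = hashlib.sha256(key + index.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[index:index + 32], pad))
    return bytes(out)


def store_encrypted(original: bytes, storage: dict) -> str:
    """Run the five storage steps and return the content descriptor."""
    # Step 1: the first hash of the original data is the encryption key.
    key = hashlib.sha256(original).digest()
    # Step 2: the second hash of the original data is the first authentication metadata.
    # Assumption: the second hash differs from the first; otherwise this
    # public value would be identical to the key.
    first_meta = hashlib.sha3_256(original).hexdigest()
    # Step 3: encrypt, then hash the ciphertext for the second authentication metadata.
    encrypted = toy_encrypt(key, original)
    second_meta = hashlib.sha3_256(encrypted).hexdigest()
    # Step 4: build the content descriptor (hypothetical text layout).
    descriptor = f"sha256:{first_meta}:{second_meta}"
    # Step 5: store under the descriptor; an existing entry is left untouched.
    storage.setdefault(descriptor, {
        "encrypted_file": encrypted,
        "first_meta": first_meta,
        "second_meta": second_meta,
    })
    return descriptor
```

Because every value is derived from the content itself, two callers that store the same bytes obtain the same descriptor, and the dictionary keeps a single encrypted copy.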
In one embodiment, encrypting an original file for storing the original data based on the encryption key and generating an encrypted file includes:
encrypting the original file for storing the original data based on the encryption key and a designated encryption function, and generating a designated encrypted file.
In one embodiment, performing hash calculation on the encrypted file based on the second hash function to obtain second authentication metadata includes:
performing hash calculation on the designated encrypted file based on the second hash function to obtain the second authentication metadata;
generating a content descriptor based on the first hash function, the first authentication metadata, and the second authentication metadata includes:
generating the content descriptor based on the first hash function, the first authentication metadata, the second authentication metadata, and the designated encryption function.
In one embodiment, generating the content descriptor based on the first hash function, the first authentication metadata, and the second authentication metadata includes:
generating the content descriptor based on the first hash function, the second hash function, the first authentication metadata, the second authentication metadata, and the designated encryption function.
In one embodiment, the method includes:
sending the content descriptor to a user, and obtaining a retrieval result after the user retrieves the encrypted storage file based on the content descriptor.
In one embodiment, after obtaining the retrieval result after the user retrieves the encrypted storage file based on the content descriptor, the method includes:
when the retrieval result is that the encrypted storage file is retrieved, decompressing the encrypted storage file to obtain an encrypted file and second authentication metadata;
verifying the encrypted file according to the second authentication metadata and the second hash function in the content descriptor to obtain a verification result;
using the encryption key and the designated encryption function in the content descriptor to decrypt the encrypted file that has passed verification based on the verification result to obtain the original file;
verifying the original file according to the first hash function and the first authentication metadata in the content descriptor, and feeding back the original file that has passed verification to the user.
According to another aspect of the present application, there is also provided a system for encrypted data storage, the system including a data acquisition device, a data processing device, a data encryption device, a data identification device, and a data storage device, wherein:
the data acquisition device is configured to acquire original data to be encrypted, and use a first hash function to hash the original data to be encrypted to generate an encryption key;
the data processing device is configured to use a second hash function to perform a hash calculation on the original data to be encrypted to obtain first authentication metadata;
the data encryption device is configured to encrypt the original file used to store the original data based on the encryption key to generate an encrypted file, and perform a hash calculation on the encrypted file based on the second hash function to obtain second authentication metadata;
the data identification device is configured to generate a content descriptor based on the first hash function, the first authentication metadata, and the second authentication metadata;
the data storage device is configured to store the encrypted file, the first authentication metadata, and the second authentication metadata in a file using the content descriptor as identification information to obtain an encrypted storage file.
According to another aspect of the present application, there is also provided a computer-readable medium having computer-readable instructions stored thereon, and the computer-readable instructions can be executed by a processor to implement any one of the foregoing methods.
According to another aspect of the present application, there is also provided a device for encrypted data storage, and the device includes:
one or more processors; and
a memory storing computer-readable instructions, and the computer-readable instructions, when executed, cause the one or more processors to perform the operations of any one of the foregoing methods.
Compared with the prior art, this application obtains original data to be encrypted, and uses a first hash function to hash the original data to be encrypted to generate an encryption key; uses a second hash function to hash the original data to be encrypted to obtain first authentication metadata; encrypts an original file used to store the original data based on the encryption key to generate an encrypted file, and performs hash calculation on the encrypted file based on the second hash function to obtain second authentication metadata; generates a content descriptor based on the first hash function, the first authentication metadata, and the second authentication metadata; and stores the encrypted file, the first authentication metadata, and the second authentication metadata in a file using the content descriptor as identification information to obtain an encrypted storage file. Therefore, data deduplication and search can be performed on encrypted data without comparing the contents of users' files, which protects users' private information, improves scalability and backward compatibility, and allows the method to be applied to different systems.
Embodiments of the disclosure will be made apparent by the following drawings:
The same or similar reference signs in the drawings represent the same or similar components.
The application will be further described in detail below in conjunction with the accompanying drawings.
In a typical configuration of this application, the terminal, the equipment of the service network, and the trusted party all include one or more processors (CPU), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer readable media.
Computer-readable media includes permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change random access memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by computing devices. As defined herein, computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
In one embodiment, in step S100, the method obtains original data to be encrypted and uses a first hash function to hash the original data to generate an encryption key. In this step, the hash value computed from the original data with the first hash function is the encryption key. Therefore, when the original data is the same and the hash function is the same, the resulting encryption keys are the same. The original data itself is thus turned into the encryption key by a specific hash algorithm, which is essential to achieving data deduplication together with data encryption.
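The key derivation in S100 amounts to a single hash call. The sketch below assumes SHA-256 as the default first hash function and a hypothetical helper name `derive_key`; it only illustrates that the key depends on nothing but the content and the chosen hash algorithm.

```python
import hashlib


def derive_key(original: bytes, first_hash: str = "sha256") -> bytes:
    """Derive the encryption key by hashing the original data itself."""
    # hashlib.new selects the algorithm by name, so the first hash function
    # can be swapped without changing the calling code.
    return hashlib.new(first_hash, original).digest()
```

Two parties holding the same original data derive the same key without exchanging any secret, while a different hash algorithm or different data yields an unrelated key.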
In step S200, the method performs a hash calculation on the original data to be encrypted using a second hash function to obtain first authentication metadata. In this step, the hash value computed from the original data with the second hash function is the first authentication metadata, which is used to check whether the data obtained after decryption is consistent with the original data to be encrypted.
In step S300, the method encrypts an original file for storing the original data based on the encryption key to generate an encrypted file, and performs a hash calculation on the encrypted file based on the second hash function to obtain second authentication metadata. In this step, a designated encryption method may be used to obtain the encrypted file. After the second authentication metadata is obtained, the consistency of the encrypted data may be verified according to the designated encryption method and the second authentication metadata.
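A minimal sketch of S300 follows, assuming SHA3-256 as the second hash function and a toy keystream cipher (`toy_encrypt`, a hypothetical stand-in for the designated encryption method, not suitable for real use). It returns the ciphertext together with its hash and shows how that hash later confirms the ciphertext is unchanged.

```python
import hashlib
import hmac


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream; applying it twice decrypts."""
    out = bytearray()
    for index in range(0, len(data), 32):
        pad = hashlib.sha256(key + index.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[index:index + 32], pad))
    return bytes(out)


def encrypt_and_fingerprint(key: bytes, original: bytes):
    """Encrypt the original file and hash the ciphertext (second authentication metadata)."""
    encrypted = toy_encrypt(key, original)
    second_meta = hashlib.sha3_256(encrypted).hexdigest()
    return encrypted, second_meta


def ciphertext_is_intact(encrypted: bytes, second_meta: str) -> bool:
    """Check stored ciphertext against the second authentication metadata."""
    actual = hashlib.sha3_256(encrypted).hexdigest()
    return hmac.compare_digest(actual, second_meta)
```

The check needs neither the key nor the plaintext, so a storage server can run it on data it cannot read.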
In step S400, the method generates a content descriptor based on the first hash function, the first authentication metadata, and the second authentication metadata. In this step, only the type information of the first hash function needs to be encoded into the content descriptor. For any original data, if the first hash function, the second hash function, and the encryption method being used are the same, the same original data yields the same encryption result and therefore the same second authentication metadata. However, the encryption key can be known only if the original data is known. From the content descriptor, the public can learn the name of the first hash function, the name of the second hash function, the first authentication metadata, and the second authentication metadata in this application. With such a content descriptor, the encrypted file data can be retrieved directly without decrypting the original data or exposing the user's private information in it. The descriptor also makes it easy to identify duplicate encrypted data, which enables data deduplication and retrieval of encrypted data, improves scalability and backward compatibility, and allows use in different systems.
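One possible shape for such a descriptor is sketched below. The `ContentDescriptor` class, its field names, and the colon-separated text encoding are all assumptions for illustration; the source does not fix a concrete format. SHA-256 and SHA3-256 stand in for the first and second hash functions.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentDescriptor:
    first_hash: str    # type information of the first hash function
    second_hash: str   # type information of the second hash function
    first_meta: str    # second hash of the original data
    second_meta: str   # second hash of the encrypted file

    def encode(self) -> str:
        """Serialize to a public identifier (hypothetical text layout)."""
        return ":".join((self.first_hash, self.second_hash,
                         self.first_meta, self.second_meta))


def describe(original: bytes, encrypted: bytes) -> ContentDescriptor:
    """Build the descriptor from the original data and its ciphertext."""
    return ContentDescriptor(
        first_hash="sha256",
        second_hash="sha3_256",
        first_meta=hashlib.sha3_256(original).hexdigest(),
        second_meta=hashlib.sha3_256(encrypted).hexdigest(),
    )
```

Only the name of the first hash function appears in the descriptor, never its output, so publishing the descriptor does not publish the key.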
In step S500, the method stores the encrypted file, the first authentication metadata, and the second authentication metadata in a file using the content descriptor as identification information to obtain an encrypted storage file. In this step, the encrypted storage file may be a storage file in a specified format that includes the encrypted file, the first authentication metadata, and the second authentication metadata. The first authentication metadata and the second authentication metadata are recorded in the content descriptor, which serves as identification information for the encrypted file, so that an encrypted storage file in the specified format is obtained.
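The source does not define the specified format, so the sketch below simply assumes a ZIP container with three hypothetical entry names and a dictionary keyed by content descriptor as the storage back end. It shows packing, unpacking, and the deduplicating write.

```python
import io
import zipfile


def pack(encrypted: bytes, first_meta: str, second_meta: str) -> bytes:
    """Bundle the encrypted file and both metadata values into one container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("encrypted_file", encrypted)
        archive.writestr("first_meta", first_meta)
        archive.writestr("second_meta", second_meta)
    return buffer.getvalue()


def unpack(container: bytes):
    """Recover (encrypted_file, first_meta, second_meta) from a container."""
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        return (archive.read("encrypted_file"),
                archive.read("first_meta").decode(),
                archive.read("second_meta").decode())


def save(storage: dict, descriptor: str, container: bytes) -> bool:
    """Store under the descriptor; return False if it is already present."""
    if descriptor in storage:
        return False  # duplicate content: nothing new is written
    storage[descriptor] = container
    return True
```

Because the descriptor is the lookup key, the duplicate check is a key lookup and never inspects file contents.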
In a preferred embodiment of the present application, in S300, the original file used to store the original data is encrypted based on the encryption key and the designated encryption function to generate a designated encrypted file. Herein, the designated encryption method may be a symmetric data encryption method, and the original file used to store the original data is encrypted based on the encryption key and the symmetric data encryption function to generate a symmetrically encrypted file.
In a preferred embodiment of the present application, in S300, the method performs a hash calculation on the designated encrypted file based on the second hash function to obtain the second authentication metadata, and, in S400, generates the content descriptor based on the first hash function, the first authentication metadata, the second authentication metadata, and the designated encryption function. Herein, only the encryption function type corresponding to the designated encryption function may be encoded into the content descriptor, to further simplify the content descriptor. In this application, the encryption method and hash functions used are all publicly available information, but the encrypted data file can be decrypted only when the original data to be encrypted is known.
In a preferred embodiment of the present application, in S400, the method generates the content descriptor based on the first hash function, the second hash function, the first authentication metadata, the second authentication metadata, and the designated encryption function. Herein, the type information of the first hash function, the type information of the second hash function, the first authentication metadata, the second authentication metadata, and the encryption function type corresponding to the designated encryption function are encoded into the content descriptor. With such a content descriptor, the encrypted file data can be retrieved directly without decrypting the original data or exposing the user's private information in it, and duplicate encrypted data is easy to identify. As a result, data deduplication and retrieval of encrypted data can be realized, scalability and backward compatibility are improved, and the method can be applied to different systems.
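Encoding algorithm types into the descriptor is what lets a reader choose the right functions for each stored object. The sketch below assumes a five-field colon-separated layout, a `Descriptor` tuple, and small registries of supported names; all of these, including the cipher name `toy-xor`, are hypothetical.

```python
import hashlib
from typing import NamedTuple


class Descriptor(NamedTuple):
    first_hash: str
    second_hash: str
    cipher: str
    first_meta: str
    second_meta: str


# Registries of algorithm names this reader understands. Supporting a newer
# algorithm means adding an entry; existing descriptors keep resolving.
SUPPORTED_HASHES = {
    "sha256": hashlib.sha256,
    "sha3_256": hashlib.sha3_256,
    "blake2b": hashlib.blake2b,
}
SUPPORTED_CIPHERS = {"toy-xor"}


def parse_descriptor(text: str) -> Descriptor:
    """Split a descriptor into fields and reject unknown algorithm names."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError("expected 5 colon-separated fields")
    descriptor = Descriptor(*parts)
    for name in (descriptor.first_hash, descriptor.second_hash):
        if name not in SUPPORTED_HASHES:
            raise ValueError(f"unsupported hash function: {name}")
    if descriptor.cipher not in SUPPORTED_CIPHERS:
        raise ValueError(f"unsupported encryption function: {descriptor.cipher}")
    return descriptor
```

Objects written with older algorithm names remain readable after new names are registered, which is one way to read the backward-compatibility claim.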
In a preferred embodiment of the present application, the content descriptor is sent to the user, and the retrieval result after the user retrieves the encrypted storage file based on the content descriptor is obtained. Herein, the user can retrieve the encrypted file after obtaining the content descriptor.
In a preferred embodiment of the present application, after the retrieval result is obtained after the user retrieves the encrypted storage file based on the content descriptor, when the retrieval result is that the encrypted storage file is retrieved, the encrypted storage file is decompressed to obtain an encrypted file and second authentication metadata; the encrypted file is verified according to the second authentication metadata and the second hash function in the content descriptor to obtain a verification result; based on the verification result, the encrypted file that has passed the verification is decrypted using the encryption key and the designated encryption function in the content descriptor to obtain the original file; and the original file is verified according to the first hash function and the first authentication metadata in the content descriptor, and the original file that has passed the verification is fed back to the user. Herein, based on the content descriptor, users can retrieve and decrypt encrypted files. The method facilitates the identification of duplicate encrypted data, can realize data deduplication and retrieval of encrypted data, improves scalability and backward compatibility, and can be applied to different systems.
Following the above embodiment, after the user uses the Cid to retrieve the file, when the corresponding file object exists, the file F can be obtained according to the Cid. After the file F is decompressed, the file G and the self-certification information EncID are obtained. The EncID and the hash algorithm H2 in the Cid are used to verify the integrity of the file G. The method then uses the Key and the encryption algorithm Enc in the Cid to decrypt the file G and obtain the file D, verifies the integrity of the file D according to the Id in the Cid, and returns the file D to the user when the verification is passed.
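The retrieval flow can be sketched with the same symbols. Several details are assumptions: the Cid is a colon-separated string naming H1, H2, Enc, Id, and EncID; file F is a zlib-compressed blob holding EncID followed by G; the Key is supplied by the caller rather than read from the Cid; `toy_encrypt` is a hypothetical stand-in for Enc; and D is checked by recomputing the hash that step S200 uses to produce Id.

```python
import hashlib
import hmac
import zlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream; applying it twice decrypts."""
    out = bytearray()
    for index in range(0, len(data), 32):
        pad = hashlib.sha256(key + index.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[index:index + 32], pad))
    return bytes(out)


def retrieve(cid: str, key: bytes, storage: dict) -> bytes:
    """Return file D for a Cid; raise if the object is missing or fails a check."""
    _h1_name, h2_name, _enc_name, id_hex, _enc_id_hex = cid.split(":")
    file_f = storage[cid]                        # KeyError if no such object exists
    unpacked = zlib.decompress(file_f)           # decompress file F
    size = hashlib.new(h2_name).digest_size
    enc_id, file_g = unpacked[:size], unpacked[size:]
    # Verify the integrity of G with H2 and EncID before touching the key.
    if not hmac.compare_digest(hashlib.new(h2_name, file_g).digest(), enc_id):
        raise ValueError("file G failed the EncID check")
    file_d = toy_encrypt(key, file_g)            # keystream XOR: decrypt == encrypt
    # Verify the integrity of D against the Id recorded in the Cid.
    if not hmac.compare_digest(hashlib.new(h2_name, file_d).digest(),
                               bytes.fromhex(id_hex)):
        raise ValueError("file D failed the Id check")
    return file_d
```

Checking G before decrypting means a corrupted or substituted object is rejected without the key ever being applied to it.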
The embodiments of the present application also provide a computer-readable medium on which computer-readable instructions are stored, and the computer-readable instructions can be executed by a processor to implement any one of the foregoing methods for storing encrypted data.
Corresponding to the method described above, this application also provides a terminal, which includes devices or units for executing the steps of the method described in
one or more processors; and
a memory storing computer-readable instructions, and the computer-readable instructions, when executed, cause the one or more processors to perform the operations of any one of the foregoing methods for storing encrypted data.
For example, when the computer-readable instructions are executed, the one or more processors: obtain original data to be encrypted, use a first hash function to hash the original data to be encrypted, and generate an encryption key; perform hash calculation on the original data to be encrypted using a second hash function to obtain first authentication metadata; encrypt an original file for storing the original data based on the encryption key, generate an encrypted file, and perform hash calculation on the encrypted file based on the second hash function to obtain second authentication metadata; generate a content descriptor based on the first hash function, the first authentication metadata, and the second authentication metadata; and store the encrypted file, the first authentication metadata and the second authentication metadata in a file using the content descriptor as identification information to obtain an encrypted storage file.
It should be noted that the contents executed by the data acquisition device 100, the data processing device 200, the data encryption device 300, the data identification device 400, and the data storage device 500 are respectively the same as, or correspond to, those of the above steps S100, S200, S300, S400, and S500. For the sake of brevity, the details are not repeated herein.
In a preferred embodiment of the present application, the data storage device 500 is further configured to send the content descriptor to the user, and obtain the retrieval result after the user retrieves the encrypted storage file based on the content descriptor.
In a preferred embodiment of the present application, the data processing device 200 is further configured to: decompress the encrypted storage file when the retrieval result is that the encrypted storage file is retrieved, to obtain the encrypted file and the second authentication metadata; verify the encrypted file according to the second authentication metadata and the second hash function in the content descriptor to obtain a verification result; based on the verification result, decrypt the encrypted file that has passed verification by using the encryption key and the content descriptor to obtain the original file; and verify the original file according to the first hash function and the first authentication metadata in the content descriptor, and feed back the original file that has passed the verification to the user.
It should be noted that the content executed by the data processing device 200 and the data storage device 500 is the same or correspondingly the same as the corresponding execution content in the foregoing method embodiment. For the sake of brevity, the details will not be repeated herein.
Those skilled in the art can make various changes and modifications to the application without departing from the spirit and scope of the application. In this way, if these modifications and variations of this application fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include these modifications and variations.
It should be noted that this application can be implemented in software and/or in a combination of software and hardware. For example, it can be implemented by using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In an embodiment, the software program of the present application may be executed by a processor to realize the steps or functions described above. Similarly, the software program (including related data structures) of the present application can be stored in a computer-readable recording medium, for example, RAM, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present application may be implemented by hardware, for example, as a circuit that cooperates with a processor to execute each step or function.
In addition, a part of this application can be applied as a computer program product, such as computer program instructions which, when executed by a computer, invoke or provide the method according to this application through the operation of the computer. The program instructions for invoking the method of this application may be stored in a fixed or removable recording medium, and/or be transmitted through a data stream in a broadcast or other signal-bearing medium, and/or be stored in the working memory of computer equipment that runs according to the program instructions. Herein, an embodiment according to the present application includes a device that includes a memory for storing computer program instructions and a processor for executing the program instructions, and the computer program instructions, when executed by the processor, trigger the device to run the methods according to the multiple embodiments of the present application described above.
The present application is not limited to the details of the foregoing exemplary embodiments, and the present application can be implemented in other specific forms without departing from the spirit or basic characteristics of the application. Therefore, from any point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and scope of equivalent elements of the claims are therefore intended to be included in this application. Any reference signs in the claims should not be regarded as limiting the claims involved. In addition, the word “including” does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in the device claims can also be implemented by one unit or device through software or hardware. Words such as first and second are used to indicate names and do not indicate any specific order.
Number | Date | Country | Kind |
---|---|---|---|
202011567499.6 | Dec 2020 | CN | national |