This application claims priority to Chinese Patent Application No. 202410194540.1, filed on Feb. 22, 2024, which is hereby incorporated by reference in its entirety.
The present disclosure relates to the field of data security technology and, in particular, to a full-link data security protection method and system.
A data lifecycle faces security concerns in six stages: data collection, data transmission, data storage, data processing, data exchange, and data destruction. On one hand, data must pass through every stage of the lifecycle, and each stage has its own set of vulnerabilities that challenge data security protection. On the other hand, with the proliferation of big data and the advance of network technology, the stages of data generation and usage are evolving rapidly while data volume surges. Safeguarding data therefore cannot be neglected. Cryptography plays a key role in every link of the data lifecycle. To effectively defend the data at each stage, it is imperative to create a full-link data protection technology whose protections target the security risks of every stage of the data lifecycle.
The flow and application of data can involve multiple participants and institutions, and data sharing is indispensable in the critical aspects of storage, transmission, and processing in order to support various business demands and innovative applications. However, existing data security protection technologies are typically designed narrowly for isolated scenarios or stages and lack a thorough, holistic view. That is where the limitation lies. Throughout the entire lifecycle of data, from collection, storage, transmission, and processing to destruction, each stage faces potential vulnerability threats. The existing art mainly focuses on the protection of individual stages and lacks a comprehensive strategy for full-link security protection. Very few schemes combine access control, data encryption, and data destruction into a full-link data security protection solution that protects the entire lifecycle of data. For example, for the data destruction stage, Xiong Jinbo et al. have proposed a secure data self-destructing scheme that leverages the automatic update utility of network nodes to discard ciphertext components, rendering the ciphertext unrecoverable. However, this scheme does not use proxy re-encryption, does not consider the security attributes of the data, and does not provide integrity verification for the data.
Therefore, an immediate technical problem to be solved is how to design a full-link data security protection technology that keeps in view the secure storage of data, the security attributes of data, the integrity verification of the data, and on-demand and automatic data destruction.
In order to overcome the defects and shortcomings of the existing art, the present disclosure provides a full-link data security protection method and a system. The present disclosure takes data security attributes as attribute inputs, establishes data security policies, and then converts them into user attributes to implement access control. By using this attribute-based proxy re-encryption scheme for encryption, cloud data storage is achieved during the data storage phase, realizing fine-grained access control. This ensures secure storage of data in view of the data security attributes. Data integrity is also verified, and on-demand and automatic data destruction is achieved.
To achieve the aforementioned objectives, the present disclosure adopts the following technical solutions.
The present disclosure provides a full-link data security protection method, which includes the following steps:
In some embodiments of the present disclosure, the public parameters of the key generation center are denoted as:
G1, G2, q, e, P, PKT, H1, g, H2, H3,
In some embodiments of the present disclosure, after a request for registration by a data owner is received, the key generation center: selects a random number ri; selects a secret value Ski shared with the data owner; selects a secret value Sti shared between the data owner and the data user; and calculates the public key for verifying the signature of the data owner, denoted as:
Vi=Ski⊕IDu;
In some embodiments of the present disclosure, the data owner DOi calculates the private key by:
ski=Si+Sti·Si·H2(Di∥ti),
In some embodiments of the present disclosure, the generating, by the data owner, a symmetric key based on a symmetric encryption algorithm; encrypting a plaintext file with the symmetric key to generate a ciphertext file includes: for achieving fine-grained access control, encrypting the symmetric key to form a ciphertext denoted as:
$E=\left(A,\ E'=(s\|SID)\cdot K_1^{a},\ \{E_{ci}=pt_{ci}^{a}\}_{1\le ci\le l}\right)$;
$s=K_s$;
$pt_{ci}=g_1^{t_{ci}}$;
$K_1=e(g_1,g_1)^{y}$;
In some embodiments of the present disclosure, the dividing, by the data owner, the ciphertext file into blocks to generate ciphertext components; calculating, by the data owner, a virtual index and a data label for a data block specifically includes:
In some embodiments of the present disclosure, for the transmitting the ciphertext components to the DHT network, the ciphertext components are denoted as:
$S_{cs}=(C_{s1},\ldots,C_{si},\ldots,C_{sN})$;
$C_{si}=(Q_1(x_i),Q_2(x_i),\ldots,Q_{v+1}(x_i))$;
$Q_1(x)=CM_1+o_1x+o_2x^2+\cdots+o_{k-1}x^{k-1}$;
$Q_i(x)=CM_i+o_1x+o_2x^2+\cdots+o_{k-1}x^{k-1}$;
$Q_v(x)=CM_v+o_1x+o_2x^2+\cdots+o_{k-1}x^{k-1}$;
$Q_{v+1}(x)=C_c+o_1x+o_2x^2+\cdots+o_{k-1}x^{k-1}$;
$C_c=H_2(CM_1\|CM_2\|\cdots\|CM_v)$;
$CM=\{CM_1,CM_2,\ldots,CM_v\}$,
In some embodiments of the present disclosure, the applying, by the hub node, re-encryption based on a re-encryption key generation algorithm to generate a re-encrypted ciphertext is denoted as:
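$C_b=\left(A',\ E'=(s\|SID)\cdot K_1^{a\cdot r},\ \{E'_{ci}=((pt_{ci}^{a})^{RK_{ci}^{Alice\to Bob}})^{T}\}_{1\le ci\le l}\right)$;
$RK_l^{Alice\to Bob}=t'_{|\psi|}/t_{|\psi'|}$;
$s=K_s$;
$pt_{ci}=g_1^{t_{ci}}$;
$K_1=e(g_1,g_1)^{y}$,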
In some embodiments of the present disclosure, the verifying, by the data user, integrity of the ciphertext specifically includes:
The present disclosure also provides a full-link data security protection system, including: an initialization module, a key generation center, a data owner, a data user, a hub node, a cloud server, a DHT network, and an identifier building module, where:
Compared with the existing art, the present disclosure has the following advantages and desirable effects:
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure will be explained in more detail in the following in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely serve to explain, rather than limit, the present disclosure.
As shown in
System initialization, specifically including the following.
Initializing a KGC (key generation center): the KGC selects a cyclic additive group G1 and a cyclic multiplicative group G2 that admit a bilinear mapping, where g denotes a key calculation algorithm. G1 and G2 are cyclic groups of prime order q. The bilinear mapping is denoted as e: G1×G1→G2, and P is a generator of G1. A random number SKT∈Zp* is selected as the private key of the KGC, where Zp* denotes the set of nonzero integers modulo p. The KGC calculates its public key PKT=SKT·P and selects three secure hash functions H1: G1→Zp*, H2: {0,1}*→Zp* and H3: G1→Zp*, each being a one-way hash function. The public parameters of the KGC are denoted as:
(G1, G2, q, e, P, PKT, H1, g, H2, H3).
The data owner DOi registers with the KGC. The data owner DOi uses its identity IDu to register with the KGC and generates a private key for signature. The data owner DOi will encrypt data, and then sign the ciphertext using the private key. The data owner DOi transmits its identity IDu to the KGC. After receiving a registration request from DOi, the KGC selects a random number ri and a secret value Ski shared with the data owner DOi. Then, the KGC selects a secret value Sti shared between the data owner DOi and the data user. The KGC calculates the public key Vi=Ski⊕IDu for verifying the signature of the data owner DOi, where (and afterwards) ⊕ denotes exclusive OR, then calculates Di,1=riP, Di,2=Vi⊕H1(Sti·Di,1) and Di=(Di,1, Di,2), and then calculates Si=SKT·H1(Vi). The KGC transmits <Di, Ski, Sti, Si> to the data owner DOi. The data owner DOi calculates its own private key as ski=Si+Sti·Si·H2(Di∥ti), where (and afterwards) ∥ denotes concatenation of bit strings, and ti denotes the time at which the private key is generated.
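By way of a non-limiting illustration, the exclusive-OR masking used in the registration step may be sketched as follows. The 16-byte length, the sample identity, and the helper name `xor_bytes` are assumptions made only for this sketch; the elliptic-curve quantities (riP, Si) are not modeled.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical identity IDu of the data owner, zero-padded to 16 bytes.
id_u = b"owner-0001".ljust(16, b"\x00")

# Secret value Sk_i shared between the KGC and the data owner.
sk_i = secrets.token_bytes(16)

# KGC side: V_i = Sk_i XOR ID_u.
v_i = xor_bytes(sk_i, id_u)

# XOR is its own inverse, so a holder of Sk_i can unmask the identity.
recovered_id = xor_bytes(v_i, sk_i)
```

Because XOR is self-inverse, the same helper serves both for masking and for unmasking.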
The data user DUj registers with the KGC. An organization may have multiple data users, so each data user DUj uses its own identity IDt for the registration with the KGC. In turn, the KGC generates for this data user a private key skj of signature. The data user will, when operating on the data, use the private key to sign the operation message, hence implementing traceability for secured processing of data. Via a secured channel, the KGC transmits Xj=Vi∥Sti to the data user DUj for subsequent verification of the data.
Data creation and collection stage: set up the data security identification so that the data owner can encrypt data locally.
In this embodiment, the data security identification is the carrier of data attributes. Security attributes are configured based on the evaluation and the sensitivity of the data. The sensitivity of the data depends on whether the data can be made public and the degree of confidentiality of the data. The data security identification is used to specify the attribute information of the data, where the attributes may include data source, data publication time, data name, data category, data evaluation rating, data sensitivity, etc. These attributes further clarify the security level and protection level required by the data. Different data categories have different contents. In case the collected data involves user privacy, K-anonymity may be used to cut off the one-to-one or one-to-many relations between the identification attributes and the sensitive attributes in the data set, preventing linking attacks. Hence, data privacy breaching may be fended off. The data security identification may be formatted to include: a signed identifier SID, attributes, and a checksum.
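By way of a non-limiting illustration, a data security identification made up of a signed identifier SID, attributes, and a checksum may be sketched as follows. The field names, the sample attribute values, and the use of SHA-256 over a JSON serialization as the checksum are assumptions of this sketch and are not prescribed by the embodiment.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityIdentification:
    sid: str           # identifier SID of the data
    attributes: dict   # e.g. source, category, sensitivity
    checksum: str      # hex digest over sid and attributes


def compute_checksum(sid: str, attributes: dict) -> str:
    # Serialize deterministically so the same content always hashes the same.
    payload = json.dumps({"sid": sid, "attributes": attributes}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_identification(sid: str, attributes: dict) -> SecurityIdentification:
    return SecurityIdentification(sid, dict(attributes), compute_checksum(sid, attributes))


def verify_identification(ident: SecurityIdentification) -> bool:
    return ident.checksum == compute_checksum(ident.sid, ident.attributes)


# Hypothetical example values.
ident = build_identification(
    "ORG-0001-000042",
    {"source": "sensor-A", "category": "medical", "sensitivity": "confidential"},
)
```

A verifier that recomputes the checksum can detect any change to the identifier or to the attributes.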
Build data security identification during the data generation or collection stage. The data is sourced from data collectors, primarily various sensors and information software. The collected data is stored in a local storage device of the data owner. In this stage, the data security identification is established as follows: build the data security identification and verify a signed identifier based on the category and security level of an object data. Specifically, the object data can be classified according to the content of a data service or the like. The security level can be classified as short-term confidentiality, long-term top-secret, public, etc. The data owner generates a digital identifier according to pre-established identifier encoding rules. The digital identifier takes the form of an organizational code having a binary data structure and acts as the unique ID of the data. Set up a data attribute set Au=(w1, w2, . . . , ww). The identifier SID will survive throughout the entire lifecycle of the data. As shown in
Initialize an encryption parameter: use attribute-based proxy re-encryption technology to achieve fine-grained access control. The data owner DOi uploads the data security attribute set to a hub node. The data security attribute set may be denoted as A=(w1, w2, . . . , wl). The HN (hub node) takes a security parameter $1^k$ as its input, and selects a cyclic multiplicative group G′ of order p′, with g1 being a generator of G′. The HN acquires the data security attribute set A=(w1, w2, . . . , wl). Here, the data security attribute set A is a subset of the data attributes Au, with l denoting the number of attributes, and each of the l elements w1, w2, . . . , wl of the set represents a data attribute. For the attributes in A, l random numbers t1, t2, . . . , tl and a random number y∈Zp* are selected, yielding an MK (master key): (t1, t2, . . ., tci, . . . , tl, y), where tci∈Zp* and 1≤ci≤l.
A public parameter (PK) is denoted as:
(pt1=g1t
Data encryption: the data owner DOi selects a secure symmetric encryption algorithm with a symmetric key Ks, and uses the symmetric key Ks to encrypt a plaintext file M to generate a ciphertext file m. The HN encrypts the key Ks, and sets the secret value to be s=Ks. To encrypt s, select the data security attribute set A (a subset of the data attributes Au). The security attribute set A is used for access control. Select a random number a∈Zp*, and acquire the ciphertext $E=\left(A,\ E'=(s\|SID)\cdot K_1^{a},\ \{E_{ci}=pt_{ci}^{a}\}_{1\le ci\le l}\right)$. The symmetric key Ks is associated with the security attributes of the encrypted data, while the data user's private key is associated with an access hierarchy. Only when the access policy in the user's private key complies with the data security attribute set will the data user be able to acquire the symmetric key Ks and the data identifier SID.
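By way of a non-limiting illustration, the symmetric step (plaintext file M encrypted under Ks into ciphertext file m) may be sketched as follows. The embodiment does not name a particular cipher; the SHA-256 counter-mode keystream below is only a standard-library stand-in chosen so that the sketch runs without third-party packages, and a vetted cipher such as AES would be used in practice. The function names and the 16-byte nonce are assumptions of this sketch.

```python
import hashlib
import secrets

NONCE_LEN = 16  # assumed nonce length for this sketch


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` keystream bytes by hashing key, nonce and a counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = keystream(key, nonce, len(plaintext))
    # The nonce is prepended so the receiver can regenerate the keystream.
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    stream = keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


k_s = secrets.token_bytes(32)            # symmetric key Ks
plaintext_M = b"sensor reading: 21.5 C"  # hypothetical plaintext file M
ciphertext_m = encrypt(k_s, plaintext_M)  # ciphertext file m
```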
Decryption key generation: this step is performed by the HN. The access policies as practiced herein are implemented via an access control tree Tree. Each non-leaf node of the tree represents a threshold, and each leaf node describes an attribute. Tree is an access tree with root node R, and TRx represents the subtree of Tree rooted at node x. An attribute set A in compliance with the access hierarchy tree is denoted as TRx(A)=1. TRx(A) is calculated recursively: for x being a non-leaf node, calculate TRx′(A) for all child nodes x′ of x, and set TRx(A)=1 if and only if at least kx child nodes return 1; for x being a leaf node, set TRx(A)=1 if and only if att(x)∈A. The access control tree algorithm produces a key if and only if Tree(A)=1. Hence, the attribute-based encryption method ensures that a user is able to decrypt only when the user has attributes and access policies in compliance with the access policies of the access control tree. The data user is able to decrypt messages calculated under the set of attributes A.

In the access control tree, a non-leaf node x has NMx child nodes. kx acts as a threshold of x, with 0<kx≤NMx. A node is evaluated to be True when at least kx child nodes have been evaluated to be True. In particular, when kx=1, the node becomes an OR gate. When kx=NMx, the node becomes an AND gate. For a leaf node, kx=1. Some functions are defined in the following: (1) parent(x) denotes the parent node of the node x; (2) att(x) denotes the attribute value(s) associated with x, where x denotes a leaf node; (3) for a node in Tree, an order relation is defined for all of its child nodes, with numberings in the range of 1 to NMx. A function index(x) returns the number associated with x, i.e., a value for the index as contained in the access hierarchy of a given key and as uniquely assigned to the node x.

Specifically, an access control tree Tree is built. For a node x in the tree, set kx to be the threshold for the node x, and set the order of the polynomial qx to be dx=kx−1. From top to bottom, select for each non-leaf node x a polynomial qx of order dx=kx−1, with the root node R having qR(0)=y, where y denotes a random number. For other deeper nodes, set qx(0)=qparent(x)(index(x)). For each leaf node x, set a secret value: Dx=g1q
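By way of a non-limiting illustration, the threshold evaluation of the access control tree may be sketched as follows. The class name `Node`, the helper names, and the sample attribute strings are assumptions of this sketch; only the rule "a node is True when at least kx children are True, and a leaf is True when its attribute belongs to the attribute set" is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    threshold: int = 1                # k_x; 1 for a leaf node
    attribute: Optional[str] = None   # att(x), set only on leaf nodes
    children: List["Node"] = field(default_factory=list)


def leaf(attribute: str) -> Node:
    return Node(attribute=attribute)


def satisfies(node: Node, attrs: Set[str]) -> bool:
    """Return True when the subtree rooted at `node` accepts `attrs`."""
    if not node.children:
        # Leaf node: satisfied if and only if att(x) is in the attribute set.
        return node.attribute in attrs
    # Non-leaf node: satisfied when at least k_x children are satisfied.
    satisfied = sum(1 for child in node.children if satisfies(child, attrs))
    return satisfied >= node.threshold


# (category:medical AND level:confidential) OR source:sensor-A
tree = Node(threshold=1, children=[
    Node(threshold=2, children=[leaf("category:medical"), leaf("level:confidential")]),
    leaf("source:sensor-A"),
])
```

With a threshold of 1 the root behaves as an OR gate, and the inner node with a threshold equal to its number of children behaves as an AND gate.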
Data transmission and data storage stages: generate ciphertext components, and upload the data to a cloud server and a DHT network, specifically including:
The data owner DOi divides the ciphertext data into blocks, generating ciphertext components. Initially, the data owner DOi divides the encrypted data file m into v data blocks F={m1′, m2′, m3′, . . . , mv′}. From the v data blocks, the data owner extracts CM1=u1, CM2=u2, . . . , CMv=uv, meaning that various quantities of bits are extracted from the individual data blocks. The data owner then calculates Cc=H2(CM1∥CM2∥ . . . ∥CMv) and CM={CM1, CM2, . . . , CMv}, with the remaining blocks being Mdso={m1, m2, m3, . . . , mv}. Once the ciphertext components are generated, the ciphertext stored in the CS (cloud server) is no longer the entire ciphertext. This is a preparation for data destruction.
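By way of a non-limiting illustration, the block division and extraction may be sketched as follows. The description leaves open how many bits are extracted from each block; taking a fixed number of leading bytes, using SHA-256 in place of H2, and the function names are all assumptions of this sketch.

```python
import hashlib
from typing import List, Tuple


def split_blocks(data: bytes, v: int) -> List[bytes]:
    """Divide the ciphertext file into v consecutive blocks."""
    size = -(-len(data) // v)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(v)]


def extract_components(blocks: List[bytes], n_extract: int) -> Tuple[List[bytes], List[bytes]]:
    """Take the leading n_extract bytes of each block as CM_c; keep the rest."""
    cm = [block[:n_extract] for block in blocks]
    remaining = [block[n_extract:] for block in blocks]
    return cm, remaining


def digest_cm(cm: List[bytes]) -> bytes:
    """Stand-in for Cc = H2(CM_1 || ... || CM_v)."""
    return hashlib.sha256(b"".join(cm)).digest()


ciphertext_m = bytes(range(40))                      # hypothetical ciphertext file m
blocks = split_blocks(ciphertext_m, v=4)
cm, m_dso = extract_components(blocks, n_extract=2)  # CM and Mdso
c_c = digest_cm(cm)

# Only a party holding both CM and Mdso can rebuild the full ciphertext.
reassembled = b"".join(c + r for c, r in zip(cm, m_dso))
```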
The data owner DOi calculates virtual indexes of the data blocks. The data owner uploads the data to the CS in the data format of a tuple (virtual index-data block-data label). Take as input a message block mc indexed by c, where mc∈Mdso, calculate the virtual index ζc=c·SID. The data blocks uploaded to the cloud server will use these virtual indexes as their indexes for storage.
The data owner DOi calculates labels of the data blocks. The data owner DOi takes as input its private key ski and an encrypted data block mc with virtual index ζc to calculate the data block label, and the label bears the signature of DOi. DOi chooses a random yc and calculates Yc=yc·P and δc=H3(Di∥Yc∥mc∥Ti∥Kc). Here, Ti denotes a timestamp of the signature. Then, DOi generates the relevant signature σm
The data owner DOi transmits the ciphertext components generated from CM to the DHT network. Since the DHT network has a distributed structure, a secret sharing scheme is used to store the CM. Let there be N ciphertext components. Take a large prime number bb, and select k−1 numbers o1, o2, . . . , ok−1 from the finite field containing 1 to bb. Set the threshold value to be k, and construct v+1 Lagrange polynomials:
$Q_1(x)=CM_1+o_1x+o_2x^2+\cdots+o_{k-1}x^{k-1}$;
$Q_i(x)=CM_i+o_1x+o_2x^2+\cdots+o_{k-1}x^{k-1}$;
$Q_v(x)=CM_v+o_1x+o_2x^2+\cdots+o_{k-1}x^{k-1}$;
$Q_{v+1}(x)=C_c+o_1x+o_2x^2+\cdots+o_{k-1}x^{k-1}$.
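By way of a non-limiting illustration, the threshold sharing of the values CM1, . . . , CMv, Cc and their recovery by Lagrange interpolation may be sketched as follows. The prime 2^127−1, the evaluation points x = 1, . . . , N, the sample values, and the function names are assumptions of this sketch; as in the polynomials above, the same coefficients o1, . . . , ok−1 are used for every polynomial.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # assumed large prime (a Mersenne prime)

Component = Tuple[int, Tuple[int, ...]]  # (x_i, (Q_1(x_i), ..., Q_{v+1}(x_i)))


def poly_eval(constant: int, coeffs: List[int], x: int) -> int:
    """Evaluate constant + o_1*x + ... + o_{k-1}*x^(k-1) mod PRIME (Horner)."""
    y = 0
    for c in reversed([constant] + coeffs):
        y = (y * x + c) % PRIME
    return y


def make_components(values: List[int], k: int, n: int) -> List[Component]:
    """Build n ciphertext components with threshold k."""
    coeffs = [secrets.randbelow(PRIME - 1) + 1 for _ in range(k - 1)]
    return [(x, tuple(poly_eval(v, coeffs, x) for v in values)) for x in range(1, n + 1)]


def recover(components: List[Component]) -> List[int]:
    """Recover every constant term from at least k components."""
    xs = [x for x, _ in components]
    recovered = []
    for col in range(len(components[0][1])):
        total = 0
        for i, (xi, ys) in enumerate(components):
            num = den = 1
            for j, xj in enumerate(xs):
                if i != j:
                    num = num * (-xj) % PRIME
                    den = den * (xi - xj) % PRIME
            # Lagrange basis polynomial evaluated at 0, times the share value.
            total = (total + ys[col] * num * pow(den, -1, PRIME)) % PRIME
        recovered.append(total)
    return recovered


values = [11, 22, 33, 999]  # hypothetical CM_1, CM_2, CM_3 and Cc as integers
components = make_components(values, k=3, n=5)
```

Any k of the N components suffice for recovery, so discarding more than N−k components makes the values unrecoverable.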
Calculate the ciphertext components Scs=(Cs1, . . . , Csi, . . . , CsN),
The algorithm uses the data identification SID as a seed for a secure pseudo-random number generator to generate N distribution indexes l1, l2, . . . , lN, then produces, from the ciphertext components associated with the indexes, N tuples <li, Csi>; and then distributes all of the tuples throughout a DHT network node for storage.
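By way of a non-limiting illustration, deriving the N distribution indexes from the identifier SID may be sketched as follows. The description only requires a secure pseudo-random number generator seeded with SID; hashing SID together with a counter using SHA-256, the 160-bit index width, and the placeholder component values are assumptions of this sketch.

```python
import hashlib
from typing import List


def distribution_indexes(sid: bytes, n: int, index_bits: int = 160) -> List[int]:
    """Derive n deterministic indexes l_1..l_n from the seed SID."""
    indexes = []
    for counter in range(n):
        digest = hashlib.sha256(sid + counter.to_bytes(4, "big")).digest()
        # Keep the top index_bits bits of the 256-bit digest.
        indexes.append(int.from_bytes(digest, "big") >> (256 - index_bits))
    return indexes


sid = b"ORG-0001-000042"            # hypothetical identifier SID
components = ["Cs1", "Cs2", "Cs3"]  # placeholders for the ciphertext components
tuples = list(zip(distribution_indexes(sid, len(components)), components))
```

Because the derivation is deterministic, a data user who later obtains SID can regenerate the same indexes and look up the tuples.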
The data owner DOi uploads the data to the cloud server, and transmits a message to the data user.
As shown in
Processing and exchange stage: the data user acquires a decryption key, verifies data integrity, and decrypts and uses the data. The specific steps include:
The hub node (HN) performs re-encryption. Attribute-based proxy re-encryption means that the access hierarchy can be redefined using the data user's attributes regarding acquisition of the decryption key. Initially, the re-encryption key is generated. The re-encryption key generation algorithm yields a one-way re-encryption key.
Input user attributes A′. For all attributes in A′, select t′ci∈Zp*, where 1≤ci≤l, and calculate the re-encryption keys RKci, namely $RK_1^{Alice\to Bob}=t'_1/t_1$, $RK_2^{Alice\to Bob}=t'_2/t_2$, . . . , $RK_l^{Alice\to Bob}=t'_{|\psi|}/t_{|\psi'|}$. Here, the access control hierarchies for Alice and Bob are denoted as ψ and ψ′, respectively, with ψ′ including no more attributes than ψ. Hence, the access control hierarchy ψ defined by the data attributes A has been converted into an access control hierarchy ψ′ defined by the user attributes A′. A leaf node has a secret value Dx′=g1q
The HN re-encrypts to acquire the ciphertext:
$C_b=\left(A',\ E'=(s\|SID)\cdot K_1^{a\cdot r},\ \{E'_{ci}=((pt_{ci}^{a})^{RK_{ci}^{Alice\to Bob}})^{T}\}_{1\le ci\le l}\right)$,
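By way of a non-limiting illustration, the effect of raising a ciphertext element to a re-encryption key of the form t′ci/tci may be sketched in a small prime-order group. The toy parameters (the subgroup of order 1019 in the integers modulo 2039, generated by 4), the sample exponents, and the function name are assumptions of this sketch; the factor r and the pairing are not modeled.

```python
P_MOD = 2039  # toy prime modulus, P_MOD = 2 * Q_ORD + 1
Q_ORD = 1019  # prime order of the subgroup generated by G1
G1 = 4        # generator of the order-Q_ORD subgroup


def rekey(t_old: int, t_new: int) -> int:
    """Re-encryption key RK = t_new / t_old in the exponent field."""
    return t_new * pow(t_old, -1, Q_ORD) % Q_ORD


t_ci, t_new_ci, a = 123, 456, 77     # hypothetical t_ci, t'_ci and random a

pt_ci = pow(G1, t_ci, P_MOD)         # public parameter pt_ci = g1^t_ci
e_ci = pow(pt_ci, a, P_MOD)          # original element E_ci = pt_ci^a

rk_ci = rekey(t_ci, t_new_ci)
e_ci_new = pow(e_ci, rk_ci, P_MOD)   # re-encrypted element (pt_ci^a)^RK_ci

# The result equals g1^(t'_ci * a): t_ci has been replaced by t'_ci
# without the proxy ever learning a.
expected = pow(G1, t_new_ci * a % Q_ORD, P_MOD)
```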
The data user DUj acquires data. The data user DUj performs the decryption and verification following a procedure as illustrated in
Integrity verification of the ciphertext. The data user DUj begins by verifying the CM1′, CM2′, . . . , CMv′ regenerated from the extracted ciphertext components: it calculates H2(CM1′∥CM2′∥ . . . ∥CMv′) and determines whether the result equals Cc. Then, the data user DUj verifies the corresponding v tuples (ζc, mc, σv
The equation is derived as below:
Through the data signatures and ciphertext data labels, the data user may verify the integrity of the data.
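By way of a non-limiting illustration, the check that the regenerated CM1′, . . . , CMv′ hash to the recovered Cc may be sketched as follows. SHA-256 stands in for H2, and the function name and sample bytes are assumptions of this sketch; verification of the signed data block labels is not modeled.

```python
import hashlib
import hmac
from typing import List


def verify_components(recovered_cm: List[bytes], expected_cc: bytes) -> bool:
    """Compare H2(CM_1' || ... || CM_v') with the recovered Cc."""
    actual = hashlib.sha256(b"".join(recovered_cm)).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(actual, expected_cc)


original_cm = [b"\x00\x01", b"\x0a\x0b", b"\x14\x15"]  # hypothetical CM values
c_c = hashlib.sha256(b"".join(original_cm)).digest()

intact = verify_components(original_cm, c_c)
altered = verify_components([b"\x00\x01", b"\xff\x0b", b"\x14\x15"], c_c)
```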
Decryption by the data user.
A data user DUj in accordance with the access policy should have acquired from the DHT network N tuples <li, Csi> associated with the ciphertext component index, and from the cloud server v tuples (ζc, mc, σv
where S is a subset of Zp*. When x is a leaf node, e(g1, g1)r·a·q
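Calculate
$$\prod_{z\in S_x}F_z^{\Delta_{i,S'_x}(0)}$$
$$=\prod_{z\in S_x}\left(e(g_1,g_1)^{r\cdot a\cdot q_z(0)}\right)^{\Delta_{i,S'_x}(0)}$$
$$=\prod_{z\in S_x}\left(e(g_1,g_1)^{r\cdot a\cdot q_{parent(z)}(index(z))}\right)^{\Delta_{i,S'_x}(0)}$$
$$=\prod_{z\in S_x}e(g_1,g_1)^{r\cdot a\cdot q_x(i)\cdot\Delta_{i,S'_x}(0)}$$
$$=e(g_1,g_1)^{r\cdot a\cdot q_x(0)},$$
where $S_x$ is a set of $k_x$ satisfied child nodes z of x, $F_z$ is the value obtained for the child node z, $i=index(z)$, $S'_x=\{index(z): z\in S_x\}$, and $\Delta_{i,S'_x}(0)$ is the Lagrange coefficient.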
Then, calculate $K_1^{a\cdot r}$. After that, solve $E'=(s\|SID)\cdot K_1^{a\cdot r}$ for s∥SID. Since SID has a fixed length, the data user can always solve for the decryption key by letting Ks=s. With the Ks, the ciphertext file m may be decrypted to obtain the plaintext file M. Therefore, different data users may access files of various security levels, realizing fine-grained access control throughout the entire lifecycle of the data.
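By way of a non-limiting illustration, the way in which the values of kx satisfied child nodes combine into the value of their parent node (Lagrange interpolation in the exponent) may be sketched in a small prime-order group. The toy group (order 1019 inside the integers modulo 2039, generator 4) stands in for the pairing target group, and the polynomial coefficients, child indexes, and the exponent `s` standing in for r·a are assumptions of this sketch.

```python
from typing import Iterable

P_MOD = 2039  # toy prime modulus, P_MOD = 2 * Q_ORD + 1
Q_ORD = 1019  # prime order of the subgroup generated by G
G = 4         # stand-in for e(g1, g1)

SECRET = 321  # q_x(0), the value the parent node should recover
s = 88        # stand-in for the exponent r*a


def q_x(x: int) -> int:
    """Degree-2 polynomial of a node with threshold k_x = 3."""
    return (SECRET + 5 * x + 7 * x * x) % Q_ORD


def lagrange_at_zero(i: int, indexes: Iterable[int]) -> int:
    """Lagrange coefficient Delta_{i,S}(0) modulo the group order."""
    num = den = 1
    for j in indexes:
        if j != i:
            num = num * (-j) % Q_ORD
            den = den * (i - j) % Q_ORD
    return num * pow(den, -1, Q_ORD) % Q_ORD


# Values F_z = G^(s * q_x(index(z))) obtained for three satisfied children.
child_values = {i: pow(G, s * q_x(i) % Q_ORD, P_MOD) for i in (1, 2, 4)}

combined = 1
for i, f_z in child_values.items():
    delta = lagrange_at_zero(i, child_values.keys())
    combined = combined * pow(f_z, delta, P_MOD) % P_MOD

# The product collapses to G^(s * q_x(0)).
expected = pow(G, s * SECRET % Q_ORD, P_MOD)
```

Applying the same combination at every level up to the root, where qR(0)=y, yields the value needed to unmask s∥SID.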
Using the data. Once the plaintext file has been acquired, the data may be operated on. After the operation, a random number may be selected, and the data operation may be signed with the data user's private key. The operation may be logged. When the data falls into a high protection level, duplication and dissemination of the data need to be regulated to prevent leakage of the data.
Data Destruction Stage:
The last stage in the data lifecycle is data destruction, where discrete destruction control can be practiced based on the data security identification. Specifically, the data security identification implementation can be stored separately from the physical data. Since the data security identification has already explicitly described the data attributes, it can be leveraged to track the access and flow of the data throughout its lifecycle for analysis and administration. The standards for data destruction will be decided based on critical information such as the confidentiality level and category of the data. Appropriate retention period and data expiry time are established based on the data security level. In this embodiment, the cloud server will automatically destroy a document upon expiration of the data. To ensure that the data is completely irrecoverable, the automatic update function of the DHT network node data is used to discard the index tuples <li, Csi> associated with the stored ciphertext components, forestalling the recovery of CM. Without the CM, the partial knowledge of Mdso alone would be insufficient to yield the decrypted plaintext because only with the entire original ciphertext will we be able to use the symmetric key to acquire the decrypted plaintext file M. In this way, secure self-destruction of the data is hence achieved. To ensure the security of critical data, data destruction may be prearranged with data backup. Recovery after the data destruction will become extremely difficult, if not impossible, so measures must be in place to prevent the accidental loss of any essential information. For the data destruction, delicate destruction strategies must be formulated according to the confidentiality level and category of the data, with different data handled distinctly to meet their specific confidentiality needs. When dealing with files of high confidentiality levels, there should be additional measures to guarantee complete annihilation. 
Such sensitive data may cycle through multiple erase-overwrites to prevent any recovery.
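By way of a non-limiting illustration, deciding expiry and the number of overwrite passes from the data security level may be sketched as follows. The level names follow the examples given earlier (public, short-term confidentiality, long-term top-secret); the retention periods, the pass counts, and the function name are hypothetical values chosen only for this sketch.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per security level.
RETENTION = {
    "public": timedelta(days=3650),
    "short-term confidentiality": timedelta(days=30),
    "long-term top-secret": timedelta(days=365),
}

# Hypothetical number of erase-overwrite passes per security level.
OVERWRITE_PASSES = {
    "public": 1,
    "short-term confidentiality": 3,
    "long-term top-secret": 7,
}


def is_expired(created_at: datetime, level: str, now: datetime) -> bool:
    """Return True once the retention period of the level has elapsed."""
    return now >= created_at + RETENTION[level]


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
check_time = datetime(2024, 3, 1, tzinfo=timezone.utc)

destroy_short_term = is_expired(created, "short-term confidentiality", check_time)
destroy_top_secret = is_expired(created, "long-term top-secret", check_time)
```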
As shown in
In this embodiment, the initialization module is used to: initialize the key generation center.
In this embodiment, the key generation center (KGC) is used to: generate public parameters, calculate a public key for verifying a signature of the data owner; and generate a private key for the data user for signing.
In this embodiment, the data owner is used to: register with the key generation center using the identity IDu of the data owner; generate a private key; and sign a ciphertext based on the private key.
In this embodiment, the data user (DU), i.e., the one who uses the data, is used to: register with the key generation center using an identity IDt of the data user; and sign an operation message based on the private key.
In this embodiment, the identifier building module is used to: build a data security identification based on a category and security level of an object data; and verify a signed identifier.
In this embodiment, the hub node (HN) is used to: acquire a data security attribute set; and generate a decryption key. Assuming that the HN is honest and trustable, this embodiment generates the user's decryption private key through user attributes.
In this embodiment, the data owner (DO) is used to: generate a symmetric key based on a symmetric encryption algorithm; encrypt a plaintext file with the symmetric key to generate a ciphertext file. The symmetric key is associated with a security attribute of an encrypted data.
In this embodiment, the data owner is further used to: divide the ciphertext file into blocks to generate ciphertext components; calculate a virtual index and a data label for a data block; transmit the ciphertext components to a DHT network; and upload a tuple including the virtual index, the data block, and the data label to the cloud server.
In this embodiment, the hub node is further used to: apply re-encryption based on a re-encryption key generation algorithm to generate a re-encrypted ciphertext; the data user is further used to: acquire the re-encrypted ciphertext, which is then decrypted to obtain the signed identifier and a secret value; generate multiple distribution indexes based on the signed identifier; and acquire from the DHT network a tuple having a ciphertext component associated with the index.
In this embodiment, the data user is further used to: calculate a virtual index based on the signed identifier; acquire the tuple consisting of the virtual index, the data block and the data label from the cloud server; verify integrity of the ciphertext via a data signature and a ciphertext label; recover the data blocks of the ciphertext file corresponding to the ciphertext components using a Lagrange interpolation method; assemble the data blocks to acquire a complete ciphertext file that is decrypted to obtain the plaintext file based on the symmetric key.
In this embodiment, the cloud server (CS) is further used to: store a tuple including the virtual index, the data block, and the data label. This embodiment assumes that the cloud server is untrusted but honest.
In this embodiment, the distributed hash table (DHT) network is a decentralized storage network that supports redundancy. Information uniquely identified by key-value pairs can be dispersed throughout multiple nodes. The DHT network may be used to store the tuples generated from the ciphertext components, and has automatic data update functionalities. The index tuples associated with the stored ciphertext components may be discarded under preset conditions based on the data security identification, and the original data information will be irrevocably deleted after the automatic update.
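By way of a non-limiting illustration, the discarding of index tuples during an automatic update may be sketched with an in-memory store. The class name `ExpiringStore`, its methods, and the use of a numeric timestamp for expiry are assumptions of this sketch; a real DHT network distributes the entries over many nodes and is not modeled here.

```python
from typing import Dict, Optional, Tuple


class ExpiringStore:
    """Toy stand-in for DHT nodes that drop entries on periodic update."""

    def __init__(self) -> None:
        # Maps distribution index l_i to (ciphertext component, expiry time).
        self._entries: Dict[int, Tuple[bytes, float]] = {}

    def put(self, index: int, component: bytes, expires_at: float) -> None:
        self._entries[index] = (component, expires_at)

    def refresh(self, now: float) -> None:
        """Automatic update: discard every tuple whose expiry has passed."""
        self._entries = {
            idx: entry for idx, entry in self._entries.items() if entry[1] > now
        }

    def get(self, index: int) -> Optional[bytes]:
        entry = self._entries.get(index)
        return entry[0] if entry is not None else None


store = ExpiringStore()
store.put(1001, b"component-1", expires_at=100.0)
store.put(1002, b"component-2", expires_at=500.0)

store.refresh(now=50.0)
before_expiry = store.get(1001)

store.refresh(now=150.0)
after_expiry = store.get(1001)
still_present = store.get(1002)
```

Once enough tuples have been discarded, fewer than the threshold number of ciphertext components remain and CM can no longer be reconstructed.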
The above embodiments are preferred embodiments of the present disclosure, but the embodiments of the present disclosure are not limited thereto. Any changes, modifications, substitutions, combinations, simplifications that do not depart from the spirit and principles of the present disclosure should be considered equivalent alternatives and are deemed within the scope of protection of the present disclosure.