The present invention relates to a method and a unit for operating a storage means, to a storage means as such, and to a system for data processing.
Known storage means and methods for operating the same rely on the so-called Shannon model for the transmission and storage of messages over channels, in which the number of messages and storage items that can reliably be transmitted and written grows exponentially with the block length. However, the immense and rapid growth of the amount of data to be managed in a large variety of applications calls for more efficient data storage and identification strategies that also address increasing secrecy requirements. Here, identification is understood as a process of detecting and/or confirming the presence or absence of a storage item in a storage means, in the sense introduced by Ahlswede and Dueck in 1989 [2].
It is an object underlying the present invention to provide methods and units for operating storage means, storage means as such as well as systems for data processing which are configured to more efficiently and more safely store and identify data on storage means.
The object underlying the present invention is achieved by a method of operating a storage means according to independent claim 1, by a unit for operating or controlling storage means according to independent claim 11, by a storage means according to independent claim 12, and by a system for data processing according to independent claim 13.
According to a first aspect of the present invention, a method of operating a storage means is provided, wherein, for writing and storing a storage item to the storage means, (A) the storage item to be written and stored is provided, (B) an encoding process by means of randomization is applied to the storage item in order to generate and provide a randomized encoded storage item, and (C) the randomized encoded storage item is written and stored to the storage means. At least a first randomization process underlies the encoding process. Said first randomization process is a randomization process dedicated and assigned to the underlying storage means.
The process of providing a storage item to be written or stored may also be referred to as a process of acquiring, receiving, generating, obtaining or the like, for instance from an apparatus, a sensor, a processor or another storage means.
According to a preferred embodiment of the method according to the present invention, at least one second randomization process is underlying the encoding process.
By having two randomization processes underlying the encoding process, a distinction can be made between secrecy-ensuring and non-secrecy-ensuring randomization processes.
In this regard the second randomization process may be a randomization process dedicated to a particular hardware item.
In particular, the second randomization process may be based on a PUF (physical unclonable function) signature of the underlying hardware item in order to ensure a high degree of secrecy. According to a preferred embodiment of the present invention, the PUF signature makes it possible to store a storage item securely and to protect it against an eavesdropper. The PUF signature may therefore be designed with a length that is comparably small compared with the block length of an underlying storage cell and/or of a storage item to be written and/or stored in the storage means. Additionally, a secret key derived from the PUF signature may advantageously have a negligible length.
The first randomization process may be a public randomization process.
According to an alternative and preferred embodiment of the present invention, a respective randomization process is obtained from and/or based on a discrete memoryless multiple source with respect to one or multiple underlying probability distributions and alphabets.
According to a concrete realization of the inventive method for operating a storage means, the encoding process and its underlying encoder may be configured in order to generate from the obtained storage item the encoded storage item—in particular based on a source item obtained from a discrete memoryless source—as a concatenation of
Said common randomness and/or said secret key may be generated and derived by the encoder and in particular by dedicated units thereof and/or based on the storage item and the source item obtained from a public source, a PUF source and/or a general and underlying discrete memoryless multiple source on an underlying alphabet.
According to a further embodiment of the method for operating the storage means and for identifying within the storage means the presence or the absence of a given storage item,
Preferably, the encoding process and its underlying encoder and/or the decoding process and its underlying decoder are configured such that, by taking into account said helper data and said helper message conveyed with the encoded storage item written to the storage means
For the identification process and/or for the outputting process regarding the identification message, the decoding process and its underlying decoder may advantageously be configured in order
Summing up the above, the present invention may additionally or alternatively be described as follows:
When the storage process or system receives an item d as a message to store, an encoder Φd is used for the encoding process S2, which may have the following configuration:
Φd(Xn)=(M,Td(K)⊕
Thus, based on the source item Xn, the encoder Φd constructs the encoded storage item Uk as a public message, which is written S3 to the storage means 10 and is a concatenation of
Such identification mappings Td are part of a protocol for identification and can be constructed explicitly, as set forth in Verdu and Wei [15].
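The explicit construction of Verdu and Wei [15] is not reproduced here. As a much simpler stand-in, the following Python sketch uses a polynomial-evaluation (Reed-Solomon-style) tag: the message d is read as the coefficient vector of a polynomial over a prime field, and Td(k) is that polynomial evaluated at a shared random point k, which plays the role of the common randomness K. The field size P, the number of coefficients m and all function names are illustrative assumptions, not part of the described method.

```python
import random

# Hypothetical field size: a Mersenne prime, so tags are values in {0, ..., P-1}.
P = 2**31 - 1


def coefficients(d: int, m: int) -> list[int]:
    """Write a message 0 <= d < P**m as m base-P digits (polynomial coefficients)."""
    if not 0 <= d < P**m:
        raise ValueError("message out of range for m coefficients")
    coeffs = []
    for _ in range(m):
        d, r = divmod(d, P)
        coeffs.append(r)
    return coeffs


def tag(d: int, k: int, m: int) -> int:
    """Toy identification mapping T_d(k): evaluate the polynomial of d at k (Horner scheme)."""
    acc = 0
    for c in reversed(coefficients(d, m)):
        acc = (acc * k + c) % P
    return acc


def identify(stored_tag: int, k: int, d_query: int, m: int) -> bool:
    """Answer 'is d_query the stored message?' by recomputing its tag at the shared point k."""
    return tag(d_query, k, m) == stored_tag


rng = random.Random(0)
m = 4                              # messages of up to 4 * 31 = 124 bits
k = rng.randrange(P)               # shared random point, standing in for K
d = 2**100 + 12345                 # the stored message
t = tag(d, k, m)                   # only this 31-bit tag has to be kept
print(identify(t, k, d, m))        # True
print(identify(t, k, d + 1, m))    # False: d and d + 1 differ only in the constant coefficient
```

Two distinct messages correspond to distinct polynomials of degree at most m−1, which agree in at most m−1 points, so for a uniformly random k an error of the second kind occurs with probability at most (m−1)/P, while an error of the first kind never occurs. Keeping this bound below some ε allows m up to roughly εP, i.e. about P^(εP) messages behind a tag of only log2 P bits, which is the doubly exponential behaviour that identification exploits.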
For identification of a message
Based on Uk and Yn, the decoder Ψd generates common randomness
The decoder Ψd reconstructs Td(K) from Td(K)⊕
Then the decoder Ψd compares the reconstruction of Td(K) and T
In addition, the present invention also provides a unit for operating a storage means which is configured to initiate, perform and/or control a method for operating a storage means, the method being configured according to the present invention.
The present invention also suggests the provision of a storage means which is configured to store storage items and to perform, or to be used, operated and/or governed by, a method according to the present invention, and which in particular comprises a unit for operating a storage means configured according to the present invention and/or a connection to such a unit.
Finally, a system for data processing according to the present invention is configured to be used with and/or governed by a method according to the present invention and in particular comprises a storage means designed according to the present invention.
These and further details, advantages and features of the present invention will be described based on embodiments of the invention and by taking reference to the accompanying figures.
In the following, embodiments and the technical background of the present invention are presented in detail by taking reference to the accompanying
The depicted and described features and further properties of the invention's embodiments can arbitrarily be isolated and recombined without leaving the gist of the present invention.
The present invention refers to a method S of operating a storage means 10, wherein, for writing and storing a storage item d to the storage means 10, the storage item d to be written and stored is provided S1 (in particular by using the concept and theory of identification), an encoding process S2 by means of randomization is applied to the storage item d in order to generate and provide a randomized encoded storage item Uk, and the randomized encoded storage item Uk is written and stored S3 to the storage means 10. At least a first randomization process S4 underlies the encoding process S2. Said first randomization process S4 is a randomization process dedicated and assigned to the underlying storage means 10. The present invention further refers to a unit for operating a storage means 10, to a storage means 10 and to a system 1 for processing data. By having two randomization processes S4, S5 underlying the encoding process S2, a distinction can be made between secrecy-ensuring and non-secrecy-ensuring randomization processes.
In
The process S2 of encoding the storage item d or message is realized by an encoder Φd, as defined above and further elucidated below. It yields an encoded storage item Uk or message, which is written by a process S3 and thereby stored in the underlying storage means 10, realized for instance by a public database.
The process S2 of encoding the storage item d as realized by the encoder Φd thus depends on the underlying message or storage item d to be written or stored, on the source inputs Xn provided by the first and second randomization processes S4 and S5 and their underlying sources 30 and 40, respectively, and, where applicable, on the further concrete nature and properties of the encoder Φd.
The result of the decoding process S6 is provided to a process S7 of identification which controls—by a process S7′—a subsequent process S8 of outputting an identification message.
Said identification message provided by the process S8 of outputting yields a confirming result and for instance a “yes” in case that the investigated message
In addition, the attack of an eavesdropper 20 is elucidated in
More details on the embodiments shown in
Thus, these and further aspects of the present invention will also be described in detail in the following:
In connection with the present invention, secure storage on a public database such that a stored message can be identified is considered. It is assumed that legitimate users have access to the output of a source. This source is configured and used to generate common randomness which is used for identification. A protocol is defined for secure storage for identification such that the number of messages that can be identified grows doubly exponentially with the number of symbols read from the source. In addition, privacy leakage of the protocols used for identification is considered.
In the following, some of the aspects of identification underlying the present invention will first be developed by means of the concept of point-to-point transmissions and will then be used and applied in order to develop a storage for identification model on which the present invention is based:
One of the most basic models in information theory is the discrete memoryless channel or DMC for point to point transmission. This concept has been introduced by C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379 to 423, 1948, [1]. For this model, the Shannon capacity is defined as the supremum of all achievable transmission rates.
Informally speaking, a rate is achievable if it is possible to transmit messages at this rate, while the message sent can be reconstructed from the channel output with high probability. The number of messages that can reliably be transmitted for this notion of achievability grows exponentially with the block length. In addition to the Shannon capacity, the identification capacity may be introduced as set forth by R. Ahlswede and G. Dueck, “Identification via channels,” IEEE Transactions on Information Theory, vol. 35, no. 1, pp. 15 to 29, 1989, [2].
Here again point to point transmission over a discrete memoryless channel is considered, but the definition of achievability is different.
The decoder now does not try to find out from the channel output which message has been sent; instead, the decoder is interested in a distinct or single message and tries to find out whether or not this message has been sent, i.e. the decoder tries to identify the message. Of course, the sender does not know which message the receiver is interested in.
In this scenario the probability that the receiver correctly identifies the message should be close to one. For this notion of achievability the number of messages that can reliably be identified grows doubly exponentially with the block length.
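To make the difference concrete, the short Python sketch below compares the two growth laws for an assumed rate of 0.5 bit per channel use, ignoring the vanishing slack δ and all constant factors: with Shannon transmission, log2 of the number of messages grows linearly in n, whereas with identification it is log2 log2 of the number of messages that grows linearly in n.

```python
def log2_num_messages(n: int, rate: float, identification: bool) -> float:
    """log2 of the number of messages for block length n at the given rate (bits per symbol).

    Shannon transmission: N ~ 2**(n * rate), so log2 N = n * rate.
    Identification:       N ~ 2**(2**(n * rate)), so log2 N = 2**(n * rate).
    Constants and the vanishing slack delta are ignored.
    """
    first_order = n * rate
    return 2.0 ** first_order if identification else first_order


for n in (10, 20, 40):
    print(n, log2_num_messages(n, 0.5, False), log2_num_messages(n, 0.5, True))
# 10 5.0 32.0
# 20 10.0 1024.0
# 40 20.0 1048576.0
```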
A corresponding strong converse result has been found by T. S. Han and S. Verdu, “New results in the theory of identification via channels”, IEEE Transactions on Information Theory, vol. 38, no. 1, pp. 14 to 25, 1992, [3].
The further development of the concept of identification into a model of storage for identification can be motivated by looking at possible use cases from the variety of applications one can think of in the context of the present invention:
In addition, there exists a variety of execution examples, one of which is given by the following scheme:
In this context two different models are considered:
In the following, additional technical background for better understanding the present invention's gist and its differences when compared to common strategies of transmitting, writing and/or storing storage items or messages is summarized:
Traditionally, storage is considered only in the Shannon picture. Here, the messages themselves are stored, so that exponentially many messages can be stored. When the memory contents are read, the question answered is which message was stored.
R. Ahlswede and I. Csiszar, “Common randomness in information theory and cryptography. II. CR capacity”, IEEE Transactions on Information Theory, vol. 44, no. 1, pp. 225 to 240, 1998, [4], defined the so-called source model for generating common randomness.
Common randomness plays an important role for identification. In addition to R. Ahlswede and I. Csiszar [4], it is described by R. Ahlswede and V. B. Balakirsky, “Identification under random processes”, Citeseer, 1995, [5], how to make use of the common randomness generated from the source to reliably identify a message by sending a helper message over a channel. Here the number of messages that can reliably be identified grows doubly exponentially with the number of symbols read from the source.
Security is a key requirement for modern communication and storage systems. A promising approach to realize security is physical layer security based on information theoretic security.
A basic model considered in information theoretic security is the wiretap channel as discussed by A. D. Wyner, “The wire-tap channel”, Bell Labs Technical Journal, vol. 54, no. 8, pp. 1355 to 1387, 1975, [6], and I. Csiszar and J. Korner, “Broadcast channels with confidential messages”, IEEE transactions on information theory, vol. 24, no. 3, pp. 339-348, 1978, [7].
In this background context and in contrast to point-to-point transmissions it is preferably assumed in the context of the present invention that an attacker or eavesdropper has access to the message sent via an additional discrete memoryless channel. In particular, the present invention is concerned with protocols which allow for reliable communication between the legitimate users while making it hard for an attacker to decode the message from the channel output he has access to. The number of messages that can reliably and securely be transmitted in this scenario grows exponentially with the block length.
According to R. Ahlswede and Z. Zhang, “New directions in the theory of identification via channels”, IEEE Transactions on Information Theory, vol. 41, no. 4, pp. 1040 to 1050, 1995, [8], identification for the wiretap channel is considered. It can be shown that in this case, too, the number of messages that can reliably be identified grows doubly exponentially with the block length. The secure identification capacity even equals the Shannon capacity of the main channel. This result has been generalized by H. Boche and C. Deppe in “Secure identification for wiretap channels; robustness, super-additivity and continuity”, IEEE Transactions on Information Forensics and Security, 2018, [9], and in “Secure identification under jamming attacks”, in Information Forensics and Security (WIFS), 2017 IEEE Workshop on. IEEE, 2017, pp. 1 to 6, [10], where robust identification for wiretap channels is considered.
For the source model one can also consider secret key generation as indicated by R. Ahlswede and I. Csiszar, “Common randomness in information theory and cryptography—Part i: secret sharing”, IEEE Transactions on Information Theory, vol. 39, no. 4, 1993, [11].
T. Ignatenko and F. M. Willems, “Biometric security from an information theoretical perspective”, Now, 2012, [12], and L. Lai, S.-W. Ho, and H. V. Poor, “Privacy security trade offs in biometric security systems”, in Communication, Control, and Computing, 2008, 46th Annual Allerton Conference on IEEE, 2008, pp. 268 to 273, [13], interpret the discrete memoryless source from the source model as a biometric source and they consider the privacy leakage of the protocols for secret key generation.
Some results concerning common randomness and secret key generation from a discrete memoryless multiple source are essential for the present invention. In the following, common randomness is also referred to as CR, a secret key is also referred to as SK, and a discrete memoryless multiple source is also referred to as DMMS.
In the following, particular information theoretic entities and requirements for defining the present invention will be motivated, introduced and defined:
First of all, in the context of the present invention inter alia the following information theoretic model is considered:
Definition 1.
Let $n \in \mathbb{N}$ be a natural number. The source model consists of a discrete memoryless multiple source (DMMS) $P_{XY}$, a (possibly randomized) encoder $F: \mathcal{X}^n \to \mathcal{K} \times \mathcal{M}$ and a (possibly randomized) decoder $G: \mathcal{Y}^n \times \mathcal{M} \to \mathcal{K}$. Let $X^n$ and $Y^n$ be the output of the DMMS. The random variables (RVs) $(K, M)$ are generated from $X^n$ using $F$, and the random variable $\hat{K}$ is generated from $(Y^n, M)$ using $G$. We call $(F, G)$ a common randomness/secret key (CR/SK) generation protocol.
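For illustration, the following Python sketch gives one deliberately simple CR/SK generation protocol (F, G) in the sense of Definition 1: a code-offset construction with a repetition code, for a doubly symmetric binary source in which Y^n is X^n observed through a binary symmetric channel. The source, the repetition factor REP and all names are assumptions chosen for the example. The randomized encoder F draws K uniformly and outputs the helper message M = X^n ⊕ C(K), where C is the repetition code, and G recovers K̂ from (Y^n, M) by majority decoding.

```python
import random

REP = 5  # hypothetical repetition factor of the error-correcting code C


def rep_encode(bits: list[int]) -> list[int]:
    """Repetition code C: repeat every key bit REP times."""
    return [b for b in bits for _ in range(REP)]


def rep_decode(bits: list[int]) -> list[int]:
    """Majority decoding of each block of REP bits."""
    return [int(sum(bits[i:i + REP]) > REP // 2) for i in range(0, len(bits), REP)]


def F(x: list[int], rng: random.Random) -> tuple[list[int], list[int]]:
    """Randomized encoder: draw K uniformly and publish the helper message M = X^n xor C(K)."""
    k = [rng.getrandbits(1) for _ in range(len(x) // REP)]
    m = [xi ^ ci for xi, ci in zip(x, rep_encode(k))]
    return k, m


def G(y: list[int], m: list[int]) -> list[int]:
    """Decoder: Y^n xor M = C(K) xor (X^n xor Y^n); majority decoding removes the source noise."""
    return rep_decode([yi ^ mi for yi, mi in zip(y, m)])


def dsbs(n: int, q: float, rng: random.Random) -> tuple[list[int], list[int]]:
    """n samples of a doubly symmetric binary source: Y is X seen through a BSC(q)."""
    x = [rng.getrandbits(1) for _ in range(n)]
    y = [xi ^ int(rng.random() < q) for xi in x]
    return x, y


rng = random.Random(2024)
x, y = dsbs(500, 0.05, rng)
k, m = F(x, rng)
k_hat = G(y, m)
print(sum(a != b for a, b in zip(k, k_hat)), "of", len(k), "key bits disagree")
```

Because M is a function of X^n, it reveals information about the source output; quantifying this kind of leakage is the purpose of the privacy leakage rate introduced for the source model.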
In addition, the generation of common randomness or CR as introduced above is considered.
Definition 2.
Let L≥0. The item R(L)≥0 is called an achievable common randomness or CR generation rate with forward communication rate constraint L for the source model if for every δ>0 there is an n0=n0(δ) such that for all n≥n0 there is a common randomness/secret key or CR/SK generation protocol such that the relations
are fulfilled for a c>0. The corresponding CR/SK generation protocols are called common randomness or CR generation protocols with rate constraint. The supremum of all achievable CR generation rates with forward communication rate constraint L is called the CR capacity and is denoted by CCR(L).
Remark 1.
The last achievability requirement (2) is required in order to avoid protocols where the CR is generated deterministically while H(K) is arbitrarily large. It can be motivated to require =by arguing that, together with the bound on |, this implies an arbitrarily small distance between
H(K). So = is required.
Remark 2.
It can be seen that for each CR generation protocol with rate constraint one can find a CR generation protocol with rate constraint such that
is valid for a c>0. That is why in the following one can always consider such protocols where the distribution of the common randomness CR is in this sense near the uniform distribution.
In Ahlswede and Csiszar 1998, CCR(L) has been further characterized.
Also privacy leakage for the source model is considered. This makes sense when one assumes that the DMMS, that is part of the source model, models a PUF source.
Definition 3.
A triple (RCR,RFC,RPL), RCR,RFC,RPL≥0 is called an achievable CR generation rate versus forward communication rate versus privacy leakage rate triple for the source model if for every δ>0 there is an n0=n0(δ) such that for all n≥n0 there is a CR/SK generation protocol such that the relations
are fulfilled for a c>0. The corresponding CR/SK generation protocols are referred to as private CR generation protocols. The set of all rate triples that are achievable using private CR generation protocols is referred to as the CR capacity region CR.
In the context of the present invention, one is interested in CR. In a first approach one considers private CR generation protocols with deterministic encoders and decoders (f, g).
The corresponding CR capacity region is denoted by CRd.
In Ahlswede and Csiszar 1998, deterministic CR generation protocols with rate constraint have been considered and the corresponding capacity has been characterized, which is here referred to as CCRd(L).
The following property holds:
Theorem 1.
It holds that
where the maximization runs over all random variables V satisfying the Markov chain V−X−Y and the constraint I(V;X)−I(V;Y)≤L. Moreover, it suffices to consider random variables V with |V|≤|X|.
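The explicit expression of Theorem 1 is not reproduced in this text. For intuition, the Python sketch below evaluates the information quantities that enter its constraint for an assumed doubly symmetric binary source (X uniform, Y equal to X flipped with probability q) and a family of test channels V = X ⊕ Bern(p), for which the Markov chain V−X−Y holds by construction. For this example the closed forms I(V;X) = 1 − h(p) and I(V;Y) = 1 − h(p∗q) apply, with h the binary entropy and p∗q = p(1−q) + q(1−p).

```python
from math import log2


def h(p: float) -> float:
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)


def star(a: float, b: float) -> float:
    """Binary convolution: crossover probability of two cascaded BSCs."""
    return a * (1 - b) + b * (1 - a)


def bsc_test_channel(p: float, q: float) -> tuple[float, float]:
    """For X ~ Bern(1/2), Y = X xor Bern(q) and V = X xor Bern(p) (so V - X - Y holds),
    return (I(V;X), I(V;X) - I(V;Y)), the left-hand side of the rate constraint."""
    i_vx = 1 - h(p)
    i_vy = 1 - h(star(p, q))
    return i_vx, i_vx - i_vy


for p in (0.0, 0.05, 0.2, 0.5):
    i_vx, gap = bsc_test_channel(p, q=0.11)
    print(f"p={p:.2f}  I(V;X)={i_vx:.3f}  I(V;X)-I(V;Y)={gap:.3f}")
```

Noisier test channels (p closer to 1/2) lower I(V;X)−I(V;Y), so a smaller forward communication rate L can be met, at the price of a smaller I(V;X).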
One also considers secret key generation with perfect secrecy.
Definition 4.
The item R≥0 is called an achievable SK generation rate for the source model if for every δ>0 there is an n0=n0(δ) such that for all n≥n0 there is a CR/SK generation protocol such that the relations
are fulfilled. The corresponding CR/SK generation protocols are referred to as perfect SK generation protocols. The supremum of all achievable SK generation rates is referred to as the SK capacity CSK.
The following result can be proven:
Theorem 2.
It holds that CSK=I(X;Y).
Remark 3.
In the achievability proof one can use a deterministic encoder and decoder. This implies the relation
log||≤log|X|.
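As a numerical illustration of Theorem 2, the Python sketch below computes I(X;Y) directly from a joint probability mass function. The doubly symmetric binary source with crossover probability q = 0.11, used here as a hypothetical model of a noisy PUF read-out, gives an SK capacity of about 0.5 bit per source symbol.

```python
from math import log2


def mutual_information(joint: list[list[float]]) -> float:
    """I(X;Y) in bits for a joint pmf given as a matrix joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * log2(pxy / (px[x] * py[y]))
        for x, row in enumerate(joint)
        for y, pxy in enumerate(row)
        if pxy > 0
    )


q = 0.11  # hypothetical crossover probability of the noisy PUF read-out
joint = [[(1 - q) / 2, q / 2], [q / 2, (1 - q) / 2]]
print(f"C_SK = I(X;Y) = {mutual_information(joint):.3f} bit per source symbol")
```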
In the section describing the state of the art, the storage of exponential amounts of data according to the Shannon picture, as used today, has been described. In Shannon's picture of storage, big data is a huge problem: the gap between the data rate generated by big data and what Moore's law provides for the development of storage media keeps widening. For storage for later identification, this problem does not exist.
One considers the source model for generating common randomness. But in contrast to Ahlswede and Csiszar 1998 one may also consider privacy leakage of the corresponding protocols while interpreting the source as a biometric source. One can then use common randomness for identification.
The invention's contribution is thus, inter alia, twofold:
The capacity for common randomness generation from a discrete memoryless source is characterized while taking privacy leakage into account. Furthermore, protocols for identification using a discrete memoryless source are constructed. In contrast to Ahlswede and Csiszar 1998 and Ahlswede and Balakirsky 1995, it is assumed in the context of the present invention that a helper message is stored on a public database.
The protocols for identification are constructed such that they provide secrecy. So these protocols allow for secure storage for identification. The present invention may also consider the privacy leakage of these protocols.
The present invention is inter alia based on the presentation of a model for secure storage for identification and corresponding protocols.
In the following, an information theoretic model of the storage process for identification underlying the present invention is defined.
Definition 5.
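Let $k, n \in \mathbb{N}$. The storage for identification model consists of a storage alphabet $\mathcal{U}$, a discrete memoryless multiple source (DMMS) $P_{XY}$ on the alphabet $\mathcal{X} \times \mathcal{Y}$, a set of (possibly randomized) encoders $\{\Phi_d\}_d$ with $\Phi_d: \mathcal{X}^n \to \mathcal{U}^k$, and a set of (possibly randomized) decoders $\{\Psi_d\}_d$ with $\Psi_d: \mathcal{U}^k \times \mathcal{Y}^n \to \{0,1\}$, for all messages $d$. Let $X^n$ and $Y^n$ be the random variables (RVs) generated from $P_{XY}$. We call $(\{\Phi_d\}_d, \{\Psi_d\}_d)$ a storage for identification protocol.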
Assume that for each storage cell we read B>0 symbols from the PUF source. Now properties of intuitively good storage for identification protocols are discussed.
When the decoder Ψd is interested in the message d, it is reasonable to require that, when d is stored on the database, the decoder Ψd decides correctly with high probability. The corresponding error is referred to as an error of the first kind. So the probability that the decoder makes an error of the first kind should be small.
When the message stored on the database is not d, the decoder Ψd should also decide correctly with high probability. We call the corresponding error an error of the second kind. So the probability that the decoder Ψd makes an error of the second kind should be small.
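Both error probabilities can be estimated by simulation for a toy protocol. The Python sketch below is a hypothetical end-to-end example only: it combines a repetition-code helper message M = X^n ⊕ C(K) with a polynomial-evaluation tag Td(K) and stores the pair (M, Td(K)) in clear. It therefore ignores the secret key and the secrecy requirement entirely, and the source, code, field size and all names are assumptions made for the illustration.

```python
import random

REP = 5          # hypothetical repetition factor of the helper-data code
P = 2**31 - 1    # hypothetical prime field for the identification tag
M_COEFFS = 4     # messages are integers 0 <= d < P**M_COEFFS


def tag(d: int, k: int) -> int:
    """Toy identification mapping T_d(k): base-P digits of d as polynomial coefficients, evaluated at k."""
    acc = 0
    for i in reversed(range(M_COEFFS)):
        acc = (acc * k + (d // P**i) % P) % P
    return acc


def bits_to_int(bits: list[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))


def source(n: int, q: float, rng: random.Random) -> tuple[list[int], list[int]]:
    """Doubly symmetric binary source: Y^n is X^n observed through a BSC(q)."""
    x = [rng.getrandbits(1) for _ in range(n)]
    return x, [b ^ int(rng.random() < q) for b in x]


def store(d: int, x: list[int], rng: random.Random) -> tuple[list[int], int]:
    """Sketch of an encoder Phi_d: helper message M = X^n xor C(K) plus the tag T_d(K), in clear."""
    k = [rng.getrandbits(1) for _ in range(len(x) // REP)]
    helper = [a ^ c for a, c in zip(x, [b for b in k for _ in range(REP)])]
    return helper, tag(d, bits_to_int(k) % P)


def identify(d: int, stored: tuple[list[int], int], y: list[int]) -> int:
    """Sketch of a decoder Psi_d: regenerate K from (Y^n, M), recompute T_d, compare (1 = 'd is stored')."""
    helper, t = stored
    noisy = [a ^ b for a, b in zip(y, helper)]
    k_hat = [int(sum(noisy[i:i + REP]) > REP // 2) for i in range(0, len(noisy), REP)]
    return int(tag(d, bits_to_int(k_hat) % P) == t)


def error_rates(n: int, q: float, trials: int, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimates of the error probabilities of the first and second kind."""
    rng = random.Random(seed)
    first = second = 0
    for _ in range(trials):
        d, other = rng.randrange(P**M_COEFFS), rng.randrange(P**M_COEFFS)
        x, y = source(n, q, rng)
        stored = store(d, x, rng)
        if identify(d, stored, y) == 0:
            first += 1   # d is stored, but its decoder answers "no"
        if other != d and identify(other, stored, y) == 1:
            second += 1  # the decoder for a different message answers "yes"
    return first / trials, second / trials


print(error_rates(n=500, q=0.02, trials=500))
```

In this toy, errors of the first kind come from failures to regenerate K, which the helper code must make rare, while errors of the second kind require a tag collision and are bounded by (M_COEFFS − 1)/P.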
One is interested in the largest possible identification rate, where one considers the number of storage cells as a resource. As usual for identification one considers the second order rate.
One considers an eavesdropper 20 who reads from the public database 10. It is assumed that the eavesdropper 20 wants to identify a specific message. The eavesdropper 20 knows the protocol used and one can even assume that the eavesdropper 20 knows the message the decoder wants to identify. It is desired that the sum of the probability that the eavesdropper makes an error of the first kind and the probability that the eavesdropper makes an error of the second kind is close to one.
The output of the PUF source uniquely characterizes a device, so one may want to reuse parts of it. Therefore, the attacker should not obtain much information about the PUF source output X.
This motivates the following definition of achievability for the storage for identification model.
Definition 6.
Let B>0. The tuple (RID,RPL), RID,RPL≥0, is called an achievable rate pair for the storage for identification model if for every δ>0 there is a k0=k0(δ) such that for all k≥k0 and n=⌈B·k⌉ there exists a storage for identification protocol such that for all d,
are fulfilled for all decoding strategies of an eavesdropper 20.
The first item describes decoder errors of the first kind, the second item describes decoder errors of the second kind, and the third item describes the behaviour of the model with respect to an eavesdropper 20. The fourth item describes the doubly exponential growth of the number of storage items that can be managed, and the fifth item describes the model's privacy leakage properties.
The corresponding storage for identification protocols are referred to as secure storage protocols. The set of all rate pairs that are achievable using such storage for identification protocols is called the capacity region ID(B).
Remark 4.
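Requirement (3), the third item given above in Definition 6, ensures that the protocols are optimal with respect to security in the following sense. There are decoding strategies for which the sum of the probability that the eavesdropper makes an error of the first kind and the probability that he makes an error of the second kind equals 1, even though the eavesdropper does not use any of his observations from the public database; for instance, always deciding that the message is stored yields an error of the first kind with probability 0 and an error of the second kind with probability 1. Requiring the sum to be close to 1 therefore means that the observations from the public database give the eavesdropper essentially no advantage over such blind guessing.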
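The following Python sketch illustrates Remark 4 under a stated assumption: the eavesdropper observes an identification tag that has been masked with a fresh, uniformly distributed key (a one-time pad), as in the masked term Td(K)⊕… of the encoder; the second operand of that term is not recoverable from this text, and the tag function, key length and strategies below are hypothetical. For every strategy, including one that compares the observation with the unmasked tag of the target message, the estimated sum of the two error probabilities stays at about 1.

```python
import random

BITS = 16  # hypothetical tag length in bits


def tag(d: int) -> int:
    """Hypothetical stand-in for the identification tag of message d (a fixed multiplicative hash)."""
    return (d * 2654435761) % (1 << BITS)


def observation(d: int, rng: random.Random) -> int:
    """What the eavesdropper reads: the tag of the stored message masked by a fresh uniform key."""
    return tag(d) ^ rng.getrandbits(BITS)


def error_sum(strategy, d_target: int, d_other: int, trials: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(error of 1st kind) + P(error of 2nd kind) for a strategy c -> {0, 1}."""
    rng = random.Random(seed)
    e1 = sum(strategy(observation(d_target, rng)) == 0 for _ in range(trials)) / trials
    e2 = sum(strategy(observation(d_other, rng)) == 1 for _ in range(trials)) / trials
    return e1 + e2


strategies = {
    "compare with unmasked tag": lambda c: int(c == tag(7)),
    "look at the lowest bit": lambda c: int((c & 1) == 0),
    "ignore the observation": lambda c: 1,
}
for name, strategy in strategies.items():
    print(f"{name}: {error_sum(strategy, d_target=7, d_other=8):.3f}")
```

Since the masked observation has the same uniform distribution whichever message is stored, no strategy can push the sum noticeably below 1; without the mask, the comparison strategy would drive both errors to zero.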
Remark 5.
The secret model chosen from Ignatenko and Willems 2012 can be interpreted as a model for secure storage making use of a biometric source. But here the decoder reconstructs the message stored on the database instead of identifying it. Correspondingly, the set of messages that can be stored on the database grows exponentially with the block length, instead of doubly exponentially.
The following observation concerning the capacity region can be derived:
Lemma 1.
Let B>0. ID(B) is a closed set.
One may use Theorem 1 obtained from Ahlswede and Csiszar 1998 in order to characterize CRd.
Theorem 3.
It holds that
and one only has to consider random variables V fulfilling |V|≤|X|+1.
Now one considers CR generation with randomized private CR generation protocols.
Theorem 4.
It holds that
and one only has to consider random variables V fulfilling |V|≤|X|+1.
Now ID(B) is characterized. To do so, one makes use of results for CR and SK generation while taking the privacy leakage into account. First, consider deterministic secure storage for identification protocols ({Φd}, {Ψd}). The corresponding capacity region is denoted by IDd(B), and the following achievability result is obtained.
Theorem 5.
It holds that
wherein the union is taken over all random variables V fulfilling V−X−Y and I(V;X|Y)B≤log||.
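The union in Theorem 5 is not reproduced in this text, but its side condition can be explored numerically. The Python sketch below assumes the doubly symmetric binary source with crossover probability q and BSC test channels V = X ⊕ Bern(p); it reads the alphabet missing from the printed constraint as the storage alphabet and passes the base-2 logarithm of its size as `log_u`, a hypothetical parameter. It uses I(V;X|Y) = I(V;X) − I(V;Y), which holds because V−X−Y is a Markov chain, and returns the feasible test channel with the largest I(V;X) as one natural figure of merit, not necessarily the objective of Theorem 5.

```python
from math import log2


def h(p: float) -> float:
    """Binary entropy in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)


def best_bsc_test_channel(q: float, B: float, log_u: float, steps: int = 10_000):
    """Grid search over V = X xor Bern(p), p in [0, 1/2], keeping only p with
    B * I(V;X|Y) <= log_u; returns (p, I(V;X), I(V;X|Y)) with the largest I(V;X)."""
    best = None
    for i in range(steps + 1):
        p = 0.5 * i / steps
        i_vx = 1 - h(p)
        i_vx_given_y = h(p * (1 - q) + q * (1 - p)) - h(p)  # I(V;X) - I(V;Y)
        if B * i_vx_given_y <= log_u and (best is None or i_vx > best[1]):
            best = (p, i_vx, i_vx_given_y)
    return best


for B in (1, 5, 10):
    print(B, best_bsc_test_channel(q=0.11, B=B, log_u=1.0))
```

Reading more source symbols per storage cell (larger B) tightens the constraint, so noisier test channels with a smaller I(V;X) must be used when the storage alphabet is kept fixed.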
Now randomized secure storage for identification protocols are considered.
Theorem 6.
It holds that
wherein the union is taken over all random variables V such that V−X−Y and I(V;X|Y)B≤log||−ϵB.
In the following, an information theoretic model of a storage process for identification with two sources 30, 40 is defined.
Definition 7.
Let k, n∈, a finite set and 1≥α≥0. The two source storage for identification model consists of the alphabet , two discrete memoryless multiple sources (DMMSs) PX
It is reasonable to require a small probability that an error of the first kind occurs when the decoder for the message or storage item d is used to find out whether or not d is stored on the database 10. One also desires that an error of the second kind occurs with a small probability. One considers an eavesdropper 20 who reads from the public database 10 and who wants to find out whether or not message d is stored on the database 10. The eavesdropper 20 also has access to the public source 30. It is desired that the sum of the probability that the eavesdropper 20 makes an error of the first kind and the probability that the eavesdropper 20 makes an error of the second kind is close to 1.
One is interested in the largest possible identification rate, where one considers the number of storage cells as a resource. One considers a fixed ratio B of the number of symbols read from the two sources and the number of storage cells in the database 10.
The output of the PUF source uniquely characterizes a device, so one may want to reuse parts of it. Therefore, the attacker 20 should not obtain much information about the PUF source output $X_2^{\,n-\lceil \alpha n \rceil}$.
This motivates the following definition of achievability for the storage for identification model.
Definition 8.
Let B>0. We call the tuple (RID,RPL), RID,RPL≥0, an achievable rate pair for the storage for identification model if for every δ>0 there is a k0=k0(δ) such that for all k≥k0 and n=⌈B·k⌉ there exists a storage for identification protocol such that for all d,
are fulfilled for all decoding strategies of an eavesdropper 20. The set of all rate pairs that are achievable using such storage for identification protocols is referred to as the capacity region ID,2(B).
The considerations on CR generation may be extended by adding a second source.
Definition 9.
Let n∈ and let 1≥α≥0. A two source model consists of two discrete memoryless multiple sources (DMMSs) PX
Inspired by the discussion on the achievability for the source model and the storage for identification model one can define achievability for the two source model.
Definition 10.
The triple (RCR,RFC,RPL), RCR,RFC,RPL≥0, is referred to as an achievable CR generation rate versus forward communication rate versus privacy leakage rate triple for the two source model if for every δ>0 there is an n0=n0(δ) such that for all n≥n0 there is a CR generation protocol such that the relations
are fulfilled for a c>0. The set of all rate triples that are achievable using such CR generation protocols is referred to as the CR capacity region CR.
In addition to the foregoing description of the present invention, for an additional disclosure explicit reference is taken to graphic representation of
Number | Date | Country | Kind |
---|---|---|---|
10 2018 210 104.3 | Jun 2018 | DE | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2019/066258 | 6/19/2019 | WO | 00 |