1. Field of the Invention
The present invention relates to computerized cryptographic methods, and particularly to a message authentication code with blind factorization and randomization.
2. Description of the Related Art
In recent years, the Internet community has experienced explosive and exponential growth. Given the vast and increasing magnitude of this community, both in terms of the number of individual users and web sites, and the sharply reduced costs associated with electronically communicating information, such as e-mail messages and electronic files, between one user and another, as well as between any individual client computer and a web server, electronic communication, rather than more traditional postal mail, is rapidly becoming a medium of choice for communicating information. The Internet, however, is a publicly accessible network, and is thus not secure. The Internet has been, and increasingly continues to be, a target of a wide variety of attacks from various individuals and organizations intent on eavesdropping, intercepting and/or otherwise compromising or even corrupting message traffic flowing on the Internet, or further illicitly penetrating sites connected to the Internet.
Encryption by itself provides no guarantee that an enciphered message cannot or has not been compromised during transmission or storage by a third party. Encryption does not assure integrity, because an encrypted message could be intercepted and changed even though it may be practically impossible to cryptanalyze. In this regard, the third party could intercept, or otherwise improperly access, a ciphertext message, then substitute a predefined illicit ciphertext block(s) which that party, or someone else acting in concert with that party, has specifically devised for a corresponding block(s) in the message. The intruding party could thereafter transmit the resulting message with the substituted ciphertext block(s) to the destination, all without the knowledge of the eventual recipient of the message.
The field of detecting altered communication is not confined to Internet messages. With the burgeoning use of stand-alone personal computers, individuals or businesses often store confidential information within the computer, with a desire to safeguard that information from illicit access and alteration by third-parties. Password controlled access, which is commonly used to restrict access to a given computer and/or a specific file stored thereon, provides a certain, but rather rudimentary, form of file protection. Once password protection is circumvented, a third party can access a stored file and then change it, with the owner of the file then being completely oblivious to any such change.
Therefore, a need exists for a cryptographic technique that not only provides an extremely high level of security against cryptanalysis, particularly given the sophistication and power of current and future processing technology, but which is also capable of detecting a change made to a ciphertext message. Such a technique could be applied to (but is not limited in its use to) secure file storage or safeguarding messages transmitted over an insecure network.
Systems and methods which provide integrity checks based on a secret key are usually called message authentication codes (MACs). Typically, message authentication codes are used between two parties that share a secret key in order to authenticate information transmitted between these parties. An adversary should be unable (except with negligible probability) to produce a properly authenticated message for any message which he or she has not yet seen. Typically, a party authenticates a message by appending to it the corresponding MAC. The receiving party then applies a verification procedure on the received message and its message authentication code to decide if the transmitted message is authentic. This may be accomplished by having the receiving party compute his or her own message authentication code and check to see whether the received and generated codes match.
Message authentication code methods are used widely in many applications to provide data integrity and data origin authentication. However, MACs provide weaker guarantees than digital signatures, as they can only be used in a symmetric setting, where the parties trust each other. In other words, MACs do not provide non-repudiation of origin. However, MACs are preferred over digital signatures because they are two to three orders of magnitude faster in implementation, and MAC results are four to sixteen bytes long compared to the forty to one hundred and twenty eight bytes for signatures.
In order to use a MAC, a sender and a receiver need to share a secret key k (a random bit string of nk bits, with typical values for nk in the range of 56 to 128). In order to protect a message, the sender computes the MAC corresponding to the message, which is a bit-string of nmac bits, and appends this string to the message (typical values for nmac are between 32 and 64). The MAC is a complex function of every bit of the message and the key. On receipt of the message, the receiver recomputes the MAC and verifies that it corresponds to the transmitted MAC value.
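The compute-and-verify exchange described above can be sketched as follows. This is a minimal illustration only, assuming HMAC-SHA-256 from the Python standard library as a stand-in for the unspecified MAC algorithm; the 128-bit key, the 64-bit truncation, and the names compute_mac and verify are illustrative assumptions.

```python
import hashlib
import hmac

N_MAC_BYTES = 8  # assumed tag length: nmac = 64 bits


def compute_mac(key: bytes, message: bytes) -> bytes:
    """Sender side: compute a truncated tag over the whole message."""
    return hmac.new(key, message, hashlib.sha256).digest()[:N_MAC_BYTES]


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(compute_mac(key, message), tag)


key = bytes(range(16))                    # hypothetical shared 128-bit key
message = b"transfer 100 to account 42"
tag = compute_mac(key, message)           # the sender appends this to the message

ok = verify(key, message, tag)
tampered_ok = verify(key, b"transfer 900 to account 42", tag)
```

Here ok is true for the untouched message, while a message altered in transit fails verification unless the attacker happens to guess a matching tag.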
With regard to the security of MAC algorithms, an opponent who tries to deceive the receiver still does not know the secret key. For this analysis, it is assumed that he knows the format of the messages, and the description of the MAC algorithm. His goal is to try to inject a fraudulent message and append a MAC value which will be accepted by the receiver. He can choose one of two attack strategies: a forgery attack or a key recovery attack. The forgery consists of predicting the value of MACk(m) for a message m without initial knowledge of the shared key k. If the adversary can do this for a single message, he is said to be capable of “existential forgery”. If the adversary is able to determine the MAC for a message of his choice, he is said to be capable of “selective forgery”. Practical attacks often require that a forgery is verifiable; i.e., that the forged MAC is known to be correct beforehand with a probability near one.
A key recovery attack consists of finding the key k from a number of message/MAC pairs. Such an attack is more powerful than forgery, since it allows for arbitrary selective forgeries. Ideally, any attack allowing key recovery requires approximately 2^nk operations, where nk denotes the bit length of k.
These attacks can be further classified according to the type of control an adversary has over the device computing the MAC value. In a chosen-text attack, an adversary may request and receive MACs corresponding to a number of messages of his choice, before completing his attack. For forgery, the forged MAC must be on a message different than any for which a MAC was previously obtained. In an adaptive chosen-text attack, requests may depend on the outcome of previous requests. It should be noted that in certain environments, such as in wholesale banking applications, a chosen message attack is not a very realistic assumption: if an opponent can choose a single text and obtain the corresponding MAC, he can already make a substantial profit. However, it is best to remain cautious and to require resistance against chosen text attacks.
In the following, various attacks on MACs are considered: brute force key searching, guessing of the MAC, a generic forgery attack, and attacks based on cryptanalysis. A brute force key search requires a few known message-MAC pairs (approximately nk/nmac, which is between one and four for most MAC algorithms). It is reasonable to assume that such a small number of message-MAC pairs is available. The opponent tries all the possible keys and checks whether they correspond to the given message-MAC pairs. Unlike the case of confidentiality protection, the opponent can only make use of the key if it is recovered within its active lifetime (which can be reasonably short). On the other hand, a single success during the lifetime of the system might be sufficient. This depends on a cost/benefit analysis; i.e., how much one loses as a consequence of a forgery. The only way to preclude a key search is to choose a sufficiently large key.
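A brute force key search of this kind can be sketched on a deliberately tiny example. The parameters below (a 16-bit key and an 8-bit tag built from truncated HMAC-SHA-256) are assumptions chosen only so that the search finishes quickly; they are far smaller than any practical values. Each known message-MAC pair filters the remaining candidate keys, so roughly nk/nmac pairs are needed before the candidates narrow to the true key.

```python
import hashlib
import hmac

N_K = 16    # toy key length in bits (assumption)
N_MAC = 8   # toy tag length in bits (assumption)


def toy_mac(key: int, message: bytes) -> int:
    """HMAC-SHA-256 truncated to its first byte, keyed by a 16-bit integer."""
    digest = hmac.new(key.to_bytes(2, "big"), message, hashlib.sha256).digest()
    return digest[0]


secret_key = 0xBEEF  # known only to the legitimate parties
messages = [b"msg-0", b"msg-1", b"msg-2", b"msg-3"]
pairs = [(m, toy_mac(secret_key, m)) for m in messages]  # observed by the opponent

# The opponent tries every key and keeps those consistent with each pair.
candidates = list(range(2 ** N_K))
survivors_per_pair = []
for message, tag in pairs:
    candidates = [k for k in candidates if toy_mac(k, message) == tag]
    survivors_per_pair.append(len(candidates))
```

The true key always survives the filtering; the list survivors_per_pair records how quickly the false candidates are eliminated as more pairs become available.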
A second relatively simple attack is in the form of choosing an arbitrary fraudulent message, and appending a randomly chosen MAC value. Ideally, the probability that this MAC value is correct is equal to 1/2^nmac, where nmac is the number of bits of the MAC value. This value should be multiplied by the expected profit corresponding to a fraudulent message, which results in the expected value of one trial. Repeated trials can increase this expected value, but in a good implementation, repeated MAC verification errors will result in a security alarm (i.e., the forgery is not verifiable). For most applications nmac is between 32 and 64, which is sufficient to make this attack uneconomical.
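As a worked illustration of this estimate, the success probability and the expected gain of a single guess can be written out for the two typical tag lengths. The profit V = 10^6 (in arbitrary currency units) is a hypothetical figure used only for the arithmetic.

```latex
\Pr[\text{guess accepted}] = 2^{-n_{mac}}, \qquad
2^{-32} \approx 2.3 \times 10^{-10}, \qquad
2^{-64} \approx 5.4 \times 10^{-20}

\mathbb{E}[\text{gain per trial}] = V \cdot 2^{-n_{mac}}, \qquad
V = 10^{6},\; n_{mac} = 32 \;\Rightarrow\; \mathbb{E} \approx 2.3 \times 10^{-4}
```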
A generic forgery attack exploits the fact that most MAC algorithms consist of the iteration of a simple compression function. The MAC input message m is padded to a multiple of the block size, and is then divided into t blocks denoted m1 through mt. The MAC involves a compression function f and an n-bit (n≧nmac) chaining variable Hi between stage i-1 and stage i, such that H0=IV; Hi=f(Hi-1, mi), where 1≦i≦t; and MACk(m)=g(Ht). Here, g denotes the output transformation. The secret key may be employed in the IV, in f, and/or in g. For an input pair (m, m′) with MACk(m)=g(Ht) and MACk(m′)=g(H′t), a collision is said to occur if MACk(m)=MACk(m′). This collision is termed an internal collision if Ht=H′t, and an external collision if Ht≠H′t but g(Ht)=g(H′t).
One form of general forgery attack applies to all iterated MACs. Its feasibility depends on the bit sizes n of the chaining variable and nmac of the MAC result, the nature of the output transformation g, and the number s of common trailing blocks of the known texts (s≧0). A simple way to preclude this attack is to prepend a sequence number to every message and to make the MAC algorithm stateful. This means that the value of the sequence number is stored to ensure that each sequence number is used only once within the lifetime of the key. While this is not always practical, it has the additional advantage that it prevents replay attacks. To add more security against external collisions, the function g can include some form of additional randomization.
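The role of internal collisions in such a forgery can be illustrated with a toy iterated MAC. The sketch below is not any standardized algorithm: the compression function f is assumed to be truncated SHA-256, the chaining variable is only 16 bits, and g is the identity, so that every MAC collision is an internal collision and one can be found with a few hundred chosen-text queries. Once two messages collide, appending a common trailing block to both keeps the chaining variables equal, and a MAC requested for one extended message is valid for the other.

```python
import hashlib

BLOCK = 4      # toy block size in bytes (assumption)
N_BYTES = 2    # toy chaining variable: n = 16 bits (assumption)


def f(key: bytes, h: bytes, block: bytes) -> bytes:
    """Toy keyed compression function built from truncated SHA-256."""
    return hashlib.sha256(key + h + block).digest()[:N_BYTES]


def iterated_mac(key: bytes, message: bytes) -> bytes:
    """H_0 = IV, H_i = f(H_{i-1}, m_i), MAC = g(H_t) with g the identity."""
    assert len(message) % BLOCK == 0, "message assumed already padded"
    h = b"\x00" * N_BYTES
    for i in range(0, len(message), BLOCK):
        h = f(key, h, message[i:i + BLOCK])
    return h


key = b"hidden from the attacker"


def oracle(message: bytes) -> bytes:
    """Chosen-text access: the attacker sees tags but never the key."""
    return iterated_mac(key, message)


# Step 1: query one-block messages until two of them share a MAC value.
seen = {}
for counter in range(2 ** (8 * N_BYTES) + 1):   # pigeonhole upper bound
    m = counter.to_bytes(BLOCK, "big")
    tag = oracle(m)
    if tag in seen:
        m1, m2 = seen[tag], m
        break
    seen[tag] = m

# Step 2: request a MAC for m1 plus a trailing block, reuse it for m2.
suffix = b"PAY!"
forged_tag = oracle(m1 + suffix)
forged_message = m2 + suffix     # never submitted to the oracle
```

The forged pair (forged_message, forged_tag) verifies under the secret key even though the attacker never requested a MAC for forged_message.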
The above attacks assume that no shortcuts exist to break the MAC algorithm (either for forgery or for key recovery). Since most existing MAC algorithms are not based on mathematically known hard problems, it is now becoming increasingly important to have MAC methods that are based on mathematically known hard problems such as integer factorization and discrete logarithm problems.
There are, conventionally, three main approaches for MAC design that are based on: a hash function with a secret key; a block cipher with chaining (CBC-MAC); and a dedicated MAC. Compared to the number of block ciphers and hash functions, relatively few dedicated MAC algorithms have been proposed. The main reason for this is that the security of dedicated MAC methods needs to be evaluated from scratch in order to assess their robustness. On the other hand, the security of MAC methods that are based on well established primitives, such as secure block ciphers or hash functions, can be based on the security of these underlying primitives, and does not have to be assessed from scratch.
The availability of fast dedicated hash functions (such as MD4 and MD5, for example) has resulted in several proposals for MAC algorithms based on these functions. However, these hash functions are weaker than intended, thus they are currently being replaced by RIPEMD-160 and by SHA-1, even though these hash functions are not based on mathematically known hard problems.
One method of using hash functions for MAC is to use secret prefix and secret suffix methods such that MACk(m)=h(k∥m) and MACk(m)=h(m∥k). However, the first equation allows for extension attacks, and the second equation opens the possibility of off-line attacks.
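The extension weakness of the secret prefix method can be shown on a toy iterated hash. The hash below is an assumption made for illustration (SHA-256 reused as the compression function of a hand-built iterated construction with a 16-byte block and simple length padding); it is not MD4, MD5, or any deployed function. The attacker is assumed to know the message, its tag, and the length of the key, but not the key itself.

```python
import hashlib

BLOCK = 16            # toy block size in bytes (assumption)
IV = b"\x00" * 32     # public initial chaining value


def pad(length: int) -> bytes:
    """Padding for a `length`-byte input: 0x80, zeros, 8-byte length field."""
    zeros = (-(length + 9)) % BLOCK
    return b"\x80" + b"\x00" * zeros + length.to_bytes(8, "big")


def iterate(state: bytes, data: bytes) -> bytes:
    """Run the compression function over block-aligned data."""
    for i in range(0, len(data), BLOCK):
        state = hashlib.sha256(state + data[i:i + BLOCK]).digest()
    return state


def toy_hash(message: bytes) -> bytes:
    return iterate(IV, message + pad(len(message)))


def prefix_mac(key: bytes, message: bytes) -> bytes:
    """Secret prefix method: MAC_k(m) = h(k || m)."""
    return toy_hash(key + message)


key = b"secret-key-123"                       # unknown to the attacker
message = b"amount=10"
tag = prefix_mac(key, message)                # observed by the attacker

# Attacker side: uses only message, tag, and len(key).
extension = b"&amount=9999"
glue = pad(len(key) + len(message))           # padding the hash applied itself
forged_message = message + glue + extension
total_length = len(key) + len(forged_message)
forged_tag = iterate(tag, extension + pad(total_length))
```

Because the published tag is exactly the internal chaining value after processing k∥m and its padding, the attacker can resume the iteration from it; forged_tag equals prefix_mac(key, forged_message) without the key ever being used.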
Another method is the “secret envelope” method, which requires that MACk(m)=h(k1∥m∥k2) (for example, Internet RFC 1828). For this method, a security proof may be performed based on the assumption that the compression function of the hash function is pseudo-random. While this is an interesting result, it should be pointed out that the compression function of most hash functions has not been evaluated with respect to this property. Further, 2^(n/2) known texts do not allow for a forgery or a key recovery attack. Additionally, MDx-MAC extends the envelope method by also introducing secret key material into every iteration. This makes the pseudo-randomness assumption more plausible. Moreover, it precludes the key recovery attack by extending the keys to complete blocks. HMAC is yet another variant of this methodology, which uses a nested construction (also with padded keys), such that MACk(m)=h(k2∥h(k1∥m)).
HMAC is used for providing message authentication in the Internet Protocol. The security of HMAC is guaranteed if the hash function is collision resistant for a secret value H0, and if the compression function itself is a secure MAC for one block (with the secret key in the Hi input and the message in the mi input). While these assumptions are weaker, it is believed that the latter assumption still requires further validation for existing hash functions. It is clear from the above that none of the current MACs based on hash functions are based on mathematically known relatively hard or difficult cryptographic problems.
Block ciphers are presently the most popular algorithms in use for providing data privacy. Block ciphers with a block size n and a key size k can be viewed as a family of permutations on the set of all n-bit strings, indexed by k-bit long encryption keys and possessing certain properties.
Some of the properties that are typically required of block ciphers are simplicity of construction and security. With regard to security, it is usually assumed that the underlying block cipher is secure and that the key size k is chosen so that an exhaustive key search is computationally infeasible. In practice, there are two issues to be considered with respect to security: (i) for a randomly chosen key k, the cipher should appear as a random permutation on the set of n-bit strings to any computationally bounded observer (i.e., one who does not have an unlimited amount of processing power available) who does not know k and who can only see encryptions of a certain number of plaintexts x of their choice; and (ii) the cipher should achieve so-called semantic security, which is resistant to collision attacks such as birthday and meet-in-the-middle attacks. Such attacks have been proven to reduce an exhaustive key search significantly against block ciphers. In practice, most data units (including any typical file, database record, IP packet, or email message) which require encryption are greater in length than the block size of the chosen cipher. This will require the application of the block cipher function multiple times. The encryption of many plaintext blocks under the same key, or the encryption of plaintexts having identical parts under the same key, may leak information about the corresponding plaintext. In certain situations, it is impossible to achieve semantic security. The goal then is to leak the minimum possible amount of information.
A further property is scalability. Obviously, no block cipher can be secure against a computationally unbounded attacker capable of running an exhaustive search for the unknown value of k. Furthermore, the development of faster machines will reduce the time it takes to perform an exhaustive key search. There is always a demand for more secure ciphers. It will be advantageous to develop a block cipher which is scalable so that an increase in security can be achieved by simply changing the length of the key rather than changing the block cipher algorithm itself.
Another property is efficiency. Block ciphers must be computationally efficient to encrypt and decrypt in order to meet the high data rate demands of current applications, such as multimedia. Furthermore, since speed of execution is also important, it is advantageous to have a block cipher that can be implemented in parallel. Of further interest is random access. Some modes allow encrypting and decrypting of any given block of the data in an arbitrary message without processing any other portions of the message.
Keying material is also an important factor in block ciphers. Some modes require two independent block cipher keys, which leads to additional key generation operations, a need for extra storage space or extra bits in communication. Also of interest are counter/IV/nonce requirements. Almost all modes make use of certain additional values together with block cipher key(s). In certain cases, such values must be generated at random or may not be reused with the same block cipher key to achieve the required security goals. Further, pre-processing capability is another important factor in block ciphers.
The Data Encryption Standard (DES) is a public standard and is presently the most popular and extensively used system of block encryption. DES was adopted as a federal government standard in the United States in 1977 for the encryption of unclassified information. The rapid developments in computing technology in recent years, in particular the ability to process vast amounts of data at high speed, meant that DES could not withstand the application of brute force in terms of computing power. In the late 1990's, specialized “DES cracker” machines were built that could recover a DES key after a few hours by trying possible key values. As a result, after 21 years of application, the use of DES was discontinued by the United States in 1998.
A new data encryption standard called Advanced Encryption Standard (AES) was launched in 2001 in the United States, and it was officially approved with effect from May 26, 2002. However, AES has no theoretical or technical innovation over its predecessor, DES. The basic concept remains the same and, essentially, all that has changed is that the block size n has been doubled. The AES standard specifies a block size of 128 bits and key sizes of 128, 192 or 256 bits. Although the number of 128-bit key values under AES is about 10^21 times greater than the number of 56-bit DES keys, future advances in computer technology may be expected to compromise the new standard in due course. Moreover, the increase in block size may be inconvenient to implement.
Furthermore, AES is not based on known computationally difficult problems, such as performing factorization or solving a discrete logarithm problem. It is known that encryption methods that are based on known cryptographic problems are usually stronger than those that are not based on such problems. Also, AES provides a limited degree of varying security, 128-bits, 192-bits and 256-bits; i.e., it is not truly scalable. It should be noted that to have a cipher with a higher degree of security, the cipher would probably need a completely new algorithm, which would make the hardware for AES redundant. As a clear example, the hardware for DES cannot be used efficiently for AES. Also, the hardware of the 192-bit AES cipher is not completely compatible with the hardware of the other two ciphers, 128-bit and 256-bit.
There are many ways of encrypting data streams that are longer than the block size, each of which is referred to as a “mode of operation”. Two of the standardized modes of operation employing DES are Electronic Code Book (ECB), and Cipher Block Chaining (CBC). It should be noted that the security of a particular mode should in principle be equivalent to the security of the underlying cipher. For this, we need to show that a successful attack on the mode of operation gives us almost an equally successful attack on the underlying cipher.
With regard to the ECB mode, in order to encrypt a message of arbitrary length, the message is split into consecutive n-bit blocks, and each block is encrypted separately. Encryption in ECB mode maps identical blocks in plaintext to identical blocks in ciphertext, which obviously leaks some information about the plaintext. Even worse, if a message contains significant redundancy and is sufficiently long, the attacker may get a chance to run statistical analysis on the ciphertext and recover some portions of the plaintext. Thus, in some cases, security provided by ECB is unacceptably weak. ECB may be a good choice if all that is needed is protection of very short pieces of data or nearly random data. A typical use case for ECB is the protection of randomly generated keys and other security parameters.
With regard to CBC mode, the exclusive-or (XOR) operation is applied to each plaintext block and the previous ciphertext block, and the result is then encrypted. An n-bit initialization vector IV is used to encrypt the very first block. Unlike ECB, CBC hides patterns in plaintext. In fact, it can be proved that there is a reduction of security of CBC mode to security of the underlying cipher provided that IV is chosen at random. The computational overhead of CBC is just a single XOR operation per block encryption/decryption, so its efficiency is relatively good. Further, CBC provides random read access to encrypted data; i.e., to decrypt the i-th block, we do not need to process any other blocks. However, any change to the i-th message block would require re-encryption of all blocks with indexes greater than i. Thus, CBC does not support random write access to encrypted data.
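The difference between the two modes can be sketched with a toy cipher. The 8-bit block cipher below is an assumption made purely for illustration: it is a key-dependent permutation table produced by Python's (non-cryptographic) random module, and the names make_cipher, ecb_encrypt, cbc_encrypt and cbc_decrypt are hypothetical.

```python
import random


def make_cipher(key: int):
    """Toy 8-bit block cipher: a key-dependent permutation of 0..255."""
    enc = list(range(256))
    random.Random(key).shuffle(enc)
    dec = [0] * 256
    for x, y in enumerate(enc):
        dec[y] = x
    return enc, dec


def ecb_encrypt(enc, blocks):
    """Each block is encrypted independently."""
    return [enc[b] for b in blocks]


def cbc_encrypt(enc, iv, blocks):
    """C_i = E_k(M_i XOR C_{i-1}), with C_0 replaced by the IV."""
    out, prev = [], iv
    for b in blocks:
        prev = enc[b ^ prev]
        out.append(prev)
    return out


def cbc_decrypt(dec, iv, blocks):
    """M_i = D_k(C_i) XOR C_{i-1}; needs only two ciphertext blocks per block."""
    out, prev = [], iv
    for c in blocks:
        out.append(dec[c] ^ prev)
        prev = c
    return out


enc, dec = make_cipher(key=2024)
plaintext = list(b"AAAABBBB")                  # one byte per block
ecb = ecb_encrypt(enc, plaintext)
cbc = cbc_encrypt(enc, iv=0x5A, blocks=plaintext)

# Random read access in CBC: block 5 is recovered from C_5 and C_4 alone.
block_5 = dec[cbc[5]] ^ cbc[4]
```

In the ECB output, the four repeated A blocks map to one repeated ciphertext value, exposing the plaintext structure; the CBC output depends on the preceding ciphertext block and so does not repeat in this systematic way.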
The most serious drawback of CBC is that it has some inherent theoretical problems. For example, if Mi denotes the i-th plaintext block and Ci denotes the i-th ciphertext block, if one observes in a ciphertext that Ci=Cj, it immediately follows that Mi XOR Mj=Ci-1 XOR Cj-1, where the right-hand side of the equation is known. This is called the “birthday” or matching ciphertext attack. Of course, if the underlying cipher is good in the sense of pseudorandom permutation, and its block size is sufficiently large, the probability of encountering two identical blocks in ciphertext is very low.
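The matching ciphertext relation can be checked numerically. The sketch below assumes a toy 8-bit block cipher (a random permutation table) so that two equal ciphertext blocks are certain to appear within 300 blocks; with a realistic block size such a match would be extremely rare, as noted above.

```python
import random

rng = random.Random(7)
table = list(range(256))
rng.shuffle(table)                      # toy 8-bit cipher E_k (assumption)

iv = rng.randrange(256)
plain = [rng.randrange(256) for _ in range(300)]

# CBC encryption: C_i = E_k(M_i XOR C_{i-1}).
cipher, prev = [], iv
for m in plain:
    prev = table[m ^ prev]
    cipher.append(prev)

chain = [iv] + cipher                   # chain[i] is the block preceding cipher[i]

# The observer looks for two equal ciphertext blocks C_i = C_j.
first_seen = {}
match = None
for j, c in enumerate(cipher):
    if c in first_seen:
        match = (first_seen[c], j)
        break
    first_seen[c] = j

i, j = match
leaked_xor = chain[i] ^ chain[j]        # computed from ciphertext (and IV) only
```

The value leaked_xor, obtained without any knowledge of the key, equals plain[i] XOR plain[j], which is exactly the information the observer is not supposed to learn.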
Another example of its security weakness is its use of XOR-based encryption. A further drawback of CBC is that its randomization must be synchronized between the sending and the receiving correspondent. CBC uses an initialization vector that must be generated at random. This initialization vector must be synchronized between the sending and receiving correspondent for correct decryption.
From the above, it is clear that the security of encrypting a sequence of message blocks using a block cipher depends on two aspects: the security of the underlying block cipher; and the effectiveness of the randomization used in reducing collision attacks when encrypting a sequence of blocks.
With regard to the security of the underlying block cipher, it is known that encryption methods that are based on computationally hard problems, such as performing factorization or solving a discrete logarithm problem, are usually stronger than those that are not based on such problems. Integer factorization can be formulated as follows: For an integer n that is the product of two primes p and q, the problem is to find the values of p and q given n only. The problem becomes harder for larger primes. The discrete logarithm problem can be formulated as follows: Given a value g and a value y whose value is equal to g^k defined over a group, find the value of k. The problem becomes harder for larger groups. Although applications of the integer factorization and discrete logarithm problems in designing block ciphers are known, the resulting ciphers are computationally more demanding than those currently used, such as AES.
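Both problem statements can be made concrete with naive solvers on very small numbers. The sketch below uses trial division and exhaustive exponent search, which are illustrative only; the parameters (n = 101 × 113 and the multiplicative group modulo 101 with g = 2) are assumptions chosen so that the search is instantaneous. The running time of both loops grows with the size of the primes or of the group, which is what makes the problems hard at cryptographic sizes.

```python
def factor_semiprime(n: int):
    """Trial division: return (p, q) with p <= q and p * q == n."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p, n // p
        p += 1
    raise ValueError("n has no nontrivial factor")


def discrete_log(g: int, y: int, modulus: int) -> int:
    """Exhaustive search: smallest k >= 0 with g**k % modulus == y."""
    value = 1
    for k in range(modulus):
        if value == y:
            return k
        value = value * g % modulus
    raise ValueError("y is not a power of g")


# Integer factorization: recover p and q from n only.
p, q = factor_semiprime(101 * 113)

# Discrete logarithm: recover k from g and y = g^k in the group modulo 101.
y = pow(2, 57, 101)
k = discrete_log(g=2, y=y, modulus=101)
```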
With regard to the effectiveness of randomization and semantic security, the one time pad is the only unconditionally semantically secure cipher presently in use. With the one time pad, the sequence of keys does not repeat itself. In other words, it is said to have an infinite cycle. However, since the sending and the receiving correspondents have to generate the same random sequence, the one time pad is impractical because of the long sequence of the non-repeating key. As a consequence, the keys to encrypt and decrypt in all private-key systems, including block ciphers, remain unchanged for every message block, or they are easily derived from each other by inference using identical random number generators at the sending and receiving correspondent. Furthermore, these generators must be initialized to the same starting point at both correspondents to ensure correct encryption and decryption. This is true of all the existing block ciphers, including the RNS encryption and decryption method discussed below.
Many methods have been proposed to construct a pseudo-random number generator or adaptive mechanisms for pseudo-random generation of permutations. Such methods include those based on tables that are used to increase randomization. However, no matter how good the randomization property of the underlying generator, it always has a finite number of states and, hence, the numbers generated by existing generators have a finite cycle, where a particular sequence is repeated one cycle after another. Therefore, such block ciphers are vulnerable to collision attacks. Thus, the security of such block ciphers is very much dependent on the randomness of the random number generator. The RNS encryption and decryption method described below is not an exception. As a consequence, one can conclude that semantic insecurity is inherent in all existing block ciphers, but with varying degrees.
In the following, existing ciphers where both the sending and the receiving correspondents have to generate the same random sequence will be referred to as synchronized-randomization ciphers. Synchronized-randomization is achieved under the control of a key or some form of an initialization mechanism. Starting from this initial value, the subsequent keys are easily obtained by some form of a random number generator. Therefore, synchronized-randomization between encryption and decryption is guaranteed as long as identical random number generators are used by both correspondents and as long as the generators at both correspondents are synchronized to start from the same initial state. Thus, no unilateral change in the randomization method is allowed in synchronized-randomization.
With regard to MACs based on block ciphers, the most popular presently used MAC algorithm is the CBC-MAC; it has been adopted by many standardization committees including ANSI and ISO/IEC. It is widely used with DES as the underlying block cipher. CBC-MAC is an iterated MAC, with the following compression function: Hi=Ek(Hi-1⊕mi), where 1≦i≦t. Here, Ek(x) denotes the encryption of x using the nk bit key k with an n-bit block cipher E and H0=0. The MAC is then computed as MACk(m)=g(Ht), where g is the output transformation.
A widely used alternative is to replace the processing of the last block with a two-key triple encryption (with keys k1=k and k2); this is commonly known as the ANSI retail MAC: g(Ht)=Ek1(Dk2(Ht)), where D denotes decryption.
An alternative to CBC-MAC is RIPE-MAC, which adds a feedforward: Hi=Ek(Hi-1⊕mi)⊕mi, where 1≦i≦t. This has the advantage that the round function is harder to invert (even for someone who knows the secret key). An output transformation is needed as well. XOR-MAC is another scheme based on a block cipher. It is a randomized algorithm and its security can again be reduced to that of the block cipher. It has the advantage that it is parallelizable and that small modifications to the message (and to the MAC) can be made at very low cost. The use of random bits helps to improve security, but it has a cost in practical implementations. Further, performance is typically 25% to 50% slower than CBC-MAC.
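The three chaining rules discussed above can be compared side by side in a short sketch. The block cipher here is an assumed toy (an 8-bit keyed permutation table from Python's non-cryptographic random module) standing in for DES, and the retail output transformation is written in its commonly specified form g(Ht)=Ek1(Dk2(Ht)); all function names are hypothetical.

```python
import random


def make_cipher(key: int):
    """Toy 8-bit block cipher: returns (encryption table, decryption table)."""
    enc = list(range(256))
    random.Random(key).shuffle(enc)
    dec = [0] * 256
    for x, y in enumerate(enc):
        dec[y] = x
    return enc, dec


def cbc_mac_chain(enc, blocks):
    """H_0 = 0, H_i = E_k(H_{i-1} XOR m_i); returns H_t."""
    h = 0
    for m in blocks:
        h = enc[h ^ m]
    return h


def retail_mac(k1: int, k2: int, blocks):
    """CBC-MAC under k1 followed by g(H_t) = E_k1(D_k2(H_t))."""
    enc1, _ = make_cipher(k1)
    _, dec2 = make_cipher(k2)
    return enc1[dec2[cbc_mac_chain(enc1, blocks)]]


def ripe_mac_chain(enc, blocks):
    """H_i = E_k(H_{i-1} XOR m_i) XOR m_i (feedforward of the message block)."""
    h = 0
    for m in blocks:
        h = enc[h ^ m] ^ m
    return h


message = list(b"pay 100")                  # one byte per block
enc1, _ = make_cipher(11)

plain_tag = cbc_mac_chain(enc1, message)
retail_tag = retail_mac(11, 22, message)
ripe_state = ripe_mac_chain(enc1, message)

# With k2 equal to k1 the extra decrypt/encrypt pair cancels out.
degenerate_tag = retail_mac(11, 11, message)
```

The last line illustrates a design point of the two-key form: when k2 equals k1 the output transformation collapses to the identity and the result is the plain CBC-MAC value, so the added strength comes entirely from the second, independent key.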
As noted above, the strength of a MAC based on a block cipher is dependent on the security of the underlying block cipher. Further, none of the current block ciphers are based on a known cryptographically hard problem.
Block ciphers may also be based on the Residue Number System (RNS). In RNS, the vector {p1, p2, . . . , pL} forms a set of moduli, termed the RNS “basis β”, where the moduli {p1, p2, . . . , pL} are relatively prime with respect to each other. P is the product
$$P = \prod_{l=1}^{L} p_l$$
and defines the dynamic range of the system. The vector {m1, m2, . . . , mL} is the RNS representation of an integer M, which is less than P, where ml=<M>pl=M mod pl. Any integer M belonging to the set {0, . . . , P−1} has a unique representation in the basis β.
The operations of addition, subtraction, and multiplication are defined over the set {0, . . . , P−1} as: C±D=(<c1±d1>p1, . . . , <cL±dL>pL) and C×D=(<c1×d1>p1, . . . , <cL×dL>pL). These equations illustrate the parallel carry-free nature of RNS arithmetic.
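The component-wise nature of these operations can be sketched as follows. The basis (7, 11, 13) and the operands are small illustrative values, and the function names are hypothetical; each residue channel is processed independently, with no carries between channels.

```python
from math import prod

BASIS = (7, 11, 13)      # pairwise relatively prime moduli (illustrative)
P = prod(BASIS)          # dynamic range: 1001


def to_rns(x: int):
    """RNS representation: m_l = x mod p_l for every modulus."""
    return tuple(x % p for p in BASIS)


def rns_add(c, d):
    return tuple((a + b) % p for a, b, p in zip(c, d, BASIS))


def rns_sub(c, d):
    return tuple((a - b) % p for a, b, p in zip(c, d, BASIS))


def rns_mul(c, d):
    return tuple((a * b) % p for a, b, p in zip(c, d, BASIS))


C, D = 123, 45
c, d = to_rns(C), to_rns(D)

sum_rns = rns_add(c, d)
diff_rns = rns_sub(c, d)
prod_rns = rns_mul(c, d)
```

Each result matches the RNS representation of the corresponding ordinary result reduced modulo P.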
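The reconstruction of M from its residues {m1, m2, . . . , mL} is based on the Chinese Remainder Theorem (CRT):
$$M = \left\langle \sum_{l=1}^{L} \left\langle m_l \, P_l^{-1} \right\rangle_{p_l} P_l \right\rangle_{P}$$
where
$$P_l = \frac{P}{p_l}$$
and $P_l^{-1}$ is the multiplicative inverse of $P_l$ modulo $p_l$.
The vector {m1′, m2′, . . . , mL′}, where 0≦ml′<pl, is the Mixed Radix System (MRS) representation of an integer M less than P, such that
$$M = m_1' + m_2' \, p_1 + m_3' \, p_1 p_2 + \cdots + m_L' \prod_{l=1}^{L-1} p_l$$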
With regard to equations (1)-(4), a change in any one of the residue values ml can have an effect on the whole number, M.
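The reconstruction and the mixed radix representation can be sketched in a few lines, using the standard textbook forms of the CRT and of the MRS expansion. The basis (7, 11, 13) and all function names are illustrative assumptions; the modular inverse is obtained with the three-argument pow available in Python 3.8 and later.

```python
from math import prod

BASIS = (7, 11, 13)      # pairwise relatively prime moduli (illustrative)
P = prod(BASIS)


def to_rns(x: int):
    return tuple(x % p for p in BASIS)


def crt(residues) -> int:
    """Rebuild M from its residues: sum of m_l * (P_l^-1 mod p_l) * P_l, mod P."""
    total = 0
    for m_l, p_l in zip(residues, BASIS):
        P_l = P // p_l
        total += m_l * pow(P_l, -1, p_l) * P_l
    return total % P


def to_mrs(x: int):
    """Mixed radix digits: x = d_1 + d_2*p_1 + d_3*p_1*p_2 + ..."""
    digits = []
    for p_l in BASIS:
        x, digit = divmod(x, p_l)
        digits.append(digit)
    return tuple(digits)


def from_mrs(digits) -> int:
    value, weight = 0, 1
    for digit, p_l in zip(digits, BASIS):
        value += digit * weight
        weight *= p_l
    return value


M = 530
residues = to_rns(M)
rebuilt = crt(residues)

# Changing a single residue changes the reconstructed integer as a whole.
altered = ((residues[0] + 1) % BASIS[0],) + residues[1:]
rebuilt_altered = crt(altered)
```

The final lines illustrate the sensitivity noted above: because every integer below P has a unique residue vector, altering one residue necessarily yields a different integer.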
RNS encryption and decryption are known. One such method is taught by U.S. Pat. No. 5,077,793, which is herein incorporated by reference. In U.S. Pat. No. 5,077,793, the sending and receiving correspondents perform the following steps:
agreeing on a set of elements {p1, p2, . . . , pL} which are relatively prime with respect to each other and that are used as the RNS basis β;
agreeing on a set of L random number generators modulo pl, l=1, . . . , L; and
agreeing on a shared key which is used for the synchronized initialization of the random number generators by both the sending and receiving correspondents so that they can start generating random numbers from the same starting point.
The sending correspondent then performs the following steps:
converting the integer value Mi of the bit-string of the i-th message block into RNS representation {mi,1, mi,2, . . . , mi,L} using the basis β;
using a set of L random number generators modulo pl, l=1, . . . , L and the shared key to generate the i-th set of random numbers ri,l, l=1, . . . , L;
performing the addition ci,l=mi,l+ri,l for l=1, . . . , L;
converting the integer vector {ci,1, ci,2, . . . , ci,L} into an integer value Ci using the CRT and the basis β; and
sending the integer value Ci to the receiving correspondent.
The receiving correspondent performs the following steps:
converting the integer value Ci of a message string into the RNS representation {ci,1, ci,2, . . . , ci,L} using the basis β;
using a set of L random number generators modulo pl, l=1, . . . , L and the shared key to generate random numbers ri,l, l=1, . . . , L;
performing the subtraction mi,l=ci,l−ri,l for l=1, . . . , L;
converting the integer vector {mi,1, mi,2, . . . , mi,L} into an integer value Mi using the CRT and the basis β; and
recovering the bit string of the i-th message block from the integer value Mi.
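The sending and receiving steps listed above can be sketched as follows. This is only an illustration of the data flow, not the implementation of the referenced patent: the basis, the key string, and the use of Python's non-cryptographic random module as the L synchronized generators are all assumptions, as are the function names.

```python
import random
from math import prod

BASIS = (251, 253, 255, 256)     # pairwise relatively prime moduli (illustrative)
P = prod(BASIS)


def to_rns(x: int):
    return [x % p for p in BASIS]


def crt(residues) -> int:
    return sum(r * pow(P // p, -1, p) * (P // p) for r, p in zip(residues, BASIS)) % P


def make_generators(shared_key: str):
    """One generator per modulus, all initialized from the shared key."""
    return [random.Random(f"{shared_key}/{l}") for l in range(len(BASIS))]


def next_randoms(generators):
    """The i-th set of random numbers r_{i,l}, one per modulus p_l."""
    return [g.randrange(p) for g, p in zip(generators, BASIS)]


def encrypt_block(M_i: int, r_i) -> int:
    c_i = [(m + r) % p for m, r, p in zip(to_rns(M_i), r_i, BASIS)]
    return crt(c_i)              # integer C_i sent to the receiver


def decrypt_block(C_i: int, r_i) -> int:
    m_i = [(c - r) % p for c, r, p in zip(to_rns(C_i), r_i, BASIS)]
    return crt(m_i)


shared_key = "example shared key"
blocks = [int.from_bytes(b"abc", "big"), int.from_bytes(b"xyz", "big")]

sender = make_generators(shared_key)
ciphertext = [encrypt_block(M, next_randoms(sender)) for M in blocks]

receiver = make_generators(shared_key)      # same starting point as the sender
recovered = [decrypt_block(C, next_randoms(receiver)) for C in ciphertext]
```

Decryption succeeds only because both sides seed identical generators from the same key and consume them in the same order, which is the synchronization requirement the steps above depend on.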
In the above, elements of the RNS basis can be changed for successive blocks, but the key must indicate the set of elements used. The method also uses a mixed radix representation to both speed up conversion and to introduce further complexity to the basic RNS encryption method.
The above method is designed to achieve a more efficient random number generation and exploit the advantage of performing modulo addition in the RNS domain. However, the latter advantage is lost due to the use of the CRT to convert from RNS to binary representation, which is performed at the sending correspondent and the receiving correspondent before the transmission of the ciphertext. It is well known that RNS representation is effective when many of the computations of the method can be performed in the RNS domain, since the size of this RNS computation is far greater than the overhead of the CRT computation. In the RNS encryption and decryption method discussed above, only one addition is performed in the RNS domain. As a result, the computational advantage of using RNS is lost both at the sending correspondent and the receiving correspondent.
Further, performing addition in the RNS domain in the above method is equivalent to performing Ci=Mi+Ri mod P, where Ri is a random integer obtained from the integer vector {ri,1, ri,2, . . . , ri,L} using the CRT, and $P=\prod_{l=1}^{L} p_l$.
Therefore, patterns in the ciphertext values Ci can be studied in the binary domain without the need to convert these values to the RNS domain. Thus, a cryptanalyst can attack the above RNS encryption in the binary domain rather than in the RNS domain. This implies that neither knowledge of the value P nor knowledge of the elements {p1, p2, . . . , pL} is needed to perform cryptanalysis of the above method, which can be just as easily performed in the binary domain.
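The equivalence between element-wise addition in the RNS domain and a single addition modulo P can be checked numerically. The following is a minimal sketch under assumed values: the basis, the seed, and the helper name crt are chosen for illustration only.

```python
import random
from math import prod

def crt(residues, basis):
    """Combine residues modulo pairwise coprime moduli into one integer in [0, P)."""
    P = prod(basis)
    total = 0
    for r, p in zip(residues, basis):
        Pl = P // p
        total += r * Pl * pow(Pl, -1, p)
    return total % P

basis = [251, 241, 239]          # illustrative pairwise coprime moduli
P = prod(basis)
rng = random.Random(7)           # fixed seed so the check is repeatable

checks = []
for _ in range(100):
    M = rng.randrange(P)                               # plaintext integer
    r = [rng.randrange(p) for p in basis]              # random residues r_l
    R = crt(r, basis)                                  # the same randomness as one integer
    c = [(M % p + rl) % p for rl, p in zip(r, basis)]  # addition in the RNS domain
    checks.append(crt(c, basis) == (M + R) % P)        # addition in the binary domain

all_equal = all(checks)
```

In this sketch the two computations agree for every trial, which is what the Chinese Remainder Theorem guarantees for pairwise coprime moduli.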
There is, in fact, an advantage in performing the cryptanalysis in the binary domain. In the above, random number generation is performed in the RNS domain rather than in the binary domain. The reason for this is that random numbers generated in the binary domain with uniform distribution do not guarantee uniform distribution of the composite modulo numbers in the RNS domain. However, in a reciprocal analogy, generating random numbers in the RNS domain with uniform distribution does not guarantee a uniform distribution in the binary domain. This makes randomization generated in the RNS domain easier to attack in the binary domain. Therefore, it is more advantageous for an attacker to perform the cryptanalysis of the above RNS method in the binary domain rather than in the RNS domain, thus bypassing the use of the RNS representation altogether. This illustrates a weakness in generating random numbers in the RNS domain. The usage of the RNS representation in the above makes the ciphertext more vulnerable to collision attacks and easier to attack in the binary domain. It should be noted that performing the attack in the binary domain implies that neither knowledge of the value P nor knowledge of the elements {p1, p2, . . . , pL} is required.
In effect, the security of the above protocol depends only upon protecting the shared secret key, which is used to identify the starting point of the random number generators at both the sending and receiving correspondents. Since the above cipher is equivalent to a simple modulo addition of a plaintext with a random number, breaking the shared key using exhaustive search attacks is not particularly difficult with present computers.
Thus, a message authentication code with blind factorization and randomization solving the aforementioned problems is desired.
The message authentication code with blind factorization and randomization is a computational method for improving the security of existing Message Authentication Code (MAC) methods through the use of blind integer factorization. Further, blind randomization is used as a countermeasure to minimize collision attacks where different plaintexts produce the same MAC.
A method of generating the message authentication code includes the steps of:
a) a pair of sending and receiving correspondents agreeing upon a set of elements {p1, p2, . . . , pL} which are relatively prime with respect to one another, which are further considered as a shared secret and which are further used to form a basis β={p1, p2, . . . , pL}, the sending and receiving correspondents further agreeing upon an integer $g_l \in Z_{p_l}$ for l=1, . . . , L;
the sending correspondent then performing the following steps:
b) initializing an integer i as i=0, the following steps c) to d) then being repeated until i>u:
c) generating L integer values ki,l, where l=1, . . . , L, such that 0≦ki,l<pl, from the i-th block of the message bit string using a data embedding method;
d) computing the message authentication code elements ci,l modulo pl as ci,l=fl(ci-1,l, gl, ki,l) for l=1, . . . , L, wherein fl(,) represents a modulo p function;
e) combining the integer values cu,l for l=1, . . . , L to form a single integer vector {cu,1, cu,2, . . . cu,L};
f) converting the integer vector {cu,1, cu,2, . . . , cu,L} into an integer value C using the basis β and the Chinese Remainder Theorem, the integer value C being the message authentication code value;
g) appending the bit string of the message authentication code integer value, C, to the message bit string and sending the concatenated bit string to the receiving correspondent;
the receiving correspondent then performs the following steps:
h) obtaining the message authentication code integer value C from the received message bit string;
i) computing the message authentication code elements cu,l modulo pl as cu,l=C mod pl for l=1, . . . , L;
j) initializing the integer i as i=0, then repeating the following steps k) to l) until i>u:
k) generating L integer values ki,l, where l=1, . . . , L, such that 0≦ki,l<pl, from the i-th block of the received message bit string using the data embedding method;
l) computing the message authentication code elements rci,l modulo pl as rci,l=fl(rci-1,l, gl, ki,l) for l=1, . . . , L; and
m) if cu,l=rcu,l for all l=1, . . . , L, then the received message is authenticated.
These and other features of the present invention will become readily apparent upon further review of the following specification and drawings.
The message authentication code with blind factorization and randomization is a computational method for improving the security of existing Message Authentication Code (MAC) methods through the use of blind integer factorization. Further, blind randomization is used as a countermeasure to minimize collision attacks where different plaintexts produce the same MAC.
Blind integer factorization is performed using the following: For an unknown integer P with a known upper bound, the prime numbers pl, l=1, . . . , L, are found such that $P=\prod_{l=1}^{L} p_l$.
Blind integer factorization is, essentially, the factorization of an unknown integer into its prime factors using only knowledge about the upper bound of the integer. In contrast, conventional integer factorization is performed as follows: given a known integer P, the prime numbers pl, l=1, . . . , L, are found such that $P=\prod_{l=1}^{L} p_l$.
Blind integer factorization is a computationally more difficult problem than the conventional “known integer” factorization problem since, in blind integer factorization, the integer P to be factorized is not known.
Thus, the blind integer factorization problem is a more general problem than factoring a known integer P into its prime factors. It should be noted that only the upper bound of the integer P could be known to an attacker; the actual value of P remains unknown.
In the following, MAC methods based on blind integer factorization are performed using the Residue Number Representation (RNR) and the Chinese Remainder Theorem (CRT). In the following, cryptanalysis is forced to work in the RNS domain. This is achieved by generating the MAC elements in the RNS domain first and then converting the MAC to binary representation using the Chinese Remainder Theorem. As a result, an attacker is forced to perform the analysis in the RNS domain. Thus, an attacker has to perform blind integer factorization in order to perform cryptanalysis of the disclosed MAC methods.
Blind randomization is the ability to randomize the MAC of plaintext without the need for the receiving correspondent to know the randomization mechanism of the sending correspondent. The basic principle used is allowing the sending correspondent to unilaterally change the randomization mechanism without affecting the ability of the receiving correspondent to authenticate the plaintext. In other words, there is no need for synchronized randomization between the sending and the receiving correspondent. It should be noted that blind randomization is used here to randomize the MAC in order to minimize collision attacks where different plaintexts generate the same MAC.
Blind randomization is used either to randomize the RNS basis used to generate the MAC; to use additional elements in the RNS basis for randomizing the MAC; or as a combination of both. Blind randomization allows the sending correspondent to truly randomize the MAC by being able to change the random number generation mechanism without concern for any synchronization with the receiving correspondent. The receiving correspondent does not need to be informed of how the randomization was achieved at the sending correspondent.
This ability to authenticate plaintext at the receiving entity independently of the randomization used at the sending correspondent is a significant countermeasure against collision attacks. This is because the sending correspondent can change the way randomization is injected into the ciphertext without the need to synchronize with the receiving correspondent.
As will be described in greater detail below, in one embodiment, the sending correspondent can unilaterally select different elements for the RNS basis from an agreed upon set of relatively prime numbers. The sending correspondent needs only to send the code which identifies which sub-set of elements is being used. This will allow the MAC of the same message to be calculated using a different basis, thus reducing the probability of collision.
In a further embodiment, not all the modulo integers that correspond to the elements of the RNS basis are used to represent the message data strings. Some of these modulo integers are used to randomize the corresponding MAC. Further, each random modulo number can be generated using a different random number generator, thus increasing the degree of randomization. Additionally, the sending correspondent can vary the elements of the RNS basis used to randomize the ciphertext without the need to send any further information about them to the receiving correspondent.
Additionally, as will be described in greater detail below, the present methodology improves the efficiency of RNS-based MAC methods by requiring the Chinese Remainder Theorem to be performed at the sending correspondent only. This has the advantage of reducing the amount of computations needed at the receiving correspondent, which could be a device that has limited power resources, such as a wireless or mobile terminal. Further, it should be noted that the below MAC methods are scalable. Scalability is achieved by using larger prime numbers or using more elements in the RNS basis.
In the below, the symbol ∈ denotes set membership, “gcd” denotes the greatest common divisor, and Zp is used to denote the set {0, . . . , p−1}. Further, in the following it is assumed that the maximum block size that can be embedded into the L residue values {k1, k2, . . . , kL} is N, and that the message data bit string length is a multiple of N, such as (u+1)N. In other words, the number of N-bit blocks in a message bit string is (u+1).
A first embodiment of the RNS-based MAC method is as follows:
a) the sending and receiving correspondents agree upon a set of elements {p1, p2, . . . , pL} that are relatively prime with respect to one another, which are further considered as a shared secret and which are further used to form a basis β={p1, p2, . . . , pL}, the sending and receiving correspondents further agree upon an integer $g_l \in Z_{p_l}$ for l=1, . . . , L.
The sending correspondent performs the following steps:
b) initializing an integer i as i=0, the following steps c) to d) are repeated until i>u:
c) generating from an integer value Mi of a bit string of the i-th block of the message L integer values ki,l, wherein l=1, . . . , L, such that 0≦ki,l<pl, as ki,l=Mi mod pl for l=1, . . . , L;
d) computing the message authentication code elements ci,l modulo pl as ci,l=fl(ci-1,l, gl, ki,l) for l=1, . . . , L, wherein fl(,) represents a modulo p function;
e) combining the integer values cu,l for l=1, . . . , L to form a single integer vector {cu,1, cu,2, . . . , cu,L};
f) converting the integer vector {cu,1, cu,2, . . . , cu,L} into an integer value C using the basis β and the Chinese Remainder Theorem, the integer value C being the message authentication code value;
g) appending the message authentication code integer value, C, to the message bit string and sending the concatenated bit string to the receiving correspondent.
The receiving correspondent then performs the following steps:
h) obtaining the message authentication code integer value C from the received message bit string;
i) computing the message authentication code elements cu,l modulo pl as cu,l=C mod pl for l=1, . . . , L;
j) initializing the integer i as i=0, then repeating the following steps k) to l) until i>u:
k) generating from the integer value Mri of the bit string of the i-th block of the received message L integer values ki,l, where l=1, . . . , L, such that 0≦ki,l<pl, as ki,l=Mri mod pl for l=1, . . . , L;
l) computing the message authentication code elements rci,l modulo pl as rci,l=fl(rci-1,l, gl, ki,l) for l=1, . . . , L; and
m) if cu,l=rcu,l for all l=1, . . . , L, then the received message is authenticated.
It should be noted that in the above, the receiver does not need to compute the CRT, which is advantageous for low power devices. Further, as an alternative, the method may be varied by replacing the step of generating the L integer values with the data embedding method given below.
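The steps of the first embodiment may be sketched as follows, as an illustration only. Several details are assumptions of this sketch rather than part of the method as described: the chaining function f is taken to be c_prev·g^k mod p (one hypothetical choice of fl), the initial value of each element before the first block is taken to be 1, and the basis, the values gl, and all function names are illustrative.

```python
from math import prod

def crt(residues, basis):
    """Combine residues modulo pairwise coprime moduli into one integer in [0, P)."""
    P = prod(basis)
    total = 0
    for r, p in zip(residues, basis):
        Pl = P // p
        total += r * Pl * pow(Pl, -1, p)
    return total % P

def f(c_prev, g, k, p):
    """Assumed chaining function: c_prev * g**k mod p."""
    return (c_prev * pow(g, k, p)) % p

def mac_residues(blocks, basis, gens):
    """Run steps b) to d): chain every block M_i through f in each modulus p_l."""
    c = [1] * len(basis)                       # assumed initial value for each element
    for M in blocks:
        c = [f(cl, g, M % p, p) for cl, g, p in zip(c, gens, basis)]  # k_{i,l} = M_i mod p_l
    return c

def generate_mac(blocks, basis, gens):
    """Sender: steps e) and f), one CRT conversion at the end."""
    return crt(mac_residues(blocks, basis, gens), basis)

def verify_mac(blocks, C, basis, gens):
    """Receiver: steps h) to m), using only reductions C mod p_l and no CRT."""
    return [C % p for p in basis] == mac_residues(blocks, basis, gens)

basis = [1009, 1013, 1019]       # shared secret basis (illustrative primes)
gens = [11, 3, 2]                # shared integers g_l (illustrative)
blocks = [0x1234, 0xBEEF, 0x0042]

C = generate_mac(blocks, basis, gens)
accepted = verify_mac(blocks, C, basis, gens)
rejected = verify_mac([0x1234, 0xBEEF, 0x0043], C, basis, gens)
```

In this sketch the receiving side never calls crt, which mirrors the observation that only the sending correspondent performs the Chinese Remainder Theorem conversion.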
A further alternative embodiment utilizes blind randomization. Blind randomization is the ability to randomize the MAC of a plaintext without the need for the receiving correspondent to know the randomization mechanism of the sending correspondent. The basic principle used is to allow the sending correspondent to unilaterally change the randomization mechanism without affecting the ability of the receiving correspondent to authenticate the plaintext. In other words, there is no need for synchronized randomization between the sending and the receiving correspondent. It should be noted that blind randomization is used here to randomize the MAC in order to minimize collision attacks where different plaintexts generate the same MAC.
In the below, randomization is achieved by allowing the sending correspondent to unilaterally select the RNS basis to be used to generate the MAC from an agreed upon set of possible relatively prime numbers. The method includes the following steps:
a) a pair of sending and receiving correspondents agreeing upon a set of elements {p1, p2, . . . , pL} which are relatively prime with respect to one another and which are further considered as a shared secret, the sending and receiving correspondents further agreeing upon an integer $g_l \in Z_{p_l}$ for l=1, . . . , L;
the sending correspondent then performing the following steps:
b) selecting a code at random from the list of codes agreed upon to select (L−1) elements {p1, p2, . . . , pL-1} from the elements {p1, p2, . . . , pL} to form the basis βi={p1, p2, . . . , pL-1, pL};
c) initializing an integer i as i=0, the following steps d) to e) then being repeated until i>u:
d) generating (L−1) integer values ki,l, where l=1, . . . , (L−1) such that 0≦ki,l<pl, from the i-th block of the message bit string using a data embedding method, as described below;
e) computing the message authentication code elements ci,l modulo pl as ci,l=fl(ci-1,l, gl, ki,l) for l=1, . . . , (L−1), where fl(,) represents a modulo p function;
f) embedding the bit strings of the code used by the sending correspondent to select the (L−1) elements {p1, p2, . . . , pL-1} into the integer value cu,L, such that 0≦cu,L<pL, using the data embedding method, as described below;
g) combining the integer values cu,l for l=1, . . . , L to form a single integer vector {cu,1, cu,2, . . . , cu,L};
h) converting the integer vector {cu,1, cu,2, . . . , cu,L} into an integer value C using the basis β and the Chinese Remainder Theorem, the integer value C being the message authentication code value;
i) appending the message authentication code integer value, C, to the message bit string and sending the concatenated bit string to the receiving correspondent;
the receiving correspondent then performs the following steps:
j) obtaining the message authentication code integer value C from the received message bit string;
k) computing a message authentication code element cu,L, as the value cu,L=C mod pL;
l) recovering the code used to define the set {p1, p2, . . . , pL-1} from the value cu,L;
m) computing the message authentication code elements cu,l modulo pl as cu,l=C mod pl for l=1, . . . , (L−1);
n) initializing the integer i as i=0, then repeating the following steps o) to p) until i>u:
o) generating (L−1) integer values ki,l, where l=1, . . . , (L−1), such that 0≦ki,l<pl, from the i-th block of the message bit string using the data embedding method described below;
p) computing the message authentication code elements rci,l modulo pl as rci,l=fl(rci-1,l, gl, ki,l) for l=1, . . . , (L−1); and
q) if cu,l=rcu,l for l=1, . . . , (L−1), then the received message is authenticated.
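The basis-selection embodiment above may be sketched as follows. This is a simplified illustration under stated assumptions: the code is modelled as a bit mask over the agreed pool (so any non-empty sub-set may be selected, rather than exactly L−1 elements), the code is embedded directly as the residue modulo a dedicated element P_CODE, the chaining function is assumed to be c_prev·g^k mod p with an initial value of 1, and all names and numeric values are hypothetical.

```python
from math import prod

POOL = [1009, 1013, 1019, 1021, 1031]                      # agreed secret elements (illustrative)
GENS = {1009: 11, 1013: 3, 1019: 2, 1021: 10, 1031: 14}    # agreed g_l for each element
P_CODE = 37                      # element carrying the code; larger than any 5-bit code

def crt(residues, basis):
    """Combine residues modulo pairwise coprime moduli into one integer in [0, P)."""
    P = prod(basis)
    total = 0
    for r, p in zip(residues, basis):
        Pl = P // p
        total += r * Pl * pow(Pl, -1, p)
    return total % P

def select_basis(code):
    """Bit l of the code selects element l of the agreed pool."""
    return [p for bit, p in enumerate(POOL) if (code >> bit) & 1]

def mac_residues(blocks, basis):
    c = [1] * len(basis)                                   # assumed initial value
    for M in blocks:
        c = [(cl * pow(GENS[p], M % p, p)) % p for cl, p in zip(c, basis)]
    return c

def generate_mac(blocks, code):
    """Sender: MAC elements over the selected basis, code as the last residue."""
    basis = select_basis(code)
    return crt(mac_residues(blocks, basis) + [code], basis + [P_CODE])

def verify_mac(blocks, C):
    """Receiver: recover the code from C mod P_CODE, rebuild the basis, compare."""
    basis = select_basis(C % P_CODE)
    if not basis:
        return False
    return [C % p for p in basis] == mac_residues(blocks, basis)

blocks = [0x1234, 0xBEEF, 0x0042]
C1 = generate_mac(blocks, 0b00111)   # same message, two different bases
C2 = generate_mac(blocks, 0b11010)
```

In practice the code would be drawn at random by the sending correspondent; fixed codes are used here only so that the two MAC values of the same message can be compared.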
In the following alternative embodiment, randomization is achieved by allowing the sending correspondent to unilaterally select extra relatively prime elements which are used in addition to the RNS elements that are used to generate the MAC. The extra relatively prime elements are used to randomize the MAC. The method includes the following steps:
a) the sending and receiving correspondents agree upon a set of elements {p1, p2, . . . , pL} that are relatively prime with respect to one another, which are further considered as a shared secret and which are further used to form a basis βs={p1, p2, . . . , pL}, the sending and receiving correspondents further agree upon an integer $g_l \in Z_{p_l}$ for l=1, . . . , L.
The sending correspondent performs the following steps:
b) selecting elements {q1′, q2′, . . . , qJ′}, wherein J′ is an integer and J′>0, the elements being relatively prime with respect to one another and further being relatively prime with respect to the secret elements {p1, p2, . . . , pL}, the elements {q1′, q2′, . . . , qJ′} being known to the sending correspondent only;
c) selecting a random sub-set of elements {q1, q2, . . . , qJ} from the set {q1′, q2′, . . . , qJ′}, wherein the number of bits needed to represent $\prod_{l=1}^{L} p_l \prod_{j=1}^{J} q_j$ is within the upper limit agreed upon by the sending and receiving correspondents;
d) forming the basis β={p1, p2, . . . , pL, q1, q2, . . . , qJ} using the elements {p1, p2, . . . , pL} and the elements {q1, q2, . . . , qJ};
e) initializing an integer i as i=0, then repeating the following steps f) to g) until i>u:
f) generating L integer values ki,l, where l=1, . . . , L, such that 0≦ki,l<pl, from the i-th block of the message bit string using a data embedding method;
g) computing the message authentication code elements ci,l modulo pl as ci,l=fl(ci-1,l, gl, ki,l) for l=1, . . . , L, wherein fl(,) represents a modulo p function;
h) generating J random values rj, where j=1, . . . , J, such that 0≦rj<qj;
i) combining the integer values cu,l for l=1, . . . , L and the values rj for j=1, . . . , J to form a single integer vector {cu,1, cu,2, . . . , cu,L, r1, r2, . . . , rJ};
j) converting the integer vector {cu,1, cu,2, . . . , cu,L, r1, r2, . . . , rJ} into an integer value C using the basis β and the Chinese Remainder Theorem, the integer value C being the message authentication code value;
k) appending the message authentication code integer value, C, to the message bit string and sending the concatenated bit string to the receiving correspondent.
The receiving correspondent then performs the following steps:
l) obtaining the message authentication code integer value C from the received message bit string;
m) computing the message authentication code elements cu,l modulo pl as cu,l=C mod pl for l=1, . . . , L;
n) initializing the integer i as i=0, then repeating the following steps o) to p) until i>u:
o) generating L integer values ki,l, where l=1, . . . , L, such that 0≦ki,l<pl, from the i-th block of the received message bit string using the data embedding method;
p) computing the message authentication code elements rci,l modulo pl as rci,l=fl(rci-1,l, gl, ki,l) for l=1, . . . , L; and
q) if cu,l=rcu,l for all l=1, . . . , L, then the received message is authenticated.
It should be noted that the receiver does not need to compute the CRT, which is advantageous for low power devices. Further, as an alternative, the method may be varied by replacing the step of generating the L integer values with the data embedding method given below.
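The use of extra, sender-only elements to randomize the MAC may be sketched as follows. The sketch assumes a chaining function c_prev·g^k mod p with an initial value of 1, omits the check on the agreed upper limit on the number of bits, and uses hypothetical names and values throughout.

```python
import random
from math import prod

BASIS = [1009, 1013, 1019]       # shared secret elements p_l (illustrative)
GENS = [11, 3, 2]                # shared integers g_l (illustrative)
PRIVATE_Q = [1021, 1031, 1033]   # extra elements known to the sender only

def crt(residues, basis):
    """Combine residues modulo pairwise coprime moduli into one integer in [0, P)."""
    P = prod(basis)
    total = 0
    for r, p in zip(residues, basis):
        Pl = P // p
        total += r * Pl * pow(Pl, -1, p)
    return total % P

def mac_residues(blocks):
    c = [1] * len(BASIS)                                   # assumed initial value
    for M in blocks:
        c = [(cl * pow(g, M % p, p)) % p for cl, g, p in zip(c, GENS, BASIS)]
    return c

def generate_mac(blocks, rng):
    """Sender: append random residues r_j modulo a random sub-set of the q elements."""
    qs = rng.sample(PRIVATE_Q, rng.randint(1, len(PRIVATE_Q)))
    r = [rng.randrange(q) for q in qs]
    return crt(mac_residues(blocks) + r, BASIS + qs)

def verify_mac(blocks, C):
    """Receiver: only C mod p_l is needed; the q elements and r values stay unknown."""
    return [C % p for p in BASIS] == mac_residues(blocks)

rng = random.SystemRandom()
blocks = [0x1234, 0xBEEF, 0x0042]
C1 = generate_mac(blocks, rng)
C2 = generate_mac(blocks, rng)   # same message, independently randomized
```

Both values authenticate the same message in this sketch because reducing C modulo each shared element pl discards the contribution of the random residues.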
A further alternative method includes the following steps:
a) the sending and receiving correspondents agree upon a set of elements {p1′, p2′, . . . , pL′} that are relatively prime with respect to one another, which are further considered as a shared secret and which are further used to form a basis βs={p1, p2, . . . , pL}, the sending and receiving correspondents further agree upon an integer $g_l \in Z_{p_l}$ for l=1, . . . , L.
The sending correspondent then performs the following steps:
b) selecting a code at random from the list of codes agreed upon to select (L−1) elements {p1, p2, . . . , pL-1} from the elements {p1′, p2′, . . . , pL′,} to form the basis βl={p1, p2, . . . , pL-1, pL};
c) selecting elements {q1′, q2′, . . . , qJ′,} wherein J′ is an integer and J′>0, the elements being relatively prime with respect to one another and further being relatively prime with respect to the secret elements {p1, p2, . . . , pL}, the elements {q1′, q2′, . . . , qJ′,} being known to the sending correspondent only;
d) selecting a random sub-set of elements {q1, q2, . . . , qJ} from the set {q1′, q2′, . . . , qJ′}, wherein the number of bits needed to represent $\prod_{l=1}^{L} p_l \prod_{j=1}^{J} q_j$ is within the upper limit agreed upon by the sending and receiving correspondents;
e) forming the basis β={p1, p2, . . . , pL, q1, q2, . . . , qJ} using the elements {p1, p2, . . . , pL-1} and the elements {q1, q2, . . . , qJ};
f) initializing an integer i as i=0, then repeating the following steps g) to h) until i>u:
g) embedding message bit strings of the code used by the sending correspondent to select the (L−1) elements {p1, p2, . . . , pL-1} into the integer value cu,L, such that 0≦cu,L<pL, using the data embedding method described below;
h) generating random values rj, where j=1, . . . , J, such that 0≦rj<qj for j=1, . . . , J;
i) combining the integer values cu,l for l=1, . . . , L and the values rj for j=1, . . . , J to form a single integer vector {cu,1, cu,2, . . . , cu,L, r1, r2, . . . , rJ};
j) converting the integer vector {cu,1, cu,2, . . . , cu,L, r1, r2, . . . , rJ} into an integer value C using the basis β and the Chinese Remainder Theorem, the integer value C being the message authentication code value;
k) appending the message authentication code integer value, C, to the message bit string and sending the concatenated bit string to the receiving correspondent.
The receiving correspondent then performs the following steps:
l) obtaining the message authentication code integer value C from the received message bit string;
m) computing the message authentication code element cu,L as cu,L=C mod pL;
n) recovering the code used to define the set {p1, p2, . . . , pL-1} from the value cu,L;
o) computing the message authentication code elements cu,l modulo pl as cu,l=C mod pl for l=1, . . . , (L−1);
p) initializing the integer i as i=0, then repeating the following steps q) to r) until i>u:
q) generating (L−1) integer values ki,l, for l=1, . . . , (L−1), such that 0≦ki,l<pl, from the i-th block of the message bit string using the data embedding method given below;
r) computing the message authentication code elements rci,l modulo pl as rci,l=fl(rci-1,l, gl, ki,l) for l=1, . . . , (L−1); and
s) if cu,l=rcu,l for l=1, . . . , (L−1), then the received message is authenticated.
The above methods can also be applied to find the message authentication code for bit streams of media data, such as text, audio, video, or multimedia data. The application to media data involves using a pre-processing stage which is used to compress the media data prior to the application of the MAC method. Either a lossless compression method or a lossy compression method can be used to compress the media data in the pre-processing stage. The bit string of the compressed message at the output of the pre-processing stage is then used as the input to the above MAC generation methods.
As an alternative to using a compression method as a preprocessing stage, a feature selection method can be used to generate a message bit string that captures the main features that are present in the media data. For example, for images, the main features would be those that carry more perceptually important features, such as edges. As with the above, the output of the preprocessing stage would be a message bit string which represents information about the perceptually important features. This bit string is then used as the input to the MAC generation methods given above.
In the above methods, there are many possible functions ci,l=fl(ci-1,l, gl, ki,l) which may be used to compute the MAC elements ci,l modulo pl for l=1, . . . , L. Some possible functions are given below:
1) ci,l=cI-1,lglk
2) ci,l=g·cI-1,lk
3) ci,l=ci-1,l⊕gl⊕ki,l.
Further, there are many possible methods for embedding message data bits into an integer m modulo p. One such data embedding method is described below and includes the steps of:
a) defining Nl as the number of bits needed to represent the value of the prime number pl for l=1, . . . , L;
b) defining Nml as the number of message bits that are embedded into the residue number kl mod pl, with the condition Nml<Nl;
c) setting a limit on the length of the bit string of the message data block, N, to be $N=\sum_{l=1}^{L} N_{m_l}$;
d) repeating the steps e) to g) for l=1, . . . , L;
e) reading the next Nml bits of the message bit string;
f) using the Nml bits of the message data string as the Nml least significant bits of the integer kl mod pl; and
g) setting the remaining (Nl−Nml) bits of the integer kl mod pl at random.
At the receiving correspondent, the bits of the message data block can be easily recovered from each residue value kl by taking the Nml least significant bits.
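The data embedding steps above may be sketched as follows. The function names, the widths, and the sample block are illustrative assumptions. One further assumption is made explicit in the code: the random upper bits are drawn from a restricted range so that the resulting residue stays below its modulus, a constraint the description leaves implicit.

```python
import random

def embed(bits, primes, widths, rng):
    """Embed widths[l] message bits as the least significant bits of each residue."""
    residues, pos = [], 0
    for p, nm in zip(primes, widths):
        low = int(bits[pos:pos + nm], 2)               # next N_ml message bits
        pos += nm
        # Random upper part, limited so that the residue remains smaller than p
        # (assumption of this sketch).
        high = rng.randrange(((p - 1 - low) >> nm) + 1)
        residues.append((high << nm) | low)
    return residues

def extract(residues, widths):
    """Recover the message bits by keeping the N_ml least significant bits."""
    return "".join(format(k & ((1 << nm) - 1), f"0{nm}b")
                   for k, nm in zip(residues, widths))

primes = [1009, 1013, 1019]          # each needs N_l = 10 bits
widths = [8, 8, 8]                   # N_ml < N_l for every element
block = "101100111000111101010101"   # N = 24 message bits

ks = embed(block, primes, widths, random.SystemRandom())
recovered = extract(ks, widths)
```

Because the upper bits are random, embedding the same block twice will generally give different residues, while extraction returns the same message bits each time.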
In order to embed the message data bit string into L integers ml modulo pl, for l=1, . . . , L, it is assumed that an integer ml modulo pl can be represented using Nl-bits. It is further assumed that the message data bit string has a length of Ns-bits. The limit on the number of bits Ns of a message data string when using the embedding method described above is $N_s \le \sum_{l=1}^{L} (N_l-1)$.
The method is then performed as follows:
a) ordering the elements {p1, p2, . . . , pL} in numerically decreasing order;
b) repeating steps c) and d) for l=1, . . . , L:
c) reading the next (Nl−1) bits of the message data string; and
d) embedding the (Nl−1) bits of the message data string into the integer ml modulo pl.
The pre-processing stage used prior to embedding a message data bit string into L modulo integers mentioned above is described in the following. First, it is assumed that the message data bit string consists of Ns bits. [dN
The bits of the new vector [dhN
At the receiving correspondent, the vector [dhN
The advantage of using the pre-processing stage is to ensure that the message data block can be recovered only if all the bits of the vector [dhN
is not sufficient to break the RNS protocols described above.
In the methods described above, the elements of an RNS basis are selected from a predefined set of prime or relatively prime numbers. Assuming {z1′, z2′, . . . , zL′} represents a predefined set of prime numbers, then one method of using a code to identify the selected elements used in an RNS basis is to use a code with L′ bits. Assuming that the set {z1′, z2′, . . . , zL′} is arranged in decreasing order, then there are $2^{L'}$ possible sub-sets of the set {z1′, z2′, . . . , zL′}. Thus, there are $2^{L'}$ possible RNS bases to choose from. The code to identify which sub-set is used is constructed as follows: If the l-th bit of the L′ bit code is set to one, this implies that element zl′ is used. If the l-th bit of the L′ bit code is set to zero, this implies that element zl′ is not used.
In order to ensure random selection, the L′ bit code needs to be generated by a random number generator. Any suitable binary random number generator modulo $2^{L'}$ can be utilized.
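The code construction described above may be sketched as follows. The function names and the sample set are illustrative, and mapping the l-th bit of the code to bit position l−1, counted from the least significant bit, is an assumption of this sketch.

```python
import secrets

def random_code(num_elements):
    """Draw a random code with one bit per element of the predefined set."""
    return secrets.randbits(num_elements)

def select_basis(code, elements):
    """Element l is used when the corresponding bit of the code is one."""
    return [z for l, z in enumerate(elements) if (code >> l) & 1]

def code_for(subset, elements):
    """Inverse mapping: build the code that identifies a given sub-set."""
    return sum(1 << l for l, z in enumerate(elements) if z in subset)

elements = [31, 29, 23, 19, 17]          # predefined set, in decreasing order
chosen = select_basis(0b10011, elements)
code = code_for(chosen, elements)
fresh = random_code(len(elements))       # one of the 2**5 possible codes
```

A code of zero selects no elements at all, so a practical implementation would presumably reject it or redraw.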
As noted above, mathematically “difficult” or “hard” problems are used in the design of cryptographic methods. Such problems include integer factorization and the discrete logarithm problem. The integer factorization problem can be stated as follows: given an integer n, find the prime numbers {p1, p2, . . . , pL} such that $n=\prod_{l=1}^{L} p_l$.
The security strength of the above methods is based on hiding the RNS basis β. The only information available to an attacker about the basis is the number of bits used to represent the ciphertext, NC. The strength of the above methods thus depends on the difficulty of solving the following problem: Given the maximum number of bits used to represent an integer P, NC, find the integer P and the set {p1, p2, . . . , pL} such that $P=\prod_{l=1}^{L} p_l$ and $P<2^{N_C}$.
Therefore, the security of the above methods is dependent on a problem which is computationally harder than conventional factorization, since the integer value $P=\prod_{l=1}^{L} p_l$ is not known. This more difficult problem is what is referred to as “blind factorization” in the above.
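To give a rough sense of the ambiguity facing an attacker who knows only the bit bound, the following sketch enumerates candidate bases whose product fits within a given number of bits. It is a toy illustration under simplifying assumptions: the basis size is taken as known, the elements are restricted to small primes, and the function names and parameters are hypothetical.

```python
from itertools import combinations
from math import prod

def primes_below(n):
    """Sieve of Eratosthenes returning all primes smaller than n (n >= 2)."""
    sieve = [True] * n
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def candidate_bases(max_bits, size, prime_limit):
    """All sets of `size` distinct primes below prime_limit with product < 2**max_bits."""
    return [b for b in combinations(primes_below(prime_limit), size)
            if prod(b) < 2 ** max_bits]

candidates = candidate_bases(max_bits=16, size=3, prime_limit=50)
```

Every tuple in candidates is consistent with the same bit bound, so the bound alone does not identify the basis; with realistic element sizes the number of candidates is far too large to enumerate in this way.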
It will be understood that the MACs based on blind randomization and factorization described above may be implemented by software stored on a medium readable by a computer and executing as a set of instructions on a processor (including a microprocessor, microcontroller, or the like) when loaded into main memory in order to carry out a cryptographic system of secure communications in a computer network. As used herein, a medium readable by a computer includes any form of magnetic, optical, mechanical, laser, or other media readable by a computer, including floppy disks, hard disks, compact disks (CDs), digital versatile disk (DVD), laser disk, magnetic tape, paper tape, punch cards, flash memory, etc.
It is to be understood that the present invention is not limited to the embodiments described above, but encompasses any and all embodiments within the scope of the following claims.