The following relates to a hash function which provides the aggregation property and for which computation can be parallelized.
Hash functions can be used for a variety of purposes in the areas of authentication, encryption, document signing, and so on. For most such purposes, hash functions share a common baseline of desirable characteristics. These include preimage resistance, meaning that it is difficult to determine a document that hashes to a specified hash value, and second preimage resistance, meaning that it is difficult to find a second input that hashes to the same hash value as a specified input. A related characteristic is collision resistance, meaning that different documents should reliably produce different hash values.
Another, less common, property of hash functions used in the area of encryption and other related data security fields is aggregation. Most hash functions used in the field of data security require recomputation of the hash of an entire message when further data is appended to the message. Values produced in computing the hash of the original message generally are not reused in computing the hash of the appended message. A hash function that has the property of aggregation, however, can use values produced during computation of a hash value for an original message when computing the hash for the appended message.
Hash functions with these properties remain the subject of ongoing research, and further advancements in this area are useful.
The following relates to providing a hash function that can meet principal desired hash properties, including preimage, second preimage, and collision resistance, while also providing other properties that can be desirable in some applications including aggregation (e.g., providing a hash of a file comprising an original dataset and a dataset appended thereto) as well as parallelization. The aggregation property of the hash function can be useful in situations where portions of a given message are streamed over a period of time to a receiving entity, or where a file may be appended frequently, such as a log file.
In a first aspect, a method for hashing a message comprises accessing a first 2×2 matrix A and a second 2×2 matrix B. The matrices A and B are each selected from the Special Linear group SL2(R). R is a commutative field defined as R=F2[x]/(P), with F2[x] being the ring of polynomials having coefficients selected from the set F2={0,1}, and (P) being the ideal of F2[x] generated by an irreducible polynomial P of degree n.
The method further comprises accessing a message M as a sequence of bits. The method also comprises producing a multiplicative matrix sequence by substituting one of the B matrix and the A matrix either for binary 0 bits or for binary 1 bits in M, and substituting the other of the B matrix and the A matrix for the other of binary 0 bits and binary 1 bits. Alternatively expressed, A and B can be selected arbitrarily from suitable matrices, and one of A and B is substituted for binary 0 bits, and the other of A and B is substituted for the binary 1 bits in M.
The method further comprises computing the product of the matrices in the sequence to produce a 2×2 matrix (h). Each element of h has n bits. The method further comprises rearranging the bits of h into an l by l matrix Y, and computing g=PYQ−1. P and Q are each invertible l by l matrices with elements randomly chosen from F2. The calculated g value, or a value derived therefrom, is outputted as the hash for the message M.
As an example of the aggregation property provided by hash functions according to these disclosures, the method may further comprise producing a second matrix sequence for a message M2, that is to be concatenated with an M1, and computing the product of the matrices in the second matrix sequence to produce a 2×2 matrix (h2). Each element of h2 has n bits. The method also comprises multiplying an h1 produced for M1 and h2 to produce a 2×2 matrix h12; each element of h12 also has n bits. The method also comprises rearranging the bits of h12 into an l by l matrix Y12 and computing g′=PY12Q−1 as a hash for the concatenation of message M1 and M2. This example can be extended to a large number of message portions, such that hash computation can be distributed among a plurality of processing resources.
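The aggregation just described can be sketched in runnable form. The sketch below is illustrative only: it assumes toy parameters (n=9 with the irreducible polynomial P(x)=x^9+x+1, rather than the degree-256 example given later) and the classical Tillich-Zemor generator matrices for A and B; all function names are hypothetical.

```python
# Hedged sketch of the aggregation property: h2 is computed for M2 alone,
# then combined with a stored h1 by a single 2x2 matrix product, with no
# bits of M1 re-read. Toy parameters (assumed): n = 9, P(x) = x^9 + x + 1.

N, POLY = 9, (1 << 9) | 0b011        # degree n and bit-packed P(x)

def gf_mul(a, b):
    """Multiply two polynomials modulo P(x); bit i is the x^i coefficient."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> N:                   # degree reached N: reduce by P(x)
            a ^= POLY
        b >>= 1
    return r

def mat_mul(m, n):
    """2x2 matrix product over F2[x]/(P); a matrix is a tuple (a, b, c, d)."""
    a, b, c, d = m
    e, f, g, h = n
    return (gf_mul(a, e) ^ gf_mul(b, g), gf_mul(a, f) ^ gf_mul(b, h),
            gf_mul(c, e) ^ gf_mul(d, g), gf_mul(c, f) ^ gf_mul(d, h))

A = (0b10, 1, 1, 0)      # [[x, 1], [1, 0]], substituted for 1-bits
B = (0b10, 0b11, 1, 1)   # [[x, x+1], [1, 1]], substituted for 0-bits

def h_of(bits):
    """Multiplicative matrix sequence for a bit list, folded into one h."""
    h = (1, 0, 0, 1)                 # 2x2 identity
    for bit in bits:
        h = mat_mul(h, A if bit else B)
    return h

M1, M2 = [1, 0, 1, 1], [0, 0, 1, 0, 1]
h1, h2 = h_of(M1), h_of(M2)
h12 = mat_mul(h1, h2)                # aggregation step: h1 is reused as-is
assert h12 == h_of(M1 + M2)          # same h as hashing the concatenation
```

Because matrix multiplication is associative, the h for the concatenation of M1 and M2 equals h1·h2, which is what makes streaming and frequently appended files cheap to re-hash.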
The bit rearranging step of the method can be performed by an invertible defolder function. The defolder function in some cases receives a 2×2 matrix, with each element having n bits, and outputs the l by l matrix Y having a total of 4n bits. In other cases, the defolder function can compress or expand the number of bits either inputted or outputted.
An entity can recover h from g=PYQ−1 by using an inverse of F, an inverse of P, and Q. Q is invertible and computing its inverse, Q−1, is straightforward. There can be a unique P and Q pair pre-shared between each pair of entities seeking to exchange messages and hashes thereof. An entity seeking to validate a message using a g value calculated therefor can independently calculate a g value based on the data received for the message and compare the g values, or the entity can recover the h value and compare a computed h to the recovered h.
The following relates to providing a hash function that can meet principal desired hash properties, including preimage, second preimage, and collision resistance, and which also has the aggregation property (providing a hash of a file comprising an original dataset and a dataset appended thereto with reuse of at least some values computed during the computation of the hash for the original dataset), and/or parallelization (concurrently performing computations of a hash function for a message in multiple processing resources).
Many hash functions well-known in the area of data security and encryption, such as MD5 and SHA-1, operate on blocks of message data. For example, MD5 operates on 512-bit blocks and produces a 128-bit hash value. These hash functions do not provide the aggregation property. Larger data sets, such as audiovisual information and frequently appended data (e.g., streaming media, logging functions, and so on), can benefit from a hash having the aggregation property. A Tillich-Zemor hash function operates instead on a bitstream, and aspects concerning this hash function are described in more detail in an example depicted in
In the particular example of
For each bit of M, one of two 2×2 matrices, A and B, is substituted (mapping step 210) for that bit. For example, step 210 may comprise substituting the B matrix for each 0 bit (i.e., for each binary 0), and substituting the A matrix for each 1 bit (i.e., for each binary 1). Step 210 thus produces a sequence 255 of A and B matrices (which will be multiplied together in a later step).
The A and B matrices are chosen/formed as follows. P(x) is a polynomial having coefficients selected from F2={0,1} (i.e., all coefficients of P(x) are either 1 or 0). P(x) is irreducible. The symbol α denotes a root of P(x). In an example, P(x)=x^256+x^127+x^23+x^13+1. Each element of each A and B matrix can be represented as an n-bit sized buffer for the purposes of implementing the hash.
The matrices A and B each can be selected from a set of matrices comprising matrices that each has been created to have the properties described above. The set of matrices may comprise the matrices
Each of A and B would be selected from the set without replacement, such that A and B are different.
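As a concrete illustration of the mapping step 210 and the subsequent product, here is a hedged sketch with toy parameters (n=9 and P(x)=x^9+x+1, assumed for brevity; the text's example polynomial has degree 256) and the classical Tillich-Zemor generators, which satisfy the determinant-1 (SL2) condition; the helper names are hypothetical.

```python
# Hedged sketch of step 210: B for each 0-bit, A for each 1-bit, then
# multiply the sequence. Assumed toy parameters: n = 9, P(x) = x^9 + x + 1;
# A = [[x, 1], [1, 0]] and B = [[x, x+1], [1, 1]].

N, POLY = 9, (1 << 9) | 0b011

def gf_mul(a, b):
    """Multiply in F2[x]/(P); polynomials are bit-packed integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> N:
            a ^= POLY
        b >>= 1
    return r

def mat_mul(m, n):
    """2x2 matrix product over F2[x]/(P)."""
    a, b, c, d = m
    e, f, g, h = n
    return (gf_mul(a, e) ^ gf_mul(b, g), gf_mul(a, f) ^ gf_mul(b, h),
            gf_mul(c, e) ^ gf_mul(d, g), gf_mul(c, f) ^ gf_mul(d, h))

def det(m):
    """Determinant over F2[x]/(P); in characteristic 2, ad - bc = ad + bc."""
    return gf_mul(m[0], m[3]) ^ gf_mul(m[1], m[2])

A = (0b10, 1, 1, 0)
B = (0b10, 0b11, 1, 1)
assert det(A) == 1 and det(B) == 1   # both generators lie in SL2(R)

def sequence_product(bits):
    """Step 210 plus the product: substitute A/B per bit and multiply."""
    h = (1, 0, 0, 1)                 # 2x2 identity
    for bit in bits:
        h = mat_mul(h, A if bit else B)
    return h

h = sequence_product([1, 0, 1, 1, 0, 1, 0, 0])
assert det(h) == 1                   # products of SL2 matrices stay in SL2
```

The final assertion reflects why the hash value is itself a member of SL2(R): the determinant is multiplicative, so the product of determinant-1 matrices has determinant 1.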
Returning to
In the example shown in
where l has the meaning/usage described below. The degree of P(x) can be 256.
In step 225, a de-folder function F( ) is defined to input h and output a square l×l matrix, Y. Thus, by requiring that l^2=4n, the produced h value, which has 4n bits, can be used to generate the desired l×l matrix output. F( ) is invertible. A simple F( ) can comprise parsing h into l-bit segments, and arranging them into the rows of the l×l matrix. F( ) can implement more complicated rearrangements of the bits of h, as desired. For example, F( ) can shuffle or otherwise transpose bits of h into an order different from an implied order of the bits in h. In an implementation, n can be 256, so that l=32.
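A minimal invertible de-folder can be sketched as follows, assuming the toy sizes n=9 and l=6 (so that l·l=4n=36); the function names are hypothetical. The 4n bits of h are parsed into l-bit segments that become the rows of Y, and the inverse direction is included because F( ) must be invertible.

```python
# Hedged sketch of a simple de-folder F and its inverse, with toy sizes
# n = 9, l = 6 (l * l == 4 * n). h is a 4-tuple of n-bit integers; Y is an
# l x l matrix of bits, built row by row from l-bit segments of h.

N, L = 9, 6

def defold(h):
    """F: concatenate the four N-bit elements of h, then split into L rows."""
    stream = 0
    for elem in h:
        stream = (stream << N) | elem          # 4N = L*L bits in total
    bits = [(stream >> (L * L - 1 - i)) & 1 for i in range(L * L)]
    return [bits[r * L:(r + 1) * L] for r in range(L)]

def fold(Y):
    """Inverse of F: rebuild the 4-tuple h from the bit matrix Y."""
    stream = 0
    for row in Y:
        for bit in row:
            stream = (stream << 1) | bit
    mask = (1 << N) - 1
    return tuple((stream >> (N * (3 - i))) & mask for i in range(4))

h = (0b101010101, 0b111000111, 0b000111000, 0b110011001)
Y = defold(h)
assert len(Y) == L and all(len(row) == L for row in Y)
assert fold(Y) == h                            # F is invertible
```

A more elaborate F( ) could permute the bit positions before filling Y; invertibility is preserved so long as the permutation is invertible.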
As will be made evident below, F( ) also can be selected to be multiplicative, such that F(h1*h2)=F(h1)F(h2), where h1 is a first hash from a first message (or message portion), M1, and h2 is a second hash from a second message (or message portion), M2. F( ) can be selected to avoid either compressing or expanding the inputted bits. Alternatively, some implementations may benefit from expanding or compressing the inputted bits.
Returning now to
Both P and Q are to be invertible, with their elements randomly chosen from F2.
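One way to realize this step is rejection sampling: draw a random bit matrix and keep it only if Gauss-Jordan elimination over F2 succeeds, which simultaneously yields the inverse. The sketch below is illustrative (toy size l=6, fixed seed; the helper names are hypothetical, not from the source).

```python
# Hedged sketch: random invertible P and Q over F2, their inverses, and the
# conjugation g = P * Y * Q^-1 (toy size l = 6; Y stands in for F(h)).

import random

L = 6

def gf2_matmul(X, Y):
    """l x l matrix product with entries in F2."""
    return [[sum(X[i][k] & Y[k][j] for k in range(L)) & 1
             for j in range(L)] for i in range(L)]

def gf2_inverse(M):
    """Invert over F2 by Gauss-Jordan on [M | I]; returns None if singular."""
    A = [row[:] + [1 if i == j else 0 for j in range(L)]
         for i, row in enumerate(M)]
    for col in range(L):
        pivot = next((r for r in range(col, L) if A[r][col]), None)
        if pivot is None:
            return None                        # no pivot: M is singular
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(L):
            if r != col and A[r][col]:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
    return [row[L:] for row in A]

def random_invertible(rng):
    """Rejection-sample until an invertible candidate is drawn."""
    while True:
        M = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]
        if gf2_inverse(M) is not None:
            return M

rng = random.Random(2024)                      # fixed seed: illustration only
P, Q = random_invertible(rng), random_invertible(rng)
Q_inv = gf2_inverse(Q)

Y = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]
g = gf2_matmul(gf2_matmul(P, Y), Q_inv)        # g = P Y Q^-1
# an entity holding P and Q can undo the conjugation: P^-1 g Q == Y
assert gf2_matmul(gf2_matmul(gf2_inverse(P), g), Q) == Y
```

Over F2, roughly 29% of random square matrices are invertible, so the rejection loop terminates after a few draws on average.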
Returning to
E2 receives data intended to comprise the message (the data received is identified as ME2, allowing that it may be different from M). E2 can verify that the message was received properly, including that it was not corrupted or intentionally altered during transmission, by separately hashing gE2=G(ME2)=PF(H(ME2))Q−1 and checking whether gE2=g. If they match, then it can be decided that M=ME2. In this usage model, E1 and E2 would pre-share the P and Q used (their inverses also can be pre-shared or calculated, given P and Q). A further example is that E2 could compute H(M) from g (because P, Q, and F( ) are invertible), separately compute H(ME2), and determine whether H(M) and H(ME2) match.
The disclosed hash function can be used in a variety of situations where it is desirable to provide hash aggregates.
In a first example, E1 desires to produce hashes for two messages, M1 and M2, and provide the hashes to E2 with the messages. E1 can separately compute g1=PF(H(M1))Q−1 and g2=PF(H(M2))Q−1 according to method 200, above.
E1 can transmit both M1 and M2, and both g1 and g2 to E2. The data received at E2 is identified respectively as M1-E2, M2-E2, g1-E2 and g2-E2, allowing that any of the data transmitted could have been tampered with or corrupted. M1 and g1 may be calculated and/or transmitted at a different time than M2 and g2. In some cases, h1 or Y1=F(h1) can be calculated prior to determining that E2 is to receive M1. After it is determined that E2 is to receive message M1, the P and Q pre-shared between E1 and E2 may be accessed, and g1 may be created.
E2 may need to be able to verify M1-E2 when it is received, M2-E2 when it is received, as well as verifying the concatenation of M1 and M2 when both have been received. It also would be desired to use the computations performed to verify M1 to verify the hash for the entirety of M1-E2 and M2-E2.
In verifying M1-E2 with g1-E2, E2 can compute F(h1-E2) (i.e., performing hash H( ) on received M1-E2), perform the pre- and post-multiplication respectively with P and Q−1 (i.e., computing PF(H(M1-E2))Q−1), and compare g1-E2 with the computed PF(H(M1-E2))Q−1.
To verify the concatenation of M1-E2 and M2-E2, E2 can compute F(h2-E2) (i.e., performing hash H( ) on received M2-E2). E2 also computes g1QP−1g2 and determines whether it equals PF(h1-E2)F(h2-E2)Q−1 (where F(h1-E2) already was computed in verifying M1-E2). Thus, E2 avoids repeating the computations required to produce h1.
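The cancellation this check relies on can be verified numerically. In the hedged sketch below (toy size l=6, arbitrary seed, hypothetical helper names), Y1 and Y2 stand in for F(h1) and F(h2); since g1=PY1Q−1 and g2=PY2Q−1, the inner Q−1Q and P−1P factors in g1QP−1g2 cancel, leaving PY1Y2Q−1.

```python
# Numerical check of g1 * Q * P^-1 * g2 == P * F(h1) * F(h2) * Q^-1 over F2.

import random

L = 6

def gf2_matmul(X, Y):
    """l x l matrix product with entries in F2."""
    return [[sum(X[i][k] & Y[k][j] for k in range(L)) & 1
             for j in range(L)] for i in range(L)]

def gf2_inverse(M):
    """Invert over F2 by Gauss-Jordan on [M | I]; returns None if singular."""
    A = [row[:] + [1 if i == j else 0 for j in range(L)]
         for i, row in enumerate(M)]
    for col in range(L):
        pivot = next((r for r in range(col, L) if A[r][col]), None)
        if pivot is None:
            return None
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(L):
            if r != col and A[r][col]:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
    return [row[L:] for row in A]

def random_invertible(rng):
    while True:
        M = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]
        if gf2_inverse(M) is not None:
            return M

rng = random.Random(7)
P, Q = random_invertible(rng), random_invertible(rng)
P_inv, Q_inv = gf2_inverse(P), gf2_inverse(Q)
Y1 = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]   # F(h1)
Y2 = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]   # F(h2)

g1 = gf2_matmul(gf2_matmul(P, Y1), Q_inv)
g2 = gf2_matmul(gf2_matmul(P, Y2), Q_inv)

lhs = gf2_matmul(gf2_matmul(g1, gf2_matmul(Q, P_inv)), g2)
rhs = gf2_matmul(gf2_matmul(P, gf2_matmul(Y1, Y2)), Q_inv)
assert lhs == rhs        # E2 verifies the concatenation without recomputing h1
```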
Of course, in some implementations, a message may be segmented into many sub-parts, or a message may be streamed over a period of time, such that practical implementations would compute a large number of hashes over the course of a large data transfer, such as a video. Especially for computation constrained or power consumption constrained devices, avoidance of unnecessary computation is valuable, as it can allow using less powerful and often cheaper parts, provide longer battery life, and so on.
Another usage involves verifying an origin of an aggregation of message components, an example of which is presented below. In such a usage, E1 computes g3=PF(h1)F(h2)F(h1*h2)Q−1. In this usage, the operator * can provide for h1*h2 to be a 2×2 matrix, each element of such having n bits, so that F( ) can be applied to it. As such, h1*h2 can be implemented as a matrix multiplication, for example. In another example, h1*h2 can be implemented as a concatenation, h1∥h2, and F( ) also can compress the bits of the concatenation to 4n bits while performing the function of producing the matrix Y.
E1 also computes g4=PF(h1∥h2)Q−1. E1 sends g1, g2, g3, and g4 to E2. E2 determines whether g3-E2=g1-E2QP−1g2-E2QP−1g4-E2 (all subscripts identifying “as received” values) to verify the source of the aggregation.
It was introduced with respect to
In a further implementation, a number of pre-computed multiplicative combinations of the A and B matrices can be provided. For example, W={A, A2, AB, B2, A2B, B2A, B2A2 . . . AcBd} can be provided, where c and d can be any integers greater than or equal to 1, and each element in W is a 2×2 matrix. Having a wider variety of potential mappings between pre-multiplied combinations of the A and B matrices can increase speed and efficiency. For example, when parsing a message in order, there can be a number of substitutions of elements from W that are valid, and among these valid selections, a choice can be made. The choice can be made to increase speed. In some implementations, the choice can be made to provide random or pseudorandom selections of these pre-computed matrices.
Method 300 includes picking (step 305) an A matrix and a B matrix from an available set of matrices. A set of multiplicative combinations of the selected A and B matrices (e.g., W) is produced (step 310).
Method 300 includes determining (step 315) respective P and Q combinations for each entity of E2 through En to be communicating with E1. In other words, a distinct P and Q pair is generated for communication between E1 and E2, and so on. Optionally, inverses for each P and Q generated can also be determined in step 315. Step 320 comprises sharing these P and Q combinations, and optionally their calculated inverses, with the respective entities. Step 325 comprises receiving, at each respective entity, its P and Q combination. If inverses for P and Q were not transmitted, then each entity can calculate (step 330) those inverses.
A method 400 can be performed each time a message M is to be sent from one entity to another (e.g., from E1 to E2). Method 400 comprises receiving (step 415) or otherwise accessing the message M. Method 400 also includes segmenting (step 420) M into a plurality of blocks m1-mb. These blocks are distributed (step 425) among a plurality of processing resources (e.g., selections of one or more of different FPGAs, processing cores, threads, servers, and the like). Each such processing resource also has access to the selected matrices A and B, or receives those as well. Each block m1-mb is parsed as a sequence of bits into respective multiplicative sequences of matrices (that can be selected from the matrices calculated in step 310 of method 300, above).
A method for selecting each matrix to substitute for one or more bits of each message block m1-mb may comprise maintaining a pointer to a current position in the string of bits comprising a given block being processed, and identifying several potential matrices that can be next selected. Each potential selection may represent a differing number of bits; for example A2 represents 2 bits, while A4 represents 4 bits, even though both are still 2×2 matrices. Then, a selection of one of these potential matrices is made, and that selection is added to the matrix sequence. The pointer is moved an appropriate number of bits, and the process is repeated. In some cases, the matrix can be selected based on which matrix represents a largest number of bits, e.g., BA2 would be selected preferentially to BA if the bitstring at the current pointer location included “011 . . . .”
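The pointer-and-longest-match selection described above can be sketched with a small, hypothetical table W keyed by the bit patterns the pre-computed products stand for (1 maps to A and 0 to B, so the pattern "011" corresponds to the product BA2); both the table contents and the function name are illustrative assumptions.

```python
# Hedged sketch of greedy longest-match substitution of pre-computed
# matrix products for runs of message bits. Keys are bit patterns; values
# name the pre-computed 2x2 matrix product they stand for.
W = {"1": "A", "0": "B", "11": "A2", "00": "B2",
     "011": "BA2", "100": "AB2", "0011": "B2A2"}

def parse(bits: str):
    """Maintain a pointer and always take the widest pattern matching there."""
    longest = max(len(k) for k in W)
    pointer, sequence = 0, []
    while pointer < len(bits):
        for width in range(longest, 0, -1):        # prefer the widest match
            chunk = bits[pointer:pointer + width]
            if chunk in W:
                sequence.append(W[chunk])
                pointer += width                   # advance past consumed bits
                break
    return sequence

seq = parse("0111001")
assert seq == ["BA2", "AB2", "A"]    # "011" -> BA2, "100" -> AB2, "1" -> A
```

Because every single-bit pattern is in W, the parse always advances; and joining the consumed patterns back together recovers the original bitstring, so the substituted product equals the plain bit-by-bit product.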
Returning now to
Method 400 then comprises determining (step 445) which g values are to be calculated in this instance of method 400. For example, all of E2 through En (
Also, where it is desired to stream portions of the message, it may be desired to compute several intermediate g values for portions of M. For example, a g value may be calculated at 1024-block intervals, e.g., a g can be calculated for m1 . . . m1024 for each of entities E2 through En. In producing the g value for m1 . . . m1024, an h value can be calculated concurrently for each of m1 . . . m1024 in different computing resources, and these h values can be used in arriving at g1-1024=PF(h1h2 . . . h1024)Q−1.
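The concurrent computation of per-block h values might look as follows. This is a hedged sketch: toy parameters (n=9, P(x)=x^9+x+1) and the classical Tillich-Zemor generators are assumed, a thread pool stands in for the plurality of processing resources, and the names are hypothetical.

```python
# Hedged sketch of distributing block hashes across processing resources:
# each block's h is computed independently, then the partial results are
# multiplied in block order (order matters; the product is associative
# but not commutative). Toy parameters assumed: n = 9, P(x) = x^9 + x + 1.

from concurrent.futures import ThreadPoolExecutor

N, POLY = 9, (1 << 9) | 0b011

def gf_mul(a, b):
    """Multiply two polynomials modulo P(x)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> N:
            a ^= POLY
        b >>= 1
    return r

def mat_mul(m, n):
    """2x2 matrix product over F2[x]/(P)."""
    a, b, c, d = m
    e, f, g, h = n
    return (gf_mul(a, e) ^ gf_mul(b, g), gf_mul(a, f) ^ gf_mul(b, h),
            gf_mul(c, e) ^ gf_mul(d, g), gf_mul(c, f) ^ gf_mul(d, h))

A, B = (0b10, 1, 1, 0), (0b10, 0b11, 1, 1)

def block_hash(bits):
    """h for one block: substitute A/B per bit and multiply left to right."""
    h = (1, 0, 0, 1)
    for bit in bits:
        h = mat_mul(h, A if bit else B)
    return h

message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
blocks = [message[i:i + 4] for i in range(0, len(message), 4)]   # m1..mb

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(block_hash, blocks))   # one h per block

h_total = (1, 0, 0, 1)
for h in partials:                    # combine partial h values in order
    h_total = mat_mul(h_total, h)

assert h_total == block_hash(message)  # same h as hashing the whole message
```

`Executor.map` returns results in submission order, so the combining loop preserves block order even though the per-block computations finish asynchronously.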
Also, it may be desired to provide hash origin verifiability, and so respective g3 and g4 hashes also can be calculated for each of E2 through En. Thus, method 400 illustrates the broad usage of the hash function aspects disclosed above, including using respective unique P and Q pairs between each pair of communicating entities, allowing hash aggregation, parallelization of computation, and origin verification.
Similarly, step 450 comprises accessing the P and Q combinations (pre-shared between E1 and one of E2 through En) which are to be used in producing the g values that were determined in step 445. Step 455 comprises calculating the g values, and step 460 comprises sending the g values to their respective entities. The sending step 460 also may comprise identifying the g values, where multiple g values are included, so that they may be distinguished from each other. Such identification may be implicit in an ordering of a transmission. Also, an entity generating the hash values may provide information about what matrices were used to produce the hash values; for example, information to select the matrices from a known set of matrices can be provided.
Method 500 illustrates exemplary steps that can be performed by each of E2 through En when receiving message(s) and hash value(s) to be verified from E1, and which may have been formed according to the steps of method 400, above.
Method 500 comprises receiving (step 505) one or more calculated g values, and as necessary, determining what the g values represent (e.g., a hash value for a single block, for a concatenation of blocks, for verifying origin, and so on). Method 500 further comprises receiving (step 510) data representative of a message or blocks (when aggregating) of a message. Step 515 comprises accessing the P and Q for the entity performing method 500. Method 500 also comprises accessing (step 520) the selected A and B matrices that were used in producing the g value(s) provided. Such matrices can be the same for all g values or can vary.
Method 500 also comprises performing (step 525) calculations for verifying the message or message blocks for which g values were provided. These calculations can vary depending on which g values were provided, and for which blocks. For example, if a single message hash value g is to be verified, then method 500 may comprise determining a corresponding g for the message as received and comparing (step 530) them. Other calculations that can be performed are described above, and include calculations relating to hash aggregates, and origin verification.
If values calculated (step 525) can be successfully compared (step 530), then a positive determination (step 535) that the received message accurately represents the transmitted message can be made. Responsive to that determination (step 535), the message and/or components thereof can be validated (step 545). If there is a failure to obtain a successful comparison in step 530, then the message or message components can be rejected as potentially invalid (step 540).
Method 500 can also comprise looping to step 505 where more g values can be received, which can relate to further received data (step 510). The looping thus illustrates that the reception of data to be validated can occur over a period of time, as would often be the case when using hash aggregation.
The exemplary ordering of steps in any flow chart depicted does not imply that steps necessarily be completed in that order. Any of intermediate values, or final values, or other data described as being generated or produced can be saved to a computer readable medium, either temporarily, such as in a RAM, or more permanently, such as in non-volatile storage.
Chipset 622 also can interface with one or more data network interfaces 625, that can have different physical interfaces 617. Such data network interfaces can include interfaces for wired and wireless local area networks, for broadband wireless networks, as well as personal area networks. Some applications of the hashing disclosures herein can include receiving data from system 675 through physical interface 617 through data network 625, and in preparation to store the data in non-volatile storage 660, system 600 can calculate a hash value for the data, and use that hash value as an identifier for a file containing the received data. Further examples relating to file naming can include generating data through processing performed on processor 620, and which is to be stored in RAM 670 and/or non-volatile storage 660.
Methods according to the above-described examples can be implemented using computer-executable instructions that are stored or otherwise available from computer readable media. Such instructions comprise, for example, instructions and data which cause or otherwise configure a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Portions of computer resources used can be accessible over a network. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, firmware, or source code. Examples of computer-readable media that may be used to store instructions, information used, and/or information created during methods according to described examples include magnetic or optical disks, flash memory, USB devices provided with non-volatile memory, networked storage devices, and so on.
Devices implementing methods according to these disclosures can comprise hardware, firmware and/or software, and can take any of a variety of form factors. Typical examples of such form factors include laptops, smart phones, small form factor personal computers, personal digital assistants, and so on. Functionality described herein also can be embodied in peripherals or add-in cards. Such functionality also can be implemented on a circuit board among different chips or different processes executing in a single device, by way of further example.
The instructions, media for conveying such instructions, computing resources for executing them, and other structures for supporting such computing resources are means for providing the functions described in these disclosures.
Although a variety of examples and other information was used to explain aspects within the scope of the appended claims, no limitation of the claims should be implied based on particular features or arrangements in such examples, as one of ordinary skill would be able to use these examples to derive a wide variety of implementations. Further and although some subject matter may have been described in language specific to examples of structural features and/or method steps, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to these described features or acts. For example, such functionality can be distributed differently or performed in components other than those identified herein. Rather, the described features and steps are disclosed as examples of components of systems and methods within the scope of the appended claims.