Not applicable.
Not applicable.
A hash function is implemented to convert non-random (or not-so-random) values into uniformly distributed random numbers. The hash function is an important function in networking. Hash-based algorithms are increasingly proposed and deployed in networks, e.g., in relatively more critical and high speed components or devices. Some hash functions are implemented using software, such as Bob Jenkins' hash and MurmurHash. Other hash functions are implemented using hardware, such as cyclic redundancy check (CRC), H3 (with fixed seed), and Pearson and Buzhash. Networking devices are increasingly dependent on probabilistic algorithms or data structures for performance. The algorithms or data structures can encounter pathological cases that can be problematic and unacceptably slow down network components or devices, e.g., routers. The problematic cases can sometimes cause network failure, e.g., if triggered on multiple routers. The algorithms and data structures use hash functions to convert or reduce relatively sparse input sets into more dense and more manageable sets that can be better stored or handled in the networks. The hash functions are used to avoid at least a substantial amount of pathological cases that lead to network failure or reduced performance.
In one embodiment, the disclosure includes an apparatus comprising a plurality of stages that are coupled in series and configured to implement a hash function, wherein the stages comprise a plurality of XOR arrays and one or more Substitution-Boxes (S-Boxes) that comprise a plurality of parallel gates.
In another embodiment, the disclosure includes an apparatus comprising a plurality of XOR gates that are coupled in parallel, a plurality of input bits coupled to the XOR gates, and a plurality of output bits coupled to the XOR gates, wherein the XOR gates are configured to implement a linear mixing function of the input bits into the output bits as a stage of a non-cryptographic hash function.
In another embodiment, the disclosure includes an apparatus comprising a plurality of S-Boxes that are arranged in parallel, a plurality of input bits coupled to the S-Boxes, and a plurality of output bits coupled to the S-Boxes, wherein the S-Boxes are configured to implement a permutation and non-linear mixing function of the input bits into the output bits as a stage of a non-cryptographic hash function.
In yet another embodiment, the disclosure includes a method implemented by an apparatus comprising mixing a plurality of input bits to provide a plurality of output bits using a plurality of XOR arrays that are coupled in series in a non-cryptographic hash function architecture, and providing permutation of a plurality of input bits into a plurality of output bits using a plurality of S-Box arrays that are coupled in series with the XOR arrays in a non-cryptographic hash function architecture.
These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
It should be understood at the outset that although an illustrative implementation of one or more embodiments is provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
Currently available hash functions, e.g., that are used in networking, may not deliver sufficient randomness, may not be suitable for sufficiently low cost implementation, or both. Disclosed herein are systems and methods to provide an improved non-cryptographic hash function, which may be a general-purpose hash function and may use limited on-chip resources. The improved hash function may be based on cascading stages or blocks of XOR arrays and/or S-Box arrays to deliver improved randomness. The hash function architecture may comprise a series of stages, each of which may comprise an XOR array and/or an S-Box array, as described in detail below. The improved hash function may deliver improved randomness, e.g., closer to uniform distribution, compared to currently available hash functions, and hence may provide better network performance. The improved hash function may also provide lower cost implementation than currently available hash functions.
The hash function architecture 100 may also have a limited number of stages 110. For example, the hash function architecture 100 may have about 12 stages 110 that may be coupled in series. In other embodiments, the number of stages may range between about nine stages and about 12 stages. The limited number of stages 110 may allow for feasible hardware implementation, such as using application-specific integrated circuits (ASICs). The limited number of stages may also limit the total process time or delay of the series of stages 110, where each stage 110 may introduce a 1-cycle delay. The wire or link (connection) delay between the stages 110 may be substantially small or negligible with respect to the 1-cycle delay of the stage 110. Thus, in the case of 12 stages, the total delay may be limited to about 12 times the 1-cycle delay.
Each of the stages 110 may comprise about one gate, which may be a linear XOR array or a non-linear S-Box array, as described below. The hash function architecture 100 may not comprise feedback in any of the stages 110. Such features may simplify the design of the hash function architecture 100. For instance, the stages 110 may comprise a determined combination of XOR and S-Box arrays in series. Each stage 110 may process a number of input bits and provide a corresponding number of output bits. The input bits to each stage 110, except the first stage 110 in the series, may be permutations of the output bits of a previous stage 110 in the series. As such, the output bits of each stage 110, except the last stage 110 in the series, may be permuted (e.g., redistributed or remixed) and then provided as input bits to the next stage 110.
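As a minimal sketch of the feed-forward structure described above, the series of stages with a permutation between consecutive stages might be modeled in Python as follows. The 6-bit state width, the particular XOR stage, the 3→3 substitution table, and the wiring permutations are all hypothetical choices made for illustration and are not taken from the disclosure.

```python
from typing import Callable, List, Sequence

WIDTH = 6                              # toy state width (assumption)
MASK = (1 << WIDTH) - 1
TOY_SBOX = [0, 3, 6, 1, 5, 4, 2, 7]    # hypothetical 3->3 permutation table


def xor_stage(x: int) -> int:
    """Linear stage: output bit i = x[i] XOR x[i-1]; bit 0 passes through."""
    return x ^ ((x << 1) & MASK)


def sbox_stage(x: int) -> int:
    """Non-linear stage: apply the 3-bit table to each 3-bit group in parallel."""
    out = 0
    for shift in range(0, WIDTH, 3):
        out |= TOY_SBOX[(x >> shift) & 0b111] << shift
    return out


def permute_bits(x: int, perm: Sequence[int]) -> int:
    """Route bit i of x to position perm[i]; in hardware this is only wiring."""
    out = 0
    for src, dst in enumerate(perm):
        out |= ((x >> src) & 1) << dst
    return out


def run_stages(x: int, stages: List[Callable[[int], int]],
               perms: List[Sequence[int]]) -> int:
    """Feed x through the stages in series, with no feedback path."""
    for idx, stage in enumerate(stages):
        x = stage(x)
        if idx < len(stages) - 1:      # permute between stages only
            x = permute_bits(x, perms[idx])
    return x


# Hypothetical stage order and inter-stage wiring.
STAGES = [xor_stage, sbox_stage, xor_stage, sbox_stage]
PERMS = [[3, 0, 4, 1, 5, 2], [1, 2, 3, 4, 5, 0], [5, 3, 1, 0, 2, 4]]
```

Because every stage and every permutation in this sketch is invertible, the 64 possible inputs may be expected to map to 64 distinct outputs.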
The design methodology of the hash function architecture 100 may be based on multiple guidelines. One guideline is proper mixing of the input signal or bits to the first and remaining stages 110. Accordingly, a proper amount of entropy among the input bits may be provided, which may or may not be distributed evenly among the input bits, e.g., per stage 110 and/or between stages 110. For example, a Media Access Control (MAC) address may be about 48 bits, where the first or top about 24 bits may be used to indicate the manufacturer of the device. Such portion or address space of the MAC address may be sparsely populated and companies may standardize on a relatively small number of network device manufacturers. Thus, the top bits (e.g., 24 bits) may be more predictable than the remaining or low order bits, which may be considered when determining the mixing for all the bits. A suitable hash function may be configured to properly and efficiently mix the input bits (at each stage 110), and thus provide an improved (substantially random) final output (from the last stage 110). An efficient or improved hash function may establish substantial mixing of the input bits using the available (hardware) resources, e.g., as much as possible.
Another guideline for providing a proper hash function is using invertible mapping. Invertible mapping may comprise about the same number of input bits and output bits. Using invertible mappings may allow for more time to mix the entropy of the input bits among the output bits. For instance, if collisions are created early in the hash function, entropy may be lost without having a chance to mix that entropy into other bits. Using an invertible mapping at each step may guarantee reduced or no loss of entropy, and thus reduced or no algorithm-induced collisions. This may improve chances that uniform input is mapped to uniform output, as every part of the output space may be used.
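The effect of invertibility on collisions may be illustrated with a small sketch. The two 8-bit steps below are hypothetical examples, not steps of the disclosed architecture: one XORs neighboring bits and can be undone, the other ANDs neighboring bits and cannot.

```python
from typing import Callable

WIDTH = 8                      # toy width (assumption)
MASK = (1 << WIDTH) - 1


def invertible_step(x: int) -> int:
    """XOR each bit with its lower neighbor; this mapping can be undone."""
    return x ^ ((x << 1) & MASK)


def lossy_step(x: int) -> int:
    """AND each bit with its upper neighbor; distinct inputs may collide."""
    return x & (x >> 1)


def distinct_outputs(step: Callable[[int], int]) -> int:
    """Count how many different outputs the step produces over all inputs."""
    return len({step(v) for v in range(1 << WIDTH)})
```

In this sketch, `distinct_outputs(invertible_step)` covers the whole 8-bit output space, whereas `distinct_outputs(lossy_step)` does not, which is the early loss of entropy that the guideline is intended to avoid.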
Typically, hardware-based hash functions may have direct control of values on the bit level and may have access to simpler building blocks that may be arranged in parallel, e.g., in comparison to software hash functions. In hardware, bits may correspond to wires, and thus shuffling the bits of a value in a fixed pattern may be achieved by routing the wires representing that value to different locations. Relatively complex operations, such as integer multiplication and addition, may be too costly to include in a hardware hash function. Bit-wise operations, such as XOR, may be organized properly to mix the bits, where operations may be performed in parallel. Hardware hash performance may be measured in area (related to the number of gates and wires) and timing, which may depend on the wire length and the number of gates, e.g., on the longest path (the complete number of stages 110) from an input bit to an output bit.
In the hash function architecture 100, the stages 110 may be arranged in series to implement alternating bit mixing and permutation sequences. This design may be similar to a cryptographic substitution-permutation-network without key bits being merged at each round. Each component may be designed to be invertible, e.g., to avoid bias and losing input entropy. Additionally, the ratio of benefit to cost may be improved or maximized in each round (stage 110). A sufficient number of rounds (stages 110) may be used to achieve sufficient or substantial bit mixing.
Building a substantially large mixing function may be achieved by placing gates in random looking patterns. However, such arrangement may include non-invertible components. Instead, linear functions (XOR arrays) may be used, which may be easily invertible. Linear functions may also provide relatively good mixing, although such functions may not cause any avalanche. The avalanche property may be achieved when each output bit is the non-linear mixing of every input bit. Further, building a substantially large single stage to mix all the input bits (e.g., about 128 input bits) simultaneously may have substantially high cost. Instead, since a single mixing function's size may be at least cubic in the number of bits to be mixed, a relatively low cost round (stage 110) may be used to mix bits in smaller batches (fewer than the total number of input bits). In exchange, multiple rounds may be needed for thorough mixing of all the input bits (e.g., 128 bits) into all the output bits (e.g., 64 bits).
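The lack of avalanche in a purely linear function may be seen in a short sketch. Assuming a hypothetical 8-bit XOR array in which each output bit is the XOR of an input bit and its lower neighbor, flipping a given input bit always flips the same fixed set of output bits, regardless of the values of the other input bits.

```python
WIDTH = 8                      # toy width (assumption)
MASK = (1 << WIDTH) - 1


def xor_array(x: int) -> int:
    """Hypothetical linear stage: output bit i = x[i] XOR x[i-1]."""
    return x ^ ((x << 1) & MASK)


def flip_patterns(bit: int) -> set:
    """Collect every output difference seen when input `bit` is flipped."""
    return {xor_array(x) ^ xor_array(x ^ (1 << bit))
            for x in range(1 << WIDTH)}
```

Here `flip_patterns(3)` contains the single value `0b00011000`: only output bits 3 and 4 ever change, so the remaining output bits carry no information about input bit 3.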
Using multiple rounds or stages 110 and mixing bits in small clusters (of rounds), instead of using one substantially large mixing stage, may achieve relatively good avalanche properties with substantially lower costs. As such, the bits may be permuted in between rounds, e.g., so that bits may be mixed with different bits (at different rounds). Using some non-linear mixing (S-Box arrays) at a substantially small cost, operations may be repeated over many rounds to achieve a substantially complete avalanche at relatively low cost. Each mixing round or stage 110 may correspond to a linear XOR array or a non-linear S-Box array. Permutation rounds may be achieved using efficient hardware implementation of substantially large bit permutation functions by distributing and arranging wires appropriately between the stages 110. Efficient implementation may be evaluated using two metrics, cost (measured in area and delay) and diffusion (the spreading of input entropy to multiple bits). More details about implementing the features of the hash function architecture 100 are described below.
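One way the diffusion metric may be estimated for a small candidate design is sketched below. The function name `avalanche_fraction` and the exhaustive enumeration are assumptions made for illustration; exhaustive enumeration is only feasible for toy widths, and sampling would be needed for widths such as 128 bits.

```python
from typing import Callable


def avalanche_fraction(func: Callable[[int], int], width: int) -> float:
    """Average fraction of output bits that change when one input bit flips."""
    flipped = 0
    for x in range(1 << width):
        y = func(x)
        for bit in range(width):
            flipped += bin(y ^ func(x ^ (1 << bit))).count("1")
    # Normalize by (number of inputs) * (input bits flipped) * (output bits).
    return flipped / ((1 << width) * width * width)


def identity(x: int) -> int:
    """No mixing at all: each input bit reaches exactly one output bit."""
    return x


def single_xor_stage(x: int) -> int:
    """Hypothetical 8-bit linear stage: output bit i = x[i] XOR x[i-1]."""
    return x ^ ((x << 1) & 0xFF)
```

A value near 0.5 would indicate a substantially complete avalanche. In this sketch the identity function yields 0.125 and the single XOR stage yields about 0.234 for 8 bits, which suggests why several mixing and permutation rounds may be needed.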
The XOR array may implement a substantially sparse invertible matrix multiplier for an input matrix (X) to obtain an output matrix (Y). The input matrix corresponds to the input bits and the output matrix corresponds to the output bits. The equivalent matrix representation of the XOR array 200 operation is also shown in
The XOR array 200 may not have any avalanche property, but may have substantially low cost and may mix bits efficiently (for that cost). Using more 3-input XOR gates may allow for a denser matrix, but the gate size may double (reach about 2× or twice the size) and the gate delay may also increase by about 60 percent. The complexity of routing may also increase since more non-adjacent bits may be needed. A similar cost and perhaps better mixing may be achieved from using two smaller stages of 2-input XOR gates instead of a 3-input XOR gate.
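The view of an XOR array as a sparse matrix multiplier over GF(2) may be sketched as follows. The matrix used here (each output is the XOR of two adjacent inputs) is a hypothetical example, since the matrix of the XOR array 200 is not reproduced in this text. Each row is stored as a bitmask, and invertibility is checked by Gaussian elimination over GF(2).

```python
from typing import List


def adjacent_xor_rows(width: int, wrap: bool = False) -> List[int]:
    """Row i is a bitmask of the inputs that are XORed into output bit i.

    Output i = x[i] XOR x[i-1]; output 0 = x[0], or x[0] XOR x[width-1]
    when wrap is True.
    """
    first = (1 | (1 << (width - 1))) if wrap else 1
    return [first] + [(1 << i) | (1 << (i - 1)) for i in range(1, width)]


def gf2_matvec(rows: List[int], x: int) -> int:
    """Compute Y = M * X over GF(2); output bit i is the parity of rows[i] & x."""
    y = 0
    for i, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << i
    return y


def gf2_rank(rows: List[int], width: int) -> int:
    """Rank over GF(2) by Gaussian elimination on the bitmask rows."""
    rows = list(rows)
    rank = 0
    for col in range(width):
        pivot = next((r for r in range(rank, len(rows))
                      if (rows[r] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and (rows[r] >> col) & 1:
                rows[r] ^= rows[rank]
        rank += 1
    return rank
```

With two 2-input XOR terms per row the matrix stays sparse. The non-wrapping variant has full rank and is therefore invertible, while the wrapping variant in this sketch is singular, which illustrates that a candidate XOR array may need to be checked for invertibility rather than assumed to have it.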
The XOR arrays may relatively quickly propagate bit changes but may not provide non-linearity. A hash function built using only XOR arrays may have poor avalanche property, poor random performance, and may be vulnerable to attacks. One way to avoid this pitfall is to use nonlinear block-to-block permutations, known as the S-Box in cryptographic context.
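The distinction between a linear XOR array and a non-linear S-Box may be checked mechanically. In the sketch below, the 3→3 table is a hypothetical permutation chosen for illustration, and `is_affine` tests whether a table satisfies S(a XOR b) = S(a) XOR S(b) XOR S(0) for all inputs, which holds only for linear (affine) mappings.

```python
from typing import Sequence

TOY_SBOX = [0, 3, 6, 1, 5, 4, 2, 7]    # hypothetical 3->3 permutation table


def is_affine(table: Sequence[int]) -> bool:
    """True if the table is an affine (XOR-linear plus constant) mapping."""
    s0 = table[0]
    size = len(table)
    return all(table[a ^ b] == table[a] ^ table[b] ^ s0
               for a in range(size) for b in range(size))


def sbox_array(x: int, width: int, table: Sequence[int] = TOY_SBOX) -> int:
    """Apply the 3-bit table to every 3-bit group of x in parallel."""
    out = 0
    for shift in range(0, width, 3):
        out |= table[(x >> shift) & 0b111] << shift
    return out
```

In this sketch `is_affine(TOY_SBOX)` is false, whereas the identity table `list(range(8))` is affine, so only the former may contribute non-linearity to a stage.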
An n→n S-Box may be considered as a permutation function on the values 0 to 2^n−1, which may take an n-bit value and return an n-bit value. Ignoring implementation considerations, a single 128→128 S-Box may be used for the hash function and achieve substantially good hashing, e.g., by selecting an appropriate permutation of input values (input bits). However, building such an S-Box may not be practical or may be substantially difficult. Hence, a series of simpler implementations may be used to approximate non-linearity. However, typical S-Boxes used in cryptographic applications may be substantially large, e.g., at least 6→6 and sometimes 8→8.
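The growth in S-Box size may be made concrete with a short sketch. The helper names below are hypothetical; they check that a table is a valid n→n S-Box, count the possible n→n S-Boxes, and compute the number of bits a full lookup table would need.

```python
import math
from typing import Sequence


def is_sbox(table: Sequence[int], n: int) -> bool:
    """True if the table is a permutation of the values 0 to 2^n - 1."""
    return sorted(table) == list(range(1 << n))


def count_sboxes(n: int) -> int:
    """Number of distinct n->n S-Boxes, i.e. permutations of 2^n values."""
    return math.factorial(1 << n)


def lookup_table_bits(n: int) -> int:
    """Bits needed to store an n->n S-Box as a lookup table: 2^n entries of n bits."""
    return n * (1 << n)
```

For example, `count_sboxes(3)` is 40320, `lookup_table_bits(3)` is 24, and `lookup_table_bits(8)` is 2048, while a 128→128 table would need 128 × 2^128 bits, which is why a single large S-Box may not be practical.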
In the S-Box array 300, the S-Boxes 310 may be implemented using at least one of two choices, direct combinatorial logic and memory. Implementing a S-Box 310 using logic may have the advantage that the result (output) may be obtained substantially faster than using a memory lookup. The disadvantage of using logic may be size. For instance, for relatively large S-Boxes, the number of gates needed in the S-Boxes may be substantial. If the result (output) is needed in a relatively short time, e.g., for time critical applications, relatively small S-Boxes may be needed. The substantially larger delays of a memory lookup implementation may be tolerated if a large S-Box is needed.
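The two implementation choices may be compared in a small sketch. The 3→3 mapping below is a hypothetical example in which each output is one input XORed with the AND of a complemented second input and a third input; it is not the permutation selected in the disclosure. The same mapping is written once as a memory-style lookup table and once as combinatorial logic.

```python
# Hypothetical 3->3 S-Box stored as a lookup table (memory-style implementation).
SBOX_TABLE = [0, 3, 6, 1, 5, 4, 2, 7]


def sbox_lookup(v: int) -> int:
    """Memory-style implementation: one table read per 3-bit value."""
    return SBOX_TABLE[v & 0b111]


def sbox_logic(v: int) -> int:
    """Logic-style implementation of the same mapping using XOR, AND, and NOT."""
    a, b, c = v & 1, (v >> 1) & 1, (v >> 2) & 1
    qa = a ^ ((b ^ 1) & c)     # a XOR (NOT b AND c)
    qb = b ^ ((c ^ 1) & a)     # b XOR (NOT c AND a)
    qc = c ^ ((a ^ 1) & b)     # c XOR (NOT a AND b)
    return qa | (qb << 1) | (qc << 2)
```

Both functions return the same value for all eight inputs. For a table this small, the logic form needs only a few gates per output bit, which is consistent with the preference for small S-Boxes in time critical applications.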
A substantially small S-Box that may provide non-linearity is a 3→3 S-Box, which may be used as one or more S-Boxes 310. Among a plurality of possible permutations, the following permutation function (and its isomorphic equivalents) may be selected:
The hardware implementation (using gates) for the selected permutation is also shown above. The selected permutation function and the corresponding hardware implementation (using gates) for the selected permutation are also shown in
For comparison, an example of 4→4 S-Box may be represented as follows:
Q_a = a …
Q_b = a …
Q_c = a …
Q_d = a …
This 4→4 S-Box may achieve better non-linearity than the 3→3 S-Box, but may have substantially higher cost. The AOI2222 gates may be available in some standard cell libraries, but may not be capable of computing Q above using a single large gate. A possible option for using larger S-Boxes (than 3→3 S-Boxes 310) may be to use multiple gates, which may result in substantially larger cost in area and timing. Using 3→3 S-Boxes 310 as building elements of the S-Box array 300 may be a trade-off between unit cost and mixing ability.
An arbitrary permutation of the state bits at each stage may be desired. Ideally, using an arbitrary 128-bit P-Box to achieve permutation between the hash function architecture stages (e.g., stages 110) may be desirable. However, this may require more space or room than may be available (e.g., on a chip). With a relatively long or wide data path, the distance from one end (first stage) to the other (last stage) may be substantial in terms of wire delay. Implementing arbitrary permutations of the input bits may also be difficult due to the cost of crossing wires between the stages. As integrated circuits are constructed in layers, in order to swap two wires, vertical connections across layers may be needed. Laying out an arbitrary permutation in silicon may require substantially more area than just connecting straight through from one set of gates to the next. By constraining permutation in the hash function architecture, a relatively good quality permutation may be achieved without having substantial cost.
In an embodiment, to limit crossing wire cost (between stages), input wires may be randomly assigned at each stage to a plurality of groups, e.g., using the same number of input and output wires per group. Within a group (per stage), one input point (input bit corresponding to a stage) may be randomly connected to an output point (output bit corresponding to a previous stage), and the next input point to the next output point, until all points are covered. Continuing in this manner, the set of input bits may be rotated (redirected or reshuffled) in the corresponding group to the output bits in that group. Doing this for each group may require two layers per group, e.g., one layer for the wires where bits shift down and another layer for the wires where bits shift up.
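The grouped rotation described above may be sketched as follows. The function name, the use of a seeded `random.Random` generator, and the assumption that each group uses the same wire positions on its input and output sides are illustrative choices that are not specified in the text.

```python
import random
from typing import List


def grouped_rotation(width: int, group_size: int,
                     rng: random.Random) -> List[int]:
    """Build a permutation where perm[src] is the position wire src is routed to.

    Wires are randomly assigned to groups, and within each group the wires
    are rotated by a random non-zero offset.
    """
    wires = list(range(width))
    rng.shuffle(wires)                         # random assignment to groups
    perm = [0] * width
    for start in range(0, width, group_size):
        group = sorted(wires[start:start + group_size])
        # Random starting connection; the remaining wires follow in order.
        shift = rng.randrange(1, len(group)) if len(group) > 1 else 0
        for k, src in enumerate(group):
            perm[src] = group[(k + shift) % len(group)]
    return perm
```

For example, `grouped_rotation(12, 4, random.Random(0))` would yield a permutation of 12 wires in three groups of four. Within a group, some wires shift toward higher positions and the rest wrap around to lower positions, which corresponds to the two routing layers per group mentioned above.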
Typically, cryptographic hash functions may implement some pre-processing of a hash key to make it more difficult for an adversary to force collisions. For example, in a plurality of currently used hash functions, the length of the input may be appended to the input itself, e.g., as part of the final block, which may be referred to sometimes as “whitening”. The hash function architecture 100 may not comprise the whitening step. However, in some embodiments, the whitening step may be included in the architecture or excluded as part of the cost/benefit trade-off in implementation. One feature of the hash function architecture 100 may be to provide a result (or output) with fewer bits than the input values or input bits. This may be achieved using a non-invertible final step or stage. If good mixing is achieved in the stages up to the final stage, all of the resulting output bits may be equally important. Thus selecting any set of the resulting output bits may be about equally good. A more complicated final step may be used in cryptographic hash functions to obscure the internal state. Such final step may not be included in the hash function architecture 100 to reduce cost. In an adversarial environment, the hash function architecture may be strengthened or better secured using a post processor to hide the internal state or details.
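The non-invertible final step may be as simple as selecting a subset of the output wires. The sketch below assumes a hypothetical invertible 8-bit mixing function as a stand-in for the preceding stages and an arbitrary choice of four output bit positions, and counts how often each reduced output value occurs over all inputs.

```python
from collections import Counter
from typing import Sequence

WIDTH = 8                      # toy width (assumption)
MASK = (1 << WIDTH) - 1


def mix(x: int) -> int:
    """Stand-in for the invertible stages; any bijection would do here."""
    return x ^ ((x << 1) & MASK)


def select_bits(value: int, positions: Sequence[int]) -> int:
    """Final non-invertible step: keep only the bits at the given positions."""
    out = 0
    for i, pos in enumerate(positions):
        out |= ((value >> pos) & 1) << i
    return out


# Hypothetical choice of four output wires to keep.
KEPT = [0, 2, 5, 7]
counts = Counter(select_bits(mix(v), KEPT) for v in range(1 << WIDTH))
```

Because the stages before the final step are invertible, every one of the 16 reduced values occurs exactly 16 times in this sketch, whichever four positions are kept. How well correlated or non-uniform inputs are spread still depends on the quality of the mixing in the earlier stages.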
The network components described above may be implemented on any general-purpose network component, such as a computer or network component with sufficient processing power, memory resources, and network throughput capability to handle the necessary workload placed upon it.
The secondary storage 704 is typically comprised of one or more disk drives or erasable programmable ROM (EPROM) and is used for non-volatile storage of data. Secondary storage 704 may be used to store programs that are loaded into RAM 708 when such programs are selected for execution. The ROM 706 is used to store instructions and perhaps data that are read during program execution. ROM 706 is a non-volatile memory device that typically has a small memory capacity relative to the larger memory capacity of secondary storage 704. The RAM 708 is used to store volatile data and perhaps to store instructions. Access to both ROM 706 and RAM 708 is typically faster than to secondary storage 704.
At least one embodiment is disclosed and variations, combinations, and/or modifications of the embodiment(s) and/or features of the embodiment(s) made by a person having ordinary skill in the art are within the scope of the disclosure. Alternative embodiments that result from combining, integrating, and/or omitting features of the embodiment(s) are also within the scope of the disclosure. Where numerical ranges or limitations are expressly stated, such express ranges or limitations should be understood to include iterative ranges or limitations of like magnitude falling within the expressly stated ranges or limitations (e.g., from about 1 to about 10 includes 2, 3, 4, etc.; greater than 0.10 includes 0.11, 0.12, 0.13, etc.). For example, whenever a numerical range with a lower limit, Rl, and an upper limit, Ru, is disclosed, any number falling within the range is specifically disclosed. In particular, the following numbers within the range are specifically disclosed: R=Rl+k*(Ru−Rl), wherein k is a variable ranging from 1 percent to 100 percent with a 1 percent increment, i.e., k is 1 percent, 2 percent, 3 percent, 4 percent, 5 percent, . . . , 50 percent, 51 percent, 52 percent, . . . , 95 percent, 96 percent, 97 percent, 98 percent, 99 percent, or 100 percent. Moreover, any numerical range defined by two R numbers as defined in the above is also specifically disclosed. Use of the term “optionally” with respect to any element of a claim means that the element is required, or alternatively, the element is not required, both alternatives being within the scope of the claim. Use of broader terms such as comprises, includes, and having should be understood to provide support for narrower terms such as consisting of, consisting essentially of, and comprised substantially of.
Accordingly, the scope of protection is not limited by the description set out above but is defined by the claims that follow, that scope including all equivalents of the subject matter of the claims. Each and every claim is incorporated as further disclosure into the specification and the claims are embodiment(s) of the present disclosure. The discussion of a reference in the disclosure is not an admission that it is prior art, especially any reference that has a publication date after the priority date of this application. The disclosures of all patents, patent applications, and publications cited in the disclosure are hereby incorporated by reference, to the extent that they provide exemplary, procedural, or other details supplementary to the disclosure.
While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.
The present application claims the benefit of U.S. Provisional Patent Application No. 61/439,234, filed Feb. 3, 2011 by Nan Hua et al. and entitled “Good General-Purpose Hash Function with Limited Resources,” which is incorporated herein by reference as if reproduced in its entirety.
Number | Date | Country
---|---|---
61439234 | Feb 2011 | US