In lattice-based cryptography, keys, ciphertexts, and signatures are usually large and, in particular, larger than their RSA/ECC counterparts.
A lattice-based signature scheme involves a sender (also referred to as a signer) that conveys a signature to a receiver (also referred to as a verifier). The receiver has to be able to determine whether or not the signature is valid.
To decrease the size of the public key, the lattice-based digital signature scheme Dilithium [LDK+21] splits an uncompressed public key element t into two parts: A first portion t1 contains the high-order bits of the coefficients of t and a second portion t0 contains the low-order bits of the coefficients. Only t1 is part of the public key, which decreases the required memory size compared to returning the uncompressed public key t.
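The split of t into t1 and t0 can be illustrated per coefficient. The following is a minimal Python sketch, assuming the Dilithium parameters q = 8380417 and d = 13 and the centered rounding of the Power2Round function of the specification. The names power2round, Q, and D are chosen here for illustration only and are not taken from the text above.

```python
Q = 8380417  # assumed Dilithium modulus q = 2^23 - 2^13 + 1
D = 13       # assumed number of low-order bits moved into t0


def power2round(r):
    """Split one coefficient r into (r1, r0) such that
    r mod Q == r1 * 2^D + r0 and -2^(D-1) < r0 <= 2^(D-1)."""
    r %= Q
    r0 = r % (1 << D)
    if r0 > (1 << (D - 1)):
        r0 -= 1 << D          # move r0 into the centered range
    r1 = (r - r0) >> D        # r - r0 is an exact multiple of 2^D
    return r1, r0


def split_t(t_coeffs):
    """Apply the split to every coefficient of t (given as a flat list)."""
    pairs = [power2round(c) for c in t_coeffs]
    t1 = [p[0] for p in pairs]
    t0 = [p[1] for p in pairs]
    return t1, t0
```

Under these assumptions, each coefficient of t1 fits into 10 bits, whereas each coefficient of t0 requires 13 bits, which is why omitting t0 from the public key saves space.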
However, for the returned signatures to be valid, the omission of t0 in the public key needs to be taken into account during signing. During each signing operation, the effect caused by omitting t0 is computed, and this effect is included in the signature. The verifier then uses so-called hints to correct the missing part t0.
Computing the hints using the method described in the current Dilithium specification requires knowledge of t0, which in turn requires t0 to be stored as part of the key. This can be a problem for devices with small memory since t0 can be quite large (up to 3.25 kB). The public component t1, though not directly required for signing, also needs to be stored on the device for several applications. For example, the device may have to transmit its public key (in the form of a signed certificate) to a verifier. Hence, with regard to the memory limitations, the signing device does not benefit from the separation of the public key into t0 and t1.
Hence, it is an objective to provide a more efficient way to save memory space when handling keys on a device.
This is solved according to the features of the independent claims. Further embodiments result from the dependent claims.
The examples suggested herein may, in particular, be based on at least one of the following solutions. Combinations of the following features may be utilized to reach the desired result. The features of the method could be combined with any feature(s) of the device, apparatus, or system or vice versa.
It is noted that the term cryptographic key may refer to any key used in a cryptographic context. It may, e.g., be any private, secure, or public key. The cryptographic key may, in particular, be a key used in combination with post-quantum cryptography.
A method is suggested for generating a lattice-based signature (σ) comprising
Advantageously, the hints can be computed using a part of the public key, in the example described herein, the part t1, instead of a part of the secret key t0, which is larger than t1. Hence, only the part t1 needs to be stored locally on the device (but not t0).
It is yet another advantage that all additional computations are conducted on public data, which results in a reduced security overhead.
Also, as verification is part of the signature generation, this solution can be used to detect faults or errors that may occur during the signing process.
Instead of computing the effects of omitting t0 in the forward direction (using t0 to compute the difference), the effects of omitting t0 are computed in the backward direction: The uncorrected signature received is processed by the verification (using t1). The observed difference between this outcome and intermediates computed during signing can be used to compute the hints h.
According to an embodiment, the signature generation is based on or interworks with the lattice-based digital signature scheme Dilithium.
According to an embodiment, the verification is based on a first portion (t1) of an uncompressed public key element (t), wherein the second portion (t0) of the uncompressed public key element (t) is not used during signature generation.
According to an embodiment, the second portion (t0) of the uncompressed public key element is longer than the first portion (t1) of the uncompressed public key element.
According to an embodiment, the second portion (t0) of the uncompressed public key element is not stored locally.
Locally may, in particular, refer to a particular device. It may also refer to several devices that are connected via a network.
According to an embodiment, the second portion (t0) of the uncompressed public key element is not made publicly available.
According to an embodiment, the verification is used to determine whether a fault has occurred during the generation of the lattice-based signature.
According to an embodiment, the compression function is at least partly conducted on one of the following:
Also, a device is provided for generating a lattice-based signature (σ), wherein the device is arranged to execute the following steps:
According to an embodiment, the device is one of the following or comprises at least one of the following:
Further, a computer program product is provided, which is directly loadable into a memory of a digital processing device, comprising software code portions for performing the steps of the method as described herein.
Embodiments are shown and illustrated with reference to the drawings. The drawings serve to illustrate the basic principle so that only aspects necessary for understanding the basic principle are illustrated. The drawings are not to scale. In the drawings, the same reference characters denote like features.
The examples described herein allow for an efficient computation of the hints using the public key t1 instead of t0. Hence, t1 is stored locally on a device, but not t0. This is advantageous because t1 is required for verification purposes anyway.
It is also an advantage that all added computations are conducted on public data, which results in a reduced security overhead. Also, the approach can be used for fault detection purposes.
Instead of computing the effects of omitting t0 in the forward direction (using t0 to compute the difference), the effects of omitting t0 are computed in the backward direction: The uncorrected signature received is processed by parts of the verification process (using t1). The observed difference between this outcome and intermediates computed during signing can then be used to compute the hints that allow correcting the signature.
Hereinafter, a brief and simplified description of the involved parts of the Dilithium signature scheme is provided. For more details, reference is made to [LDK+21].
Dilithium operates on vectors and matrices comprising polynomials with coefficients modulo q. Polynomials are denoted using plain letters, e.g., a, whereas vectors and matrices are denoted by bold lowercase a and uppercase A letters, respectively.
This pseudo-code comprises three sections: a key generation portion, a sign (signature generation) portion, and a verify portion. These will be described next with reference to the line numbers, as shown in
A secret key sk comprises the seeds ρ and K, the hash value tr of the public key, the secret key elements s1 and s2, and the lower-order bits t0 of the uncompressed public key element t.
It is noted that ρ, tr, and t0 are required for signing and are therefore included in the private key, but they do not need to be kept secret and can be revealed. On the other hand, K, s1, and s2 need to be kept secret.
If the condition ∥z∥∞ ≥ γ1 − β is true, then at least one coefficient of z falls outside the open interval (−(γ1 − β), γ1 − β). If this is the case for either z or r0 (for their respective range parameters), the signature output (z,h) is set to invalid ⊥ and the loop starting at line 13 enters another iteration.
First, if the value of ct0 exceeds a predefined threshold (i.e., becomes too large) at any coefficient, the functions MakeHint and UseHint are no longer capable of correcting the error, meaning that the current signature needs to be rejected.
Second, the vector h is encoded such that up to ω coefficients of
can be corrected during verification. If this number is exceeded, i.e., more than ω coefficients need to be corrected, this information cannot be properly encoded anymore, and thus the errors cannot be corrected during verification. Hence, the current signature needs to be rejected.
If the checks show that the signature needs to be rejected, then the signature output (z,h) is set to invalid ⊥ resulting in another iteration of the loop starting in line 13.
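The two hint-related checks described above can be sketched as follows. This is a simplified Python illustration that operates on flat lists of coefficients, assuming q = 8380417 and, by way of example, the values γ2 = (q − 1)/32 and ω = 75 of the highest Dilithium parameter set. The function names are hypothetical and chosen for illustration.

```python
Q = 8380417             # assumed Dilithium modulus
GAMMA2 = (Q - 1) // 32  # assumed low-order rounding range
OMEGA = 75              # assumed maximum number of 1's in the hint h


def centered(x):
    """Map x to its representative in [-(Q-1)/2, (Q-1)/2]."""
    x %= Q
    return x - Q if x > Q // 2 else x


def inf_norm(coeffs):
    """Largest absolute value over all centered coefficients."""
    return max(abs(centered(c)) for c in coeffs)


def hint_checks_pass(ct0, h, gamma2=GAMMA2, omega=OMEGA):
    """Return True if the signature candidate may be kept.

    First check: every coefficient of ct0 must be smaller than gamma2 in
    absolute value, otherwise the hints cannot correct the error.
    Second check: at most omega hint bits may be set, otherwise the hint
    vector cannot be encoded.
    """
    return inf_norm(ct0) < gamma2 and sum(h) <= omega
```

If hint_checks_pass returns False, the signing loop would be restarted with a new candidate, as described above.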
If the full vector t was included in the public key, the verifier could compute
Checks (or tests) performed during the signing portion ensure that the following applies:
The verifier can thus recompute
and the verifier can determine if this c̃ matches the signature as provided.
As explained with regard to the Dilithium implementation above, only the portion t1 (instead of the full vector t) is supplied to the verifier, which then actually computes (see line 30 of the pseudo-code shown in
To correct the added term ct0, the signer computes hints h, which represent the effect of this addition, and the signer includes these hints in the signature. Actually, the hints describe if the addition of ct0 causes a carry to ripple into the HighBits of w-cs2 (see Algorithm 1, which is computed for each coefficient). Using the hints, the verifier can subtract the carry from the coefficients and thereby recover the correct
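The interplay of the hints with the HighBits can be illustrated per coefficient. The following Python sketch follows the definitions of Decompose, HighBits, MakeHint, and UseHint as given in the Dilithium specification, assuming q = 8380417 and α = 2γ2 with γ2 = (q − 1)/32. It is a simplified illustration on single integers rather than on vectors of polynomials.

```python
Q = 8380417             # assumed Dilithium modulus
GAMMA2 = (Q - 1) // 32  # assumed parameter gamma2
ALPHA = 2 * GAMMA2


def decompose(r, alpha=ALPHA):
    """Return (r1, r0) with r = r1 * alpha + r0 (mod Q), r0 centered."""
    r %= Q
    r0 = r % alpha
    if r0 > alpha // 2:
        r0 -= alpha
    if r - r0 == Q - 1:       # wrap-around case: fold into r1 = 0
        return 0, r0 - 1
    return (r - r0) // alpha, r0


def high_bits(r, alpha=ALPHA):
    return decompose(r, alpha)[0]


def make_hint(z, r, alpha=ALPHA):
    """1 if adding z to r changes the high bits of r, else 0."""
    return int(high_bits(r, alpha) != high_bits(r + z, alpha))


def use_hint(h, r, alpha=ALPHA):
    """Recover the high bits of r + z from r and the hint bit h."""
    m = (Q - 1) // alpha
    r1, r0 = decompose(r, alpha)
    if h == 1:
        return (r1 + 1) % m if r0 > 0 else (r1 - 1) % m
    return r1
```

In the notation above, the signer would call make_hint with z = −ct0 and r = w − cs2 + ct0, and the verifier would call use_hint on its own value of r to recover HighBits(w − cs2), provided that the magnitude of ct0 does not exceed γ2.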
In the existing Dilithium approach, computing the hints requires the portion t0 of the public key t.
The examples described herein do not require storing the vector t0 as part of the secret key.
In an exemplary embodiment, a (potential) signature output (z,c) is fed to a verification algorithm, which uses the vector t1 to compute u := Az − ct1·2^d, which equals w − cs2 + ct0 (see Equation (2)). This term is then compared to w − cs2, which is computed during signing. By comparing the HighBits of these two terms, the hints h can be computed.
An exemplary implementation of a modified signature generation may comprise the following steps:
These steps are part of Algorithm 2, which shows a modified Dilithium signing algorithm by replacing line 23 of
Since the function MakeHint computes the sum of its two operands (when implemented according to the Dilithium specification or the simplified Algorithm 1), running
corresponds to computing
and comparing the results for equality (see Algorithm 1).
The hints h are set to 1 for coefficients where the results differ, and to 0 if they are equal. However, the variable v still needs to be computed to enable the check of ∥v∥∞.
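The backward computation of the hints can be sketched as follows. The Python code below is a simplified illustration on flat lists of coefficients, assuming q = 8380417 and γ2 = (q − 1)/32. The inputs w_minus_cs2 (the intermediate computed during signing) and u (the result of the partial verification using t1) are assumed to be given, and the function name hints_from_verification is hypothetical.

```python
Q = 8380417             # assumed Dilithium modulus
GAMMA2 = (Q - 1) // 32  # assumed parameter gamma2
ALPHA = 2 * GAMMA2


def high_bits(r, alpha=ALPHA):
    """High-order part of r as defined by Decompose in the specification."""
    r %= Q
    r0 = r % alpha
    if r0 > alpha // 2:
        r0 -= alpha
    if r - r0 == Q - 1:
        return 0
    return (r - r0) // alpha


def make_hint(z, r, alpha=ALPHA):
    """Forward-style hint: does adding z to r change the high bits?"""
    return int(high_bits(r, alpha) != high_bits(r + z, alpha))


def centered(x):
    x %= Q
    return x - Q if x > Q // 2 else x


def hints_from_verification(w_minus_cs2, u):
    """Derive the hints h and the difference v without using t0.

    h[i] is 1 where the high bits of u and of w - c*s2 differ, else 0.
    v[i] is the centered difference u - (w - c*s2), which corresponds
    to c*t0 when no fault has occurred.
    """
    h = [int(high_bits(wi) != high_bits(ui))
         for wi, ui in zip(w_minus_cs2, u)]
    v = [centered(ui - wi) for wi, ui in zip(w_minus_cs2, u)]
    return h, v
```

Because v is returned as well, the norm check on v can still be carried out, and calling make_hint with −v and u coefficient by coefficient yields the same bits as h.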
u ≙ w − cs2 + ct0, cf. Equation (2)
v ≙ ct0
equivalent to MakeHintq(−ct0,w−cs2+ct0)
Algorithm 2 utilizing the private key sk.
A result w − cs2 is conveyed from block 202 to block 203. The hash value c̃ and the output value z are also determined by block 202 and are conveyed to block 204 as well as supplied as part of the signature σ.
Block 204 conducts a verification according to line 16 of Algorithm 2 based on the portion t1 of the public key t. Further, block 204 outputs
according to Equation (2)) to the block 203 and to a block 206.
Block 203 computes a difference between the input from block 204 and the input from block 202, i.e., u − (w − cs2), which equals v and corresponds to line 17 of Algorithm 2. This vector v is supplied to block 206 and to block 205.
Block 205 conducts a check operation of
according to line 19 of Algorithm 2, which may result in a restart of the loop, as explained above.
Block 206 calculates the hints h according to line 18 of Algorithm 2 utilizing the function MakeHintq based on the inputs −v and u.
Hence, at the output of block 201, the signature
of the Message M is available.
Although the full verification is not computed in this modified sign procedure, the procedure offers an advantageous way of including a sign-then-verify fault countermeasure, which reveals faults that might have been injected during the sign procedure.
The difference between
must not exceed 1 (in absolute value) for each coefficient. Since the magnitude of ct0 lies within a certain range (this is verified in line 24 of
The current solution does not compute the quantity
by adding ct0 to w − cs2, but instead by evaluating a portion of the verification procedure. If, e.g., a fault is injected in the computation w = Ay during signing, then the corrupted w − cs2 will likely differ significantly from u := Az − ct1·2^d (which is equivalent to w − cs2 + ct0 in the undisturbed case). Thus, by testing if the difference between HighBits(w − cs2) and HighBits(u) is at most one, certain faults can be detected.
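Such a plausibility test can be sketched as follows. The Python code below is a simplified illustration on flat lists of coefficients, assuming q = 8380417 and γ2 = (q − 1)/32. It is an assumption of this sketch that the difference of the HighBits is taken modulo m = (q − 1)/(2γ2), so that the largest and the smallest high-order value count as neighbors. The function name high_bits_plausible is hypothetical.

```python
Q = 8380417             # assumed Dilithium modulus
GAMMA2 = (Q - 1) // 32  # assumed parameter gamma2
ALPHA = 2 * GAMMA2
M = (Q - 1) // ALPHA    # number of distinct HighBits values


def high_bits(r, alpha=ALPHA):
    """High-order part of r as defined by Decompose in the specification."""
    r %= Q
    r0 = r % alpha
    if r0 > alpha // 2:
        r0 -= alpha
    if r - r0 == Q - 1:
        return 0
    return (r - r0) // alpha


def high_bits_plausible(w_minus_cs2, u):
    """Return False if any coefficient pair differs by more than one
    in its high bits (modulo M), which indicates a likely fault."""
    for wi, ui in zip(w_minus_cs2, u):
        diff = (high_bits(ui) - high_bits(wi)) % M
        if diff not in (0, 1, M - 1):
            return False
    return True
```

A return value of False would indicate that the signing intermediates and the partial verification result are inconsistent, in which case the signature candidate should not be released.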
The solution presented allows a significant reduction of memory space, which allows existing memory to be used for other purposes. A device supporting both signing and verification for a keypair may, therefore, advantageously only store the vector t1 instead of both vectors t1 and t0, allowing savings of up to 3.25 kB per key. A device merely supporting signing for a keypair also benefits because t1 can be stored instead of t0, and t1 requires less memory space than t0.
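The stated saving can be checked with a short calculation. The following Python sketch assumes polynomials with n = 256 coefficients, 13 bits per coefficient of t0, and 10 bits per coefficient of t1, with k = 4, 6, or 8 polynomials per vector depending on the Dilithium parameter set. The function name packed_sizes is chosen for illustration.

```python
N = 256       # assumed number of coefficients per polynomial
D = 13        # assumed bits per coefficient of t0
T1_BITS = 10  # assumed bits per coefficient of t1 (23 - 13)


def packed_sizes(k):
    """Return (bytes for t0, bytes for t1) for a vector of k polynomials."""
    t0_bytes = k * N * D // 8
    t1_bytes = k * N * T1_BITS // 8
    return t0_bytes, t1_bytes


for k in (4, 6, 8):
    t0_bytes, t1_bytes = packed_sizes(k)
    print(f"k={k}: t0 = {t0_bytes} bytes, t1 = {t1_bytes} bytes")
```

For k = 8, this yields 3328 bytes for t0, i.e., 3.25 kB when 1 kB is counted as 1024 bytes, and 2560 bytes for t1.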
Another advantage is that the approach presented has built-in fault detection capability (if required).
It is also an advantage that the solution operates on public data (public key, signature output), which may not require additional security measures like side-channel protection.
While the computation is part of the rejection loop and certain conditions require signing to be restarted (see [LDK+21] for details), these conditions are checked last and are thus unlikely to fail. Hence, the additional computations suggested in the modified sign procedure are likely to be computed only once.
The current reference implementation of Dilithium suggests a function MakeHint that uses
as input. In this case, ct0 can be recovered first before computing
The following refers to further examples of processing devices that can be used to implement the invention as described herein.
In this example, the CPU 501 has access to at least one crypto module 504 over a shared bus 505, to which each crypto module 504 is coupled. Each crypto module 504 may, in particular, comprise one or more crypto cores to perform certain cryptographic operations. Exemplary crypto cores are:
The CPU 501, the hardware random number generator 512, the NVM 503, the crypto module 504, the RAM 502, and the input/output interface 507 are connected to the bus 505. The input/output interface 507 may have a connection to other devices, which may be similar to the processing device 500.
The crypto module 504 may or may not be equipped with hardware-based security features.
The bus 505 itself may be masked or plain. Instructions to process the steps described herein may, in particular, be stored in the NVM 503 and processed by the CPU 501. The data processed may be stored in the NVM 503 or the RAM 502.
Supporting functions may be provided by the crypto module 504 (e.g., expansion of pseudo-random data).
Steps of the method described herein may exclusively or at least partially be conducted on the crypto module 504, e.g., on the lattice-based crypto core 508.
The processing device 500 may be a chip card powered by direct electrical contact or through an electromagnetic field. The processing device 500 may be a fixed circuit or based on reconfigurable hardware (e.g., Field Programmable Gate Array, FPGA). The processing device 500 may be coupled to a personal computer, microcontroller, FPGA, or smartphone.
The solution described herein may be used by a customer who intends to provide a secure implementation of lattice-based cryptography on a smart card or any secure element.
The HSM 601 comprises a controller 602, a hardware random number generator (HRNG) 606, and at least one crypto module 603. The crypto module 603 comprises, by way of example, an AES core 604 and a lattice-based crypto (LBC) core 605.
According to one embodiment, the HSM 601 and the application processor 607 may be fabricated on the same physical chip with a tight coupling. The HSM 601 delivers cryptographic services and secured key storage while the application processor 607 may perform computationally intensive tasks (e.g., image recognition, communication, motor control). The HSM 601 may be accessible only via a defined interface and may be considered independent of the rest of the system in such a way that a security compromise of the application processor 607 has only a limited impact on the security of the HSM 601. The HSM 601 may perform all tasks or a subset of tasks described with respect to the processing device 600 by using the controller 602 and the LBC core 605, supported, by way of example, by the AES core 604 and the HRNG 606. It may execute the procedures described herein (at least partially) either controlled by an internal controller or as a CMOS circuit. Moreover, the application processor 607 may perform the procedures described herein (at least partially, e.g., in collaboration with the HSM 601).
The processing device 600 with this application processor 607 and HSM 601 may be used as a central communication gateway or (electric) motor control unit in cars or other vehicles.
In one or more examples, the functions described herein may be implemented at least partially in hardware, such as specific hardware components or a processor. More generally, the techniques may be implemented in hardware, processors, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media, including any medium that facilitates the transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for the implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium, i.e., a computer-readable transmission medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more central processing units (CPUs), digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein, may refer to any of the foregoing structures or any other structure suitable for the implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a single hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Although various exemplary embodiments of the invention have been disclosed, it will be apparent to those skilled in the art that various changes and modifications can be made that will achieve some of the advantages of the invention without departing from the spirit and scope of the invention. It will be obvious to those reasonably skilled in the art that other components performing the same functions may be suitably substituted. It should be mentioned that features explained with reference to a specific figure may be combined with features of other figures, even in those cases in which this has not explicitly been mentioned. Further, the methods of the invention may be achieved in either all software implementations, using the appropriate processor instructions, or in hybrid implementations that utilize a combination of hardware logic and software logic to achieve the same results.
Such modifications to the inventive concept are intended to be covered by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
102023101800.0 | Jan 2023 | DE | national |