Elliptic curve point multiplication

BACKGROUND

Elliptic Curve Cryptosystems (ECCs) constitute a new emerging class of public key cryptosystems, and have been widely applied in applications such as smart cards and embedded systems. The security of most ECCs is based upon the difficulty of solving a discrete logarithm problem in a group of points on the elliptic curve. Where the elliptic curve is chosen correctly, the best known methods for finding the discrete logarithm are of exponentially increasing difficulty. Thus, ECC exploits the fact that there is no known sub-exponential method to solve the discrete logarithm problem on elliptic curves. Compared with other public key cryptosystems such as RSA, ECC uses shorter key sizes for the same level of security. This translates into fewer requirements on storage, memory, and computing power.

Unfortunately, conventional methods of operation in ECC are vulnerable to side-channel attacks. Side-channel attacks measure observable parameters such as timing or power consumption during cryptographic operations to deduce all or part of the secret information within the cryptosystem. For example, the comb method and other efficient point multiplication methods are vulnerable to power-analysis attacks. Power-analysis attacks are based on an analysis of power consumed by a system. Information on the power used by a system helps the attacker to make assumptions about the operations performed by the system and, ultimately, to make guesses about secrets contained within the system.

Scalar multiplication, e.g. elliptic curve point multiplication, plays a critical role in ECCs. In fact, the method by which such multiplications are performed has a tremendous influence on whether different side-channel attacks are effective. Therefore, improved methods would result in safer ECCs.

SUMMARY

Systems and methods configured for recoding an odd integer and for elliptic curve point multiplication are disclosed, having general utility and also specific application to elliptic curve point multiplication and cryptosystems. In one implementation, the recoding is performed by converting an odd integer k into a binary representation. The binary representation could be, for example, coefficients for powers of two representing the odd integer. The binary representation is then configured as comb bit-columns, wherein every bit-column is a signed odd integer. Another implementation applies this recoding method and discloses a variation of comb methods that computes elliptic curve point multiplication more efficiently and with fewer saved points than known comb methods. The disclosed point multiplication methods are then modified to be Simple Power Analysis (SPA)-resistant.

BRIEF DESCRIPTION OF THE DRAWINGS

The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.

FIGS. 1-4 illustrate exemplary existing elliptic curve point multiplication methods.

FIGS. 5 and 6 illustrate exemplary existing elliptic curve point multiplication methods or corresponding recoding methods resistant to simple power analysis.

FIG. 7 illustrates an example of signed odd-only comb recoding method for an odd scalar.

FIG. 8 illustrates an example of a signed odd-only comb elliptic curve point multiplication method for odd or even scalars.

FIG. 9 illustrates an example of a signed odd-only comb elliptic curve point multiplication method for a point of odd order.

FIG. 10 illustrates an example of a signed odd-only comb elliptic curve point multiplication method for n divisible by w.

FIG. 11 illustrates an example of an SPA-resistant signed odd-only comb elliptic curve point multiplication method.

FIG. 12 illustrates an example of an SPA-resistant signed odd-only comb elliptic curve point multiplication method for n=⌈log₂ρ⌉ divisible by w.

FIG. 13 illustrates an example embodiment of recoding, which has general utility or could be utilized for comb elliptic curve point multiplication or within a cryptosystem.

FIG. 14 illustrates an example embodiment of operation of a comb elliptic curve point multiplication system.

FIG. 15 illustrates an exemplary computing environment suitable for implementing a comb elliptic curve point multiplication method.

DETAILED DESCRIPTION

Overview

Recoding an odd integer and elliptic curve point multiplication, having general utility and specific application to elliptic curve cryptosystems, are disclosed. In one implementation, a signed odd-only recoding method is presented that converts an odd scalar into a signed, nonzero representation wherein every comb bit-column 𝕂_i is a signed odd integer. Another implementation applies this recoding method and discloses a variation of comb methods that computes elliptic curve point multiplication more efficiently and with fewer saved points than known comb methods. The disclosed point multiplication methods are then modified to be Simple Power Analysis (SPA)-resistant by exploiting the fact that point addition and point subtraction are virtually of the same computational complexity in elliptic curve systems. The disclosed SPA-resistant comb methods are configured to inherit a comb method's advantage of running much faster than other SPA-resistant window methods when pre-computation points are calculated in advance or elsewhere. Combined with the techniques that convert an SPA-resistant method into a DPA-resistant method, the disclosed SPA-resistant comb embodiments are secure against all side-channel attacks. Accordingly, implementations disclosed herein are of use generally, and are particularly adapted for use in smart card and embedded applications wherein power and computing resources are at a premium.

Preliminaries

Elliptic curve equations are a central feature of Elliptic Curve Cryptosystems (ECCs). An elliptic curve over a field F can be expressed by its Weierstrass form:

E: y²+a₁xy+a₃y=x³+a₂x²+a₄x+a₆, where a_i ∈ F.

We denote by E(F) the set of points (x, y) ∈ F² satisfying the above equation plus the “point at infinity” 𝒪. With the chord-tangent process, E(F) forms an Abelian group with the point at infinity 𝒪 as the zero. Let P₁(x₁, y₁) and P₂(x₂, y₂) be two finite points on E. The point addition P₃(x₃, y₃)=P₁+P₂ is computed as follows:

- 1. If x₁=x₂=x and y₁+y₂=−(a₁x+a₃), then P₃=𝒪.
- 2. If P₁=P₂=(x, y) and 2y≠−(a₁x+a₃), then
  $\begin{matrix} x_{3} = \left(\frac{3 x^{2} + 2 a_{2} x + a_{4} - a_{1} y}{2 y + a_{1} x + a_{3}}\right)^{2} + a_{1} \left(\frac{3 x^{2} + 2 a_{2} x + a_{4} - a_{1} y}{2 y + a_{1} x + a_{3}}\right) - a_{2} - 2 x \\ y_{3} = \frac{3 x^{2} + 2 a_{2} x + a_{4} - a_{1} y}{2 y + a_{1} x + a_{3}} (x - x_{3}) - y - (a_{1} x_{3} + a_{3}) \end{matrix}$
- 3. If P₁≠±P₂, then
  $\begin{matrix} x_{3} = \left(\frac{y_{2} - y_{1}}{x_{2} - x_{1}}\right)^{2} + a_{1} \left(\frac{y_{2} - y_{1}}{x_{2} - x_{1}}\right) - a_{2} - x_{1} - x_{2} \\ y_{3} = \frac{y_{2} - y_{1}}{x_{2} - x_{1}} (x_{1} - x_{3}) - y_{1} - (a_{1} x_{3} + a_{3}) \end{matrix}$
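By way of a non-limiting illustration, the three cases above can be sketched in Python for a curve over a prime field F_p. The function name `ec_add`, the use of `None` to encode the point at infinity, and the small example curve y²=x³+2x+2 over F₁₇ are assumptions made only for this sketch and are not part of the disclosure.

```python
# Minimal sketch: affine point addition on a general Weierstrass curve over F_p.
# Assumptions: p is prime, coefficients are (a1, a2, a3, a4, a6), and the
# point at infinity is encoded as None.

def ec_add(P1, P2, coeffs, p):
    a1, a2, a3, a4, a6 = coeffs
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    x1, y1 = P1
    x2, y2 = P2
    # Case 1: P2 = -P1, so the sum is the point at infinity.
    if x1 == x2 and (y1 + y2 + a1 * x1 + a3) % p == 0:
        return None
    if x1 == x2:
        # Case 2: doubling (same x and not inverses implies P1 == P2).
        num = 3 * x1 * x1 + 2 * a2 * x1 + a4 - a1 * y1
        den = 2 * y1 + a1 * x1 + a3
    else:
        # Case 3: general addition of two distinct, non-inverse points.
        num = y2 - y1
        den = x2 - x1
    lam = num * pow(den % p, -1, p) % p   # slope; one field inversion
    x3 = (lam * lam + a1 * lam - a2 - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1 - (a1 * x3 + a3)) % p
    return (x3, y3)

# Example curve (assumed for illustration): y^2 = x^3 + 2x + 2 over F_17.
CURVE = (0, 0, 0, 2, 2)
G = (5, 1)
TWO_G = ec_add(G, G, CURVE, 17)
THREE_G = ec_add(TWO_G, G, CURVE, 17)
```

The single modular inversion per addition in this affine sketch is the cost that projective coordinates are intended to avoid.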

The group E(F) generated by an elliptic curve over some finite field F meets the public key cryptography requirements that the discrete logarithm problem is very difficult to solve. Therefore, ECCs have been used in many standards and applications. In order to speed up cryptographic operations, projective coordinates are used to represent elliptic curve points. By using the Jacobian projective coordinates, the Weierstrass equation can be written in the form:

E: Y²+a₁XYZ+a₃YZ³=X³+a₂X²Z²+a₄XZ⁴+a₆Z⁶.

The point at infinity 𝒪 is then represented by (θ², θ³, 0) for all θ ∈ F*. An affine point (x, y) is represented by (θ²x, θ³y, θ) for all θ ∈ F* and a projective point (X, Y, Z)≠𝒪 corresponds to an affine point (X/Z², Y/Z³). Using projective coordinates, point addition can be calculated without costly field inversions at the cost of more field multiplications. A field multiplication is usually much faster than a field inversion, resulting in faster elliptic curve point addition. A special point addition wherein a point is added to itself is called doubling. The cost of point doubling is usually different from the cost of general point addition.

Elliptic curves used in cryptography are elliptic curves defined over fields F_{2^m} or fields F_p, where m is a large integer and p is a large prime. Over these two kinds of fields, Weierstrass equations of elliptic curves can be reduced to the following simpler form:

E: y²+xy=x³+ax²+b

defined over F_{2^m}, or

E: y²=x³+ax+b

defined over F_p, where p is a large prime. These equations simplify point addition and doubling, and may be referred to as the “short Weierstrass form”. In practical cryptographic applications, elliptic curves are usually required to have a subgroup of a large prime-order ρ. An elliptic curve cryptosystem typically employs only points in this ρ-order subgroup of E(F).

Adding a point P to itself k times is called scalar multiplication or point multiplication, and is denoted as Q=kP, where k is a positive integer. The simplest efficient method for scalar multiplication is the following double-and-add method that expands k into a binary representation.

FIG. 1 shows an exemplary double-and-add method 100 of scalar multiplication adaptable to an ECC context. At block 102, input to the double-and-add method 100 is a point P and an n-bit integer
$k = \sum_{i = 0}^{n - 1} b_{i} 2^{i}$

with b_i ∈ {0,1}. At block 104, the point Q is set initially to the point at infinity, i.e. we set Q=𝒪. At block 106, a loop is entered, such as a for-loop with index i. The loop counts down from i=n−1 to 0 by −1. Within the loop: at block 108 set Q=2Q; and at block 110, if (b_i==1) then Q=Q+P. At block 112, the exemplary double-and-add method 100 of scalar multiplication returns Q, in the form Q=kP.
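The loop of blocks 104-112 may be sketched as follows. This is an illustrative sketch only: plain Python integers are used as a stand-in for elliptic curve points (so that kP is simply k·P and the result is easy to check), and the function name `double_and_add` and the parameter `zero` (standing in for the point at infinity) are assumptions of the sketch.

```python
# Minimal sketch of the double-and-add method of FIG. 1.
# Assumption: P is any object supporting "+"; integers are used here as a
# stand-in for curve points, with 0 playing the role of the point at infinity.

def double_and_add(k, P, zero=0):
    Q = zero                               # block 104: Q = point at infinity
    for i in range(k.bit_length() - 1, -1, -1):
        Q = Q + Q                          # block 108: Q = 2Q
        if (k >> i) & 1:                   # block 110: add only when b_i == 1
            Q = Q + P
    return Q                               # block 112: Q = kP
```

The data-dependent branch at block 110 is what makes this method a natural target for the power-analysis attacks discussed later.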

An integer k can be represented in a (binary) Signed Digit (SD) representation
$k = \sum_{i = 0}^{n - 1} s_{i} 2^{i},$

where s_i ∈ {−1,0,1}. A special SD representation called Non-Adjacent Form (NAF) is of particular interest, where there are no adjacent non-zero digits, i.e. s_i×s_{i+1}=0 for all i≧0. Every integer k has a unique NAF, and the NAF has the lowest weight among all SD representations of k. The expected weight of an NAF of length n bits is n/3, as compared with the expected weight of n/2 for the binary representation. Elliptic curve point subtraction has virtually the same cost as point addition. Therefore, the add-and-subtract method 200 of FIG. 2, using NAF for k, is more efficient than the double-and-add method 100.

FIG. 2 shows an add-and-subtract method 200 for scalar multiplication (i.e. point multiplication). At block 202, input to the add-and-subtract method 200 includes a point P and an n-bit NAF integer
$k = \sum_{i = 0}^{n - 1} s_{i} 2^{i},$

s_i ∈ {−1,0,1}. At block 204, the point Q is set initially at the point at infinity, i.e. set Q=𝒪. At block 206, a loop is entered, such as a for-loop with index i. The loop counts down from i=n−1 to 0 by −1. Within the loop: at block 208, set Q=2Q; at block 210, if (s_i==1) then Q=Q+P; and at block 212, if (s_i==−1) then Q=Q−P. At block 214, return output Q, wherein Q=kP.
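One possible way to obtain the NAF digits and apply method 200 is sketched below. The digit-generation rule used in `naf` (take k mod 4 to choose +1 or −1) is a commonly used technique that is assumed here and is not recited in the text; integers again stand in for curve points, and both function names are hypothetical.

```python
# Minimal sketch: NAF recoding followed by the add-and-subtract loop of FIG. 2.
# Assumption: integers stand in for curve points; 0 is the point at infinity.

def naf(k):
    """Return NAF digits of k >= 0, least significant digit first."""
    digits = []
    while k > 0:
        if k & 1:
            s = 2 - (k % 4)    # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= s             # k is now divisible by 4, so the next digit is 0
        else:
            s = 0
        digits.append(s)
        k >>= 1
    return digits

def add_and_subtract(k, P, zero=0):
    Q = zero                              # block 204
    for s in reversed(naf(k)):            # most significant digit first
        Q = Q + Q                         # block 208: Q = 2Q
        if s == 1:
            Q = Q + P                     # block 210
        elif s == -1:
            Q = Q - P                     # block 212
    return Q                              # block 214: Q = kP
```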

An integer k can be expanded into m-ary representation, where m is usually equal to 2^w for some integer w≧2. The m-ary method 300 of FIG. 3 splits scalar multiplication into pre-computation and evaluation stages to trade storage for speed. The double-and-add method 100 is a special case of this method where w=1.

FIG. 3 shows an m-ary method 300 for scalar multiplication. At block 302, input to the m-ary method 300 includes a point P and an integer
$k = \sum_{i = 0}^{d - 1} a_{i} m^{i},$

a_i ∈ {0,1, . . . , m−1}. Blocks 304-308 represent the pre-computation stage. At block 304, set point P₁=P. At block 306, a one-step loop is entered, such as a for-loop with index i. The loop counts from i=2 to m−1. Within the one-step loop: at block 306, set P_i=P_{i−1}+P. At block 308, the point Q is set initially at the point at infinity, i.e. set Q=𝒪. Blocks 310-314 represent an evaluation stage. At block 310, a loop is entered, such as a for-loop with index i. The loop counts down from i=d−1 to 0 by −1. Within the loop: at block 312, set Q=mQ, which requires w doublings; and at block 314, set Q=Q+P_{a_i}. At block 316, return Q, wherein Q=kP.
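The two stages of method 300 may be sketched as follows, again with integers standing in for curve points. The function name `m_ary_multiply` and the simple iterative table fill are assumptions of this sketch.

```python
# Minimal sketch of the m-ary method of FIG. 3 with m = 2^w.
# Assumption: integers stand in for curve points; 0 is the point at infinity.

def m_ary_multiply(k, P, w, zero=0):
    m = 1 << w
    # Pre-computation stage (blocks 304-306): table[i] = iP for i = 1..m-1.
    table = [zero, P]
    for i in range(2, m):
        table.append(table[i - 1] + P)
    # Split k into base-m digits a_0, a_1, ..., least significant first.
    digits = []
    while k > 0:
        digits.append(k & (m - 1))
        k >>= w
    # Evaluation stage (blocks 308-314).
    Q = zero
    for a in reversed(digits):
        for _ in range(w):                # block 312: Q = mQ via w doublings
            Q = Q + Q
        if a != 0:                        # block 314: skip the add when a_i == 0
            Q = Q + table[a]
    return Q
```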

The pre-computation stage in the m-ary method needs to store m−1≡2^w−1 points, including the storage for the input point P. Storage estimates in this disclosure include the input point P. An estimate of the time cost for the method may be made. In the cost estimation, operations involving 𝒪 are not counted. The pre-computation stage needs to calculate 2P, 3P, . . . , [2^w−1]P. Since the sum of doublings and additions remains the same, and since doubling is more efficient than addition with the projective coordinates that are typically used in applications, use of doubling operations is generally preferred at this stage. An efficient scheme is to calculate in the following way:
$\begin{matrix} P_{2 i} = 2 P_{i}, \\ P_{2 i + 1} = P_{2 i} + P . \end{matrix} \qquad (1)$

Using this scheme, the pre-computation costs (2^{w−1}−1)D+(2^{w−1}−1)A, where D means point doubling and A means point addition. As for the time cost for the evaluation stage, if an assumption is made that the most significant digit is not zero, i.e., a_{d−1}≠0, then the number of doublings in this stage is w(d−1). If a_i=0, then the addition in block 314 is not needed. If an assumption is made that k is uniformly distributed, then the average number of additions in the evaluation stage is
$\frac{2^{w} - 1}{2^{w}} (d - 1) .$

Therefore, the average time cost in the evaluation stage is approximately
$\{w (d - 1)\} D + \left\{\frac{2^{w} - 1}{2^{w}} (d - 1)\right\} A,$

and the average total time cost is approximately
$\{2^{w - 1} - 1 + w (d - 1)\} D + \left\{2^{w - 1} - 1 + \frac{2^{w} - 1}{2^{w}} (d - 1)\right\} A .$

Modification of m-ary methods can make them more efficient. For example, m-ary methods can also be extended to the sliding window methods. Comb methods are a further category of point multiplication methods. A comb method that can also efficiently calculate scalar multiplication is known. Let an n-bit integer
$k = \sum_{i = 0}^{n - 1} b_{i} 2^{i}$

with b_i∈ {0,1}. For an integer w≧2, set
$d = ⌈ \frac{n}{w} ⌉ .$

Then define: [b_{w−1}, b_{w−2}, . . . , b₁, b₀]≡b_{w−1}2^{(w−1)d}+ . . . +b₁2^d+b₀, where (b_{w−1}, b_{w−2}, . . . , b₁, b₀) ∈ Z₂^w. In a manner that is consistent with these calculations, the comb method 400 of FIG. 4 uses a binary matrix of w rows and d columns to represent an integer k, and processes the matrix column-wise.

FIG. 4 shows a fixed-base comb method 400 for scalar multiplication. At block 402, input to the fixed-base comb method 400 includes a point P, an integer
$k = \sum_{i = 0}^{n - 1} b_{i} 2^{i},$

b_i ∈ {0,1}, and a window width w≧2. A pre-computational stage is seen in blocks 404-408. At block 404, [b_{w−1}, . . . , b₁, b₀]P are computed for all (b_{w−1}, b_{w−2}, . . . , b₁, b₀) ∈ Z₂^w. At block 406, determine k=K^{w−1}∥ . . . ∥K¹∥K⁰, where each K^j is a bit-string of length d. It may be necessary to pad with one or more 0 (zero) on the left in this operation. Let K_i^j denote the i-th bit of K^j. Define 𝕂_i≡[K_i^{w−1}, . . . , K_i¹, K_i⁰]. At block 408, the point Q is set initially at the point at infinity, i.e. set Q=𝒪.

Blocks 410-414 represent an evaluation stage. At block 410 a loop is entered, such as a for-loop with index i. The loop counts down from i=d−1 to 0 by −1. Within the loop: at block 412, set Q=2Q; and at block 414, set Q=Q+𝕂_iP. At block 416, return Q, wherein Q=kP.
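A sketch of the fixed-base comb method 400 is given below under the same stand-in assumption (integers in place of curve points). The function name `comb_multiply`, the bit-mask indexing of the pre-computed table, and the requirement that k fit in n bits are assumptions of the sketch.

```python
# Minimal sketch of the fixed-base comb method of FIG. 4.
# Assumptions: integers stand in for curve points; 0 is the point at infinity;
# k has at most n bits.

def comb_multiply(k, P, w, n, zero=0):
    d = -(-n // w)                         # d = ceil(n / w)
    # base[j] = 2^(j*d) P, obtained with (w-1)*d doublings in total.
    base, R = [], P
    for j in range(w):
        base.append(R)
        if j < w - 1:
            for _ in range(d):
                R = R + R
    # Block 404: table[c] = [b_{w-1}, ..., b_0]P, where bit j of c is b_j.
    table = [zero] * (1 << w)
    for c in range(1, 1 << w):
        low = c & -c                       # lowest set bit of c
        table[c] = table[c ^ low] + base[low.bit_length() - 1]
    # Blocks 408-414: process the w x d bit matrix column by column.
    Q = zero
    for i in range(d - 1, -1, -1):
        Q = Q + Q                          # block 412: a single doubling
        col = 0
        for j in range(w):                 # bit i of K^j is bit j*d + i of k
            col |= ((k >> (j * d + i)) & 1) << j
        Q = Q + table[col]                 # block 414
    return Q
```

Each iteration performs one doubling rather than the w doublings of the m-ary sketch, which is the source of the comb method's advantage in the evaluation stage.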

The comb method stores 2^w−1 points in the pre-computation stage. An estimate can be made of the time cost of the comb method. In the pre-computation stage, [b_{w−1}, . . . , b₁, b₀]P needs to be calculated for (b_{w−1}, . . . , b₁, b₀) ∈ Z₂^w. To achieve this, 2^dP, 2^{2d}P, . . . , 2^{(w−1)d}P are first calculated, which costs (w−1)d doubling operations. Then all possible combinations of only two nonzero bits in [b_{w−1}, . . . , b₁, b₀]P are calculated from the results of doubling operations. There are C_w² such combinations. Each combination uses one point addition. There are C_w² point additions in this step. In the next step, all combinations with exactly three nonzero bits are calculated. There are C_w³ such combinations. Each needs one point addition from the previously calculated results. Therefore, this step costs C_w³ point additions. This procedure continues until all bits are nonzero. The total number of point additions in the pre-computation stage is therefore
$\sum_{i = 2}^{w} C_{w}^{i} = \sum_{i = 0}^{w} C_{w}^{i} - C_{w}^{1} - C_{w}^{0} = 2^{w} - w - 1.$

And therefore, the time cost in the pre-computation stage is

{(w−1)d}D+{(2^w−w−1)}A.

To estimate the time cost in the evaluation stage, we assume the most significant column of {𝕂_i} is not zero, i.e., 𝕂_{d−1}≠0. Then the number of doubling operations in the evaluation stage is (d−1). If 𝕂_i=0, then the point addition in block 414 is not needed. If we assume k is uniformly distributed, the probability that 𝕂_i≠0 is
$\frac{2^{w} - 1}{2^{w}},$

and the average number of point additions is
$\frac{2^{w} - 1}{2^{w}} (d - 1) .$

Therefore, the average time cost in the evaluation stage is approximately
$\{(d - 1)\} D + \left\{\frac{2^{w} - 1}{2^{w}} (d - 1)\right\} A .$

This cost is much smaller than the time cost in the evaluation stage for the m-ary method, which is
$\{w (d - 1)\} D + \left\{\frac{2^{w} - 1}{2^{w}} (d - 1)\right\} A$

as derived previously. This gain at the evaluation stage comes at the cost of a higher time cost at the pre-computation stage for the comb method. The total time cost of the comb method is
$\{(w - 1) d + (d - 1)\} D + \left\{(2^{w} - w - 1) + \frac{2^{w} - 1}{2^{w}} (d - 1)\right\} A = \{w d - 1\} D + \left\{(2^{w} - w - 1) + \frac{2^{w} - 1}{2^{w}} (d - 1)\right\} A .$
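For a concrete feel of these estimates, the cost expressions for the m-ary and comb methods can be evaluated numerically. The sketch below merely transcribes the formulas derived in the text; the function names and the example values w=4, d=40 are assumptions chosen for illustration and are not measured results.

```python
# Minimal sketch: evaluate the average cost formulas for the m-ary and comb
# methods. Each function returns (doublings, additions) for the total cost.
from fractions import Fraction

def m_ary_cost(w, d):
    nonzero = Fraction(2**w - 1, 2**w)        # probability a digit is nonzero
    pre = (2**(w - 1) - 1, 2**(w - 1) - 1)    # pre-computation via Eq. 1
    ev = (w * (d - 1), nonzero * (d - 1))     # evaluation stage
    return pre[0] + ev[0], pre[1] + ev[1]

def comb_cost(w, d):
    nonzero = Fraction(2**w - 1, 2**w)        # probability a column is nonzero
    pre = ((w - 1) * d, 2**w - w - 1)         # pre-computation stage
    ev = (d - 1, nonzero * (d - 1))           # evaluation stage
    return pre[0] + ev[0], pre[1] + ev[1]

# Illustrative parameters (assumed): w = 4, d = 40.
M_ARY_TOTAL = m_ary_cost(4, 40)
COMB_TOTAL = comb_cost(4, 40)
```

With these example parameters the comb evaluation stage needs d−1=39 doublings against w(d−1)=156 for the m-ary method, while most of the comb method's doublings move into the pre-computation stage, which can be done in advance for a fixed base point.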

Side-Channel Attacks and Countermeasures

Two types of power analysis attacks are the Simple Power Analysis (SPA) and the Differential Power Analysis (DPA). The SPA analyzes a single trace of power consumption in a crypto-device during scalar multiplication. A branch instruction condition can be identified from the recorded power consumption data. This represents continuity of the elliptic curve doubling operation. If the double-and-add method is used in calculating scalar multiplication, the value b_i of each bit of the secret multiplier k is revealed by this attack. For scalar multiplication methods, though SPA cannot deduce the value for each digit s_i or a_i for add-and-subtract or m-ary methods, or 𝕂_i in the comb method, it can detect if the digit or 𝕂_i is zero or not, which constitutes an information leak.

The DPA records many power traces of scalar multiplications, and uses correlation among the records and error correction techniques to deduce some or all digits of the secret k. DPA is more complex but also more powerful than SPA. An SPA-resistant scalar multiplication method is not necessarily resistant to DPA attacks; however, many countermeasures can be used to transform an SPA-resistant method into a DPA-resistant method. One countermeasure is to make execution, and thus power consumption, different for identical inputs. Randomization is usually employed to achieve this effect. A number of randomizing approaches are feasible, including: randomizing the input point in projective coordinates; randomizing the exponential parameter representation; randomizing the elliptic curve equation; and randomizing the field representation. Additionally, each of these randomizing approaches can be applied to transform SPA-resistant methods disclosed herein to be resistant to DPA attacks.

To be effective, an ECC must provide countermeasures to SPA attacks. The particular approach of the implementations of the ECC disclosed herein is to make execution of scalar multiplication independent of any specific value of the multiplier k. For example, a simple way to make the double-and-add method resistant to SPA attacks is to remove the branching operation in the method so that the same operations are applied regardless of whether b_i is 0 or 1. Accordingly, a double-and-add-always method may be configured.

FIG. 5 shows a double-and-add-always method 500 to provide SPA resistance. At block 502, input to the double-and-add-always method 500 includes a point P and an n-bit integer
$k = \sum_{i = 0}^{n - 1} b_{i} 2^{i},$

b_i ∈ {0,1}. At block 504, the point Q₀ is set initially at the point at infinity, i.e. set Q₀=𝒪. At block 506, a loop is entered, such as a for-loop with index i. The loop counts down from i=n−1 to 0 by −1. Within the loop: at block 508, set Q₀=2Q₀; at block 510, set Q₁=Q₀+P; and at block 512, set Q₀=Q_{b_i}. At block 514, return Q=Q₀, wherein Q is of the form Q=kP.
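The structure of method 500 may be sketched as follows. The sketch only illustrates that one doubling and one addition are performed in every iteration; Python itself gives no constant-time or constant-power guarantees, integers stand in for curve points, and the function name is hypothetical.

```python
# Minimal sketch of the double-and-add-always method of FIG. 5.
# Assumptions: integers stand in for curve points; 0 is the point at infinity;
# k has at most n bits. This shows the operation sequence only and is not a
# hardened implementation.

def double_and_add_always(k, P, n, zero=0):
    Q = [zero, zero]
    for i in range(n - 1, -1, -1):
        Q[0] = Q[0] + Q[0]               # block 508: always double
        Q[1] = Q[0] + P                  # block 510: always add
        Q[0] = Q[(k >> i) & 1]           # block 512: keep Q_0 or Q_1 by bit b_i
    return Q[0]                          # block 514: Q = kP
```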

Another strategy to protect against SPA attack is to extend 𝕂_i in the comb method 400 of FIG. 4 to a signed representation (𝕂′_i, s_i), where each 𝕂′_i is nonzero. The following procedure is used to obtain such a signed representation (𝕂′_i, s_i) for an odd integer k represented by 𝕂_i, 0≦i<d, in the comb method. Let s₀=1 and construct the rest by setting

(𝕂′_i, s_i)=(𝕂_{i−1}, s_{i−1})
(𝕂′_{i−1}, s_{i−1})=(𝕂_{i−1}, −s_{i−1})

if 𝕂_i=0, and

(𝕂′_i, s_i)=(𝕂_i, s_{i−1})
(𝕂′_{i−1}, s_{i−1})=(𝕂_{i−1}, s_{i−1})

otherwise.

This comb method applies the signed representation to the conventional comb method to calculate (k+1)P for even k and (k+2)P for odd k. The point 2P is then calculated. The point P or 2P is subtracted from the result of the conventional comb method to obtain the desired point kP. This comb method has the same time and space cost as the original comb method has in the pre-computation stage, i.e., storage of 2^w−1 points and {(w−1)d}D+{(2^w−w−1)}A. The evaluation stage costs d−1 point additions and d−1 doublings. The last stage after the conventional comb method costs one doubling and one subtraction. Therefore the total cost is (w−1)d+(d−1)+1=wd doubling operations and (2^w−w−1)+(d−1)+1=2^w−w+d−1 adding operations. Compared with the method 400 of FIG. 4, this method has the same storage cost and a slightly higher time cost.

A further SPA-resistant m-ary scalar multiplication method for an integer
$k = \sum_{i = 0}^{d} a_{i} 2^{w i}$

with a_i ∈ {0,1, . . . , 2^w−1} first converts k to another representation
$k = \sum_{i = 0}^{d^{'}} a_{i}^{'} 2^{w i}$

such that a′_i ∈ {−2^w, ±1, ±2, . . . , ±(2^{w−1}−1), 2^{w−1}} and d′ is either d or d+1. Intuitively, this recoding method replaces 0 digits by −2^w and adjusts the next more significant digit to keep k unchanged. The recoding method is expressed recursively with two auxiliary values c_i and t_i, 0≦c_i≦2 and 0≦t_i≦2^w+1. Set c₀=0. Then, for i=0, . . . , d+1, let t_i=a_i+c_i and
$(c_{i + 1}, a_{i}^{'}) = \begin{cases} (1, - 2^{w}) & \text{if } t_{i} = 0 \\ (0, t_{i}) & \text{if } 0 < t_{i} \leq 2^{w - 1} \\ (1, - 2^{w} + t_{i}) & \text{if } 2^{w - 1} < t_{i} < 2^{w} \\ (2, - 2^{w}) & \text{if } t_{i} = 2^{w} \\ (1, 1) & \text{if } t_{i} = 2^{w} + 1 \end{cases}$

Note that the equation c_{i+1}·2^w+a′_i=t_i always holds.
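The per-digit rule and its invariant can be checked with a short sketch. Only the single recoding step is sketched, not the full recursion or its termination; the function name `recode_step` is hypothetical, and assigning the boundary value t_i=2^{w−1} to the (0, t_i) case is an assumption made so that every value of t_i is covered and the digit 2^{w−1} is reachable.

```python
# Minimal sketch: one step of the recoding rule (c_{i+1}, a'_i) as a function
# of t_i = a_i + c_i. Assumption: t = 2^(w-1) falls in the (0, t) case.

def recode_step(t, w):
    full, half = 1 << w, 1 << (w - 1)
    if t == 0:
        return 1, -full                  # replace a zero digit by -2^w, carry 1
    if t <= half:
        return 0, t                      # small digits are kept unchanged
    if t < full:
        return 1, t - full               # large digits become negative, carry 1
    if t == full:
        return 2, -full                  # carry 2 so that 2*2^w - 2^w = 2^w
    if t == full + 1:
        return 1, 1
    raise ValueError("t must satisfy 0 <= t <= 2^w + 1")
```

In every case the returned pair satisfies c·2^w+a′=t, and the digit a′ is never zero.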

After the conversion, this scalar multiplication method is exactly the same as the m-ary method 300 of FIG. 3 for m=2^w, except that the pre-computation stage needs to calculate 2P, 3P, . . . , [2^{w−1}]P, and [−2^w]P. Since the sum of doublings and additions remains the same and doubling is more efficient than addition with the projective coordinates that are typically used in applications, it is preferable to use as many doubling operations as possible in this stage. The most efficient scheme is to calculate using Eq. 1, resulting in the cost (2^{w−2}+1)D+(2^{w−2}−1)A for the pre-computation stage. The evaluation stage costs w(d−1)D+(d−1)A if d′=d or wdD+dA if d′=d+1. Therefore, the total cost is at least (2^{w−2}+wd−w+1)D+(2^{w−2}+d−2)A. Additionally, the pre-computation stage remembers 2^{w−1}+1 points.

A further SPA-resistant odd-only m-ary scalar multiplication method is based on the idea of converting an odd integer
$k = \sum_{i = 0}^{d} a_{i} 2^{w i}$

with a_i∈ {0,1, . . . , 2^w−1} to another representation
$k = \sum_{i = 0}^{d} a_{i}^{'} 2^{wi}$

such that a′_i∈ {±1,±3, . . . ,±(2^w−1)}. This can be achieved with the following recoding method.

FIG. 6 shows a further SPA-resistant odd-only m-ary recoding method 600 for an odd scalar. At block 602, input to the odd-only m-ary recoding method 600 includes an odd n-bit integer
$k = \sum_{i = 0}^{d} a_{i} 2^{wi}$

with a_i ∈ {0,1, . . . , 2^w−1}. At block 604, a loop is entered, such as a for-loop with index i. The loop counts from i=0 to d−1 by 1. Within the loop: at block 606, if a_i is odd, then set a′_i=a_i; and at block 608, if a_i is even, then set a′_i=a_i+1 and a′_{i−1}=a′_{i−1}−2^w. At block 610, return output
$k = \sum_{i = 0}^{d} a_{i}^{'} 2^{wi}$

with a′_i∈ {±1,±3, . . . ,±(2^w−1)}.
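The recoding of blocks 604-610 may be sketched as follows. The function name `odd_only_recode` is hypothetical, and the sketch assumes that k is odd and fits in d base-2^w digits, all of which are processed by the loop.

```python
# Minimal sketch of the odd-only m-ary recoding of FIG. 6.
# Assumptions: k is odd and k < 2^(w*d); digits are returned least
# significant first.

def odd_only_recode(k, w, d):
    if k % 2 == 0:
        raise ValueError("k must be odd")
    m = 1 << w
    a = [(k >> (w * i)) & (m - 1) for i in range(d)]
    out = [0] * d
    for i in range(d):
        if a[i] & 1:
            out[i] = a[i]                # block 606: odd digits are kept
        else:
            out[i] = a[i] + 1            # block 608: make the digit odd ...
            out[i - 1] -= m              # ... and compensate one digit lower
    return out
```

Adding 1 to digit i contributes 2^{wi}, and subtracting 2^w from digit i−1 removes the same amount, so the value of k is preserved while every digit becomes odd.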

Using this conversion, the m-ary method 300 of FIG. 3 is used with m=2^w to calculate [k+1]P for even k and [k+2]P for odd k. P or 2P is then subtracted from the result of method 300 of FIG. 3 to obtain the desired point kP. Method 600 needs to store 2^{w−1} points P, 3P, 5P, . . . , [2^w−1]P. These points can be calculated in the pre-computation stage by first calculating 2P and then the rest iteratively with the equation [i]P=2P+[i−2]P. The cost in this stage is 1D+(2^{w−1}−1)A. The for-loop in the evaluation stage costs w(d−1)D+(d−1)A and the post-processing stage costs one doubling to calculate 2P and one subtraction to subtract either P or 2P. The total cost for this odd-only m-ary method is therefore (wd−w+2)D+(2^{w−1}+d−1)A.

Odd-Only Comb Method

An embodiment of an odd-only comb method includes significant advancements over known comb methods in ECCs. The odd-only comb embodiment transforms all the comb bit-columns {𝕂_i≡[K_i^{w−1}, . . . , K_i¹, K_i⁰]} of a scalar k into a signed nonzero representation {𝕂′_i≡[K′_i^{w−1}, . . . , K′_i¹, K′_i⁰]≠0}. In a significant departure from known technology, every 𝕂′_i generated is a signed odd integer. More specifically, the embodiment generates K′_i⁰ ∈ {1, 1̄} and K′_i^j ∈ {0, K′_i⁰}, j≠0, for each comb bit-column 𝕂′_i≡[K′_i^{w−1}, . . . , K′_i¹, K′_i⁰], where 1̄ is defined as −1. Advantageously, the pre-computation stage only needs to calculate and save half of the points in conventional comb methods in ECCs.

FIG. 7 shows exemplary detail of a signed odd-only comb recoding method 700 for an odd scalar. Method 700 performs elliptic curve calculations adapted for use in comb elliptic curve point multiplication. The recoding method 700 is configured for a window width w≧2 where
$d = ⌈ \frac{n + 1}{w} ⌉ .$

Input to the method 700 includes an odd n-bit integer
$k = \sum_{i = 0}^{n - 1} b_{i} 2^{i}$

with b_i∈ {0,1}. Output from the method includes
$k = \sum_{i = 0}^{wd - 1} b_{i}^{'} 2^{i} \equiv K^{' w - 1} \| \dots \| K^{' 1} \| K^{' 0},$

where each K′^j is a binary string d bits long, padded with 0 on the left if necessary. Within FIG. 7, let K′_r^j denote the r-th bit of K′^j, i.e., K′_r^j≡b′_{jd+r}. Define 𝕂′_r≡[K′_r^{w−1}, . . . , K′_r¹, K′_r⁰]. The output satisfies K′_r⁰ ∈ {1, 1̄} and K′_r^j ∈ {0, K′_r⁰} for j≠0 and 0≦r<d.

Referring to FIG. 7, the operation of the signed odd-only comb recoding method 700 for an odd scalar can be understood. At block 702, input is set to an odd n-bit integer according to
$k = \sum_{i = 0}^{n - 1} b_{i} 2^{i}$

with b_i ∈ {0,1}. At block 704, a loop is entered, such as a for-loop with index i. The loop counts from i=0 to d−1 by 1. Within the loop: at block 706, if b_i=1 then set b′_i=1; and at block 708, if b_i=0 then set b′_i=1 and b′_{i−1}=1̄. Upon leaving the loop at block 710, set
$e = ⌊ \frac{k}{2^{d}} ⌋$

and i=d. At block 712, a loop is entered, such as a while-loop with index i, configured to continue while i<wd. Within the loop: at block 714, the situation wherein e is odd and b′_{i mod d}=1̄ is addressed by setting b′_i=1̄ and
$e = ⌊ \frac{e}{2} ⌋ + 1;$

and at block 716, the alternative situation is addressed by setting b′_i=e mod 2, and
$e = ⌊ \frac{e}{2} ⌋ .$

At block 718, i is incremented by setting i=i+1. At block 720, the output
$k = \sum_{i = 0}^{wd - 1} b_{i}^{'} 2^{i} \equiv K^{' w - 1} \| \dots \| K^{' 1} \| K^{' 0},$

where each K′^j is a binary string d bits long, having {b′_i} as constituent bits, and padded with 0 on the left if necessary, is returned.
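Blocks 702-720 may be sketched as follows. The function name `signed_odd_comb_recode`, the list encoding of the signed bits (with −1 standing for 1̄), and the default choice of n as the bit length of k are assumptions of this sketch.

```python
# Minimal sketch of the signed odd-only comb recoding of FIG. 7.
# Assumptions: k is a positive odd integer of at most n bits; the signed bits
# b'_i are returned least significant first, with -1 standing for "1 bar".

def signed_odd_comb_recode(k, w, n=None):
    if k <= 0 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    if n is None:
        n = k.bit_length()
    d = -(-(n + 1) // w)                 # d = ceil((n + 1) / w)
    b = [0] * (w * d)
    for i in range(d):                   # blocks 704-708
        b[i] = 1
        if not (k >> i) & 1:             # b_i == 0: uses 0 1 == 1 (1 bar)
            b[i - 1] = -1
    e = k >> d                           # block 710: e = floor(k / 2^d)
    for i in range(d, w * d):            # blocks 712-718
        if (e & 1) and b[i % d] == -1:   # block 714
            b[i] = -1
            e = (e >> 1) + 1
        else:                            # block 716
            b[i] = e & 1
            e >>= 1
    return b, d
```

For example, with k=5 and w=2 the sketch gives d=2 and signed bits (least significant first) −1, 1, −1, 1, i.e. −1+2−4+8=5, and both comb bit-columns carry a single sign.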

The method 700 first converts each bit of the last (i.e., the least significant) d bits to either 1 or 1̄ by exploiting the fact that 1≡1 1̄ 1̄ . . . 1̄. In other words, each bit K′_r⁰, 0≦r<d, in K′⁰ is either 1 or 1̄. The rest of the recoding method processes each bit from the d-th bit towards the highest bit. If the current bit is 1 and has a different sign from the least significant bit in the same comb bit-column 𝕂′_i, the current bit is set to 1̄ and the value consisting of the remaining higher bits is incremented by 1 to keep the value of k unchanged. This process generates wd bits {b′_i} to represent an odd n-bit integer k.

Therefore, the method 700 of FIG. 7, when given an odd scalar k, outputs a sequence of bit-strings {b′_i} and {𝕂′_r≡[K′_r^{w−1}, . . . , K′_r¹, K′_r⁰]} such that
$k = \sum_{i = 0}^{wd - 1} b_{i}^{'} 2^{i}$

and for each 𝕂′_r, K′_r⁰ ∈ {1, 1̄} and K′_r^j ∈ {0, K′_r⁰} for j≠0 and 0≦r<d, where K′_r^j≡b′_{jd+r}. The preceding statement can be proved as follows. It is obvious that each 𝕂′_r generated by method 700 satisfies the conditions that K′_r⁰ ∈ {1, 1̄} and K′_r^j ∈ {0, K′_r⁰} for j≠0: Since k is odd, b₀=1. Blocks 704-708 set the last bit b′_r in 𝕂′_r to be either 1 or 1̄. Therefore, K′_r⁰≡b′_r ∈ {1, 1̄}. In method 700, if b′_{i mod d}=1̄, b′_i is set to 1̄ by block 714 or to 0 by block 716. If b′_{i mod d}=1, b′_i is set to either 0 or 1 by block 716. This means that all bits except the least significant bit in each 𝕂′_r either are 0 or have the same value as the least significant bit.

To prove
$k = \sum_{i = 0}^{wd - 1} b_{i}^{'} 2^{i},$

we first prove
$\begin{matrix} \sum_{i = 0}^{d - 1} b_{i} 2^{i} = \sum_{i = 0}^{d - 1} b_{i}^{'} 2^{i} . & (2) \end{matrix}$

This can be done by induction to prove
$\sum_{i = 0}^{j} b_{i} 2^{i} = \sum_{i = 0}^{j} b_{i}^{'} 2^{i}$

for 0≦j<d. The equation holds for j=0 since k is odd. If the equation is true for j−1<d, then blocks 706-708 in method 700 ensure that the equation is also true for j<d. By setting j=d−1, we have the desired equation.

Denote the value of e as e_i when it comes into the i-th loop before block 714, where i≧d. We assert that
$\begin{matrix} e_{i} 2^{i} + \sum_{j = 0}^{i - 1} b_{j}^{'} 2^{j} = k & (3) \end{matrix}$

is always true for i≧d. This can be done by induction. By using Eq. 2, we have
$k = e_{d} 2^{d} + \sum_{i = 0}^{d - 1} b_{i} 2^{i} = e_{d} 2^{d} + \sum_{i = 0}^{d - 1} b_{i}^{'} 2^{i} .$

This proves that Eq. 3 holds for i=d. Assume Eq. 3 is true for i≧d. If e_i is odd and b′_{i mod d}=1̄, we have b′_i=1̄ and
$e_{i + 1} = ⌊ \frac{e_{i}}{2} ⌋ + 1 = \frac{e_{i} - 1}{2} + 1,$

and
$e_{i + 1} 2^{i + 1} + \sum_{j = 0}^{i} b_{j}^{'} 2^{j} = \left(\frac{e_{i} - 1}{2} + 1\right) 2^{i + 1} - 2^{i} + \sum_{j = 0}^{i - 1} b_{j}^{'} 2^{j} = e_{i} 2^{i} + \sum_{j = 0}^{i - 1} b_{j}^{'} 2^{j} = k .$

The same procedure can be used to prove that
$e_{i + 1} 2^{i + 1} + \sum_{j = 0}^{i} b_{j}^{'} 2^{j} = e_{i} 2^{i} + \sum_{j = 0}^{i - 1} b_{j}^{'} 2^{j} = k$

when e_i is even or b′_{i mod d}=1. This means that Eq. 3 is also true for i+1. Therefore Eq. 3 holds for i≧d.

The last thing we need to prove is e_wd=0. Because k is an integer of n bits,
$e_{d} = ⌊ \frac{k}{2^{d}} ⌋$

is an integer of n−d bits: e_d<2^n−d. We would like to use induction to prove that for n≧i≧d

e≦2ⁿ⁻ⁱ. (4)

We have already proved when i=d. Suppose it is true for i, n>i≧d. If e_iis odd, the inequality implies that e_i≦2ⁿ⁻ⁱ−1. In this case, blocks 714-716 give
$e_{i + 1} \leq ⌊ \frac{e_{i}}{2} ⌋ + 1 \leq ⌊ \frac{2^{n - i} - 1}{2} ⌋ + 1 = 2^{n - (i + 1)} - 1 + 1 = 2^{n - (i + 1)},$

i.e., Eq. 4 is true for i+1 in this case. If e_i is even, then from block 716,
$e_{i + 1} \leq ⌊ \frac{e_{i}}{2} ⌋ \leq ⌊ \frac{2^{n - i}}{2} ⌋ = 2^{n - (i + 1)} .$

Eq. 4 still holds. In other words, Eq. 4 is true for i+1≦n. Therefore, Eq. 4 is proved to be true for n≧i≧d. Eq. 4 implies that e_n≦2⁰=1.

Since
$d = ⌈ \frac{n + 1}{w} ⌉,$

we have n+1≦wd. If n+1=wd, then e_{wd−1}≡e_n≦1. If n+1<wd, then from blocks 714-716,
$e_{n + 1} \leq ⌊ \frac{e_{n}}{2} ⌋ + 1 \leq ⌊ \frac{1}{2} ⌋ + 1 = 1.$

Continuing with this process, we also have e_{wd−1}≦1. In other words, we always have e_{wd−1}≦1. From blocks 706-708, we have b′_{d−1}=1. When i=wd−1 in the loop of blocks 712-718, since b′_{(wd−1) mod d}=b′_{d−1}=1, block 716 is executed, i.e.,
$e_{wd} = ⌊ \frac{e_{wd - 1}}{2} ⌋ \leq ⌊ \frac{1}{2} ⌋ = 0.$

Applying this result to Eq. 3 yields the desired result,
$k = \sum_{i = 0}^{wd - 1} b_{i}^{'} 2^{i} .$
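The recoding whose correctness is proved above can be sketched in Python as follows. This is a minimal sketch rather than a definitive form of method 700: the function names `recode_odd` and `comb_columns` are hypothetical, the step structure is reconstructed from the proof (the low d bits are first made 1 or 1̄, then e=⌊k/2^d⌋ is consumed one bit at a time), and the digit 1̄ is represented by the integer −1.

```python
def recode_odd(k, w, n=None):
    """Return (b, d) with digits b[i] in {-1, 0, 1} and sum(b[i] * 2**i) == k."""
    if k <= 0 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    n = k.bit_length() if n is None else n
    if n < k.bit_length():
        raise ValueError("k does not fit in n bits")
    d = -(-(n + 1) // w)              # d = ceil((n + 1) / w)
    b = [0] * (w * d)
    # Low d bits: every digit becomes 1 or -1 (cf. blocks 704-708).
    for i in range(d):
        b[i] = 1
        if not (k >> i) & 1:          # original bit is 0: flip the digit below
            b[i - 1] = -1
    e = k >> d                        # e_d = floor(k / 2**d)
    # Higher bits: each digit is 0 or copies the low digit of its column.
    for i in range(d, w * d):
        if e & 1 and b[i % d] == -1:
            b[i] = -1                 # cf. block 714
            e = (e >> 1) + 1
        else:
            b[i] = e & 1              # cf. block 716
            e >>= 1
    assert e == 0                     # e_{wd} = 0, as shown in the proof
    return b, d


def comb_columns(b, w, d):
    """Column r is [K_r^0, ..., K_r^(w-1)] with K_r^j = b[j*d + r]."""
    return [[b[j * d + r] for j in range(w)] for r in range(d)]
```

For example, `recode_odd(11, 2)` gives d=3 and the digits [1, −1, 1, 1, 0, 0] (least significant first), i.e. 11=1−2+4+8, and every column returned by `comb_columns` starts with 1 or −1.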

The recoding method 700 of FIG. 7 works only with odd scalars. If a scalar k is an even integer, the method 800 of FIG. 8 may be utilized. In FIG. 8, we first calculate the point multiplication for the odd scalar k′=k+1, and then subtract P from the result to obtain the desired result kP.

FIG. 8 shows the operation of the signed odd-only comb method 800. At block 802, input including a point P and an integer k>0 is received. A pre-computational stage includes blocks 804-810. At block 804, pre-computational calculations are made, including computation of [b_{w−1}, . . . , b₂, b₁, 1]P for all (b_{w−1}, . . . , b₂, b₁)∈(Z₂)^{w−1}. At block 806, if k is even then set k′=k+1; otherwise set k′=k. (Note that we could also set k′=k−1 if k is even and set k′=k otherwise.) At block 808, method 700 of FIG. 7 is applied to k′ to compute the corresponding comb bit-columns 𝕂′₀, 𝕂′₁, . . . , 𝕂′_{d−1}. At block 810, the point Q is set initially at the point at infinity, i.e. set Q=O. Blocks 812-818 constitute an evaluation stage. At block 812, a loop is entered, such as a for-loop with index i. The loop counts down from i=d−1 to 0 by −1. Within the loop: at block 814, set Q=2Q; and at block 816, set Q=Q+𝕂′_iP. At block 818, the output in the form of Q=kP is returned, wherein if k is even then Q−P is returned, otherwise Q is returned. (Note that if k′=k−1, then we add P when k is even.)

If the least significant bit of 𝕂′_i is 1̄, we have 𝕂′_i=−|𝕂′_i|. In this case, block 816 in method 800 actually executes Q=Q−|𝕂′_i|P.
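The flow of method 800 can be illustrated with a short Python sketch. Several things here are assumptions made only for illustration: the elliptic curve group is replaced by ordinary integers under addition (so kP is simply k·P, doubling is Q+Q and negation is a sign change), the helper names are hypothetical, and the recoding step is a reconstruction of method 700 from its description. A real implementation would substitute curve point addition, doubling and negation.

```python
def recode_odd(k, w, n):
    """Signed odd-only recoding (sketch of method 700); k odd, k < 2**n."""
    d = -(-(n + 1) // w)                       # ceil((n + 1) / w)
    b = [0] * (w * d)
    for i in range(d):
        b[i] = 1
        if not (k >> i) & 1:
            b[i - 1] = -1
    e = k >> d
    for i in range(d, w * d):
        if e & 1 and b[i % d] == -1:
            b[i], e = -1, (e >> 1) + 1
        else:
            b[i], e = e & 1, e >> 1
    return b, d


def precompute(P, w, d):
    """table[t] = [b_{w-1}, ..., b_1, 1]P, where bit j-1 of t is b_j."""
    base = [P]                                 # base[j] = 2**(j*d) * P
    for _ in range(w - 1):
        T = base[-1]
        for _ in range(d):
            T = T + T                          # one point doubling
        base.append(T)
    table = []
    for t in range(1 << (w - 1)):
        T = base[0]
        for j in range(1, w):
            if (t >> (j - 1)) & 1:
                T = T + base[j]
        table.append(T)
    return table


def comb_mult_800(k, P, w, n):
    """Compute kP for 0 < k < 2**n in the integer stand-in group."""
    k_odd = k + 1 if k % 2 == 0 else k         # block 806
    b, d = recode_odd(k_odd, w, n)             # block 808
    table = precompute(P, w, d)                # block 804
    Q = 0                                      # block 810: identity element
    for i in range(d - 1, -1, -1):             # block 812
        Q = Q + Q                              # block 814
        sign = b[i]                            # low digit of column i: 1 or -1
        t = sum(1 << (j - 1) for j in range(1, w) if b[j * d + i])
        Q = Q + sign * table[t]                # block 816: add or subtract
    return Q - P if k % 2 == 0 else Q          # block 818
```

Because every nonzero digit in a column shares the sign of the column's lowest digit, one table of 2^{w−1} entries suffices: the sign selects between addition and subtraction of the same stored value.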

In practical ECC applications, only elliptic curve points in a subgroup with a large prime order ρ are actually used. In this case, the signed odd-only comb method 800 can be modified to remove the post-processing block 818 by exploiting the facts that ρ−k is odd for even k, and [ρ−k]P=−kP. This modified method is described in FIG. 9.

FIG. 9 shows the operation of a signed odd-only comb method for a point of odd order. At block 902, input including a point P of odd order ρ and an integer k>0 is received. A pre-computational stage includes blocks 904-910. At block 904, pre-computational calculations are made. For example, [b_{w−1}, . . . , b₂, b₁, 1]P are computed for all (b_{w−1}, . . . , b₂, b₁)∈(Z₂)^{w−1}. At block 906, k′ is set. In particular, if k is odd, then set k′=k, else set k′=ρ−k. At block 908, method 700 of FIG. 7 is applied to k′, thereby computing the comb bit-columns 𝕂′₀, 𝕂′₁, . . . , 𝕂′_{d−1} corresponding to k′. At block 910, the point Q is set initially at the point at infinity, i.e. set Q=O. Blocks 912-916 constitute an evaluation stage. At block 912, a loop is entered, such as a for-loop with index i. The loop counts down from i=d−1 to 0 by −1. Within the loop: at block 914, set Q=2Q; and at block 916, set Q=Q+(−1)^{k+1}𝕂′_iP. At block 918, Q in the form Q=kP is returned. Note that in block 916 above, Q is set to Q+𝕂′_iP for odd k, or Q−𝕂′_iP for even k.
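A compact Python sketch of method 900 follows. As an assumption for illustration, the subgroup of prime order ρ is modeled by the integers modulo ρ under addition, so that [ρ−k]P=−kP holds exactly as in the text; the function names are hypothetical, the recoding is a reconstruction of method 700, and each column multiple 𝕂′_iP is computed directly instead of being looked up in a pre-computed table, purely to keep the example short.

```python
def recode_odd(k, w, d):
    """Signed odd-only recoding (sketch of method 700) into w*d digits."""
    b = [0] * (w * d)
    for i in range(d):
        b[i] = 1
        if not (k >> i) & 1:
            b[i - 1] = -1
    e = k >> d
    for i in range(d, w * d):
        if e & 1 and b[i % d] == -1:
            b[i], e = -1, (e >> 1) + 1
        else:
            b[i], e = e & 1, e >> 1
    return b


def comb_mult_900(k, P, rho, w):
    """Compute kP in the stand-in group Z_rho, rho an odd prime, 0 < k < rho."""
    if not 0 < k < rho:
        raise ValueError("expected 0 < k < rho")
    n = rho.bit_length()                    # scalars are treated as n-bit integers
    d = -(-(n + 1) // w)                    # ceil((n + 1) / w)
    k_odd = k if k % 2 else rho - k         # block 906: rho - k is odd for even k
    b = recode_odd(k_odd, w, d)             # block 908
    sign = 1 if k % 2 else -1               # (-1)**(k + 1)
    Q = 0                                   # block 910
    for i in range(d - 1, -1, -1):          # block 912
        Q = (Q + Q) % rho                   # block 914
        column = sum(b[j * d + i] * P * 2 ** (j * d) for j in range(w))
        Q = (Q + sign * column) % rho       # block 916
    return Q                                # block 918: no post-processing
```

Note that the loop performs exactly one doubling and one addition or subtraction per iteration for every k, and nothing follows the loop.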

In the recoding method 700 of FIG. 7, d is defined as
$⌈ \frac{n + 1}{w} ⌉$

instead of
$⌈ \frac{n}{w} ⌉$

used in the original fixed base comb method 400 of FIG. 4. If n is not divisible by w, then d in our recoding method is exactly the same as in the original comb method, i.e.,
$⌈ \frac{n + 1}{w} ⌉ = ⌈ \frac{n}{w} ⌉ .$

But if n is divisible by w, our method's d is one larger than the d used in method 400, i.e.,
$⌈ \frac{n + 1}{w} ⌉ = 1 + ⌈ \frac{n}{w} ⌉ .$
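The two cases can be checked numerically. The snippet below is only an illustration; the function names and the sample values of n and w are arbitrary choices, not values taken from this specification.

```python
def d_original(n, w):
    """d = ceil(n / w), as used by the original comb method 400."""
    return -(-n // w)


def d_recoded(n, w):
    """d = ceil((n + 1) / w), as used by recoding method 700."""
    return -(-(n + 1) // w)


for n, w in [(160, 4), (163, 4), (192, 6), (256, 5)]:
    extra = d_recoded(n, w) - d_original(n, w)
    print(f"n={n} w={w}: ceil(n/w)={d_original(n, w)}, "
          f"ceil((n+1)/w)={d_recoded(n, w)}, difference={extra}")
```

The difference is 1 exactly when w divides n, and 0 otherwise.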

Increasing d by one would lead to w−1 additional doublings in the pre-computation stage, and one additional addition and one additional doubling in the evaluation stage. Any additional operations are undesirable. Fortunately, most of the additional operations when n is divisible by w can be eliminated by thoughtful manipulation as described in the following modified comb method.

FIG. 10 shows the operation of a signed odd-only comb method for n divisible by w. At block 1002, input including a point P and an n-bit integer k>0 is received. A pre-computation stage includes blocks 1004-1010. At block 1004, pre-computational calculations are made. For example, [b_{w−1}, . . . , b₂, b₁, 1]P are computed for all (b_{w−1}, . . . , b₂, b₁)∈(Z₂)^{w−1}. At block 1006, k mod 4 is evaluated, and k′ is set. In particular: if k mod 4=0, then set k′=k/2+1; if k mod 4=1, then set k′=⌈k/2⌉; if k mod 4=2, then set k′=k/2; and if k mod 4=3, then set k′=⌊k/2⌋. At block 1008, method 700 is applied to k′ to compute the corresponding comb bit-columns 𝕂′₀, 𝕂′₁, . . . , 𝕂′_{d−1}. At block 1010, the point Q is set initially at the point at infinity, i.e. set Q=O. An evaluation stage includes blocks 1012-1026. At block 1012, a loop is entered, such as a for-loop with index i. The loop counts down from i=d−1 to 0 by −1. Within the loop: at block 1014, set Q=2Q; and at block 1016, set Q=Q+𝕂′_iP. At block 1018, if k mod 4=0, then set Q=Q−P. At block 1020, set Q=2Q. At block 1022, if k mod 4=1, then set Q=Q−P; and at block 1024, if k mod 4=3, then set Q=Q+P. At block 1026, return Q in the form Q=kP.

Due to block 1006, the value of d used in method 1000 is equivalent to
$⌈ \frac{n}{w} ⌉,$

the same as the original comb method 400. Compared with method 800, method 1000 saves w−1 doublings in the pre-computation stage when n is divisible by w. In this case, one addition is also saved in the evaluation stage if k mod 4=2. Further performance comparisons of the various methods are given later in this specification.
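The case analysis on k mod 4 in method 1000 can be sketched as follows. As assumptions for illustration only, the group is modeled by ordinary integers under addition, the helper `odd_comb` (a hypothetical name) bundles a reconstruction of recoding method 700 with the double-and-add loop, and each column multiple is computed directly rather than read from a pre-computed table.

```python
def odd_comb(k_odd, P, w, d):
    """Recode odd k_odd (sketch of method 700) and evaluate k_odd * P."""
    b = [0] * (w * d)
    for i in range(d):
        b[i] = 1
        if not (k_odd >> i) & 1:
            b[i - 1] = -1
    e = k_odd >> d
    for i in range(d, w * d):
        if e & 1 and b[i % d] == -1:
            b[i], e = -1, (e >> 1) + 1
        else:
            b[i], e = e & 1, e >> 1
    Q = 0
    for i in range(d - 1, -1, -1):
        Q = Q + Q                                    # doubling
        Q = Q + sum(b[j * d + i] * P * 2 ** (j * d) for j in range(w))
    return Q


def comb_mult_1000(k, P, w, n):
    """Compute kP for an n-bit k > 0, where w divides n."""
    if n % w or not 0 < k < 2 ** n:
        raise ValueError("need w | n and 0 < k < 2**n")
    d = n // w                      # same d as the original comb method
    r = k % 4
    if r == 0:                      # block 1006: k' is odd and has n - 1 bits
        k_odd = k // 2 + 1
    elif r == 1:
        k_odd = (k + 1) // 2        # ceil(k / 2)
    else:
        k_odd = k // 2              # k / 2 for r == 2, floor(k / 2) for r == 3
    Q = odd_comb(k_odd, P, w, d)    # blocks 1008-1016
    if r == 0:
        Q = Q - P                   # block 1018
    Q = Q + Q                       # block 1020
    if r == 1:
        Q = Q - P                   # block 1022
    elif r == 3:
        Q = Q + P                   # block 1024
    return Q
```

Halving k before recoding is what keeps d at n/w: the recoded scalar k′ has only n−1 bits, and the missing factor of two is restored by the single doubling after the loop.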

Methods 800 and 1000 are not SPA-resistant. Even though all the comb bit-columns {𝕂′_i} are nonzero in both methods, the value of the last bit of a scalar k is detectable in block 818 of FIG. 8 by SPA, and information about the last two bits of a scalar k may leak from the steps following block 1016 in FIG. 10 by SPA. Since all 𝕂′_i≠0, the operations in the for-loop of all three comb methods 800, 900 and 1000 form a sequence of alternating point doublings and point additions, DADA . . . DADA, and therefore do not leak any information about the secret scalar k to SPA. This implies that method 900 is an SPA-resistant comb method if we take every scalar k (or more specifically k′ in block 906 of FIG. 9) as an integer of ⌈log₂ρ⌉ bits, where ρ is the order of the point P. That is a typical assumption in studying SPA-resistant methods. By inserting potential dummy operations after the for-loop, we can convert the above SPA-nonresistant methods to SPA-resistant methods. Method 800 can be modified to the following SPA-resistant comb method.

FIG. 11 shows the operation of an SPA-resistant signed odd-only comb method 1100. At block 1102, input including a point P and an integer k>0 is received. A pre-computation stage includes blocks 1104-1110. At block 1104, pre-computational calculations are made. For example, [b_{w−1}, . . . , b₂, b₁, 1]P are computed for all (b_{w−1}, . . . , b₂, b₁)∈(Z₂)^{w−1}. At block 1106, if k is even then set k′=k+1, else set k′=k+2. At block 1108, method 700 is applied to k′ to compute the corresponding comb bit-columns 𝕂′₀, 𝕂′₁, . . . , 𝕂′_{d−1}. At block 1110, the point Q is set initially at the point at infinity, i.e. set Q=O. An evaluation stage includes blocks 1112-1120. At block 1112, a loop is entered, such as a for-loop with index i. The loop counts down from i=d−1 to 0 by −1. Within the loop: at block 1114, set Q=2Q; and at block 1116, set Q=Q+𝕂′_iP. After the loop, at block 1118, set P₂=2P. At block 1120, output in the form Q=kP is returned. In particular, if k is even then return Q−P, else return Q−P₂.
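The post-processing of method 1100 can be sketched in Python as follows. The same assumptions as in any such illustration apply: integers under addition stand in for the curve group, `odd_comb` is a hypothetical helper that bundles a reconstruction of recoding method 700 with the double-and-add loop, and column multiples are computed directly rather than taken from a table. The sketch shows only the sequence of group operations; Python code as such is not hardened against side channels.

```python
def odd_comb(k_odd, P, w, d):
    """Recode odd k_odd (sketch of method 700) and evaluate k_odd * P."""
    b = [0] * (w * d)
    for i in range(d):
        b[i] = 1
        if not (k_odd >> i) & 1:
            b[i - 1] = -1
    e = k_odd >> d
    for i in range(d, w * d):
        if e & 1 and b[i % d] == -1:
            b[i], e = -1, (e >> 1) + 1
        else:
            b[i], e = e & 1, e >> 1
    Q = 0
    for i in range(d - 1, -1, -1):
        Q = Q + Q                                    # one doubling
        Q = Q + sum(b[j * d + i] * P * 2 ** (j * d) for j in range(w))
    return Q


def comb_mult_1100(k, P, w, n):
    """Compute kP with the same operation sequence for even and odd k."""
    if not 0 < k < 2 ** n - 2:
        raise ValueError("k + 2 must fit in n bits")
    d = -(-(n + 1) // w)                       # ceil((n + 1) / w)
    k_odd = k + 1 if k % 2 == 0 else k + 2     # block 1106: always odd, always > k
    Q = odd_comb(k_odd, P, w, d)               # blocks 1108-1116
    P2 = P + P                                 # block 1118: computed for every k
    return Q - P if k % 2 == 0 else Q - P2     # block 1120: always one subtraction
```

Unlike method 800, where the final subtraction happens only for even k, here the doubling of P and one subtraction are executed for every scalar, so the operation sequence after the loop no longer depends on the parity of k.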

As we mentioned previously, in studying SPA-resistant methods a scalar is considered as an integer of ⌈log₂ρ⌉ bits, where ρ is the order of the point P, i.e., n=⌈log₂ρ⌉ in method 700 of FIG. 7. In this case, if ⌈log₂ρ⌉ is divisible by w, the value of d used in method 1100 of FIG. 11 is one larger than the d in the original comb method 400 of FIG. 4, resulting in higher computational complexity. The following SPA-resistant method 1200 does not increase d and therefore removes the increased computational complexity.

FIG. 12 shows the operation of an SPA-resistant signed odd-only comb method 1200 for n=⌈log₂ρ⌉ divisible by w. At block 1202, input including a point P of order ρ and an n-bit integer k>0 is received. A pre-computation stage includes blocks 1204-1212. At block 1204, pre-computational calculations are made. For example, [b_{w−1}, . . . , b₂, b₁, 1]P are computed for all (b_{w−1}, . . . , b₂, b₁)∈(Z₂)^{w−1}. At block 1206, if
$k > \frac{ρ}{2}$

then set k*=ρ−k, else set k*=k. At block 1208, if k* is even then set k′=k*+1, else set k′=k*+2. At block 1210, method 700 is applied to k′ to compute the corresponding comb bit-columns 𝕂′₀, 𝕂′₁, . . . , 𝕂′_{d−1}. At block 1212, the point Q is set initially at the point at infinity, i.e. set Q=O. An evaluation stage includes blocks 1214-1226. At block 1214, a loop is entered, such as a for-loop with index i. The loop counts down from i=d−1 to 0 by −1. Within the loop: at block 1216, set Q=2Q; and at block 1218, if
$k > \frac{ρ}{2}$

then set Q=Q−𝕂′_iP, else set Q=Q+𝕂′_iP. At block 1220, set P₂=2P. At block 1222, if
$k > \frac{ρ}{2}$

then set Δ=1, else set Δ=−1. At block 1224, if k* is even set Q=Q+ΔP, else set Q=Q+ΔP₂. At block 1226, Q is returned as output, where Q=kP.

Note that in method 1200 of FIG. 12, block 1206 ensures that k* is less than or equal to ρ/2. Also, k′ should be an integer of ⌈log₂ρ⌉−1 bits to ensure that d used in the method is equal to
$\frac{⌈ \log_{2} ρ ⌉}{w} .$

If not, we can set k′ to k*−1 for even k* and k*−2 for odd k* in block 1208 to achieve the desired result. In this case, block 1222 sets Δ to −1 for
$k > \frac{ρ}{2}$

and 1 otherwise.
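Method 1200 combines the negation trick of method 900 with the constant post-processing of method 1100. The sketch below makes several assumptions for illustration: the subgroup of prime order ρ is modeled by the integers modulo ρ under addition, the helper names are hypothetical, the recoding is a reconstruction of method 700, column multiples are computed directly instead of being read from a table, and scalars for which k′ does not fit in n−1 bits are simply rejected rather than handled by the alternative choice of k′ described above.

```python
def recode_odd(k, w, d):
    """Signed odd-only recoding (sketch of method 700) into w*d digits."""
    b = [0] * (w * d)
    for i in range(d):
        b[i] = 1
        if not (k >> i) & 1:
            b[i - 1] = -1
    e = k >> d
    for i in range(d, w * d):
        if e & 1 and b[i % d] == -1:
            b[i], e = -1, (e >> 1) + 1
        else:
            b[i], e = e & 1, e >> 1
    return b


def comb_mult_1200(k, P, rho, w):
    """Compute kP in the stand-in group Z_rho, with n = bit length of rho."""
    n = rho.bit_length()
    if n % w or not 0 < k < rho:
        raise ValueError("need w | n and 0 < k < rho")
    d = n // w                                  # d is not increased
    upper = 2 * k > rho                         # the test k > rho / 2
    k_star = rho - k if upper else k            # block 1206
    k_odd = k_star + 1 if k_star % 2 == 0 else k_star + 2   # block 1208
    if k_odd >= 2 ** (n - 1):
        raise ValueError("k' must fit in n - 1 bits for this sketch")
    b = recode_odd(k_odd, w, d)                 # block 1210
    sign = -1 if upper else 1
    Q = 0                                       # block 1212
    for i in range(d - 1, -1, -1):              # block 1214
        Q = (Q + Q) % rho                       # block 1216
        column = sum(b[j * d + i] * P * 2 ** (j * d) for j in range(w))
        Q = (Q + sign * column) % rho           # block 1218
    P2 = (P + P) % rho                          # block 1220
    delta = 1 if upper else -1                  # block 1222
    step = P if k_star % 2 == 0 else P2         # block 1224
    return (Q + delta * step) % rho             # block 1226
```

With ρ=251 (so n=8) every admissible k′ is at most 127 and the bit-length condition holds for all scalars; for other orders the condition has to be checked as noted in the text.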

Security of Operation

The security of the disclosed point multiplication methods against power analysis is discussed in this section. Security against SPA is addressed first, followed by description of how to transform the disclosed methods to resist DPA, second-order DPA, and other side channel attacks.

The comb methods 900, 1100, and 1200 exploit the fact that point subtraction is virtually the same as point addition for power analysis. In addition, the disclosed comb methods perform one point addition (or point subtraction) and one doubling in each loop iteration when calculating a point multiplication. This means that the same sequence of operations is executed for every scalar k. Therefore SPA cannot extract any information about the secret k by examining the power consumption of the executions. In other words, comb methods 900, 1100, and 1200 are indeed SPA-resistant.

However, SPA-resistant point multiplication methods are not necessarily resistant to DPA attacks. Randomized projective coordinates or random isomorphic curves can be used to transform the disclosed methods into DPA-resistant methods.
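The idea behind randomized projective coordinates can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions: homogeneous projective coordinates (X : Y : Z) with x=X/Z and y=Y/Z are assumed (other coordinate systems, such as Jacobian coordinates, scale the components differently), the prime p=17 and the sample point are toy values, and the function names are hypothetical. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import secrets

p = 17  # toy field prime, for illustration only


def to_projective(point):
    """Lift an affine point (x, y) to (X : Y : Z) with Z = 1."""
    x, y = point
    return (x % p, y % p, 1)


def randomize(proj):
    """Multiply all coordinates by a random nonzero field element."""
    lam = secrets.randbelow(p - 1) + 1          # lam in [1, p - 1]
    X, Y, Z = proj
    return (lam * X % p, lam * Y % p, lam * Z % p)


def to_affine(proj):
    """Map (X : Y : Z) with Z != 0 back to the affine point (X/Z, Y/Z)."""
    X, Y, Z = proj
    z_inv = pow(Z, -1, p)
    return (X * z_inv % p, Y * z_inv % p)


masked = randomize(to_projective((5, 1)))
print(masked, "->", to_affine(masked))
```

The masked triple changes from run to run, while the affine point it represents stays the same, so intermediate values handled by the point multiplication are no longer fixed for a given input point.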

Second-order DPA attacks might still be successfully applied to comb methods 900, 1100, and 1200 when the just-mentioned randomization schemes are used. Such second-order attacks exploit the correlation between the power consumption and the Hamming weight of the loaded data to determine which 𝕂′_i is loaded. To thwart such second-order DPA attacks, a scheme to randomize all pre-computed points after getting the point in the table may be used, so that there is no fixed Hamming weight.

A recently proposed DPA attack is a refinement attack on many randomization schemes. This attack employs special points with one of the coordinates being zero. Such a DPA attack may be addressed by simply choosing elliptic curves E: y²=x³+ax+b defined over F_p (p>3) wherein b is not a quadratic residue modulo p, and by rejecting any point (x, 0) as an input point in applications of the disclosed scalar multiplication methods. If the cardinality #E(F_p) is a big prime number, points (x, 0) cannot be eligible input points since they are not on such elliptic curves. Combined with the aforementioned randomization techniques and measures, the disclosed comb methods can thwart all power-analysis attacks.
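The two checks described above can be sketched as follows. The function names are hypothetical, the quadratic-residue test uses Euler's criterion (which assumes p is an odd prime and b is not divisible by p), and the curve y²=x³+2x+3 over F₁₇ is a toy example chosen only because its coefficient b=3 is a non-residue modulo 17.

```python
def is_quadratic_residue(b, p):
    """Euler's criterion: b is a nonzero square mod p iff b**((p-1)/2) == 1."""
    return pow(b, (p - 1) // 2, p) == 1


def curve_excludes_zero_x(b, p):
    """If b is a non-residue, y**2 = b has no solution, so no point (0, y) exists."""
    return not is_quadratic_residue(b, p)


def is_acceptable_input(point, a, b, p):
    """Accept only points that lie on the curve and are not of the form (x, 0)."""
    x, y = point
    on_curve = (y * y - (x ** 3 + a * x + b)) % p == 0
    return on_curve and y % p != 0


p, a, b = 17, 2, 3
print(curve_excludes_zero_x(b, p))              # curve-level check
print(is_acceptable_input((2, 7), a, b, p))     # ordinary point on the curve
print(is_acceptable_input((16, 0), a, b, p))    # point (x, 0): rejected
```

The curve-level check is done once when the curve is chosen, while the point-level check is applied to each input point before scalar multiplication.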

Efficiency of Operation

The comb methods 800-1200 require storage of 2^{w−1} points. In the pre-computation stage of these comb methods, 2^dP, 2^{2d}P, . . . , 2^{(w−1)d}P are first calculated. This costs (w−1)d point doublings. Then all possible combinations [b_{w−1}, . . . , b₂, b₁, 1]P with (b_{w−1}, . . . , b₂, b₁)∈(Z₂)^{w−1} are calculated in the same way as in the pre-computation stage for method 400 of FIG. 4, which costs 2^{w−1}−1 point additions. The total cost of the disclosed comb methods in the pre-computation stage is therefore {(w−1)d}D+{2^{w−1}−1}A. The time costs of the comb methods 800-1200 in the evaluation stage vary a little due to post-processing after the for-loop. Assuming that the scalar k is randomly distributed, the average cost in the evaluation stage is
$(d - 1) D + (d - \frac{1}{2}) A$

for method 800 of FIG. 8, (d−1)D+(d−1)A for method 900 of FIG. 9,
$dD + (d - \frac{1}{4}) A$

for method 1000 of FIG. 10, and dD+dA for both methods 1100 and 1200 of FIGS. 11 and 12.
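The cost formulas above are easy to tabulate. The following sketch simply encodes them; the function name `comb_costs` and the sample parameters n=160, w=4 are illustrative assumptions, and exact fractions are used for the average addition counts.

```python
from fractions import Fraction


def comb_costs(method, n, w):
    """Return ((D_pre, A_pre), (D_eval, A_eval)) for methods 800-1200.

    D counts point doublings and A counts point additions; the evaluation
    figures are averages over randomly distributed scalars.
    """
    if method in (1000, 1200):
        if n % w:
            raise ValueError("methods 1000 and 1200 assume that w divides n")
        d = n // w
    else:
        d = -(-(n + 1) // w)                    # ceil((n + 1) / w)
    pre = ((w - 1) * d, 2 ** (w - 1) - 1)
    evaluation = {
        800: (d - 1, d - Fraction(1, 2)),
        900: (d - 1, d - 1),
        1000: (d, d - Fraction(1, 4)),
        1100: (d, d),
        1200: (d, d),
    }[method]
    return pre, evaluation


for method in (800, 900, 1000, 1100, 1200):
    (d_pre, a_pre), (d_ev, a_ev) = comb_costs(method, 160, 4)
    print(f"method {method}: pre-computation {d_pre}D + {a_pre}A, "
          f"evaluation {d_ev}D + {float(a_ev)}A")
```

Such a table makes the trade-off discussed next concrete: for n divisible by w, methods 1000 and 1200 use the smaller d, which shortens the pre-computation stage.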

A comparison can be made between the fixed-base comb methods 800, 900 and 1000 of FIGS. 8-10 and the original fixed-base comb method 400 of FIG. 4. Table 1 lists the space and time costs for those methods. Comb methods 800-1000 in Table 1 store 2^{w−1} points, which is about half of the 2^w−1 points stored in the original comb method 400. In addition, comb methods 800-1000 save 2^{w−1}−w point additions relative to the original comb method 400 of FIG. 4 in the pre-computation stage. The evaluation stage has a similar time cost for all four methods in Table 1. To maintain about the same storage space for pre-computed points, methods 800-1000 can be utilized in a manner such that the value of w is selected as w=w₁+1, one larger than the value w=w₁ used in comb method 400. This results in similar storage (2^{w₁} vs. 2^{w₁}−1) as the comb method 400 yet faster computation in the evaluation stage, thanks to the smaller d used in methods 800-1000.

A comparison can be made between the disclosed SPA-resistant comb methods 900, 1100, and 1200 of FIGS. 9, 11, and 12 and a modification of method 400 that is SPA-resistant. The space and time costs for those SPA-resistant methods are listed in Table 2. Again, SPA-resistant comb methods 900, 1100, and 1200 use about half of the storage for pre-computed points that a modified comb method 400 uses, and also save 2^{w−1}−w point additions in the pre-computation stage. Methods 1100 and 1200 have the same time cost as modified method 400 in the evaluation stage, while disclosed method 900 saves one point doubling and one point addition in this stage. From the data in Table 2, method 900 is the most efficient SPA-resistant comb method if n is not divisible by w. When n is divisible by w, method 1200 is recommended since
$⌈ \frac{n + 1}{w} ⌉$

is one larger than n/w in this case, resulting in higher time cost.

TABLE 1
Comparison of space and average time costs for fixed-base comb method 400 and fixed-base comb methods 800-1000.

| | Method 400 | Method 800 | Method 900 | Method 1000 |
|---|---|---|---|---|
| d | $⌈ \frac{n}{w} ⌉$ | $⌈ \frac{n + 1}{w} ⌉$ | $⌈ \frac{n + 1}{w} ⌉$ | $\frac{n}{w}$, where w divides n |
| Storage | $2^{w} - 1$ | $2^{w - 1}$ | $2^{w - 1}$ | $2^{w - 1}$ |
| Pre-Comp. Stage | $(w - 1) d D + (2^{w} - w - 1) A$ | $(w - 1) d D + (2^{w - 1} - 1) A$ | $(w - 1) d D + (2^{w - 1} - 1) A$ | $(w - 1) d D + (2^{w - 1} - 1) A$ |
| Evaluation Stage | $(d - 1) D + \frac{2^{w} - 1}{2^{w}} (d - 1) A$ | $(d - 1) D + (d - \frac{1}{2}) A$ | $(d - 1) D + (d - 1) A$ | $d D + (d - \frac{1}{4}) A$ |
| Total Cost | $(wd - 1) D + (2^{w} - w - 1 + \frac{2^{w} - 1}{2^{w}} (d - 1)) A$ | $(wd - 1) D + (2^{w - 1} + d - \frac{3}{2}) A$ | $(wd - 1) D + (2^{w - 1} + d - 2) A$ | $(wd - 1) D + (2^{w - 1} + d - \frac{5}{4}) A$ |

A comparison of the SPA-resistant embodiments disclosed herein with other SPA-resistant point multiplication methods is instructive. The space and time costs for the two SPA-resistant m-ary scalar multiplication methods (one is an extended m-ary method that is SPA-resistant, and the other is an m-ary method using the recoding method 600 of FIG. 6) are listed in the second and third columns of Table 3. From Tables 2 and 3, the disclosed SPA-resistant comb methods 900, 1100, and 1200 of FIGS. 9, 11, and 12 store the same number of pre-computed points as the m-ary method using the recoding method 600, which is one point less than the other SPA-resistant m-ary method.

To make a point multiplication method resistant to SPA attacks, additional operations are needed to remove the dependency of the cryptographic execution procedure on the specific value of k. This means that the efficiency of an SPA-resistant method is in general lower than that of a non-SPA-resistant method of a similar approach. The most efficient point multiplication method, if the total cost of the pre-computation and evaluation stages combined is considered, is the signed m-ary window method, which is not SPA-resistant. The space and time costs for the signed m-ary window method are listed in the first column of Table 3. This method requires fewer points to be stored yet runs faster than all SPA-resistant methods. The space and time penalty is of little consequence in many applications wherein security is the top priority. If only the evaluation stage is considered, the disclosed comb methods are faster than the signed m-ary window method.

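TABLE 2
Comparison of space and average time costs for SPA-resistant comb methods (n = ⌈log₂ρ⌉).

| | Conventional SPA-resistant Comb | Method 900 of FIG. 9 | Method 1100 of FIG. 11 | Method 1200 of FIG. 12 |
|---|---|---|---|---|
| d | $⌈ \frac{n}{w} ⌉$ | $⌈ \frac{n + 1}{w} ⌉$ | $⌈ \frac{n + 1}{w} ⌉$ | $⌈ \frac{n}{w} ⌉$ |
| Storage | $2^{w} - 1$ | $2^{w - 1}$ | $2^{w - 1}$ | $2^{w - 1}$ |
| Pre-Comp. Stage | $(w - 1) d D + (2^{w} - w - 1) A$ | $(w - 1) d D + (2^{w - 1} - 1) A$ | $(w - 1) d D + (2^{w - 1} - 1) A$ | $(w - 1) d D + (2^{w - 1} - 1) A$ |
| Evaluation Stage | $d D + d A$ | $(d - 1) D + (d - 1) A$ | $d D + d A$ | $d D + d A$ |
| Total Cost | $wd D + (2^{w} - w + d - 1) A$ | $(wd - 1) D + (2^{w - 1} + d - 2) A$ | $wd D + (2^{w - 1} + d - 1) A$ | $wd D + (2^{w - 1} + d - 1) A$ |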

TABLE 3
Space and average time costs for non-SPA-resistant signed m-ary method (first column) and two conventional SPA-resistant m-ary methods (n = ⌈log₂ρ⌉).

| | Signed m-ary window | A conventional SPA-resistant m-ary method | SPA-resistant m-ary method using recoding method 600 of FIG. 6 |
|---|---|---|---|
| d | $⌈ \frac{n}{w} ⌉$ | $⌈ \frac{n}{w} ⌉$ | $⌈ \frac{n}{w} ⌉$ |
| Storage | $2^{w - 2}$ | $2^{w - 1} + 1$ | $2^{w - 1}$ |
| Pre-Computation Stage | $1 D + (2^{w - 2} - 1) A$ | $(2^{w - 2} + 1) D + (2^{w - 2} - 1) A$ | $1 D + (2^{w - 1} - 1) A$ |
| Evaluation Stage | $(wd - w) D + (\frac{wd + 1}{w + 1} - 1) A$ | $(wd - w) D + (d - 1) A$ | $(wd - w + 1) D + d A$ |
| Total Cost | $(wd - w + 1) D + (2^{w - 2} + \frac{wd + 1}{w + 1} - 2) A$ | $(2^{w - 2} + wd - w + 1) D + (2^{w - 2} + d - 2) A$ | $(wd - w + 2) D + (2^{w - 1} + d - 1) A$ |

APPLICATION EXAMPLES

The methods 800-1200 can be used in many ECC applications. They are particularly efficient if the pre-computation can be performed in advance or by another party, which is the case in many applications. This section describes a couple of such application scenarios. One example is a system wherein a smart card is used to provide a tamper-resistant and secure environment to store secrets and execute critical cryptographic operations, while a separate, more powerful computing subsystem is responsible for other operations. Cellular phones and wireless application protocol (WAP) devices are typical examples of such a system. In a cellular phone, the Subscriber Identification Module (SIM) card is a smart card that securely stores critical subscriber information together with the authentication and encryption methods responsible for providing legitimate access to the wireless network. The phone's CPU, memory, and storage are responsible for other operations. A Wireless Identity Module (WIM) card plays a similar role in a WAP device. In such a system, it may be possible to delegate the pre-computation to the more powerful device's CPU while using the smart card to execute the evaluation stage, if the points computed by the device's CPU are observable but not tamperable. Note that the pre-computation does not contain any secrets unless the point itself is a secret.

The Elliptic Curve Digital Signature Algorithm (ECDSA) is another example. ECDSA is the elliptic curve analogue of the Digital Signature Algorithm (DSA) specified in a U.S. government standard called the Digital Signature Standard. ECDSA has been accepted in many standards. It includes signature generation and verification. Let P be a publicly known elliptic curve point and ρ be the prime order of the point P. A signature is generated and verified in the following way.

ECDSA Signature Generation: To sign a message m, an entity A associated with a key pair (d, Q) executes the following steps:

1. Select a random or pseudorandom integer k, such that 1≦k<ρ.
2. Compute kP=(x₁, y₁) and r=x₁ mod ρ. If r=0, go to Step 1.
3. Compute k⁻¹ mod ρ.
4. Compute e=SHA-1(m), where SHA-1 is a Secure Hash Algorithm (SHA) specified in the Secure Hash Standard.
5. Compute s=k⁻¹(e+dr) mod ρ. If s=0, go to Step 1.
6. The signature generated by A for the message m is (r, s).

ECDSA Signature Verification: To verify A's signature (r, s) on a message m, an entity B obtains A's public key Q and executes the following steps:
1. Verify that r and s are integers in the interval [1, ρ−1].
2. Compute e=SHA-1(m).
3. Compute w=s⁻¹ mod ρ.
4. Compute u₁=ew mod ρ and u₂=rw mod ρ.
5. Compute X=u₁P+u₂Q=(x₁, y₁). If X=O, reject the signature. Otherwise, compute v=x₁ mod ρ.
6. Accept the signature if and only if v=r.
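The signature generation and verification steps above can be traced with a small Python sketch. Everything concrete in it is an assumption chosen for illustration: the toy curve y²=x³+2x+2 over F₁₇ with base point (5, 1) of prime order 19 is a common textbook example and offers no security, the function names are hypothetical, and kP is computed with plain double-and-add rather than with the comb methods disclosed herein. The code requires Python 3.8 or later for modular inverses via `pow`.

```python
import hashlib
import secrets

p, a = 17, 2                  # toy curve y^2 = x^3 + 2x + 2 over F_17
G, rho = (5, 1), 19           # base point P and its prime order


def add(P1, P2):
    """Affine point addition; None represents the point at infinity O."""
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P1 == P2:
        s = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    else:
        s = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (s * s - x1 - x2) % p
    return (x3, (s * (x1 - x3) - y1) % p)


def mul(k, P1):
    """Plain double-and-add, standing in for a comb method."""
    Q = None
    while k:
        if k & 1:
            Q = add(Q, P1)
        P1 = add(P1, P1)
        k >>= 1
    return Q


def digest(m):
    return int.from_bytes(hashlib.sha1(m).digest(), "big")


def sign(m, d):
    e = digest(m)                                   # step 4
    while True:
        k = secrets.randbelow(rho - 1) + 1          # step 1: 1 <= k < rho
        r = mul(k, G)[0] % rho                      # step 2
        if r == 0:
            continue
        s = pow(k, -1, rho) * (e + d * r) % rho     # steps 3 and 5
        if s:
            return r, s                             # step 6


def verify(m, sig, Q):
    r, s = sig
    if not (1 <= r < rho and 1 <= s < rho):         # step 1
        return False
    w = pow(s, -1, rho)                             # step 3
    u1, u2 = digest(m) * w % rho, r * w % rho       # step 4
    X = add(mul(u1, G), mul(u2, Q))                 # step 5
    return X is not None and X[0] % rho == r        # step 6
```

A key pair is obtained as, for example, d=7 and Q=`mul(7, G)`; a signature produced by `sign(b"message", 7)` is then accepted by `verify(b"message", signature, Q)`. Because the group has only 19 elements, unrelated messages may also verify by chance, which is one reason the sketch must not be used beyond illustration.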

For an ECDSA signature, P, ρ and r, s, e are public values. The scalar k has to be kept secret. Otherwise the private key d can be derived from the equation s=k⁻¹(e+dr) mod ρ. Therefore, the private key d must be securely stored and the ECDSA signature generation must be securely executed. This can be conveniently achieved with a smart card, which stores P's pre-computed points and uses a point multiplication method resistant to power analysis to calculate kP. The herein disclosed SPA-resistant comb methods are ideal for this application, since only the evaluation stage is executed in generating a signature. ECDSA signature verification, on the other hand, does not use any secret key. In verifying an ECDSA signature, the herein disclosed SPA-nonresistant comb methods can be used to compute u₁P and the signed m-ary window method can be used to compute u₂Q. This is an efficient combination of point multiplication methods since P's pre-computation points can be calculated in advance, and a comb method is more efficient than other methods in this case. In contrast, Q's pre-computation points cannot be calculated in advance since the public key Q varies from one entity to another. The signed m-ary window method is appropriate in this case.

Exemplary Methods

FIG. 13 illustrates an example implementation 1300 showing operation of a recoding system, which could be adapted for use in elliptic curve point multiplication of an elliptic curve cryptosystem. The implementation 1300 is configured for use in many different computing environments, such as a smart card, cellular telephone, workstation or mainframe computing environment. At block 1302, an odd integer k is converted into a binary representation. The binary representation can be, for example, coefficients of powers of two. The conversion of block 1302 may be performed in a number of ways, such as the exemplary conversion discussed at blocks 1304-1308. At block 1304, each of the least significant d bits of the binary representation of the odd integer k is converted. The conversion is performed in a manner consistent with having a least significant bit in each bit-column that is either a 1 or a 1̄. Notice that this conversion could be performed in the manner disclosed at blocks 704-708 in FIG. 7. At block 1306, bits of the binary representation having significance greater than d are converted, such that a resulting bit at an i-th position is either a 0 or a signed 1, wherein the signed 1 is signed according to a bit at an (i mod d)-th position. The conversion of block 1306 can be performed in a variety of different manners, such as that shown in blocks 712-718 of FIG. 7 or at block 1308. At block 1308, a sign of a current bit is compared to a sign of a least significant bit in a same comb bit-column. Where the signs are different and the current bit is a 1, the current bit is set to 1̄ and a value of a next higher significant bit is incremented by 1 to maintain k's value. At block 1310, the binary representation is configured as comb bit-columns, wherein every bit-column is a signed odd integer. More specifically, the implementation 1300 could generate K′_i⁰∈{1, 1̄} and K′_i^j∈{0, K′_i⁰}, j≠0 for each comb bit-column 𝕂′_i≡[K′_i^{w−1}, . . . , K′_i¹, K′_i⁰], where 1̄ is defined as −1.

FIG. 14 illustrates an example of operation 1400 of an elliptic curve point multiplication system. At block 1402, a point P and an integer k are received, wherein k is an even or odd integer. At block 1404, k′ is set based at least in part on a value of k, wherein k′ is an odd integer. For example, at block 806 of FIG. 8, k′ is set based on k's status as even or odd. At block 1406, the odd integer k′ is converted into a binary representation comprising bit-columns, wherein every bit-column is a signed odd integer. For example, this could be performed according to the method 700 of FIG. 7. Accordingly, the operation 1406 may produce K′_i⁰∈{1, 1̄} and K′_i^j∈{0, K′_i⁰}, j≠0 for each comb bit-column 𝕂′_i≡[K′_i^{w−1}, . . . , K′_i¹, K′_i⁰], where 1̄ could be defined as −1. At block 1408, the comb bit-columns and the point P are processed to calculate a value Q, wherein Q may be set differently for different values of k. For example, blocks 818, 916, 1018, 1022-1024, 1120 and 1222-1224 all set Q based in part on k. At block 1410, Q is output, wherein Q represents kP.

Any of the methods described herein may be performed on a smart card, computer system or any digital processing device and/or component. For example, the methods may be performed by execution of instructions defined on a processor- and/or computer-readable medium. A “processor-readable medium,” as used herein, can be any means that can contain or store instructions for use by or execution by a processor, whether on a smart card, computer or any computing or calculating device and/or component. A processor-readable medium can be, without limitation, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or component. More specific examples of a processor-readable medium include, among others, memory components on a smart card, a portable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable-read-only memory (EPROM or Flash memory), optical and/or magnetic memory, a rewritable compact disc (CD-RW), and a portable compact disc read-only memory (CDROM).

While one or more methods have been disclosed by means of flow diagrams and text associated with the blocks of the flow diagrams, it is to be understood that the blocks do not necessarily have to be performed in the order in which they were presented, and that an alternative order may result in similar advantages. Furthermore, the methods are not exclusive and can be performed alone or in combination with one another.

Exemplary Computing Environment

FIG. 15 illustrates an exemplary computing environment suitable for implementing an elliptic curve cryptosystem. Although one specific configuration is shown, any computing environment could be substituted to meet the needs of any particular application. The computing environment 1500 includes a general-purpose computing system in the form of a computer 1502. The components of computer 1502 can include, but are not limited to, one or more processors or processing units 1504, a system memory 1506, and a system bus 1508 that couples various system components including the processor 1504 to the system memory 1506. The system bus 1508 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a Peripheral Component Interconnect (PCI) bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.

Computer 1502 typically includes a variety of computer readable media. Such media can be any available media that is accessible by computer 1502 and includes both volatile and non-volatile media, removable and non-removable media. The system memory 1506 includes computer readable media in the form of volatile memory, such as random access memory (RAM) 1510, and/or non-volatile memory, such as read only memory (ROM) 1512. A basic input/output system (BIOS) 1514, containing the basic routines that help to transfer information between elements within computer 1502, such as during start-up, is stored in ROM 1512. RAM 1510 typically contains data and/or program modules that are immediately accessible to and/or presently operated on by the processing unit 1504.

Computer 1502 can also include other removable/non-removable, volatile/non-volatile computer storage media. By way of example, FIG. 15 illustrates a hard disk drive 1516 for reading from and writing to a non-removable, non-volatile magnetic media (not shown), a magnetic disk drive 1518 for reading from and writing to a removable, non-volatile magnetic disk 1520 (e.g., a “floppy disk”), and an optical disk drive 1522 for reading from and/or writing to a removable, non-volatile optical disk 1524 such as a CD-ROM, DVD-ROM, or other optical media. The hard disk drive 1516, magnetic disk drive 1518, and optical disk drive 1522 are each connected to the system bus 1508 by one or more data media interfaces 1525. Alternatively, the hard disk drive 1516, magnetic disk drive 1518, and optical disk drive 1522 can be connected to the system bus 1508 by a SCSI interface (not shown).

The disk drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules, and other data for computer 1502. Although the example illustrates a hard disk 1516, a removable magnetic disk 1520, and a removable optical disk 1524, it is to be appreciated that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes or other magnetic storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access memories (RAM), read only memories (ROM), electrically erasable programmable read-only memory (EEPROM), and the like, can also be utilized to implement the exemplary computing system and environment.

Any number of program modules can be stored on the hard disk 1516, magnetic disk 1520, optical disk 1524, ROM 1512, and/or RAM 1510, including by way of example, an operating system 1526, one or more application programs 1528, other program modules 1530, and program data 1532. Each of such operating system 1526, one or more application programs 1528, other program modules 1530, and program data 1532 (or some combination thereof) may include an embodiment of a caching scheme for user network access information.

Computer 1502 can include a variety of computer/processor readable media identified as communication media. Communication media typically embody computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also included within the scope of computer readable media.

A user can enter commands and information into computer system 1502 via input devices such as a keyboard 1534 and a pointing device 1536 (e.g., a “mouse”). Other input and/or peripheral devices 1538 (not shown specifically) may include a smart card and/or smart card reader, microphone, joystick, game pad, satellite dish, serial port, scanner, and/or the like. These and other input devices are connected to the processing unit 1504 via input/output interfaces 1540 that are coupled to the system bus 1508, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).

A monitor 1542 or other type of display device can also be connected to the system bus 1508 via an interface, such as a video adapter 1544. In addition to the monitor 1542, other output peripheral devices can include components such as speakers (not shown) and a printer 1546 that can be connected to computer 1502 via the input/output interfaces 1540.

Computer 1502 can operate in a networked environment using logical connections to one or more remote computers, such as a remote computing device 1548. By way of example, the remote computing device 1548 can be a personal computer, portable computer, a server, a router, a network computer, a peer device or other common network node, and the like. The remote computing device 1548 is illustrated as a portable computer that can include many or all of the elements and features described herein relative to computer system 1502.

Logical connections between computer 1502 and the remote computer 1548 are depicted as a local area network (LAN) 1550 and a general wide area network (WAN) 1552. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet. When implemented in a LAN networking environment, the computer 1502 is connected to a local network 1550 via a network interface or adapter 1554. When implemented in a WAN networking environment, the computer 1502 typically includes a modem 1556 or other means for establishing communications over the wide area network 1552. The modem 1556, which can be internal or external to computer 1502, can be connected to the system bus 1508 via the input/output interfaces 1540 or other appropriate mechanisms. It is to be appreciated that the illustrated network connections are exemplary and that other means of establishing communication link(s) between the computers 1502 and 1548 can be employed.

In a networked environment, such as that illustrated with computing environment 1500, program modules depicted relative to the computer 1502, or portions thereof, may be stored in a remote memory storage device. By way of example, remote application programs 1558 reside on a memory device of remote computer 1548. For purposes of illustration, application programs and other executable program components, such as the operating system, are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computer system 1502, and are executed by the data processor(s) of the computer.

A cellular phone 1560 having an internal smart card 1562 provides a further environment wherein the comb methods 800-1400 could be utilized. The cellular phone may be configured to communicate with the computer 1502, which may be associated with a cellular telephone carrier or service. In such an environment, the ECC methods could be used, for example, to establish the authenticity of the cell phone 1560 as a legitimate customer of the cellular service provider, in this case operating the computer 1502.

CONCLUSION

A novel signed odd-only comb recoding method 700 of FIG. 7 converts comb bit-columns of the comb's sequence for an odd integer to a signed odd-only nonzero representation. Using this recoding method, several novel non-SPA-resistant and SPA-resistant comb methods 800-1200 of FIGS. 8-12 can be configured to calculate point multiplication for elliptic curve cryptosystems. The disclosed comb methods store fewer points and run faster than the known non-SPA-resistant and SPA-resistant comb methods. In addition, the disclosed signed odd-only comb methods inherit the comb method's advantage of running much faster than window methods and non-SPA-resistant signed m-ary window methods when pre-computation points are calculated in advance or elsewhere. Accordingly, the signed odd-only comb methods 800-1200 of FIGS. 8-12 are well suited for use in smart cards and embedded systems where power and computing resources are at a premium. When combined with randomization techniques and certain precautions in selecting elliptic curves and parameters, the SPA-resistant comb methods can thwart all side-channel attacks.
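
By way of illustration only, the recoding summarized above may be sketched as follows. The sketch is an assumption-laden reading of the text rather than a reproduction of method 700: the function names signed_odd_comb_recode and comb_multiply are hypothetical, the column count d = ceil((n+1)/w) is an assumed parameter choice, and ordinary integers stand in for elliptic curve points (doubling is multiplication by 2) so that the result can be checked without a curve library. The code is neither constant-time nor SPA-resistant.

```python
def signed_odd_comb_recode(k, w):
    """Recode an odd positive integer k into w rows of d signed digits.

    Returns (digits, columns). digits[i] is the coefficient of 2**i.
    columns[r][j] == digits[j*d + r]; every column has a lowest entry of
    +1 or -1 and all other entries are 0 or carry that same sign.
    """
    if k <= 0 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    if w < 1:
        raise ValueError("w must be at least 1")
    n = k.bit_length()
    d = -(-(n + 1) // w)              # assumed choice: ceil((n + 1) / w)
    digits = [0] * (w * d)

    # Lowest row: rewrite the low d bits of k using digits +1 / -1 only.
    # A 0 bit at position i becomes +1 and pushes -1 into position i - 1.
    for i in range(d):
        digits[i] = 1
    for i in range(1, d):
        if (k >> i) & 1 == 0:
            digits[i - 1] = -1

    # Remaining rows: each digit is 0 or matches the sign of its column.
    e = k >> d
    for i in range(d, w * d):
        if e & 1 and digits[i % d] == -1:
            digits[i] = -1
            e = (e + 1) >> 1          # ceil(e / 2) absorbs the borrowed 2**i
        else:
            digits[i] = e & 1
            e >>= 1
    assert e == 0, "recoding did not consume the whole scalar"

    columns = [[digits[j * d + r] for j in range(w)] for r in range(d)]
    return digits, columns


def comb_multiply(k, w, P):
    """Compute k*P by a comb over the recoded columns (P is an integer
    stand-in for a curve point; a real system would use point add/double)."""
    _, columns = signed_odd_comb_recode(k, w)
    d = len(columns)
    # Precompute only odd-indexed combinations: 2**(w-1) table entries.
    table = {
        u: sum(((u >> j) & 1) << (j * d) for j in range(w)) * P
        for u in range(1, 1 << w, 2)
    }
    Q = 0
    for r in reversed(range(d)):
        col = columns[r]
        sign = col[0]                                  # always +1 or -1
        u = sum(abs(col[j]) << j for j in range(w))    # always odd
        Q = 2 * Q + sign * table[u]                    # one double, one add
    return Q
```

For example, with k = 5 and w = 2 the sketch yields the digits [-1, 1, -1, 1], which recombine as -1 + 2 - 4 + 8 = 5. Because every column is odd and nonzero, each iteration of the loop performs one doubling and one addition, and the table needs entries only for odd indices.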

In closing, although aspects of this disclosure include language specifically describing structural and/or methodological features of preferred embodiments, it is to be understood that the appended claims are not limited to the specific features or acts described. Rather, the specific features and acts are disclosed only as exemplary implementations, and are representative of more general concepts. Furthermore, a number of features were described herein by first identifying exemplary problems that these features can address. This manner of explication does not constitute an admission that others have appreciated and/or articulated the problems in the manner specified herein. Appreciation and articulation of the problems present in the relevant arts are to be understood as part of the present invention. More specifically, there is no admission herein that the features described in the Background section of this disclosure constitute prior art. Further, the subject matter set forth in the Summary section and the Abstract of this disclosure does not limit the subject matter set forth in the claims.
