System, method and apparatus for multiplying large numbers in a single iteration using graphs

BACKGROUND

1. Field

The embodiments relate to processing operands, and in particular to a method, apparatus and system for processing large operands.

2. Description of the Related Art

The Karatsuba algorithm (A. Karatsuba and Y. Ofman, Multiplication of Multidigit Numbers on Automata, Soviet Physics-Doklady, 7 (1963), pages 595-596) was proposed in 1962 as an attempt to reduce the number of scalar multiplications required for computing the product of two large numbers. The classic algorithm accepts as input two polynomials of degree equal to 1, i.e., a(x)=a₁x+a₀ and b(x)=b₁x+b₀, and computes their product a(x)b(x)=a₁b₁x²+(a₁b₀+a₀b₁)x+a₀b₀ using three scalar multiplications. This technique is different from the naïve (also called the ‘schoolbook’) way of multiplying polynomials a(x) and b(x), which is to perform 4 scalar multiplications, i.e., find the products a₀b₀, a₀b₁, a₁b₀ and a₁b₁.

Karatsuba showed that only three scalar multiplications are needed, i.e., it suffices to find the products a₁b₁, (a₁+a₀)(b₁+b₀) and a₀b₀. The missing coefficient (a₁b₀+a₀b₁) can be computed as the difference (a₁+a₀)(b₁+b₀)−a₀b₀−a₁b₁ once the scalar multiplications are performed. For operands of a larger size, the Karatsuba algorithm is applied recursively.
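As a minimal sketch of the classic degree-1 case described above, the three-multiplication form can be compared against the schoolbook form as follows. The function names are hypothetical and the coefficients are assumed to be Python integers.

```python
def schoolbook_deg1(a1, a0, b1, b0):
    """Multiply (a1*x + a0) by (b1*x + b0) with four scalar multiplications."""
    return a1 * b1, a1 * b0 + a0 * b1, a0 * b0


def karatsuba_deg1(a1, a0, b1, b0):
    """Compute the same product with three scalar multiplications."""
    high = a1 * b1                              # coefficient of x^2
    low = a0 * b0                               # constant term
    mid = (a1 + a0) * (b1 + b0) - low - high    # coefficient of x
    return high, mid, low


if __name__ == "__main__":
    print(karatsuba_deg1(3, 5, 7, 2))  # (21, 41, 10)
```

Both functions return the coefficients in the order (x², x, 1), so their results can be compared directly.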

Karatsuba is applicable not only to polynomials but also to large numbers. Large numbers can be converted to polynomials by substituting any power of 2 with the variable x. One of the most important open problems associated with using Karatsuba is how to apply the algorithm to large numbers without losing processing time due to recursion. There are three reasons why recursion is not desirable. First, recursive Karatsuba processes interleave dependent additions with multiplications. As a result, recursive Karatsuba processes cannot take full advantage of any hardware-level parallelism supported by a processor architecture or chipset. Second, because of recursion, intermediate scalar terms produced by recursive Karatsuba need more than one processor word to be represented. Hence, a single scalar multiplication or addition requires more than one processor operation to be realized. Such overhead is significant. Third, recursive Karatsuba incurs function call overhead.

Cetin Koc et al. from Oregon State University (S. S. Erdem and C. K. Koc. “A less recursive variant of Karatsuba-Ofman algorithm for multiplying operands of size a power of two”, Proceedings, 16th IEEE Symposium on Computer Arithmetic, J.-C. Bajard and M. Schulte, editors, pages 28-35, IEEE Computer Society Press, Santiago de Compostela, Spain, Jun. 15-18, 2003) describe a less recursive variant of Karatsuba where the size of the input operands needs to be a power of 2. This variant, however, still requires recursive invocations and only applies to operands of a particular size.

BRIEF DESCRIPTION OF THE DRAWINGS

The embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

FIG. 1 illustrates flow of an embodiment of a process illustrating a 4 by 4 example;

FIG. 2 illustrates examples of complete graphs;

FIG. 3 illustrates examples of graph isomorphism;

FIG. 4 illustrates graph representations of an embodiment for an 18 by 18 example;

FIG. 5 illustrates a representation of a spanning plane of an embodiment using a local index sequence notation;

FIG. 6 illustrates a representation of spanning planes of an embodiment using a semi-local index sequence and global index notations;

FIG. 7 illustrates an alternative representation of a spanning plane;

FIG. 8 illustrates another example of a 9 by 9 spanning plane;

FIG. 9 illustrates an embodiment representation of edge to spanning edge, and spanning plane mapping;

FIG. 10 illustrates a graphical representation of subtraction generation of an embodiment;

FIG. 11A-B illustrate a block diagram of an embodiment;

FIG. 12 illustrates comparison of prior art processes with an embodiment; and

FIG. 13 illustrates an embodiment of an apparatus in a system.

DETAILED DESCRIPTION

The embodiments discussed herein generally relate to an apparatus, system and method for processing large numbers/operands. Referring to the figures, exemplary embodiments will now be described. The exemplary embodiments are provided to illustrate the embodiments and should not be construed as limiting the scope of the embodiments.

FIG. 1 illustrates an example of generating the terms of a 4 by 4 product using graphs, according to an embodiment. As illustrated in FIG. 1 the input operands are of size 4 words. In other embodiments, the operand size is the native operand size of a machine, such as a computing device (e.g., a computer). The operands are the polynomials a(x)=a₃x³+a₂x²+a₁x+a₀ and b(x)=b₃x³+b₂x²+b₁x+b₀. Because the input operand size is 4, the embodiment builds a complete square. The vertices of the square are indexed 0, 1, 2, and 3 as illustrated in FIG. 1. The complete square is constructed in a first part of a process of an embodiment (see FIG. 11A). In a second part of a process of an embodiment, a set of complete sub-graphs is selected and each sub-graph is mapped to a scalar product (see FIG. 11B).

A complete sub-graph connecting vertices i₀, i₁, . . . , i_m−1 is mapped to the scalar product (a_i₀+a_i₁+ . . . +a_i_m−1)·(b_i₀+b_i₁+ . . . +b_i_m−1). The complete sub-graphs selected in the example illustrated in FIG. 1 are the vertices 0, 1, 2 and 3, the edges 0-1, 2-3, 0-2 and 1-3, and the entire square 0-1-2-3. The scalar products defined in the second part of the process are a₀b₀, a₁b₁, a₂b₂, a₃b₃, (a₀+a₁)(b₀+b₁), (a₂+a₃)(b₂+b₃), (a₀+a₂)(b₀+b₂), (a₁+a₃)(b₁+b₃), and (a₀+a₁+a₂+a₃)(b₀+b₁+b₂+b₃). In the last part of the process a number of subtractions are performed (see FIG. 11B, 1165).

As an example, the edges 0-1 and 2-3 (with their adjacent vertices), and 0-2 and 1-3 (without their adjacent vertices) are subtracted from the complete square 0-1-2-3. What remains are the diagonals 0-3 and 1-2. These diagonals correspond to the term a₁b₂+a₂b₁+a₃b₀+a₀b₃, which is the coefficient of x³ of the result. In one embodiment the differences produced by the subtractions of sets of formulae represent diagonals of complete graphs where the number of vertices in these graphs is a power of 2 (i.e., squares, cubes, hyper-cubes, etc.). The terms that result from the subtractions, if added to one another, create the coefficients of the final product.
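The subtraction described for the 4 by 4 example can be checked numerically. The following is only an illustrative sketch: the helper names are hypothetical, the operands are plain Python lists indexed by vertex, and "an edge without its adjacent vertices" is assumed to mean the edge product minus the two vertex products.

```python
def graph_product(a, b, vertices):
    """Scalar product mapped to the complete sub-graph on the given vertices."""
    return sum(a[i] for i in vertices) * sum(b[i] for i in vertices)


def coefficient_x3(a, b):
    """Coefficient of x^3 of a 4 by 4 product, built from graph products only."""
    square = graph_product(a, b, (0, 1, 2, 3))
    # Edges 0-1 and 2-3 are subtracted together with their adjacent vertices.
    e01 = graph_product(a, b, (0, 1))
    e23 = graph_product(a, b, (2, 3))
    # Edges 0-2 and 1-3 are subtracted without their adjacent vertices.
    e02 = graph_product(a, b, (0, 2)) - graph_product(a, b, (0,)) - graph_product(a, b, (2,))
    e13 = graph_product(a, b, (1, 3)) - graph_product(a, b, (1,)) - graph_product(a, b, (3,))
    return square - e01 - e23 - e02 - e13


if __name__ == "__main__":
    a, b = [1, 2, 3, 4], [5, 6, 7, 8]
    expected = a[1] * b[2] + a[2] * b[1] + a[3] * b[0] + a[0] * b[3]
    print(coefficient_x3(a, b), expected)  # 60 60
```

Only the nine scalar products listed for FIG. 1 are used; the remaining work consists of additions and subtractions.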

To explain in more detail, the following definitions are first noted. N represents the size of the input (i.e., the number of terms in each input polynomial). N is the product of L integers n₀, n₁, . . . , n_L-1. The number L represents the number of levels of multiplication.

N=n₀·n₁· . . . ·n_L-1 Eq. 1

For L levels, where a ‘level’ defines a set of complete graphs, the set of graphs of level l is represented as G^(l). The cardinality of the set G^(l) is represented as |G^(l)|. The i-th element of the set G^(l) is represented as G_i^(l). Each set of graphs G^(l) has a finite number of elements. The cardinality of the set G^(l) is defined as:

$\begin{matrix} |G^{(l)}| = \begin{cases} \prod_{i = 0}^{l - 1} n_{i}, & l > 0 \\ 1, & l = 0 \end{cases} & Eq. 2 \end{matrix}$

Each element of the set G^(l) is isomorphic to a complete graph K_n_l. The formal definition of the set of graphs G^(l) is illustrated in Eq. 3:

G^(l)={G_i^(l): i∈[0,|G^(l)|−1], G_i^(l)≅K_n_l} Eq. 3

A complete graph K_a is a graph consisting of a vertices indexed 0, 1, 2, . . . , a−1, where each vertex is connected with each other vertex of the graph with an edge. FIG. 2 illustrates examples of complete graphs. Two graphs A and B are called isomorphic if there exists a vertex mapping function ƒ_v and an edge mapping function ƒ_e such that for every edge e of A the function ƒ_v maps the endpoints of e to the endpoints of ƒ_e(e). Both the edge ƒ_e(e) and its endpoints belong to graph B. FIG. 3 illustrates an example of two isomorphic graphs.

In one embodiment an element of the set G^(l) can be indexed in two ways. One way is by using a unique index i which can take all possible values between 0 and |G^(l)|−1, where the cardinality |G^(l)| is given by Eq. 2. Such an element is represented as G_i^(l). This way of representing graphs is denoted as a ‘global index’. That is, the index used for representing a graph at a particular level is called the global index.

Another way to index the element G_i^(l) is by using a set of l indexes i₀, i₁, . . . , i_l−1, with l>0. This type of index sequence is denoted as a ‘local index’ sequence. In the trivial case where l=0, the local index sequence consists of one index only, which is equal to zero. The local indexes i₀, i₁, . . . , i_l−1 are related to the global index i of a particular element G_i^(l) in the manner illustrated in Eq. 4.

i=(((i₀·n₁)+i₁)·n₂+i₂)·n₃+ . . . +i_l−1 Eq. 4

Eq. 4 can also be written in closed form as:

$\begin{matrix} i = i_{0} \cdot n_{1} \cdot n_{2} \cdot \dots \cdot n_{l - 1} + i_{1} \cdot n_{2} \cdot \dots \cdot n_{l - 1} + \dots + i_{l - 2} \cdot n_{l - 1} + i_{l - 1} = \sum_{j = 0}^{l - 1} (i_{j} \cdot \prod_{k = j + 1}^{l - 1} n_{k}) & Eq . 5 \end{matrix}$

The local indexes i₀, i₁, . . . , i_l−1 satisfy the following inequalities:

$\begin{matrix} 0 \leq i_{0} \leq n_{0} - 1 \\ 0 \leq i_{1} \leq n_{1} - 1 \\ \dots \\ 0 \leq i_{l - 1} \leq n_{l - 1} - 1 \end{matrix}$ Eq. 6

In one embodiment the value of a global index i related to a local index sequence i₀, i₁, . . . , i_l−1 is between 0 and |G^(l)|−1 if inequalities (6) hold and the cardinality |G^(l)| is given by (2). This is proved as follows: from Eq. 4 it can be seen that i is a non-decreasing function of i₀, i₁, . . . , i_l−1. Therefore, the smallest value of i is produced by setting each local index equal to zero, so the smallest i is zero. The highest value of i is obtained by setting each local index i₀, i₁, . . . , i_l−1 equal to its maximum value. Substituting each local index i_j with n_j−1 for 0≦j≦l−1 results in:

$\begin{aligned} i_{\max} &= (n_{0} - 1) \cdot n_{1} \cdot n_{2} \cdot \dots \cdot n_{l - 1} + (n_{1} - 1) \cdot n_{2} \cdot \dots \cdot n_{l - 1} + \dots + (n_{l - 1} - 1) \\ &= n_{0} \cdot n_{1} \cdot n_{2} \cdot \dots \cdot n_{l - 1} - n_{1} \cdot n_{2} \cdot n_{3} \cdot \dots \cdot n_{l - 1} + n_{1} \cdot n_{2} \cdot n_{3} \cdot \dots \cdot n_{l - 1} - n_{2} \cdot n_{3} \cdot n_{4} \cdot \dots \cdot n_{l - 1} \\ &\quad + n_{2} \cdot n_{3} \cdot n_{4} \cdot \dots \cdot n_{l - 1} - n_{3} \cdot n_{4} \cdot n_{5} \cdot \dots \cdot n_{l - 1} + \dots - n_{l - 1} + n_{l - 1} - 1 \\ &= n_{0} \cdot n_{1} \cdot n_{2} \cdot \dots \cdot n_{l - 1} - 1 = |G^{(l)}| - 1 \end{aligned}$ Eq. 7

In one embodiment for each global index i between 0 and |G^(l)|−1 there exists a unique sequence of local indexes i₀, i₁, . . . , i_l−1 satisfying Eq. 5 and the inequalities in Eq. 6. This is proved as follows. To prove that for a global index i such that 0≦i≦|G^(l)|−1 there exists at least one sequence of local indexes i₀, i₁, . . . , i_l−1 satisfying Eq. 5 and Eq. 6, the following pseudo code constructs such a sequence of local indexes:

LOCAL_INDEXES(i)
1. for j ← 0 to l−1
2.     do if j+1 ≦ l−1
3.         then
4.             i_j ← i div (n_j+1 · . . . · n_l−1)
5.             i ← i mod (n_j+1 · . . . · n_l−1)
6.         else
7.             i_j ← i mod n_l−1
8. return {i₀, i₁, . . . , i_l−1}

It can be seen that the local index sequence i₀, i₁, . . . , i_l−1 produced by LOCAL_INDEXES satisfies both Eq. 5 and the inequalities in Eq. 6. Therefore, the existence of a local index sequence associated with a global index is proven.
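The relationship between global and local indexes can be sketched in Python as follows. This is an illustrative sketch, not the embodiment itself: the function names are hypothetical, the factors n₀, . . . , n_l−1 are assumed to be passed as a list `n`, and the last local index is obtained by dividing by an empty product (equal to 1), which gives the same result as the `mod n_l−1` branch of LOCAL_INDEXES for any valid global index.

```python
from math import prod


def global_index(local, n):
    """Eq. 4 in Horner form: local = [i_0, ..., i_{l-1}], n = [n_0, ..., n_{l-1}]."""
    i = 0
    for i_j, n_j in zip(local, n):
        i = i * n_j + i_j
    return i


def local_indexes(i, n):
    """Recover the local index sequence of a global index (mixed-radix digits)."""
    result = []
    for j in range(len(n)):
        weight = prod(n[j + 1:])    # n_{j+1} * ... * n_{l-1}; 1 for the last index
        result.append(i // weight)
        i %= weight
    return result


if __name__ == "__main__":
    n = [2, 3, 3]
    print(local_indexes(11, n))         # [1, 0, 2]
    print(global_index([1, 0, 2], n))   # 11
```

Applying `global_index` to the output of `local_indexes` returns the original index for every value between 0 and n₀·n₁· . . . ·n_l−1 − 1, which mirrors the existence and uniqueness argument.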

To prove the uniqueness of the local index sequence, it is noted that if two sequences i₀, i₁, . . . , i_l−1 and i₀′, i₁′, . . . , i_l−1′ satisfy Eq. 5 and Eq. 6, then it is not possible for some index q, 0≦q≦l−1, to have i_q′≠i_q. Assume the opposite, i.e., that there are m indexes q₀, q₁, . . . , q_m−1 such that i_q₀′≠i_q₀, i_q₁′≠i_q₁, . . . , i_q_m−1′≠i_q_m−1. Also assume that for all other indexes the sequences i₀, i₁, . . . , i_l−1 and i₀′, i₁′, . . . , i_l−1′ are identical. Since both sequences satisfy Eq. 5 the following identity is true:

$(i_{q_0} - i'_{q_0}) \cdot n_{q_0 + 1} \cdot \dots \cdot n_{l - 1} + (i_{q_1} - i'_{q_1}) \cdot n_{q_1 + 1} \cdot \dots \cdot n_{l - 1} + \dots + (i_{q_{m - 1}} - i'_{q_{m - 1}}) \cdot n_{q_{m - 1} + 1} \cdot \dots \cdot n_{l - 1} = 0$ Eq. 8

Without loss of generality, assume that q₀<q₁< . . . <q_m−1. The number (i_q₀−i_q₀′)·n_q₀₊₁· . . . ·n_l−1 is clearly a multiple of n_q₀₊₁· . . . ·n_l−1. Adding the term (i_q₁−i_q₁′)·n_q₁₊₁· . . . ·n_l−1 to this number cannot make the sum (i_q₀−i_q₀′)·n_q₀₊₁· . . . ·n_l−1+(i_q₁−i_q₁′)·n_q₁₊₁· . . . ·n_l−1 equal to zero, since |i_q₁−i_q₁′|≦n_q₁−1<n_q₁≦n_q₀₊₁· . . . ·n_q₁. The same can be said about the addition of all other terms up to (i_q_m−1−i_q_m−1′)·n_q_m−1₊₁· . . . ·n_l−1. As a result, it is not possible for Eq. 8 to hold. Therefore, the uniqueness of the local index sequence is proven.

The following notation is used to represent a graph associated with global index i and local index sequence i₀, i₁, . . . , i_l−1:

$G_{i}^{(l)} = G_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)}$ Eq. 9

Consider the graph G_i^(l) (or $G_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)}$) of level l. This graph is by definition isomorphic to K_n_l. This means that this graph consists of n_l vertices and n_l·(n_l−1)/2 edges, where each vertex is connected to every other vertex with an edge. The set V_i^(l) (or $V_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)}$) is defined as the set of all vertices of the graph G_i^(l) (or $G_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)}$). In one embodiment three alternative ways are used to represent the vertices of a graph. One way is using the local index sequence notation. The i_l-th vertex of a graph $G_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)}$ is represented as $v_{(i_0)(i_1) \dots (i_{l - 1})(i_l)}^{(l)}$, where 0≦i_l≦n_l−1. Using the local index sequence notation, the set of all vertices of a graph $G_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)}$ is defined as:

$V_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)} = \{ v_{(i_0)(i_1) \dots (i_{l - 1})(i_l)}^{(l)} : 0 \leq i_l \leq n_l - 1 \}$ Eq. 10

A second way to represent the vertices of a graph is using a ‘semi-local’ index sequence notation. In one embodiment a semi-local index sequence consists of a global index of a graph and a local index associated with a vertex. Using the semi-local index sequence notation, the i_l-th vertex of a graph G_i^(l) is represented as v_i,i_l^(l), where 0≦i_l≦n_l−1. In this way, the set of all vertices of a graph G_i^(l) is defined as:

V_i^(l)={v_i,i_l^(l): 0≦i_l≦n_l−1} Eq. 11

In one embodiment, for each vertex v_i,i_l^(l) a unique global index i_g←i·n_l+i_l is assigned. It can be shown that 0≦i_g≦|G^(l+1)|−1, that for every semi-local index sequence i, i_l there exists a unique global index i_g such that i_g=i·n_l+i_l, and that for every global index i_g there exists a unique semi-local index sequence i, i_l such that i_g=i·n_l+i_l.

Substituting i with

$\sum_{j = 0}^{l - 1} (i_{j} \cdot \prod_{k = j + 1}^{l - 1} n_{k})$

according to Eq. 5, the global index i_g of a vertex is associated with a local index sequence i₀, i₁, . . . , i_l−1, i_l. The indexes i₀, i₁, . . . , i_l−1 characterize the graph that contains the vertex whereas the index i_l characterizes the vertex itself. The relationship between i_g and i₀, i₁, . . . , i_l−1, i_l is given in Eq. 12:

$\begin{matrix} i_{g} = \sum_{j = 0}^{l} (i_{j} \cdot \prod_{k = j + 1}^{l} n_{k}) & Eq . 12 \end{matrix}$

In one embodiment a global index i_g associated with some vertex of a graph at level l has a one-to-one correspondence to a unique sequence of local indexes i₀, i₁, . . . , i_l−1, i_l satisfying identity (12), the inequalities (6) and 0≦i_l≦n_l−1.
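The correspondence between a semi-local index sequence (i, i_l) and the global index i_g of a vertex amounts to one multiplication and one addition in one direction, and one integer division with remainder in the other. The following sketch uses hypothetical function names and assumes n_l is passed explicitly.

```python
def vertex_global_index(i, i_l, n_l):
    """Global index of the i_l-th vertex of graph i at a level with n_l vertices per graph."""
    return i * n_l + i_l


def vertex_semi_local(i_g, n_l):
    """Inverse mapping: return (graph index i, local vertex index i_l)."""
    return divmod(i_g, n_l)


if __name__ == "__main__":
    # Level 1 of a 9 by 9 multiplication (n_0 = n_1 = 3): graph 1, vertex 2.
    print(vertex_global_index(1, 2, 3))  # 5
    print(vertex_semi_local(5, 3))       # (1, 2)
```

Because `divmod` returns a remainder between 0 and n_l−1, every global index maps back to exactly one semi-local pair.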

Using the global index notation, the set of all vertices of a graph G_i^(l) (or $G_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)}$) is defined as:

$\begin{matrix} V_{i}^{(l)} = \{ v_{i_{g}}^{(l)} : i_{g} = i \cdot n_{l} + i_{l},\ 0 \leq i_{l} \leq n_{l} - 1 \} \text{ or} & Eq. 13 \\ V_{(i_{0}) (i_{1}) \dots (i_{l - 1})}^{(l)} = \{ v_{i_{g}}^{(l)} : i_{g} = \sum_{j = 0}^{l} (i_{j} \cdot \prod_{k = j + 1}^{l} n_{k}),\ 0 \leq i_{l} \leq n_{l} - 1 \} & Eq. 14 \end{matrix}$

The edge which connects two vertices v_j^(l) and v_k^(l) of a graph at level l is represented as e_j-k^(l). If two vertices v_i,i_l^(l) and v_i,i_l′^(l) are represented using the semi-local index sequence notation, the edge which connects these two vertices is represented as e_i,i_l−i,i_l′^(l). Finally, if two vertices $v_{(i_0)(i_1) \dots (i_{l - 1})(i_l)}^{(l)}$ and $v_{(i_0)(i_1) \dots (i_{l - 1})(i_l')}^{(l)}$ are represented using the local index sequence notation, the edge which connects these two vertices is represented as $e_{(i_0)(i_1) \dots (i_{l - 1})(i_l) - (i_0)(i_1) \dots (i_{l - 1})(i_l')}^{(l)}$. The set of all edges of a graph G_i^(l) (or $G_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)}$) is represented as E_i^(l) (or $E_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)}$). This set is formally defined as:

$E_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)} = \{ e_{(i_0)(i_1) \dots (i_{l - 1})(i_l) - (i_0)(i_1) \dots (i_{l - 1})(i_l')}^{(l)} : 0 \leq i_l \leq n_l - 1,\ 0 \leq i_l' \leq n_l - 1,\ i_l \neq i_l' \}$ Eq. 15
or
$E_{i}^{(l)} = \{ e_{i,i_l - i,i_l'}^{(l)} : 0 \leq i_l \leq n_l - 1,\ 0 \leq i_l' \leq n_l - 1,\ i_l \neq i_l' \}$ Eq. 16
or
$E_{i}^{(l)} = \{ e_{i_g - i_g'}^{(l)} : i_g = i \cdot n_l + i_l,\ i_g' = i \cdot n_l + i_l',\ 0 \leq i_l \leq n_l - 1,\ 0 \leq i_l' \leq n_l - 1,\ i_l \neq i_l' \}$ Eq. 17

In one embodiment, the notation used for edges between vertices of different graphs of the same level is the same as the notation used for edges between vertices of the same graph. For example, an edge connecting two vertices $v_{(i_0)(i_1) \dots (i_{l - 1})(i_l)}^{(l)}$ and $v_{(i_0')(i_1') \dots (i_{l - 1}')(i_l')}^{(l)}$, which are represented using the local index sequence notation, is denoted as $e_{(i_0)(i_1) \dots (i_{l - 1})(i_l) - (i_0')(i_1') \dots (i_{l - 1}')(i_l')}^{(l)}$.

In one embodiment alternative notations for the sets of vertices and edges of a graph G are V(G) and E(G) respectively. In addition, the term ‘simple’ from graph theory is used to refer to graphs, vertices and edges associated with the last level L−1. The graphs, vertices and edges of all other levels l, l<L−1 are referred to as ‘generalized’. The level associated with a particular graph G, vertex v or edge e is denoted as l(G), l(v) or l(e) respectively.

A vertex to graph mapping function ƒ^v→g is defined as a function that accepts as input a vertex of a graph at a particular level l, l<L−1, and returns a graph at the next level l+1 that is associated with the same global index or local index sequence as the input vertex.

$f^{v \to g}(v_{i,i_l}^{(l)}) = G_{n_l \cdot i + i_l}^{(l + 1)}$ Eq. 18

Alternative definitions of the function ƒ^v→g are:

$f^{v \to g}(v_{i}^{(l)}) = G_{i}^{(l + 1)}$ Eq. 19
and
$f^{v \to g}(v_{(i_0)(i_1) \dots (i_{l - 1})(i_l)}^{(l)}) = G_{(i_0)(i_1) \dots (i_{l - 1})(i_l)}^{(l + 1)}$ Eq. 20

Similarly, a graph to vertex mapping function ƒ^g→v is defined as a function that accepts as input a graph at a particular level l, l>0, and returns a vertex at the previous level l−1 that is associated with the same global index or local index sequence as the input graph.

$f^{g \to v}(G_{i}^{(l)}) = v_{\lfloor i / n_{l - 1} \rfloor,\ i \bmod n_{l - 1}}^{(l - 1)}$ Eq. 21

Alternative definitions of the function ƒ^g→v are:

$f^{g \to v}(G_{i}^{(l)}) = v_{i}^{(l - 1)}$ Eq. 22
and
$f^{g \to v}(G_{(i_0)(i_1) \dots (i_{l - 1})}^{(l)}) = v_{(i_0)(i_1) \dots (i_{l - 1})}^{(l - 1)}$ Eq. 23

The significance of the vertex to graph and graph to vertex mapping functions lies in the fact that they allow all graphs of all levels defined for a particular operand input size to be represented pictorially. First, each vertex of a graph is represented as a circle. Second, inside each circle, the graph of the next level that maps to the vertex represented by the circle is drawn. As an example, FIG. 4 illustrates how the graphs defined for an 18 by 18 multiplication are drawn.

In the example illustrated in FIG. 4, N=18. N can be written as the product of three factors, i.e., 2, 3 and 3. Setting the number of levels L to be equal to 3 and n₀=2, n₁=n₂=3, the graphs are drawn for all levels associated with the multiplication as shown in FIG. 4. It can be seen that the vertices of the graphs at the last level do not contain any other graphs. This is the reason they are called ‘simple’. It can also be seen that each vertex at a particular level contains as many sets of graphs as the number of levels below. This is the reason why sets of graphs are referred to as ‘levels’.
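The level structure of the 18 by 18 example can be enumerated with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, graphs are represented simply as lists of vertex global indexes, and the cardinality of each level follows Eq. 2.

```python
from math import prod


def build_levels(n):
    """For each level l, list every graph as the global indexes of its n_l vertices."""
    levels = []
    for l, n_l in enumerate(n):
        graph_count = prod(n[:l])    # Eq. 2: empty product gives 1 at level 0
        levels.append([[i * n_l + i_l for i_l in range(n_l)]
                       for i in range(graph_count)])
    return levels


if __name__ == "__main__":
    for l, graphs in enumerate(build_levels([2, 3, 3])):
        print(l, len(graphs), graphs)
```

With n₀=2 and n₁=n₂=3 this yields one graph with two vertices at level 0, two triangles at level 1 and six triangles at level 2, the last of which hold the 18 simple vertices. The graph with index g at level l+1 is the one drawn inside the vertex with global index g at level l.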

In one embodiment the term ‘spanning’ is overloaded from graph theory. The term spanning is used to refer to edges or collections of edges that connect vertices of different graphs at a particular level.

A spanning plane is defined as a graph resulting from the join ‘+’ operation between two sub-graphs of two different graphs of the same level. Each of the two sub-graphs consists of a single edge connecting two vertices. Such two sub-graphs are described below:

$\{ \{ v_{(i_0)(i_1) \dots (i_{l - 1})(i_l)}^{(l)},\ v_{(i_0)(i_1) \dots (i_{l - 1})(\hat{i}_l)}^{(l)} \},\ e_{(i_0)(i_1) \dots (i_{l - 1})(i_l) - (i_0)(i_1) \dots (i_{l - 1})(\hat{i}_l)}^{(l)} \}$, and
$\{ \{ v_{(i_0')(i_1') \dots (i_{l - 1}')(i_l')}^{(l)},\ v_{(i_0')(i_1') \dots (i_{l - 1}')(\hat{i}_l')}^{(l)} \},\ e_{(i_0')(i_1') \dots (i_{l - 1}')(i_l') - (i_0')(i_1') \dots (i_{l - 1}')(\hat{i}_l')}^{(l)} \}$ Eq. 24

In addition, the local index sequences characterizing the two edges which are joined for producing a spanning plane need to satisfy the following conditions:

i₀=i₀′, i₁=i₁′, . . . , i_q≠i_q′, . . . , i_l=i_l′, î_l=î_l′ Eq. 25

Eq. 25 can also be written in closed form as follows:

(∃q, q∈[0, l−1]: i_q≠i_q′)∧(∀j∈[0, l], j≠q: i_j=i_j′)∧(î_l=î_l′) Eq. 26

Eq. 25 and Eq. 26 indicate that all corresponding local indexes of the joined edges in a spanning plane are identical apart from the indexes in a position q, where 0≦q≦l−1. Since i_q≠i_q′, the two edges that are joined to form a spanning plane are associated with different graphs. In the special case where q=l−1, the two graphs containing the joined edges of a spanning plane map to vertices of the same graph at level l−1, since i₀=i₀′, i₁=i₁′, . . . , i_l−2=i_l−2′.

The join operation ‘+’ between two graphs is defined as a new graph consisting of the two operands of ‘+’ plus new edges connecting every vertex of the first operand to every vertex of the second operand. A spanning plane produced by joining the two sub-graphs of Eq. 24 with Eq. 26 holding and q=l−1 is illustrated in FIG. 5. As illustrated in FIG. 5, vertices and edges are represented using the local index sequence notation.

Using the local index sequence notation, a spanning plane can be formally defined as:

$s_{(i_0)(i_1) \dots (i_q - i_q') \dots (i_{l - 1})(i_l - \hat{i}_l)}^{p(l)} = \{ \{ v_{(i_0) \dots (i_q) \dots (i_{l - 1})(i_l)}^{(l)},\ v_{(i_0) \dots (i_q) \dots (i_{l - 1})(\hat{i}_l)}^{(l)} \},\ e_{(i_0) \dots (i_q) \dots (i_{l - 1})(i_l) - (i_0) \dots (i_q) \dots (i_{l - 1})(\hat{i}_l)}^{(l)} \} + \{ \{ v_{(i_0) \dots (i_q') \dots (i_{l - 1})(i_l)}^{(l)},\ v_{(i_0) \dots (i_q') \dots (i_{l - 1})(\hat{i}_l)}^{(l)} \},\ e_{(i_0) \dots (i_q') \dots (i_{l - 1})(i_l) - (i_0) \dots (i_q') \dots (i_{l - 1})(\hat{i}_l)}^{(l)} \}$ Eq. 27

Since the local index sequence notation is lengthy, the shorter ‘semi-local’ index sequence notation is used for representing a spanning plane:

$s_{i,i_l - i,\hat{i}_l - i',i_l - i',\hat{i}_l}^{p(l)} = \{ \{ v_{i,i_l}^{(l)},\ v_{i,\hat{i}_l}^{(l)} \},\ e_{i,i_l - i,\hat{i}_l}^{(l)} \} + \{ \{ v_{i',i_l}^{(l)},\ v_{i',\hat{i}_l}^{(l)} \},\ e_{i',i_l - i',\hat{i}_l}^{(l)} \}$ Eq. 28

In the definition of Eq. 28 above, the value of the index i is given by identity Eq. 5 and:

$i' = i_0 \cdot n_1 \cdot n_2 \cdot \dots \cdot n_{l - 1} + i_1 \cdot n_2 \cdot \dots \cdot n_{l - 1} + \dots + i_q' \cdot n_{q + 1} \cdot \dots \cdot n_{l - 1} + \dots + i_{l - 2} \cdot n_{l - 1} + i_{l - 1}$ Eq. 29

In one embodiment global index notation is used for representing a spanning plane. Using the global index notation, a spanning plane is defined as:

$s_{i_g - \hat{i}_g - i_g' - \hat{i}_g'}^{p(l)} = \{ \{ v_{i_g}^{(l)},\ v_{\hat{i}_g}^{(l)} \},\ e_{i_g - \hat{i}_g}^{(l)} \} + \{ \{ v_{i_g'}^{(l)},\ v_{\hat{i}_g'}^{(l)} \},\ e_{i_g' - \hat{i}_g'}^{(l)} \}$ Eq. 30

In the Eq. 30 notation above:

i_g=i·n_l+i_l, î_g=i·n_l+î_l, i_g′=i′·n_l+i_l, î_g′=i′·n_l+î_l Eq. 31

The index i in identity (31) is given by identity (5) whereas the index i′ in (31) is given by identity (29). A pictorial representation of spanning planes using the semi-local index sequence and global index notations is given in FIG. 6.

In another embodiment, an alternative pictorial representation of a spanning plane is used, as illustrated in FIG. 7. The vertices shown in FIG. 7 are represented using the global index notation. The level of the vertices is omitted for simplicity.

An example of a spanning plane is illustrated in FIG. 8. The example shows the graphs built for a 9-by-9 multiplication and the global indexes of all simple vertices. The example also shows the spanning plane defined by the edges e_1-2⁽¹⁾ and e_4-5⁽¹⁾.

A spanning edge is an edge that connects two vertices $v_{(i_0)(i_1) \dots (i_{l - 1})(i_l)}^{(l)}$ and $v_{(i_0')(i_1') \dots (i_{l - 1}')(i_l')}^{(l)}$ of different graphs of the same level. The local index sequences i₀, i₁, . . . , i_l and i₀′, i₁′, . . . , i_l′ which describe the two vertices need to satisfy the following conditions:

i₀=i₀′, i₁=i₁′, . . . , i_q≠i_q′, . . . , i_l=i_l′ Eq. 32

or (in closed form):

(∃q, q∈[0, l−1]: i_q≠i_q′)∧(∀j∈[0, l], j≠q: i_j=i_j′) Eq. 33

From the conditions in Eq. 33 three things are evident. First, a spanning edge connects vertices with the same last local index (i_l=i_l′). Second, the vertices which are endpoints of a spanning edge are associated with different graphs of G^(l) since i_q≠i_q′. Third, in the special case where q=l−1, the two graphs containing the endpoints of a spanning edge map to vertices of the same graph at level l−1, since

i₀=i₀′, i₁=i₁′, . . . , i_l−2=i_l−2′.

A spanning edge can be represented formally using the local index sequence notation as follows:

$s_{(i_0)(i_1) \dots (i_q - i_q') \dots (i_l)}^{e(l)} = \{ v_{(i_0)(i_1) \dots (i_q) \dots (i_l)}^{(l)} \} + \{ v_{(i_0)(i_1) \dots (i_q') \dots (i_l)}^{(l)} \} = \{ \{ v_{(i_0)(i_1) \dots (i_q) \dots (i_l)}^{(l)},\ v_{(i_0)(i_1) \dots (i_q') \dots (i_l)}^{(l)} \},\ e_{(i_0)(i_1) \dots (i_q) \dots (i_l) - (i_0)(i_1) \dots (i_q') \dots (i_l)}^{(l)} \}$ Eq. 34

A spanning edge can also be represented formally using the semi-local index sequence notation:

$s_{i,i_l - i',i_l}^{e(l)} = \{ v_{i,i_l}^{(l)} \} + \{ v_{i',i_l}^{(l)} \} = \{ \{ v_{i,i_l}^{(l)},\ v_{i',i_l}^{(l)} \},\ e_{i,i_l - i',i_l}^{(l)} \}$ Eq. 35

In the definition in Eq. 35, the value of the index i is given by the identity shown in Eq. 5 and:

$i' = i_0 \cdot n_1 \cdot n_2 \cdot \dots \cdot n_{l - 1} + i_1 \cdot n_2 \cdot \dots \cdot n_{l - 1} + \dots + i_q' \cdot n_{q + 1} \cdot \dots \cdot n_{l - 1} + \dots + i_{l - 2} \cdot n_{l - 1} + i_{l - 1}$ Eq. 36

In another embodiment a third way to represent a spanning edge is by using the global index notation:

$s_{i_g - i_g'}^{e(l)} = \{ v_{i_g}^{(l)} \} + \{ v_{i_g'}^{(l)} \} = \{ \{ v_{i_g}^{(l)},\ v_{i_g'}^{(l)} \},\ e_{i_g - i_g'}^{(l)} \}$ Eq. 37

To further aid in understanding, a set of mappings defined between edges, spanning edges and spanning planes is introduced. In what follows the term ‘corresponding’ is used to refer to vertices of different graphs of the same level that are associated with the same last local index. Two edges of different graphs of the same level are called ‘corresponding’ if they connect corresponding endpoints.

A generalized edge (i.e., an edge of a graph G_i^(l), 0≦l≦L−1) or a spanning edge can map to a set of spanning edges and spanning planes through a mapping function ƒ^e→s. The function ƒ^e→s accepts as input an edge (if it is a spanning edge, the endpoints are excluded) and returns the set of all possible spanning edges and spanning planes that can be considered between the corresponding vertices and edges of the graphs that map to the endpoints of the input edge through the function ƒ^v→g.

Before the ƒ^e→s mapping is described formally, an example is introduced. In the example illustrated in FIG. 9, the generalized edge e (its level and indexes are omitted for simplicity) connects two vertices that map to the triangles 0-1-2 and 3-4-5. This mapping is done through the function ƒ^v→g. Edge e maps to three spanning edges and three spanning planes as shown in FIG. 9 through the function ƒ^e→s. The spanning edges are those connecting the vertices with global indexes 0 and 3, 1 and 4, and 2 and 5 respectively. The spanning planes are those which are produced by the join operation between edges 0-1 and 3-4, 0-2 and 3-5, and 1-2 and 4-5 respectively.
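The FIG. 9 example can be reproduced with a short sketch. The function name is hypothetical, each graph is assumed to be given as a list of vertex global indexes in order of last local index, and each spanning plane is listed once per unordered pair of local indexes, which matches the three planes counted in the example.

```python
from itertools import combinations


def spanning_sets(graph_a, graph_b):
    """Return (spanning edges, spanning planes) between two graphs of the same level.

    Vertices in the same list position are 'corresponding' vertices.
    """
    # One spanning edge per pair of corresponding vertices.
    edges = list(zip(graph_a, graph_b))
    # One spanning plane per pair of corresponding edges (j-k in both graphs).
    planes = [(graph_a[j], graph_a[k], graph_b[j], graph_b[k])
              for j, k in combinations(range(len(graph_a)), 2)]
    return edges, planes


if __name__ == "__main__":
    edges, planes = spanning_sets([0, 1, 2], [3, 4, 5])
    print(edges)   # [(0, 3), (1, 4), (2, 5)]
    print(planes)  # [(0, 1, 3, 4), (0, 2, 3, 5), (1, 2, 4, 5)]
```

Each tuple in `planes` lists the four vertices of the join between an edge of the first triangle and the corresponding edge of the second.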

Using the local index sequence notation, the function ƒ^e→s can be formally defined as:

$f^{e \to s}(e_{(i_0) \dots (i_q) \dots (i_{l - 1})(i_l) - (i_0) \dots (i_q') \dots (i_{l - 1})(i_l)}^{(l)}) = \{ s_{(i_0) \dots (i_q - i_q') \dots (i_{l - 1})(i_l)(j)}^{e(l + 1)} : 0 \leq j \leq n_{l + 1} - 1 \} \cup \{ s_{(i_0) \dots (i_q - i_q') \dots (i_{l - 1})(i_l)(j - k)}^{p(l + 1)} : 0 \leq j \leq n_{l + 1} - 1,\ 0 \leq k \leq n_{l + 1} - 1,\ j \neq k \}$ Eq. 38

In the definition in Eq. 38 the index position q takes all possible values from the set [0, l].

The mapping ƒ^e→s^e is defined between edges and spanning edges only and the mapping ƒ^e→s^p is defined between edges and spanning planes only.

$f^{e \to s^e}(e_{(i_0) \dots (i_q) \dots (i_{l - 1})(i_l) - (i_0) \dots (i_q') \dots (i_{l - 1})(i_l)}^{(l)}) = \{ s_{(i_0) \dots (i_q - i_q') \dots (i_{l - 1})(i_l)(j)}^{e(l + 1)} : 0 \leq j \leq n_{l + 1} - 1 \}$ Eq. 39
and
$f^{e \to s^p}(e_{(i_0) \dots (i_q) \dots (i_{l - 1})(i_l) - (i_0) \dots (i_q') \dots (i_{l - 1})(i_l)}^{(l)}) = \{ s_{(i_0) \dots (i_q - i_q') \dots (i_{l - 1})(i_l)(j - k)}^{p(l + 1)} : 0 \leq j \leq n_{l + 1} - 1,\ 0 \leq k \leq n_{l + 1} - 1,\ j \neq k \}$ Eq. 40

In the definitions in Eq. 39 and Eq. 40 the index position q takes all possible values from the set [0, l].

In one embodiment mappings between sets of vertices and products are defined. The inputs to a multiplication process of an embodiment are the polynomials a(x) and b(x) of degree N−1:

a(x)=a_N−1·x^N−1+a_N−2·x^N−2+ . . . +a₁·x+a₀,
b(x)=b_N−1·x^N−1+b_N−2·x^N−2+ . . . +b₁·x+b₀ Eq. 41

In one embodiment the coefficients of the polynomials a(x) and b(x) are real or complex numbers. In other embodiments the coefficients of the polynomials a(x) and b(x) are elements of a finite field.

The set V of m vertices is defined as:

V={v_i₀, v_i₁, . . . , v_i_m−1} Eq. 42

The elements of V are described using the global index notation and their level is omitted for the sake of simplicity. Three mappings P(V), P₁(V) and P₂(V) are defined between the set V and products as follows:

P(V)=(a_i₀+a_i₁+ . . . +a_i_m−1)·(b_i₀+b_i₁+ . . . +b_i_m−1) Eq. 43
P₁(V)={a_i_q·b_i_q: 0≦q≦m−1} Eq. 44
P₂(V)={(a_i+a_j)·(b_i+b_j): i,jε{i₀, i₁, . . . , i_m−1}, i≠j} Eq. 45
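The three mappings can be illustrated with a small sketch. The names `P`, `P1` and `P2` follow Eq. 43 to Eq. 45; everything else is an assumption made for illustration: vertices are given as global indexes into the coefficient lists `a` and `b`, and the sets of Eq. 44 and Eq. 45 are returned as dictionaries keyed by the vertices involved so that equal-valued products remain distinguishable.

```python
from itertools import combinations


def P(V, a, b):
    """Eq. 43: a single product of the summed coefficients over all vertices in V."""
    return sum(a[i] for i in V) * sum(b[i] for i in V)


def P1(V, a, b):
    """Eq. 44: one product per vertex."""
    return {(i,): a[i] * b[i] for i in V}


def P2(V, a, b):
    """Eq. 45: one product per pair of distinct vertices (i.e., per edge)."""
    return {(i, j): (a[i] + a[j]) * (b[i] + b[j]) for i, j in combinations(V, 2)}


if __name__ == "__main__":
    a, b = [1, 2, 3], [4, 5, 6]
    print(P([0, 1, 2], a, b))   # 90
    print(P1([0, 1, 2], a, b))  # {(0,): 4, (1,): 10, (2,): 18}
    print(P2([0, 1, 2], a, b))  # {(0, 1): 27, (0, 2): 40, (1, 2): 55}
```

For a triangle of simple vertices, `P1` and `P2` together yield the six products associated with its vertices and edges.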

The product generation process accepts as input two polynomials of degree N−1 as shown in Eq. 41. The number of terms N of the polynomials can be factorized as shown in Eq. 1. The product generation process of an embodiment is the first stage of a two step process which generates a Karatsuba-like multiplication routine that computes c(x)=a(x) b(x). Since the polynomials a(x) and b(x) are of degree N−1, the polynomial c(x) must be of degree 2N−2. The polynomial c(x) is represented as:

c(x)=c_2N−2·x^2N−2+c_2N−3·x^2N−3+ . . . +c₁·x+c₀ Eq. 46

Where

$\begin{matrix} c_{i} = \begin{cases} \sum_{j = 0}^{i} a_{j} \cdot b_{i - j}, & \text{if } i \in [0, N - 1] \\ \sum_{j = i - N + 1}^{N - 1} a_{j} \cdot b_{i - j}, & \text{if } i \in [N, 2 N - 2] \end{cases} & Eq. 47 \end{matrix}$

The expression in Eq. 47 can be also written as:

c₀=a₀·b₀
c₁=a₀·b₁+a₁·b₀
. . .
c_N−1=a_N−1·b₀+a_N−2·b₁+ . . . +a₀·b_N−1
c_N=a_N−1·b₁+a_N−2·b₂+ . . . +a₁·b_N−1
. . .
c_2N−2=a_N−1·b_N−1 Eq. 48
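For reference, the coefficients of Eq. 47 can be computed directly with the schoolbook method, which is useful for checking any Karatsuba-like routine. This sketch uses a hypothetical function name and assumes the operands are lists with a[j] holding the coefficient of x^j.

```python
def product_coefficients(a, b):
    """Return [c_0, ..., c_{2N-2}] of a(x)*b(x) for two operands of N terms each."""
    N = len(a)
    c = []
    for i in range(2 * N - 1):
        lo = max(0, i - N + 1)   # lower summation limit of Eq. 47
        hi = min(i, N - 1)       # upper summation limit of Eq. 47
        c.append(sum(a[j] * b[i - j] for j in range(lo, hi + 1)))
    return c


if __name__ == "__main__":
    print(product_coefficients([1, 2, 3, 4], [5, 6, 7, 8]))
    # [5, 16, 34, 60, 61, 52, 32]
```

The two cases of Eq. 47 collapse into the single pair of limits `lo` and `hi`, since the sum only runs over indexes for which both a_j and b_i−j exist.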

Our framework produces a multiplication process that computes all coefficients c₀, c₁, . . . , c_2N−2. At the preprocessing stage, the product generation process generates all graphs G_i^(l) for every level l, 0≦l≦L−1. The generation of products is realized by executing a product creation process of an embodiment, shown in pseudo code as CREATE_PRODUCTS:

CREATE_PRODUCTS( )
1. P^a ← Ø
2. for i ← 0 to |G^(L−1)|−1
3.     do P^a ← P^a ∪ P₁(V(G_i^(L−1)))
4.        P^a ← P^a ∪ P₂(V(G_i^(L−1)))
5. GENERALIZED_EDGE_PROCESS( )
6. return P^a

The process GENERALIZED_EDGE_PROCESS of an embodiment is described below in pseudo code.

GENERALIZED_EDGE_PROCESS( )
1. for l ← 0 to L−2
2.     do for i ← 0 to |G^(l)|−1
3.         do for j ← 0 to n_l−1
4.             do for k ← 0 to n_l−1
5.                 do if j = k
6.                     then
7.                         continue
8.                     else
9.                         S₁ ← f^e→s^e(e_i,j−i,k^(l))
10.                        S₂ ← f^e→s^p(e_i,j−i,k^(l))
11.                        if l+1 = L−1
12.                            then
13.                                for every s ∈ S₁ ∪ S₂
14.                                    do P^a ← P^a ∪ P(V(s))
15.                            else
16.                                for every s ∈ S₁
17.                                    do SPANNING_EDGE_PROCESS(s)
18.                                for every s ∈ S₂
19.                                    do SPANNING_PLANE_PROCESS(s)
20. return

As shown above, the process GENERALIZED_EDGE_PROCESS( ) processes each generalized edge from the set G^(l) one-by-one. If the level of a generalized edge is less than L−2, then the procedure GENERALIZED_EDGE_PROCESS( ) invokes two other processes for processing the spanning edges and spanning planes associated with the generalized edge. The first of the two, SPANNING_EDGE_PROCESS( ), is shown below in pseudo code:

SPANNING_EDGE_PROCESS(s)

1. l ← l(s)

2. S₁ ← f^e→s^e(s)

3. S₂ ← f^e→s^p(s)

4. if l+1 = L−1

5.   then

6.     for every s′ ∈ S₁ ∪ S₂

7.       do P^a ← P^a ∪ P(V(s′))

8.   else

9.     for every s′ ∈ S₁

10.      do SPANNING_EDGE_PROCESS(s′)

11.    for every s′ ∈ S₂

12.      do SPANNING_PLANE_PROCESS(s′)

13. return

The second process, SPANNING_PLANE_PROCESS( ), is shown below in pseudo code:

SPANNING_PLANE_PROCESS(s)

1. l ← l(s)

2. if l = L−1

3.   then

4.     P^a ← P^a ∪ P(V(s))

5.   else

6.     V ← { V(s) }

7.     while l < L−1

8.       do V ← EXPAND_VERTEX_SETS(V)

9.          l ← l+1

10.    for every ν′ ∈ V

11.      do P^a ← P^a ∪ P(ν′)

12. return

In one embodiment the process EXPAND_VERTEX_SETS( ) is shown below in pseudo code. The notation g(v) is used to refer to the global index of a vertex v.

EXPAND_VERTEX_SETS( V)

1. V_r ← Ø

2. for every V′ ∈ V

3.   do V_r ← V_r ∪ EXPAND_SINGLE_VERTEX_SET(V′)

4. return V_r

EXPAND_SINGLE_VERTEX_SET(V )

1. V_r ← Ø

2. let ν ∈ V

3. l ← l(ν)

4. for p ← 0 to n_(l+1)−1

5.   do for q ← 0 to n_(l+1)−1

6.     do if p = q

7.       then

8.         continue

9.       else

10.        U_pq ← Ø

11.        for i ← 0 to |V|−1

12.          do let ν_i ← the i-th element of V

13.             g_i ← g(ν_i)

14.             U_pq ← U_pq ∪ {ν_(g_i,p)^(l+1)} ∪ {ν_(g_i,q)^(l+1)}

15.        V_r ← V_r ∪ U_pq

16. for q ← 0 to n_(l+1)−1

17.   do U_q ← Ø

18.      for i ← 0 to |V|−1

19.        do let ν_i ← the i-th element of V

20.           g_i ← g(ν_i)

21.           U_q ← U_q ∪ {ν_(g_i,q)^(l+1)}

22.      V_r ← V_r ∪ U_q

23. return V_r

In one embodiment, the products are created in two steps. First, for all simple graphs, the products associated with simple vertices and simple edges are determined and these products are added to the set P^a. This occurs in lines 3 and 4 of the process CREATE_PRODUCTS( ). Second, for all generalized edges at each level, each generalized edge is decomposed into its associated spanning edges and spanning planes. This occurs in lines 9 and 10 of the process GENERALIZED_EDGE_PROCESS( ).

To find products associated with each spanning edge, it is determined if a spanning edge connects simple vertices. If it does, the process computes the product associated with the spanning edge from the global indexes of the endpoints of the edge. This occurs in line 14 of the process GENERALIZED_EDGE_PROCESS( ). If a spanning edge does not connect simple vertices, this spanning edge is further decomposed into its associated spanning edges and spanning planes. This occurs in lines 2 and 3 of the process SPANNING_EDGE_PROCESS( ). For each resulting spanning edge that is not at the last level the process SPANNING_EDGE_PROCESS( ) is performed recursively. This occurs in line 10 of the process SPANNING_EDGE_PROCESS( ).

To find products associated with each spanning plane, it is determined if the vertices of a spanning plane are simple or not. If they are simple, the product associated with the global indexes of the plane's vertices is formed and it is added to the set P^a (line 14 of the process GENERALIZED_EDGE_PROCESS( )). If the vertices of a plane are not simple, then the process expands these generalized vertices into graphs and creates sets of corresponding vertices and edge endpoints. This occurs in lines 14 and 21 of the process EXPAND_SINGLE_VERTEX_SET( ). For each such set the expansion is performed down to the last level. This occurs in lines 7-9 of the process SPANNING_PLANE_PROCESS( ).

There are four types of products created. The first type includes all products created from simple vertices. The set of such products P₁^a is:

$\begin{matrix} P_{1}^{a} = \{ P(\{ v_{(i_{0}) (i_{1}) \dots (i_{L - 2}) (i_{L - 1})}^{(L - 1)} \}) : i_{j} \in [0, n_{j} - 1]\ \forall j \in [0, L - 1] \} & \text{Eq. 49} \end{matrix}$

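A second type of products includes those products formed by the endpoints of simple edges. The set of such products P₂^a is:

$\begin{matrix} P_{2}^{a} = \{ P(\{ v_{(i_{0}) (i_{1}) \dots (i_{L - 2}) (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) (i_{1}) \dots (i_{L - 2}) (\hat{i}_{L - 1})}^{(L - 1)} \}) : i_{j} \in [0, n_{j} - 1]\ \forall j \in [0, L - 1],\ \hat{i}_{L - 1} \in [0, n_{L - 1} - 1],\ i_{L - 1} \neq \hat{i}_{L - 1} \} & \text{Eq. 50} \end{matrix}$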

A third type of products includes all products formed by endpoints of spanning edges. These spanning edges result from recursive spanning edge decomposition down to the last level L−1. The set of such products P₃^a has the following form:

$\begin{matrix} P_{3}^{a} = \{ P(\{ v_{(i_{0}) (i_{1}) \dots (i_{q}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) (i_{1}) \dots (i_{q}^{'}) \dots (i_{L - 1})}^{(L - 1)} \}) : i_{j} \in [0, n_{j} - 1]\ \forall j \in [0, L - 1],\ i_{q}^{'} \in [0, n_{q} - 1],\ q \in [0, L - 2],\ i_{q} \neq i_{q}^{'} \} & \text{Eq. 51} \end{matrix}$

A fourth type of products includes those products formed from spanning planes after successive vertex set expansions have taken place. One can show by induction that this set of products P₄^a has the following form:

$\begin{matrix} P_{4}^{a} = \{ P(\{ v_{(i_{0}) \dots (i_{q_{0}}) \dots (i_{q_{1}}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (i_{q_{0}}^{'}) \dots (i_{q_{1}}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (i_{q_{0}}) \dots (i_{q_{1}}^{'}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (i_{q_{0}}^{'}) \dots (i_{q_{1}}^{'}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, \dots, v_{(i_{0}) \dots (i_{q_{0}}^{'}) \dots (i_{q_{1}}^{'}) \dots (i_{q_{m - 1}}^{'}) \dots (i_{L - 1})}^{(L - 1)} \}) : i_{j} \in [0, n_{j} - 1]\ \forall j \in [0, L - 1],\ (i_{q_{k}}^{'} \in [0, n_{q_{k}} - 1] \wedge i_{q_{k}} \neq i_{q_{k}}^{'})\ \forall k \in [0, m - 1],\ 0 \leq q_{0} \leq q_{1} \leq \dots \leq q_{m - 1},\ m \in [2, L] \} & \text{Eq. 52} \end{matrix}$

The set P₄^a consists of all products formed from sets of vertices characterized by identical local indexes apart from those indexes at some index positions q₀, q₁, . . . , q_m−1. For these index positions vertices take all possible different values from among the pairs of local indexes: (i_q₀, i_q₀′), (i_q₁, i_q₁′), . . . , (i_q_m−1, i_q_m−1′). All possible 2^m local index sequences formed this way are included into the specification of the products of the set P₄^a. The number of index positions m for which vertices differ needs to be greater than or equal to 2. The structure of the set P₄^a is very similar to the structure of the set of all products generated by our process,

$P^{a} = \bigcup_{i = 1}^{4} P_{i}^{a} .$

The set P^a of all products generated by executing the process CREATE_PRODUCTS is given by the expression in Eq. 53 below.

The expression in Eq. 53 is identical to Eq. 52 with one exception: the number of index positions m for which vertices differ may also take the values 0 and 1. The set P^a results from the union of P₁^a, P₂^a, P₃^a and P₄^a. It can be seen that adding the elements of P₁^a to P₄^a covers the case for which m=0. Further adding the elements of P₂^a and P₃^a to P₄^a covers the case for which m=1.

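$\begin{matrix} P^{a} = \{ P(\{ v_{(i_{0}) \dots (i_{q_{0}}) \dots (i_{q_{1}}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (i_{q_{0}}^{'}) \dots (i_{q_{1}}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (i_{q_{0}}) \dots (i_{q_{1}}^{'}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (i_{q_{0}}^{'}) \dots (i_{q_{1}}^{'}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, \dots, v_{(i_{0}) \dots (i_{q_{0}}^{'}) \dots (i_{q_{1}}^{'}) \dots (i_{q_{m - 1}}^{'}) \dots (i_{L - 1})}^{(L - 1)} \}) : i_{j} \in [0, n_{j} - 1]\ \forall j \in [0, L - 1],\ (i_{q_{k}}^{'} \in [0, n_{q_{k}} - 1] \wedge i_{q_{k}} \neq i_{q_{k}}^{'})\ \forall k \in [0, m - 1],\ 0 \leq q_{0} \leq q_{1} \leq \dots \leq q_{m - 1},\ m \in [0, L] \} & \text{Eq. 53} \end{matrix}$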

The expression in Eq. 53 is in a closed form that can be used for generating the products without performing spanning plane and spanning edge decomposition. In one embodiment, all local index sequences defined in Eq. 53 are generated and the products associated with these local index sequences are formed. Spanning edges and spanning planes offer a graphical interpretation of the product generation process and can help with visualizing product generation for small operand sizes (e.g., N=9 or N=18).
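The closed form of Eq. 53 may be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the implementation of an embodiment: the function names are hypothetical, each index position is assumed to carry either one local index or an unordered pair of distinct local indexes, and global indexes are assumed to be the mixed-radix value of the local index sequence with the first factor most significant. The factorization 18=2·3·3 used in the usage lines is likewise an assumption.

```python
from itertools import combinations, product as cartesian


def global_index(local, factors):
    """Mixed-radix value of a local index sequence (first factor most significant)."""
    value = 0
    for digit, radix in zip(local, factors):
        value = value * radix + digit
    return value


def create_products_closed_form(factors):
    """Return the vertex sets (frozensets of global indexes) described by Eq. 53."""
    per_position = []
    for n in factors:
        singles = [(i,) for i in range(n)]       # position where all vertices agree
        pairs = list(combinations(range(n), 2))  # position with a pair (i, i')
        per_position.append(singles + pairs)
    products = set()
    for choice in cartesian(*per_position):
        # The vertices of a product take every combination of the chosen values.
        vertices = frozenset(
            global_index(local, factors) for local in cartesian(*choice)
        )
        products.add(vertices)
    return products


products_9 = create_products_closed_form([3, 3])
products_18 = create_products_closed_form([2, 3, 3])
print(len(products_9), len(products_18))
```

Each element of the returned set V corresponds to one product P(V) of Eq. 43, so the size of the set is the number of scalar multiplications.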

The number of elements in the set P^a generated by executing the process CREATE_PRODUCTS is equal to the number of scalar multiplications performed by generalized recursive Karatsuba for the same operand size N, and factors n₀, n₁, . . . , n_L−1 such that N=n₀·n₁· . . . ·n_L−1.

This is true because the number of scalar multiplications performed by generalized recursive Karatsuba as defined by Paar and Weimerskirch is:

$\begin{matrix} | P^{r} | = \frac{n_{0} \cdot (n_{0} + 1)}{2} \cdot \frac{n_{1} \cdot (n_{1} + 1)}{2} \cdot \dots \cdot \frac{n_{L - 1} \cdot (n_{L - 1} + 1)}{2} = \frac{\prod_{i = 0}^{L - 1} n_{i} \cdot (n_{i} + 1)}{2^{L}} & \text{Eq. 54} \end{matrix}$

In Eq. 49-52 the sets P₁^a, P₂^a, P₃^a and P₄^a do not contain any common elements. Therefore, the cardinality |P^a| of the set P^a is given by:

$\begin{matrix} | P^{a} | = \sum_{i = 1}^{4} | P_{i}^{a} | & \text{Eq. 55} \end{matrix}$

The set P₁^a contains all products formed by sets which contain a single vertex only. Each single vertex is characterized by some arbitrary local index sequence. Hence the cardinality |P₁^a| of the set P₁^a is given by:

$\begin{matrix} | P_{1}^{a} | = n_{0} \cdot n_{1} \cdot \dots \cdot n_{L - 1} = \prod_{i = 0}^{L - 1} n_{i} & \text{Eq. 56} \end{matrix}$

The set P₂^a contains products formed by sets which contain two vertices. These vertices are characterized by identical local indexes for all index positions apart from the last one, L−1. Since the number of all possible pairs of distinct values that can be considered from 0 to n_L−1−1 is n_L−1·(n_L−1−1)/2, the cardinality of the set P₂^a is equal to:

$\begin{matrix} | P_{2}^{a} | = \frac{n_{0} \cdot n_{1} \cdot \dots \cdot n_{L - 1} \cdot (n_{L - 1} - 1)}{2} = (\prod_{i = 0}^{L - 1} n_{i}) \cdot \frac{(n_{L - 1} - 1)}{2} & \text{Eq. 57} \end{matrix}$

The set P₃^a contains products formed by sets which contain two vertices as well. The products of the set P₃^a are formed differently from those of P₂^a, however. The vertices that form the products of P₃^a are characterized by identical local indexes for all index positions apart from one position between 0 and L−2. Since the number of all possible pairs of local index values that can be considered for an index position j is n_j·(n_j−1)/2, the cardinality of the set P₃^a is equal to:

$\begin{matrix} | P_{3}^{a} | = \frac{n_{0} \cdot (n_{0} - 1)}{2} \cdot n_{1} \cdot n_{2} \cdot \dots \cdot n_{L - 1} + n_{0} \cdot \frac{n_{1} \cdot (n_{1} - 1)}{2} \cdot n_{2} \cdot \dots \cdot n_{L - 1} + \dots + n_{0} \cdot n_{1} \cdot n_{2} \cdot \dots \cdot \frac{n_{L - 2} \cdot (n_{L - 2} - 1)}{2} \cdot n_{L - 1} = (\prod_{i = 0}^{L - 1} n_{i}) \cdot \sum_{i = 0}^{L - 2} \frac{n_{i} - 1}{2} & \text{Eq. 58} \end{matrix}$

Finally, the set P₄^a is characterized by the expression in Eq. 52. The cardinality of the set P₄^a is equal to:

$\begin{matrix} | P_{4}^{a} | = \frac{n_{0} \cdot (n_{0} - 1)}{2} \cdot \frac{n_{1} \cdot (n_{1} - 1)}{2} \cdot n_{2} \cdot n_{3} \cdot \dots \cdot n_{L - 1} + n_{0} \cdot \frac{n_{1} \cdot (n_{1} - 1)}{2} \cdot \frac{n_{2} \cdot (n_{2} - 1)}{2} \cdot n_{3} \cdot \dots \cdot n_{L - 1} + \dots + n_{0} \cdot n_{1} \cdot \dots \cdot \frac{n_{L - 2} \cdot (n_{L - 2} - 1)}{2} \cdot \frac{n_{L - 1} \cdot (n_{L - 1} - 1)}{2} + \frac{n_{0} \cdot (n_{0} - 1)}{2} \cdot \frac{n_{1} \cdot (n_{1} - 1)}{2} \cdot \frac{n_{2} \cdot (n_{2} - 1)}{2} \cdot n_{3} \cdot n_{4} \cdot \dots \cdot n_{L - 1} + \frac{n_{0} \cdot (n_{0} - 1)}{2} \cdot \frac{n_{1} \cdot (n_{1} - 1)}{2} \cdot n_{2} \cdot \frac{n_{3} \cdot (n_{3} - 1)}{2} \cdot n_{4} \cdot \dots \cdot n_{L - 1} + \dots + n_{0} \cdot n_{1} \cdot \dots \cdot \frac{n_{L - 3} \cdot (n_{L - 3} - 1)}{2} \cdot \frac{n_{L - 2} \cdot (n_{L - 2} - 1)}{2} \cdot \frac{n_{L - 1} \cdot (n_{L - 1} - 1)}{2} + \dots + \frac{n_{0} \cdot (n_{0} - 1)}{2} \cdot \frac{n_{1} \cdot (n_{1} - 1)}{2} \cdot \dots \cdot \frac{n_{L - 1} \cdot (n_{L - 1} - 1)}{2} & \text{Eq. 59} \end{matrix}$

Summing up the cardinalities of the sets P₁^a, P₂^a, P₃^a and P₄^a:

$\begin{matrix} | P^{a} | = \sum_{i = 1}^{4} | P_{i}^{a} | = \frac{n_{0} \cdot n_{1} \cdot \dots \cdot n_{L - 1}}{2^{L}} \cdot [2^{L} + 2^{L - 1} \cdot [(n_{0} - 1) + (n_{1} - 1) + \dots + (n_{L - 1} - 1)] + 2^{L - 2} \cdot [(n_{0} - 1) \cdot (n_{1} - 1) + (n_{0} - 1) \cdot (n_{2} - 1) + \dots + (n_{L - 2} - 1) \cdot (n_{L - 1} - 1)] + \dots + (n_{0} - 1) \cdot (n_{1} - 1) \cdot \dots \cdot (n_{L - 1} - 1)] & \text{Eq. 60} \end{matrix}$

To prove that |P^r|=|P^a| the identity that follows is used:

(a₀+k)·(a₁+k) · . . . ·(a_m−1+k)=k^m+k^m−1·(a₀+a₁+ . . . +a_m−1)+k^m−2·(a₀·a₁+a₀·a₂+ . . . +a_m−2·a_m−1)+ . . . +a₀·a₁· . . . ·a_m−1 Eq. 61

Substituting a_i with (n_i−1), m with L, and k with 2 in Eq. 61, and combining Eq. 60 and Eq. 61, results in Eq. 62:

$\begin{matrix} | P^{a} | = \frac{n_{0} \cdot n_{1} \cdot \dots \cdot n_{L - 1}}{2^{L}} \cdot (n_{0} - 1 + 2) \cdot (n_{1} - 1 + 2) \cdot \dots \cdot (n_{L - 1} - 1 + 2) = \frac{\prod_{i = 0}^{L - 1} n_{i} \cdot (n_{i} + 1)}{2^{L}} = | P^{r} | & \text{Eq. 62} \end{matrix}$

Therefore, it is proven that the number of products generated by an embodiment process is equal to the number of multiplications performed by using a generalized recursive Karatsuba process. It should be noted that the number of products generated by an embodiment process is substantially smaller than the number of scalar multiplications performed by the one-iteration Karatsuba solution of Paar and Weimerskirch (A. Weimerskirch and C. Paar, “Generalizations of the Karatsuba Algorithm for Efficient Implementations”, Technical Report, University of Ruhr, Bochum, Germany, 2003), which is N·(N+1)/2.
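The counting argument of Eq. 54 through Eq. 62 can be checked numerically. The following is a minimal sketch with hypothetical function names; it groups the sum of Eq. 55 by the number m of index positions that carry a pair of local indexes, and the factorization 18=2·3·3 in the usage lines is an assumption chosen for illustration.

```python
from itertools import combinations
from math import prod


def recursive_karatsuba_count(factors):
    """Eq. 54: product of n_i * (n_i + 1) / 2 over all factors."""
    return prod(n * (n + 1) // 2 for n in factors)


def summed_count(factors):
    """Sum of |P_1^a| .. |P_4^a|, grouped by the set of free index positions."""
    total = 0
    levels = len(factors)
    for m in range(levels + 1):
        for free in combinations(range(levels), m):
            # A free position contributes n*(n-1)/2 pairs, any other position n values.
            total += prod(
                n * (n - 1) // 2 if j in free else n for j, n in enumerate(factors)
            )
    return total


def one_iteration_count(size):
    """N * (N + 1) / 2, the count quoted for the one-iteration solution."""
    return size * (size + 1) // 2


factors = [2, 3, 3]
print(recursive_karatsuba_count(factors), summed_count(factors), one_iteration_count(18))
```

For this factorization the first two counts agree, as Eq. 62 states, and both are smaller than N·(N+1)/2.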

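A typical product p from the set P^a is

$\begin{matrix} p = P(\{ v_{(i_{0}) \dots (i_{q_{0}}) \dots (i_{q_{1}}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (i_{q_{0}}^{'}) \dots (i_{q_{1}}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (i_{q_{0}}) \dots (i_{q_{1}}^{'}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (i_{q_{0}}^{'}) \dots (i_{q_{1}}^{'}) \dots (i_{q_{m - 1}}) \dots (i_{L - 1})}^{(L - 1)}, \dots, v_{(i_{0}) \dots (i_{q_{0}}^{'}) \dots (i_{q_{1}}^{'}) \dots (i_{q_{m - 1}}^{'}) \dots (i_{L - 1})}^{(L - 1)} \}) : i_{j} \in [0, n_{j} - 1]\ \forall j \in [0, L - 1],\ (i_{q_{k}}^{'} \in [0, n_{q_{k}} - 1] \wedge i_{q_{k}} \neq i_{q_{k}}^{'})\ \forall k \in [0, m - 1],\ 0 \leq q_{0} \leq q_{1} \leq \dots \leq q_{m - 1},\ m \in [0, L] & \text{Eq. 63} \end{matrix}$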

For the product p, a ‘surface’ in m−k dimensions (0≦k≦m) associated with ‘free’ index positions q_f₀, q_f₁, . . . , q_f_m−k−1, ‘occupied’ index positions q_p₀, q_p₁, . . . , q_p_k−1, and indexes for the occupied positions î_q_p₀, î_q_p₁, . . . , î_q_p_k−1 is defined as the product that derives from p by setting the local indexes of all vertices of p to be equal to î_q_p₀, î_q_p₁, . . . , î_q_p_k−1 at the occupied index positions, and by allowing the indexes at the free positions to take either of the values i_q_f₀ and i_q_f₀′, i_q_f₁ and i_q_f₁′, . . . , and i_q_f_m−k−1 and i_q_f_m−k−1′.

The sets of the free and occupied index positions satisfy the following conditions:

{q_f₀, q_f₁, . . . , q_f_m−k−1}⊂{q₀, q₁, . . . , q_m−1},
{q_p₀, q_p₁, . . . , q_p_k−1}⊂{q₀, q₁, . . . , q_m−1},
{q_f₀, q_f₁, . . . , q_f_m−k−1}∩{q_p₀, q_p₁, . . . , q_p_k−1}=Ø,
{q_f₀, q_f₁, . . . , q_f_m−k−1}∪{q_p₀, q_p₁, . . . , q_p_k−1}={q₀, q₁, . . . , q_m−1} Eq. 64

In addition, the indexes for the occupied positions

$\hat{i}_{q_{p_{0}}}, \hat{i}_{q_{p_{1}}}, \dots, \hat{i}_{q_{p_{k - 1}}}$

satisfy:

î_q_p₀∈{i_q_p₀, i_q_p₀′}, î_q_p₁∈{i_q_p₁, i_q_p₁′}, . . . , î_q_p_k−1∈{i_q_p_k−1, i_q_p_k−1′} Eq. 65

Such a surface is denoted as

$u_{q_{f_{0}}, q_{f_{1}}, \dots, q_{f_{m - k - 1}}; q_{p_{0}}, q_{p_{1}}, \dots, q_{p_{k - 1}}}^{p; m - k; \hat{i}_{q_{p_{0}}}, \hat{i}_{q_{p_{1}}}, \dots, \hat{i}_{q_{p_{k - 1}}}}$

From the definition in Eq. 66 it is evident that a surface

$u_{q_{f_{0}}, q_{f_{1}}, \dots, q_{f_{m - k - 1}}; q_{p_{0}}, q_{p_{1}}, \dots, q_{p_{k - 1}}}^{p; m - k; \hat{i}_{q_{p_{0}}}, \hat{i}_{q_{p_{1}}}, \dots, \hat{i}_{q_{p_{k - 1}}}}$

associated with a product p is also an element of the set P^a and is generated by the procedure CREATE_PRODUCTS. From the definition in Eq. 66 it is also evident that whereas p is formed by a set of 2^m vertices, the surface

$u_{q_{f_{0}}, q_{f_{1}}, \dots, q_{f_{m - k - 1}}; q_{p_{0}}, q_{p_{1}}, \dots, q_{p_{k - 1}}}^{p; m - k; \hat{i}_{q_{p_{0}}}, \hat{i}_{q_{p_{1}}}, \dots, \hat{i}_{q_{p_{k - 1}}}}$

is formed by a set of 2^(m−k) vertices. Finally, from the definition of the mapping in Eq. 43 and Eq. 66 it is evident that

$\begin{matrix} u_{q_{f_{0}}, q_{f_{1}}, \dots, q_{f_{m - k - 1}}; q_{p_{0}}, q_{p_{1}}, \dots, q_{p_{k - 1}}}^{p; m - k; \hat{i}_{q_{p_{0}}}, \hat{i}_{q_{p_{1}}}, \dots, \hat{i}_{q_{p_{k - 1}}}} = P (\{ v_{(i_{0}) \dots (\hat{i}_{q_{p_{0}}}) \dots (i_{q_{f_{0}}}) \dots (i_{q_{f_{1}}}) \dots (i_{q_{f_{m - k - 1}}}) \dots (\hat{i}_{q_{p_{k - 1}}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (\hat{i}_{q_{p_{0}}}) \dots (i_{q_{f_{0}}}^{'}) \dots (i_{q_{f_{1}}}) \dots (i_{q_{f_{m - k - 1}}}) \dots (\hat{i}_{q_{p_{k - 1}}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (\hat{i}_{q_{p_{0}}}) \dots (i_{q_{f_{0}}}) \dots (i_{q_{f_{1}}}^{'}) \dots (i_{q_{f_{m - k - 1}}}) \dots (\hat{i}_{q_{p_{k - 1}}}) \dots (i_{L - 1})}^{(L - 1)}, v_{(i_{0}) \dots (\hat{i}_{q_{p_{0}}}) \dots (i_{q_{f_{0}}}^{'}) \dots (i_{q_{f_{1}}}^{'}) \dots (i_{q_{f_{m - k - 1}}}) \dots (\hat{i}_{q_{p_{k - 1}}}) \dots (i_{L - 1})}^{(L - 1)}, \dots, v_{(i_{0}) \dots (\hat{i}_{q_{p_{0}}}) \dots (i_{q_{f_{0}}}^{'}) \dots (i_{q_{f_{1}}}^{'}) \dots (i_{q_{f_{m - k - 1}}}^{'}) \dots (\hat{i}_{q_{p_{k - 1}}}) \dots (i_{L - 1})}^{(L - 1)} \}) : \{ i_{q_{f_{0}}}, i_{q_{f_{1}}}, \dots, i_{q_{f_{m - k - 1}}} \} \subseteq \{ i_{q_{0}}, i_{q_{1}}, \dots, i_{q_{m - 1}} \},\ \{ i_{q_{f_{0}}}^{'}, i_{q_{f_{1}}}^{'}, \dots, i_{q_{f_{m - k - 1}}}^{'} \} \subseteq \{ i_{q_{0}}^{'}, i_{q_{1}}^{'}, \dots, i_{q_{m - 1}}^{'} \} \text{ and conditions (64) and (65) hold} & \text{Eq. 66} \end{matrix}$

The set of all surfaces in the m−k dimensions associated with a product p, free index positions q_f₀, q_f₁, . . . , q_f_m−k−1 and occupied index positions q_p₀, q_p₁, . . . , q_p_k−1 is defined as the union:

$\begin{matrix} U_{q_{f_{0}}, q_{f_{1}}, \dots, q_{f_{m - k - 1}}; q_{p_{0}}, q_{p_{1}}, \dots, q_{p_{k - 1}}}^{p; m - k} = \bigcup_{\hat{i}_{q_{p_{0}}}, \hat{i}_{q_{p_{1}}}, \dots, \hat{i}_{q_{p_{k - 1}}}} \{ u_{q_{f_{0}}, q_{f_{1}}, \dots, q_{f_{m - k - 1}}; q_{p_{0}}, q_{p_{1}}, \dots, q_{p_{k - 1}}}^{p; m - k; \hat{i}_{q_{p_{0}}}, \hat{i}_{q_{p_{1}}}, \dots, \hat{i}_{q_{p_{k - 1}}}} \} & \text{Eq. 67} \end{matrix}$

Next, the set of all surfaces in the m−k dimensions associated with a product p is defined as the union:

$\begin{matrix} U^{p; m - k} = \bigcup_{q_{f_{0}}, q_{f_{1}}, \dots, q_{f_{m - k - 1}};\ q_{p_{0}}, q_{p_{1}}, \dots, q_{p_{k - 1}}} U_{q_{f_{0}}, q_{f_{1}}, \dots, q_{f_{m - k - 1}}; q_{p_{0}}, q_{p_{1}}, \dots, q_{p_{k - 1}}}^{p; m - k} & \text{Eq. 68} \end{matrix}$

A ‘parent’ surface ℘(u) of a particular surface

$u = u_{q_{f_{0}}, q_{f_{1}}, \dots, q_{f_{m - k - 1}}; q_{p_{0}}, q_{p_{1}}, \dots, q_{p_{k - 1}}}^{p; m - k; \hat{i}_{q_{p_{0}}}, \hat{i}_{q_{p_{1}}}, \dots, \hat{i}_{q_{p_{k - 1}}}}$

is defined as the surface associated with the product p, occupied index positions q_p₀, q_p₁, . . . , q_p_k−2, free index positions q_f₀, q_f₁, . . . , q_f_m−k−1, q_p_k−1, and indexes at the occupied positions î_q_p₀, î_q_p₁, . . . , î_q_p_k−2:

$\begin{matrix} ℘ (u) = u_{q_{f_{0}}, q_{f_{1}}, \dots, q_{f_{m - k - 1}}, q_{p_{k - 1}}; q_{p_{0}}, q_{p_{1}}, \dots, q_{p_{k - 2}}}^{p; m - k + 1; \hat{i}_{q_{p_{0}}}, \hat{i}_{q_{p_{1}}}, \dots, \hat{i}_{q_{p_{k - 2}}}} & \text{Eq. 69} \end{matrix}$

The set of ‘children’ of a surface u∈U^(p; m−k) is defined as the set:

l(u)={v: v∈U^(p; m−k−1), u=℘(v)} Eq. 70

In one embodiment, a process that generates subtraction formulae uses a matrix M whose size is equal to the cardinality of P^a, i.e., the number of all products generated by the procedure CREATE_PRODUCTS( ). The cardinality of P^a is also equal to the number of unique surfaces that can be defined in all possible dimensions for all products of P^a. This is because each surface of a product is also a product by itself. For each possible product p, or surface u, the matrix M is initialized as M[p]←p, or equivalently M[u]←u. Initialization takes place every time a set of subtractions is generated for a product p of P^a.

Subtractions are generated by a generate subtractions process GENERATE_SUBTRACTIONS( ), whose pseudo code is listed below. The subtraction formulae which are generated by the process GENERATE_SUBTRACTIONS( ) are returned in the set S^a.

1. GENERATE_SUBTRACTIONS( )

2. S^a← Ø

3. for every p ∈ P^a

4. do INIT_M( )

5. GENERATE_SUBTRACTIONS_FOR_PRODUCT(p)

6. return S^a

The procedure INIT_M( ) is listed below:

INIT_M( )

1. for every p ∈ P^a

2. do M[p] ← p

3. return

A process GENERATE_SUBTRACTIONS_FOR_PRODUCT( ), that is also invoked by GENERATE_SUBTRACTIONS( ), is listed below in pseudo code:

GENERATE_SUBTRACTIONS_FOR_PRODUCT(p)

1. m ← the number of free index positions in p

2. for l ← 0 to m−1

3.   do for every u_i ∈ U^(p;l)

4.     do s ← (M[℘(u_i)] ← M[℘(u_i)] − M[u_i])

5.        if s ∉ S^a

6.          then

7.            S^a ← S^a ∪ {s}

8. return

For each product p of P^a the subtractions generated by the process GENERATE_SUBTRACTIONS( ) reduce its value. Let μ(p) be the final value of the table entry M[p] after the procedure GENERATE_SUBTRACTIONS_FOR_PRODUCT( ) is executed for the product p. It can be seen that μ(p) is in fact the product p minus all surfaces of p defined in the m−1 dimensions, plus all surfaces of p defined in the m−2 dimensions, . . . , minus (plus) all surfaces of p defined in 0 dimensions (i.e., products of single vertices). Here m denotes the number of free index positions of p.

Next, it is determined how the subtractions generated by the process GENERATE_SUBTRACTIONS( ) can be interpreted graphically. Consider an example of an 18 by 18 multiplication. One of the products generated by the procedure CREATE_PRODUCTS( ) is formed from the set of vertices with global indexes 0, 1, 6, 7, 9, 10, 15, 16. This is the product (a₀+a₁+a₆+a₇+a₉+a₁₀+a₁₅+a₁₆)·(b₀+b₁+b₆+b₇+b₉+b₁₀+b₁₅+b₁₆).

Consider the complete graph which is formed from the vertices of this product. This graph has the shape of a cube but it also contains the diagonals that connect every other vertex, as shown in FIG. 10. The product has 6 associated surfaces defined in 2 dimensions, 12 surfaces defined in 1 dimension and 8 surfaces defined in 0 dimensions. The surfaces defined in 2 dimensions are the products (a₀+a₁+a₆+a₇)·(b₀+b₁+b₆+b₇), (a₀+a₁+a₉+a₁₀)·(b₀+b₁+b₉+b₁₀), (a₆+a₇+a₁₅+a₁₆)·(b₆+b₇+b₁₅+b₁₆), (a₉+a₁₀+a₁₅+a₁₆)·(b₉+b₁₀+b₁₅+b₁₆), (a₁+a₇+a₁₀+a₁₆)·(b₁+b₇+b₁₀+b₁₆), and (a₀+a₆+a₉+a₁₅)·(b₀+b₆+b₉+b₁₅). These products are formed from sets of 4 vertices. The complete graphs of these sets form squares which together with their diagonals cover the cube associated with the product (a₀+a₁+a₆+a₇+a₉+a₁₀+a₁₅+a₁₆)·(b₀+b₁+b₆+b₇+b₉+b₁₀+b₁₅+b₁₆). This is the reason why the term ‘surfaces’ is used to refer to such products.

The surfaces defined in a single dimension are the products (a₀+a₁)·(b₀+b₁), (a₀+a₆)·(b₀+b₆), (a₁+a₇)·(b₁+b₇), (a₆+a₇)·(b₆+b₇), (a₉+a₁₀)·(b₉+b₁₀), (a₉+a₁₅)·(b₉+b₁₅), (a₁₀+a₁₆)·(b₁₀+b₁₆), (a₁₅+a₁₆)·(b₁₅+b₁₆), (a₁+a₁₀)·(b₁+b₁₀), (a₀+a₉)·(b₀+b₉), (a₇+a₁₆)·(b₇+b₁₆), and (a₆+a₁₅)·(b₆+b₁₅). These products are formed from sets of 2 vertices. The complete graphs of these sets form the edges of the cube associated with the product (a₀+a₁+a₆+a₇+a₉+a₁₀+a₁₅+a₁₆)·(b₀+b₁+b₆+b₇+b₉+b₁₀+b₁₅+b₁₆). Finally, the surfaces defined in 0 dimensions are products formed from single vertices. These are the products a₀·b₀, a₁·b₁, a₆·b₆, a₇·b₇, a₉·b₉, a₁₀·b₁₀, a₁₅·b₁₅, and a₁₆·b₁₆.

Next, it is determined what remains if, from the product (a₀+a₁+a₆+a₇+a₉+a₁₀+a₁₅+a₁₆)·(b₀+b₁+b₆+b₇+b₉+b₁₀+b₁₅+b₁₆), all the surfaces defined in 2 dimensions are subtracted, all surfaces defined in 1 dimension are added and all surfaces defined in 0 dimensions are subtracted. It can be seen that what remains is the term a₀·b₁₆+a₁₆·b₀+a₁·b₁₅+a₁₅·b₁+a₆·b₁₀+a₁₀·b₆+a₉·b₇+a₇·b₉. This term is part of the coefficient c₁₆ of the output. The derivation of this term can be interpreted graphically as the subtraction of all covering squares from a cube, the addition of its edges and the subtraction of its vertices. What remains from these subtractions are the diagonals of the cube, excluding their end-points.
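The cube example can be reproduced symbolically. The sketch below is illustrative only and rests on assumptions: the function names are hypothetical, the factorization 18=2·3·3 with the first factor most significant is assumed because it reproduces the vertex set 0, 1, 6, 7, 9, 10, 15, 16, and the alternating sum over surfaces is evaluated directly rather than through the table M used by GENERATE_SUBTRACTIONS( ).

```python
from collections import Counter
from itertools import product as cartesian


def global_index(local, factors):
    """Mixed-radix value of a local index sequence (first factor most significant)."""
    value = 0
    for digit, radix in zip(local, factors):
        value = value * radix + digit
    return value


def expand(vertices):
    """Expand P(V) = (sum of a_i) * (sum of b_j) into counts of terms a_i * b_j."""
    return Counter((i, j) for i in vertices for j in vertices)


def mu(choices, factors):
    """Alternating sum over all surfaces of the product described by `choices`.

    choices[j] holds one local index (occupied position) or two (free position).
    """
    options = []
    for pos in choices:
        # A free position stays free or is pinned to either of its two values.
        options.append([pos] if len(pos) == 1 else [pos, (pos[0],), (pos[1],)])
    m = sum(len(pos) == 2 for pos in choices)
    total = Counter()
    for surface in cartesian(*options):
        dims = sum(len(pos) == 2 for pos in surface)
        sign = (-1) ** (m - dims)
        vertices = [global_index(local, factors) for local in cartesian(*surface)]
        for term, count in expand(vertices).items():
            total[term] += sign * count
    return {term: count for term, count in total.items() if count != 0}


remainder = mu([(0, 1), (0, 2), (0, 1)], [2, 3, 3])
print(sorted(remainder))
```

Each surviving key (i, j) stands for a term a_i·b_j; in this example every surviving pair satisfies i+j=16, in line with the statement that the remainder is part of c₁₆.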

To prove the correctness of the embodiments, it is shown that every term μ(p) produced by the subtractions of the process GENERATE_SUBTRACTIONS( ) is part of one coefficient of a Karatsuba output c(x). It is also shown that for two different products p, p̃∈P^a, the terms μ(p) and μ(p̃) do not include common terms of the form a_I₁·b_I₂+a_I₂·b_I₁. Also, it is shown that each term of the form a_I₁·b_I₂+a_I₂·b_I₁ of every coefficient of the Karatsuba output c(x) is part of some term μ(p) resulting from a product p∈P^a.

Consider a product p∈P^a defined by Eq. 63. If m>0, then μ(p) is the sum of all possible terms of the form a_I₁·b_I₂+a_I₂·b_I₁ that satisfy the following conditions:

$\begin{matrix} I_{1} = i_{0} \cdot n_{1} \cdot \dots \cdot n_{L - 1} + \dots + \hat{i}_{q_{0}} \cdot n_{q_{0} + 1} \cdot \dots \cdot n_{L - 1} + \dots + \hat{i}_{q_{m - 1}} \cdot n_{q_{m - 1} + 1} \cdot \dots \cdot n_{L - 1} + \dots + i_{L - 1}, \\ I_{2} = i_{0} \cdot n_{1} \cdot \dots \cdot n_{L - 1} + \dots + \check{i}_{q_{0}} \cdot n_{q_{0} + 1} \cdot \dots \cdot n_{L - 1} + \dots + \check{i}_{q_{m - 1}} \cdot n_{q_{m - 1} + 1} \cdot \dots \cdot n_{L - 1} + \dots + i_{L - 1}, \\ \hat{i}_{q_{0}}, \check{i}_{q_{0}} \in \{ i_{q_{0}}, i_{q_{0}}^{'} \},\ \hat{i}_{q_{0}} \neq \check{i}_{q_{0}},\ \dots,\ \hat{i}_{q_{m - 1}}, \check{i}_{q_{m - 1}} \in \{ i_{q_{m - 1}}, i_{q_{m - 1}}^{'} \},\ \hat{i}_{q_{m - 1}} \neq \check{i}_{q_{m - 1}} & \text{Eq. 71} \end{matrix}$

This means that μ(p) is the sum of all terms of the form a_I₁·b_I₂+a_I₂·b_I₁ such that the global index I₁ in each term a_I₁·b_I₂+a_I₂·b_I₁ is created by selecting some local index values î_q₀, . . . , î_q_m−1 from among {i_q₀, i_q₀′}, . . . , {i_q_m−1, i_q_m−1′}, whereas the global index I₂ in the same term is created by selecting those local index values not used by I₁.

From Eq. 63 it is evident that the product p is the sum of terms which are either of the form a_I₁·b_I₂+a_I₂·b_I₁ or a_I₁·b_I₁. The term μ(p) is derived from p by sequentially subtracting and adding surfaces of m−1, m−2, . . . , 0 dimensions. These surfaces are also sums of terms of the forms a_I₁·b_I₂+a_I₂·b_I₁ or a_I₁·b_I₁ (from Eq. 66). In addition, every term of the forms a_I₁·b_I₂+a_I₂·b_I₁ or a_I₁·b_I₁ of every surface of p is included in p.

Next, it is shown that μ(p) does not contain terms of the form a_I₁·b_I₁ and that the terms of the form a_I₁·b_I₂+a_I₂·b_I₁ satisfy Eq. 71. Assume for the moment that there exists a term a_I₁·b_I₂+a_I₂·b_I₁ in μ(p) that does not satisfy Eq. 71. For this term, there exists a subset of local index positions {q_e₀, q_e₁, . . . , q_e_l−1}⊆{q₀, q₁, . . . , q_m−1} for which the global indexes I₁ and I₂ are associated with the same local index values. For this reason, this term is part of

$(\begin{matrix} l \\ l \end{matrix})$

surfaces of m dimensions,

$(\begin{matrix} l \\ l - 1 \end{matrix})$

surfaces of m−1 dimensions,

$(\begin{matrix} l \\ l - 2 \end{matrix})$

surfaces of m−2 dimensions, . . . , and

$(\begin{matrix} l \\ 0 \end{matrix})$

surfaces of m−l dimensions. From the manner in which the mapping P(V) is defined, it is evident that the term a_I₁·b_I₂+a_I₂·b_I₁ appears only once in each of these surfaces. Therefore the total number of times N_L this term appears in μ(p) is given by:

$\begin{matrix} N_{L} = \left| \binom{l}{l} - \binom{l}{l - 1} + \binom{l}{l - 2} - \dots + {(- 1)}^{l - 1} \cdot \binom{l}{1} + {(- 1)}^{l} \cdot \binom{l}{0} \right| & \text{Eq. 72} \end{matrix}$

Using Newton's binomial formula:

$\begin{matrix} {(x + a)}^{n} = a^{n} + (\begin{matrix} n \\ 1 \end{matrix}) \cdot a^{n - 1} \cdot x + (\begin{matrix} n \\ 2 \end{matrix}) \cdot a^{n - 2} \cdot x^{2} + \dots + (\begin{matrix} n \\ 1 \end{matrix}) \cdot a \cdot x^{n - 1} + x^{n} & Eq . 73 \end{matrix}$

Substituting x with 1, a with −1 and n with l we get that N_L=0. Hence μ(p) does not contain any terms of the form a_I₁·b_I₂+a_I₂·b_I₁ that do not satisfy Eq. 71. What remains is to show that μ(p) does not contain terms of the form a_I₁·b_I₁. Every term of the form a_I₁·b_I₁ is part of

$(\begin{matrix} m \\ m \end{matrix})$

surfaces of m dimensions,

$(\begin{matrix} m \\ m - 1 \end{matrix})$

surfaces of m−1 dimensions,

$(\begin{matrix} m \\ m - 2 \end{matrix})$

surfaces of m−2 dimensions, . . . , and

$(\begin{matrix} m \\ 0 \end{matrix})$

surfaces of 0 dimensions. Therefore, the total number of times a term a_I₁·b_I₁appears in μ(p) is zero (from Newton's binomial formula).
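As a worked instance of this cancellation, consider a product with three free index positions, such as the cube discussed above. A term a_I₁·b_I₁ belongs to one surface of 3 dimensions (the product itself), three surfaces of 2 dimensions, three surfaces of 1 dimension and one surface of 0 dimensions, so its net count is:

$\binom{3}{3} - \binom{3}{2} + \binom{3}{1} - \binom{3}{0} = 1 - 3 + 3 - 1 = 0, \qquad \text{and in general} \qquad \sum_{j = 0}^{l} {(- 1)}^{j} \cdot \binom{l}{l - j} = {(1 - 1)}^{l} = 0 \text{ for } l \geq 1 .$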

The term μ(p) contains all possible terms of the form a_I₁·b_I₂+a_I₂·b_I₁that satisfy Eq. 71. This is because these terms are part of p and they are not included into any surface of p. Therefore, these terms are not subtracted out when μ(p) is derived.

Consider a product p∈P^a defined by Eq. 63. The sum of terms μ(p) is part of the coefficient c_i_c of the Karatsuba output where the index i_c is given by Eq. 74.

First consider the case where m>0. In this case, μ(p) is a sum of terms of the form a_I₁·b_I₂+a_I₂·b_I₁ that satisfy Eq. 71. In this case I₁+I₂=i_c for every term a_I₁·b_I₂+a_I₂·b_I₁. In the second case where m=0, the product p is formed from a single vertex. Therefore, p=μ(p)=a_I₁·b_I₁ for some global index I₁. In this case, 2·I₁=i_c.

$\begin{matrix} i_{c} = 2 \cdot i_{0} \cdot n_{1} \cdot n_{2} \cdot \dots \cdot n_{L - 1} + \dots + (i_{q_{0}} + i_{q_{0}}^{'}) \cdot n_{q_{0} + 1} \cdot n_{q_{0} + 2} \cdot \dots \cdot n_{L - 1} + \dots + (i_{q_{1}} + i_{q_{1}}^{'}) \cdot n_{q_{1} + 1} \cdot n_{q_{1} + 2} \cdot \dots \cdot n_{L - 1} + \dots + (i_{q_{m - 1}} + i_{q_{m - 1}}^{'}) \cdot n_{q_{m - 1} + 1} \cdot n_{q_{m - 1} + 2} \cdot \dots \cdot n_{L - 1} + \dots + 2 \cdot i_{L - 1} & Eq . 74 \end{matrix}$

Next we show that the terms μ(p) and μ(p̃) that derive from two different products p, p̃∈P^a do not include any common terms.

Consider the products p, p̃∈P^a. The terms μ(p) and μ(p̃) that derive from these products have no terms of the form a_I₁·b_I₂+a_I₂·b_I₁ or a_I₁·b_I₁ in common.

In the trivial case where the number of free index positions of both p and p̃ is zero, p=μ(p), p̃=μ(p̃) and p≠p̃. In the case where one of the two products is characterized by zero free index positions and the other is not, it is not possible for μ(p) and μ(p̃) to contain common terms, since one of the two is equal to a_I₁·b_I₁ for some global index I₁ and the other is the sum of terms a_I₁·b_I₂+a_I₂·b_I₁ that satisfy Eq. 71.

Now, assume that both p and p̃ are characterized by at least one free index position and that there exist two terms a_I₁·b_I₂+a_I₂·b_I₁ and a_Ĩ₁·b_Ĩ₂+a_Ĩ₂·b_Ĩ₁ from μ(p) and μ(p̃) respectively that are equal. Equality of global indexes means equality of their associated sequences of local indexes. The local index positions for which I₁ and I₂ (or Ĩ₁ and Ĩ₂) differ are free index positions for both p and p̃. On the other hand, all other local index positions must be occupied. Indeed, if any of these index positions was free, then the local index sequences associated with I₁ and I₂ would differ at that position, but they do not. Therefore, the products p and p̃ are defined using the same free and occupied local index positions. Now, from the equality of the local index sequences of I₁ and I₂ it is evident that p and p̃ specify the same pairs of local index values at their free index positions and the same single values at their occupied positions. Therefore, p and p̃ are equal, which contradicts the assumption.

Every term of the form a_I₁·b_I₂+a_I₂·b_I₁ of a coefficient of the Karatsuba output is part of a term μ(p) for some product p∈P^a. The global indexes I₁ and I₂ can be converted into 2 local index sequences. These sequences will be identical for some local index positions and different for others. A product p can be completely defined in this case from I₁ and I₂ by specifying the local index positions for which I₁ and I₂ differ as free and all others as occupied. The pairs of local index values for which I₁ and I₂ differ are specified at the free index positions of all vertices of the product p, whereas the local index values which are in common between I₁ and I₂ are specified at the occupied positions. From the manner in which the product p is specified it is evident that μ(p) contains the term a_I₁·b_I₂+a_I₂·b_I₁.
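The construction of the product p from a pair of global indexes I₁ and I₂ can be sketched as follows. The function names are hypothetical, and the sketch assumes that global indexes are mixed-radix values of the local index sequence with the first factor most significant, consistent with Eq. 71; the factorization 18=2·3·3 in the usage lines is an assumption.

```python
from itertools import product as cartesian


def local_indexes(value, factors):
    """Convert a global index into its local index sequence."""
    digits = []
    for radix in reversed(factors):
        value, digit = divmod(value, radix)
        digits.append(digit)
    return tuple(reversed(digits))


def global_index(local, factors):
    """Inverse of local_indexes: mixed-radix value of a local index sequence."""
    value = 0
    for digit, radix in zip(local, factors):
        value = value * radix + digit
    return value


def product_for_term(i1, i2, factors):
    """Return the sorted vertex set of the product p whose mu(p) holds a_i1 * b_i2."""
    choice = []
    for d1, d2 in zip(local_indexes(i1, factors), local_indexes(i2, factors)):
        # Positions where the sequences differ are free, all others are occupied.
        choice.append((d1,) if d1 == d2 else (d1, d2))
    return sorted(global_index(local, factors) for local in cartesian(*choice))


print(product_for_term(0, 16, [2, 3, 3]))
print(product_for_term(3, 5, [3, 3]))
```

For I₁=0 and I₂=16 under the assumed factorization, all three index positions differ, so the resulting product is formed from eight vertices.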

In what follows we refer to the example of FIG. 11B. We describe the steps by which a single iteration multiplication is performed between two polynomials of degree 8. Additions connect the “a” terms and the “b” terms 6, 7 and 8 in order to form the nodes of the triangle 6-7-8. Additions connect the “a” terms and the “b” terms 3, 4 and 5 to form the triangle 3-4-5. Additions connect the “a” terms and the “b” terms 0, 1 and 2 to form the triangle 0-1-2. Additions connect 1-by-1 the “a” and “b” terms 6-7-8 and 3-4-5. Additions connect 1-by-1 the “a” and “b” terms 6-7-8 and 0-1-2. Additions connect 1-by-1 the “a” and “b” terms 3-4-5 and 0-1-2. Additions create the spanning planes associated with the edges of the triangles 6-7-8 and 3-4-5. Additions create the spanning planes associated with the edges of the triangles 6-7-8 and 0-1-2. Additions create the spanning planes associated with the edges of the triangles 3-4-5 and 0-1-2.

Multiplications create the nodes of the triangles 0-1-2, 3-4-5, and 6-7-8. Multiplications create the edges of the triangle 6-7-8. Multiplications create the edges of the triangle 3-4-5. Multiplications create the edges of the triangle 0-1-2. Multiplications create the edges that connect the nodes of the triangles 6-7-8 and 3-4-5. Multiplications create the edges that connect the nodes of the triangles 6-7-8 and 0-1-2. Multiplications create the edges that connect the nodes of the triangles 3-4-5 and 0-1-2. Multiplications create the spanning planes that connect the edges of the triangles 6-7-8 and 3-4-5. Multiplications create the spanning planes that connect the edges of the triangles 6-7-8 and 0-1-2. Multiplications create the spanning planes that connect the edges of the triangles 3-4-5 and 0-1-2.

Subtractions are performed, associated with the edges of the triangle 6-7-8. Subtractions are performed, associated with the edges of the triangle 3-4-5. Subtractions are performed, associated with the edges of the triangle 0-1-2. Subtractions are performed, associated with the edges that connect the nodes of the triangles 6-7-8 and 3-4-5. Subtractions are performed, associated with the edges that connect the nodes of the triangles 6-7-8 and 0-1-2. Subtractions are performed, associated with the edges that connect the nodes of the triangles 3-4-5 and 0-1-2. Subtractions are performed, associated with the spanning planes that connect the edges of the triangles 6-7-8 and 3-4-5. Subtractions are performed, associated with the spanning planes that connect the edges of the triangles 6-7-8 and 0-1-2. Finally, subtractions are performed, associated with the spanning planes that connect the edges of the triangles 3-4-5 and 0-1-2.

Additions create the coefficients of the resulting polynomial. Next, the polynomial is converted to a big number.
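The four stages above (additions, multiplications, subtractions, final additions) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function name one_iteration_multiply, the mixed-radix mapping between global and local indexes, and the inclusion-exclusion form of the subtractions are assumptions made for the example.

```python
from itertools import combinations, product


def one_iteration_multiply(a, b, factors):
    """Multiply coefficient lists a and b (lowest degree first).

    Returns (coefficients, number_of_scalar_multiplications).
    Assumption: global index = mixed-radix value of the local indexes.
    """
    n = len(a)
    weights, w = [], 1
    for f in reversed(factors):
        weights.append(w)
        w *= f
    weights.reverse()
    assert w == n == len(b)

    # Per level: an occupied position holds one local index (x,),
    # a free position holds a pair of distinct local indexes (x, y).
    level_choices = [
        [(x,) for x in range(f)] + list(combinations(range(f), 2))
        for f in factors
    ]

    def global_indexes(p):
        return [sum(i * wt for i, wt in zip(local, weights))
                for local in product(*p)]

    # Additions, then exactly one scalar multiplication per product.
    prods = {}
    for p in product(*level_choices):
        idx = global_indexes(p)
        prods[p] = sum(a[i] for i in idx) * sum(b[i] for i in idx)

    # Subtractions remove the periphery of each graph; the remaining
    # diagonal terms are added into the output coefficients.
    c = [0] * (2 * n - 1)
    for p in prods:
        options = [[ch] if len(ch) == 1 else [ch, (ch[0],), (ch[1],)]
                   for ch in p]
        mu = 0
        for q in product(*options):
            replaced = sum(len(cq) != len(cp) for cq, cp in zip(q, p))
            mu += (-1) ** replaced * prods[q]
        degree = sum(wt * (sum(ch) if len(ch) == 2 else 2 * ch[0])
                     for ch, wt in zip(p, weights))
        c[degree] += mu
    return c, len(prods)


a = [3, 1, 4, 1, 5, 9, 2, 6, 5]
b = [2, 7, 1, 8, 2, 8, 1, 8, 2]
coeffs, mults = one_iteration_multiply(a, b, [3, 3])
print(mults)  # 36 scalar multiplications for the 9 x 9 case
```

For the 9×9 case the 36 products correspond to 9 nodes, 9 triangle edges, 9 spanning edges and 9 spanning planes. Converting the result to a big number amounts to evaluating the coefficient list at a power of 2 chosen as the slice width.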

FIGS. 11A-B illustrate a block diagram and a graphical illustration of a process of an embodiment. Process 1100 starts with block 1105, where the number of coefficients of the operands is expressed as a product of factors. It should be noted that the graphical illustration is an example for a 9×9 operation. In block 1110, each of the factors is associated with a level in a hierarchy of interconnected graphs. At each level of the hierarchy, a fully connected graph (i.e., generalized graphs having generalized vertices and generalized edges) has as many vertices as the factor associated with the level. At the last level of the hierarchy there exist simple graphs with simple interconnected vertices and simple edges.

In block 1115, each simple vertex is associated with a global index and a last level local index. In block 1120, generalized edges are defined consisting of a number of spanning edges and spanning planes. In block 1125, a spanning edge is an edge between two corresponding generalized (or simple) vertices. Corresponding vertices are associated with the same last level local index but different global indexes. A spanning plane is a fully connected graph interconnecting four generalized (or simple) vertices.

In block 1130, for all graphs interconnecting simple vertices, the products associated with simple vertices and simple edges are determined. Block 1135 starts a loop between blocks 1140, 1145, 1150 and 1160, where each block is performed for all generalized edges at each level.

In block 1140, a generalized edge is decomposed into its constituent spanning edges and spanning planes. In block 1145, the products associated with spanning edges are determined. If a spanning edge connects simple vertices, the product associated with the edge from the global indexes of the edge's adjacent vertices is formed. Otherwise the products associated with spanning edges are determined by treating each spanning edge as a generalized edge and applying a generalized edge process (blocks 1140 and 1145) recursively.

In block 1150, to determine products associated with spanning planes, process 1100 examines whether the vertices of the plane are simple or not. If they are simple, the product associated with the global indexes of the plane's vertices is formed and returned. If the vertices are not simple, the generalized vertices are expanded into graphs and sets of corresponding vertices and edges are created. Corresponding edges are edges interconnecting vertices with the same last level local index but different global index. For each set, the vertices which are elements of the set are used for running the spanning plane process (block 1150) recursively.

In block 1160, it is determined whether the last generalized edge has been processed by blocks 1140, 1145 and 1150. If the last edge has not been processed, process 1100 returns to block 1140. If the last edge has been processed, process 1100 continues with block 1165. In block 1165, for all the graphs associated with products created, (i.e., edges, squares, cubes, hyper-cubes, etc.) the periphery is subtracted and the diagonals are used to create coefficients of a final product. Process 1100 then proceeds with returning the final product at 1170.

Next, four one-iteration multiplication techniques are compared: the Montgomery approach to Karatsuba (P. Montgomery, “Five, Six and Seven-Term Karatsuba-like Formulae”, IEEE Transactions on Computers, March 2005), the Paar and Weimerskirch approach, an embodiment and the schoolbook way. The comparison is in terms of the number of scalar multiplications each technique requires for representative operand sizes. From the numbers shown in FIG. 12 it is evident that an embodiment process outperforms all alternatives which are widely applicable to many different operand sizes. For some of the odd input sizes, embodiments generate formulae for the input size minus 1 (which is even) and then use the Paar and Weimerskirch technique to generate products and subtractions for the additional input term.
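The scale of the difference can be illustrated with a short Python sketch. It does not reproduce the numbers of FIG. 12. The counting formulas are assumptions made for the example: n² products for the schoolbook way, n(n+1)/2 products for a single-level one-iteration formula over n terms, and the product of f(f+1)/2 over the factors f for a multi-level graph construction, which is consistent with the 36 products of the 9×9 example. The function names are hypothetical.

```python
from math import prod


def schoolbook_count(n):
    """Scalar multiplications of the schoolbook method for n terms."""
    return n * n


def single_level_count(n):
    """Assumed count for a one-level formula: n nodes plus n(n-1)/2 edges."""
    return n * (n + 1) // 2


def multi_level_count(factors):
    """Assumed count when the number of terms is split into factors."""
    return prod(f * (f + 1) // 2 for f in factors)


for factors in ([3, 3], [2, 2, 2], [2, 3, 3]):
    n = prod(factors)
    print(n, schoolbook_count(n), single_level_count(n),
          multi_level_count(factors))
```

When every factor equals 2, the multi-level count reduces to 3 raised to the number of levels, which matches the multiplication count of recursive Karatsuba.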

Big number multiplication is used by popular cryptographic algorithms like Rivest, Shamir, & Adleman (RSA). The embodiment processes avoid the cost of recursion. The embodiments correlate graph properties (i.e., vertices, edges and sub-graphs) with the Karatsuba-like terms of big number multiplication routines. These embodiments generate and use one-iteration Karatsuba-like multiplication processes for any given operand size, which require the same scalar operations as recursive Karatsuba, without recursion. Embodiments are associated with the least possible number of ‘scalar’ multiplications. By scalar multiplications it is meant multiplications between ‘slices’ of big numbers or coefficients of polynomials. The embodiments can generate optimal, ‘one-iteration’, Karatsuba-like formulae using graphs.

Embodiments of the present invention may be implemented using hardware, software, or a combination thereof and may be implemented in one or more computer systems or other processing systems. In one embodiment, the invention is directed toward one or more computer systems capable of carrying out the functionality described herein. In another embodiment, the invention is directed to a computing device. An example of a computing device 1300 is illustrated in FIG. 13. Various embodiments are described in terms of this example of device 1300; however, other computer systems or computer architectures may be used. One embodiment incorporates process 1100 in a cryptographic program. In another embodiment, process 1100 is incorporated in a hardware cryptographic device.

FIG. 13 is a diagram of one embodiment of a device utilizing an optimized encryption system. The system may include two devices that are attempting to communicate with one another securely. Any type of device capable of communication may utilize the system. For example, the system may include a first computer 1301 attempting to communicate securely with a device. In one embodiment, the device is smartcard 1303. In other embodiments, devices that use the optimized encryption system may include computers, handheld devices, cellular phones, gaming consoles, wireless devices, smartcards and other similar devices. Any combination of these devices may communicate using the system.

Each device may include or execute an encryption program 1305. The encryption program 1305 may be a software application, firmware, an embedded program, hardware or similarly implemented program. The program may be stored in a non-volatile memory or storage device or may be hardwired. For example, a software encryption program 1305 may be stored in system memory 1319 during use and on a hard drive or similar non-volatile storage.

System memory may be local random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), fast page mode DRAM (FPM DRAM), Extended Data Out DRAM (EDO DRAM), Burst EDO DRAM (BEDO DRAM), erasable programmable ROM (EPROM), Flash memory, RDRAM® (Rambus® dynamic random access memory), SDRAM (synchronous dynamic random access memory), DDR (double data rate) SDRAM, DDRn (i.e., n=2, 3, 4, etc.), etc., and may also include a secondary memory (not shown).

The secondary memory may include, for example, a hard disk drive and/or a removable storage drive, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage drive reads from and/or writes to a removable storage unit. The removable storage unit represents a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by the removable storage drive. As will be appreciated, the removable storage unit may include a machine readable storage medium having stored therein computer software and/or data.

The encryption program 1305 may utilize any encryption protocol including SSL (secure sockets layer), IPsec, Station-to-Station and similar protocols. In one example embodiment, the encryption program may include a Diffie-Hellman key-exchange protocol or an RSA encryption/decryption algorithm.

The encryption program 1305 may include a secret key generator 1309 component that generates a secret key for a key-exchange protocol. The encryption program 1305 may also include an agreed key generator 1307 component. The agreed key generator 1307 may utilize the secret key from the encryption component 1313 of the device 1303 in communication with the computer 1301 running the encryption program 1305. Both the secret key generator 1309 and the agreed key generator 1307 may also utilize a public prime number and a public base or generator. The public prime and base or generator are shared between the two communicating devices (i.e., computer 1301 and smartcard 1303).
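The roles of the two generators in a Diffie-Hellman style exchange can be sketched in Python as follows. This is a toy illustration only: the function names are hypothetical, the prime 23 and base 5 are textbook-sized values unsuitable for real use, and Python's built-in modular exponentiation stands in for the big number multiplication that an embodiment would supply.

```python
import secrets

# Public parameters shared by both devices (toy values, not secure).
PUBLIC_PRIME = 23
PUBLIC_BASE = 5


def generate_secret_key(prime):
    """Role of a secret key generator: pick a private exponent in [2, prime-2]."""
    return 2 + secrets.randbelow(prime - 3)


def public_value(secret, base=PUBLIC_BASE, prime=PUBLIC_PRIME):
    """Value sent to the peer: base^secret mod prime."""
    return pow(base, secret, prime)


def agreed_key(peer_public, secret, prime=PUBLIC_PRIME):
    """Role of an agreed key generator: peer_public^secret mod prime."""
    return pow(peer_public, secret, prime)


# Each side generates a secret, exchanges public values, derives the key.
computer_secret = generate_secret_key(PUBLIC_PRIME)
smartcard_secret = generate_secret_key(PUBLIC_PRIME)
computer_key = agreed_key(public_value(smartcard_secret), computer_secret)
smartcard_key = agreed_key(public_value(computer_secret), smartcard_secret)
print(computer_key == smartcard_key)
```

Each modular exponentiation with realistically sized operands decomposes into many big number multiplications, which is where a one-iteration multiplication process would be applied.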

The encryption program may be used for communication with devices over a network 1311. The network 1311 may be a local area network (LAN), wide area network (WAN) or similar network. The network 1311 may utilize any communication medium or protocol. In one example embodiment, the network 1311 may be the Internet. In another embodiment, the devices may communicate over a direct link including wireless direct communications.

Device 1301 may also include a communications interface (not shown). The communications interface allows software and data to be transferred between computer 1301 and external devices (such as smartcard 1303). Examples of communications interfaces may include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA (personal computer memory card international association) slot and card, a wireless LAN interface, etc. Software and data transferred via the communications interface are in the form of signals which may be electronic, electromagnetic, optical or other signals capable of being received by the communications interface. These signals are provided to the communications interface via a communications path (i.e., channel). The channel carries the signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, a wireless link, and other communications channels.

In one example embodiment, an encryption component 1313 may be part of a smartcard 1303 or similar device. The encryption component 1313 may be software stored or embedded on a SRAM 1315, implemented in hardware or similarly implemented. The encryption component may include a secret key generator 1309 and agreed key generator 1307.

In alternative embodiments, the secondary memory may include other ways to allow computer programs or other instructions to be loaded into device 1301, for example, a removable storage unit and an interface. Examples may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip or card (such as an EPROM (erasable programmable read-only memory), PROM (programmable read-only memory), or flash memory) and associated socket, and other removable storage units and interfaces which allow software and data to be transferred from the removable storage unit to device 1301.

In this document, the term “computer program product” may refer to the removable storage units. These computer program products allow software to be provided to device 1301. Embodiments of the invention may be directed to such computer program products. Computer programs (also called computer control logic) are stored in memory 1319, and/or the secondary memory and/or in computer program products. Computer programs may also be received via the communications interface. Such computer programs, when executed, enable device 1301 to perform features of embodiments of the present invention as discussed herein. In particular, the computer programs, when executed, enable computer 1301 to perform the features of embodiments of the present invention. Such features may represent part or all of blocks 1105, 1110, 1115, 1120, 1125, 1130, 1135, 1140, 1145, 1150, 1160, 1165 and 1170 of FIGS. 11A and 11B. Alternatively, such computer programs may represent controllers of computer 1301.

In an embodiment where the invention is implemented using software, the software may be stored in a computer program product and loaded into device 1301 using the removable storage drive, a hard drive or a communications interface. The control logic (software), when executed by computer 1301, causes computer 1301 to perform functions described herein.

Computer 1301 and smartcard 1303 may include a display (not shown) for displaying various graphical user interfaces (GUIs) and user displays. The display can be an analog electronic display, a digital electronic display, a vacuum fluorescent (VF) display, a light emitting diode (LED) display, a plasma display (PDP), a liquid crystal display (LCD), a high performance addressing (HPA) display, a thin-film transistor (TFT) display, an organic LED (OLED) display, a heads-up display (HUD), etc.

In another embodiment, the invention is implemented primarily in hardware using, for example, hardware components such as application specific integrated circuits (ASICs) using hardware state machine(s) to perform the functions described herein. In yet another embodiment, the invention is implemented using a combination of both hardware and software.

In the description above, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. For example, well-known equivalent components and elements may be substituted in place of those described herein, and similarly, well-known equivalent techniques may be substituted in place of the particular techniques disclosed. In other instances, well-known circuits, structures and techniques have not been shown in detail to avoid obscuring the understanding of this description.

Embodiments of the present disclosure described herein may be implemented in circuitry, which includes hardwired circuitry, digital circuitry, analog circuitry, programmable circuitry, and so forth. These embodiments may also be implemented in computer programs. Such computer programs may be coded in a high level procedural or object oriented programming language. The program(s), however, can be implemented in assembly or machine language if desired. The language may be compiled or interpreted. Additionally, these techniques may be used in a wide variety of networking environments. Such computer programs may be stored on a storage media or device (e.g., hard disk drive, floppy disk drive, read only memory (ROM), CD-ROM device, flash memory device, digital versatile disk (DVD), or other storage device) readable by a general or special purpose programmable processing system, for configuring and operating the processing system when the storage media or device is read by the processing system to perform the procedures described herein. Embodiments of the disclosure may also be considered to be implemented as a machine-readable or machine recordable storage medium, configured for use with a processing system, where the storage medium so configured causes the processing system to operate in a specific and predefined manner to perform the functions described herein.

While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention not be limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art.

Reference in the specification to “an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments. The various appearances of “an embodiment,” “one embodiment,” or “some embodiments” are not necessarily all referring to the same embodiments. If the specification states a component, feature, structure, or characteristic “may”, “might”, or “could” be included, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claims refer to “a” or “an” element, that does not mean there is only one of the element. If the specification or claims refer to “an additional” element, that does not preclude there being more than one of the additional element.

