The present invention is related to a k-cluster residue number system, and more particularly, to a memory-based k-cluster residue number system using look-up tables with reduced data capacity.
Edge artificial intelligence (AI) computing is an area of rapid growth that integrates neural networks with the Internet of Things (IoT) for computer vision, natural language processing, and self-driving car applications. It quantizes floating-point numbers to fixed-point integers for inference operations. In-memory architecture is one of the important Edge AI computing platforms; it stacks memory on top of the logic circuits for Memory Centric Neural Computing (MCNC). Data is loaded directly from the stacked memory to Processing Elements (PEs) for computation, which avoids loading data from external memory and minimizes data transfer. This significantly reduces latency and speeds up operations. Performance is further enhanced using a Residue Number System (RNS), which fully utilizes the internal memory to store the data for integer operations.
A Residue Number System (RNS) is a number system that first defines a modular set and transforms numbers into their integer remainders (also called residues) through modulo division, and then performs arithmetic operations (addition, subtraction, and multiplication) on the remainders only. For example, take the modular set (7, 8, 9) and the numbers 13 and 17. The dynamic range is defined by the product of the modular set, here 7×8×9=504. The numbers are first transformed to their residues through modulo operations, 13→(6, 5, 4) and 17→(3, 1, 8), and addition and multiplication are then performed on the residues only: (6, 5, 4)+(3, 1, 8)=(9, 6, 12)→(2, 6, 3), which represents 30, and (6, 5, 4)*(3, 1, 8)=(18, 5, 32)→(4, 5, 5), which represents 221. Since the remainders are much smaller in magnitude than the original numbers, only simple logic is required, and the residue channels can be computed in parallel.
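The worked example with the modular set (7, 8, 9) can be reproduced with a minimal Python sketch. The function names are hypothetical, and the conversion back to an ordinary integer uses the standard Chinese Remainder Theorem, which is an assumption added here for illustration and is not described in the text above.

```python
from math import prod

MODULI = (7, 8, 9)  # pairwise coprime modular set from the example


def to_rns(x, moduli=MODULI):
    """Transform an integer into its residues, one per modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Add two residue tuples channel by channel."""
    return tuple((ra + rb) % m for ra, rb, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Multiply two residue tuples channel by channel."""
    return tuple((ra * rb) % m for ra, rb, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Recover the integer in [0, M) via the Chinese Remainder Theorem.

    Assumption: this reconstruction step is illustrative only.
    Requires Python 3.8+ for pow(x, -1, m) and math.prod.
    """
    big_m = prod(moduli)  # dynamic range M
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)
    return total % big_m


a, b = to_rns(13), to_rns(17)
print(a, b)                                   # residues of 13 and 17
print(rns_add(a, b), from_rns(rns_add(a, b)))  # sum in both domains
print(rns_mul(a, b), from_rns(rns_mul(a, b)))  # product in both domains
```

Each residue channel is processed independently of the others, which is the property that allows the channels to run in parallel. The results are only meaningful while the true sum or product stays inside the dynamic range of 504.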
For the sake of clarity, the dynamic range M of the RNS, which is the product of all moduli of the modular set, may be defined as the following equation (1):

$$M=\prod_{i} m_i \qquad (1)$$
All the arithmetic operations of the RNS can be implemented using memory lookup tables for parallel distributed computing. However, the memory requirement is the drawback of using lookup tables in the RNS. The required memory size depends on the square of each modulus as well as the number of bits of the modulus, and, for the three arithmetic operations, can be presented as the following equation (2), where b_i is the number of bits of the modulus m_i:

$$\text{Memory}=3\times\sum_{i} m_i^{2}\times b_i \qquad (2)$$
For example, choose the RNS modular set (15, 17) with the dynamic range M=15×17=255. Since the first modulus (i.e., 15) has a 4-bit length and the second modulus (i.e., 17) has a 5-bit length, for all three arithmetic (i.e., addition, subtraction, and multiplication) operations, the total memory requirement is estimated to be 3×(15²×4+17²×5)=7035 bits. This area is too large compared with the logic gate design (e.g., the processing elements (PEs)) of the RNS.
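The estimate above can be checked with a short Python sketch. The function name is hypothetical, and the bit length of each modulus is taken with `int.bit_length()`, which is an assumption that matches the 4-bit and 5-bit figures quoted for 15 and 17.

```python
def lut_memory_bits(moduli, num_ops=3):
    """Estimate the bits needed for full two-dimensional look-up tables.

    Each operation uses an m x m table per modulus, and each cell is
    assumed to be as wide as the bit length of the modulus.
    """
    return num_ops * sum(m * m * m.bit_length() for m in moduli)


print(lut_memory_bits((15, 17)))  # memory estimate for the example set
```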
In an embodiment, a method for performing operations in a k-cluster residue number system comprises generating an addition and subtraction look-up table comprising 2mi cells for recording values from zero to (mi−1) in an ascending order twice, storing the addition and subtraction look-up table in a memory of the k-cluster residue number system, retrieving a value recorded in a cell at position Q of the addition and subtraction look-up table when performing an addition operation on two integers A and B, and retrieving a value recorded in a cell at position R of the addition and subtraction look-up table when subtracting an integer Y from an integer X. Here, mi is a coprime integer of a modular set of the k-cluster residue number system, Q=((A mod mi)+(B mod mi)), R=(X mod mi)−(Y mod mi)=(X mod mi)+(mi−(Y mod mi))=rx+(mi−ry)=(rx+ry′), rx is equal to (X mod mi), ry is equal to (Y mod mi), and ry′ is equal to (mi−(Y mod mi)).
In another embodiment, a method for generating a k-cluster residue number system comprises generating a multiplication look-up table for a coprime integer mi of a modular set of the k-cluster residue number system, storing the multiplication look-up table in a memory of the k-cluster residue number system, and performing a multiplication operation on a multiplicand and a multiplicator using the multiplication look-up table. The coprime integer mi is not 2, and the multiplication look-up table is composed of S cells,
The multiplication operation comprises determining whether a complement of the multiplicand is greater than or equal to the multiplicator; if it is determined that the complement of the multiplicand is greater than or equal to the multiplicator, performing a first procedure; if it is determined that the complement of the multiplicand is less than the multiplicator, performing a second procedure; and retrieving a value from the multiplication look-up table as a product of the multiplicand and the multiplicator according to a column entry and a row entry. The first procedure comprises the following steps: selecting the multiplicand as the column entry and the multiplicator as the row entry, and determining whether the column entry is greater than or equal to the row entry; if it is determined that the column entry is greater than or equal to the row entry, keeping the column entry and the row entry; and if it is determined that the column entry is less than the row entry, interchanging the column entry and the row entry. The second procedure comprises the following steps: selecting the complement of the multiplicand as the column entry and a complement of the multiplicator as the row entry, and determining whether the column entry is greater than or equal to the row entry; if it is determined that the column entry is greater than or equal to the row entry, keeping the column entry and the row entry; and if it is determined that the column entry is less than the row entry, interchanging the column entry and the row entry.
In another embodiment, a k-cluster residue number system comprises a processor and a memory coupled to the processor. The processor is used to generate an addition and subtraction look-up table comprising 2mi cells for recording values from zero to (mi−1) in an ascending order twice, retrieve a value recorded in a cell at position Q of the addition and subtraction look-up table when performing an addition operation on two integers A and B, and retrieve a value recorded in a cell at position R of the addition and subtraction look-up table when subtracting an integer Y from an integer X. Here, mi is a coprime integer of a modular set of the k-cluster residue number system, Q=((A mod mi)+(B mod mi)), R=(X mod mi)−(Y mod mi)=(X mod mi)+(mi−(Y mod mi))=rx+(mi−ry)=(rx+ry′), rx is equal to (X mod mi), ry is equal to (Y mod mi), and ry′ is equal to (mi−(Y mod mi)).
In another embodiment, a k-cluster residue number system comprises a processor and a memory coupled to the processor. The processor is used to generate a multiplication look-up table for a coprime integer mi of a modular set of the k-cluster residue number system, and perform a multiplication operation on a multiplicand and a multiplicator using the multiplication look-up table. The coprime integer mi is not 2, and the multiplication look-up table is composed of S cells,
The multiplication operation comprises determining whether a complement of the multiplicand is greater than or equal to the multiplicator; if it is determined that the complement of the multiplicand is greater than or equal to the multiplicator, performing a first procedure; if it is determined that the complement of the multiplicand is less than the multiplicator, performing a second procedure; and retrieving a value from the multiplication look-up table as a product of the multiplicand and the multiplicator according to a column entry and a row entry. The first procedure comprises the following steps: selecting the multiplicand as the column entry and the multiplicator as the row entry, and determining whether the column entry is greater than or equal to the row entry; if it is determined that the column entry is greater than or equal to the row entry, keeping the column entry and the row entry; and if it is determined that the column entry is less than the row entry, interchanging the column entry and the row entry. The second procedure comprises the following steps: selecting the complement of the multiplicand as the column entry and a complement of the multiplicator as the row entry, and determining whether the column entry is greater than or equal to the row entry; if it is determined that the column entry is greater than or equal to the row entry, keeping the column entry and the row entry; and if it is determined that the column entry is less than the row entry, interchanging the column entry and the row entry.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
To represent an n-bit integer and its negative using a k-cluster residue number system (k-RNS), a modular set of P coprime integers (m0, m1, . . . , mp) is first defined, and a dynamic range is generated according to the product of the modular set (m0, m1, . . . , mp). For example, when a modular set of 3 coprime integers is chosen to be (2^(n/2)−1, 2, 2^(n/2)+1), the dynamic range is set to [−(2^n−1), (2^n−2)]. The modular set is not limited to 3 coprime integers.
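As a consistency check on the example above, the size of the stated range can be compared with the product of the modular set. The derivation below is a sketch that reads the modular set as (2^(n/2)−1, 2, 2^(n/2)+1) and assumes n is even, so that 2^(n/2) is an integer.

```latex
\begin{align*}
M &= \left(2^{n/2}-1\right)\cdot 2\cdot\left(2^{n/2}+1\right)
   = 2\left(2^{n}-1\right) = 2^{n+1}-2,\\
\#\left[-(2^{n}-1),\; 2^{n}-2\right]
  &= \left(2^{n}-2\right)+\left(2^{n}-1\right)+1 = 2^{n+1}-2 = M.
\end{align*}
```

The three moduli are pairwise coprime: 2^(n/2)−1 and 2^(n/2)+1 are both odd, so each is coprime to 2, and since they differ by 2 their greatest common divisor divides 2 and must therefore be 1.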
Processor 4 uses the addition and subtraction look-up table 10 to perform addition operations and subtraction operations.
Z=X+Y (4)
rz=rx+ry (5)
In the embodiment, one modulus of the modular set selected by processor 4 is 7. However, the present invention is not limited thereto. The selected modulus could be other coprime integers of the modular set. The addition and subtraction look-up table 10 is composed of 14 (i.e., 2×7) cells 11 for recording values from zero to 6 in an ascending order twice. The addition and subtraction look-up table 10 is a one-dimensional linear array, which is simplified from a traditional two-dimensional addition look-up table A7 for addition operations based on modulus 7. The traditional addition look-up table A7 comprises 49 (i.e., 7×7) cells, more than fourteen cells 11 of the addition and subtraction look-up table 10. Each of the cells 11 is used to store a residue when an addition operation on two remainders rx and ry or two integers X and Y is performed, where rx=(X mod 7) and ry=(Y mod 7). The addition look-up table A7 is transformed to the addition and subtraction look-up table 10 based on the periodic behaviors of the modulo. As shown in
Similarly, according to the subtraction algorithm, the following equation (6) in the integral domain could be transformed into the following equation (7) in the remainder domain:
Z=X−Y (6)
rz=rx−ry (7)
Therefore, the addition and subtraction look-up table 10 could be used not only for addition operations but also for subtraction operations since the addition and subtraction look-up table 10 could be obtained by simplifying a traditional subtraction look-up table S7. The traditional subtraction look-up table S7 also comprises 49 (i.e., 7×7) cells, more than fourteen cells 11 of the addition and subtraction look-up table 10. Each of the cells 11 is used to store a residue when the remainder rx is subtracted by the remainder ry. The subtraction look-up table S7 is transformed to the addition and subtraction look-up table 10 based on periodic behaviors of the modulo. As shown in
In another embodiment of the present invention, when processor 4 subtracts the integer X by the integer Y, processor 4 retrieves the value from the addition and subtraction look-up table 10 according to the complement ry′ of the remainder ry and the remainder rx (i.e., the minuend). Referring to
Since R=(rx+ry′), the start position Ps is set to position rx of the addition and subtraction look-up table 10 and is then shifted to the right according to the complement ry′. For example, when 4 is subtracted from 5, rx=5 and ry=4, so the start position is 5. The complement of ry=4 is ry′=7−4=3, so the position shifts to the right by 3 to position 8. The value recorded in cell 11 at position 8 is 1, which matches the result of (rx−ry)=1.
Therefore, the two-dimensional addition look-up table A7 and the two-dimensional subtraction look-up table S7 could be simplified and transformed into the addition and subtraction look-up table 10, which is a one-dimensional linear array. The addition and subtraction look-up table 10 is used by processor 4 when processor 4 performs an addition operation or a subtraction operation when the modulus mi=7. Since the modulus mi may be an integer other than 7, processor 4 could generate corresponding addition and subtraction look-up tables for other coprime integers of the modular set.
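The one-dimensional table and its two access patterns can be sketched in Python as follows. The function names are hypothetical, and positions are assumed to be counted from zero, which agrees with the example in which position 8 holds the value 1.

```python
def build_addsub_table(m):
    """Build the 1-D table of 2*m cells holding 0..m-1 twice, in ascending order."""
    return list(range(m)) * 2


def lut_add(table, m, x, y):
    """Addition: read the cell at position Q = rx + ry."""
    q = (x % m) + (y % m)  # at most 2m - 2, so always inside the table
    return table[q]


def lut_sub(table, m, x, y):
    """Subtraction: read the cell at position R = rx + ry', with ry' = m - ry."""
    rx = x % m
    ry_complement = m - (y % m)  # equals m when ry is 0; rx + m is still in range
    return table[rx + ry_complement]


table7 = build_addsub_table(7)
print(table7)                    # the 14 cells for modulus 7
print(lut_sub(table7, 7, 5, 4))  # 5 - 4 read from position 5 + 3 = 8
```

No modulo reduction is applied to the position itself: the second copy of 0..(mi−1) absorbs the wrap-around, so a single indexed read replaces the two-dimensional look-up.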
In detail, if the k-RNS 2 uses P coprime integers to define its modular set as (m1, . . . , 2, . . . , mp), the processor 4 generates a corresponding addition and subtraction look-up table for each coprime integer not equal to 2. If the coprime integer is mi, its corresponding addition and subtraction look-up table is composed of 2mi cells 11 for recording values from zero to (mi−1) in an ascending order twice. When processor 4 performs an addition operation on two integers X and Y, processor 4 retrieves a value recorded in cell 11 at position Q of the addition and subtraction look-up table 10, where Q=((X mod mi)+(Y mod mi))=(rx+ry). When processor 4 subtracts the integer Y from the integer X, processor 4 retrieves a value recorded in cell 11 at position R of the addition and subtraction look-up table 10, where R=(X mod mi)−(Y mod mi)=(X mod mi)+(mi−(Y mod mi))=rx+ry′, and ry′ is the complement of the remainder ry (i.e., ry′=mi−ry). According to the multiplication algorithm, the following equation (8) in the integral domain could be transformed into the following equation (9) in the remainder domain:
Z=X×Y (8)
rz=(rx×ry) (9)
According to equations (2) and (10), the lookup table size is reduced from mi² cells to (mi−1)² cells as compared with the prior art. Moreover, according to the commutative property of multiplication, the order of the multiplicand and multiplicator can be exchanged. Therefore, the products of the temporary multiplication look-up table 60 could be mirrored along a Top-Left and Bottom-Right (TL/BR) diagonal line 62, as shown in
to
For example, the multiplication look-up table 12 has twelve cells 11 while the temporary multiplication look-up table 60 has thirty-six cells 11. Therefore, the lookup table size for multiplication operations is further reduced. Due to the mirror property and the periodic behaviors of the modulo, the four identical regions 81, 82, 83, and 84 could be simplified as a multiplication look-up table 12, as shown in
Accordingly, the data amount of the multiplication look-up table 12 could be presented as the following equation (11):
When processor 4 performs a multiplication operation on the two integers X and Y, processor 4 would perform the procedure shown in
Step S902: determine whether a complement rx′ of the multiplicand rx is greater than or equal to the multiplicator ry; if it is determined that the complement rx′ of the multiplicand rx is greater than or equal to the multiplicator ry, go to Step S904; otherwise, go to Step S914;
Step S904: select the multiplicand rx as the column entry and the multiplicator ry as the row entry;
Step S906: determine whether the column entry (rx) is greater than or equal to the row entry (ry); if it is determined that the column entry (rx) is greater than or equal to the row entry (ry), go to Step S908; otherwise, go to Step S910;
Step S908: keep the column entry and the row entry; go to Step S924;
Step S910: interchange the column entry and the row entry;
Step S912: ry is selected as the column entry and rx is selected as the row entry; go to Step S924;
Step S914: select the complement rx′ of the multiplicand rx as the column entry and the complement ry′ of the multiplicator ry as the row entry;
Step S916: determine whether the column entry (rx′) is greater than or equal to the row entry (ry′); if it is determined that the column entry (rx′) is greater than or equal to the row entry (ry′), go to Step S918; otherwise, go to Step S920;
Step S918: keep the column entry (rx′) and the row entry (ry′); go to Step S924;
Step S920: interchange the column entry (rx′) and the row entry (ry′);
Step S922: the complement ry′ of ry is selected as the column entry and the complement rx′ of rx is selected as the row entry; go to Step S924;
Step S924: retrieve a value from the multiplication look-up table 12 as a product of the multiplicand and the multiplicator according to the column entry and the row entry.
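The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of processor 4: the function names are hypothetical, the reduced table is modelled as a dictionary keyed by (column entry, row entry), the set of stored cells (row ≤ column and column + row ≤ mi) is inferred from the selection rules, and zero residues are assumed to be handled outside the table.

```python
def build_mul_table(m):
    """Reduced table: cells (col, row) with 1 <= row <= col and col + row <= m."""
    return {(col, row): (col * row) % m
            for row in range(1, m)
            for col in range(row, m - row + 1)}


def select_entries(rx, ry, m):
    """Choose the column and row entries for nonzero residues rx and ry."""
    rx_c, ry_c = m - rx, m - ry      # complements rx' and ry'
    if rx_c >= ry:                   # Step S902
        col, row = rx, ry            # Step S904
    else:
        col, row = rx_c, ry_c        # Step S914
    if col < row:                    # Steps S906 / S916
        col, row = row, col          # Steps S910-S912 / S920-S922
    return col, row                  # Steps S908 / S918 keep the entries


def lut_mul(table, m, x, y):
    """Return (x * y) mod m using only the reduced table (Step S924)."""
    rx, ry = x % m, y % m
    if rx == 0 or ry == 0:           # assumption: zero is not stored in the table
        return 0
    return table[select_entries(rx, ry, m)]


table7 = build_mul_table(7)
print(len(table7))                   # number of stored cells for modulus 7
print(select_entries(3, 5, 7), lut_mul(table7, 7, 10, 19))
```

Using the complements is valid because (mi−rx)×(mi−ry) leaves the same remainder as rx×ry when divided by mi, and interchanging the entries is valid because multiplication is commutative.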
Briefly, processor 4 computes the RNS complement rx′ of the multiplicand rx, where rx′=(mi−rx), and then compares rx′ with ry (Step S902). If rx′ is less than ry, both the column and row entries are changed into their residue complements (Step S914, where rx′=(mi−rx) and ry′=(mi−ry)); otherwise, rx and ry are kept without change (Step S904). After that, processor 4 compares the column entry with the row entry. If the column entry is less than the row entry, the two entries are interchanged for table access.
For example, when mi=7, X=10, and Y=19, the residues are rx=3 and ry=5, and their complements are rx′=7−3=4 and ry′=7−5=2. Since rx′=4 is less than ry=5, the complements are used: the column entry is rx′=4, and the row entry is ry′=2. Therefore, when processor 4 performs a multiplication operation on X and Y, processor 4 would retrieve the value recorded in cell 11 at the 4th column and 2nd row of the multiplication look-up table 12 as the product of the modulo multiplication. The product of X=10 and Y=19 is 190, where the residue |190|₇=1. It is consistent with the value 1 stored in the 4th column and 2nd row. In another example, when mi=7, X=11, and Y=15, rx=4, ry=1, rx′=7−4=3, and ry′=7−1=6. Since rx′=3 is greater than or equal to ry=1, the column entry is rx=4, and the row entry is ry=1. Therefore, when processor 4 performs a multiplication operation on X and Y (i.e., 11×15=165), where the residue |165|₇=4, processor 4 would retrieve the value 4 recorded in cell 11 at the 4th column and 1st row of the multiplication look-up table 12 as the product of the multiplication.
In the k-cluster residue number system 2, since the addition and subtraction look-up table 10 and the multiplication look-up table 12 have compressed sizes, the size of the memory 6 required for storing look-up tables for addition, subtraction, and multiplication operations could be reduced.
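To give a rough sense of the saving, the following Python sketch counts cells only and ignores the bit width of each cell. The function names are hypothetical, and the cell count of the reduced multiplication table assumes that it stores the entries with row ≤ column and column + row ≤ mi, which yields the twelve cells stated for modulus 7.

```python
def traditional_cells(m):
    """Three full m x m tables: addition, subtraction, and multiplication."""
    return 3 * m * m


def compressed_cells(m):
    """One shared 1-D add/sub table of 2*m cells plus the reduced multiplication table."""
    addsub = 2 * m
    mul = sum(1 for row in range(1, m) for col in range(row, m - row + 1))
    return addsub + mul


for modulus in (7, 15, 17):
    print(modulus, traditional_cells(modulus), compressed_cells(modulus))
```

For modulus 7 this gives 147 cells for the three traditional tables against 26 cells (14 plus 12) for the compressed tables.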
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.