BACKGROUND OF THE INVENTION
Field of the Invention
The invention is related to a binary floating-point in-memory multiplication device for two binary floating-point number operands. In particular, to achieve a one-step floating-point multiplication operation that improves computation efficiency and saves computation power, the binary floating-point in-memory multiplication device of the invention comprises (1) two binary floating-point decoders for converting the exponent bits into the two most significant bits (ap-1 and bp-1) of the two p-bit significands of the two inputted floating-point number operands; (2) multiple memory arrays storing a base-2^n multiplication code table for the significand-bit multiplication operation; (3) an adder circuit for the exponent-bit addition operation; and (4) a binary floating-point number encoder for converting the resultant 2p-bit product code from the multiplication of the two binary p-bit significands into the standard binary (p−1)-bit significand code in IEEE 754 format, ready for further operation and storage.
Description of the Related Art
In the modern Von Neumann computing architecture shown in FIG. 1, the Central Processing Unit (CPU) 10 executes logic operations according to the instructions and data from the main memory 11. The CPU 10 includes a main memory 11, an Arithmetic and Logic Unit (ALU) 12, an input/output equipment 13, and a program control unit 14. Prior to the computation process, the CPU 10 is set by the program control unit 14 to point to the initial address code for the initial instruction in the main memory 11. The digital data are then processed by the ALU 12 according to the sequential instructions in the main memory 11 accessed by the clock-synchronized address pointer (instruction cache) in the program control unit 14. The digital logic computation process of the CPU 10 is synchronously executed and driven by a set of pre-written sequential instructions stored in the instruction memory unit.
In digital computer systems based on the Von Neumann computing architecture, numbers are represented in binary formats. For example, an integer number I in the m-bit binary format is given by
$$I = b_{m-1}2^{m-1} + b_{m-2}2^{m-2} + \cdots + b_{1}2^{1} + b_{0}2^{0} = (b_{m-1}b_{m-2}\ldots b_{1}b_{0})_b$$
where bi ∈ {0, 1} for i=0, . . . , (m−1), and the symbol “b” indicates that the integer number is in the binary format.
The arithmetic operations such as multiplication, addition, subtraction, and division for binary numbers require manipulating the binary codes of the operands to obtain the correct binary representation of the resultant numbers in the circuit processors. The manipulations of the operand binary codes include feeding the binary codes into the combinational logic gates and placing the binary codes in the correct positions of the registers and memory units in IC processor chips. Therefore, the more manipulation steps needed to move the binary codes in and out of the various memory units, registers, and combinational gate logic units through their connecting bus-lines, the more computing power is consumed.
Specifically, when a computing processor operates with individual bit-level manipulations of the code strings, the power consumed in charging and discharging the associated capacitances of the connecting bus-lines, the logic gates, the registers, and the memory units increases significantly with the number of operational steps, since the power P ~ f×C×VDD^2, where f is the step-cycle rate over the processing time period, C is the total charging/discharging capacitance required for the entire computation process, and VDD is the high supply voltage. For example, the multiplication of two integer numbers represented by two n-bit binary codes is usually done by the so-called Multiply-Accumulation (MA) sequence: applying each single bit of one n-bit operand for the bit-to-bit multiply (each bit result obtained by a circuit “AND” gate) with every bit of the other n-bit operand to obtain “n” n-bit binary codes stored in registers; shifting each n-bit binary code to its correct position in the “n” rows of 2n-bit long registers; filling the empty bit registers with zeros for each row of the 2n-bit long registers; and performing “(n−1)” steps of addition operations in binary addition circuits for the “n” 2n-bit long code strings in the registers to obtain the 2n-bit long binary product code string. The computational power in processors is thus increased by the tedious steps of bit-level manipulation, mainly from the transportation of intermediate data and instruction codes on a fixed bandwidth of bus-lines (currently with 8-bit, 16-bit, 32-bit, and 64-bit formats) in processors. More steps of computing operations also mean that higher-frequency transportation of intermediate data codes and additional instruction codes on the fixed bandwidth of bus-lines in processors is required.
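By way of illustration only, the Multiply-Accumulation sequence described above may be sketched in software as follows. This is a minimal Python model rather than a circuit description; the function name `shift_and_add_multiply` and its return convention are assumptions introduced here.

```python
def shift_and_add_multiply(a, b, n):
    """Multiply two n-bit unsigned integers with the bit-serial MA sequence.

    Returns (product, number_of_addition_steps).
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    # One partial product per bit of b: the bits of a ANDed with b_i, shifted to
    # weight i inside a 2n-bit row (all other positions are zero-filled).
    rows = [(a if (b >> i) & 1 else 0) << i for i in range(n)]
    total = rows[0]
    additions = 0
    for row in rows[1:]:
        total += row          # one binary addition step per remaining row
        additions += 1
    return total, additions


product, steps = shift_and_add_multiply(13, 11, 4)
print(product, steps)
```

For two 4-bit operands the model performs n−1 = 3 addition steps, which is the step count the one-step device is intended to avoid.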
The heavy data and instruction code traffic moving in and out of the memory units, logic gates, and registers, especially in computing pipeline processing, may cause bus-line congestion in the processors. The so-called Von Neumann bottleneck, caused by the bus-line congestion of heavy data traffic between processor units and memory units, is the main reason for slowing down computation processes.
From a software programming perspective, the desirable one-step operation (completed in a single clock cycle) simplifies the computational algorithms and programming instructions for processors. Furthermore, the one-step multiplication operation could also save the memory space required for storing intermediate data and additional instruction codes, reducing the chip memory area in the IC processor chip.
In U.S. Pat. No. 11,461,074 B2 (the disclosure of which is incorporated herein by reference in its entirety), the binary multiple-digit in-memory multiplication devices comprising memory arrays for storing the base-2^n multiplication table can reduce the number of intermediate operational steps for the multiplication of two binary integer operands. The one-step multiplication operation for two binary integer operands can thus be achieved with the binary multiple-digit in-memory multiplication devices. In this invention, we further construct a binary floating-point in-memory multiplication device for two binary floating-point number operands.
Specifically, the binary multiple-digit in-memory multiplication device of the previous patent (U.S. Pat. No. 11,461,074 B2) is applied to the significand multiplication in the binary floating-point in-memory multiplication device of the invention. Two floating-point decoders and an exponent adder circuit are also incorporated in the floating-point in-memory multiplication device of the invention to achieve the one-step floating-point multiplication operation. To save computational power and boost computation efficiency, the binary floating-point in-memory multiplication device of the invention is designed to achieve the one-step floating-point multiplication operation (completed in one clock cycle), such that the repeated data transportation between the multiplier unit, temporary data storage, and memory units in the conventional circuit processor can be omitted entirely.
SUMMARY OF THE INVENTION
According to IEEE 754 binary floating-point number format code, a binary floating-point number A is represented by one sign bit sa, a q-bit exponent ea, and a p-bit significand a, that is,
where the binary numbers sa, eai, aj=[0, 1] for i=0, . . . , (q−1) and j=0, . . . , (p−1), and f indicates the floating-point format.
Note that the binary value of ap-1 can be decoded from the exponent bits eai for i=0, . . . , (q−1): ap-1=0 when all eai=0, representing the sub-normal floating-point numbers, and ap-1=1 when any eai is non-zero, representing the normal floating-point numbers. A floating-point number code is therefore usually stored and transported without the most significant bit (MSB) ap-1 of the significand, so the total number of bits stored and transported remains (p+q) bits (one sign bit, q exponent bits, and (p−1) significand bits). For example, in computer systems, the floating point 8 is (p+q)=8 bits, the half precision is (p+q)=16 bits, the single precision is (p+q)=32 bits, the double precision is (p+q)=64 bits, the quadruple precision is (p+q)=128 bits, the octuple precision is (p+q)=256 bits, and so forth. Floating-point decoders in the computing hardware always decode the binary value of the most significant bit ap-1 of the p-bit significand from the exponent bits (ea0, . . . , eaq−1)b prior to binary arithmetic operations.
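As an illustration of the decoding step described above, the following minimal Python sketch splits a single precision (q=8, p=24) code into its fields and derives the unstored MSB from the exponent bits. The helper name `split_float32` is an assumption introduced here, and the standard library `struct` module is used only to obtain the 32-bit code.

```python
import struct


def split_float32(x):
    """Return (sign, exponent, stored_fraction, implied_msb) for a binary32 value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF        # q = 8 exponent bits
    fraction = bits & 0x7FFFFF            # p - 1 = 23 stored significand bits
    implied_msb = 1 if exponent != 0 else 0   # OR of all exponent bits
    return sign, exponent, fraction, implied_msb


print(split_float32(1.5))     # normal number: implied MSB is 1
print(split_float32(1e-45))   # sub-normal number: implied MSB is 0
```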
For the same format as the floating-point number A, a floating-point number B with one sign bit sb, a q-bit exponent eb, and a p-bit significand b is given by
where the binary numbers sb, ebi, bj=[0, 1] for i=0, . . . , (q−1) and j=0, . . . , (p−1), and f indicates the floating-point format. Accordingly, the floating-point number M for the multiplication of A and B is then given by
By comparing the above two equations for M, we obtain that sm=(sa+sb) for the “sign” of M, em=(ea+eb) for the “exponent” of M and the binary multiplication of the two p-bit significands of A and B, that is,
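The relations described in this passage can be written out as the following sketch. It assumes the usual IEEE 754 value formula for normal numbers; the symbol β for the exponent bias and the exact layout are assumptions introduced here for illustration and may differ from the notation of the original equations.

```latex
\begin{aligned}
A &= (-1)^{s_a}\, 2^{\,e_a-\beta}\,(a_{p-1}.a_{p-2}\ldots a_0)_b, \qquad
B = (-1)^{s_b}\, 2^{\,e_b-\beta}\,(b_{p-1}.b_{p-2}\ldots b_0)_b, \qquad
\beta = 2^{q-1}-1,\\[2pt]
M &= A\times B = (-1)^{s_a+s_b}\; 2^{\,(e_a+e_b)-2\beta}\;
     \big[(a_{p-1}.a_{p-2}\ldots a_0)_b \times (b_{p-1}.b_{p-2}\ldots b_0)_b\big],\\[2pt]
(m_{2p-1}\ldots m_1 m_0)_b &= (a_{p-1}\ldots a_1 a_0)_b \times (b_{p-1}\ldots b_1 b_0)_b .
\end{aligned}
```

Because only the parity of (sa+sb) matters in the factor (−1)^(sa+sb), the sign of M reduces to an exclusive-OR of the two sign bits.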
According to the above equations for the “signs”, the “exponents”, and the “significands” of the floating-point number multiplication for two floating-point number operands, the schematic of a binary floating-point in-memory multiplication device 20 is designed to achieve the one-step floating-point multiplication operation as shown in FIG. 2. In FIG. 2, the voltage signals of two floating-point numbers A=(sa, eaq−1, . . . , ea0, ap-2, . . . , a0) and B=(sb, ebq−1, . . . eb0, bp-2, . . . , b0) are inputted from the A data register 21 and B data register 22 to a “sign” multiplication circuit 200 through the “sign” nodes 217 and 227, an “exponent” adder circuit 240 through the “exponent” nodes 218 and 228, and a binary in-memory multiplier circuit 250 through the “binary multiply” nodes 219 and 229 along with the outputs of a FP (Floating Point) decoder 210a and a FP decoder 210b respectively for the voltage signals of ap-1 and bp-1. The output voltage signals from an “exponent” adder circuit 240 and the in-memory multiplier circuit 250 are consequentially sent to the FP encoder circuit 270 for converting the resultant code back into the standard IEEE 754 floating-point number format code. The voltage signals of a FP encoder circuit 270 at the connection nodes 271 (i.e., nodes 271e for the exponent bits and nodes 271s for the significand bits) along with the “sign” voltage signals at the output node 201 of the “sign” multiplication circuit 200 are stored in the (p+q)-bit output register R 23. Note that the registers 220, 230, and 260 exist in the circuit device 20 only for the purpose of illustrating the voltage signals of intermediate data between the connection nodes. Those registers can be omitted in a real circuit implementation.
The floating-point in-memory multiplication device 20 of the invention performs one-step floating-point multiplication operation for two binary floating-point numbers without intermediate data storage and transportation between ALU, registers, and memory units, so the power consumption can be dramatically reduced. The invention performs one-step floating-point multiplication in memory units (i.e., through in-memory processing/computing), without moving intermediate data in/out of memory units, so as to avoid occupations of bus-line hardware (that may cause bus-line congestion or the Von Neumann bottle-neck in computers), to improve computation efficiency and to save computation power and time. The invention improves the field of in-memory processing/computing by using ROM arrays for storing the n-bit by n-bit multiplication tables (FIGS. 7-8) and transforming an identified leading non-zero bit position z into a binary format and by using specific adders for manipulating the output data from the multiplication tables and the q-bit exponents of both the multiplicand and the multiplier. In particular, no matter what precision the computer system is, the ROM array sizes for the n-bit by n-bit multiplication tables still remain reasonably small resulting in properly small silicon areas and high enough processing speeds.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will become more fully understood from the detailed description given hereinbelow and the accompanying drawings which are given by way of illustration only, and thus are not limitative of the present invention, and wherein:
FIG. 1 shows the conventional Von Neumann computing architecture for a typical Central Processing Unit (CPU).
FIG. 2 shows a schematic diagram of a binary floating-point in-memory multiplication device 20 for two binary floating-point number multiplications according to the invention.
FIG. 3 shows a schematic diagram of “sign” multiplication circuit 200 for the sign operation of two floating-point number multiplications according to an embodiment.
FIGS. 4a and 4b respectively show the schematic diagrams of floating-point decoders 210a and 210b for generating the MSB of the significand from the exponent bits for a floating-point number.
FIG. 5 shows the schematic of a carry-chained exponent adder circuit 240 for the addition of the two exponent bits for the two floating-point number operands.
FIG. 6 shows the schematic of Perpetual Digital Perceptron (PDP) base-2^n in-memory multiplier unit 600 for outputting the product codes of two n-bit long inputted codes based on the n-bit-by-n-bit multiplication table in FIG. 7.
FIG. 7 shows the 2n-bit binary product codes of the multiplication tables stored in PDP base-2^n in-memory multiplier unit 600 for two n-bit long inputted binary operands.
FIG. 8 shows the 8-bit binary product codes of the multiplication tables stored in PDP base-2^n in-memory multiplier unit 600 for two 4-bit (n=4) inputted binary operands in one embodiment of single precision floating-point number multiplication.
FIG. 9 shows the schematic of 6-digit base-2^4 in-memory binary multiplier circuit 250A for the significand bit multiplication of two floating-point number operands in the embodiment of single precision floating-point number multiplication.
FIG. 10 shows the schematic of a carry-chained Binary Adder BA 920(j) in FIG. 9 to generate the “jth” polynomial binary code (7-digit by 4-bit) for j=0, 1, 2, 3, 4, 5, in the embodiment of single precision floating-point number multiplication.
FIG. 11 shows the schematic of Polynomial Binary Adders PBA 930(i) in a carry-chained configuration in FIG. 9 to add up the six polynomial binary codes for i=0, 1, 2, 3, 4, in the embodiment of single precision floating-point number multiplication.
FIG. 12 shows the block schematic of floating-point encoder circuit 270 to convert the floating-point multiplied number back into the standard IEEE 754 format for following data operations according to the invention.
FIG. 13 shows the block schematic of floating-point encoder circuit 270A to convert the floating-point multiplied number back into the standard IEEE 754 format in the embodiment of single precision floating-point number multiplication.
FIG. 14 shows an example of the code table for different numbers z of left-shifted bit positions for the 24-bit significands in the embodiment of single precision floating-point number multiplication.
FIG. 15 shows the schematic of a barrel shifter 1340 in the embodiment of single precision floating-point number multiplication.
FIG. 16 shows the schematic of an addition/subtraction circuit 1330 in the embodiment of single precision floating-point number multiplication.
DETAILED DESCRIPTION OF THE INVENTION
The following detailed description is meant to be illustrative only and not limiting. It is to be understood that other embodiments may be utilized and element changes may be made without departing from the scope of the present invention. Also, it is to be understood that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. Those of ordinary skill in the art will immediately realize that the embodiments of the present invention described herein in the context of methods and schematics are illustrative only and are not intended to be in any way limiting. Other embodiments of the present invention will readily suggest themselves to such skilled persons having the benefit of this disclosure.
In the embodiment of the binary floating-point in-memory multiplication circuit device 20 in FIG. 2, the “sign” multiplication circuit 200 applies an XOR (exclusive OR) gate circuit to the voltage signals at nodes 217, 227, and 201 shown in FIG. 3 according to the logic operations of (sa=0, sb=0, and sm=0), (sa=0, sb=1, and sm=1), (sa=1, sb=0, and sm=1), and (sa=1, sb=1, and sm=0). The schematics of the two FP decoder circuitries 210a and 210b for obtaining the voltage signals of the digital values of ap-1 and bp-1 for the two p-bit significands are shown in FIGS. 4a and 4b. The FP decoder circuit 210a in FIG. 4a, having the same circuit configuration as the circuit 210b in FIG. 4b, comprises one P-type MOSFET (Metal Oxide Semiconductor Field Effect Transistor) device EP, one N-type MOSFET device EN for the “enabled” operation, “q” N-type MOSFET devices (Meaq−1, . . . , Mea1, Mea0) with their gates connected to the nodes (eaq−1, . . . , ea1, ea0) 218, and an inverter 212a. When the circuit 210a is not enabled, with the low logic voltage signal “VSS” at node 25, the EP device is turned on to charge the node 211a to the high logic voltage “VDD” while the EN device is off to disconnect the node from the ground voltage. When the circuit 210a is enabled, with the high logic voltage signal “VDD” at node 25, the EP device is turned off to disconnect the node 211a from the high voltage “VDD” and the EN device is turned on to provide the node 211a with a path to the ground voltage. While enabled, if any one of the nodes (eaq−1, . . . , ea1, ea0) 218 is applied with the high logic voltage signal “VDD”, the corresponding N-type MOSFET device(s) of (Meaq−1, . . . , Mea1, Mea0) will be turned on to discharge the node 211a to the ground voltage through the EN device, such that the output ap-1 of the inverter 212a will flip to the high logic signal “VDD” (logic value “1”).
Otherwise, the output ap-1 will remain at the low voltage signal “VSS” (logic value “0”), because the low gate voltage signals “VSS” applied to all of the nodes (eaq−1, . . . , ea1, ea0) 218 shut off all the N-type MOSFET devices (Meaq−1, . . . , Mea1, Mea0) and disconnect node 211a from the ground voltage. The circuit 210b for the floating-point number B works in the same way as the circuit 210a for the floating-point number A. The operations of the two FP decoder circuitries 210a and 210b are equivalent to q-input OR-gate devices. The above FP decoder circuitries 210a and 210b are utilized as embodiments and not limitations of the invention. In actual implementations, the above FP decoder circuitries 210a and 210b can be replaced with any other circuit layouts of q-input OR-gate devices or equivalent logic components.
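The logic behavior of the “sign” circuit and of one FP decoder described above may be sketched as follows. This is a behavioral Python model under simplifying assumptions (ideal logic levels, no timing); the function names and the variable `node_211a` are illustrative labels introduced here.

```python
def sign_multiply(sa, sb):
    """XOR of the two sign bits, matching the four listed logic cases."""
    return sa ^ sb


def fp_decoder(enable, exponent_bits):
    """Behavioral model of one precharge/evaluate FP decoder.

    enable        -- logic value at node 25
    exponent_bits -- iterable of the q exponent bits
    Returns the logic value of the significand MSB at the inverter output.
    """
    if not enable:
        node_211a = 1                     # EP on, EN off: node held at VDD
    else:
        # EN on: any exponent bit at VDD turns on its pull-down device.
        node_211a = 0 if any(exponent_bits) else 1
    return 1 - node_211a                  # inverter 212a


print(fp_decoder(1, [0, 0, 0, 0, 0, 0, 0, 0]))  # sub-normal: MSB 0
print(fp_decoder(1, [0, 1, 1, 1, 1, 1, 1, 1]))  # normal: MSB 1
```

While enabled, the model reduces to a q-input OR of the exponent bits, as stated in the text.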
The addition of the two exponents (eaq−1, . . . , ea1, ea0)b and (ebq−1, . . . , eb1, eb0)b of the floating-point numbers A and B can be performed by a conventional carry-chained exponent adder circuit 240 comprising “q−1” full adders 24f (each with one OR gate, two XOR gates, and two AND gates) and one half adder 24h (an “XOR” gate and an “AND” gate) for the Least Significant Bit (LSB), as shown in FIG. 5.
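A gate-level software sketch of such a carry-chained adder is given below for illustration. The function names are assumptions introduced here; the gate counts in the comments follow the text.

```python
def half_adder(x, y):
    """One XOR gate and one AND gate: returns (sum, carry)."""
    return x ^ y, x & y


def full_adder(x, y, c):
    """Two XOR gates, two AND gates, one OR gate: returns (sum, carry)."""
    partial = x ^ y
    return partial ^ c, (x & y) | (partial & c)


def exponent_adder(ea, eb, q):
    """Add two q-bit exponents; the result has q+1 bits (sum bits plus final carry)."""
    s, carry = half_adder(ea & 1, eb & 1)          # LSB uses the half adder
    bits = [s]
    for i in range(1, q):                          # q-1 full adders in a chain
        s, carry = full_adder((ea >> i) & 1, (eb >> i) & 1, carry)
        bits.append(s)
    bits.append(carry)                             # bit q of the sum
    return sum(bit << i for i, bit in enumerate(bits))


print(exponent_adder(127, 130, 8))
```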
Referring to the disclosure of U.S. Pat. No. 11,461,074 B2, the binary multiple-digit base-2^n in-memory multiplication devices comprising memory arrays for storing the base-2^n multiplication table can reduce the number of intermediate operational steps for the multiplication of two p-bit binary operands. Therefore, the multiple-digit base-2^n in-memory multiplication device with two operands each having (p/n) digits can be designed for the binary significand multiplication. The p-bit by p-bit binary significand multiplication is converted to a (p/n)-digit by (p/n)-digit multiplication with each digit of each significand represented by a unique n-bit binary code. The (p/n)-digit by (p/n)-digit multiplication can be performed with (p/n)^2 digit-digit multiplications and ((p/n)−1) polynomial additions. The voltage signals of the 2n-bit product binary code for each digit-digit multiplication are obtained from the output voltage signals of the PDP base-2^n in-memory multiplier unit 600 comprising a Content Read Only Memory (CROM) array 620, a match detector unit 630, and a Response Read Only Memory (RROM) array 640 shown in FIG. 6. A number 2^(2n) of 2n-bit operand codes (Ai and Bj) of the multiplication table in FIG. 7 are hardwired in 2^(2n) rows of CROM cells (not shown) of the CROM array 620, where 0<=i, j<=((p/n)−1). A number 2^(2n) of 2n-bit product codes in the table cells of the multiplication table in FIG. 7 are hardwired in 2^(2n) rows of RROM cells (not shown) of the RROM array 640. The match detector unit 630 is enabled by the signal “Enb” at node 605 and configured to sense the voltage potentials at the match-lines 621 for a matched match-line and then activate the one of the wordlines 631 corresponding to the matched match-line.
Basically, the PDP base-2^n in-memory multiplier unit 600 functions as follows: it compares the 2^(2n) 2n-bit operand codes hardwired in the CROM array 620 with a first n-bit digit and a second n-bit digit respectively selected from the p-bit significand (ap-1, ap-2, . . . , a0) in register 220 and the p-bit significand (bp-1, bp-2, . . . , b0) in register 230; when one row of the 2n-bit operand codes stored in the CROM array 620 matches the first n-bit digit and the second n-bit digit, the match detector unit 630 activates the corresponding wordline in the RROM 640 for the matched match-line to output one of the 2^(2n) 2n-bit product codes hardwired in the RROM 640 as a 2n-bit output code.
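The match-then-respond behavior of the PDP unit may be modeled in software as a pair of lookup arrays, as in the minimal sketch below. The row ordering, the packing of the two digits into one 2n-bit code, and the function names are assumptions introduced here for illustration; the hardwired cell layout is not modeled.

```python
def build_pdp_tables(n):
    """Return (crom, rrom), each with 2**(2n) rows.

    crom[k] holds a 2n-bit operand code (first digit in the upper n bits),
    rrom[k] holds the 2n-bit product code for the same row.
    """
    crom, rrom = [], []
    for a in range(1 << n):
        for b in range(1 << n):
            crom.append((a << n) | b)
            rrom.append(a * b)
    return crom, rrom


def pdp_multiply(crom, rrom, a_digit, b_digit, n, enable=1):
    """Compare the input digits with every CROM row and read the matched RROM row."""
    if not enable:
        return None
    key = (a_digit << n) | b_digit
    match_lines = [row == key for row in crom]   # exactly one row matches
    wordline = match_lines.index(True)           # match detector selects one wordline
    return rrom[wordline]


crom, rrom = build_pdp_tables(4)
print(len(crom), pdp_multiply(crom, rrom, 0xA, 0x3, 4))
```

For n=4 the two arrays have 256 rows each, which illustrates why the table size stays small regardless of the floating-point precision.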
In one embodiment, the 32-bit single precision floating-point format (q=8 and p=24) comprises 24 bits for the significand. As shown in FIG. 9, we may apply the base 2^4 (n=4 for the hexadecimal format) to represent a digit, so we then have 24/4=6 digits for the two 6-digit hexadecimal operand multiplication and finally obtain a 48-bit product code (m47 . . . m1m0). The 6-digit base-2^4 in-memory binary multiplier circuit 250A comprises thirty-six PDP base-2^4 in-memory multiplier units 910(0)~(35) (derived from PDP unit 600), six Binary Adders 920(0)~(5), and five Polynomial Binary Adders 930(0)~(4). The 6-digit by 6-digit multiplication is carried out by 36=6×6 digit-digit multiplications with an array (910 in FIG. 9) of thirty-six PDP multiplier units, each storing the 4-bit multiplication table shown in FIG. 8, for simultaneous parallel multiplication, six carry-chained Binary Adders (BA 920(j) in FIG. 10) to generate six 7-digit polynomial binary codes, and five carry-chained Polynomial Binary Adders (PBA 930(i) in FIG. 11) for five additions of the six 7-digit polynomial binary codes, for i=0~4 and j=0~5. The Binary Adder (BA 920(j)) receives six 8-bit coefficients/binary codes of the polynomial (A5*Bj·X^(5+j)+A4*Bj·X^(4+j)+A3*Bj·X^(3+j)+A2*Bj·X^(2+j)+A1*Bj·X^(1+j)+A0*Bj·X^(0+j)) to generate the 7-digit 4-bit (4 bits×7) polynomial binary codes for j=0, . . . , 5, where X=2^4. Each binary adder 920(j) comprises five 4-bit adders and four half adders in a carry-chained configuration. We illustrate the output nodes 921 of BA circuitry 920(j) in FIG. 10 as follows: the 4-bit binary code for the first digit (Least Significant Digit) of the output of each BA 920(j) is directly passed from the PDP unit 910(0+6*j) producing the least significant 4-bit binary code of (A0*Bj). The 20-bit binary code for the middle digits (2nd digit to 6th digit) of the output of each BA 920(j) is then obtained by the binary addition of the least significant four bits of (Ak+1*Bj) and the most significant four bits of (Ak*Bj) for k=0, 1, 2, 3, 4. The 4-bit binary code for the 7th digit of the output of each BA 920(j) is obtained by adding the carry-bit from the 6th digit to the most significant four bits of (A5*Bj). In brief, the operation of a first binary adder BA 920(0) is equivalent to converting 8-bit first coefficients of a first polynomial of degree 5 (i.e., A5*B0·X^5+A4*B0·X^4+A3*B0·X^3+A2*B0·X^2+A1*B0·X^1+A0*B0·X^0) into 4-bit second coefficients of a second polynomial of degree 6 (i.e., C6·X^6+C5·X^5+C4·X^4+C3·X^3+C2·X^2+C1·X^1+C0·X^0) in mathematics; the operation of a second binary adder BA 920(1) is equivalent to converting 8-bit first coefficients of a first polynomial of degree 6 (i.e., A5*B1·X^6+A4*B1·X^5+A3*B1·X^4+A2*B1·X^3+A1*B1·X^2+A0*B1·X^1) into 4-bit second coefficients of a second polynomial of degree 7 (i.e., C13·X^7+C12·X^6+C11·X^5+C10·X^4+C9·X^3+C8·X^2+C7·X^1) in mathematics; . . . ; the operation of a sixth binary adder BA 920(5) is equivalent to converting 8-bit first coefficients of a first polynomial of degree 10 (A5*B5·X^10+A4*B5·X^9+A3*B5·X^8+A2*B5·X^7+A1*B5·X^6+A0*B5·X^5) into 4-bit second coefficients of a second polynomial of degree 11 (C41·X^11+C40·X^10+C39·X^9+C38·X^8+C37·X^7+C36·X^6+C35·X^5) in mathematics, where X=2^4. A total of six 7-digit polynomial codes (i.e., a total of forty-two 4-bit second coefficients C0~C41) are simultaneously generated by the six binary adders BA 920(0)~(5) for the following polynomial additions.
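The digit-by-digit operation of one Binary Adder described above can be illustrated with the following minimal Python sketch. It models the arithmetic only, not the adder wiring; the function name `binary_adder` and the list-based digit representation are assumptions introduced here.

```python
def binary_adder(a_digits, b_j):
    """Model of one BA 920(j).

    a_digits -- [A0, A1, ..., A5], six 4-bit digits, least significant first
    b_j      -- one 4-bit digit Bj
    Returns seven 4-bit digits, least significant first.
    """
    products = [a * b_j for a in a_digits]        # six 8-bit table outputs
    out = [products[0] & 0xF]                     # 1st digit passes straight through
    carry = 0
    for k in range(5):                            # 2nd to 6th digits
        s = (products[k + 1] & 0xF) + (products[k] >> 4) + carry
        out.append(s & 0xF)
        carry = s >> 4
    out.append((products[5] >> 4) + carry)        # 7th digit: high nibble plus carry
    return out


digits = binary_adder([1, 2, 3, 4, 5, 6], 7)      # operand 0x654321 times digit 7
value = sum(d << (4 * i) for i, d in enumerate(digits))
print([hex(d) for d in digits], hex(value))
```

Reading the seven digits as a base-16 number gives the operand multiplied by Bj, which is the sense in which the adder converts 8-bit coefficients into 4-bit coefficients.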
The schematic of PBA 930(i) in a carry-chained configuration for i=0, 1, 2, 3, 4 is shown in FIG. 11. Each polynomial binary adder 930(i) comprises a (6×4)-bit adder and four half adders in a carry-chained configuration. The output nodes for the most significant 24 bits of the “0th” 7-digit polynomial code (from BA 920(0)) and the output nodes for the 28-bits of the “1st” 7-digit polynomial code (from BA 920(1)) are respectively connected to the input nodes ((pIi)27(pIi)26 . . . (pIi)4) and ((pIi+1)27(pIi+1)26 . . . (pIi+1)1(pIi+1)0) of the PBA 930(0) shown in FIG. 11, for i=0; the output nodes for the most significant 24 bits of the 7-digit polynomial code (from PBA 930(i−1)) and the output nodes for the 28-bits of the “(i+1)th” 7-digit polynomial code (from BA 920(i+1)) are respectively connected to the input nodes ((pIi)27(pIi)26 . . . (pIi)4) and ((pIi+1)27(pIi+1)26 . . . (pIi+1)1(pIi+1)0) of the PBA 930(i) shown in FIG. 11, for i=1˜4. PBA 930(i) outputs the voltage signals at the output nodes, ((pai)27(pai)26 . . . (pai)1(pai)0) for the “ith” polynomial addition. In FIG. 9, the voltage signals of multiplication of two significands at the nodes (m47m46 . . . m1m0) consist of the voltage signals of the most significant 28-bits at the output nodes (m47˜m20) of PBA 930(4), and the voltage signals of the least significant 20 bits respectively from the least significant 4-bit output nodes (m19˜m16) of PBA 930(3), the least significant 4-bit output nodes (m15˜m12) of PBA 930(2), the least significant 4-bit output nodes (m11˜m8) of PBA 930(1), and the least significant 4-bit output nodes (m7˜m4) of PBA 930(0) along with the least significant 4-bit output node (m3˜m0) of BA 920(0) as shown in FIG. 9. The operations of the PBA 930(0)˜(4) are equivalent to lining up and adding like terms of the above second polynomials of degrees ranging from 6 to 11 to obtain 4-bit third coefficients of a third polynomial of degree 11 in mathematics. 
Here, the third polynomial has twelve terms.
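The chained polynomial additions described above can be checked with the following arithmetic sketch. For brevity each BA 920(j) output is modeled directly as the 28-bit integer A×Bj instead of seven separate digits; that shortcut and the function name `multiply_significands` are assumptions introduced here.

```python
def multiply_significands(a, b):
    """48-bit product of two 24-bit significands following the FIG. 9 data flow."""
    b_digits = [(b >> (4 * j)) & 0xF for j in range(6)]
    rows = [a * bj for bj in b_digits]            # six 28-bit polynomial codes
    acc = rows[0]
    product = acc & 0xF                           # m3..m0 taken from BA 920(0)
    for i in range(5):                            # PBA 930(0) .. PBA 930(4)
        # Most significant 24 bits of the previous code plus the next 28-bit code.
        acc = (acc >> 4) + rows[i + 1]
        assert acc < (1 << 28)                    # the sum always fits in 28 bits
        if i < 4:
            product |= (acc & 0xF) << (4 * (i + 1))   # m7..m4 up to m19..m16
        else:
            product |= acc << 20                      # m47..m20 from PBA 930(4)
    return product


a, b = 0xC00000, 0xC00000                         # significands of 1.5 and 1.5
print(hex(multiply_significands(a, b)))
```

Each stage releases its least significant 4 bits as final product bits, so only five additions are needed for the six polynomial codes.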
For transforming the floating-point number format of the multiplied binary result from the binary in-memory multiplier circuit 250, the floating-point number M in the form of a 2p-bit significand is written by
In term of (q+1)-bit binary format, the exponent em is given by
where (emqemq−1 . . . em1em0)b is the result of the binary addition/subtraction of the above equation, and (esqesq−1 . . . es1es0)b is the result of the binary addition of (eaq−1eaq−2 . . . ea1ea0)b and (ebq−1ebq−2 . . . eb1eb0)b from the previous exponent adder circuit block 240 in FIG. 2.
Meanwhile, according to the IEEE 754 floating-point number format, the significand (m2p-1 . . . mpmp-1 . . . m0)b must be left-shifted to obtain the first leading non-zero bit until the exponent bits (emq−1emq−2 . . . em1em0)b all become zeros for the subnormal floating-point numbers (that is, the number of left-shifted bit positions is equal to the maximum value “p”). We denote the number of left-shifted bit positions (i.e., a shift distance relative to the MSB m2p-1) as z, with z = zt-1×2^(t−1) + . . . + z0×2^0 := (0 . . . zt-1 . . . z0)b in the (q+1)-bit binary format, where 0<=z<=(p−1) and t=roundup(log2(p)). Therefore, the final exponent bits in the (q+1)-bit format are given by
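The exponent relations described in this passage can be written out as the following sketch, taking the expression (ea+eb+2−2^(q−1)−z) that the description of FIG. 12 assigns to the addition/subtraction circuit block 1230. The intermediate step with the bias 2^(q−1)−1 is an explanatory assumption introduced here, based on the MSB m2p-1 of the product carrying the weight 2^1.

```latex
\begin{aligned}
(es_q\, es_{q-1}\ldots es_1 es_0)_b &= (ea_{q-1}\ldots ea_1 ea_0)_b + (eb_{q-1}\ldots eb_1 eb_0)_b,\\[2pt]
e_m &= \underbrace{e_a + e_b - (2^{q-1}-1)}_{\text{re-biased exponent sum}} \;+\; 1 \;-\; z\\
    &= (es_q \ldots es_0)_b + 2 - 2^{q-1} - (0\ldots z_{t-1}\ldots z_0)_b .
\end{aligned}
```

For single precision (q=8) this reads em = ea + eb − 126 − z; for example, 1.5×1.5 has ea=eb=127 and z=0, giving em=128, the exponent code of 2.25.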
The schematic of the floating-point encoder circuit 270 is shown in FIG. 12. In FIG. 12, the exponent voltage signals at the input nodes (esqesq−1 . . . es1es0) of the addition/subtraction circuit block 1230 are received for the binary addition/subtraction of the number equation (ea+eb+2−2^(q−1)−z). Meanwhile, the multiplication voltage signals at the nodes (m2p-1 . . . mpmp-1 . . . m0) are simultaneously sent to both the Lead Zero Detector (LZD) 1210, for detecting the first non-zero bit among the most significant p bits of the 2p-bit product from the circuit 250, and the barrel shifter 1240, for shifting the 2p-bit significand. Starting from the MSB m2p-1, the LZD 1210 detects the first non-zero bit among the most significant p bits of the 2p-bit product to turn on the corresponding word-line in the position shifting encoder 1220, which outputs the voltage signals of the binary code for the number z of left-shifted bit positions. For example, in FIG. 12, if m2p-1=“1” with voltage signal VDD, the LZD circuitry 1210 will turn on the first column word-line of the position shifting encoder 1220 to output the voltage signals of the binary code (0 . . . 0)b both for the 2p-bit barrel shifter 1240, to shift the 2p-bit product (m2p-1 . . . mpmp-1 . . . m0)b from the circuit 250 to the left by zero bit positions (z=0), and for the addition/subtraction circuit block 1230, for the binary addition/subtraction of the number equation. If m2p-1=“0” with voltage signal VSS and m2p-2=“1” with voltage signal VDD, the LZD circuitry 1210 will turn on the second column word-line of the ROM array 1220 to output the voltage signals of the binary code (0 . . . 1)b both for the 2p-bit barrel shifter 1240, to shift the 2p-bit product to the left by one bit position (z=1), and for the addition/subtraction circuit block 1230, for the binary addition/subtraction of the number equation.
Basically, the position shifting encoder 1220 receives the voltage signals from the LZD circuitry 1210 and converts the number z of left-shifted bit positions into a binary code representation. The exponent voltage signals of the addition/subtraction circuit block 1230 at the output nodes (emq−1emq−2 . . . em1em0) and the significand voltage signals of the barrel shifter 1240 at the output nodes (rp-2 . . . r1r0) form the voltage signals of a floating-point multiplication number code compliant with the IEEE 754 floating-point standard format. Note that the exponent voltage signals for the emq+1 node and the emq node are used as the flag signals of underflow and overflow situations, respectively.
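The cooperation of the leading-bit detector and the position shifting encoder may be sketched as follows. This minimal Python model represents the word-lines as a one-hot list and the ROM as a list of t-bit codes; both function names and the list layout are assumptions introduced here for illustration.

```python
def leading_zero_detector(top_bits):
    """top_bits lists the most significant p product bits, MSB first.

    Returns a one-hot list with a 1 at the first non-zero bit position
    (all zeros if no bit is set).
    """
    one_hot = [0] * len(top_bits)
    for k, bit in enumerate(top_bits):
        if bit:
            one_hot[k] = 1
            break
    return one_hot


def position_shifting_encoder(one_hot, t):
    """ROM lookup: word-line k stores the t-bit binary code of z = k, MSB first."""
    rom = [[(k >> i) & 1 for i in reversed(range(t))] for k in range(len(one_hot))]
    for k, line in enumerate(one_hot):
        if line:
            return rom[k]
    return None                      # no word-line active


top = [0, 1] + [0] * 22              # m47 = 0, m46 = 1: shift by one position
print(position_shifting_encoder(leading_zero_detector(top), 5))
```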
In one embodiment for the 32-bit single precision (q=8 and p=24) floating-point encoder 270A, the output nodes for the 48-bit multiplication significands (m47 . . . m24m23 . . . m0)b from the 6-digit base-2^4 in-memory binary multiplier circuit 250A in FIG. 9 are connected to the input nodes of the barrel shifter 1340, the output nodes of the most significant 24 bits (m47 . . . m24)b are also connected to the input nodes of the LZD circuit 1310, and the 9-bit exponent (es8es7 . . . es1es0)b from the binary addition of (ea+eb) by the exponent adder circuit 240 is transmitted to the input nodes of the addition/subtraction circuit 1330 shown in FIG. 13. The LZD circuit 1310, comprising multiple NAND gates, detects the first leading non-zero bit relative to the MSB (m47) among the most significant 24 bits (m47 . . . m24)b of the 48-bit multiplication significands. The LZD circuit 1310 outputs the voltage signal “VDD” (logic value “1”) at the first leading non-zero bit position detected and the voltage signal “VSS” (logic value “0”) otherwise. The output voltage signals of the LZD circuit 1310 are inputted into the position shifting encoder 1320. The position shifting encoder 1320 comprises a ROM array with the corresponding word-lines connected to the output nodes of the LZD circuit 1310. The predefined binary codes stored in advance in multiple ROM cells of the ROM array 1320 for the number z of left-shifted bit positions are shown in FIG. 14, where z = z4×2^4 + z3×2^3 + z2×2^2 + z1×2^1 + z0×2^0 for (z4z3z2z1z0)b. When the voltage signal “VDD” at the output node of the first leading non-zero bit position is applied to the corresponding word-line in the ROM array 1320, the voltage signals of a predefined binary code z (=(z4z3z2z1z0)b) at the bit-line nodes of the ROM array 1320 are simultaneously outputted to the input nodes of a barrel shifter 1340 in FIG. 15 and an addition/subtraction circuit 1330 in FIG. 16.
FIG. 15 shows the schematic of the 48-column left-shifting barrel shifter 1340 controlled by the 5-bit input code z (=(z4z3z2z1z0)b) in binary format. The barrel shifter 1340 comprises an array of Transmission Gates (TGs) 1501 with the connections to left-shift the voltage signals from the input nodes (m47m46 . . . m1m0) to the output nodes (r22r21 . . . r1r0) by z bit positions. The connections for the left-shifted bit positions are configured as follows: (z4z3z2z1z0) are the control nodes for the corresponding rows of TGs to make the connections for left-shifting (16 columns or 0 column) by z4, (8 columns or 0 column) by z3, (4 columns or 0 column) by z2, (2 columns or 0 column) by z1, and (1 column or 0 column) by z0, respectively. Thus, there are a total of five multiplexing stages connected in cascade. The voltage signal “VDD” (logic value “1”) at any node of (z4z3z2z1z0) is applied to turn on the corresponding row of TGs to pass the voltage signals from the connecting input nodes to the corresponding left-shifted column output nodes. Conversely, the voltage signal “VSS” (logic value “0”) at any node of (z4z3z2z1z0) is applied to turn on the corresponding row of TGs to pass the voltage signals from the connecting input nodes to the same column output nodes without any left shifting. The voltage signals at the output nodes (r22r21 . . . r1r0) of the barrel shifter 1340 represent the digit voltage signals for the (p−1)-bit significand of the single precision floating-point number compliant with the standard IEEE 754 floating-point number code format (p=24).
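The signal path from the LZD circuit through the ROM encoder to the five-stage barrel shifter can be sketched behaviorally in Python. This is a minimal sketch under stated assumptions, not the disclosed circuit: the ROM contents of FIG. 14 are not reproduced here, so the sketch assumes that the word-line for bit m(47−i) stores the code z = i; it assumes that the outputs (r22 . . . r0) take the 23 bits just below the shifted leading one; and it assumes z = 0 when no word-line is asserted. All function names are hypothetical.

```python
WIDTH = 48                # product significand bits m47 .. m0
MASK = (1 << WIDTH) - 1

# Assumed ROM contents: word-line i (driven by bit m(47-i)) stores z = i.
ROM = list(range(24))


def lzd_one_hot(m):
    """Return 24 signals for m47..m24: 1 only at the first non-zero bit."""
    out = [0] * 24
    for i in range(24):
        if (m >> (47 - i)) & 1:
            out[i] = 1
            break
    return out


def position_encode(one_hot):
    """Read the 5-bit code z from the ROM row whose word-line is asserted."""
    for i, line in enumerate(one_hot):
        if line:
            return ROM[i]
    return 0  # assumption: no word-line asserted gives z = 0


def barrel_shift_left(m, z):
    """Five cascaded stages shifting by 16, 8, 4, 2 and 1 columns."""
    for k in (4, 3, 2, 1, 0):
        if (z >> k) & 1:               # control node z_k at logic "1": shift
            m = (m << (1 << k)) & MASK
        # control node z_k at logic "0": pass through unshifted
    return m


def significand_out(m):
    """Return (z, r) where r models the 23 output bits r22..r0."""
    z = position_encode(lzd_one_hot(m))
    shifted = barrel_shift_left(m, z)
    r = (shifted >> 24) & ((1 << 23) - 1)  # assumed mapping: bits 46..24
    return z, r
```

For example, multiplying the 24-bit significands of 1.5 and 1.5 (0xC00000 each) gives a product whose bit m47 is already set, so no shift is needed, whereas 1.0 times 1.0 (0x800000 each) has its leading one at m46 and is shifted left by one column.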
Meanwhile, the schematic of the addition/subtraction circuit (1330 in FIG. 13) for the embodiment of the 32-bit single precision floating-point encoder 270A is shown in FIG. 16. The binary addition circuitry 1610 comprises the logic gate circuit elements 1611, 1612, 1613 for the binary addition of (es8es7es6es5es4es3es2es1es0)b+(000000010)b. The 10-bit output nodes (including the carry bit node Cb) of the addition circuitry 1610 are connected to the input nodes of the binary subtraction circuitry 1620, which comprises the logic gate elements 1621, 1623, and 1622 for subtracting (00100z4z3z2z1z0)b from the output value of the binary addition circuitry 1610. The voltage signals at the output nodes (em7em6em5em4em3em2em1em0) represent the digital voltage signals for the q-bit exponent (q=8) of the single precision floating-point number compliant with the standard IEEE 754 floating-point number code format. The voltage signal “VDD” (logic value “1”) at the output nodes em9 and em8 indicates the “underflow” and “overflow” situations, respectively, for the single precision floating-point number.
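The arithmetic performed by the two stages can be checked with a short Python sketch. Taken together, adding (000000010)b = 2 and subtracting (00100z4z3z2z1z0)b = 128 + z gives es − 126 − z, which removes one copy of the IEEE 754 single precision bias of 127 from the sum (ea+eb) and adds 1 when the leading one of the product is already at m47 (z = 0). The sketch assumes that the 10-bit result wraps in two's complement and uses the hypothetical name exponent_adjust; it models integer values only, not the gate-level elements.

```python
def exponent_adjust(es, z):
    """Model addition circuitry 1610 followed by subtraction circuitry 1620.

    es: 9-bit sum of the two biased exponents (ea + eb).
    z:  5-bit left-shift count from the position shifting encoder.
    Returns (em, em9, em8): 8-bit exponent and the two raw flag bits.
    """
    ten_bits = 0x3FF
    stage1 = (es + 0b000000010) & ten_bits      # 10-bit sum including carry
    subtrahend = 0b0010000000 | (z & 0b11111)   # (00100 z4 z3 z2 z1 z0)b
    stage2 = (stage1 - subtrahend) & ten_bits   # assumed two's-complement wrap
    em = stage2 & 0xFF                          # em7 .. em0
    em8 = (stage2 >> 8) & 1                     # overflow flag node
    em9 = (stage2 >> 9) & 1                     # underflow flag node
    return em, em9, em8
```

In this model a negative result sets em9 and may also leave em8 at 1, so a reader of the flags would presumably treat em8 as an overflow indication only when em9 is 0; that gating is an assumption of the sketch and is not stated in the text.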
Please note that the above barrel shifter 1240/1340, the exponent adder circuit 240, the binary addition circuitry 1610 and the subtraction circuitry 1620 are utilized as embodiments and not limitations of the invention. In actual implementations, the above barrel shifter 1240/1340 can be replaced with any other type of barrel shifter, such as a crossbar barrel shifter or a barrel shifter implemented as a cascade of parallel multiplexers; the exponent adder circuit 240 and the binary addition circuitry 1610 can be replaced with any other type of binary addition circuitry, such as a carry-save adder or a carry-lookahead adder; the subtraction circuitry 1620 can be replaced with any other type of binary subtraction circuitry; and these also fall within the scope of the invention. Please also note that the above CROM array 620, the RROM array 640 and the ROM arrays 1220/1320 are utilized as embodiments and not limitations of the invention. In actual implementations, the above CROM array 620, the RROM array 640 and the ROM arrays 1220/1320 can be replaced with any other types of memory arrays or equivalent logic components, and this also falls within the scope of the invention.
The aforementioned description of the preferred embodiment of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form or to the exemplary embodiment disclosed. Accordingly, the description should be regarded as illustrative rather than restrictive. The embodiment is chosen and described in order to best explain the principles of the invention and its best mode of practical application, thereby enabling persons skilled in the art to understand the invention for various embodiments and with various modifications as are suited to the particular use or implementation contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents, in which all terms are meant in their broadest reasonable sense unless otherwise indicated. The abstract of the disclosure is provided to comply with the rules requiring an abstract, which will allow a searcher to quickly ascertain the subject matter of the technical disclosure of any patent issued from this disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Any advantages and benefits described may not apply to all embodiments of the invention. It should be appreciated that variations may be made in the embodiment described by persons skilled in the art without departing from the scope of the present invention as defined by the following claims. Moreover, no element or component in the present disclosure is intended to be dedicated to the public regardless of whether the element or component is explicitly recited in the following claims.