The present disclosure relates to in-memory computing (IMC) circuits, devices, and systems.
Many applications, such as artificial intelligence (AI) and machine learning algorithms and image signal processing, rely on matrix multiplications that involve many multiplication and addition operations, which can be referred to as multiply accumulate (MAC) operations. MAC operations can be performed in software or in hardware, such as by circuits and devices. In-memory computing (IMC) is a technology that uses memory devices in a hardware system to execute MAC operations.
Embodiments of the present disclosure include a system having an in-memory computing (IMC) circuit to perform multiply accumulate (MAC) operations of a weight number with one or more input numbers. The weight number can have a first number of bits and an input number can have a second number of bits. The IMC circuit can generate a product or a partial product of the weight number and the input number. In some embodiments, all the bits of the input number can be provided to the IMC circuit in parallel. In addition, the same weight number can be applied to a plurality of input numbers each having the second number of bits.
Embodiments of the present disclosure include a device or an IMC circuit including a first set of IMC cells having a first number of IMC cells and a second set of IMC cells having the first number of IMC cells. The first set of IMC cells can be configured to generate a first bit-product of a weight number having the first number of bits and a first bit of an input number having a second number of bits. Similarly, the second set of IMC cells can be configured to generate a second bit-product of the weight number and a second bit of the input number. A first IMC cell of the first set of IMC cells includes a first bit-wise multiplication circuit configured to multiply a first bit of the weight number and the first bit of the input number, and a second IMC cell of the second set of IMC cells includes a second bit-wise multiplication circuit configured to multiply a second bit of the weight number and the second bit of the input number.
In some embodiments, the first IMC cell includes a first memory cell configured to store the first bit of the weight number, and the first IMC cell is coupled to a second memory cell configured to store the first bit of the input number. The first memory cell or the second memory cell can include a static random-access memory (SRAM) cell. The first bit-wise multiplication circuit or the second bit-wise multiplication circuit can include one or more of a NOR gate and a NAND gate. In some embodiments, the first bit of the weight number stored in the first memory cell can be provided to another IMC cell of the second set of IMC cells, so that the content of the first memory cell of the first IMC cell can be shared by that other IMC cell of the second set of IMC cells.
In some embodiments, the first bit-wise multiplication circuit can be configured to receive the first bit of the input number, and the second bit-wise multiplication circuit can be configured to receive the second bit of the input number. In some embodiments, the first bit of the input number and the second bit of the input number can be selected from a set of Booth encoded bits of the input number.
In some embodiments, the device can further include a first set of shifters and a second set of shifters. The first set of shifters can be coupled to the first set of IMC cells and configured to shift the first bit-product by a first position to generate a first shifted bit-product, and the second set of shifters can be coupled to the second set of IMC cells and configured to shift the second bit-product by a second position to generate a second shifted bit-product. In some embodiments, a multiple-bit adder can be coupled to the first set of shifters and the second set of shifters and configured to add the first shifted bit-product and the second shifted bit-product to generate a partial product of a product of the weight number and the input number.
In some embodiments, the input number can be a first input number received by the first set of IMC cells and the second set of IMC cells at a first time instance. In addition, the first set of IMC cells can be further configured to generate a third bit-product of the weight number and a first bit of a second input number having the second number of bits received at a second time instance after the first time instance. The second set of IMC cells can be configured to generate a fourth bit-product of the weight number and a second bit of the second input number received at the second time instance. In some embodiments, the first set of IMC cells and the second set of IMC cells can be configured to receive the first input number and the second input number, respectively, at a double pumped speed. In some embodiments, a third IMC cell of the first set of IMC cells can include a third bit-wise multiplication circuit configured to multiply a third bit of the weight number and the first bit of the second input number, and a fourth IMC cell of the second set of IMC cells can include a fourth bit-wise multiplication circuit configured to multiply a fourth bit of the weight number and the second bit of the second input number.
In some embodiments, a memory circuit can include a set of memory cells configured to store a weight number having a first number of bits and a control circuit configured to provide an input number to an IMC circuit, where the input number has a second number of bits. The memory circuit can further include the IMC circuit with a matrix of IMC cells having a total number of IMC cells configured to receive in parallel the second number of bits of the input number and perform bit-wise multiplications in parallel, where the total number is the product of the first number and the second number. An IMC cell of the matrix of IMC cells can include a bit-wise multiplication circuit configured to multiply a first bit of the input number and a second bit of the weight number. In some embodiments, the bit-wise multiplication circuit can include one or more of a NOR gate and a NAND gate. In some embodiments, the IMC cell can include a memory cell configured to store the second bit of the weight number, the IMC cell can be coupled to another memory cell configured to store the first bit of the input number, and the memory cell can include a static random-access memory (SRAM) cell.
In some embodiments, the matrix of IMC cells can be configured to generate the second number of bit-products including a first bit-product and a second bit-product. The first bit-product can be generated by a first set of IMC cells for a product of the first bit of the input number multiplied with the weight number. The second bit-product can be generated by a second set of IMC cells for a product of a second bit of the input number multiplied with the weight number.
In some embodiments, the first set of IMC cells can be coupled to a first set of shifters configured to shift the first bit-product by a first position to generate a first shifted bit-product, and the second set of IMC cells can be coupled to a second set of shifters configured to shift the second bit-product by a second position to generate a second shifted bit-product. In some embodiments, the first set of shifters and the second set of shifters can be included in the memory circuit. In some embodiments, the first set of shifters and the second set of shifters can be coupled to a multiple-bit adder and configured to add the first shifted bit-product and the second shifted bit-product to generate a partial product of a product of the weight number and the input number.
In some embodiments, a method can include generating, by a bit-wise multiplication circuit, a bit-wise product by multiplying a bit of a weight number having a first number of bits and a bit of an input number having a second number of bits. The method can further include generating, by a first set of IMC cells including the first number of IMC cells, a first bit-product of the weight number and a first bit of the input number, where the first bit-product includes the first number of bits and a bit of the first bit-product includes a first bit-wise product generated by a first bit-wise multiplication circuit. In addition, the method can include generating, by a second set of IMC cells including the first number of IMC cells, a second bit-product of the weight number and a second bit of the input number, where the second bit-product includes the first number of bits and a bit of the second bit-product includes a second bit-wise product generated by a second bit-wise multiplication circuit.
In some embodiments, the method can further include shifting, by a first set of shifters coupled to the first set of IMC cells, the first bit-product by a first position to generate a first shifted bit-product; shifting, by a second set of shifters coupled to the second set of IMC cells, the second bit-product by a second position to generate a second shifted bit-product; and generating, by a multiple-bit adder coupled to the first set of shifters and the second set of shifters, a sum of the first shifted bit-product and the second shifted bit-product as a partial product of a product of the weight number and the input number.
Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures. It is noted that, according to the standard practice in the industry, various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.
The following disclosure provides many different embodiments, or examples, for implementing different features of the provided subject matter. Specific examples of components and arrangements are described below to simplify the present disclosure. These are merely examples and are not intended to be limiting. In addition, the present disclosure repeats reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and, unless indicated otherwise, does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
Multiplication of two matrices can require a large number of multiplications and additions, which can be referred to as multiply accumulate (MAC) operations. The MAC operations can include a multiplication of two numbers, such as a weight number W having a first number of bits denoted by an integer i, W=Wi . . . W1W0, and an input number P having a second number of bits denoted by an integer j, P=Pj . . . P1P0, where W and P represent binary numbers. In some embodiments, W=Wi . . . W1W0 and P=Pj . . . P1P0 can represent an integer, a pseudo floating point number, a signed integer, an unsigned integer, or a number in another format with any suitable number of bits. In some embodiments, the weight number W=Wi . . . W1W0 or the input number P=Pj . . . P1P0 can have 2 bits, 4 bits, 8 bits, 16 bits, or another suitable number of bits. In some embodiments, a two-bit number, such as a two-bit weight number W=W1W0, is used as an example to illustrate the techniques presented herein. However, any description of a two-bit weight number W=W1W0 also applies to a weight number with a different number of bits.
To generate a product W*P=Wi . . . W1W0*Pj . . . P1P0, a unit of operation can include a bit-wise multiplication circuit that generates a bit-wise product Ws*Pt by multiplying Ws, a bit of the weight number W selected from Wi, . . . , W1, W0, and Pt, a bit of the input number P selected from Pj, . . . , P1, P0. The bit-wise multiplication circuit can be implemented in various ways, including as an analog circuit or a digital circuit. In addition, the bit-wise multiplication circuit can be included in an in-memory computing (IMC) circuit within a memory circuit, or in a circuit outside of the memory circuit.
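Because each operand is a single bit, the bit-wise product Ws*Pt is simply the logical AND of the two bits. The Python sketch below checks this over the full truth table and shows two ways a NOR or NAND gate could produce the same result. The gate wiring is an assumption, not something specified here: the NOR variant assumes the complemented bits are available at the gate inputs (a 6T SRAM cell stores both a value and its complement), and the NAND variant assumes its output is inverted. All function names are hypothetical.

```python
def bit_mul_and(w: int, p: int) -> int:
    """Reference single-bit product Ws*Pt, which equals logical AND."""
    return w & p


def nor(a: int, b: int) -> int:
    return 1 - (a | b)


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def bit_mul_nor(w_bar: int, p_bar: int) -> int:
    """Assumed wiring: NOR of the complemented bits equals AND (De Morgan)."""
    return nor(w_bar, p_bar)


def bit_mul_nand(w: int, p: int) -> int:
    """Assumed wiring: NAND followed by an inverter yields the product."""
    return 1 - nand(w, p)


if __name__ == "__main__":
    for w in (0, 1):
        for p in (0, 1):
            ref = bit_mul_and(w, p)
            assert ref == w * p                       # AND is 1-bit multiplication
            assert bit_mul_nor(1 - w, 1 - p) == ref   # NOR on complemented inputs
            assert bit_mul_nand(w, p) == ref          # inverted NAND
            print(f"Ws={w} Pt={p} -> Ws*Pt={ref}")
```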
To multiply W=Wi . . . W1W0 with P=Pj . . . P1P0, an IMC circuit including multiple IMC cells may be used to generate the bit-wise products Ws*Pt, where Ws is one of Wi, . . . , W1, W0 and Pt is one of Pj, . . . , P1, P0. In some embodiments, a total number N=i*j of IMC cells can be used, where each IMC cell can include a bit-wise multiplication circuit to generate a bit-wise product Ws*Pt. To multiply the input number P=Pj . . . P1P0 with the weight number W=Wi . . . W1W0, the bits Pj, . . . , P1, P0 of the input number can be supplied to a set of IMC cells in a bit-serial fashion, where the bits arrive in sequence over multiple clock cycles (e.g., one bit per unit of time or per clock cycle). Additionally or alternatively, the bits Pj, . . . , P1, P0 of the input number P=Pj . . . P1P0 can be supplied to a set of IMC cells in a parallel fashion, where the bits arrive in parallel during the same clock cycle.
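The difference between bit-serial and bit-parallel delivery can be illustrated with a simple behavioral model. This is a sketch under stated assumptions: one clock cycle per input bit in the serial case, one cycle for all bits in the parallel case, and no modeling of shifter or adder latency. The function names are hypothetical. Both paths compute the same product; only the number of cycles spent delivering the input bits differs.

```python
def to_bits(value: int, width: int) -> list[int]:
    """Return the bits of value as [P0, P1, ..., P(width-1)], least significant first."""
    return [(value >> t) & 1 for t in range(width)]


def multiply_bit_serial(w: int, p: int, p_bits: int) -> tuple[int, int]:
    """Bit-serial model: bit Pt arrives in cycle t. Returns (product, cycles used)."""
    acc = 0
    cycles = 0
    for t, p_t in enumerate(to_bits(p, p_bits)):
        acc += (w * p_t) << t   # bit-product for Pt, weighted by 2**t
        cycles += 1
    return acc, cycles


def multiply_bit_parallel(w: int, p: int, p_bits: int) -> tuple[int, int]:
    """Bit-parallel model: all bits arrive in one cycle. Returns (product, cycles used)."""
    bit_products = [w * p_t for p_t in to_bits(p, p_bits)]  # formed side by side
    return sum(bp << t for t, bp in enumerate(bit_products)), 1


if __name__ == "__main__":
    print(multiply_bit_serial(3, 2, 2))    # same product, 2 cycles
    print(multiply_bit_parallel(3, 2, 2))  # same product, 1 cycle
```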
The embodiments herein present designs for a memory circuit including an IMC circuit to perform MAC operations, including the operations defined above. In some embodiments, a memory circuit can include a set of memory cells configured to store the weight number W=Wi . . . W1W0 having the first number of bits and a control circuit configured to provide the input number P=Pj . . . P1P0 to the IMC circuit, where the input number P=Pj . . . P1P0 has the second number of bits. The memory circuit can further include the IMC circuit, which includes a matrix of IMC cells having a total number N=i*j of IMC cells configured to receive in parallel the second number of bits of the input number P=Pj . . . P1P0 and perform N=i*j bit-wise multiplications in parallel. An IMC cell of the matrix of IMC cells can include a bit-wise multiplication circuit configured to multiply a first bit Pt of the input number P=Pj . . . P1P0 and a second bit Ws of the weight number W=Wi . . . W1W0 to generate the bit-wise product Ws*Pt. In some embodiments, the bit-wise multiplication circuit can include a NOR gate or a NAND gate. When implemented within the IMC circuit, the NOR gate or the NAND gate can be implemented with a few transistors, such as fewer than 10 transistors. In some embodiments, the IMC cell can include a memory cell configured to store the second bit Ws of the weight number W, the IMC cell can be coupled to another memory cell configured to store the first bit Pt of the input number P=Pj . . . P1P0, and the memory cell can include a static random-access memory (SRAM) cell, a magnetoresistive random-access memory (MRAM) cell, any other suitable type of non-volatile random-access memory (NVRAM), or any other suitable type of memory.
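A behavioral sketch of the matrix of N=i*j IMC cells, in which each entry holds one bit-wise product Ws*Pt (row t for input bit Pt, column s for weight bit Ws). The names bitwise_product_matrix and product_from_matrix are hypothetical, and the model ignores circuit timing; it only shows that weighting each cell by 2 raised to the power (s+t) and summing recovers W*P. Bit counts are passed explicitly as w_bits (i) and p_bits (j).

```python
def bitwise_product_matrix(w: int, p: int, w_bits: int, p_bits: int) -> list[list[int]]:
    """Row t, column s holds Ws*Pt, i.e. the output of one (hypothetical) IMC cell."""
    return [[((w >> s) & 1) & ((p >> t) & 1) for s in range(w_bits)]
            for t in range(p_bits)]


def product_from_matrix(m: list[list[int]]) -> int:
    """Weight each cell by 2**(s+t) and sum; this recovers W*P."""
    return sum(bit << (s + t) for t, row in enumerate(m) for s, bit in enumerate(row))


if __name__ == "__main__":
    w_bits, p_bits = 4, 4
    m = bitwise_product_matrix(11, 6, w_bits, p_bits)
    assert sum(len(row) for row in m) == w_bits * p_bits  # N = i*j cells
    assert product_from_matrix(m) == 11 * 6
    for row in m:
        print(row)
```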
In some embodiments, there can be multiple input numbers, which can be denoted as P0, . . . , Pk, where each input number can have j bits. Therefore, the input numbers can be denoted as P0=P0j . . . P01P00, P1=P1j . . . P11P10, . . . , Pk=Pkj . . . Pk1Pk0. Embodiments herein can calculate a product of W=Wi . . . W1W0 with each of the input numbers P0, . . . , Pk to derive P0*W, . . . , Pk*W, and may further add the products to derive a sum of products S=P0*W+ . . . +Pk*W. The MAC operations described herein can include all operations used to derive the sum of products S=P0*W+ . . . +Pk*W.
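Expanding each input number into its bits shows where the bit-products described below come from. In this expression, m is a summation index introduced only for illustration, and the bit indexing follows the convention used here, Pm=Pmj . . . Pm1Pm0, with each bit Pmt equal to 0 or 1:

```latex
S = \sum_{m=0}^{k} P_m \cdot W
  = \sum_{m=0}^{k} \sum_{t=0}^{j} 2^{t} \, \bigl( P_{mt} \cdot W \bigr)
  = W \cdot \sum_{m=0}^{k} P_m
```

Each term P_{mt} * W is a bit-product of the weight number and one input bit, and the factor 2^t corresponds to shifting that bit-product by t positions before the additions.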
In some embodiments, system 100 can include a memory circuit 105 coupled to an out-of-memory computing block 120 that works together with IMC circuit 110 to perform MAC operations. Out-of-memory computing block 120 may be further coupled to a processing circuit 130 to provide the result of the MAC operations to other applications. In some embodiments, processing circuit 130 may include an arithmetic logic unit (ALU), such as an ALU used in a processor. In addition, memory circuit 105 can be coupled to a persistent storage device 101 where input numbers, such as P=Pj . . . P1P0, P0=P0j . . . P01P00, P1=P1j . . . P11P10, . . . , Pk=Pkj . . . Pk1Pk0, or the weight number W=Wi . . . W1W0, may be initially stored.
In some embodiments, memory circuit 105 can include IMC circuit 110 coupled to a driver circuit 109 and a control circuit 104. IMC circuit 110 can include a set 114 of memory cells, having memory cells 114a, 114b, and possibly more, configured to store the weight number W=Wi . . . W1W0 having a first number of bits, and a set 113 of IMC cells including an IMC cell 113a, an IMC cell 113b, and possibly more. Each IMC cell can include a bit-wise multiplication circuit. For example, IMC cell 113a includes a bit-wise multiplication circuit 112a to generate a bit-wise product Ws*Pt by multiplying a bit Ws of weight number W=Wi . . . W1W0 and a bit Pt of input number P=Pj . . . P1P0. In addition, IMC cell 113b includes a bit-wise multiplication circuit 112b. In some embodiments, the input number P=Pj . . . P1P0 can be supplied as input number 107, and the weight number W=Wi . . . W1W0 can be provided as weight number 106, which can be saved into set 114 of memory cells. A control signal 103 can be provided to control circuit 104, which controls the activation of driver circuit 109 so that driver circuit 109 provides input number 107 to IMC circuit 110.
In some embodiments, IMC circuit 110 can include a matrix of IMC cells 113 having a total number N=i*j of IMC cells 113 configured to receive in parallel the second number j of bits of the input number P=Pj . . . P1P0 and perform bit-wise multiplications in parallel to generate bit-wise products Ws*Pt. An IMC cell, such as IMC cell 113a, can include a bit-wise multiplication circuit, such as bit-wise multiplication circuit 112a, configured to multiply a first bit Pt of the input number P=Pj . . . P1P0 and a second bit Ws of the weight number W=Wi . . . W1W0. In some embodiments, bit-wise multiplication circuit 112a or 112b can include a NOR gate or a NAND gate. When implemented within IMC circuit 110, the NOR gate or the NAND gate can be implemented with a few transistors, such as 4 transistors. According to some embodiments, this small transistor count makes the bit-wise multiplication circuit compact and economical to include in IMC circuit 110. In some embodiments, IMC cell 113a can include a memory cell configured to store the second bit Ws of the weight number. IMC cell 113a can be coupled to another memory cell configured to store the first bit Pt of the input number P=Pj . . . P1P0. The memory cells can include a static random-access memory (SRAM) cell, such as a 6T SRAM cell.
In some embodiments, the matrix of IMC cells can be configured to generate the second number of bit-products, including a first bit-product 115 and a second bit-product 117. First bit-product 115 can be generated by a first set of IMC cells as the product of the first bit Pt1 of the input number P=Pj . . . P1P0 and the weight number W=Wi . . . W1W0. Hence, first bit-product 115 has a value of (Pt1*Wi) . . . (Pt1*W1)(Pt1*W0). Second bit-product 117 can be generated by a second set of IMC cells as the product of a second bit Pt2 of the input number P=Pj . . . P1P0 and the weight number W=Wi . . . W1W0, and can have a value of (Pt2*Wi) . . . (Pt2*W1)(Pt2*W0). Overall, a total of j such bit-products can be generated, one for each bit of the input number P=Pj . . . P1P0. Each bit-product can be generated by a bit-product circuit with more details shown in
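Because Pt is a single bit, each bit-product is either a copy of W (when Pt=1) or all zeros (when Pt=0), so a set of IMC cells acts like a row of AND gates that gates the weight number. The sketch below makes this explicit; the function names are hypothetical, and bits are listed least significant first.

```python
def bit_product(w: int, p_t: int, w_bits: int) -> list[int]:
    """Bits [Pt*W0, Pt*W1, ..., Pt*W(w_bits-1)] produced by one set of IMC cells."""
    return [((w >> s) & 1) & p_t for s in range(w_bits)]


def bits_to_int(bits: list[int]) -> int:
    """Interpret a least-significant-first bit list as an unsigned integer."""
    return sum(b << s for s, b in enumerate(bits))


if __name__ == "__main__":
    w, w_bits = 0b1011, 4
    assert bits_to_int(bit_product(w, 1, w_bits)) == w  # Pt = 1 passes W through
    assert bits_to_int(bit_product(w, 0, w_bits)) == 0  # Pt = 0 gives zero
    print(bit_product(w, 1, w_bits))
```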
In some embodiments, the first set of IMC cells can be coupled to a first set 121a of shifters configured to shift first bit-product 115 by a first position to generate a first shifted bit-product 122, and the second set of IMC cells can be coupled to a second set 121b of shifters configured to shift second bit-product 117 by a second position to generate a second shifted bit-product 124. The first position and the second position may have different integer values. First set 121a of shifters or second set 121b of shifters can include a total of i shifters, corresponding to the number of bits of the bit-product, such as first bit-product 115 or second bit-product 117. In some embodiments, first set 121a of shifters and second set 121b of shifters can be included in memory circuit 105. In some embodiments, first set 121a of shifters and second set 121b of shifters can be coupled to a multiple-bit adder 123 configured to add first shifted bit-product 122 and second shifted bit-product 124 to generate a partial product 125 of the product of the weight number W=Wi . . . W1W0 and the input number P=Pj . . . P1P0. Multiple-bit adder 123 can be implemented in various ways, including as a multi-level adder tree of different adders.
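The shift-and-add stage can be sketched as follows, assuming (consistent with the two-bit products 229, 248, and 259 in this description) that the bit-product for input bit Pt is shifted left by t positions, and modeling multiple-bit adder 123 as a pairwise, multi-level adder tree. Summing all j shifted bit-products gives the full product; summing only some of them gives a partial product. The function names are hypothetical.

```python
def shift_bit_products(bit_products: list[int]) -> list[int]:
    """Shifter set t shifts the bit-product for input bit Pt left by t positions."""
    return [bp << t for t, bp in enumerate(bit_products)]


def adder_tree(values: list[int]) -> int:
    """Pairwise multi-level reduction; each loop iteration is one adder level."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        nxt = [level[n] + level[n + 1] for n in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd leftover value passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


if __name__ == "__main__":
    w, p, p_bits = 13, 10, 4
    bps = [w * ((p >> t) & 1) for t in range(p_bits)]  # one bit-product per input bit
    assert adder_tree(shift_bit_products(bps)) == w * p              # full product
    assert adder_tree(shift_bit_products(bps[:2])) == w * (p & 0b11)  # partial product
```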
In some embodiments, IMC circuit 210 can include a first set 211 of IMC cells having a first IMC cell 211a and a second IMC cell 211b, and can also include a second set 213 of IMC cells having a first IMC cell 213a and a second IMC cell 213b. For a weight number W=Wi . . . W1W0, first set 211 or second set 213 of IMC cells can have i cells, corresponding to the number of bits in the weight number W. First set 211 of IMC cells can generate a first bit-product 221 of the weight number W=W1W0 and a first bit P0 of the input number P=P1P0. The bits of the weight number W=W1W0 can be saved in memory cells 205a and 205b. First bit-product 221 can have a first bit 221a with a value of P0*W0 and a second bit 221b with a value of P0*W1. In general, first bit-product 221 can have i bits, corresponding to the number of bits in the weight number W, and each bit of first bit-product 221 can have a value of P0*Ws, where Ws is a bit of the weight number W=Wi . . . W1W0.
Similarly, second set 213 of IMC cells can generate a second bit-product 223 of the weight number W=W1W0 and a second bit P1 of the input number P=P1P0. Second bit-product 223 can have a first bit 223a with a value of P1*W0 and a second bit 223b with a value of P1*W1. The bits of the weight number W=W1W0 can be saved in memory cells 207a and 207b.
In some embodiments, memory cells 205a and 205b, used to store the weight number W=W1W0, together with first set 211 of IMC cells, used to generate first bit-product 221, can form a bit-product circuit 210a, which is configured to receive a bit P0 of the input number P=P1P0 and to multiply P0 with the weight number W=W1W0 to generate first bit-product 221 having a value of (P0*W1)(P0*W0). Similarly, memory cells 207a and 207b, used to store the weight number W=W1W0, together with second set 213 of IMC cells, used to generate second bit-product 223, can form a bit-product circuit 210b, which is configured to receive a bit P1 of the input number P=P1P0 and to multiply P1 with the weight number W=W1W0 to generate second bit-product 223 having a value of (P1*W1)(P1*W0).
In some embodiments, an IMC cell, such as IMC cell 211a, IMC cell 211b, IMC cell 213a, or IMC cell 213b, can include a bit-wise multiplication circuit configured to multiply a bit of the weight number and a bit of the input number. For example, IMC cell 213b can include a bit-wise multiplication circuit 202, which is a NOR gate configured to multiply P1 and W1. IMC cell 213b can receive bit P1 of the input number P=P1P0 at an input line 205. IMC cell 213b can include a memory cell 203 configured to store bit W1 of the weight number W=W1W0, and IMC cell 213b is coupled through input line 205 to a memory cell configured to store P1. Memory cell 203 or the memory cell used to store P1 can include an SRAM cell, such as a 6-transistor (6T) SRAM cell. In some embodiments, bit-wise multiplication circuit 202 can include a NOR gate or a NAND gate, which can be implemented with 4 transistors. Accordingly, bit-wise multiplication circuit 202 is a digital bit-wise multiplication circuit rather than an analog multiplication circuit. In addition, bit-wise multiplication circuit 202 is directly coupled to the output line of memory cell 203. Compared with placing the bit-wise multiplication circuit away from memory cell 203, this arrangement reduces the time needed for the output of memory cell 203 to reach the bit-wise multiplication circuit. Accordingly, the direct connection between the output line of memory cell 203 and bit-wise multiplication circuit 202 can improve the performance of IMC circuit 210 and of the MAC operations.
In some embodiments, out-of-memory computing block 220 can include a first set 215 of shifters and a second set 217 of shifters. First set 215 of shifters can be coupled to first set 211 of IMC cells and configured to shift first bit-product 221 by a first position to generate a first shifted bit-product 225. Second set 217 of shifters can be coupled to second set 213 of IMC cells and configured to shift second bit-product 223 by a second position to generate a second shifted bit-product 227. In some embodiments, the first position and the second position have different values; for example, the first position can have a value of 0, in which case first shifted bit-product 225 is not shifted and has the same value as first bit-product 221.
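In some embodiments, a multiple-bit adder 219 can be coupled to first set 215 of shifters and second set 217 of shifters and configured to add first shifted bit-product 225 and second shifted bit-product 227 to generate a product 229 of the weight number W=W1W0 and the input number P=P1P0, which has a value of [2(P1*W1)(P1*W0)]+(P0*W1)(P0*W0). Here, (P1*W1)(P1*W0) and (P0*W1)(P0*W0) denote the two-bit binary numbers formed by the respective bit-wise products, and the factor 2 corresponds to shifting second bit-product 223 by one position.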
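The two-bit datapath can be checked exhaustively with a short sketch: it forms the two bit-products, shifts the second one by one position, adds them, and compares the result with W*P for all 16 combinations of W1, W0, P1, and P0. The function name is hypothetical; the comments map each step to the reference numerals above.

```python
from itertools import product


def two_bit_product(w1: int, w0: int, p1: int, p0: int) -> int:
    """Model of the two-bit datapath: two bit-products, one shift, one add."""
    bp0 = ((p0 & w1) << 1) | (p0 & w0)  # first bit-product 221: (P0*W1)(P0*W0)
    bp1 = ((p1 & w1) << 1) | (p1 & w0)  # second bit-product 223: (P1*W1)(P1*W0)
    return (bp1 << 1) + bp0             # shift by one position (x2), then add


if __name__ == "__main__":
    for w1, w0, p1, p0 in product((0, 1), repeat=4):
        expected = (2 * w1 + w0) * (2 * p1 + p0)
        assert two_bit_product(w1, w0, p1, p0) == expected
    print("all 16 two-bit combinations match W*P")
```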
In some embodiments, some or all of first set 215 of shifters and second set 217 of shifters, as well as multiple-bit adder 219, can be included in a memory circuit, such as memory circuit 105. By including first set 215 of shifters, second set 217 of shifters, or multiple-bit adder 219 in the memory circuit, the speed of the MAC operations can be improved. Hence, there can be different embodiments that can implement the designs shown in
In some embodiments, IMC circuit 230 is coupled to out-of-memory computing block 240 to perform MAC operations that generate a product of a weight number W=W1W0 and an input number P=P1P0. IMC circuit 230 can include a first set 231 of multiple IMC cells and a second set 233 of multiple IMC cells. First set 231 of multiple IMC cells is configured to receive a bit P0 of the input number P=P1P0 and to multiply P0 with the weight number W=W1W0 to generate first bit-product 241 having a value of (P0*W1)(P0*W0). Bits W0 and W1 of the weight number W=W1W0 can be saved in memory cells 235a and 235b, respectively. Similarly, second set 233 of multiple IMC cells is configured to receive a bit P1 of the input number P=P1P0 and to multiply P1 with the weight number W=W1W0 to generate second bit-product 243 having a value of (P1*W1)(P1*W0). Instead of using separate memory cells to store the weight number W=W1W0, second set 233 of multiple IMC cells shares the weight number W=W1W0 stored in memory cells 235a and 235b. For example, W0 is stored in memory cell 235a, which is coupled to an IMC cell 231a of first set 231 of IMC cells to generate first bit-product 241, and is also coupled to another IMC cell 233a of second set 233 of IMC cells to provide the first bit W0 of the weight number W=W1W0 to IMC cell 233a, so that the content of memory cell 235a is shared by IMC cell 233a. By sharing the content of memory cell 235a, IMC circuit 230 can save chip area and reduce cost.
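A rough count shows why this sharing matters as operands widen. The hypothetical helper below counts only the memory cells that hold the weight number: with separate storage per set (as with memory cells 205a, 205b, 207a, and 207b), j sets of IMC cells need i*j weight cells, whereas with shared storage (as with memory cells 235a and 235b) they need only i. It ignores wiring, fan-out loading, and the IMC cells themselves, so it is an illustration rather than an area estimate.

```python
def weight_storage_cells(w_bits: int, p_bits: int, shared: bool) -> int:
    """Memory cells holding W for p_bits (j) sets of IMC cells, each w_bits (i) wide."""
    if shared:
        return w_bits          # one copy of W, read by every set of IMC cells
    return w_bits * p_bits     # one copy of W per set of IMC cells


if __name__ == "__main__":
    for i, j in [(2, 2), (4, 4), (8, 8)]:
        separate = weight_storage_cells(i, j, shared=False)
        shared = weight_storage_cells(i, j, shared=True)
        print(f"i={i}, j={j}: separate={separate}, shared={shared}")
```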
In some embodiments, out-of-memory computing block 240 can include a first set 235 of shifters and a second set 237 of shifters. First set 235 of shifters can be coupled to first set 231 of IMC cells and configured to shift first bit-product 241 by a first position to generate a first shifted bit-product 245. Second set 237 of shifters can be coupled to second set 233 of IMC cells and configured to shift second bit-product 243 by a second position to generate a second shifted bit-product 247. In some embodiments, a multiple-bit adder 239 can be coupled to first set 235 of shifters and second set 237 of shifters and configured to add first shifted bit-product 245 and second shifted bit-product 247 to generate a product 249 of the weight number W=W1W0 and the input number P=P1P0, which has a value of [2(P1*W1)(P1*W0)]+(P0*W1)(P0*W0).
For simplicity, a symbol 231c is used to represent first set 231 of IMC cells, and a symbol 233c is used to represent second set 233 of IMC cells. First set 231 of IMC cells and second set 233 of IMC cells can be coupled to memory cells 235a and 235b, which are represented by memory cells symbol 235c. The memory cells represented by symbol 235c, together with first set 231 of IMC cells represented by symbol 231c, can form a bit-product circuit 230a to generate first bit-product 241. Accordingly, bit-product circuit 230a can be configured to receive a bit P0 of the input number P=P1P0 and to multiply P0 with the weight number W=W1W0 to generate first bit-product 241 having a value of (P0*W1)(P0*W0). Similarly, the memory cells represented by symbol 235c, together with second set 233 of IMC cells represented by symbol 233c, can form a bit-product circuit 230b, which is configured to receive a bit P1 of the input number P=P1P0 and to multiply P1 with the weight number W=W1W0 to generate second bit-product 243 having a value of (P1*W1)(P1*W0).
In some embodiments, at a first time instance T0, input number P0=P01P00 can be received by first set 231 of IMC cells and second set 233 of IMC cells.
First set 231 of multiple IMC cells can receive a bit P00 of the input number P0=P01P00 and multiply P00 with the weight number W=W1W0 to generate first bit-product 242 having a value of (P00*W1)(P00*W0). Similarly, second set 233 of multiple IMC cells can receive a bit P01 of the input number P0=P01P00 and multiply P01 with the weight number W=W1W0 to generate second bit-product 244 having a value of (P01*W1)(P01*W0).
In some embodiments, out-of-memory computing block 240 can include a first set 235 of shifters and a second set 237 of shifters. First set 235 of shifters can be coupled to first set 231 of IMC cells and configured to shift first bit-product 242 by a first position to generate a first shifted bit-product. Second set 237 of shifters can be coupled to second set 233 of IMC cells and configured to shift second bit-product 244 by a second position to generate a second shifted bit-product. In some embodiments, a multiple-bit adder 239 can be coupled to first set 235 of shifters and second set 237 of shifters and can be configured to add the first shifted bit-product and the second shifted bit-product to generate a product 248 of the weight number W=W1W0 and the input number P0=P01P00, which has a value of [2(P01*W1)(P01*W0)]+(P00*W1)(P00*W0).
In some embodiments, at a second time instance T1 after the first time instance T0, input number P1=P11P10 can be received by first set 231 of IMC cells and second set 233 of IMC cells. In some embodiments, first set 231 of IMC cells and second set 233 of IMC cells can be configured to receive input number P0=P01P00 and input number P1=P11P10, respectively, at a double pumped speed. In some embodiments, data is provided at a double pumped speed when the data is provided on both the rising and falling edges of a clock signal.
In some embodiments, at the second time instance T1, first set 231 of multiple IMC cells can receive a bit P10 of the input number P1=P11P10 and multiply P10 with the weight number W=W1W0 to generate first bit-product 251 having a value of (P10*W1)(P10*W0). Similarly, second set 233 of multiple IMC cells can receive a bit P11 of the input number P1=P11P10 and multiply P11 with the weight number W=W1W0 to generate second bit-product 253 having a value of (P11*W1)(P11*W0). Although both are generated by the same IMC circuit 230, first bit-product 251 generated at the second time instance T1 can differ from first bit-product 242 generated at the first time instance T0, because the input bits differ at the two time instances.
In some embodiments, first set 235 of shifters can shift first bit-product 251 by a first position to generate a first shifted bit-product, and second set 237 of shifters can shift second bit-product 253 by a second position to generate a second shifted bit-product. In some embodiments, multiple-bit adder 239 can be coupled to first set 235 of shifters and second set 237 of shifters and configured to add the first shifted bit-product and the second shifted bit-product to generate a product 259 of the weight number W=W1W0 and the input number P1=P11P10, which has a value of [2(P11*W1)(P11*W0)]+(P10*W1)(P10*W0).
In addition, an adder 257 can be used to add product 259 and product 248 together to generate a sum 258. In some embodiments, adder 257 can be the same as adder 239, acting as an adder and accumulator that saves the previous result of the addition operation, product 248, and adds a new value, product 259, to it. In some embodiments, adder 257 and adder 239 can be implemented in various ways, such as a single multi-bit adder or a collection of adders forming an adder tree. Sum 258 has a value of P0*W+P1*W=(P01P00)*(W1W0)+(P11P10)*(W1W0).
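The accumulation described above can be sketched as follows, assuming unsigned 2-bit numbers and modeling adder 257 as an accumulator that keeps a running total across time instances. The class and function names are hypothetical; each input number is processed in one time instance (P0 at T0, P1 at T1), and the accumulator adds each new product to the saved previous value.

```python
def multiply_2x2(w, p):
    """Shift-and-add product of two 2-bit unsigned numbers."""
    total = 0
    for bit_pos in range(2):
        input_bit = (p >> bit_pos) & 1
        # Bit-product: AND each weight bit with this input bit.
        bit_product = sum((((w >> i) & 1) & input_bit) << i for i in range(2))
        total += bit_product << bit_pos  # shifter for this bit position
    return total


class Accumulator:
    """Hypothetical model of adder 257 acting as an adder and accumulator."""

    def __init__(self):
        self.value = 0  # saved result of the previous addition

    def add(self, product):
        self.value += product
        return self.value


def mac(w, inputs):
    """Accumulate W*P0 + W*P1 + ... over successive time instances."""
    acc = Accumulator()
    for p in inputs:
        acc.add(multiply_2x2(w, p))
    return acc.value
```

With W=2, P0=3, and P1=1, mac(2, [3, 1]) yields the sum 2*3+2*1=8.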
In some embodiments, as shown in
In some embodiments, IMC circuit 310 can include j bit-product circuits corresponding to the number of bits of input numbers, such as P0, . . . , Pk. Some of the bit-product circuits are shown in
In some embodiments, for each bit-product circuit, out-of-memory computing block 320 can include a corresponding set of shifters coupled to the bit-product circuit. As shown in
In some embodiments, at a first time instance T0, the bits of input number P0, including P0j, . . . , P01, P00, can be provided in parallel in the same clock cycle to the corresponding bit-product circuits to generate multiple bit-products, such as bit-products 312, 314, 316, and 318. Afterwards, out-of-memory computing block 320 can generate the product W*P0=Wi . . . W1W0*P0j . . . P01P00. Similarly, at a second time instance T1 after the first time instance T0, the bits of input number P1, including P1j, . . . , P11, P10, can be provided in parallel in the same clock cycle to the corresponding bit-product circuits to generate multiple bit-products, which can further be used to generate the product W*P1=Wi . . . W1W0*P1j . . . P11P10. The remaining input numbers can be provided in sequence. At a time instance Tk, the bits of input number Pk, including Pkj, . . . , Pk1, Pk0, can be provided in parallel in the same clock cycle to the corresponding bit-product circuits to generate multiple bit-products, which can further be used to generate the product W*Pk=Wi . . . W1W0*Pkj . . . Pk1Pk0. In some embodiments, the input numbers P0, . . . , Pk can be provided at a double pumped speed.
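The parallel-bit scheme above generalizes the 2-bit case to arbitrary widths. The following is a minimal sketch, assuming unsigned numbers, one bit-product circuit per input bit position, a shifter that shifts the bit-product for bit position n by n positions, and an adder tree modeled by a plain sum. The widths and function names are hypothetical parameters chosen for illustration.

```python
from typing import List


def to_bits(value: int, width: int) -> List[int]:
    """LSB-first bit list of an unsigned integer."""
    return [(value >> i) & 1 for i in range(width)]


def parallel_bit_product_multiply(w: int, p: int, w_width: int, p_width: int) -> int:
    """All bits of input number p are presented in the same clock cycle."""
    weight_bits = to_bits(w, w_width)
    input_bits = to_bits(p, p_width)
    # One bit-product circuit per input bit: AND every weight bit with that input bit.
    bit_products = [
        sum((wb & ib) << i for i, wb in enumerate(weight_bits))
        for ib in input_bits
    ]
    # Shift the bit-product for bit position n by n positions, then sum (adder tree).
    return sum(bp << n for n, bp in enumerate(bit_products))


def products_for_sequence(w: int, inputs: List[int], w_width: int, p_width: int) -> List[int]:
    """One input number per time instance T0, T1, ..., Tk, sharing the same weight."""
    return [parallel_bit_product_multiply(w, p, w_width, p_width) for p in inputs]
```

For instance, a 4-bit weight of 13 applied to the 8-bit input numbers 0, 5, and 255 produces the products 0, 65, and 3315.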
In some embodiments, as shown in
In some embodiments, IMC circuit 330 can include j bit-product circuits corresponding to the number of bits of input numbers, such as P0, . . . , Pk. Some of the bit-product circuits are shown in
In some embodiments, out-of-memory computing block 340 can operate in the same way as described for out-of-memory computing block 320 shown in
In some embodiments, at time instances T0 . . . Tk, input numbers P0, P1, . . . , Pk can be provided to the corresponding bit-product circuits to generate multiple bit-products, such as bit-products 332, 334, 336, and 338. Afterwards, as described for
In some embodiments, IMC circuit 410 can include k bit-product circuits corresponding to the number of input numbers, such as P0, . . . , Pk. Some of the bit-product circuits are shown in
In some embodiments, out-of-memory computing block 420 can operate in the same way as described for out-of-memory computing block 320 shown in
In some embodiments, at time instances T0 . . . Tj, bits of input numbers P0, P1, . . . , Pk can be provided to the corresponding bit-product circuits to generate multiple bit-products, such as bit-products 412, 414, 416, and 418. Afterwards, similar to the operations described for
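Because the description of this arrangement is only partly preserved here, the following is a hedged sketch of one plausible reading, not a statement of how IMC circuit 410 operates. It assumes that at time instance Tn, bit n of every input number P0, . . . , Pk is presented in parallel to one bit-product circuit per input number, and that each resulting bit-product is shifted by n positions and accumulated separately for each input number. Numbers are assumed unsigned, and the function name is hypothetical.

```python
from typing import List


def bit_serial_multiply(w: int, inputs: List[int], w_width: int, p_width: int) -> List[int]:
    """Assumed reading: one input bit position per time instance, all inputs in parallel."""
    accumulators = [0] * len(inputs)  # one running product per input number
    for n in range(p_width):          # time instances T0 ... Tj
        for m, p in enumerate(inputs):  # bit-product circuits operating in parallel
            input_bit = (p >> n) & 1
            # Bit-product: AND each weight bit with bit n of input number Pm.
            bit_product = sum((((w >> i) & 1) & input_bit) << i for i in range(w_width))
            accumulators[m] += bit_product << n  # shift by time index, then accumulate
    return accumulators
```

Under these assumptions, after j+1 time instances each accumulator holds W*Pm; for example, a 3-bit weight of 6 with 4-bit inputs 1, 2, 7, and 15 yields 6, 12, 42, and 90.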
At operation 510, a bit-wise multiplication circuit can generate a bit-wise product by multiplying a bit of a weight number having a first number of bits and a bit of an input number having a second number of bits. For example, as shown in
At operation 520, a first set of IMC cells can generate a first bit-product of the weight number and a first bit of the input number, where the first bit-product includes the first number of bits and a bit of the first bit-product includes a first bit-wise product generated by a first bit-wise multiplication circuit. For example, as shown in
At operation 530, a second set of IMC cells including the first number of IMC cells can generate a second bit-product of the weight number and a second bit of the input number, where the second bit-product includes the first number of bits. A bit of the second bit-product includes a second bit-wise product generated by a second bit-wise multiplication circuit. For example, as shown in
Also, system or device 600 can be implemented in a wearable device 660, such as a smartwatch or a health-monitoring device. In some embodiments, the smartwatch can have different functions, such as access to email, cellular service, and calendar functions. Wearable device 660 can also perform health-monitoring functions, such as monitoring a user's vital signs and performing epidemiological functions (e.g., contact tracing and providing communication to an emergency medical service). Wearable device 660 can be worn on a user's neck, can be implantable in a user's body, or can be glasses or a helmet designed to provide computer-generated reality experiences (e.g., augmented and/or virtual reality), any other suitable wearable device, or combinations thereof.
Further, system or device 600 can be implemented in a server computer system, such as a dedicated server or shared hardware that implements a cloud-based service 670. System or device 600 can be implemented in other electronic devices, such as a home electronic device 680, including a refrigerator, a thermostat, a security camera, and other suitable home electronic devices. The interconnection of such devices can be referred to as the “Internet of Things” (IoT). System or device 600 can also be implemented in various modes of transportation 690, such as part of a vehicle's control system, guidance system, and/or entertainment system.
The systems and devices illustrated in
It is noted that references in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” “exemplary,” etc., indicate that the embodiment described can include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases do not necessarily refer to the same embodiment. Further, when a particular feature, structure or characteristic is described in connection with an embodiment, it would be within the knowledge of one skilled in the art to effect such feature, structure or characteristic in connection with other embodiments whether or not explicitly described.
It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by those skilled in relevant art(s) in light of the teachings herein.
In some embodiments, the terms “about” and “substantially” can indicate a value of a given quantity that varies within 5% of the value (e.g., ±1%, ±2%, ±3%, ±4%, ±5% of the value). These values are merely examples and are not intended to be limiting. The terms “about” and “substantially” can refer to a percentage of the values as interpreted by those skilled in relevant art(s) in light of the teachings herein. Furthermore, the terms “comprising,” “including,” “having,” and the like, as used with respect to embodiments of the present disclosure, are synonymous.
As used hereinafter, including the claims, the terms “unit,” “module,” or “routine” may refer to, be part of, or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and/or memory (shared, dedicated, or group) that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
Where the disclosure recites “a” or “a first” element or the equivalent thereof, such disclosure includes one or more such elements, neither requiring nor excluding two or more such elements. Further, ordinal indicators (e.g., first, second or third) for identified elements are used to distinguish between the elements, and do not indicate or imply a required or limited number of such elements, nor do they indicate a particular position or order of such elements unless otherwise specifically stated.
The terms “coupled with” and “coupled to” and the like may be used herein. “Coupled” may mean one or more of the following. “Coupled” may mean that two or more elements are in direct physical or electrical contact. However, “coupled” may also mean that two or more elements indirectly contact each other but still cooperate or interact with each other, and may mean that one or more other elements are coupled or connected between the elements that are said to be coupled with each other. By way of example and not limitation, “coupled” may mean two or more elements or devices are coupled by electrical connections on a printed circuit board, such as a motherboard. By way of example and not limitation, “coupled” may mean two or more elements/devices cooperate and/or interact through one or more network linkages, such as wired and/or wireless networks. By way of example and not limitation, a computing apparatus may include two or more computing devices “coupled” on a motherboard or by one or more network linkages.
It is to be appreciated that the Detailed Description section, and not the Abstract of the Disclosure section, is intended to be used to interpret the claims. The Abstract of the Disclosure section may set forth one or more but not all possible embodiments of the present disclosure as contemplated by the inventor(s), and thus is not intended to limit the subjoined claims in any way.
The foregoing disclosure outlines features of several embodiments so that those skilled in the art may better understand the aspects of the present disclosure. Those skilled in the art will appreciate that they may readily use the present disclosure as a basis for designing or modifying other processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art will also realize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.