This application claims the benefit of Taiwan Application No. 111142460, filed on Nov. 7, 2022, at the Taiwan Intellectual Property Office (TIPO), the disclosure of which is incorporated herein in its entirety by reference.
The present disclosure relates to memory arrays and the operating methods thereof, and particularly to memory arrays and the operating methods thereof for use in artificial intelligence.
Deep learning is an important part of Artificial Intelligence (AI) chips, and smart consumer electronic products occupy a pivotal position in both the US and global electronics markets. According to a report by the market research firm IDC, the AI industry is estimated to grow at a compound annual growth rate of 17.5%, with total revenue exceeding US$500 billion by 2024. At CES 2022, computers, electric vehicles, health care and metaverse applications were the main topics of the exhibition, all of which are extended applications centered on AI smart chips, which shows that AI technology plays an important role in the development of the industry.
At the Consumer Electronics Show (CES) 2022 in the United States, it could be observed that, as high-speed, low-latency 5G networks have gradually matured and Internet-connected terminal electronic devices have risen, traditional devices are transforming into connected devices. The foundation of advanced semiconductor technology will further accelerate the rise of fields such as the smart home, smart wearables, smart manufacturing, smart cities and self-driving cars. Most of these are extended applications featuring AI smart chips, which shows the development and influence of AI technology on the industry. Therefore, the development of low-energy AI chip systems is an inevitable trend.
Deep learning adopts neural network computations, which use a large number of multiplication and addition operations. When these operations are executed on a traditional Von Neumann architecture, in which computation and memory storage are separated, they incur the power consumption generated by a large amount of data transfer. In contrast, Compute-in-Memory (CIM) integrates computation and memory storage, which saves much of the data access and has high potential to achieve low energy consumption and high energy efficiency for neural network execution.
The accuracy of a neural network is very sensitive to the data bit width, but floating-point calculation imposes a heavy burden on the hardware. Therefore, in order to achieve low energy consumption under a high computational load, running multi-bit multiplication on the CIM architecture has high potential.
In the prior art, when CIM performs multi-bit multiplication and addition (for example, multiplying input IN[1:0] with weight W, i.e., IN[1:0]*W), IN[1:0] is converted to an analog voltage applied on the read word line (RWL) to control the current flowing through each SRAM cell to be 1 time, ⅔ times or ⅓ times the original current, so that the voltage drop caused by each SRAM cell on the read bit line (RBL) becomes 1 time, ⅔ times or ⅓ times the original voltage drop, thereby realizing the multi-bit multiplication and addition operations. Another method in the prior art is to maintain the RWL at the same operating voltage. By changing the RWL pulse width or increasing the number of RWL pulses, the voltage drop caused by each SRAM cell on the RBL becomes 3 times, 2 times or 1 time the original voltage drop, thereby realizing the multi-bit multiplication and addition operations.
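The pulse-count variant described above lends itself to a simple behavioral sketch (the function name and unit value are illustrative, not taken from any cited document):

```python
# Behavioral model of the prior-art pulse-count scheme: the RWL is kept at a
# fixed voltage and one pulse is issued per input level, so the accumulated
# RBL voltage drop is proportional to the input code.
def rbl_drop_by_pulses(in_code, unit_drop=1):
    # each RWL pulse contributes one unit of voltage drop on the RBL
    return sum(unit_drop for _ in range(in_code))

# IN codes 1, 2 and 3 produce 1x, 2x and 3x the original single-pulse drop.
assert rbl_drop_by_pulses(2) == 2 * rbl_drop_by_pulses(1)
assert rbl_drop_by_pulses(3) == 3 * rbl_drop_by_pulses(1)
```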
In Document 1, U.S. Pat. No. 11,061,646, two techniques are used. One is to change the pre-charge voltages of the RBL, thereby generating an SRAM array of multiple voltages so that the current generated by the SRAM cells of each sub-array represents the weight; this eliminates the need to adjust the RWL voltage level and simplifies the circuit design. The other is to change the pulse width of the RWL, so that the voltage drop generated by the SRAM cell after asserting the RWL is controlled by the RWL pulse width to generate the weights.
In Document 2, M. Ali, A. Jaiswal, S. Kodge, A. Agrawal, I. Chakraborty and K. Roy, "IMAC: In-Memory Multi-Bit Multiplication and Accumulation in 6T SRAM Array," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 8, pp. 2521-2531, Aug. 2020, the method used is to change the pulse width of the RWL to control the voltage drop on the RBL, thereby realizing the weighting of the data.
In Document 3, M. E. Sinangil et al., "A 7-nm Compute-in-Memory SRAM Macro Supporting Multi-Bit Input, Weight and Output and Achieving 351 TOPS/W and 372.4 GOPS," IEEE Journal of Solid-State Circuits, vol. 56, no. 1, pp. 188-198, Jan. 2021, the method used is to increase the number of RWL pulses to control the voltage drop on the RBL, thereby realizing the weighting of the data.
Document 4, X. Si et al., "A Local Computing Cell and 6T SRAM-Based Computing-in-Memory Macro With 8-b MAC Operation for Edge AI Chips," IEEE Journal of Solid-State Circuits, vol. 56, no. 9, pp. 2817-2831, Sept. 2021; Document 5, X. Si et al., "15.5 A 28 nm 64 Kb 6T SRAM Computing-in-Memory Macro with 8b MAC Operation for AI Edge Chips," 2020 IEEE International Solid-State Circuits Conference (ISSCC), 2020, pp. 246-248; and Document 6, X. Si et al., "24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning," 2019 IEEE International Solid-State Circuits Conference (ISSCC), 2019, pp. 396-398, all adjust the voltage level of the RWL pulse to change the current generated by the SRAM cell, thereby controlling the voltage drop on the RBL to achieve data weighting. Document 6 also changes the transistor width of the SRAM cell to increase the current generated by the SRAM cell, thereby controlling the voltage drop on the RBL and realizing data weighting.
The above methods mainly process the analog signal of the RWL pulse. Since the RWL signal is a relatively high-frequency signal, the analog circuit for dynamically processing it suffers from high hardware complexity and high power consumption. Such a high-frequency RWL signal is difficult to process in an analog circuit, and the resulting energy consumption and area cost are high. Moreover, the operation can only be completed after the analog signal is stable, so the operating frequency of the CIM macro is limited and the operation delay is increased.
In view of the deficiencies of the prior art, the present disclosure takes advantage of the area of the large-scale CIM and expands the weighting of the input feature data in the form of the quantity of bit cells, or uses different operating voltages for the bit cells to generate a weighted current, so that multi-bit multiplication and addition can be achieved by asserting the RWL only once. Compared with the conventional technology that processes the RWL signal, the computation time can be greatly reduced.
In accordance with one aspect of the present invention, a memory array used for computing-in-memory (CIM) is disclosed. The memory array used for CIM includes a bit cell array, at least one word line, at least one bit line and a reading circuit. The bit cell array includes a plurality of bit cells, wherein each of the plurality of bit cells has a storage bit and operates at an operating voltage, and the plurality of storage bits are associated with a weight bit of a convolutional neural network (CNN). The at least one word line is electrically connected to the bit cell array, wherein the at least one word line is associated with an input bit of the CNN. The at least one bit line is electrically connected to the bit cell array, wherein the plurality of bit cells are arranged along at least one of a bit line direction and a word line direction, each of the at least one bit line has an electrical parameter, a first plurality of bit cells of the bit cell array are arranged along the bit line direction according to a first arrangement quantity, the memory array expands the input bit of the CNN to a plurality of input bits based on at least one of the first arrangement quantity and the operating voltage, and at least one of the first arrangement quantity and the operating voltage is a first weight associated with the plurality of input bits. The reading circuit is electrically connected to each of the at least one bit line and senses the electrical parameter thereof to obtain a multiplication and addition result of the plurality of input bits of the CNN and the corresponding weight bits thereof.
In accordance with another aspect of the present invention, a method for operating a memory array is disclosed. The memory array includes a bit cell array including a plurality of bit cells, each of which has a storage bit and an operating voltage, the plurality of storage bits are associated with a weight bit of a convolutional neural network (CNN), and the method includes the following steps: providing at least one word line and at least one bit line electrically connected to the bit cell array, wherein the at least one word line is associated with an input bit of the CNN, the plurality of bit cells are arranged along at least one of a bit line direction and a word line direction, each of the at least one bit line has an electrical parameter, and a first plurality of bit cells of the bit cell array are arranged along the bit line direction according to a first arrangement quantity; expanding the input bit of the CNN into a plurality of input bits based on at least one of the first arrangement quantity and the operating voltage; correlating at least one of the first arrangement quantity and the operating voltage with a first weight of the plurality of input bits; and sensing the electrical parameter of the at least one bit line to obtain a multiplication and addition result of the plurality of input bits of the CNN and the corresponding weight bits thereof.
In accordance with a further aspect of the present invention, a memory array for computing-in-memory (CIM) is disclosed. The memory array for CIM includes a bit cell array, at least one word line and at least one bit line. The bit cell array has a plurality of bit cells, wherein each bit cell is operated at an operating voltage. The at least one word line is electrically connected to the bit cell array, wherein the at least one word line is associated with a first parameter. The at least one bit line is electrically connected to the bit cell array, wherein the bit cells extend along a specific direction, each of the at least one bit line has an electrical parameter associated therewith, each bit cell is associated with a second parameter, a first quantity of the plurality of bit cells of the bit cell array extends along the specific direction, and the memory array determines, according to the specific direction, how an expansion associated with at least one of the first parameter and the second parameter is performed.
The above objectives and advantages of the present invention will become more readily apparent to those ordinarily skilled in the art after reviewing the following detailed descriptions and accompanying drawings, in which:
Please read the following detailed description with reference to the accompanying drawings of the present disclosure. The accompanying drawings of the present disclosure are used as examples to introduce various embodiments of the present disclosure and to understand how to implement the present disclosure. The embodiments of the present disclosure provide sufficient content for those skilled in the art to implement the embodiments of the present disclosure, or implement embodiments derived from the content of the present disclosure. It should be noted that these embodiments are not mutually exclusive from each other, and some embodiments can be appropriately combined with another one or more embodiments to form new embodiments; that is, the implementation of the present disclosure is not limited to the examples disclosed below. In addition, for the sake of brevity and clarity, relevant details are not excessively disclosed in each embodiment, and even if specific details are disclosed, examples are used only to make readers understand. The present invention will now be described more specifically with reference to the following embodiments. It is to be noted that the following descriptions of the preferred embodiments of this invention are presented herein for the purposes of illustration and description only; they are not intended to be exhaustive or to be limited to the precise form disclosed.
Please refer to
OUT = Σ_{i=0}^{Cin−1} Σ_{j=0}^{K^2−1} INi,j[bA−1:0] × Wi,j[bW−1:0]   (Equation 1)
Equation 1 expresses the multi-bit multiplication and addition of neural network operations, where bA is the bit width of the input feature map, bW is the bit width of the weight, and each bit of the input and the weight has a different weighting. Therefore, Equation 1 is further disassembled into: OUT = Σ_{i=0}^{Cin−1} Σ_{j=0}^{K^2−1} Σ_{b=0}^{bA−1} 2^b × INi,j[b] × Wi,j[bW−1:0]
The partial sum in Equation 1 refers to the result of a single element INi,j[bA−1:0] of the input feature map of the CNN multiplied by a single weight element Wi,j[bW−1:0]. For example, when i=0 and j=0, a single element of the input feature map can be extended to a first width of multiple bits, such as [bA−1:0], and a single element of the weight can also be extended to a second width of multiple bits, such as [bW−1:0]. That is to say, the partial sum represents the multiplication result of the input and the weight of a single element. K in Equation 1 represents the kernel size, Cin represents the input channels, Cout represents the output channels, and OF represents the partial output feature map. OF includes a single element with the multi-bit input multiplied by the multi-bit weight, i.e., OF has multiple output channels Cout. The multiple input channels Cin can be regarded as the result of expanding into multiple bits. For example, when the bit is binary and expanded to 4 bits, then Cin=2^4=16; when the bit is ternary and expanded to 4 bits, then Cin=3^4=81, the factor 2^b in the disassembled Equation 1 becomes 3^b, and so on. The partial sum in Equation 1 is summed from j=0 to (K^2)−1 and from i=0 to Cin−1, which equals the multiplication and addition result of the inputs and weights of multiple bit cells. This multiplication and addition result is only the result of part of the output features.
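The bit-wise disassembly above can be checked numerically with a minimal sketch (variable names and sizes are illustrative, not taken from the disclosure): expanding each input element into its binary bits and weighting each bit by 2^b reproduces the direct multiply-accumulate result.

```python
# Numeric check that the disassembled Equation 1 equals the direct
# multi-bit multiplication and addition.
import random

bA, bW = 4, 4                      # input and weight bit widths
Cin, K = 3, 2                      # input channels and kernel size (example)

IN = [[random.randrange(2 ** bA) for _ in range(K * K)] for _ in range(Cin)]
W  = [[random.randrange(2 ** bW) for _ in range(K * K)] for _ in range(Cin)]

# Direct multi-bit multiply-accumulate over all channels and kernel taps.
out_direct = sum(IN[i][j] * W[i][j] for i in range(Cin) for j in range(K * K))

# Disassembled form: each input bit IN[b] carries a weight of 2**b.
out_bits = sum(
    (2 ** b) * ((IN[i][j] >> b) & 1) * W[i][j]
    for i in range(Cin) for j in range(K * K) for b in range(bA)
)

assert out_direct == out_bits
```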
Please refer to
In order to implement multi-bit multiplication and addition in CIM, the present invention proposes to modify the architecture to implement the weights of different input bits of the expanded input IN, or the weights of different weight bits of the expanded weight W, so that the current or voltage difference generated by one SRAM cell or multiple SRAM cells directly generates a Partial Sum. Its principle is as follows:
Equation 2 is the Partial Sum formula of multi-bit multiplication and addition, Equation 3 is the MOSFET current formula, and Equation 4 is the voltage and current formula of RBL.
Please refer to
Please refer to
Please refer to
The weighting of W[b] can be realized using the capacitance values 8C0, 4C0, 2C0 and 1C0 of the capacitors on the RBLs and the reading circuit 304, but for the RWL positions corresponding to IN[3:0], a corresponding quantity of SRAM cells must be used. Taking a 4-bit IN as an example, each W[b] must be represented by 2^4−1=15 SRAM cells.
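The cell-count expansion can be sketched as follows (the helper name is illustrative): bit IN[b] drives 2^b identical cells, so the number of active cells in a W[b] column equals the input value, and 8+4+2+1=15 cells suffice for a 4-bit input.

```python
# Sketch of representing a 4-bit input IN[3:0] with 2**4 - 1 = 15 identical
# SRAM cells: bit IN[b] activates 2**b cells, so the active-cell count in a
# W[b] column equals the input value itself.
def active_cells(in_val, bA=4):
    # bit b of the input activates 2**b cells (8 + 4 + 2 + 1 = 15 in total)
    return sum((2 ** b) * ((in_val >> b) & 1) for b in range(bA))

assert sum(2 ** b for b in range(4)) == 15       # cells per W[b] column
assert all(active_cells(v) == v for v in range(16))
```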
In another embodiment, the weighting of W[b] can be realized by using the quantity of bit cells BC in the word line RWL direction, or by providing different operating voltages VDD to the bit cells BC on different RBLs through a voltage regulation circuit; both approaches can save the capacitors on the reading circuit 304.
Turning on the RWL to carry out the calculation means multiplying IN and W to execute multiplication and addition in CIM. Each RBL obtains W[b]×(8×IN[3]+4×IN[2]+2×IN[1]+1×IN[0]), and finally the reading circuit 304 (such as the conversion circuit 305 for converting analog signals to digital signals) performs charge sharing to obtain the multiplication and addition result of IN[3:0]×W[3:0].
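An idealized behavioral model of this read-out can verify the arithmetic (the capacitance ratios follow the text; the charge-sharing model and function names are illustrative simplifications):

```python
# Behavioral sketch of the read-out: each RBL accumulates the input-weighted
# sum scaled by its weight bit, then capacitor-ratio charge sharing weights
# each RBL by 2**b' to produce IN[3:0] x W[3:0].
def cim_multiply(in_bits, w_bits):
    """in_bits, w_bits: lists of 0/1, LSB first (IN[3:0], W[3:0])."""
    # Each RBL[b'] carries W[b'] * (8*IN[3] + 4*IN[2] + 2*IN[1] + IN[0]).
    in_sum = sum((2 ** b) * in_bits[b] for b in range(len(in_bits)))
    rbl = [w * in_sum for w in w_bits]
    # Charge sharing through capacitors 1C0, 2C0, 4C0, 8C0 weights RBL[b']
    # by 2**b', yielding the full multi-bit product.
    return sum((2 ** bp) * rbl[bp] for bp in range(len(w_bits)))

def to_bits(v, n=4):
    # little-endian bit decomposition of an n-bit value
    return [(v >> b) & 1 for b in range(n)]

assert cim_multiply(to_bits(13), to_bits(9)) == 13 * 9
```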
In another way and structure of extending the input bits, adjusting the operating voltage VDD of the bit cell BC in the SRAM cell structure 42 can also make the circuit represent the multiple bits IN. Please refer to
Please refer to
The embodiment of
Partial Sum = Σ_{b'=0}^{bW−1} 2^{b'} × W[b'] × Σ_{b=0}^{bA−1} 2^b × IN[b]
Please refer to
When writing the weight W of the CNN, the weight data corresponding to the 4-bit IN is expanded into 4 bits in the RBL direction, as shown in
After the weight W of the CNN is written, the voltage regulation circuit 704 provides four different operating voltages V3, V2, V1 and V0 to the SRAM cells on the four word lines RWL[3], RWL[2], RWL[1] and RWL[0], so that the SRAM cells have four output currents I3, I2, I1 and I0 with different weightings, which respectively represent the weights of the input bits IN[3], IN[2], IN[1] and IN[0], wherein the greater the voltage or current, the greater the weight, and vice versa.
Then, with IN and W corresponding in this way, the RWL is turned on to perform the multiplication and addition in CIM, and each RBL obtains W[b]×(I3×IN[3]+I2×IN[2]+I1×IN[1]+I0×IN[0]), wherein the currents generated by the SRAM cells satisfy I3=2×I2=4×I1=8×I0 owing to the voltage regulation circuit 704, and finally the conversion circuit (e.g., the ADC) 305 performs charge sharing to obtain the multiplication and addition result of IN[3:0]×W[3:0]. The weighting of W[b] can be realized by using the RBLs and the capacitors C3, C2, C1 and C0 on the reading circuit 304, which have capacitance values of 8C0, 4C0, 2C0 and 1C0, respectively.
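The voltage-regulated embodiment can be modeled in the same idealized way (I0 is an arbitrary current unit; the model is a sketch, not a circuit simulation): with cell currents in the ratio I3=2×I2=4×I1=8×I0, a single RWL activation yields a column current proportional to the input value.

```python
# Idealized model of the voltage-regulated embodiment: the regulator sets
# the four cell currents to the binary ratio 8:4:2:1, so one RWL activation
# produces a current sum proportional to the 4-bit input value.
I0 = 1.0
currents = [I0, 2 * I0, 4 * I0, 8 * I0]          # I0..I3 for IN[0]..IN[3]

def rbl_current(w_bit, in_bits):
    # W[b] * (I3*IN[3] + I2*IN[2] + I1*IN[1] + I0*IN[0]), bits LSB first
    return w_bit * sum(c * ib for c, ib in zip(currents, in_bits))

in_bits = [1, 0, 1, 1]                           # IN = 13, LSB first
assert rbl_current(1, in_bits) == 13 * I0        # column current encodes IN
assert rbl_current(0, in_bits) == 0              # W[b] = 0 gates the column
```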
Please refer to
The embodiment of
Partial Sum = Σ_{b'=0}^{bW−1} 2^{b'} × W[b'] × Σ_{b=0}^{bA−1} 2^b × IN[b]
From the descriptions of the above-mentioned multiple embodiments, the memory arrays 30, 40, 60, 70 and 80 used for computing-in-memory (CIM) in the present disclosure can be summarized as follows. A memory array 30, 40, 60, 70, 80 used for CIM includes a bit cell array 301, 401, 601, 701, 801, at least one word line 302, 402, 602, 702, 802, at least one bit line 303, 403, 603, 703, 803, and a reading circuit 304. The bit cell array 301, 401, 601, 701, 801 includes a plurality of bit cells, wherein each of the plurality of bit cells has a storage bit and operates at an operating voltage VDD1, VDD2, . . . , VDDN, and the plurality of storage bits are associated with a weight bit W of a convolutional neural network (CNN). The at least one word line 302, 402, 602, 702, 802 is electrically connected to the bit cell array 301, 401, 601, 701, 801, wherein the at least one word line 302, 402, 602, 702, 802 is associated with an input bit of the CNN. The at least one bit line 303, 403, 603, 703, 803 is electrically connected to the bit cell array 301, 401, 601, 701, 801, wherein the plurality of bit cells are arranged along at least one of a bit line direction and a word line direction, each of the at least one bit line 303, 403, 603, 703, 803 has an electrical parameter, a first plurality of bit cells of the bit cell array 301, 401, 601, 701, 801 are arranged along the bit line RBL direction according to a first arrangement quantity bA, the memory array 30, 40, 60, 70, 80 expands the input bit IN[0] of the CNN to a plurality of input bits IN[bA-1], IN[bA-2], . . . , IN[1], IN[0] based on at least one of the first arrangement quantity bA and the operating voltages VDD1, VDD2, . . . , VDDN, and at least one of the first arrangement quantity bA and the operating voltages VDD1, VDD2, . . . , VDDN is a first weight 2^(bA−1), 2^(bA−2), . . . , 2^1, 2^0 associated with the plurality of input bits IN[bA-1], IN[bA-2], . . . , IN[1], IN[0].
The reading circuit 304 is electrically connected to the plurality of bit lines RBL, and senses the electrical parameter of each of the plurality of bit lines RBL to obtain a multiplication and addition result Partial Sum of the plurality of input bits IN[bA-1], IN[bA-2], . . . , IN[1], IN[0] of the CNN and the corresponding weight bits W thereof.
In any embodiment of the present disclosure, the electrical parameter includes at least one of a current, a charge and a voltage. The memory arrays 30, 40, 60, 70 and 80 further include at least one voltage regulation circuit 44 to provide the operating voltages VDD1, VDD2, . . . , VDDN. The magnitude of the operating voltage VDD is positively related to the first weight 2^(bA−1), 2^(bA−2), . . . , 2^1, 2^0 of the plurality of input bits IN[bA-1], IN[bA-2], . . . , IN[1], IN[0]. The first arrangement quantity bA is positively related to the first weights 2^(bA−1), 2^(bA−2), . . . , 2^1, 2^0. The reading circuit 304 includes a plurality of capacitors C3, C2, C1 and C0, and the plurality of capacitors C3, C2, C1 and C0 respectively correspond one-to-one to the plurality of bit lines RBL[3], RBL[2], RBL[1] and RBL[0] to obtain the multiplication and addition result Partial Sum. The memory array 30, 40, 60, 70, 80 further includes an analog-to-digital conversion circuit (ADC) 305 as the reading circuit 304, and the multiplication and addition result Partial Sum is converted to digital data by the ADC conversion circuit 305.
In any embodiment of the present disclosure, the plurality of bit cells BC are arranged along a word line RWL direction according to a second arrangement quantity bW. The memory array 30, 40, 60, 70, 80 expands a weight bit W[b] of the CNN to a plurality of weight bits W[bW-1], W[bW-2], . . . , W[1], W[0] according to at least one of the second arrangement quantity bW and the operating voltages VDD1, VDD2, . . . , VDDN, and at least one of the second arrangement quantity bW and the operating voltages VDD1, VDD2, . . . , VDDN is associated with a second weight 2^(bW−1), 2^(bW−2), . . . , 2^1, 2^0 of the plurality of weight bits W[bW-1], W[bW-2], . . . , W[1], W[0].
Please refer to
In any embodiment of the present disclosure, wherein the plurality of bit cells BC are arranged along the word line RWL direction according to a second arrangement quantity bW, the operation method S10 further includes the following steps: expanding a weight bit W[b] of the CNN into a plurality of weight bits W[bW-1], W[bW-2], . . . , W[1], W[0] according to at least one of the second arrangement quantity bW and the operating voltages VDD1, VDD2, . . . , VDDN; and associating at least one of the second arrangement quantity bW and the operating voltages VDD1, VDD2, . . . , VDDN with a second weight 2^(bW−1), 2^(bW−2), . . . , 2^1, 2^0 of the plurality of weight bits W[bW-1], W[bW-2], . . . , W[1], W[0].
Please refer to
Regarding the expanded input feature map, in the embodiment of
Regarding the expansion of the weight W, for the RWL positions corresponding to IN[3:0], a corresponding quantity of SRAM cells must be used. Taking a 4-bit IN as an example, each W[b] must be represented by 8 SRAM cells.
Regarding writing the weight W, when writing the weight, the weight data corresponding to the 4-bit IN is expanded into 4 bits in the RBL direction according to the method of
Regarding reducing the SRAM operating voltage, after the weight W is written, the voltage regulation circuit 904 is used to make the SRAM cells on the four RWLs use two different operating voltages V1 and V0, so that the SRAM cells have two output currents with different weightings.
Regarding asserting the RWL to perform the calculation, with IN and W corresponding in this way, the RWL is asserted to perform the multiplication and addition for CIM, and each RBL obtains W[b]×(I1×4×IN[3]+I1×2×IN[2]+I1×IN[1]+I0×IN[0]), wherein the voltage regulation circuit 904 causes the SRAM cells to generate currents satisfying I1=2×I0, and finally the conversion circuit (ADC) 305 performs charge sharing to obtain the multiplication and addition result of IN[3:0]×W[3:0]. The weighting of W[b] can be realized by using the RBLs and the capacitors C3, C2, C1 and C0 on the reading circuit 304, which have capacitance values of 8C0, 4C0, 2C0 and 1C0, respectively.
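The combined embodiment can be checked with a small sketch (cell counts and the two-voltage current ratio follow the text; the data layout is an illustrative assumption): cell counts (4, 2, 1, 1) for IN[3]..IN[0] together with I1=2×I0 reproduce the binary weights 8:4:2:1 using only 8 cells per W[b] column.

```python
# Sketch of the combined embodiment: mixing cell-count weighting with a
# two-level operating voltage (I1 = 2*I0) yields binary input weights
# 8:4:2:1 with only 4 + 2 + 1 + 1 = 8 SRAM cells per W[b] column.
I0 = 1.0
I1 = 2 * I0
# (cell count, cell current) per input bit, LSB first: IN[0]..IN[3]
config = [(1, I0), (1, I1), (2, I1), (4, I1)]

def column_current(in_bits):
    # total column current: cells x current x input bit, summed over bits
    return sum(n * i * b for (n, i), b in zip(config, in_bits))

assert sum(n for n, _ in config) == 8            # 8 SRAM cells per W[b]
# Effective per-bit weights are 1, 2, 4, 8 times I0.
assert [column_current([int(k == b) for k in range(4)]) for b in range(4)] \
       == [1.0, 2.0, 4.0, 8.0]
```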
In any embodiment of the present disclosure, the disclosed memory array and its operating method for computing-in-memory (CIM) can also be applied to the multiplication and addition of multiple parameters, and are not limited to the multiplication and addition of the input features and weights of a single element of the CNN. Please refer to
Any embodiment of the present disclosure can be applied to other types of memory cells, such as ReRAM or MRAM.
In any embodiment of the present disclosure, without changing the structure of the bit cell itself, the bit cell BC is statically processed. The embodiments include Embodiment I by changing the quantity of the bit cell BC, Embodiment II by changing the operating voltage of the bit cell BC, and Embodiment III by combining Embodiment I with Embodiment II. Other methods that enable the bit cell BC to generate a weighted current to represent multi-bit input signals are within the scope of the embodiments of the present disclosure.
The implementation of the above Embodiments I, II, III is as follows:
I. The present disclosure uses the quantity of bit cells BC to represent the weight. It only needs to assert the read driver 505 (such as the RWL) once to complete the multiplication and addition operation, and there is no need to perform analog processing on the RWL signal. In the prior art, it is necessary to assert the RWL multiple times, change the RWL pulse width, or change the RWL signal magnitude. In comparison, the present technology can reduce hardware complexity and operation delay.
II. In the present disclosure, the voltage regulator 504 is used to generate different operating voltages VDD1˜VDDN, the weight is represented by changing the operating voltage of the bit cells BC, and the multiplication and addition operations are completed by asserting the read driver 505 (such as the RWL) only once. The present disclosure neither modifies the structure of the bit cell BC itself nor performs analog processing on the RWL signal. The conventional technologies modify the bit cell BC or perform dynamic analog processing on the word line RWL to achieve the weight representation, which increases the complexity of the hardware and the power consumption of the circuit. On the contrary, the present technology achieves lower hardware complexity by statically changing the operating voltage, and at the same time reduces the power consumption of the CIM by lowering the operating voltage.
III. Combining the above two methods, the present disclosure represents the weight by both the quantity of bit cells BC and the change of the operating voltage of the bit cells BC.
In any embodiment of the present disclosure, the first parameter IN is related to an input bit IN[b] of a convolutional neural network (CNN). Each bit cell BC has a storage bit, and each storage bit is related to a second parameter W, and the second parameter W is related to a weight bit W[b] of the CNN. The weight bit W[b] is associated with a single weight element Wi,j (e.g., W00 in
In any embodiment of the present disclosure, each bit cell BC has a storage bit, each storage bit is related to a second parameter W, and the second parameter W is related to a weight bit W[b] of the CNN. The second direction DIR2 is a word line RWL direction. A second plurality of bit cells of the bit cell array 50 extends by a second quantity bW along the second direction DIR2, and the memory array 50 determines a second weight magnitude 2^(bW−1), 2^(bW−2), . . . , 2^1, 2^0 of the second parameter W based on at least one of the second quantity bW and the operating voltages VDD1˜VDDN.
Please refer to
In any embodiment of the present disclosure, the directions DIR1, DIR2 are only along the direction of the bit line RBL, only along the direction of the word line RWL, or along the direction of the word line RWL and along the direction of the bit line RBL. The first parameter IN is related to an input bit IN[b] of a convolutional neural network (CNN). Each bit cell BC has a storage bit related to a second parameter W, and the second parameter W is related to a weight bit W[b] of the CNN. The weight bit is associated with a single weight element Wi,j (e.g., W00 in
While the invention has been described in terms of what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention need not be limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements included within the spirit and scope of the appended claims which are to be accorded with the broadest interpretation so as to encompass all such modifications and similar structures.
Number | Date | Country | Kind |
---|---|---|---|
111142460 | Nov 2022 | TW | national |