The present invention relates to a compute-in-memory (CIM) design, and more particularly, to a CIM circuit with charge-domain passive summation and an associated method.
A convolutional neural network (CNN) used by an artificial intelligence (AI) application is made up of neurons that have learnable weights. Each neuron receives AI inputs and performs a dot product (i.e., a convolution operation) upon the AI inputs and weights. One conventional approach may employ a central processing unit (CPU) to deal with the convolution operations, which is not a power-efficient solution. Another conventional approach may employ a bit-wise current-based or time-based compute-in-memory (CIM) circuit to deal with the convolution operations, which is neither a power-efficient solution nor a high-accuracy solution. Thus, there is a need for an innovative CIM design with low power consumption and high accuracy.
One of the objectives of the claimed invention is to provide a CIM circuit with charge-domain passive summation and an associated method.
According to a first aspect of the present invention, an exemplary CIM circuit is disclosed. The exemplary CIM circuit includes a processing circuit. The processing circuit includes a data-selection circuit and a charge-domain passive summation circuit. The data-selection circuit includes a memory array and a selection circuit. The memory array is arranged to store a plurality of candidate weights. The selection circuit is arranged to select a target weight from the plurality of candidate weights stored in the memory array. The charge-domain passive summation circuit is arranged to generate an analog computation result of an input received by the processing circuit and the target weight stored in the memory array through a weighted capacitor array integrated with the memory array.
According to a second aspect of the present invention, an exemplary CIM method is disclosed. The exemplary CIM method includes: storing a plurality of candidate weights in a memory array; selecting a target weight from the plurality of candidate weights; and performing, by a weighted capacitor array integrated with the memory array, charge-domain passive summation to generate an analog computation result of an input and the target weight.
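By way of a purely illustrative, non-limiting example, the method of the second aspect may be modeled behaviorally by the following Python sketch; the bit width, candidate weight values, and input voltage are hypothetical assumptions rather than features of the disclosed circuit.

```python
# Behavioral sketch only: store candidate weights, select a target weight,
# and form a charge-domain passive sum. All values below are assumptions.

def cim_compute(candidate_weights, select_index, analog_input, x_bits=4):
    """Select a target weight and return the idealized summation-node voltage."""
    target_weight = candidate_weights[select_index]      # selection step

    # Charge-domain passive summation: bit i steers either the analog input
    # or ground (0 V) onto a capacitor of 2**i unit capacitors; the shared
    # node settles to the capacitance-weighted average of the applied voltages.
    total_cap = sum(2 ** i for i in range(x_bits))
    shared = sum((analog_input if (target_weight >> i) & 1 else 0.0) * (2 ** i)
                 for i in range(x_bits))
    return shared / total_cap


if __name__ == "__main__":
    weights = [0b0110, 0b1011, 0b0001]                   # candidate weights CW1-CW3
    print(cim_compute(weights, select_index=1, analog_input=0.8))
```

In this sketch, the charge-domain passive summation is idealized as a capacitance-weighted average of the voltages applied to the binary-weighted capacitors.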
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
Certain terms are used throughout the following description and claims, which refer to particular components. As one skilled in the art will appreciate, electronic equipment manufacturers may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not in function. In the following description and in the claims, the terms “include” and “comprise” are used in an open-ended fashion, and thus should be interpreted to mean “include, but not limited to . . . ”. Also, the term “couple” is intended to mean either an indirect or direct electrical connection. Accordingly, if one device is coupled to another device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.
It should be noted that the present invention has no limitations on the arrangement of word lines (WLs) and bit lines (BLs) of the memory array 108. In one exemplary implementation, the memory array 108 may be designed to have WLs in a horizontal direction and BLs in a vertical direction. In another exemplary implementation, the memory array 108 may be designed to have WLs in a vertical direction and BLs in a horizontal direction.
In some embodiments of the present invention, the CIM circuit 100 may be an analog CIM (ACIM) circuit used by an artificial intelligence (AI) application, and the candidate weights CW1-CWY may be weights of a neural network such as a convolutional neural network (CNN). The selection circuit 110 is arranged to select a target weight Wk (k={1, 2, . . . , Z}) from the candidate weights CW1-CWY stored in the memory array 108. For example, the selection circuit 110 of the processing circuit 102_1 may select a target weight W1 (i.e., Wk with k=1) being one of the candidate weights CW1-CWY, the selection circuit 110 of another processing circuit 102_2 may select a target weight W2 (i.e., Wk with k=2) being one of the candidate weights CW1-CWY, and the selection circuit 110 of yet another processing circuit 102_Z may select a target weight WZ (i.e., Wk with k=Z) being one of the candidate weights CW1-CWY. The target weights selected and used by different processing circuits 102_1-102_Z may be the same or may be different from each other. In a case where the CIM circuit 100 is used by an AI application, the CIM circuit 100 may be used to act as one neuron in the CNN, and may be reused to act as another neuron in the CNN. Hence, the candidate weights CW1-CWY may include weights of different neurons in the CNN.
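As a purely illustrative example of this selection (the weight values and indices below are hypothetical assumptions), each processing circuit may pick its own target weight from the shared candidate set, and the selections may coincide or differ, e.g., when the CIM circuit is reused as different neurons:

```python
# Illustrative only: three processing circuits sharing one set of candidate
# weights; indices and values are hypothetical assumptions.

candidate_weights = [0b0011, 0b0101, 0b1110, 0b1001]    # CW1-CW4 in the memory array

# Each processing circuit 102_k selects its own target weight Wk; two circuits
# may select the same candidate weight, or different ones.
selections = {"102_1": 0, "102_2": 2, "102_3": 2}       # W1 = CW1, W2 = W3 = CW3

for circuit, idx in selections.items():
    print(circuit, "uses target weight", bin(candidate_weights[idx]))
```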
In this embodiment, the CIM circuit 100 is an ACIM circuit that uses the charge-domain passive summation circuit 106 to generate an analog computation result of an analog input AOUT1 (i.e., AOUTk with k=1) received by the processing circuit 102_1 and the target weight W1 (i.e., Wk with k=1, which is one of the candidate weights CW1-CWY stored in the memory array 108) through the weighted capacitor array 112 with a particular capacitance ratio, where the particular capacitance ratio may be adjusted, depending upon actual design considerations. For example, capacitors C1-CN of the weighted capacitor array 112 may be implemented using MOM (Metal-Oxide-Metal) capacitors, and thus occupy a large layout area in a chip. In this embodiment, the weighted capacitor array 112 of the charge-domain passive summation circuit 106 can be shared among the multiple candidate weights CW1-CWY stored in the memory array 108. Hence, the weighted capacitor array 112 can be integrated with the memory array 108 for area optimization. Specifically, in a vertical direction of an integrated circuit, the weighted capacitor array 112 implemented using MOM capacitors may overlay memory cells 114 of the memory array 108 that are used to store the candidate weights CW1-CWY.
In this embodiment, the processing circuits 102_1-102_Z are arranged to receive a plurality of analog inputs AOUT1, AOUT2, . . . , AOUTZ output from a plurality of external analog buffers 10_1, 10_2, . . . , 10_Z, respectively. For example, each of the external analog buffers 10_1-10_Z may be implemented using a digital-to-analog converter (labeled by “DA”). Hence, the analog inputs AOUT1, AOUT2, . . . , AOUTZ are generated by converting a plurality of digital codes DIN1, DIN2, . . . , DINZ from a digital domain to an analog domain. Since the inputs of the processing circuits 102_1-102_Z are analog signals, a reduction in input nodes (and hence input energy) can be achieved. For example, the processing circuit 102_1 requires only a single node N_IN for receiving only a single analog input AOUT1 (which has a specific voltage level representative of the digital code DIN1) from the external analog buffer 10_1, such that the input power dissipation (fCV²) can be greatly reduced.
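For illustration only, the external analog buffer and the single-node input power figure mentioned above may be modeled as follows; the DAC resolution, reference voltage, clock rate, and node capacitance are hypothetical assumptions.

```python
# Illustrative model of an external analog buffer 10_k implemented as a DAC,
# and of the f*C*V^2 switching power at a single input node N_IN.
# Resolution, reference voltage, frequency, and capacitance are assumptions.

def dac(digital_code, n_bits=8, v_ref=1.0):
    """Convert a digital code DINk into an analog voltage level AOUTk."""
    return v_ref * digital_code / (2 ** n_bits - 1)

def input_switching_power(freq_hz, node_cap_f, v_swing):
    """Dynamic power f*C*V^2 dissipated when driving one input node."""
    return freq_hz * node_cap_f * v_swing ** 2

aout1 = dac(digital_code=170)                            # AOUT1 for DIN1 = 170
print(aout1, input_switching_power(100e6, 2e-15, aout1))
```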
As mentioned above, each of the candidate weights CW1-CWY (Y≥2) may be an X-bit weight CWi[X−1:0] (X≥2), and each bit of the X-bit weight CWi[X−1:0] is stored in one memory cell 114 of the memory array 108. Hence, the target weight W1 (i.e., Wk with k=1) has a plurality of bits W1[X−1:0] stored in memory cells 114 in the memory array 108, respectively. In this embodiment, the selection circuit 110 is further arranged to selectively apply the analog input AOUT1 to capacitors C1-CN according to bits W1[X−1:0], respectively. For example, the weighted capacitor array 112 is a binary-weighted capacitor array (N=X) consisting of capacitors CN=2^(X−1)C, . . . , C2=2C, and C1=1C. When the bit W1[i−1] (i={1, 2, . . . , N}) is equal to 1, the selection circuit 110 allows the analog input AOUT1 to be delivered to the capacitor Ci of the binary-weighted capacitor array 112 (i.e., VINi=AOUT1). When the bit W1[i−1] is equal to 0, the selection circuit 110 blocks the analog input AOUT1 from being delivered to the capacitor Ci of the binary-weighted capacitor array 112, and allows a reference voltage (e.g., ground voltage GND) to be delivered to the capacitor Ci of the binary-weighted capacitor array 112 (i.e., VINi=GND). In this embodiment, the selection circuit 110 is arranged to control transmission of the analog input AOUT1 by referring to the bits W1[X−1:0] concurrently, thereby enabling a direct multi-bit operation for setting the analog computation result at the charge-domain passive summation circuit 106. Hence, the charge-domain passive summation circuit 106 (particularly, the weighted capacitor array 112 of the charge-domain passive summation circuit 106) of the processing circuit 102_1 generates an analog computation result (which is an analog output of DIN1×W1[X−1:0]) by combining the voltage signals VIN1-VINN through charge redistribution among the binary-weighted capacitors CN=2^(X−1)C, . . . , C2=2C, and C1=1C. Since the analog computation result is set by controlling the voltage signals VIN1-VINN applied to the capacitors C1-CN of the weighted capacitor array 112 according to the bits W1[X−1:0], an analog computation result with high accuracy can be generated by the processing circuit 102_1.
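A minimal Python sketch of the bit-wise steering and charge redistribution described above is given below; the unit capacitance is normalized to 1, and the bit width, weight bits, and voltages are hypothetical assumptions (the final assertion merely checks the idealized capacitance-weighted-average model against the expected DIN1×W1 scaling).

```python
# Minimal sketch (assumptions: X = 4, ideal capacitors, no parasitics).

X = 4                                    # weight bit width
W1 = [1, 0, 1, 1]                        # W1[0], W1[1], W1[2], W1[3] -> value 13
AOUT1, GND = 0.6, 0.0                    # analog input voltage and reference

# Selection circuit: VIN for bit i is AOUT1 when W1[i] = 1, otherwise GND.
VIN = [AOUT1 if bit else GND for bit in W1]

# Binary-weighted capacitor array: bit i drives a capacitor of 2**i unit caps.
caps = [2 ** i for i in range(X)]

# Passive charge redistribution: the summation node settles to the
# capacitance-weighted average of the applied voltages.
result = sum(c * v for c, v in zip(caps, VIN)) / sum(caps)

weight_value = sum(bit << i for i, bit in enumerate(W1))
assert abs(result - AOUT1 * weight_value / (2 ** X - 1)) < 1e-12
print(result)                            # analog representation of DIN1 x W1
```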
Similarly, the charge-domain passive summation circuit 106 (particularly, the weighted capacitor array 112 of the charge-domain passive summation circuit 106) of another processing circuit 102_2 generates an analog computation result (which is an analog output of DIN2×W2[X−1:0]) by combining the voltage signals VIN1-VINN through charge redistribution among the binary-weighted capacitors CN=2^(X−1)C, . . . , C2=2C, and C1=1C; and the charge-domain passive summation circuit 106 (particularly, the weighted capacitor array 112 of the charge-domain passive summation circuit 106) of yet another processing circuit 102_Z generates an analog computation result (which is an analog output of DINZ×WZ[X−1:0]) by combining the voltage signals VIN1-VINN through charge redistribution among the binary-weighted capacitors CN=2^(X−1)C, . . . , C2=2C, and C1=1C.
As shown in
For better comprehension of technical features of the present invention, an exemplary circuit design of a processing circuit used by the proposed CIM circuit 100 is illustrated in
Each of the global selection switches SWk and SWj has one terminal that is arranged to receive the analog input AOUTk from an external analog buffer (not shown). One of the global selection switches that corresponds to a memory cell line (e.g., memory cell row or memory cell column) in which the target weight Wk is stored is switched on, and the rest of the global selection switches are switched off. In this embodiment, one switch control signal W_ADD_ENk may be asserted to switch on the global selection switch SWk, and another switch control signal W_ADD_ENj may be deasserted to switch off the global selection switch SWj. Though the candidate weight Wj is not selected as the target weight Wk, the memory cells that store bits of the candidate weight Wj may present input parasitic capacitance Cpar_in. By switching off the global selection switch SWj, the power dissipation resulting from the input parasitic capacitance Cpar_in of the memory cells that store bits of the candidate weight Wj can be avoided, thereby achieving energy reduction/power saving.
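The input-side power saving obtained by de-asserting W_ADD_ENj for unselected weight lines may be estimated, purely for illustration, with the following sketch; the parasitic capacitance, input swing, and line count are hypothetical assumptions.

```python
# Illustrative estimate only; Cpar_in, the input swing, and the number of
# candidate-weight lines are hypothetical assumptions.

def switching_energy(cap_f, v_swing):
    """Energy C*V^2 drawn when a capacitance is driven through a full swing."""
    return cap_f * v_swing ** 2

C_PAR_IN = 1e-15        # assumed input parasitic capacitance per weight line
V_IN = 0.8              # assumed analog input swing
Y = 16                  # assumed number of candidate-weight lines

# Only the selected line's global switch is on; the other Y - 1 switches are
# off, so their parasitic capacitances are never charged by the analog input.
energy_without_gating = Y * switching_energy(C_PAR_IN, V_IN)
energy_with_gating = 1 * switching_energy(C_PAR_IN, V_IN)
print("energy saved per input transition:", energy_without_gating - energy_with_gating)
```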
Suppose that the memory array 108 is an SRAM array, and each of the memory cells 114 is an SRAM cell. Hence, each memory cell 114 may have two complementary bit lines BL and BLB.
The cell selection switch SW3 is also a local selection switch integrated with each memory cell 114. In this embodiment, the candidate weights CW1-CWY may be stored in memory cell lines (e.g., memory cell rows or memory cell columns), respectively. The cell selection switches SW3 integrated with the memory array 108 may be categorized into a plurality of cell selection switch groups that correspond to the memory cell lines (e.g., memory cell rows or memory cell columns), respectively. Hence, each of the cell selection switch groups includes cell selection switches SW3, each having one terminal that is coupled to the charge-domain passive summation circuit (particularly, one capacitor of the weighted capacitor array 112). For example, the cell selection switch SW3 of the memory cell that stores the bit Wk[X−1] has one terminal coupled to the capacitor 2^(X−1)C of the weighted capacitor array 112, the cell selection switch SW3 of the memory cell that stores the bit Wk[0] has one terminal coupled to the capacitor 1C of the weighted capacitor array 112, and so on. In this embodiment, cell selection switches of one of the cell selection switch groups that corresponds to a memory cell line (e.g., memory cell row or memory cell column) in which the target weight Wk is stored are switched on, and cell selection switches of the rest of the cell selection switch groups are switched off. For example, cell selection switches SW3 of a cell selection switch group that corresponds to a memory cell line (e.g., memory cell row or memory cell column) in which the candidate weight Wj is stored are switched off. Though the candidate weight Wj is not selected as the target weight Wk, the memory cells that store bits of the candidate weight Wj may present cell parasitic capacitance Cpar_cell. By switching off these cell selection switches SW3, the power dissipation resulting from the cell parasitic capacitance Cpar_cell of the memory cells that store bits of the candidate weight Wj (which is not selected as the target weight Wk) can be avoided, thereby achieving energy reduction/power saving.
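Purely for illustration, the grouping of the cell selection switches SW3 and the bit-to-capacitor mapping described above may be sketched as follows; the number of weight lines, the bit width, and the selected line index are hypothetical assumptions.

```python
# Illustrative sketch only; LINES, X, and the selected line are assumptions.

X, LINES = 4, 8                              # bits per weight, candidate-weight lines
selected_line = 3                            # line holding the target weight Wk

switch_map = {}
for line in range(LINES):
    group_on = (line == selected_line)       # only one SW3 group is switched on
    switch_map[line] = [
        (2 ** bit) if group_on else 0        # bit Wk[bit] couples to capacitor 2**bit * C
        for bit in range(X)
    ]

# Unselected groups stay open, so their Cpar_cell is never charged.
print(switch_map[selected_line])             # [1, 2, 4, 8] unit capacitors
```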
As shown in
Please refer to
However, it is possible that the common transfer curve of the external analog buffers 301 and 302 after auto-zeroing still deviates from an ideal curve. To address this issue, the calibration of the external analog buffers 301 and 302 may further include aligning the transfer curve of each of the external analog buffers 301 and 302 with a predetermined curve.
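One possible software view of aligning a buffer transfer curve with a predetermined curve is the least-squares gain/offset fit sketched below; the curves, code range, and deviation model are hypothetical assumptions and are not the disclosed calibration circuit.

```python
# Illustrative only: fit a gain/offset correction that maps a measured
# transfer curve onto a predetermined (ideal) curve; all data are made up.

def fit_gain_offset(measured, ideal):
    """Least-squares gain/offset such that gain * measured + offset ~= ideal."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_i = sum(ideal) / n
    cov = sum((m - mean_m) * (i - mean_i) for m, i in zip(measured, ideal))
    var = sum((m - mean_m) ** 2 for m in measured)
    gain = cov / var
    return gain, mean_i - gain * mean_m

codes = range(16)
ideal = [c / 15 for c in codes]                        # predetermined curve
measured = [0.02 + 0.95 * v for v in ideal]            # deviating buffer curve
gain, offset = fit_gain_offset(measured, ideal)
corrected = [gain * m + offset for m in measured]
print(max(abs(c - i) for c, i in zip(corrected, ideal)))   # ~0 after alignment
```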
Please refer to
As mentioned above, the proposed CIM circuit 100 may be employed by an AI application. For example, the AI application may employ a CNN with multiple layers, and the proposed CIM circuit 100 may be used by a neuron in one layer and reused by a neuron in another layer. In some embodiments of the present invention, per-layer calibration may be employed for tracking process, voltage, and temperature (PVT) variations.
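A purely illustrative per-layer calibration loop is sketched below; the layer count and the calibration routine are hypothetical placeholders for whatever calibration (e.g., auto-zeroing and curve alignment) is applied before the CIM circuit is reused for the next layer.

```python
# Illustrative only: recalibrate before each CNN layer that reuses the CIM
# circuit, so PVT drift between layers is tracked. All names are placeholders.

def calibrate():
    """Placeholder for the per-layer calibration (e.g., auto-zero / alignment)."""
    return {"gain": 1.0, "offset": 0.0}

def run_layer(layer_idx, cal):
    print(f"layer {layer_idx}: running dot products with {cal}")

for layer in range(3):          # CNN with multiple layers reusing the CIM circuit
    cal = calibrate()           # per-layer calibration tracks PVT variation
    run_layer(layer, cal)
```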
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.
This application claims the benefit of U.S. Provisional Application No. 63/369,673, filed on Jul. 28, 2022. Further, this application claims the benefit of U.S. Provisional Application No. 63/369,674, filed on Jul. 28, 2022. The contents of these applications are incorporated herein by reference.