Embodiments relate to an in-memory computation circuit and, in particular, to supporting a compensated computation operation.
An in-memory computation (IMC) system stores information in the bit cells of a memory array and performs calculations at the bit cell level. An example of a calculation performed by an IMC system is a multiply and accumulate (MAC) operation, in which an input array of numbers (x values, also referred to as the feature or coefficient data) is multiplied by an array of computational weights (g values) stored in the memory and the products are added together to produce an output (z values).
By performing these calculations at the bit cell level in the memory, the IMC system does not need to move data back and forth between a memory device and a computing device. Thus, the limitations associated with data transfer bandwidth between devices are obviated and the computation can be performed with lower power consumption.
It is recognized that the memory element suffers from non-linearity and drift of its transconductance, both of which adversely affect the analog computation operation. There is a need in the art to compensate for these effects.
In an embodiment, an in-memory computation circuit comprises: a memory array including a plurality of memory cells arranged in a matrix with plural rows and plural columns, the memory cells storing computational weights for an in-memory compute operation, each row including a word line connected to the memory cells of the row, and each column including a bit line connected to the memory cells of the column; a biasing circuit for each bit line; and a column combining circuit configured to combine and integrate analog signals generated at column outputs of the biasing circuits.
Each biasing circuit comprises: a first transistor and second transistor having source-drain paths connected in series between the bit line and the column output; wherein said first transistor is configured to apply a fixed reference voltage level to said bit line; wherein said second transistor is configured as a switching circuit and controlled to turn on for a time duration corresponding to coefficient data for said in-memory compute operation; and wherein an analog signal at the column output is dependent on the coefficient data and computational weight.
In an embodiment, an in-memory computation circuit comprises: a memory array including a plurality of memory cells arranged in a matrix with plural rows and plural columns, the memory cells storing computational weights for an in-memory compute operation, each row including a word line connected to the memory cells of the row, and each column including a bit line connected to the memory cells of the column; a switching circuit for each bit line, each switching circuit coupled between the bit line and a column output and controlled to turn on for a time duration corresponding to coefficient data for said in-memory compute operation; wherein an analog signal at the column output is dependent on the coefficient data and computational weight; and a column combining circuit configured to combine and integrate the analog signals generated at the column outputs of the switching circuits.
In an embodiment, an in-memory computation circuit comprises: a memory array including a plurality of memory cells arranged in a matrix with plural rows and plural columns, the memory cells storing computational weights for an in-memory compute operation, each row including a word line connected to the memory cells of the row, and each column including a bit line connected to the memory cells of the column; a biasing circuit for each bit line, each biasing circuit connected between the bit line and a column output and configured to apply a fixed reference voltage level to said bit line; wherein an analog signal at the column output is dependent on the coefficient data and computational weight; and a column combining circuit configured to combine and integrate the analog signals generated at the column outputs of the biasing circuits.
For a better understanding of the embodiments, reference will now be made by way of example only to the accompanying figures in which:
Reference is now made to
Each memory cell 14 includes a word line WL and a bit line BL. The memory cells 14 in a common row of the matrix are connected to each other through a common word line WL<n>. The memory cells 14 in a common column of the matrix are connected to each other through a common bit line BL<m>.
The word lines WL<1>, . . . , WL<n> are driven by a word line driver circuit 18 which generates word line signals 16 in response to a received address signal (Address). The word line driver circuit 18 decodes the Address and applies the pulse of the word line signal 16 to one word line WL at a time (illustrated here, as an example, as being applied to word line WL<1>). The pulse width of each word line signal 16 is fixed and defined by an on time TON.
It is important to note here that the activation of one word line WL at a time performs a single multiply and accumulate (MAC) operation. In order to perform matrix-vector multiplication (MVM), where k MAC operations are implemented (k being less than or equal to n), a sequence of k word line WL activations is required. Consequently, k word line WL on time (TON) cycles are necessary for the performance of one full MVM operation.
Biasing circuitry 20 applies a bias (time, voltage and/or current) to each of the bit lines BL in response to feature (or coefficient) data x input to the in-memory computation circuit 10. This feature data may, for example, comprise a plurality of multi-bit digital signals x1, . . . , xm that are processed by the biasing circuits 201, . . . , 20m to generate the bias applied to the corresponding bit lines BL<1>, . . . , BL<m>. The analog signal ym at a column output on a given bit line BL<m> (i.e., the bit line charge) is then dependent on a product between the bias applied to the bit line and the transconductance gmn (which corresponds to the programmed resistivity) of the memory cell 14mn selected by the word line WL to which the word line signal 16 is applied. In other words, the memory cell 14 contributes a bit line current for the analog signal ym that is proportional to xm×gmn. So, in the example shown in
A combining circuit 22 combines, for example through an integration operation, the analog signal y1, . . . , ym charges to generate a corresponding decision zn result for the MAC decision operation, where zn=g1n×x1+g2n×x2+ . . . +gmn×xm. Further processing of the decision zn result may, for example, be made by converting the analog decision signal zn to a digital value using an analog-to-digital converter (ADC) which is then processed in a digital signal processing (DSP) circuit.
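The relationship between word line activations, MAC operations and a full MVM can be illustrated with a short behavioural model. This is a minimal sketch under idealized assumptions (linear cells, lossless current summation); the function names `mac` and `mvm` and the `weights[n][m]` layout are hypothetical and not part of the circuit description. Each call to `mac` stands for one word line activation, and `mvm` performs the k sequential activations of one MVM.

```python
from typing import List, Sequence


def mac(weights_row: Sequence[float], x: Sequence[float]) -> float:
    """One word line activation: z_n = g_1n*x_1 + g_2n*x_2 + ... + g_mn*x_m."""
    if len(weights_row) != len(x):
        raise ValueError("one weight per column (bit line) is required")
    # Each column contributes a charge proportional to g_mn * x_m;
    # the combining circuit sums (integrates) these contributions.
    return sum(g * x_m for g, x_m in zip(weights_row, x))


def mvm(weights: Sequence[Sequence[float]], x: Sequence[float],
        rows: Sequence[int]) -> List[float]:
    """k sequential word line activations, one MAC result z_n per activated row."""
    # weights[n][m] holds g_mn: row n <-> word line WL<n+1>, column m <-> bit line BL<m+1>
    return [mac(weights[n], x) for n in rows]
```

For example, `mvm([[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]], [2.0, 3.0, 4.0], [0, 1])` models two word line activations and returns `[10.0, 4.0]`.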
In a preferred embodiment, each memory cell 14 is a phase change memory (PCM) cell comprising a select circuit (MOSFET transistor, BJT transistor, diode device, etc.) 14t operating as a switching element and a variable resistive element 14r providing the transconductance gmn.
In case of a MOSFET transistor for the select circuit 14t (as shown in
In case of a BJT transistor for the select circuit 14t, the control node (base) of the BJT transistor is connected to the word line WL. The conduction path of the BJT transistor is connected in series with the variable resistive element 14r between the bit line BL and a reference node (for example, ground). More specifically, the emitter of the BJT transistor is connected to a first terminal of the variable resistive element 14r, the collector of the BJT transistor is connected to the reference node (for example, ground), and the second terminal of the variable resistive element 14r is connected to the bit line BL. In this case, the WL driver has the opposite polarity with respect to the MOS select transistor case.
In case of a diode device for the select circuit 14t, the control node of the select device 14t is connected to the word line WL. The diode path of the select circuit 14t is connected in series with the variable resistive element 14r between the bit line BL and the word line. More specifically, one terminal of the diode device is connected to a first terminal of the variable resistive element 14r, the other is connected to the word line, and the second terminal of the variable resistive element 14r is connected to the bit line BL.
As is well known to those skilled in the art, a PCM-type memory cell 14 is configured to store data using phase change materials (such as chalcogenide) that are capable of stably transitioning between amorphous and crystalline phases according to an amount of heat transferred thereto. The amorphous and crystalline phases exhibit two or more distinct resistances (corresponding to the variable resistive element 14r), in other words two or more distinct transconductances, which are used to distinguish two or more distinct logic states programmable into the memory cell. The amorphous phase exhibits a relatively higher resistance (i.e., a lower transconductance) and thus the current sunk from the bit line BL by the memory cell programmed in this state when selected by assertion of the word line signal at the gate of the select circuit 14t is relatively smaller. Conversely, the crystalline phase exhibits a relatively lower resistance (i.e., a higher transconductance) and thus the current sunk from the bit line BL by the memory cell programmed in this state when selected by assertion of the word line signal at the gate of the select circuit 14t is relatively larger.
In an embodiment for a specific, but non-limiting, example for two distinct logic states: the amorphous phase may represent programming of the memory cell to logic “0” (or reset state) for the associated coefficient weight and the crystalline phase may represent programming of the memory cell to logic “1” (or set state) for the associated coefficient weight. In an embodiment for three or more distinct logic states: varying degrees of the amorphous phase (with different resistances) plus the crystalline phase may be used to represent programming of the memory cell into three or more corresponding levels.
It will be understood that other memory cell types could instead be used for the array 12. For example, magnetoresistive random access memory (MRAM) cells or resistive random access memory (RRAM) cells could be used. The memory cell may alternatively comprise a static random access memory (SRAM) cell.
Reference is now made to
Those skilled in the art will understand that memory cells based on devices such as phase change memory cells do not behave as ideal resistances: the actual value of the transconductance gmn depends on the bit line voltage VBL, so gmn is a function of VBL. This relationship is referred to as a “non-linearity effect.” For this reason, the circuit of
The comparator 68 functions to compare the voltage vm to the ramp signal Vramp. While the voltage of the ramp signal Vramp is less than the voltage vm, the output of comparator 68 causes transistors 64 and 76 to turn on and activate the current mirror circuits. The bit line current from the selected memory cell 14mn is mirrored through transistor 82 to the combining circuit 22. When the voltage of the ramp signal Vramp exceeds the voltage vm, comparator 68 turns off transistors 64 and 76. The duration of the bit line current flow for the analog signal ym is accordingly dependent on the bias voltage vm, while the BL current is proportional to the transconductance gmn (programmed resistivity) of the selected memory cell 14mn. The analog signal ym is therefore dependent on the product of the bias voltage vm and the transconductance gmn of the selected memory cell 14mn. It will be noted here that the analog signal ym is a charge.
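The benefit of encoding the input as an on-time, rather than as a bit line voltage, can be checked with a small model. This is a sketch under stated assumptions: the linear conductance law `g0 * (1 + a * v_bl)` is a hypothetical stand-in for the non-linearity effect, the ramp is assumed to rise linearly from zero with slope `ramp_slope`, and all function names are illustrative only.

```python
def cell_conductance(v_bl: float, g0: float, a: float) -> float:
    """Hypothetical non-linear cell model: conductance rises with bit line voltage."""
    return g0 * (1.0 + a * v_bl)


def time_encoded_charge(v_m: float, v_ref: float, ramp_slope: float,
                        g0: float, a: float) -> float:
    """Charge delivered when the input sets the on-time and the bit line sits at v_ref."""
    # The path stays on while Vramp(t) = ramp_slope * t is below v_m,
    # so the on-time is proportional to the input voltage.
    t_on = v_m / ramp_slope
    # The bit line voltage is fixed at v_ref, so the cell current is constant.
    i_bl = cell_conductance(v_ref, g0, a) * v_ref
    return i_bl * t_on


def voltage_encoded_charge(v_m: float, t_fixed: float, g0: float, a: float) -> float:
    """For comparison: the input drives the bit line voltage for a fixed time."""
    return cell_conductance(v_m, g0, a) * v_m * t_fixed
```

Doubling `v_m` doubles the time-encoded charge, whereas in the voltage-encoded case the ratio also picks up the factor (1 + 2·a·v)/(1 + a·v) from the conductance model, so the product is no longer linear in the input.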
The reference voltage Vref applied to the source terminal of transistor 72 may be generated using a voltage regulator circuit, for example a low-dropout (LDO) type voltage regulator 92 formed by an amplifier 94 and a MOSFET device 96, where the MOSFET device is coupled in series with transistor 72 and has its gate driven by the output of the amplifier. Feedback from the source of transistor 72 (drain of transistor 96) is provided to the non-inverting input of the amplifier, and the inverting input receives the reference voltage Vref, which may be generated using a band-gap circuit. The voltage at the source of transistor 72 (drain of transistor 96) is regulated to equal the reference voltage Vref. The LDO type regulator 92 may be used as well for generating the reference voltage Vref for the circuit of
Reference is now made to
A differential amplifier circuit 112 has an inverting input terminal coupled, preferably directly connected, to the source terminal of transistor 110. An output of the amplifier circuit 112 is coupled, preferably directly connected, to the gate terminal of the transistor 110. The non-inverting input terminal of the amplifier 112 receives a reference voltage Vref. The negative feedback with the transistor 110 and amplifier 112 forces the drain voltage (i.e., the bit line BL<ref> voltage) to equal the reference voltage Vref. Note here that this is analogous to circuits of
Reference is now made to
A differential amplifier circuit 112 has an inverting input terminal coupled, preferably directly connected, to the source terminal of transistor 110. An output of the amplifier circuit 112 is coupled, preferably directly connected, to the gate terminal of the transistor 110. The non-inverting input terminal of the amplifier 112 receives a reference voltage Vref. The negative feedback with the transistor 110 and amplifier 112 forces the drain voltage (i.e., the bit line BL<ref> voltage) to equal the reference voltage Vref. Note here that this is analogous to circuits of
Reference is now made to
It is important to again note here that the activation of one word line WL at a time performs a single multiply and accumulate (MAC) operation. Each MAC operation needed for the in-memory computation, for example in connection with performing matrix-vector multiplication (MVM), requires a different word line selection.
The MAC operation performed may be mathematically described as follows:
Where: VOUT,n, i.e., zn, is the output voltage across the output capacitor Cout in response to the bit line currents from the memory cell transconductances g1n to gmn, QOUT,i is the corresponding charge contribution of BL<i>, where i=1, . . . , m, COUT is the capacitance of the output capacitor Cout, iBL,i is the mirrored bit line current from the memory cell 14in, TON,i is the duration of time for bit line BL<i> current flow for the analog signal yi, gi,n is the transconductance of the memory cell 14i,n corresponding to its programmed weight, Vref is the reference voltage, vi is the input voltage corresponding to the feature (or coefficient) data xi, CRAMP is the capacitance of the ramp generator circuit capacitor 102, gref is the transconductance of the reference memory cell 14ref.
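Written out in terms of the symbols just defined, the MAC operation takes the form below. This is a derivation sketch under assumptions not stated in this passage: each bit line is held at Vref, the bit line current is mirrored 1:1 to the output, and the ramp is produced by integrating the reference cell current gref·Vref on CRAMP starting from zero, so that the on-time TON,i ends when the ramp reaches vi.

```latex
\begin{aligned}
V_{OUT,n} = z_n &= \frac{1}{C_{OUT}} \sum_{i=1}^{m} Q_{OUT,i}
               = \frac{1}{C_{OUT}} \sum_{i=1}^{m} i_{BL,i}\, T_{ON,i} \\
i_{BL,i} &= g_{i,n} V_{ref}, \qquad
V_{ramp}(t) = \frac{g_{ref} V_{ref}}{C_{RAMP}}\, t
\;\Rightarrow\; T_{ON,i} = \frac{C_{RAMP}\, v_i}{g_{ref} V_{ref}} \\
V_{OUT,n} &= \frac{C_{RAMP}}{C_{OUT}} \sum_{i=1}^{m} \frac{g_{i,n}}{g_{ref}}\, v_i
\end{aligned}
```

Under these assumptions Vref cancels, and each weight enters the output only through the ratio gi,n/gref.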
The foregoing mathematical representation is applicable to the use of the circuits shown in
The use of the reference memory cell 14ref in connection with the generation of the ramp signal further supports removal of the time dependence (drift) of the weights. It is known in the art that the conductance of the memory cells 14 tends to decrease over time due to amorphization and relaxation of the crystal lattice of the phase change material. In particular, this conductance drift can be modeled by the following empirical law:
Where: t is time, t0 is an arbitrary time instant, α is the drift coefficient, and G0 is the conductance at time t0.
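The power law conventionally used for phase change memory drift, written with the symbols just defined, is shown below together with its consequence for a cell/reference pair. The second line assumes, as the use of the reference cell in the ramp generation suggests, that the output depends on each weight only through the ratio gmn/gref; each conductance is given its own G0 and α.

```latex
\begin{aligned}
G(t) &= G_0 \left( \frac{t}{t_0} \right)^{-\alpha} \\
\frac{g_{mn}(t)}{g_{ref}(t)} &= \frac{G_{0,mn}}{G_{0,ref}}
  \left( \frac{t}{t_0} \right)^{-(\alpha_{mn} - \alpha_{ref})}
\end{aligned}
```

The time dependence therefore enters only through the difference αmn − αref.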
Considering the contribution of a single bit line BL current only, then the output voltage is given by:
Considering now the whole MAC operation, where all BL currents are summed and integrated on the COUT capacitance:
If αmn≈αref, meaning that the memory cells 14mn and the reference memory cell 14ref suffer from substantially the same drift, the net drift exponent αmn−αref is approximately zero and the drift is compensated for by the circuit.
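A small numerical check of this cancellation follows. It is a sketch that assumes the single-bit-line output V = (CRAMP/COUT)·(gmn/gref)·v and the power-law drift model G(t) = G0·(t/t0)^(−α); the function names and the numeric values used are hypothetical.

```python
def drifted_conductance(g0: float, alpha: float, t: float, t0: float = 1.0) -> float:
    """Empirical power-law drift: G(t) = G0 * (t / t0) ** (-alpha)."""
    return g0 * (t / t0) ** (-alpha)


def output_voltage(g0_cell: float, alpha_cell: float, g0_ref: float, alpha_ref: float,
                   v_in: float, c_ramp: float, c_out: float, t: float) -> float:
    """Assumed single-bit-line output: (C_RAMP / C_OUT) * (g_mn / g_ref) * v."""
    g_cell = drifted_conductance(g0_cell, alpha_cell, t)
    g_ref = drifted_conductance(g0_ref, alpha_ref, t)
    return (c_ramp / c_out) * (g_cell / g_ref) * v_in


# Same drift coefficient for cell and reference: the output is time-invariant.
early = output_voltage(2e-5, 0.05, 1e-5, 0.05, 0.3, 1.0, 1.0, t=1.0)
late = output_voltage(2e-5, 0.05, 1e-5, 0.05, 0.3, 1.0, 1.0, t=1e4)

# A non-drifting reference (alpha_ref = 0) lets the cell drift reach the output.
late_uncompensated = output_voltage(2e-5, 0.05, 1e-5, 0.0, 0.3, 1.0, 1.0, t=1e4)
```

With these example values, `early` and `late` agree to rounding error, while `late_uncompensated` falls to roughly 63% of `early`, since (10^4)^(−0.05) = 10^(−0.2) ≈ 0.63.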
Reference is now made to
Again, it will be noted that the activation of one word line WL at a time performs a single multiply and accumulate (MAC) operation. Each MAC operation needed for the in-memory computation, for example in connection with performing matrix-vector multiplication (MVM), requires a different word line selection.
Operation in this case provides for:
Where: VOUT,n, i.e., zn, is the output voltage across the output capacitor Cout in response to the bit line currents from the memory cell transconductances g1n to gmn, QOUT,i is the corresponding charge contribution of BL<i>, where i=1, . . . , m, COUT is the capacitance of the output capacitor Cout, iBL,i is the mirrored bit line current from the memory cell 14in, TON,i is the duration of time for bit line BL<i> current flow for the analog signal yi, gi,n is the transconductance of the memory cell 14i,n corresponding to its programmed weight, Vref is the reference voltage, vi is the input voltage corresponding to the feature (or coefficient) data xi, CRAMP is the capacitance of the ramp generator circuit capacitor 102, gref is the transconductance of the reference memory cell 14ref.
The foregoing mathematical representation is applicable to the use of the circuits shown in
The current mirror circuits 221, 222, . . . , 22m in
In an alternative embodiment, however, B, C and D are not equal. In this case, each of the currents iBL1, iBL2, . . . , iBLm generated by the current mirror circuits 221, 222, . . . , 22m are given different weights for the integration process. For example only, in an embodiment supporting binary weighting: B=4, C=2 and D=1. The currents iBL1, iBL2, . . . , iBLm for the integration in this case will be binary weighted. For this implementation, the biasing circuits 201, 202, . . . , 20m connected to the current mirror circuits 221, 222, . . . , 22m will be driven in response to a common input vm as shown in
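One way to picture binary weighting is sketched below, as an illustration rather than the described circuit. It assumes, hypothetically, that each column cell is programmed either to a SET conductance `g_on` or to an idealized zero-conductance RESET state, and that all columns share the same on-time because they are driven from the common input vm. Scaling each bit line current by its mirror ratio before integration then makes the columns behave as the bits of one multi-bit weight. The function name and parameters are illustrative.

```python
from typing import Sequence


def binary_weighted_charge(bits: Sequence[int], ratios: Sequence[int], g_on: float,
                           v_ref: float, t_on: float) -> float:
    """Integrated charge when each mirrored bit line current is scaled by its ratio."""
    total = 0.0
    for bit, ratio in zip(bits, ratios):
        # Hypothetical two-level cell: SET -> g_on, RESET idealized as zero conductance.
        i_bl = (g_on if bit else 0.0) * v_ref
        # The current mirror scales the bit line current (e.g. B=4, C=2, D=1),
        # and every column shares the same on-time t_on from the common input.
        total += ratio * i_bl * t_on
    return total
```

With bits (1, 0, 1) and ratios (4, 2, 1), the integrated charge is 5 units, i.e., the binary weight 101; with equal ratios (1, 1, 1) the same cells would give 2 units.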
It will be noted that with the use of the biasing circuit 20m embodiment as shown in
Reference is now made to
In this configuration, all of the bit-cells 14mnP together may be considered to form a first bank of the memory array and all of the bit-cells 14mnN together may be considered to form a second bank of the memory array.
Each memory cell 14mn includes a word line WL and a pair of bit lines BL. The pair of bit-cells 14mnP and 14mnN in the memory cells 14 in a common row of the matrix are connected to each other through a common word line WL<n>. The positive bit-cells 14mnP of the memory cells 14 in a common column of the matrix are connected to each other through a common positive bit line BL<m>P, while the negative bit-cells 14mnN of the memory cells 14 in a common column of the matrix are connected to each other through a common negative bit line BL<m>N.
The word lines WL<1>, . . . , WL<n> are driven by a word line driver circuit 18 which generates word line signals 16 in response to a received address signal (Address). The word line driver circuit 18 decodes the Address and applies the pulse of the word line signal 16 to one word line WL at a time (illustrated here, as an example, as being applied to word line WL<1>). The pulse width of each word line signal 16 is fixed and defined by an on time TON.
It is important to note here that the activation of one word line WL at a time performs a single multiply and accumulate (MAC) operation. In order to perform matrix-vector multiplication (MVM), where k MAC operations are implemented (k being less than or equal to n), a sequence of k word line WL activations is required. Consequently, k word line WL on time (TON) cycles are necessary for the performance of one full MVM operation.
Biasing circuitry 20 applies a bias (time, voltage and/or current) to the positive and negative bit lines BL in response to feature (or coefficient) data x input to the in-memory computation circuit 10. This feature data may, for example, comprise a plurality of multi-bit digital signals x1, . . . , xm that are processed by the biasing circuits 201P, 201N, . . . , 20mP, 20mN to generate the bias applied to the corresponding bit lines BL<1>P, BL<1>N, . . . , BL<m>P, BL<m>N. The biasing circuit 20mP is coupled to the positive bit line BL<m>P and the biasing circuit 20mN is coupled to the negative bit line BL<m>N, with both biasing circuits receiving the digital signal xm. The positive or negative analog signal ymP, ymN on the positive or negative bit line BL<m>P, BL<m>N (i.e., the bit line charge) at the column output is then dependent on a product between the bias applied to the bit line and the transconductance gmnP or gmnN (which corresponds to the programmed resistivity) of the bit-cell of the memory cell 14mn selected by the word line WL to which the word line signal 16 is applied. In other words, the memory cell 14 contributes either a (positive) bit line current for the positive analog signal ymP that is proportional to xm×gmnP, or a (negative) bit line current for the negative analog signal ymN that is proportional to xm×gmnN. So, for an example where the word line signal 16 is applied to word line WL<1>, and the positive bit-cell 1411P of the memory cell 1411 is programmed with the in-memory computation weight, the positive analog signal y1P current on the positive bit line BL<1>P is proportional to x1×g11P, and the negative analog signal y1N current on the negative bit line BL<1>N is zero.
Conversely, if instead the negative bit-cell 1411N of the memory cell 1411 is programmed with the in-memory computation weight, the positive analog signal y1P current on the positive bit line BL<1>P is zero, and the negative analog signal y1N current on the negative bit line BL<1>N is proportional to x1×g11N. A similar operation is performed for each column.
The biasing circuits 201P, 201N, . . . , 20mP, 20mN of the biasing circuitry 20 may have any one of the circuit configurations as shown in
A combining circuit 22 combines, for example through an integration operation, the analog signal y1P, y1N, . . . , ymP, ymN currents at the column outputs to generate a corresponding decision zn result for the MAC decision operation, where zn=(±g1n×x1)+(±g2n×x2)+ . . . +(±gmn×xm), and where the ± symbol indicates a taking into account of whether the weight gmn is positive or negative. Further processing of the decision zn result may, for example, be made by converting the analog decision signal zn to a digital value using an analog-to-digital converter (ADC) which is then processed in a digital signal processing (DSP) circuit.
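The signed combination above can be sketched as follows. The mapping from a signed weight to a pair of bit-cell conductances (positive weights in the P cell, negative weights in the N cell, the unused cell idealized as zero conductance), the ideal subtraction of the negative bank, and the function names are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def program_signed(weights: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical mapping of signed weights onto the positive and negative banks."""
    g_pos = [w if w > 0 else 0.0 for w in weights]   # P bit-cells hold positive weights
    g_neg = [-w if w < 0 else 0.0 for w in weights]  # N bit-cells hold |negative weights|
    return g_pos, g_neg


def signed_mac(g_pos: Sequence[float], g_neg: Sequence[float], x: Sequence[float]) -> float:
    """z_n = sum(g_mnP * x_m) - sum(g_mnN * x_m), i.e. sum(+/- g_mn * x_m)."""
    # Positive bit line currents add to the result, negative bit line currents subtract.
    return sum(gp * x_m - gn * x_m for gp, gn, x_m in zip(g_pos, g_neg, x))
```

For weights (0.5, −1.0, 2.0) and inputs (2.0, 1.0, 0.5), `signed_mac(*program_signed(weights), x)` gives 1.0 − 1.0 + 1.0 = 1.0.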
Reference is now made to
The current mirror circuits 221P, 221N, . . . , 22mP, 22mN in
In an alternative embodiment, however, B and C are not equal. In this case, the currents iBL1P, iBL1N generated by the current mirror circuits 221P, 221N and the currents iBLmP, iBLmN generated by the current mirror circuits 22mP, 22mN are given different weights for the integration process. For example only, in an embodiment supporting binary weighting: B=2, C=1. The currents for the integration in this case will be binary weighted. For this implementation, the biasing circuits 201P, 201N, . . . , 20mP, 20mN connected to the current mirror circuits 221P, 221N, . . . , 22mP, 22mN will be driven in response to input vm, as shown in
It will be noted that with the use of the biasing circuit 20m embodiment as shown in
Reference is now made to
Each memory cell 14 includes a word line WL and a pair of bit lines BL. The pair of bit-cells 14mnW and 14mnS in the memory cells 14 in a common row of the matrix are connected to each other through a common word line WL<n>. The weight bit-cells 14mnW of the memory cells 14 in a common column of the matrix are connected to each other through a common weight bit line BL<m>W, while the sign bit-cells 14mnS of the memory cells 14 in a common column of the matrix are connected to each other through a common sign bit line BL<m>S.
The word lines WL<1>, . . . , WL<n> are driven by a word line driver circuit 18 which generates word line signals 16 in response to a received address signal (Address). The word line driver circuit 18 decodes the Address and applies the pulse of the word line signal 16 to one word line WL at a time (illustrated here, as an example, as being applied to word line WL<1>). The pulse width of each word line signal 16 is fixed and defined by an on time TON.
It is important to note here that the activation of one word line WL at a time performs a single multiply and accumulate (MAC) operation. In order to perform matrix-vector multiplication (MVM), where k MAC operations are implemented (k being less than or equal to n), a sequence of k word line WL activations is required. Consequently, k word line WL on time (TON) cycles are necessary for the performance of one full MVM operation.
Biasing circuitry 20 applies a bias (time, voltage and/or current) to the weight bit lines BL in response to feature (or coefficient) data x input to the in-memory computation circuit 10. This feature data may, for example, comprise a plurality of multi-bit digital signals x1, . . . , xm that are processed by the biasing circuits 201W, . . . , 20mW to generate the bias applied to the corresponding weight bit lines BL<1>W, . . . , BL<m>W. The analog signal ymW on the weight bit line BL<m>W (i.e., the weight bit line charge) at the column output is then dependent on a product between the bias applied to the bit line and the transconductance gmnW (which corresponds to the programmed resistivity) of the weight bit-cell of the memory cell 14mn selected by the word line WL to which the word line signal 16 is applied. In other words, the weight bit-cell contributes a bit line current for the analog signal ymW that is proportional to xm×gmnW. So, for an example where the word line signal 16 is applied to word line WL<1>, and the weight bit-cell 1411W of the memory cell 1411 is programmed with the in-memory computation weight, the analog signal y1W current on the weight bit line BL<1>W is proportional to x1×g11W. A similar operation is performed for each column.
The biasing circuits 201W, . . . , 20mW of the biasing circuit 20 may have any one of the circuit configurations as shown in
Sign bit lines BL<m>S can optionally have biasing circuitry 20, but the applied bias is not dependent on the feature (or coefficient) data x input to the in-memory computation circuit 10. Circuit embodiments for the biasing circuits 20mS on the sign bit lines BL<m>S are shown in
The analog signal ymS on the sign bit line BL<m>S (i.e., the sign bit line charge) is dependent on the transconductance gmnS (which corresponds to the programmed resistivity) of the sign bit-cell of the memory cell 14mn selected by the word line WL to which the word line signal 16 is applied. So, for an example where the word line signal 16 is applied to word line WL<1>, and the sign bit-cell 1411S of the memory cell 1411 is programmed with a positive sign, there is a zero analog signal y1S current on the sign bit line BL<1>S. Conversely, if the sign bit-cell 1411S of the memory cell 1411 is programmed with a negative sign, there is a non-zero analog signal y1S current on the sign bit line BL<1>S. A similar operation is performed for each column.
A combining circuit 22 combines, for example through an integration operation, the analog signal y1W, . . . , ymW currents, as a function of the analog signal y1S, . . . , ymS currents (which indicate whether the weight gmnW is positive or negative), to generate a corresponding decision zn result for the MAC decision operation, where zn=(±g1n×x1)+(±g2n×x2)+ . . . +(±gmn×xm), and where the ± symbol indicates the taking into account of whether the weight is positive or negative. Further processing of the decision zn result may, for example, be made by converting the analog decision signal zn to a digital value using an analog-to-digital converter (ADC) which is then processed in a digital signal processing (DSP) circuit.
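A behavioural sketch of the sign-bit scheme follows. It assumes, hypothetically, that the combining circuit decides the sign by comparing each sign bit line current with a sense threshold (`threshold`), which is not specified in the text, and that weight magnitudes are taken from the weight bit-cells only. The function name is illustrative.

```python
from typing import Sequence


def sign_bit_mac(g_weight: Sequence[float], i_sign: Sequence[float],
                 x: Sequence[float], threshold: float) -> float:
    """z_n = sum(+/- g_mnW * x_m), with the sign taken from the sign bit line current."""
    total = 0.0
    for g, i_s, x_m in zip(g_weight, i_sign, x):
        # Zero sign bit line current -> positive weight; non-zero -> negative weight.
        negative = i_s > threshold
        product = g * x_m
        total += -product if negative else product
    return total
```

For weight magnitudes (1.0, 2.0), sign currents (0, 5 µA), inputs (3.0, 1.0) and a 1 µA threshold, the result is 3.0 − 2.0 = 1.0.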
Reference is now made to
The currents sourced to/sunk from the node 166 are applied to an integrator circuit 220. The output voltage for the decision zn result of the MAC decision operation is generated by the integrator circuit 220 across an integration capacitor Cout. The integrator circuit 220 includes a differential amplifier 222 having a non-inverting input terminal configured to receive an output reference voltage Vref,out. The integration capacitor Cout is connected in feedback between the output terminal of the amplifier 222 and the inverting input terminal. It is the sum of the currents iBL1P (or iBL1N), . . . , iBLmP (or iBLmN) applied to node 166 that is integrated on the output capacitor Cout. A switch formed, for example, by a MOSFET device connected in parallel with the capacitor Cout is selectively activated by a reset signal (Reset) to discharge the capacitor Cout at the beginning of each MAC decision operation.
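The integration step can be modelled in discrete time as below. This is a sketch assuming the usual inverting-integrator arrangement, in which the amplifier holds node 166 at Vref,out and a net current flowing into that node moves the output down by i·Δt/Cout per step; the sign convention, the time step and the function name are assumptions for illustration.

```python
from typing import List, Sequence


def integrate_currents(i_net: Sequence[float], dt: float, c_out: float,
                       v_ref_out: float) -> List[float]:
    """Output voltage after each time step of an ideal inverting integrator."""
    v_out = v_ref_out  # After Reset, Cout is discharged, so the output starts at Vref,out
    trace = []
    for i in i_net:
        # Net current into the summing node flows onto Cout: dV = -i * dt / Cout
        v_out -= i * dt / c_out
        trace.append(v_out)
    return trace
```

Ten steps of 1 µA for 1 ns into 1 pF move the output from 0.5 V to about 0.49 V; equal sourced and sunk currents leave it unchanged.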
The current mirror circuits 221, . . . , 22m in
In an alternative embodiment, however, B and C are not equal. In this case, the currents iBL1, iBLm generated by the current mirror circuits 221, 22m are given different weights for the integration process. For example only, in an embodiment supporting binary weighting: B=2, C=1. The currents iBL1, iBLm for the integration in this case will be binary weighted. For this implementation, the biasing circuits 201W, . . . , 20mW connected to the current mirror circuits 221, . . . , 22m will be driven in response to input vm as shown in
It will be noted that with the use of the biasing circuit 20m embodiment as shown in
Reference is now made to
Each memory cell 14 includes a word line WL and a pair of bit lines BL. The pair of bit-cells 14mnW and 14mnS in the memory cells 14 in a common row of the matrix are connected to each other through a common word line WL<n>. The weight bit-cells 14mnW of the memory cells 14 in a common column of the matrix are connected to each other through a common weight bit line BL<m>W, while the sign bit-cells 14mnS of the memory cells 14 in a common column of the matrix are connected to each other through a common sign bit line BL<m>S.
The word lines WL<1>, . . . , WL<n> are driven by a word line driver circuit 18 which generates word line signals 16 in response to a received address signal (Address). The word line driver circuit 18 decodes the Address and applies the pulse of the word line signal 16 to one word line WL at a time (illustrated here, as an example, as being applied to word line WL<1>). The pulse width of each word line signal 16 is fixed and defined by an on time TON.
It is important to note here that the activation of one word line WL at a time performs a single multiply and accumulate (MAC) operation. In order to perform matrix-vector multiplication (MVM), where k MAC operations are implemented (k being less than or equal to n), a sequence of k word line WL activations is required. Consequently, k word line WL on time (TON) cycles are necessary for the performance of one full MVM operation.
The circuit of
The biasing circuitry 20 applies a bias (time, voltage and/or current) to the weight bit lines BL in response to the remaining bits xmV of the feature (or coefficient) data. The bits x1V, . . . , xmV are processed by the biasing circuits 201W, . . . , 20mW to generate the bias applied to the corresponding weight bit lines BL<1>W, . . . , BL<m>W. The analog signal ymW on the weight bit line BL<m>W (i.e., the weight bit line charge) at the column output is then dependent on a product between the bias applied to the bit line and the transconductance gmnW (which corresponds to the programmed resistivity) of the weight bit-cell of the memory cell 14mn selected by the word line WL to which the word line signal 16 is applied. In other words, the weight bit-cell contributes a bit line current for the analog signal ymW that is proportional to xmV×gmnW. So, for an example where the word line signal 16 is applied to word line WL<1>, and the weight bit-cell 1411W of the memory cell 1411 is programmed with the in-memory computation weight, the analog signal y1W current on the weight bit line BL<1>W is proportional to x1V×g11W. A similar operation is performed for each column.
The biasing circuits 201W, . . . , 20mW of the biasing circuitry 20 may have any one of the circuit configurations as shown in
There is also biasing circuitry 20 for the sign bit lines BL<m>S, but the applied bias is not dependent on the feature (or coefficient) data x input to the in-memory computation circuit 10. Circuit embodiments for the biasing circuits 20mS on the sign bit lines BL<m>S are shown in
The analog signal ymS on the sign bit line BL<m>S (i.e., the sign bit line charge) is dependent on the transconductance (which corresponds to the programmed resistivity) of the sign bit-cell of the memory cell 14mn selected by the word line WL to which the word line signal 16 is applied. So, for an example where the word line signal 16 is applied to word line WL<1>, and the sign bit-cell 1411S of the memory cell 1411 is programmed with a positive sign, the analog signal y1S current on the sign bit line BL<1>S is zero. Conversely, if the sign bit-cell 1411S of the memory cell 1411 is programmed with a negative sign, the analog signal y1S current on the sign bit line BL<1>S is non-zero. A similar operation is performed for each column.
The combining circuit 22 combines, for example through an integration operation, the analog signal y1W, . . . , ymW currents, as a function of the analog signal y1S, . . . , ymS currents (which indicate whether the weight gmn is positive or negative) and the sign bit xmS (which indicates whether the multi-bit digital signal xm is a positive or negative feature data value), to generate a corresponding decision zn result for the MAC decision operation, where zn=(±g1n×±x1V)+(±g2n×±x2V)+ . . . +(±gmn×±xmV), and where the ± symbols indicate that the signs of the weight and of the feature data value are taken into account. The analog decision signal zn may be further processed, for example, by converting it to a digital value using an analog-to-digital converter (ADC) and then processing that digital value in a digital signal processing (DSP) circuit.
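The signed sum zn can be checked numerically with a sign-magnitude sketch, assuming ideal arithmetic. Each term contributes +|gmn|·xmV when the weight sign and feature sign agree and −|gmn|·xmV when they differ. The function name `signed_mac` and its argument layout (magnitudes and sign flags per column for one selected row n) are hypothetical.

```python
from typing import Sequence


def signed_mac(
    g_mag: Sequence[float],  # |g_mn| stored in the weight bit-cells of the selected row n
    w_neg: Sequence[bool],   # True where the sign bit-cell stores a negative sign
    x_mag: Sequence[int],    # magnitude bits xmV of each feature value
    x_neg: Sequence[bool],   # sign bit xmS of each feature value (True = negative)
) -> float:
    """Ideal value of zn = sum over m of (±g_mn) x (±xmV)."""
    if not (len(g_mag) == len(w_neg) == len(x_mag) == len(x_neg)):
        raise ValueError("all inputs must have one entry per column")
    z = 0.0
    for g, wn, xv, xn in zip(g_mag, w_neg, x_mag, x_neg):
        product = g * xv
        # The product is negative exactly when one, but not both, signs are negative.
        z += -product if wn != xn else product
    return z
```

With |g| = [2, 3, 1], weight signs [+, −, +], feature magnitudes [1, 2, 3] and feature signs [+, +, −], the terms are +2, −6 and −3, giving zn = −7.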
Reference is now made to
When the bit line current from the sign bit-cell 14mnS is less than the reference current iref, indicating that the sign bit-cell 14mnS stores a positive sign, the logic signal at node 174 is logic low. If the sign bit xmS is also logic low, indicating that the remaining bits xmV of the feature (or coefficient) data have a positive sign, the control signal for the current switching circuit output by the XOR gate is logic low, causing the transistor 162 to turn on (and the transistor 160 to turn off). The positive current iBLmP is then sourced to node 166.
When the bit line current from the sign bit-cell 14mnS is less than the reference current iref, indicating that the sign bit-cell 14mnS stores a positive sign, the logic signal at node 174 is logic low. If the sign bit xmS is logic high, indicating that the remaining bits xmV of the feature (or coefficient) data have a negative sign, the control signal for the current switching circuit output by the XOR gate is logic high, causing the transistor 160 to turn on (and the transistor 162 to turn off). The negative current iBLmN is then sunk from node 166.
When the bit line current from the sign bit-cell 14mnS is more than the reference current iref, indicating that the sign bit-cell 14mnS stores a negative sign, the logic signal at node 174 is logic high. If the sign bit xmS is logic low, indicating that the remaining bits xmV of the feature (or coefficient) data have a positive sign, the control signal for the current switching circuit output by the XOR gate is logic high, causing the transistor 160 to turn on (and the transistor 162 to turn off). The negative current iBLmN is then sunk from node 166.
When the bit line current from the sign bit-cell 14mnS is more than the reference current iref, indicating that the sign bit-cell 14mnS stores a negative sign, the logic signal at node 174 is logic high. If the sign bit xmS is also logic high, indicating that the remaining bits xmV of the feature (or coefficient) data have a negative sign, the control signal for the current switching circuit output by the XOR gate is logic low, causing the transistor 162 to turn on (and the transistor 160 to turn off). The positive current iBLmP is then sourced to node 166.
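The four cases above form a two-input XOR truth table. The sketch below models them behaviorally: a comparator against iref produces the node 174 level, the XOR with the sign bit xmS produces the control signal, and the control level selects which transistor conducts. The function names are hypothetical, and because the text does not say what happens when the sign bit-cell current exactly equals iref, this sketch treats that case as a positive sign.

```python
from typing import Tuple


def sign_cell_negative(i_sign: float, i_ref: float) -> bool:
    """Node 174 level: True (logic high) when the sign bit-cell current exceeds iref.

    Equality is undefined in the text; it is treated as a positive sign here.
    """
    return i_sign > i_ref


def steer_column_current(i_sign: float, i_ref: float, x_sign_high: bool) -> Tuple[str, int]:
    """Return (action, conducting transistor) for one column's current switching circuit."""
    node_174 = sign_cell_negative(i_sign, i_ref)
    control = node_174 ^ x_sign_high  # XOR of weight sign and feature sign bit xmS
    if control:
        # Control high: transistor 160 on, negative current iBLmN sunk from node 166.
        return ("sink iBLmN", 160)
    # Control low: transistor 162 on, positive current iBLmP sourced to node 166.
    return ("source iBLmP", 162)
```

Only the two "mixed sign" combinations sink current; matching signs, whether both positive or both negative, source current, exactly as a product of two signs behaves.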
The currents sourced to/sunk from the node 166 are applied to an integrator circuit 220. The output voltage for the decision zn result of the MAC decision operation is generated by the integrator circuit 220 across an integration capacitor Cout. The integrator circuit 220 includes a differential amplifier 222 having a non-inverting input terminal configured to receive an output reference voltage Vref,out. The integration capacitor Cout is connected in feedback between the output terminal of the amplifier 222 and the inverting input terminal. It is the sum of the currents iBL1P (or iBL1N), . . . , iBLmP (or iBLmN) applied to node 166 that is integrated on the integration capacitor Cout. A switch, formed for example by a MOSFET device connected in parallel with the capacitor Cout, is selectively activated by a reset signal (Reset) to discharge the capacitor Cout at the beginning of each MAC decision operation.
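As a rough model of the integration step, assume an ideal amplifier with Cout providing negative feedback to the summing node 166. After Reset discharges Cout, the output starts at Vref,out, and the net charge delivered to the summing node ends up on Cout, so Vout = Vref,out − Qnet/Cout. The function name `integrate_mac` and the sign convention (charge sourced via iBLmP counted as positive, charge sunk via iBLmN as negative) are assumptions of this sketch; the actual polarity depends on the circuit in the referenced figure.

```python
from typing import Sequence


def integrate_mac(
    charges: Sequence[float],  # signed charge per column in coulombs: + sourced (iBLmP), - sunk (iBLmN)
    c_out: float,              # integration capacitor Cout, farads
    v_ref_out: float,          # output reference voltage Vref,out, volts
) -> float:
    """Ideal integrator output voltage at the end of one MAC decision operation."""
    if c_out <= 0:
        raise ValueError("c_out must be positive")
    # Reset has discharged Cout, so the output starts at Vref,out.
    q_net = sum(charges)
    # Inverting integrator: net charge sourced into the summing node moves the output down.
    return v_ref_out - q_net / c_out
```

Equal sourced and sunk charges cancel and leave the output at Vref,out, which mirrors positive and negative products cancelling in zn.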
The current mirror circuits 221, . . . , 22m in
In an alternative embodiment, however, B and C are not equal. In this case, the currents iBL1, iBLm generated by the current mirror circuits 221, 22m are given different weights for the integration process. For example only, in an embodiment supporting binary weighting: B=2, C=1. The currents iBL1, iBLm for the integration in this case will be binary weighted. For this implementation, the biasing circuits 201W, . . . , 20mW connected to the current mirror circuits 221, . . . , 22m will be driven in response to input vm as shown in
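The effect of unequal mirror ratios can be sketched as a weighted sum of the column currents before integration. The function name `weighted_mirror_sum` is hypothetical. One possible reading (an assumption, not stated in the text) is that binary-weighted columns hold successive bits of a multi-bit quantity, so that ratios B=2, C=1, or B=4, C=2, D=1 in the three-mirror variant, reassemble that quantity in the integrated current.

```python
from typing import Sequence


def weighted_mirror_sum(column_currents: Sequence[float], ratios: Sequence[int]) -> float:
    """Total current passed to the integrator when mirror k scales column k by ratios[k]."""
    if len(column_currents) != len(ratios):
        raise ValueError("one mirror ratio per column is required")
    return sum(r * i for r, i in zip(ratios, column_currents))
```

With B=2 and C=1, unit currents on both columns integrate as 3 units, while a unit current on only the first column integrates as 2 units, so the first column carries twice the significance of the second.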
It will be noted that with the use of the biasing circuit 20m embodiment as shown in
Reference is now made to
Each memory cell 14 is connected to a word line WL and a bit line BL. The memory cells 14 in a common row of the matrix are connected to each other through a common word line WL. The memory cells 14 in a common column of the matrix are connected to each other through a common bit line BL. The word lines WL<1>, . . . , WL<n> are driven by a word line driver circuit 18 which generates word line signals 16 in response to a received address signal (Address). The word line driver circuit 18 decodes the Address and applies the pulse of the word line signal 16 to one word line WL at a time (illustrated here, as an example, as being applied to word line WL<1>). The pulse width of each word line signal 16 is fixed and defined by an on time TON.
It is important to note here that the activation of one word line WL at a time performs a single multiply and accumulate (MAC) operation. To perform a matrix-vector multiplication (MVM) comprising k MAC operations (k being less than or equal to n), a sequence of k word line WL activations is required. Consequently, k word line on-time (TON) cycles are necessary to perform one full MVM operation.
The circuit of
The biasing circuitry 20 applies a bias (time, voltage and/or current) to the bit lines BL in response to the remaining bits xmV of the feature (or coefficient) data. The bits x1V, . . . , xmV are processed by the biasing circuits 201, . . . , 20m to generate the bias applied to the corresponding bit lines BL<1>, . . . , BL<m>. The analog signal ym on the bit line BL<m> (i.e., the bit line charge) at the column output is then dependent on a product between the bias applied to the bit line and the transconductance (which corresponds to the programmed resistivity) of the memory cell 14mn selected by the word line WL to which the word line signal 16 is applied. In other words, the memory cell contributes a bit line current for the analog signal ym that is proportional to xmV×gmn. So, for an example where the word line signal 16 is applied to word line WL<1>, and the memory cell 1411 is programmed with the in-memory computation weight, the analog signal y1 current on the bit line BL<1> is proportional to x1V×g11.
The biasing circuits 201, . . . , 20m of the biasing circuitry 20 may have any one of the circuit configurations as shown in
A combining circuit 22 combines, for example through an integration operation, the analog signal y1, . . . , ym currents as a function of the sign bit xmS (which indicates whether the multi-bit digital signal xm is a positive or negative feature data value), to generate a corresponding decision zn result for the MAC decision operation, where zn=(g1n×±x1V)+(g2n×±x2V)+ . . . +(gmn×±xmV), and where the ± symbol indicates that the sign of the feature data is taken into account. The analog decision signal zn may be further processed, for example, by converting it to a digital value using an analog-to-digital converter (ADC) and then processing that digital value in a digital signal processing (DSP) circuit.
Reference is now made to
When the sign bit xmS is logic low, indicating that the remaining bits xmV of the feature (or coefficient) data have a positive sign, the control signal for the current switching circuit is logic low, causing the transistor 162 to turn on (and the transistor 160 to turn off). The positive current iBLmP is then sourced to node 166.
When the sign bit xmS is logic high, indicating that the remaining bits xmV of the feature (or coefficient) data have a negative sign, the control signal for the current switching circuit is logic high, causing the transistor 160 to turn on (and the transistor 162 to turn off). The negative current iBLmN is then sunk from node 166.
The currents sourced to/sunk from the node 166 are applied to an integrator circuit 220. The output voltage for the decision zn result of the MAC decision operation is generated by the integrator circuit 220 across an integration capacitor Cout. The integrator circuit 220 includes a differential amplifier 222 having a non-inverting input terminal configured to receive an output reference voltage Vref,out. The integration capacitor Cout is connected in feedback between the output terminal of the amplifier 222 and the inverting input terminal. It is the sum of the currents iBL1P (or iBL1N), . . . , iBLmP (or iBLmN) applied to node 166 that is integrated on the integration capacitor Cout. A switch, formed for example by a MOSFET device connected in parallel with the capacitor Cout, is selectively activated by a reset signal (Reset) to discharge the capacitor Cout at the beginning of each MAC decision operation.
The current mirror circuits 221, . . . , 22m in
In an alternative embodiment, however, B, C and D are not equal. In this case, each of the currents iBL1, iBL2, . . . , iBLm generated by the current mirror circuits 221, 222, . . . , 22m is given a different weight for the integration process. For example only, in an embodiment supporting binary weighting, B=4, C=2 and D=1, so that the currents iBL1, iBL2, . . . , iBLm for the integration are binary weighted. For this implementation, the biasing circuits 201, . . . , 20m connected to the current mirror circuits 221, . . . , 22m will be driven in response to input vm as shown in
It will be noted that with the use of the biasing circuit 20m embodiment as shown in
Reference is now made to
The foregoing description has provided, by way of exemplary and non-limiting examples, a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. All such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.
Number | Date | Country
---|---|---
20230326524 A1 | Oct 2023 | US