This application claims priority to Korean Patent Application No. 10-2022-0093494 (filed on Jul. 27, 2022), which is hereby incorporated by reference in its entirety.
The technology relates to a computation apparatus in a memory for computation of signed weights.
In conventional computation apparatuses, the memory that stores data and the computation unit that performs computation are separate from each other. To perform a computation, data stored in the memory is fetched and moved to the computation unit, the computation is performed, and the result is then moved back to the memory. Such frequent data transfer causes a time delay and significant power consumption. In order to resolve this issue, a computation-in-memory (CIM) structure that performs computation within the memory has been proposed.
Meanwhile, neural networks are attracting attention. A neural network requires multiply and accumulate (MAC) computation, which consists of a multiplication operation that multiplies an input element by a weight element of a weight matrix and an accumulation operation that accumulates the multiplication results. A simple neural network may be implemented using unsigned weight values.
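As a purely software illustration of the MAC computation described above, the following minimal sketch multiplies each input element by its weight element and accumulates the products. The function name `mac` and the example values are assumptions chosen for illustration and do not appear in the specification.

```python
def mac(inputs, weights):
    """Multiply each input by its weight and accumulate the products."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = 0
    for x, w in zip(inputs, weights):
        acc += x * w  # multiplication followed by accumulation
    return acc


# Illustrative values only: binary inputs and signed integer weights.
inputs = [1, 0, 1, 1]
signed_weights = [3, -2, -4, 1]
result = mac(inputs, signed_weights)  # 3 + 0 - 4 + 1
print(result)
```

With signed weights, positive and negative contributions can cancel, which is not possible when only unsigned weights are available.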
However, according to a CIM structure adopting a non-volatile memory, upon a reboot after a power-off, it is required to read data from an external device in which the data is stored and to store the data in the non-volatile memory again to perform computation. This data transfer causes a significant time delay and power consumption.
Furthermore, when implementing a neural network, it is difficult to ensure accuracy when using unsigned weights in a complex network, such as a convolution layer.
The embodiment is intended to resolve the above-described issues of the related art. In other words, the embodiment is directed to providing an apparatus for a neural network computation that reduces the time delay and power consumption caused by data transfer while achieving high accuracy.
The present embodiment includes a computation apparatus located in a memory module and configured to perform computation with data stored in the memory, the computation apparatus including: a plurality of word lines to which an input is provided, a plurality of unit arrays which store a weight having a sign and perform a multiplication operation on the input provided from the word line and the weight, and an accumulation line connected to the plurality of unit arrays and on which results of the multiplication operations performed by the plurality of unit arrays are accumulated, wherein each of the plurality of unit arrays includes a source follower amplifier including a ferroelectric transistor configured to output a voltage corresponding to a result of the multiplication operation with respect to an input voltage provided to the word line.
According to an aspect of the present embodiment, the unit array includes a plurality of source follower amplifiers, a multiplication line connected to sources of the plurality of source follower amplifiers, and a transfer switch configured to control a connection between the multiplication line and the accumulation line.
According to an aspect of the present embodiment, a drain of the source follower amplifier is supplied with a predetermined voltage.
According to an aspect of the present embodiment, the ferroelectric transistor has a threshold voltage corresponding to the weight stored in the ferroelectric transistor, and the source follower amplifier outputs a voltage corresponding to a difference between the threshold voltage and a voltage corresponding to the provided input as a result of the multiplication operation to the multiplication line.
According to an aspect of the present embodiment, the computation apparatus further includes a pre-charge circuit configured to pre-charge the accumulation line with a pre-charge voltage, and when the input corresponds to a logic low, the transfer switch is turned on such that the voltage of the multiplication line is set to the pre-charge voltage.
According to an aspect of the present embodiment, the pre-charge voltage corresponds to a signed zero value.
According to an aspect of the present embodiment, the transfer switch is turned on after the multiplication operations of the plurality of unit arrays are completed to allow a voltage of the accumulation line to correspond to a result obtained by accumulating the results of the multiplication operations.
According to an aspect of the present embodiment, the computation apparatus further includes a discharge switch configured to discharge charges charged in the accumulation line to a reference voltage.
The present embodiment includes a unit array located in a memory module and configured to perform a multiplication operation with data stored in the memory, the unit array including: a plurality of word lines to which an input is provided; a multiplication line from which a result of a multiplication operation is output; and a plurality of source follower amplifiers composed of a ferroelectric transistor including a drain to which a predetermined voltage is provided, a gate to which the input is provided, and a source for outputting a result of a multiplication operation of the weight value and the input to the multiplication line, wherein the source follower amplifier outputs a difference between the input and a threshold voltage of the ferroelectric transistor as the result of the multiplication operation.
According to an aspect of the present embodiment, the plurality of source follower amplifiers store different pieces of weight information as different threshold voltages.
According to an aspect of the present embodiment, an input is provided to one of the plurality of word lines, and the source follower amplifier provided with the input outputs the result of the multiplication operation.
According to an aspect of the present embodiment, the unit array further includes an accumulation line connected to the multiplication line and on which the multiplication result is accumulated, and one or more unit arrays connected to the accumulation line, wherein results of multiplication operations of the one or more unit arrays are accumulated on the accumulation line.
According to the embodiment, an apparatus for a neural network computation that reduces the time delay and power consumption caused by data transfer while achieving high accuracy can be provided.
Hereinafter, the present embodiment will be described with reference to the accompanying drawings.
In one embodiment, the computation apparatus 10 further includes a pre-charge circuit 200. The pre-charge circuit 200 includes a pre-charge transistor that is turned on with a pre-charge signal PRE to pre-charge the accumulation lines AL0 and AL1 with a pre-charge voltage VPRE. In the embodiment shown in
In one embodiment, the computation apparatus 10 further includes a discharge circuit 300. The discharge circuit 300 is turned on with a discharge signal DSC to drain the charges stored in the accumulation lines AL0 and AL1 to a reference potential VSS, thereby discharging the accumulation lines AL0 and AL1. The unit array 100 includes a source follower amplifier 110 composed of a plurality of ferroelectric transistors. A drain of the ferroelectric transistor included in the source follower amplifier 110 is supplied with a preset voltage VSL, and a gate of the ferroelectric transistor is connected to a word line WL0_0, . . . , WL0_N, WL1_0, . . . , or WL1_N to receive an input. In one embodiment, the preset voltage VSL may be a VDD voltage. A source of the ferroelectric transistor is connected to a multiplication line ML0_0, ML0_1, ML1_0, or ML1_1 and outputs a voltage corresponding to a result of a multiplication of the input provided through the gate and the weight stored in the ferroelectric transistor.
The ferroelectric layer may be formed of a ferroelectric material. The ferroelectric material is a material that spontaneously polarizes and forms dipoles even when an electric field is not applied from the outside. When the ferroelectric material is supplied with a voltage greater than or equal to a critical voltage, dipoles formed in the ferroelectric layer are aligned according to a direction of the electric field. In addition, when the ferroelectric material is supplied with an opposite voltage greater than or equal to a critical voltage, the dipoles formed in the ferroelectric layer are aligned according to a direction of the electric field that is formed in the opposite direction. In
Referring to
Orienting positive poles of dipoles toward a substrate brings about an effect similar to that of a decrease in a threshold voltage of a transistor. Thus, in a case in which an electric field is applied such that a sufficiently large number of positive poles of dipoles are oriented toward the substrate, a channel is formed between the source and the drain even when a voltage lower than a voltage in a second state is supplied through the gate electrode, which is indicated by a red diagram shown in
Referring to
Orienting negative poles of dipoles toward a substrate brings about an effect similar to that of an increase in a threshold voltage of a transistor. Thus, in a case in which an electric field is applied such that a sufficiently large number of negative poles of dipoles are oriented toward the substrate, a higher voltage needs to be supplied in the second state compared to the voltage supplied to the gate electrode in the first state in order to flow the same current as that in the first state, which is indicated by a blue diagram shown in
Examples of
Referring to
The source follower amplifier 110 outputs a voltage corresponding to a difference between the input voltage provided to the gate and the threshold voltage to a source. Such a voltage relationship is expressed as the following equation.
$$V_O = V_{IN} - V_{TH} \qquad \text{[Equation 1]}$$
($V_O$: an output voltage of a source follower amplifier, $V_{IN}$: an input voltage of a source follower amplifier, $V_{TH}$: a threshold voltage)
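The relationship in Equation 1 can be sketched numerically as follows. This is an idealized model that only restates the equation; the function name and the voltage values are hypothetical and are not taken from the specification.

```python
def source_follower_output(v_in, v_th):
    """Ideal source follower per Equation 1: V_O = V_IN - V_TH (volts)."""
    return v_in - v_th


# Hypothetical values: one gate voltage and three programmed thresholds.
v_in = 2.0
for v_th in (0.5, 1.0, 1.5):
    v_o = source_follower_output(v_in, v_th)
    print(f"V_TH = {v_th:.2f} V -> V_O = {v_o:.2f} V")
```

In this model, a higher programmed threshold voltage yields a lower output voltage for the same input, which is how different stored weights produce different voltages on the source.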
The computation apparatus 10 according to the present embodiment may operate in a first phase P1, in which each of the unit arrays performs a multiplication operation on an input and a weight, and a second phase P2, in which the results computed in the first phase P1 are accumulated. As an embodiment, the operation of the computation apparatus 10 may further include a third phase P3, in which the accumulation line holding the accumulated result is discharged to initialize the operation results.
In the first phase P1, a preset voltage VSL is provided to the drains of the source follower amplifiers 110 included in the unit array 100. The drains of the source follower amplifiers 110 included in the computation apparatus 10 may all be electrically connected, and the voltage provided in the first phase P1 may be a voltage of VDD, which is a driving voltage of the computation apparatus 10.
A pre-charge signal PRE is provided to turn a pre-charge transistor on. As the pre-charge transistor is turned on, the accumulation lines AL0 and AL1 are pre-charged with a pre-charge voltage VPRE.
An input is provided to a unit array 100 through one of a plurality of word lines connected to the unit array 100. In the embodiment shown in
Each of the ferroelectric transistors of the source follower amplifiers 110 included in the unit array 100 stores a threshold voltage corresponding to a different weight value. Therefore, the source follower amplifier 110 connected to the word line through which an input is provided outputs a voltage corresponding to the difference between the input voltage and the threshold voltage to the source electrode.
Table 1 is a table that illustrates, when ferroelectric transistors store threshold voltages corresponding to a signed weight of 3 bits, multiplication results of an input and a weight and voltages of the multiplication line formed by the source follower amplifier 110.
Referring to Table 1, the operation results of a signed weight and an input are indicated as VG−VTH_0, VG−VTH_1, VG−VTH_2, VG−VTH_3, VG−VTH_4, VG−VTH_5, VG−VTH_6, and VG−VTH_7. Among the results, an operation result of “0” corresponds to a voltage of VG−VTH_4, which is not a ground potential as in the conventional technology but a pre-charge voltage VPRE.
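One way to picture the encoding is the sketch below, which chooses a threshold voltage for each signed 3-bit weight so that the source follower output equals the pre-charge voltage when the weight is 0. The linear spacing `v_step`, the weight range of -4 to 3, the direction of the mapping, and all voltage values are assumptions made for illustration; the specification only states that the zero result corresponds to VPRE.

```python
def threshold_for_weight(weight, v_g, v_pre, v_step):
    """Return V_TH such that V_G - V_TH = V_PRE + weight * v_step.

    Assumes a signed 3-bit weight in [-4, 3] and evenly spaced levels.
    """
    if not -4 <= weight <= 3:
        raise ValueError("weight must be a signed 3-bit value in [-4, 3]")
    return v_g - v_pre - weight * v_step


# Hypothetical operating point (volts).
V_G, V_PRE, V_STEP = 2.0, 1.0, 0.125

for w in range(-4, 4):
    v_th = threshold_for_weight(w, V_G, V_PRE, V_STEP)
    print(f"weight {w:+d}: V_TH = {v_th:.3f} V, V_G - V_TH = {V_G - v_th:.3f} V")
```

Under these assumptions, positive weights produce outputs above VPRE and negative weights produce outputs below it, so the sign of the weight is carried by the position of the output relative to the pre-charge voltage.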
A logic low voltage corresponding to a digital bit “0” may be provided through the word line WL0_0, . . . , or WL0_N as an input, in which case, an operation result of an input and a weight is 0, regardless of the weight value. When an input provided through the word line is 0, a transfer transistor MTR0 is turned on with a transfer signal SGD0, and the multiplication lines ML0_0 and ML0_1 are pre-charged with a pre-charge voltage VPRE, which is a voltage corresponding to the operation result, that is, 0.
A logic high voltage corresponding to a digital bit “1” is provided through the word line WL1_0 as an input. A transfer transistor MTR1 is turned off by a transfer signal SGD1. The source follower amplifier 110 outputs a voltage corresponding to a multiplication operation result of a digital bit “1” and a weight value. The voltage corresponding to the multiplication result is indicated as VG−VTH_0, VG−VTH_1, VG−VTH_2, VG−VTH_3, VG−VTH_4, VG−VTH_5, VG−VTH_6, and VG−VTH_7, each of which corresponds to the difference between the input voltage provided to the word line and the threshold voltage programmed in the ferroelectric transistor.
Therefore, the multiplication lines ML1_0 and ML1_1 connected to the outputs of the source follower amplifiers 110 provided with the logic high voltage corresponding to the digital bit “1” as an input are charged to a voltage corresponding to the multiplication results by the source follower amplifiers 110.
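The behavior of a multiplication line in the first phase P1 for the two input cases can be summarized in a small behavioral sketch. The function name and the voltage values are hypothetical, and the model ignores all circuit non-idealities.

```python
def multiplication_line_voltage(input_bit, v_g, v_th, v_pre):
    """Voltage on a multiplication line at the end of phase P1 (idealized)."""
    if input_bit not in (0, 1):
        raise ValueError("input_bit must be 0 or 1")
    if input_bit == 0:
        # Transfer transistor on: the line takes the pre-charge voltage,
        # which represents a product of 0 regardless of the weight.
        return v_pre
    # Transfer transistor off: the source follower drives the line
    # to the difference between the gate voltage and the threshold.
    return v_g - v_th


# Hypothetical voltages (volts).
print(multiplication_line_voltage(0, v_g=2.0, v_th=0.75, v_pre=1.0))
print(multiplication_line_voltage(1, v_g=2.0, v_th=0.75, v_pre=1.0))
```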
Subsequently, transfer signals SGD0 and SGD1 are provided to turn the transfer transistors MTR on. As the transfer transistors MTR are turned on, charges charged in the multiplication line ML0_0 and the multiplication line ML1_0 are transferred to the accumulation line AL0 and thus charge-shared, and charges charged in the multiplication line ML0_1 and the multiplication line ML1_1 are transferred to the accumulation line AL1 and thus charge-shared.
As the charge sharing is performed, charges having been present in each of the multiplication lines are redistributed to each of the accumulation lines AL0 and AL1 to form a new voltage, and the voltage newly formed in the accumulation line corresponds to a result obtained by accumulating a multiplication result. Thus, as the voltage formed in the accumulation line AL0 or AL1 in the second phase P2 is detected, a multiply and accumulate (MAC) operation result may be obtained.
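The charge sharing in the second phase P2 can be approximated with an ideal charge-conservation model, sketched below. The specification gives no capacitance values or formula; the capacitances, voltages, and function name here are assumptions, and parasitic effects are ignored.

```python
def charge_share(voltages, capacitances):
    """Final common voltage after ideal charge sharing between capacitors.

    Total charge sum(C_i * V_i) is conserved and spread over sum(C_i).
    """
    if len(voltages) != len(capacitances):
        raise ValueError("voltages and capacitances must have the same length")
    total_capacitance = sum(capacitances)
    if total_capacitance <= 0:
        raise ValueError("total capacitance must be positive")
    total_charge = sum(c * v for c, v in zip(capacitances, voltages))
    return total_charge / total_capacitance


# Hypothetical case: two multiplication lines and one accumulation line,
# all with equal capacitance. The accumulation line starts at V_PRE = 1.0 V.
V_PRE = 1.0
ml_voltages = [1.25, 0.75]          # one product above V_PRE, one below
voltages = ml_voltages + [V_PRE]
capacitances = [1.0, 1.0, 1.0]      # arbitrary equal units
v_al = charge_share(voltages, capacitances)
print(v_al)
```

In this example the two products are equally far above and below VPRE, so the shared voltage settles back at VPRE, which corresponds to an accumulated result of 0 in this model.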
In addition, as shown in
Thus, charges charged in the multiplication line ML may also be transferred to the accumulation line in the discharge phase P3 and discharged together.
Experiment Result
The present embodiment provides a benefit of reducing area consumption compared to the related art using two transfer transistors or an ReRAM and STT-MRAM by constructing a unit array with one transfer transistor and N ferroelectric transistors, and provides a benefit of increasing the accuracy in complex networks and/or data sets by performing computation using signed weights.
Although embodiments of the present invention have been described with reference to the accompanying drawings, this is for illustrative purposes, and those of ordinary skill in the art should appreciate that various modifications, equivalents, and other embodiments are possible without departing from the scope and spirit of the present invention. Therefore, the scope of the present invention is defined by the appended claims of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
10-2022-0093494 | Jul 2022 | KR | national |