The present disclosure relates to in-memory computing.
In-memory computing (IMC) is a widely used technique that performs low-precision computations inside memory elements to break the von Neumann bottleneck in conventional artificial intelligence/machine learning (AI/ML) hardware. Static random access memory (SRAM)-based IMC offers energy efficiency and ready integration with complementary metal-oxide semiconductor (CMOS) integrated circuits (ICs). A fundamental limitation of SRAM-based IMC is nonlinearity in the multiply-and-accumulate (MAC) operation. For large MAC results, the proportionally large discharge current through the SRAM bitline pushes the access transistors into the linear region, making the discharge current, and hence the MAC result, a nonlinear function of bitline voltage. This limitation has been addressed by applying pulsed input activations and by adding capacitors to the SRAM bitcell for charge-domain computation, which is less sensitive to bitline voltage, and hence more linear, than current-domain computation in a traditional SRAM bitcell. While pulsed input makes each SRAM bitcell linear, accumulation of partial products is still performed in the current domain, so the overall MAC result remains nonlinear. A capacitive SRAM can improve linearity over current-domain accumulation by making the MAC result independent of the discharge current. However, linearity of the MAC is still limited because the analog input is sampled on the capacitor in the bitcell through an n-channel metal-oxide semiconductor (NMOS) switch. The analog input activation modulates the threshold voltage (Vth) of the NMOS switch, making the voltage sampled on the capacitor nonlinear. The Vth drop across the NMOS switch also limits the maximum input swing that the SRAM bitcell can handle and restricts the supply voltage to relatively high values.
The system and method in accordance with embodiments of the present disclosure address the fundamental nonlinearity in an SRAM bitcell by: 1) using delta-sigma modulators (DSMs) to convert analog input activations into binary pulse trains, and 2) using nine-transistor/one-capacitor (9T1C) SRAM bitcells to perform computations in the charge domain. Compared to a successive-approximation-register (SAR) or flash analog-to-digital converter (ADC), the resolution of a DSM can be reconfigured easily without requiring changes in hardware. For the same oversampling ratio (OSR), quantization noise shaping in a DSM allows higher resolution of the input activation than a counter-based technique that averages quantization error.
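The conversion of an analog input activation into a binary pulse train can be illustrated with a minimal behavioral sketch. This section does not state the modulator order or circuit, so the sketch assumes a first-order, 1-bit loop and an input normalized to the range 0 to 1; the names `dsm_pulse_train` and `decode` are hypothetical. Because the residual integrator value stays bounded, the error of the averaged pulse density shrinks as the OSR grows, with no change to the modulator itself.

```python
def dsm_pulse_train(x, osr):
    """Encode a constant input x (assumed normalized to [0, 1]) as `osr` bits.

    Assumption: first-order, 1-bit delta-sigma loop in error-feedback form.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    integrator = 0.0
    pulses = []
    for _ in range(osr):
        integrator += x                       # accumulate the input
        bit = 1 if integrator >= 0.5 else 0   # 1-bit quantizer
        integrator -= bit                     # feed the quantized value back
        pulses.append(bit)
    return pulses


def decode(pulses):
    """Estimate the input as the pulse density (a plain average)."""
    return sum(pulses) / len(pulses)


if __name__ == "__main__":
    # Same modulator, three resolutions: only the number of pulses changes.
    for osr in (16, 64, 256):
        print(osr, decode(dsm_pulse_train(0.3, osr)))
```

In this model the resolution is set purely by how many pulses are taken, which mirrors the reconfigurability described above.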
Embodiments of the system and method of the present disclosure improve the linearity of vector-matrix multipliers (VMMs) by including (1) delta-sigma modulators that convert the input and output activations into binary pulse trains, (2) charge-domain computation in each SRAM cell, which removes the nonlinear dependence of the multiplication result on bitline voltage and allows rail-to-rail output swing, and (3) a CMOS switch that transmits the input activation to the capacitor in the SRAM cell, which improves linearity by suppressing the dependence of the switch threshold voltage on the input activation voltage.
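The difference in input swing between an NMOS-only switch and a CMOS switch can be sketched with an idealized model. The model treats Vth as a constant and ignores its modulation by the input, and the supply and threshold values used in the example are hypothetical; the function names are illustrative only.

```python
def sample_nmos_switch(v_in, vdd, vth):
    """Voltage stored through an NMOS-only switch with its gate at vdd.

    Idealized assumption: conduction stops once the capacitor reaches
    vdd - vth, and vth is treated as a constant.
    """
    return min(v_in, vdd - vth)


def sample_cmos_switch(v_in, vdd):
    """Voltage stored through an ideal CMOS (transmission-gate) switch."""
    return min(max(v_in, 0.0), vdd)


if __name__ == "__main__":
    vdd, vth = 0.8, 0.3  # hypothetical values, for illustration only
    for v_in in (0.2, 0.5, 0.7):
        print(v_in, sample_nmos_switch(v_in, vdd, vth), sample_cmos_switch(v_in, vdd))
```

In this simplified picture the NMOS-only path clips inputs above the supply minus Vth, while the CMOS path passes the full rail-to-rail range.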
A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions. One or more circuits can be configured to perform particular operations or actions. One general aspect includes a method for improving linearity during multiply-and-accumulate (MAC) computations in an in-memory computing (IMC) macro. The method includes converting an input activation of an analog input into binary pulses, applying the binary pulses to the analog input to produce a pulsed input into the IMC macro, and performing charge-domain MAC computations on the pulsed input in the IMC macro to provide a plurality of read word line (RWL) output bits.
Implementations may include one or more of the following features. The method may include combining and weighting a plurality of RWL output bits based on bit positions of the plurality of RWL output bits. The IMC macro may include an array of capacitive bitcells having between six and twelve transistors inclusive. The method may include charging the array of capacitive bitcells by applying the pulsed input to a static random access memory (SRAM) capacitor through an RWL. The method may include creating the binary pulses using a delta-sigma modulator (DSM). The method may include reconfiguring the DSM to modify a binary pulse train. The method may include dynamically reconfiguring the input activation by changing an oversampling ratio (OSR) of the DSM. The IMC macro may include an array of 9T1C SRAM bitcells. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
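Combining and weighting outputs by bit position can be illustrated with a short bit-sliced sketch. The weight encoding is not specified in this section, so the example assumes unsigned multi-bit weights split into one binary column per bit position, least significant bit first; all function names are hypothetical.

```python
def bit_planes(weights, n_bits):
    """Split unsigned integer weights into binary planes, LSB first (assumed encoding)."""
    return [[(w >> k) & 1 for w in weights] for k in range(n_bits)]


def column_mac(inputs, plane):
    """MAC of the inputs against one binary weight plane."""
    return sum(x * b for x, b in zip(inputs, plane))


def combine_by_bit_position(column_results):
    """Weight the result for bit position k by 2**k and sum."""
    return sum(r * (1 << k) for k, r in enumerate(column_results))


if __name__ == "__main__":
    inputs = [1, 0, 1, 1]     # e.g. one cycle of binary input pulses
    weights = [3, 5, 2, 7]    # hypothetical 3-bit weights
    per_bit = [column_mac(inputs, p) for p in bit_planes(weights, 3)]
    print(per_bit, combine_by_bit_position(per_bit))
```

Under these assumptions the combined value equals the direct dot product of the inputs with the multi-bit weights.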
One general aspect includes a system for improving linearity during multiply-and-accumulate (MAC) computations in an in-memory computing (IMC) macro. The system includes a plurality of input delta-sigma modulators (DSMs). The plurality of DSMs converts an input activation into binary pulses and applies the binary pulses to an analog input. The IMC macro includes a bitcell array with weights. The IMC macro is configured to receive the pulsed analog input, and the bitcell array is configured to perform charge-domain MAC computations on the pulsed analog input to provide a plurality of read word line (RWL) output bits producing a binary pulse train. The charge-domain MAC computations are enabled by switched-capacitor circuits. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
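A charge-domain MAC enabled by switched capacitors can be modeled, very roughly, as charge sharing among equal capacitors. The bitcell internals are not described in this section, so the sketch assumes ideal switches, identical capacitors, binary weights, and a hypothetical supply of 1.0; it is not a model of the actual 9T1C cell.

```python
def charge_share(cell_voltages, cap=1.0):
    """Voltage after shorting equal capacitors together (charge conservation)."""
    total_charge = sum(cap * v for v in cell_voltages)
    return total_charge / (cap * len(cell_voltages))


def charge_domain_mac(pulses, weights, vdd=1.0):
    """One pulse cycle: a cell capacitor holds vdd only if its pulse AND weight are 1."""
    cell_voltages = [vdd * p * w for p, w in zip(pulses, weights)]
    return charge_share(cell_voltages)


if __name__ == "__main__":
    print(charge_domain_mac([1, 0, 1, 1], [1, 1, 0, 1]))
```

In this idealized model the shared voltage is exactly proportional to the number of cells whose pulse and weight are both 1, independent of any discharge current.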
Implementations may include one or more of the following features. The bitcell array is configured to combine and weight the plurality of RWL output bits based on bit positions of the plurality of RWL output bits. The bitcell array may include an array of capacitive bitcells having between six and twelve transistors inclusive. The binary pulse train is modified by reconfiguring the plurality of DSMs. The plurality of DSMs is configured to dynamically reconfigure the input activation based on an oversampling ratio (OSR) of the plurality of DSMs. The bitcell array is charged by applying the pulsed analog input to a static random access memory (SRAM) capacitor through an RWL. The IMC macro may include an array of 9T1C SRAM bitcells. The system may include a plurality of output DSMs configured to provide output activations for readout from the IMC macro. The output activations may be reconstructed by digital decimation of the outputs of the plurality of output DSMs. The system may include a loop filter including a dynamic amplifier. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
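Reconstruction of an output activation by digital decimation can be sketched as block averaging of the output pulse train. The decimation filter is not specified in this section; the example assumes the simplest choice, a boxcar average over each group of OSR bits, whereas practical decimators often use higher-order sinc filters. The function name is hypothetical.

```python
def decimate(pulse_train, osr):
    """Average consecutive blocks of `osr` bits (a boxcar decimation filter)."""
    if osr <= 0 or len(pulse_train) % osr != 0:
        raise ValueError("pulse_train length must be a positive multiple of osr")
    return [
        sum(pulse_train[i:i + osr]) / osr
        for i in range(0, len(pulse_train), osr)
    ]


if __name__ == "__main__":
    # Two output samples reconstructed from eight pulses at OSR = 4.
    print(decimate([1, 0, 1, 1, 0, 0, 1, 0], 4))
```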
A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions. One general aspect includes a method for improving linearity during multiply-and-accumulate (MAC) computations in an in-memory computing (IMC) macro. The method includes sending read bit lines (RBLs) from a static random access memory (SRAM) array to a delta-sigma modulator (DSM), where the SRAM includes capacitors. The method includes combining, by the DSM, voltages on the SRAM RBLs with binary weights to produce a binary pulse train as an output activation, and performing a multi-cycle integration to combine outputs from the SRAM RBLs in the SRAM array with associated ones of the binary weights. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
Implementations may include one or more of the following features. The method can include determining the binary weights, which may include, for a least significant bit, sampling a first RBL voltage on a first capacitor and accumulating the sampled first RBL voltage on a feedback capacitor one time. The method can include, for a second bit, sampling a second RBL voltage on a second capacitor and accumulating the sampled second RBL voltage on a second feedback capacitor two times. The method can include, for a sign bit, sampling a third RBL voltage on a third capacitor and scaling the third RBL voltage by a factor of four on a third feedback capacitor. The method can include combining the sampled first RBL voltage with the sampled second RBL voltage and the sampled third RBL voltage to form an integrator output, computing a comparator output from the DSM by providing the integrator output to a 1-bit quantizer, and completing a loop of the DSM by feeding the comparator output back to the bottom plates of the capacitors during accumulation of the sampled RBL voltages. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
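The binary-weighted, multi-cycle integration followed by a 1-bit quantizer can be illustrated with a behavioral sketch. Several details are assumptions rather than statements from the text: a first-order loop, RBL voltages in the range 0 to `v_full_scale`, a feedback step equal to the full-scale weighted sum, and a positive polarity for the sign-bit contribution (the text gives only the factor of four). The function name is hypothetical.

```python
def output_dsm(v_lsb, v_bit1, v_sign, n_cycles, v_full_scale=1.0):
    """Binary-weighted accumulation (1x, 2x, 4x) inside a 1-bit delta-sigma loop.

    Assumptions: first-order loop; feedback step of (1 + 2 + 4) * v_full_scale;
    sign-bit contribution added with positive polarity.
    """
    feedback_step = (1 + 2 + 4) * v_full_scale
    integrator = 0.0
    bits = []
    for _ in range(n_cycles):
        integrator += v_lsb            # LSB: accumulated one time
        integrator += v_bit1           # second bit: accumulated two times
        integrator += v_bit1
        integrator += 4.0 * v_sign     # sign bit: scaled by a factor of four
        bit = 1 if integrator >= feedback_step / 2 else 0   # 1-bit quantizer
        integrator -= bit * feedback_step                   # close the loop
        bits.append(bit)
    return bits


if __name__ == "__main__":
    bits = output_dsm(0.2, 0.5, 0.1, 128)
    print(sum(bits) / len(bits))
```

Under these assumptions the pulse density of the output approximates the weighted sum of the three RBL voltages divided by the full-scale weighted sum.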
Those skilled in the art will appreciate the scope of the present disclosure and realize additional aspects thereof after reading the following detailed description of the preferred embodiments in association with the accompanying drawing figures.
The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the disclosure, and together with the description serve to explain the principles of the disclosure.
The embodiments set forth below represent the necessary information to enable those skilled in the art to practice the embodiments and illustrate the best mode of practicing the embodiments. Upon reading the following description in light of the accompanying drawing figures, those skilled in the art will understand the concepts of the disclosure and will recognize applications of these concepts not particularly addressed herein. It should be understood that these concepts and applications fall within the scope of the disclosure and the accompanying claims.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
It will be understood that when an element such as a layer, region, or substrate is referred to as being “on” or extending “onto” another element, it can be directly on or extend directly onto the other element or intervening elements may also be present. In contrast, when an element is referred to as being “directly on” or extending “directly onto” another element, there are no intervening elements present. Likewise, it will be understood that when an element such as a layer, region, or substrate is referred to as being “over” or extending “over” another element, it can be directly over or extend directly over the other element or intervening elements may also be present. In contrast, when an element is referred to as being “directly over” or extending “directly over” another element, there are no intervening elements present. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected” or “directly coupled” to another element, there are no intervening elements present.
Relative terms such as “below” or “above” or “upper” or “lower” or “horizontal” or “vertical” may be used herein to describe a relationship of one element, layer, or region to another element, layer, or region as illustrated in the Figures. It will be understood that these terms and those discussed above are intended to encompass different orientations of the device in addition to the orientation depicted in the Figures.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including” when used herein specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Referring now to
Referring now to
Continuing to refer to
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
The operational steps described in any of the exemplary embodiments herein are described to provide examples and discussion. The operations described may be performed in numerous different sequences other than the illustrated sequences. Furthermore, operations described in a single operational step may actually be performed in a number of different steps. Additionally, one or more operational steps discussed in the exemplary embodiments may be combined.
Those skilled in the art will recognize improvements and modifications to the preferred embodiments of the present disclosure. All such improvements and modifications are considered within the scope of the concepts disclosed herein and the claims that follow.
This application claims priority to, and the benefit of, U.S. Provisional Application No. 63/496,727, entitled “Delta-Sigma Modulator-Based Variable-Resolution Activation In-Memory Computing Macro,” and filed Apr. 18, 2023, which is hereby incorporated by reference in its entirety.
This invention was made with government funds under Grant No. CCF-1948331 awarded by the National Science Foundation. The government has certain rights in the invention.
Number | Date | Country
---|---|---
63/496,727 | Apr 2023 | US