The present disclosure relates to improvements in multiply-accumulate operations in binarized neural networks (BNN) utilizing digital-to-analog current-mode signal processing in integrated circuits (IC).
Matrix multiplication via a multiply-accumulate operation (MAC) is a fundamental computation required in machine learning. Digital computation engines based on expensive, advanced deep sub-micron semiconductor manufacturing can perform large numbers of complex multiply-accumulate operations in the cloud with precision and speed, but they are generally power hungry and costly, and they suffer latency delays. Typically, cloud-related latency and the power consumption of digital machine learning ICs are prohibitive for some near-the-edge, at-the-edge, or on-sensor applications. Moreover, while the cost of expensive ICs can be amortized over the longer life cycle of chips deployed in the cloud, the same may not be acceptable for ICs on or near sensors or at the edge of the network. Machine learning ICs for edge-based or on-device use generally have a shorter life cycle, and as such they cannot afford to be expensive. Also, on-device or on-sensor machine learning ICs generally target mass-market or consumer applications that are much more price sensitive than machine learning ICs deployed in cloud data centers. More importantly, safety and privacy concerns may prohibit some sensors and edge devices from delegating their machine learning tasks to the cloud. Imagine the data of a heart pacemaker, hearing aid, or residential digital electronics (e.g., smart door opener, smart fireplace control, smart home surveillance video, etc.) being sent to the cloud and intercepted or hacked. As such, machine learning computation tasks at the edge of the network, near sensors, or on sensors generally must be performed locally and not in the cloud, for safety and latency concerns as well as for low-cost reasons. 
Because of the shorter life cycles of edge-based devices, which may target cost-sensitive consumer or large-volume mass markets, the price cannot be high, which precludes fabricating such computational ICs with short life cycles on advanced deep sub-micron manufacturing, where the tooling and wafer costs are very high.
Moreover, we are near or at the end of Moore's Law, which means the semiconductor industry and IC makers cannot bank on chip costs continuing to fall going forward (e.g., halving every year, as they have for the last few decades). In other words, companies may not risk investing today in future mass markets by banking on expensive digital machine learning ICs whose prices may not come down much from here.
In summary, machine learning ICs with low current consumption, low power supply voltage, asynchronous operation, and safe and private (near- or on-device and on-sensor) computation are needed in order for smart devices and smart sensors to become free from the binds of cloud-based machine learning, free from the binds of wires, free from frequent battery recharges, and free from remote intelligence channeled through (smart) devices or (smart) sensors over public communication networks (which may be hacked, corrupted, or lost).
Latency and delays associated with computations delegated to the cloud can be restrictive or prohibitive for some machine learning applications, which makes computation on devices and or sensors essential.
Moreover, on- or near-device and on-sensor machine learning ICs must be inexpensive to manufacture and be based on low-cost, non-advanced shallow sub-micron (as opposed to expensive deep sub-micron) fabrication technologies. Low-cost machine learning ICs are needed in order to meet the shorter life cycles (which mandate lower amortization of tooling and fabrication costs over the shorter life of the IC) and the volume-maximizing needs of mass-market or consumer-centric on- or near-device and on-sensor applications.
It is an objective of the present disclosure to provide improvements to MACs for BNN including but not limited to the following, in part or combination thereof:
An objective of this disclosure is to provide mixed-signal ICs for MAC in BNNs that enable machine learning on sensors and or devices that is safe, private, and subject to minimal latency delays.
Another objective of this disclosure is to provide mixed-signal IC for MAC in BNNs that is small and low cost.
Another objective of this disclosure is to provide mixed-signal IC for MAC in BNNs that can operate with low power supply voltage (VDD).
Another objective of this disclosure is to provide mixed-signal IC for MAC in BNNs including the option of being arranged with in-memory-compute (IMC), which lowers the dynamic power consumption due to read-write cycles in and out of memory.
Another objective of this disclosure is to provide mixed-signal IC for MAC in BNNs that consumes low DC and stand-by operating current.
Another objective of this disclosure is to provide mixed-signal IC for MAC in BNNs with the option of operating clock-free and asynchronously, which minimizes latency delay and reduces clock-related power consumption.
Another objective of this disclosure is to provide mixed-signal IC for MAC in BNNs wherein internal analog signal full-scale to zero-scale span (e.g., summing node of a MAC or analog input of an Analog-To-Digital-Converter or analog input of a comparator) is less restricted by VDD.
Another objective of this disclosure is to provide mixed-signal IC for MAC in BNNs that can be manufacturable on low-cost, readily available, multi-source, and conventional Complementary-Metal-Oxide-Semiconductor (CMOS) fabrication.
Another objective of this disclosure is to provide mixed-signal IC for MAC in BNNs with the option of not requiring passive resistors and or capacitors, thus providing the option that the MAC's performance is mostly independent of passive resistors and or capacitors, which can help lower costs and improve manufacturing yield.
Another objective of the present disclosure is for a meaningful portion of the computation circuitry to shut itself off (i.e., 'smart self-power-down') in the face of no incoming signal, so that the computation circuits can remain 'always on' while consuming low stand-by current.
Another objective of the present disclosure is to lower the sensitivity of operating currents, bias currents, reference currents, summing current, and or output currents to power supply variations.
Another objective of the present disclosure is to lower the sensitivity of operating currents, bias currents, and reference currents to normal manufacturing variations (e.g., normal threshold voltage variations of transistor), which could improve the silicon die yield and lower the IC cost.
Another objective of this disclosure is to provide mixed-signal IC for MAC in BNNs with the option of operating in current mode.
Another objective of this disclosure is to provide mixed-signal IC for MAC in BNNs that is fast.
Another objective of this disclosure is to provide mixed-signal IC for MAC in BNNs that provide an option of not requiring switching capacitors for mixed-mode signal processing.
Another objective of the present disclosure is to provide mixed-signal IC for MAC in BNNs wherein the summation and subtraction functions of the MAC can be performed in analog or mixed mode (current-mode and or voltage mode).
Another objective of the present disclosure is to provide mixed-signal IC for MAC in BNNs wherein the digital XOR and or digital XNOR function is performed with mixed signals (current-mode and or voltage mode).
Another objective of the present disclosure is to provide mixed-signal IC for MAC in BNNs wherein both the digital XOR and or digital XNOR functions, as well as the summation of the outputs of a plurality of digital XOR and or digital XNOR, are performed with mixed signals (in current mode and or voltage mode) in one asynchronous cycle, which reduces latency delay.
Another objective of the present disclosure is to provide mixed-signal IC for MAC in BNNs wherein digital adders (which occupy larger die areas) are avoided. The present disclosure aims to eliminate the digital adding function of the bitwise count of logic state '1' (or ON state) in BNNs. Instead, the present disclosure aims to perform the bitwise count of logic state '1' (or ON state) in BNNs in analog current mode. By utilizing digital-input to analog-current-output XOR (iXOR) and or digital-input to analog-current-output XNOR (iXNOR), the current outputs of a plurality (e.g., 1000s) of iXORs and or iXNORs can simply be coupled together, in order to perform the summation (counting of logic state '1') function in mixed-signal current mode in one shot, asynchronously, and in an area-efficient (low-cost) manner.
Another objective of the present disclosure is for the non-linearity due to the non-systematic random statistical contribution of mismatches among the added current-mode signals to accumulate as the square root of the sum of the squares of such non-systematic random mismatches attributed to the plurality of the summing current signals.
Another objective of the present disclosure is for monotonic incremental accumulation of adding current signals.
An aspect of the embodiments disclosed herein includes a method of performing a multiply-accumulate operation (MAC) for binarized neural networks in an integrated circuit, the method comprising: supplying a regulated current source (IPSR) having a value of IPSR; mirroring and scaling IPSR onto a plurality of current sources (IS), each having a value ISV that is proportional to IPSR; individually gating each IS in the plurality of IS current sources, each gating responsive to a logical combination of one of an XOR and an XNOR of a corresponding pair of digital input signals (x, w), to generate a plurality of corresponding analog-output current sources (ISX), wherein each of the ISX current sources has a value that swings between substantially zero and substantially ISV responsive to the logical combination of the one of the XOR and the XNOR of the corresponding pair of digital input signals x, w; and combining the plurality of ISX current sources to generate an analog summation current (IS
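The claimed method can be sketched as a simple behavioral model (hypothetical Python; the IPSR value, the scale factor, and the bit vectors are illustrative assumptions, and the model abstracts away the transistors): a reference current is mirrored onto unit sources, each unit is gated by XOR(x, w), and coupling the gated outputs performs the summation.

```python
# Behavioral sketch (not the circuit) of the claimed current-mode MAC method.
# ipsr and scale are illustrative values, not taken from the disclosure.

def mac_bnn_current_mode(x_bits, w_bits, ipsr=1e-6, scale=1.0):
    """Return the analog summation current for one MAC row."""
    isv = scale * ipsr                      # each mirrored source IS has value ISV ~ scaled IPSR
    isx = [isv if (x ^ w) else 0.0          # gated ISX: ~ISV when XOR = 1, ~0 when XOR = 0
           for x, w in zip(x_bits, w_bits)]
    return sum(isx)                         # coupling the ISX outputs forms the summation current

# Three of the four (x, w) pairs differ, so the summed current is ~3 * ISV.
print(mac_bnn_current_mode([1, 0, 1, 1], [0, 1, 0, 1]))
```

The same model describes the XNOR variant by replacing `(x ^ w)` with `not (x ^ w)`.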
The subject matter presented herein is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and illustrations, and in which like reference numerals refer to similar elements, and in which:
The embodiment disclosed in
Numerous embodiments are described in the present application; they are presented for illustrative purposes only and are not intended to be exhaustive. The embodiments were chosen and described to explain principles of operation and their practical applications. The present disclosure is not a literal description of all embodiments of the disclosure(s). The described embodiments also are not, and are not intended to be, limiting in any sense. One of ordinary skill in the art will recognize that the disclosed embodiment(s) may be practiced with various modifications and alterations, such as structural, logical, and electrical modifications. For example, the present disclosure is not a listing of features which must necessarily be present in all embodiments. On the contrary, a variety of components are described to illustrate the wide variety of possible embodiments of the present disclosure(s). Although particular features of the disclosed embodiments may be described with reference to one or more particular embodiments and/or drawings, it should be understood that such features are not limited to usage in the one or more particular embodiments or drawings with reference to which they are described, unless expressly specified otherwise. The scope of the disclosure is to be defined by the claims.
Although process (or method) steps may be described or claimed in a particular sequential order, such processes may be configured to work in different orders. In other words, any sequence or order of steps that may be explicitly described or claimed does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order possible. Further, some steps may be performed simultaneously despite being described or implied as occurring non-simultaneously (e.g., because one step is described after the other step). Moreover, the illustration of a process by its depiction in a drawing does not imply that the illustrated process is exclusive of other variations and modifications thereto, does not imply that the illustrated process or any of its steps are necessary to the embodiment(s). In addition, although a process may be described as including a plurality of steps, that does not imply that all or any of the steps are essential or required. Various other embodiments within the scope of the described disclosure(s) include other processes that omit some or all of the described steps. In addition, although a circuit may be described as including a plurality of components, aspects, steps, qualities, characteristics and/or features, that does not indicate that any or all of the plurality are essential or required. Various other embodiments may include other circuit elements or limitations that omit some or all of the described plurality. In U.S. applications, only those claims specifically citing “means for” or “step for” should be construed in the manner required under 35 U.S.C. § 112(f).
Throughout this disclosure, the term FET means field-effect transistor; MOS means metal-oxide-semiconductor; MOSFET means MOS FET; PMOS means p-channel MOS; NMOS means n-channel MOS; BiCMOS means bipolar CMOS; SPICE means Simulation Program with Integrated Circuit Emphasis, which is an industry-standard circuit simulation program; micro is μ, which is 10⁻⁶; nano is n, which is 10⁻⁹; and pico is p, which is 10⁻¹². Bear in mind that VDD (as a positive power supply) and VSS (as a negative power supply) are applied to all the circuitries, blocks, or systems in this disclosure, but may not be shown for clarity of illustration. VSS may be connected to a negative power supply or to the ground (zero) potential. The body terminal of a MOSFET can be connected to its respective source terminal or to the MOSFET's respective power supply, VDD or VSS.
Keep in mind that for descriptive clarity, the illustrations of this disclosure are simplified, and improvements beyond the simple illustrations would be obvious to one skilled in the art. For example, it would be obvious to one skilled in the art that MOSFET current sources can be cascoded for higher output impedance and lower sensitivity to power supply variations, whereas throughout this disclosure current sources are depicted with a single MOSFET for clarity of illustration. As another example, it would also be obvious to one skilled in the art that a circuit design (such as the ones illustrated in this disclosure) can be arranged with NMOS transistors, and or as its complementary version utilizing transistors of the PMOS type.
The illustrated circuit schematics of the embodiments described in the following sections have the benefits summarized here, to avoid repeating them in each section for the sake of clarity and brevity:
First, mixed-signal current-mode circuit designs in this disclosure are suitable for MAC in BNNs.
Second, the plurality of mixed-signal current-mode circuit designs in this disclosure occupy a small silicon die area, which makes them cost-effective for MAC in BNNs that may require thousands of such circuits in one chip.
Third, because voltage swings are small in current mode signal processing, the disclosed mixed-signal current-mode circuit designs can enable MAC in BNNs that are fast.
Fourth, also because current mode signal processing can be made fast, the disclosed mixed-signal current-mode circuit designs utilized in MAC in BNNs can provide a choice of trade-off and flexibility between running at moderate speeds and operating with low currents to save on power consumption.
Fifth, the disclosed mixed-signal current-mode circuit designs can be arranged on a silicon die right next to memory (e.g., SRAM, EPROM, E2PROM, etc.), as in Compute-In-Memory (CIM) MAC in BNNs. Such an arrangement reduces the read/write cycles in and out of memory and thus lowers dynamic power consumption.
Sixth, the disclosed mixed-signal current-mode circuit designs can be clock-free, providing computations for MAC in BNNs to operate asynchronously which minimizes latency delay.
Seventh, the disclosed mixed-signal current-mode circuit designs can be clock-free and capacitor-free for MAC in BNNs which provide an option of not requiring switching capacitors for mixed-mode signal processing. This arrangement avoids the extra cost of capacitors on silicon and lowers the dynamic power consumption attributed to switching the capacitors and the clocking updates.
Eighth, the performance of the disclosed mixed-signal current-mode circuit designs can be arranged to be independent of resistor and capacitor values and their normal variations in manufacturing. The benefits derived from such independence are passed on to the MAC in BNNs that utilize the disclosed circuits. As such, the die yield to specifications can be made mostly independent of passive resistor and or capacitor values and their respective manufacturing variations, which could otherwise reduce die yield and increase cost.
Ninth, because voltage swings are small in current mode signal processing, the disclosed mixed-signal current-mode circuit designs here can enable MAC in BNNs to operate with low power supply voltage (VDD).
Tenth, also because voltage swings are small in current mode signal processing, the disclosed mixed-signal current-mode circuit designs can enable internal analog signal to span between full-scale and zero-scale (e.g., summing node of a MAC or analog input of an Analog-To-Digital-Converter or analog input of a comparator) which enables the full-scale dynamic range of MAC in BNNs to be less restricted by VDD.
Eleventh, the disclosed mixed-signal current-mode circuit designs for MAC in BNNs can be manufactured on low-cost, standard, and conventional Complementary-Metal-Oxide-Semiconductor (CMOS) fabrication, which is more mature, readily available, and process-node portable. This provides MAC-in-BNN ICs with more rugged reliability and multi-source manufacturing flexibility, as well as lower manufacturing cost.
Twelfth, digital addition and digital subtraction occupy larger die areas. Because the disclosed circuit designs operate in current mode, the function of addition simply requires the coupling of output current ports, and the function of subtraction can be arranged via a current mirror. Thus, the disclosed circuit designs utilized in MAC in BNNs can be smaller and lower cost.
Thirteenth, digital XOR and XNOR functions are required in BNNs. The present disclosure arranges the XOR and XNOR functions to be performed in mixed-signal current-mode for MAC in BNNs.
Fourteenth, a plurality of XOR and or XNOR outputs are required to be accumulated for BNNs. The present disclosure provides digital-input to analog-output current XOR and or XNOR circuit designs suitable for mixed-signal MAC in BNNs. The plurality of output currents of the plurality of XORs and or XNORs are coupled together to perform the function of addition asynchronously, which reduces latency delays substantially.
Fifteenth, as noted earlier, digital addition and subtraction functions occupy large die areas and can be expensive. The present disclosure eliminates the digital adding function of bitwise count of logic state ‘1’ (or ON state) in BNNs. Instead, the present disclosure performs the bitwise count of logic state ‘1’ (or ON state) in BNNs in analog current-mode. By utilizing digital input to analog current output XOR (iXOR) and or digital input to analog current output XNOR (iXNOR), the current outputs of plurality (e.g., 1000s) of iXOR and or iXNOR are simply coupled together, which performs the summation (counting of logic state ‘1’) in mixed-signal current mode and in an area efficient (low cost) manner.
Sixteenth, the disclosed mixed-signal current-mode circuit designs utilized in MAC in BNNs help reduce inaccuracies in the function of addition that stem from random but normal manufacturing variation (e.g., random transistor mismatches in normal fabrication). In the disclosed mixed-signal current-mode circuit designs, the non-linearity due to the non-systematic random statistical contribution of mismatches among the added current signals roughly equals the square root of the sum of the squares of such non-systematic random mismatches (attributed to the plurality of the summing signals). This attenuated impact of imperfections (due to random manufacturing variations) on overall accuracy is an inherent advantage of the disclosed designs, which can improve manufacturing yield to specifications, a benefit that is passed on to the MAC in BNNs.
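The square-root-of-sum-of-squares claim above can be illustrated with a brief Monte Carlo sketch (hypothetical Python model; the sigma, unit count, and trial count are illustrative assumptions, not circuit data): if each unit current carries an independent random mismatch of standard deviation σ, the RMS error of the sum of N units grows roughly as √N·σ, so the error relative to the N-unit full scale shrinks as 1/√N.

```python
import random

# Monte Carlo sketch of random (non-systematic) mismatch accumulation.
# sigma = 2% per-unit mismatch and N = 100 units are illustrative values.

def summed_current_rms_error(n_units, sigma=0.02, trials=20000, seed=1):
    random.seed(seed)
    errs = []
    for _ in range(trials):
        total = sum(1.0 + random.gauss(0.0, sigma) for _ in range(n_units))
        errs.append(total - n_units)        # error of the accumulated sum
    mean_sq = sum(e * e for e in errs) / trials
    return mean_sq ** 0.5                   # RMS error of the sum

rms = summed_current_rms_error(100)
print(rms)         # close to sqrt(100) * 0.02 = 0.2 units
print(rms / 100)   # relative to full scale: ~0.2%, well below a single unit's 2%
```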
Seventeenth, cascoding a current source can help increase output impedance and reduce the sensitivity of output currents to power supply variation, but requires two cascoded transistors. This disclosure provides the option of utilizing a power supply desensitization circuit for a current source that is not cascoded (e.g., a single-MOSFET current source), which saves area, considering the large number (e.g., tens of thousands) of iXORs (and or iXNORs) that may be required in MAC in BNNs.
Eighteenth, because the units of cumulative current (each representing one bitwise count of logic state '1' at the output of an iXOR and or iXNOR) are equal to one another, the incremental summation of the plurality of output currents is thermometer-like. Accordingly, the disclosed mixed-signal current-mode circuit designs utilized in MAC for BNNs provide monotonic incremental accumulation of the added current signals, which is beneficial in converging on a minimum cost function during training of BNNs.
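The thermometer-like behavior can be shown with a short sketch (hypothetical Python; the unit count and mismatch sigma are illustrative assumptions): because each active unit adds a strictly positive current, the accumulated sum can only increase as more outputs count '1', so the accumulation stays monotonic even when the units are mismatched.

```python
import random

# Sketch of monotonic thermometer-like accumulation with mismatched units.
# 64 units with 5% random mismatch are illustrative values.
random.seed(7)
units = [1.0 + random.gauss(0.0, 0.05) for _ in range(64)]  # all positive in practice

totals = []
running = 0.0
for u in units:                 # turning on one more unit at a time
    running += u
    totals.append(running)

# Every additional active unit strictly increases the accumulated sum.
assert all(b > a for a, b in zip(totals, totals[1:]))
print(totals[0], totals[-1])
```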
Nineteenth, the disclosed mixed-signal current-mode circuit designs utilized in MAC for BNNs enable a meaningful portion of the computation circuitry to shut itself off (i.e., 'smart self-power-down') in the face of no incoming signal, so that the computation circuits can remain 'always on' while consuming low stand-by current.
The XOR (x⊕w) of U11 controls analog switches N21 and N31 which enable or disable current mirror N11, N41 and thus control the value of IO to swing to either I11 value (analog current equivalent to ‘logic 1’) or zero (analog current equivalent to ‘logic 0’).
MAC for BNNs can be arranged to receive a plurality of x, w digital bits inputted to a plurality of XORs to generate a plurality of IO currents that can be summed to generate IOS (i.e., utilizing current-mode summation for bitwise counting of the plurality of logic state '1' outputs of the plurality of iXORs for MAC in BNNs).
In this disclosure, unless otherwise specified, I1 value is the analog current equivalent to ‘logic 1’ and zero current is the analog current equivalent to ‘logic 0’.
The XOR (x⊕w) of U12 and inverter U22 control analog switches N32 and N42, which steer the N22 current (that is mirrored and scaled from N12) to flow through either N32 to form the IO1 current (swinging to either zero or I12) or through N42 to form the IO2 current (swinging to either I12 or zero).
Similar to prior section, utilizing the disclosed embodiment of
The XOR (x⊕w) of U13 controls the analog switch N43, which enables or disables the current mirror N13, N23 and thus controls the value of IO to swing to either the I13 value (analog current equivalent to 'logic 1') or zero (analog current equivalent to 'logic 0'). Notice that N43 and N33 are arranged with the same size for current mirror N13, N23 matching.
Like preceding sections, utilizing the embodiment of
Analog switches arranged by N54 in series with N64 are controlled by x and w digital signals, respectively. Also, analog switches are arranged by placing N74 in series with N84 which are controlled by
As such, the disclosed digital-input analog-output current XNOR (iXNOR) functions as an analog XNOR (whose output effectively controls analog current switches). The analog output current here is controlled by a meshed composite of series-parallel analog current switches (iSWSP) comprising four transistors, which can be meaningfully area efficient.
Similarly, utilizing the embodiment illustrated in
An iSWSP1 comprising a series combination of analog switches N35 and N45 is placed in parallel with another series combination of analog switches N55 and N65. When digital bits x, w are both HIGH (logic state '1') or both LOW (logic state '0'), then either N35 and N45 or N55 and N65 connect the gate-drain port of N15 to the gate port of N25, thus causing the operating current IO of N25 to mirror and scale the operating current I15 of N15.
Note that concurrently, an iSWSP2 comprising a series combination of analog switches N75 and N85 is placed in parallel with another series combination of analog switches N95 and N105. When digital bits x, w are both HIGH or both LOW, then both the series combination of N75 and N85 and the series combination of N95 and N105 remain open (i.e., the composite switch is 'off'). Conversely, when x, w are in any state other than both HIGH or both LOW, then either the series combination of N75 and N85 or the series combination of N95 and N105 turns 'on' and shorts the gate-port voltage of N25 to VSS, which keeps the IO of N25 at zero.
Also, keep in mind that if digital bits x, w are in any state other than both HIGH or both LOW, then both the series combination of N35 and N45 and the series combination of N55 and N65 turn off and isolate the gate-drain port of N15 from the grounded gate port of N25.
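The composite switch behavior described in this section can be summarized with a small behavioral model (hypothetical Python; the transistor names in the comments follow the text, but the function itself is an illustrative truth-table model, not the circuit): iSWSP1 connects the N15-N25 mirror only when x and w agree, while iSWSP2 grounds the gate of N25 when they disagree, forcing IO to zero.

```python
# Behavioral sketch of the series-parallel switch network (iSWSP1 / iSWSP2).
# i_unit stands in for the mirrored/scaled value of I15 and is illustrative.

def ixnor_output(x, w, i_unit=1.0):
    swsp1_on = (x and w) or (not x and not w)   # one of the two series pairs conducts
    swsp2_on = not swsp1_on                     # complementary pull-down network
    if swsp1_on:
        return i_unit        # N25 mirrors (a scaling of) I15: logic '1' current
    assert swsp2_on          # gate of N25 shorted to VSS
    return 0.0               # IO held at zero: logic '0' current

# Truth table of the digital-input analog-output current XNOR:
for x in (0, 1):
    for w in (0, 1):
        print(x, w, ixnor_output(x, w))   # current flows only when x == w
```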
Similarly, utilizing the embodiment illustrated in
The disclosed iXNOR of
Like prior sections, utilizing the embodiment illustrated in
Analog switches arranged by N57 in series with N67 are controlled by
Similar to the functional operation of a digital XNOR, when x and w are both HIGH (logic 1), then the series analog switches N37 and N47 are both ON which enables N17 to scale and mirror its current I17 onto drain port of N27 (and through iSWSP) to generate IO. Similarly, when digital bits x and w are both LOW (logic 0), then the series analog switches N57 and N67 are both ON which again enables N17 to scale and mirror its current I17 onto drain port of N27 (and through iSWSP) to generate IO.
As noted in the previous sections, utilizing the embodiment illustrated in
The disclosed iXNOR of
Like prior sections, utilizing the embodiment illustrated in
Single-ended output currents of a plurality of digital-input to analog-output current XORs are accumulated (ISO) and added to a single bias current IB generated by a Bias iDAC. A single-ended current-input iADC digitizes the net ISO+IB.
The embodiment disclosed in
The PSR circuit of
Be mindful that the DC voltage of the summing node, through which the summation current ISO+IB flows, can be arranged to be (the equivalent of a diode-connected) VGS of a PMOSFET that can be arranged to track the diode-connected VGS of P39.
Accordingly, the single-ended sum of ISO+IB currents flowing through the equivalent of a diode connected VGS of a PMOSFET as the input of a single-ended current mode ADC (iADC) or a single-ended current mode comparator (iCOMP) can be regulated by the PSR circuit to follow the constant current reference proportional to I19.
Thus, the disclosed embodiments provide the option of desensitizing the ISO and IB currents to power supply variations with a single-transistor current source (e.g., N29 and N79) instead of a cascoded current source, which can save substantial die area considering the plurality (e.g., 1000s) of iXORs that may be required for a typical MAC in BNNs.
It can be noticed that in
Utilizing the disclosed embodiment illustrated in
Keep in mind that the disclosed embodiment illustrated in
Each latch cell that is inputted with the digital weight signals (w) and digital row select weight write signals (e.g., row ‘a’ digital write signals Wwa &
To lower the dynamic power consumption associated with reading/writing digital weight data in and out of memory, the weight data can be stored in a respective latch, wherein each respective latch cell on the silicon die is not only laid out right next to its respective iXNOR cell, but the training data-set is also pre-loaded and latched onto the respective iXNOR (e.g., N510 through N910) such as the one described and illustrated in section 7 and
Training data-set is loaded onto respective array of latch cells one row at a time via write control signals (e.g., Wwa &
Accordingly, the respective digital outputs of the latch array (laid-out on the silicon die right next to their respective iXNOR array) receive their respective digital weight data-set (e.g., w1a, w2a) into the respective weight digital ports of iXNOR arrays (e.g., gate ports of N710, N910).
The signal data-set (e.g., x1 &
The outputs of plurality of iXNORs (along a latch array row or a latch array column, depending on the system architecture and software specifications) can be coupled together to generate plurality of summation currents (e.g., ISOa, ISOb).
As such, utilizing the embodiment illustrated in
The embodiment disclosed in
Plurality of circuits similar to that of
The digital weight (wi) data-set is stored in the SRAM array while the respective xi digital signals are inputted to the iMAC in BNNs.
Consequently, a plurality of rows (e.g., rows a and b) of single-ended sums of analog-output currents (e.g., ISOa and ISOb) are generated that represent the analog equivalent of the digital sum of the plurality of respective
Each SRAM cell that is inputted with the digital weight signals (e.g., w1a &
To lower the dynamic power consumption associated with reading/writing digital weight data in and out of memory, the weight data can be stored in its respective SRAM cell, wherein each respective SRAM cell on the silicon die is not only laid out right next to its respective iXNOR cell, but the training data-set is also locked onto the respective iXNOR (e.g., N511 through N911), similar to the one described in section 7
Training data-set is loaded onto the respective array of SRAM cells one row at a time via a write control signal (e.g., Wwa), which controls the row of SRAM array input switches (e.g., N111−N211). Once the said digital weight data-set (e.g., w1a &
The signal data-set (e.g., x1 &
Accordingly, the outputs of plurality of iXNORs (along a SRAM array row or a SRAM array column, depending on the system and software requirements) can be coupled together to generate plurality of summation currents (e.g., ISOa, ISOb).
In summary, utilizing the embodiment illustrated in
Here, the output currents of a plurality of iXORs are accumulated differentially (dISO=ISO1−ISO2) and added to a differential bias current dIB, wherein dIB is generated by a differential Bias iDAC. A differential current-input comparator (diCOMP) generates the sign of the net result of dISO+dIB.
The embodiment disclosed in
The PSR section of the circuit of
Be mindful that the DC voltage of the summing nodes, through which the summation differential current dISO+dIB flows, can be arranged to be (the equivalent of a diode-connected pair of) VGS of a PMOSFET that can be arranged to track the (diode-connected) VGS of P39. Accordingly, the differential sum of dISO+dIB currents flowing through the equivalent of a pair of diode-connected VGS of PMOSFETs (as the differential input of a current-mode ADC or the current-mode comparator diCOMP12) can be regulated by the PSR circuit to follow the constant current reference proportional to I112. The disclosed embodiments provide the option of desensitizing the dISO and dIB currents to power supply variations with a single-transistor current source (e.g., N512) per each iXOR instead of a cascoded current source, which can save substantial die area considering the plurality (e.g., 1000s) of iXORs that may be required for a typical MAC in BNNs.
It can be noticed that in
As an example, the mixed-mode differential iXOR of
When w1=0, I1 flows through N612 (while N712 is starved, which cuts off any operating current from flowing through both N1012 and N1112). With w1=0, if x1=0, the I1 flowing through N612 is passed on to flow through N812 and onto the positive port of the diCOMP. Also, with w1=0, if x1=1, the I1 flowing through N612 is passed on to flow through N912 and onto the negative port of the diCOMP.
When w1=1, I1 flows through N712 (while N612 is starved, which cuts off both N812 and N912). With w1=1, if x1=0, the I1 flowing through N712 is passed on to flow through N1112 and onto the negative port of the diCOMP. Also, with w1=1, if x1=1, the I1 flowing through N712 is passed on to flow through N1012 and onto the positive port of the diCOMP.
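The current steering described in the two paragraphs above reduces to a small truth table. A behavioral Python sketch (with an assumed unit current i1 and a hypothetical function name, for illustration only) is:

```python
def ixor_ports(w1, x1, i1=1.0):
    """Behavioral model of the current steering described above.

    Returns (current into the positive diCOMP port,
             current into the negative diCOMP port),
    with i1 an assumed unit current.
    """
    if w1 == 0:
        # I1 takes the N612 branch: x1=0 -> N812 (positive port),
        # x1=1 -> N912 (negative port).
        return (i1, 0.0) if x1 == 0 else (0.0, i1)
    else:
        # I1 takes the N712 branch: x1=0 -> N1112 (negative port),
        # x1=1 -> N1012 (positive port).
        return (0.0, i1) if x1 == 0 else (i1, 0.0)
```

Note that, per the steering described, I1 lands on the positive port of the diCOMP precisely when x1 equals w1, and on the negative port when they differ.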
In summary, utilizing the disclosed embodiment illustrated in
Keep in mind that the disclosed embodiment illustrated in
The mixed-mode differential iXOR of
When w=0, I1 flows through the N313 parallel pair, while N413 is starved, which cuts off current from both N713 and N813. With w=0, if x=0, I1 is passed on to flow through N513 and onto the IO2 current port. Also, with w=0, if x=1, I1 flows through N613 and onto the IO1 current port.
When w=1, I1 flows through the N413 parallel pair, while N313 is starved, which cuts off current from both N513 and N613. With w=1, if x=0, I1 is passed on to flow through N713 and onto the IO1 current port. Also, with w=1, if x=1, I1 flows through N813 and onto the IO2 current port.
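The steering in the two paragraphs above likewise reduces to a truth table; a behavioral sketch (with an assumed unit current i1 and a hypothetical function name) is:

```python
def ixor_io(w, x, i1=1.0):
    """Behavioral model of the IO1/IO2 current steering described above.

    Returns (current onto IO1, current onto IO2), with i1 an assumed
    unit current.
    """
    if x == w:
        # w=x: I1 is steered onto IO2 (via N513 or N813).
        return (0.0, i1)
    # w != x: I1 is steered onto IO1 (via N613 or N713).
    return (i1, 0.0)
```

Here I1 lands on IO1 precisely when x⊕w=1, so the IO1 wire carries the XOR result as a current and IO2 carries its complement.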
As discussed in section 12 and illustrated in
A plurality of mixed-mode differential iXNORs or iXORs (with digital inputs and differential analog output currents) can be utilized here in an SRAM memory array with CIM. The digital data stored in the SRAM array, such as the weights (wi), along with the array of respective xi digital signals, are inputted to the differential iMAC for the BNN. As a result, a plurality of rows (e.g., rows a and b) of differential sums of analog output currents (e.g., dISOa=ISO1a−ISO2a and dISOb=ISO1b−ISO2b) are generated that represent the analog equivalent of the digital sum of the plurality of respective xi⊕wi and or
Each SRAM cell that is inputted with the digital weight signals (e.g., w1a &
To lower the dynamic power consumption associated with reading and writing digital weight data in and out of memory, the weight data can be stored in its respective SRAM cell, wherein each SRAM cell on the silicon die is not only laid out right next to its respective differential iXNOR or iXOR cell, but the training data-set is also pre-loaded and locked onto the respective differential iXNOR or iXOR (e.g., N914 through N1114).
The training data-set is loaded onto the respective array of SRAM cells one row at a time via a write control signal (e.g., Wwa), which controls the row of SRAM array input switches (e.g., N114-N214). Once the said digital weight data-set (e.g., w1a &
The signal data-set (e.g., x1 &
The outputs of a plurality of differential iXNORs or iXORs (along an SRAM array row or column, depending on system and software requirements) can be coupled together to generate a plurality of summation currents (e.g., ISOa, ISOb).
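Coupling many cell outputs onto shared wires corresponds to a per-row differential sum. A minimal behavioral sketch (assuming, per the steering described earlier, that each cell steers an illustrative unit current to the ISO1 wire on a mismatch and to the ISO2 wire on a match; the function name is hypothetical) is:

```python
def row_diso(weights, signals, i1=1.0):
    """Behavioral model of one row's differential summation current.

    dISO = ISO1 - ISO2, where each cell steers the assumed unit
    current i1 onto ISO1 when x XOR w = 1 and onto ISO2 otherwise.
    """
    iso1 = sum(i1 for w, x in zip(weights, signals) if x != w)  # XOR = 1 cells
    iso2 = sum(i1 for w, x in zip(weights, signals) if x == w)  # XOR = 0 cells
    return iso1 - iso2
```

For a row of n cells, dISO ranges from −n·i1 (all matches) to +n·i1 (all mismatches) in steps of 2·i1, so the digital population count of xi⊕wi is recoverable from the single analog differential current.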
As such, utilizing the embodiment illustrated in
The present application is a continuation-in-part of and claims the benefit of priority from U.S. patent application Ser. No. 16/746,897 filed Jan. 19, 2020. The present application also claims the benefit of priority of U.S. Provisional Patent Application Ser. No. 62/912,407 filed Oct. 8, 2019. Both applications are hereby incorporated by reference as if fully set forth herein.
Number | Name | Date | Kind |
---|---|---|---|
4677369 | Bowers et al. | Jun 1987 | A |
4899066 | Aikawa et al. | Feb 1990 | A |
5218246 | Lee et al. | Jun 1993 | A |
5261035 | Adler | Nov 1993 | A |
5280564 | Shioni et al. | Jan 1994 | A |
5283579 | Tasdighi | Feb 1994 | A |
5289055 | Razavi | Feb 1994 | A |
5294927 | Levinson et al. | Mar 1994 | A |
5329632 | Lee et al. | Jul 1994 | A |
5334888 | Bodas | Aug 1994 | A |
5391938 | Hatsuda | Feb 1995 | A |
5523707 | Levy et al. | Jun 1996 | A |
5535309 | Shin | Jul 1996 | A |
5576637 | Akaogi et al. | Nov 1996 | A |
5581661 | Wang | Dec 1996 | A |
5583456 | Kimura | Dec 1996 | A |
5592107 | McDermott | Jan 1997 | A |
5640084 | Tero et al. | Jun 1997 | A |
5668710 | Caliboso et al. | Sep 1997 | A |
5734260 | Tasdighi et al. | Mar 1998 | A |
5734291 | Tasdighi et al. | Mar 1998 | A |
5814995 | Tasdighi | Sep 1998 | A |
5861762 | Sutherland | Jan 1999 | A |
5923208 | Tasdighi et al. | Jul 1999 | A |
5966029 | Tarrab et al. | Oct 1999 | A |
6005374 | Tasdighi | Dec 1999 | A |
6054823 | Collings et al. | Apr 2000 | A |
6122284 | Tasdighi et al. | Sep 2000 | A |
6166670 | O'Shaughnessy | Dec 2000 | A |
6353402 | Kanamori | Mar 2002 | B1 |
6573758 | Boerstler et al. | Jun 2003 | B2 |
6727728 | Bitting | Apr 2004 | B1 |
6754645 | Shi et al. | Jun 2004 | B2 |
6903579 | Rylov | Jun 2005 | B2 |
6930512 | Yin | Aug 2005 | B2 |
7088138 | Xu et al. | Aug 2006 | B2 |
7142014 | Groen et al. | Nov 2006 | B1 |
7298171 | Parris | Nov 2007 | B2 |
7557614 | Bonsels et al. | Jul 2009 | B1 |
7557743 | Imai | Jul 2009 | B2 |
7612583 | Winograd | Nov 2009 | B2 |
7924198 | Cui | Apr 2011 | B2 |
8587707 | Matsumoto | Nov 2013 | B2 |
8653857 | Becker | Feb 2014 | B2 |
9519304 | Far | Dec 2016 | B1 |
9780652 | Far | Oct 2017 | B1 |
9921600 | Far | Mar 2018 | B1 |
10009686 | Das | Jun 2018 | B1 |
10177713 | Far | Jan 2019 | B1 |
10198022 | Far | Feb 2019 | B1 |
10311342 | Farhadi et al. | Jun 2019 | B1 |
10387740 | Yang et al. | Aug 2019 | B2 |
10411597 | Far | Sep 2019 | B1 |
10491167 | Far | Nov 2019 | B1 |
10504022 | Temam et al. | Dec 2019 | B2 |
10536117 | Far | Jan 2020 | B1 |
10560058 | Far | Feb 2020 | B1 |
10581448 | Far | Mar 2020 | B1 |
10592208 | Wang et al. | Mar 2020 | B2 |
10594334 | Far | Mar 2020 | B1 |
10664438 | Sity et al. | Mar 2020 | B2 |
10621486 | Yao | Apr 2020 | B2 |
10684955 | Luo et al. | Jun 2020 | B2 |
10691975 | Bagherinezhad et al. | Jun 2020 | B2 |
10699182 | Gulland et al. | Jun 2020 | B2 |
10700695 | Far | Jun 2020 | B1 |
20030225716 | Shi et al. | Dec 2003 | A1 |
20050218984 | Yin | Oct 2005 | A1 |
20070086655 | Simard et al. | Apr 2007 | A1 |
20090167579 | Kawano | Jul 2009 | A1 |
20090179783 | Matumoto | Jul 2009 | A1 |
20100079320 | Wang | Apr 2010 | A1 |
20120126852 | Shin et al. | May 2012 | A1 |
20140354865 | Yun | Dec 2014 | A1 |
20160026912 | Falcon et al. | Jan 2016 | A1 |
20160239706 | Dijkman et al. | Aug 2016 | A1 |
20160246506 | Hebig et al. | Aug 2016 | A1 |
20160328647 | Lin et al. | Nov 2016 | A1 |
20170195601 | Yun | Jul 2017 | A1 |
20170200094 | Bruestle et al. | Jul 2017 | A1 |
20190286953 | Farhadi et al. | Sep 2019 | A1 |
20190325269 | Bagherinezhad et al. | Oct 2019 | A1 |
20200184037 | Zatloukal et al. | Jun 2020 | A1 |
20200184044 | Zatloukal | Jun 2020 | A1 |
Entry |
---|
Ali Far, “Small size class AB amplifier for energy harvesting with ultra low power, high gain, and high CMRR,” 2016 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), Ixtapa, 2016, pp. 1-5. |
Ali Far, “Compact ultra low power class AB buffer amplifier,” 2017 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), Ixtapa, 2017, pp. 1-6. |
Ali Far, “Subthreshold current reference suitable for energy harvesting: 20ppm/C and 0.1%/V at 140nW,” 2015 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), Ixtapa, 2015, pp. 1-4. |
Ali Far, “Amplifier for energy harvesting: Low voltage, ultra low current, rail-to-rail input-output, high speed,” 2016 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), Ixtapa, 2016, pp. 1-6. |
Ali Far, “Class AB amplifier with noise reduction, speed boost, gain enhancement, and ultra low power,” 2018 IEEE 9th Latin American Symposium on Circuits & Systems (LASCAS), Puerto Vallarta, Mexico, 2018, pp. 1-4. |
Ali Far, “Low noise rail-to-rail amplifier runs fast at ultra low currents and targets energy harvesting,” 2017 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), Ixtapa, 2017, pp. 1-6. |
Ali Far, “A 5μW fractional CMOS bandgap voltage and current reference,” 2013 IEEE Global High Tech Congress on Electronics, Shenzhen, 2013, pp. 7-11. |
Ali Far, “A 400nW CMOS bandgap voltage reference,” 2013 International Conference on Electrical, Electronics and System Engineering (ICEESE), Kuala Lumpur, 2013, pp. 15-20. |
Ali Far, “Enhanced gain, low voltage, rail-to-rail buffer amplifier suitable for energy harvesting,” 2017 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), Ixtapa, 2017, pp. 1-6. |
Ali Far, “Subthreshold bandgap voltage reference aiming for energy harvesting: 100na, 5 ppm/c, 40 ppm/v, psrr-88db,” 2015 IEEE 5th International Conference on Consumer Electronics—Berlin (ICCE-Berlin), Berlin, 2015, pp. 310-313. |
Ali Far, “A 220nA bandgap reference with 80dB PSRR targeting energy harvesting,” 2016 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Vancouver, BC, 2016, pp. 1-4. |
Ali Far, “Sub-1 volt class AB amplifier with low noise, ultra low power, high-speed, using winner-take-all,” 2018 IEEE 9th Latin American Symposium on Circuits & Systems (LASCAS), Puerto Vallarta, Mexico, 2018, pp. 1-4. |
Ali Far, “A low supply voltage 2μW half bandgap reference in standard sub-μ CMOS,” 2014 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), Bangalore, 2014, pp. 1-5. |
Ali Far, “Current reference for energy harvesting: 50um per side, at 70 nW, regulating to 125C,” 2014 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), Ixtapa, 2014, pp. 1-5. |
Qing Dong et al., “A 0.3V VDDmin 4+2T SRAM for searching and in-memory computing using 55nm DDC technology,” 2017 Symposium on VLSI Circuits, Kyoto, 2017, pp. C160-C161, doi: 10.23919/VLSIC.2017.8008465. |
Yen-Cheng Chiu et al., “A 4-Kb 1-to-8-bit Configurable 6T SRAM-Based Computation-in-Memory Unit-Macro for CNN-Based AI Edge Processors,” in IEEE Journal of Solid-State Circuits, doi: 10.1109/JSSC.2020.3005754. |
Jingcheng Wang et al., “A 28-nm Compute SRAM With Bit-Serial Logic/Arithmetic Operations for Programmable In-Memory Vector Computing,” in IEEE Journal of Solid-State Circuits, vol. 55, No. 1, pp. 76-86, Jan. 2020, doi: 10.1109/JSSC.2019.2939682. |
Daniel Bankman et al., “An Always-On 3.8 μJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor With All Memory on Chip in 28-nm CMOS,” in IEEE Journal of Solid-State Circuits, vol. 54, No. 1, pp. 158-172, Jan. 2019, doi: 10.1109/JSSC.2018.2869150. |
Gobinda Saha et al., “An Energy-Efficient and High Throughput in-Memory Computing Bit-Cell With Excellent Robustness Under Process Variations for Binary Neural Network,” in IEEE Access, vol. 8, pp. 91405-91414, 2020, doi: 10.1109/ACCESS.2020.2993989. |
Han-Chun Chen et al., “Configurable 8T SRAM for Enbling in-Memory Computing,” 2019 2nd International Conference on Communication Engineering and Technology (ICCET), Nagoya, Japan, 2019, pp. 139-142, doi: 10.1109/ICCET.2019.8726871. |
James Clay et al., “Energy-efficient and reliable in-memory classifier for machine-learning applications,” in IET Computers & Digital Techniques, vol. 13, No. 6, pp. 443-452, Nov. 2019, doi: 10.1049/iet-cdt.2019.0040. |
Naveen Verma et al., “In-Memory Computing: Advances and Prospects,” in IEEE Solid-State Circuits Magazine, vol. 11, No. 3, pp. 43-55, Summer 2019, doi: 10.1109/MSSC.2019.2922889. |
Yu Wang, “Neural Networks on Chip: From CMOS Accelerators to In-Memory-Computing,” 2018 31st IEEE International System-on-Chip Conference (SOCC), Arlington, VA, 2018, pp. 1-3, doi: 10.1109/SOCC.2018.8618496. |
Hossein Valavi et al., “A Mixed-Signal Binarized Convolutional-Neural-Network Accelerator Integrating Dense Weight Storage and Multiplication for Reduced Data Movement,” 2018 IEEE Symposium on VLSI Circuits, Honolulu, HI, 2018, pp. 141-142, doi: 10.1109/VLSIC.2018.8502421. |
Hossein Valavi et al., “A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute,” in IEEE Journal of Solid-State Circuits, vol. 54, No. 6, pp. 1789-1799, Jun. 2019, doi: 10.1109/JSSC.2019.2899730. |
Yinqi Tang et al., “Scaling Up In-Memory-Computing Classifiers via Boosted Feature Subsets in Banked Architectures,” in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 66, No. 3, pp. 477-481, Mar. 2019, doi: 10.1109/TCSII.2018.2854759. |
Jintao Zhang et al., “A machine-learning classifier implemented in a standard 6T SRAM array,” 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), Honolulu, HI, 2016, pp. 1-2, doi: 10.1109/VLSIC.2016.7573556. |
Jintao Zhang et al., “An In-memory-Computing DNN Achieving 700 TOPS/W and 6 TOPS/mm2 in 130-nm CMOS,” in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, No. 2, pp. 358-366, Jun. 2019, doi: 10.1109/JETCAS.2019.2912352. |
Akhilesh Jaiswal et al., “8T SRAM Cell as a Multibit Dot-Product Engine for Beyond Von Neumann Computing,” in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 27, No. 11, pp. 2556-2567, Nov. 2019, doi: 10.1109/TVLSI.2019.2929245. |
Jinsu Lee et al., “A 17.5-fJ/bit Energy-Efficient Analog SRAM for Mixed-Signal Processing,” in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, No. 10, pp. 2714-2723, Oct. 2017, doi: 10.1109/TVLSI.2017.2664069. |
Qing Dong et al., “15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications,” 2020 IEEE International Solid-State Circuits Conference—(ISSCC), San Francisco, CA, USA, 2020, pp. 242-244, doi: 10.1109/ISSCC19947.2020.9062985. |
Number | Date | Country | |
---|---|---|---|
62912407 | Oct 2019 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 16746897 | Jan 2020 | US |
Child | 16945528 | US |