The present invention relates to a CMOS analog integrated circuit that can function as an artificial neuromorphic network and, more particularly, to such a CMOS circuit that connects directly with artificial synaptic networks made of analog memory arrays.
Spiking neural networks (SNNs) have been proposed to replicate the spatiotemporal information processing observed in biological neural systems more faithfully than traditional artificial neural networks (ANNs). Unlike ANNs, which process and transmit information as continuous activation levels of abstracted mathematical models, SNNs process and communicate information in sparse spike trains. This approach closely resembles the information processing of the nervous system in the brain. Consequently, SNNs implemented in hardware typically exhibit higher energy efficiency than their ANN counterparts, making them particularly suitable for applications where energy is limited and spatiotemporal reasoning is required, such as autonomous vehicle guidance [1], [2] and brain-machine interfaces [3], [4]. Most existing SNNs are emulated on digital hardware such as CPUs, GPUs, or FPGAs; however, the overhead of translation into digital representations results in high energy consumption. To address this issue, several SNN accelerators, known as neuromorphic systems, have been proposed and demonstrated [5]-[10] since the concept was first introduced by Carver Mead in 1990 [11]. Neuromorphic systems are naturally suited to implementation in analog hardware because of their similarities with biological neural systems, characterized by asynchronous operation and the ability to leverage device physics to perform essential functions such as excitatory and inhibitory synapses, spike integration, membrane potential leakage and inhibition, thresholding, and spike firing and transmission. As a result, neuromorphic systems can operate faster and more efficiently than their digital and mixed-signal counterparts.
Recently, many emerging memories/devices and materials have been explored as potential candidates for implementing biologically inspired neuromorphic systems with high energy efficiency, such as ReRAM or memristors [12]-[14], phase change materials [15], Gaussian heterojunction transistors [16], ferroelectric field-effect transistors [17], silicon-on-insulator metal-oxide-semiconductor field-effect transistors [18], [19], memtransistors [20], carbon nanotube transistors [21], and 2D MoS2 transistors [22]. These designs aim to leverage biologically plausible physical characteristics exhibited by the emerging memories/devices and employ bio-inspired, unsupervised, localized synaptic learning rules, such as spike timing-dependent plasticity, for training. However, because of the limited understanding of biological neural systems, the accuracy of these bottom-up systems is not competitive, restricting their practical utility to elementary tasks, such as the classification of a small set of letters composed of a limited number of pixels. Additionally, present designs based on emerging memories suffer from limited endurance, making their practical implementation challenging at this stage. Various efforts have been made to build such systems by converting ANNs to SNNs through neural coding [23]-[26], which allows SNNs to attain accuracy levels similar to those of state-of-the-art (SOTA) ANNs. However, the additional coding diminishes the inherent advantages of SNNs by introducing considerable latency and energy consumption, and is prone to conversion approximation errors. Another recent success involves applying backpropagation [27]-[29], the cornerstone of ANN training, directly to SNNs. By employing surrogate gradients, this method achieves accuracy levels competitive with SOTA ANN models. However, executing these SNN inference computations on digital von Neumann hardware, such as CPUs and GPUs, is considerably inefficient because of the computational complexity of their temporal dynamics. Efficient hardware implementations have yet to be reported.
In an article entitled “An ultra-low power sigma-delta neuron circuit,” https://www.researchgate.net/publication/331222752, a neuron design is disclosed in which MOSFETs operate in the subthreshold domain. However, this prior-art neuron is difficult to accelerate to a nanosecond time scale because doing so would require capacitances approaching or smaller than the parasitic capacitance of integrated circuits.
U.S. Pat. No. 6,242,988 also discloses a neuron design that uses a MOSFET as a switch to inhibit the voltage on the capacitor, and hence the firing. However, this method imposes a minimum limit on the spike width to avoid incomplete resetting of the neuron circuit after each firing. As a consequence, the operational speed of this design is restricted to the scale of seconds.
The article “Leaky Integrate and Fire Neuron by Charge-Discharge Dynamics in Floating-Body MOSFET,” Scientific Reports, 7:8257, 2017, discloses the use of the floating-body effect in a partially depleted silicon-on-insulator MOSFET to implement the spiking-neuron operations, i.e., integration, leaking, firing, and resetting. However, this approach requires an external control circuit to reset the neuron for tens of nanoseconds after each firing, which restricts its operational speed to the microsecond time scale.
The present invention is an all-analog hardware SNN that can achieve spatiotemporal reasoning on the N-MNIST dataset [30] with an accuracy comparable to SOTA ANN algorithms, while maintaining low latency and high energy efficiency. The concept is validated on physical ReRAM arrays and physical analog neuron circuits.
In carrying out the present invention, a new all-analog spiking neural network (SNN) circuit is disclosed. This circuit is designed through a software-hardware co-design approach and consists of ReRAM-crossbar synapse arrays and custom-designed spike response model (SRM) neuron circuits built with complementary metal-oxide-semiconductor (CMOS) technology. This SNN hardware achieves 97.78% accuracy on the N-MNIST dataset, which is comparable to SOTA ANN accuracy on the MNIST dataset, using a similar number of parameters (22,360) and an experimentally calibrated model that accounts for the ReRAM conductance variation and the device variations of the analog neuron circuits.
Meanwhile, the SNN hardware promises low latency and high energy efficiency, with an inter-spike interval of 94.75 ps and an energy consumption of 1.16 pJ per spike, representing an improvement of one order of magnitude over SOTA designs (˜1 ns [31] and ˜10 pJ [32]). This hardware enables spatiotemporal recognition within 10 ns per N-MNIST sample. In comparison with other SNN implementations that have achieved accuracies of 84% [33], 91.2% [23], and 83.24% [34] on the MNIST dataset, the SNN hardware of the present invention achieves considerably higher accuracy while requiring 100× to 1,000× less inference time per sample, with similar energy consumption per sample. This is achieved on the more challenging N-MNIST dataset, in which each sample is over 600× larger than an MNIST sample. Furthermore, when compared with a GPU (NVIDIA GeForce 3090), the SNN implementation of the present invention exhibits 78,400× lower latency and 3,700,000× higher energy efficiency in classifying N-MNIST samples.
The foregoing and other objects and advantages of the present invention will become more apparent when considered in connection with the following detailed description and appended drawings in which like designations denote like elements in the various views, and wherein:
SNN learning algorithms significantly influence the accuracy and efficiency that an SNN implementation can achieve. SNN implementations employing localized synaptic learning rules, represented by spike-timing-dependent plasticity, can achieve high efficiency, but their accuracy is limited and falls short of SOTA ANN counterparts.
The use of neural coding schemes to convert artificial neural networks (ANNs) to spiking neural networks (SNNs) can result in SNN implementations achieving accuracy levels close to those of ANNs. This, however, comes at the cost of efficiency, the ability to capture the temporal dynamics of neuromorphic systems, and accuracy lost to conversion approximation errors. Direct training of SNNs through spike-based error back-propagation algorithms [27]-[29] allows the approaches that endow ANNs with superior performance to be applied to SNN training. For example, [28] achieved 99.20% accuracy on the N-MNIST dataset, comparable to the SOTA ANN accuracy on the MNIST dataset [35]. However, this algorithm suffers from the high computational complexity of unfolding the network in time, even for inference. Performing the inference computations on CPUs and/or GPUs is both time- and energy-consuming. The present invention takes inspiration from this algorithm, which serves as the foundation for hardware SNNs that achieve comparable SOTA ANN accuracy on the N-MNIST dataset, with four and six orders of magnitude improvement in latency and energy efficiency, respectively, compared to software SNNs running on a GPU.
A spiking neuron maintains its internal membrane potential by accumulating input spikes over time and encodes this continuous-time membrane potential into output spikes. The Hodgkin-Huxley model [36] describes the spiking neuronal dynamics using differential equations, but this model is computationally expensive, requiring 1,200 floating-point operations to evaluate 1 ms of model time [37]. By contrast, the widely used leaky integrate-and-fire (LIF) neuron model is the most efficient implementation, taking only 5 floating-point operations to simulate the model for 1 ms [37]. However, the simplicity of the LIF model restricts its ability to exhibit complex spiking behaviors [37], [38], limiting the accuracy that can be achieved. The SRM [39] is a simple but versatile spiking neuron model that balances biological plausibility and computational efficiency. The SRM describes the membrane potential by integrating kernels over incoming spikes from synapses and output spikes from the neuron itself. Appropriate kernel choices enable the SRM to approximate the Hodgkin-Huxley model with a significant increase in energy efficiency and computation speed [40]. The SNN circuits of the present invention are thus based on SRM neurons.
SNNs can be implemented with ReRAM synapses and differential pair integrator circuits [32], [41], which exhibit high energy efficiency but are characterized by relatively slow biological time constants. Accelerating below the microsecond scale, however, requires significantly smaller capacitors approaching the parasitic capacitance of integrated circuits, making it difficult to operate at accelerated time scales. BrainScaleS, a mixed-signal LIF-based SNN system [42], [43], emulates biological network activity at a speed 1,000× faster than biological time scales, enabling the simulation of long-term biological processes within a relatively short period. However, further acceleration is constrained by the bandwidth of the on-chip digital communication fabric and the increased power consumption of digital circuits, and the operation of that system remains at the millisecond time scale. Two rate-based SNN implementations with ReRAM synapses and CMOS neurons achieved 84% [33] and 91.2% [23] accuracy on the MNIST dataset, respectively. However, they lack spatiotemporal reasoning capability as well as the inherent SNN advantages in latency and energy efficiency.
The present invention is a fully analog hardware SNN that achieves accuracy comparable to SOTA ANNs on the N-MNIST dataset while simultaneously providing low latency and high energy efficiency, one order of magnitude better than SOTA designs [31], [32] and four and six orders of magnitude better, respectively, than a GPU. The present hardware SNN consists of analog memory synapses and custom-designed CMOS SRM neuron circuits and can deploy the SNN model trained by backpropagation, realizing a software-equivalent accuracy.
Forward propagation of the SNN model of the present invention in a layer l with Nl neurons and a weight matrix W(l) = [w1, w2, . . . , wNl] is described in Eqs. 1-6:
where * is the convolution operator, s(l)(t) = Σ δ(t−ti) denotes the input spike trains with ti being the timing of the ith input spike and δ being the Dirac delta function, ReLU is the rectified linear unit, o(l)(t) denotes the post-synaptic spike trains, ε is the response kernel that accumulates the post-synaptic spike trains onto the membrane potential, ν is the refractory kernel that inhibits the membrane potential after firing, u(l)(t) is the membrane potential, and ƒs is a threshold function defined as:
where Vth is the threshold. The output spike trains from the neurons are represented by s(l+1)(t).
The response kernel ε and refractory kernel ν are defined below:
where H(t) represents the Heaviside step function, τs and τr denote neuron time constants for response and refractory signals, respectively, and m is a scale factor determining the magnitude of the refractory signal.
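By way of example and not limitation, one formulation of the forward propagation that is consistent with the foregoing definitions may be written as follows; this is a hedged reconstruction assuming first-order low-pass kernels with the indicated normalization, and is not a verbatim reproduction of Eqs. 1-6:

```latex
% Hedged reconstruction of a spike-response forward pass consistent with the
% symbol definitions above (equation numbering and kernel normalization assumed).
\begin{align*}
  o^{(l)}(t)   &= \mathrm{ReLU}\!\bigl(W^{(l)}\, s^{(l)}(t)\bigr)
      && \text{synapse network (spatial)}\\
  u^{(l)}(t)   &= \bigl(\varepsilon * o^{(l)}\bigr)(t) + \bigl(\nu * s^{(l+1)}\bigr)(t)
      && \text{membrane potential (temporal)}\\
  s^{(l+1)}(t) &= f_s\!\bigl(u^{(l)}(t)\bigr), \qquad
  f_s(u) = \begin{cases} 1, & u \ge V_{th} \\ 0, & u < V_{th} \end{cases}
      && \text{thresholding and firing}\\
  \varepsilon(t) &= \tfrac{1}{\tau_s}\, e^{-t/\tau_s}\, H(t), \qquad
  \nu(t) = -m\, e^{-t/\tau_r}\, H(t)
      && \text{first-order kernels}
\end{align*}
```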
Eq. 1 describes the synapse networks that extract spatial information, and Eqs. 2-6 model the spiking neurons that abstract temporal information. The SNN model thus has spatiotemporal reasoning capability. Specifically, the presently designed response kernel ε and refractory kernel ν give the SNN model linear temporal dynamics. By shrinking the time constants of the two kernels, the time dimension of the SNN model can be scaled while maintaining the output responses, enabling the SNN system to perform accelerated spatiotemporal inferences and achieve significantly enhanced throughput. The response and refractory kernels are implemented as the simplest first-order low-pass filters to minimize the computational burden during training and the circuit overhead during inference. More complex kernels can be adopted and implemented with combinations of resistors, capacitors, and inductors, resulting in a more biologically plausible neuron model.
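The following is a minimal discrete-time sketch of these linear dynamics for a single neuron; the time step, time constants, threshold, and input rate are hypothetical values chosen for illustration and are not taken from the circuit implementation:

```python
import numpy as np

# Hypothetical settings for illustration (not the circuit's actual values).
dt, tau_s, tau_r = 1e-3, 10e-3, 20e-3   # time step and kernel time constants
m, v_th, T = 2.0, 0.5, 300              # refractory scale, threshold, time steps

rng = np.random.default_rng(0)
o = (rng.random(T) < 0.05).astype(float)   # weighted post-synaptic spike train o(t)

a, r, prev_spike = 0.0, 0.0, False
out_spikes = np.zeros(T, dtype=bool)
for t in range(T):
    # First-order low-pass (leaky) filters: discrete forms of the eps / nu kernels.
    a = a * np.exp(-dt / tau_s) + o[t]                        # response: integrate and leak
    r = r * np.exp(-dt / tau_r) - (m if prev_spike else 0.0)  # refractory after firing
    u = a + r                                                 # membrane potential
    prev_spike = u >= v_th                                    # threshold and fire
    out_spikes[t] = prev_spike

# Because the dynamics are linear in time, shrinking dt, tau_s, and tau_r by the
# same factor rescales the time axis while leaving the output spike pattern intact.
print("output spikes:", int(out_spikes.sum()))
```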
For classification tasks, the output neuron producing the greatest number of spikes corresponds to the inferred class. The error in spike count is used as the loss function. In the backpropagation of error, the derivative of the non-differentiable threshold function ƒs is approximated by a surrogate gradient, which is in the form of an exponentially decaying probability density function, and the derivative of the convolution operation is implemented by a correlation operation which accumulates the future losses up to the current time [28].
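By way of illustration only, such a surrogate gradient may be written as an exponentially decaying density centered at the threshold; the sharpness parameter alpha below is a hypothetical choice:

```python
import numpy as np

def surrogate_grad(u, v_th, alpha=5.0):
    """Exponentially decaying stand-in for d f_s / d u, used only during
    backpropagation (alpha is a hypothetical sharpness parameter)."""
    return (alpha / 2.0) * np.exp(-alpha * np.abs(u - v_th))

# During training, the forward pass applies the hard threshold f_s, while the
# backward pass substitutes surrogate_grad(u, v_th) for its derivative.
```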
The schematic of the SNN implementation is shown in
The two linear temporal kernels (ε and ν) of the SRM neurons are implemented as passive resistor-capacitor (RC) filters, the ε filter and the ν filter, which are the core computing units of the neuron circuit. The ε filter (green box in
In operation, the neuron circuit in
An alternative circuit is shown in
A small two-layer convolutional network architecture is used to illustrate the hardware SNN that classifies moving digits in the benchmark N-MNIST dataset and to build a calibrated model for scale analysis. Each sample lasts 300 ms and the size of each sample is 2×34×34×300. The structure of the SNN model according to the present invention (
Parameters of the trained SNN model were then mapped to the conductance of the ReRAM in the synapse arrays. A total of 3,360 ReRAM devices were each programmed three times, and the readout conductance closest to the target conductance was selected. The readout conductance matched the target conductance well, with the standard deviation of the readout conductance error being 2.49 μS (
A compact neuron model was built according to Eqs. 2-6 and the model was calibrated with the circuit structure and experimental results. An SNN consisting of the physical ReRAM synapses and the calibrated neuron model was then used to classify the same 200 N-MNIST samples in the experiment of
This accuracy is compared to ideal simulations, in which the synapse network and the neurons are replaced with ideal software counterparts, respectively (Table I). The accuracy of the experimentally validated SNN is 2.52% lower than that of the ideal software SNN. Nonidealities of the ReRAM synapses and of the neuron circuit implementation both introduce errors into the SNN inference, resulting in a loss of accuracy. This accuracy loss is significantly smaller in larger networks.
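The following is a minimal sketch of one way such a synaptic nonideality may be modeled in simulation, using the measured readout-error spread (σ ≈ 2.49 μS) noted above; the Gaussian error form and the conductance window are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA_US = 2.49   # standard deviation of the measured readout conductance error (uS)

def add_readout_error(target_conductance_us):
    """Inject Gaussian readout error into target conductances to emulate the
    programmed ReRAM synapses (the Gaussian form is an assumption; the measured
    error distribution could equally be resampled directly)."""
    return target_conductance_us + rng.normal(0.0, SIGMA_US,
                                              size=np.shape(target_conductance_us))

# Hypothetical 4x4 tile of target conductances in an assumed 0-100 uS window.
targets = rng.uniform(0.0, 100.0, size=(4, 4))
programmed = add_readout_error(targets)
```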
After validating the invention with physical ReRAM and neurons, the calibrated model was used to estimate the performance when the hardware SNN is implemented with an advanced technology node for scaled problems. The linear temporal dynamics of the SNN model of the present invention allows the time dimension of the input spike trains s(l)(t) to be easily scaled, so that the neurons can operate significantly faster while consuming less energy per spike. Further, the hardware SNN was simulated in Cadence Virtuoso with TSMC's 65 nm process development kit (PDK). The accelerated input spike train of
The latency and energy consumption were compared between the hardware SNN and the baseline software SNN running on a GPU (NVIDIA GeForce 3090) for N-MNIST classification. The entire inference of the software SNN was performed on one GPU, and the NVIDIA System Management Interface tool was used to estimate the energy consumed by the software SNN. The energy consumed by the hardware SNN was estimated in Cadence Virtuoso with TSMC's 65 nm design rules. The middle two columns of Table II show that the baseline software SNN took an average of 504.65 μs to classify each sample, while the hardware SNN spent ˜10.1 ns. The corresponding energy per sample consumed by the software SNN is 42.31 mJ, most of which is spent on the synapse network calculations. By contrast, the hardware SNN took only 3.39 nJ for each sample, of which the L1 neurons consumed the vast majority. The hardware SNN of the present invention thus spent 50,000× less time and 12,500,000× less energy than the GPU. For the hardware SNN, the ReRAM synapse arrays consumed a negligible portion of the energy because of the passive nature of the ReRAM devices and the narrow spike width, ˜45.35 ps. Most of the energy was dissipated in the first-layer (L1) neurons because (1) the neuron circuits of the present invention include active operational amplifiers, and (2) L1 neurons generate significantly more spikes than neurons in subsequent layers, resulting in more energy consumption.
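A brief arithmetic check confirms that the reported ratios follow from the per-sample figures above:

```python
gpu_latency_s = 504.65e-6   # software SNN (GPU) latency per sample
hw_latency_s  = 10.1e-9     # hardware SNN latency per sample
gpu_energy_j  = 42.31e-3    # software SNN (GPU) energy per sample
hw_energy_j   = 3.39e-9     # hardware SNN energy per sample

print(f"latency ratio: {gpu_latency_s / hw_latency_s:,.0f}x")   # ~50,000x
print(f"energy ratio:  {gpu_energy_j / hw_energy_j:,.0f}x")     # ~12,500,000x
```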
To demonstrate that the hardware SNN of the invention can achieve higher accuracy, the SNN model was expanded from 2 layers to 3 layers, with the structure 34×34×2-p2-(12c5-p2)-(64c5-p2)-(10). The number of parameters was increased to 22,360, and the number of neurons was 1,034. This 3-layer software SNN was trained on 60,000 N-MNIST training samples, and the accuracy improved to 98.70% on 10,000 testing samples. Due to the limited size of the physical ReRAM array, the 3-layer ReRAM synapse network was simulated by sampling the readout conductance error of the experimentally programmed ReRAM devices; the distribution of the sampled error was consistent with the distribution of the readout error. The hardware neuron model was identical to that used in the experimentally validated SNN. This 3-layer hardware SNN achieved 97.78% accuracy on the 10,000 N-MNIST testing samples, 0.92% lower than the corresponding ideal software SNN. This accuracy loss is smaller than that of the first small 2-layer network and is expected to shrink further with even larger networks. The right two columns of Table II show that this 3-layer hardware SNN takes 78,400× less time and 3,700,000× less energy than the baseline.
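A short tally is consistent with the stated 22,360 parameters, assuming bias-free 5×5 convolutions and a 2×2×64 feature map entering the 10-way output layer (both assumptions):

```python
conv1 = 2  * 12 * 5 * 5      # 12 5x5 kernels over 2 input channels  ->    600
conv2 = 12 * 64 * 5 * 5      # 64 5x5 kernels over 12 channels       -> 19,200
fc    = 64 * 2 * 2 * 10      # assumed 2x2x64 features to 10 outputs ->  2,560
print(conv1 + conv2 + fc)    # 22,360, matching the stated parameter count
```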
The hardware SNNs were also compared with SOTA SNN implementations in Table III. The SNN implementation of the present invention achieves significantly higher accuracy on the N-MNIST dataset than the others achieve on the MNIST dataset. It is worth noting that classifying N-MNIST samples is more challenging than classifying MNIST samples, since the network must have spatiotemporal reasoning capability to handle the saccadic movements, and each N-MNIST sample is over 600× larger than an MNIST sample. Nevertheless, the present implementation classifies each sample two orders of magnitude faster than the others while consuming similar energy per sample.
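The size comparison follows from the sample dimensions given earlier (2×34×34×300 events for N-MNIST versus a 28×28 MNIST image):

```python
n_mnist = 2 * 34 * 34 * 300   # polarity x height x width x time bins = 693,600
mnist   = 28 * 28             # pixels per MNIST image = 784
print(round(n_mnist / mnist)) # ~885, i.e. over 600x larger
```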
Rather than following the conventional bottom-up approach, the present hardware SNNs were designed with a top-down approach. Inspired by a SOTA SNN training algorithm [28], the neuron model was designed in the form of the simplest first-order low-pass filter, and the hardware SNNs were then designed based on the approximate physical characteristics of the devices. This top-down design methodology resulted in hardware SNNs that demonstrate performance comparable to SOTA ANN algorithms and significantly outperform hardware SNNs developed through the bottom-up approach. The 2-layer and 3-layer hardware SNNs of the present invention achieve 92.40% and 97.78% accuracy on 10,000 N-MNIST testing samples, respectively, comparable to SOTA ANN accuracy. With larger network structures and more learnable parameters, these hardware SNNs can achieve even higher accuracy.
Besides comparable ANN accuracy, the hardware SNNs of the invention simultaneously achieve low latency and high energy efficiency. Their neurons can spike with an inter-spike interval of 94.75 ps and an energy of 1.16 pJ per spike, one order of magnitude better than SOTA neuron designs [31], [32]. The neuron circuit is designed to have linear dynamics over time. By scaling down the time constants of the two filters, the SNN system can process an accelerated input event stream, enhancing the throughput and reducing the total energy consumption. Meanwhile, the capacitance and resistance required to realize the RC filters can be significantly reduced, avoiding the problem of large component sizes restricting large-scale integration. The high energy efficiency is attributed to two factors: (1) the core computing units of the neuron circuit are implemented with passive RC filters; and (2) the spiking neurons operate at very high speed, significantly reducing the inference time and hence the energy consumed. Using advanced technology nodes, such as 28 nm, the latency and energy efficiency can be further improved. In addition, memristive devices [44] and memcapacitive devices [45], [46] can be used to implement the resistors and capacitors of the two RC filters, resulting in tunable time constants and thus a flexible inference speed.
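As an illustration of why accelerated time constants ease integration, the resistance required for a first-order RC kernel follows from τ = RC; the component values below are hypothetical:

```python
def required_resistance(tau_seconds, capacitance_farads):
    """R = tau / C for a first-order RC filter."""
    return tau_seconds / capacitance_farads

# A ~100 ps time constant with a 10 fF capacitor (parasitic-scale) needs ~10 kOhm.
print(required_resistance(100e-12, 10e-15))   # 1e4 ohms
# A biological-scale 10 ms constant with the same capacitor would need ~1e12 ohms,
# which is why slower designs require much larger capacitors or resistors.
print(required_resistance(10e-3, 10e-15))     # 1e12 ohms
```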
The killer applications of SNNs have not been investigated as extensively as those of ANNs. Performing SNN inference on current CPU and/or GPU computing systems is both time- and energy-consuming. The proposed neuromorphic systems of the present invention can accelerate SNN inference just as GPUs accelerate ANN inference, thereby facilitating the search for killer applications of SNNs. The neuromorphic systems can also be used to study neuroscience by performing large-scale, biologically plausible simulations faster and more energy-efficiently than conventional von Neumann computing systems. They pave the way toward analog neuromorphic processors for complex real-world tasks.
The present invention provides consistent and reliable spiking behavior. An RC filter consisting of two passive analog components is used to implement the neuron operations of integration and leaking. Functioning as a low-pass filter, the RC filter filters out the inevitable high-frequency noise in integrated circuits, resulting in stable spiking behavior. The passive resistor and capacitor composing this RC filter also exhibit considerably higher endurance than the subthreshold-operated MOSFET structures, as well as the emerging memories/devices and materials, used in other spiking neuron designs.
In addition, the present invention offers scalability for large-scale integration. The neuron operations of integration and leaking are implemented by an RC filter consisting of only two components, without any control circuits, and the capacitor size can be significantly reduced, approaching the limit of parasitic capacitance. The spiking neuron design thus has a packing density comparable to that of current scalable CMOS neuron designs.
In summary, the invention demonstrates that all-analog hardware SNNs can achieve an accuracy comparable to SOTA ANN algorithms on the N-MNIST dataset. The hardware SNNs of the present invention consist of ReRAM synapse arrays and CMOS neuron circuits and can perform accelerated-time inferences at high speed and energy efficiency, with an inter-spike interval of 94.75 ps and an energy of 1.16 pJ per spike, enabling the recognition of spatiotemporal patterns within ˜10 ns. This SNN implementation achieves 97.78% accuracy on the N-MNIST dataset, and each inference takes 78,400× less time and 3,700,000× less energy than the baseline software SNN running on a GPU (NVIDIA GeForce 3090). Compared with SOTA SNN implementations on the MNIST dataset, the present invention achieves significantly higher accuracy on the more difficult N-MNIST dataset. Also, this SNN implementation spends two orders of magnitude less time and consumes similar energy to classify each N-MNIST sample, which is over 600× larger in size than each MNIST sample.
The cited references in this application are incorporated herein by reference in their entirety and are as follows:
While the invention is explained in relation to certain embodiments, it is to be understood that various modifications thereof will become apparent to those skilled in the art upon reading the specification. Therefore, it is to be understood that the invention disclosed herein is intended to cover such modifications as fall within the scope of the appended claims.
This application claims the benefit of priority under 35 U.S.C. Section 119(e) of U.S. Application No. 63/432,788 filed Dec. 15, 2022, which is incorporated herein by reference in its entirety.