The present invention relates to a CMOS-based architecture for implementing a neural network with accelerated learning.
Biological neural networks process information in a qualitatively different manner from conventional digital processors. Unlike the sequence-of-instructions programming model employed by conventional von Neumann architectures, the knowledge, or program, of a neural network is largely encoded in the pattern and strengths (weights) of its synaptic connections. This programming model is key to the adaptability and resilience of neural networks, which can continuously learn by adjusting the weights of the synaptic connections.
Multi-layer neural networks are extremely powerful function approximators that can learn complex input-output relations. Backpropagation is the standard training technique that adjusts the network parameters or weights to minimize a particular objective function. This objective function is chosen so that it is minimized when the network exhibits the desired behavior. The overwhelming majority of compute devices used in the training phase and in the deployment phase of neural networks are digital devices. However, the fundamental compute operation used during training and inference is the multiply and accumulate (MAC) operation, which can be efficiently and cheaply realized in the analog domain. In particular, the accumulate operation can be implemented at zero silicon cost by representing the summands as currents and adding them at a common node. Besides computation, a central efficiency bottleneck when training and deploying neural networks is the large volume of memory traffic to fetch and write back the weights to memory.
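For concreteness, the following minimal sketch (in Python, with all names chosen here for illustration rather than taken from the disclosure) spells out the multiply and accumulate operation for one layer; in an analog realization, the inner summation corresponds to individual product currents being added at a common node.

```python
import numpy as np

def mac_layer(x, W, b):
    """Multiply-accumulate for one neural-network layer.

    Each output y[j] = sum_i W[i, j] * x[i] + b[j].
    In an analog realization, the products W[i, j] * x[i] appear as
    currents, and the summation is obtained by joining them at a common
    node, so the accumulate step costs essentially no extra silicon.
    """
    y = np.zeros_like(b, dtype=float)
    for j in range(W.shape[1]):
        acc = b[j]
        for i in range(W.shape[0]):
            acc += W[i, j] * x[i]   # multiply ...
        y[j] = acc                  # ... and accumulate
    return y

# Illustrative example: 3 inputs, 2 outputs
x = np.array([1.0, -1.0, 0.5])
W = np.random.randn(3, 2)
b = np.zeros(2)
print(mac_layer(x, W, b))           # equivalent to W.T @ x + b
```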
Novel nonvolatile memory technologies such as resistive random-access memories (RRAM), phase change memories (PCM), and magnetoresistive random-access memories (MRAM) have been described in the context of dense digital storage and read/write power efficiency. These types of memories store information in the conductance states of nano-scale elements. This enables a unique form of analog in-memory computing, where voltages applied across networks of such nano-scale elements result in currents and internal voltages that are arithmetic functions of the applied voltages and the conductances of the elements. By storing the weights of the neural network as conductance values of the memory elements, and by arranging these elements in a crossbar configuration as shown in the figure, the MAC operations required for inference and training can be carried out directly within the memory array in the analog domain.
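A short numerical sketch of this analog in-memory computation is given below. It uses the textbook crossbar idealization (row voltages applied, columns held at virtual ground, ideal wires), and the function and variable names are assumptions made here for illustration.

```python
import numpy as np

def crossbar_column_currents(G, v_rows):
    """Ideal crossbar read-out.

    G[i, j]  : conductance of the element joining row i and column j
    v_rows[i]: voltage applied on row i
    With every column held at virtual ground, Ohm's law gives a current
    G[i, j] * v_rows[i] through each element, and Kirchhoff's current
    law sums those currents on each column:
        I[j] = sum_i G[i, j] * v_rows[i]
    i.e. the crossbar computes a matrix-vector product in a single step.
    """
    return G.T @ v_rows

G = np.array([[1.0e-6, 2.0e-6],
              [3.0e-6, 0.5e-6]])          # conductances in siemens
v = np.array([0.2, -0.2])                 # small read voltages in volts
print(crossbar_column_currents(G, v))     # column currents in amperes
```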
Deep neural networks have demonstrated state-of-the-art performance on a variety of tasks such as image classification and automatic speech recognition. Before neural networks can be deployed, however, they must first be trained. The training phase for deep neural networks can be very power-hungry and is typically executed on centralized and powerful computing systems. The network is subsequently deployed and operated in the “inference mode,” where the network becomes static and its parameters fixed. This use scenario is dictated by the prohibitively high power costs of the “learning mode,” which make it impractical for use on power-constrained deployment devices such as mobile phones or drones. This use scenario, in which the network does not change after deployment, is inadequate in situations where the network needs to adapt online to new stimuli, or to personalize its output to the characteristics of different environments or users.
While the use of crossbar memory structures for implementing the inference phase in neural networks has been previously disclosed, the inventive approach provides a complete network architecture for carrying out both learning and inference in a novel fashion based on binary neural networks and approximate backpropagation learning.
According to embodiments of the invention, an efficient compute-in-memory architecture is provided for on-line deep learning by combining neural circuits implemented in complementary metal-oxide semiconductor (CMOS) technology with synaptic conductance crossbar arrays implemented in resistive nonvolatile random-access memory (RRAM) technology. The crossbar memory structures store the weight parameters of the deep neural network in the conductances of the RRAM synapse elements, which form the interconnections between lines of neurons of consecutive layers in the network at the crossbar intersection points. The architecture uses binary neuron activations and exploits the conductance-based representation of the network weights to execute multiply and accumulate (MAC) operations in the analog domain, and it trains the RRAM synapse weights using an approximate version of the backpropagation learning technique with ternary truncated updates during the error backpropagation pass.
Disclosed are design and implementation details for the inventive CMOS device with integrated crossbar memory structures for implementing multi-layer neural networks. The inventive device is capable of running both the inference steps and the learning steps. The learning steps are based on a computationally efficient approximation of standard backpropagation.
According to an exemplary embodiment, the inventive architecture is based on complementary metal-oxide semiconductor (CMOS) technology and crossbar memory structures in order to accelerate both the inference mode and the learning mode of neural networks. The crossbar memory structures store the network parameters in the conductances of the elements at the crossbar intersection points (the points at the intersection of a horizontal metal line and a vertical metal line). The architecture makes use of binary neurons. It uses the conductance-based representation of the network weights in order to execute multiply and accumulate (MAC) operations in the analog domain. With binary outputs, it is not necessary to acquire the output current by clamping the voltage to zero (virtual ground) on the output lines; it is sufficient to compare the voltage directly against a zero threshold, which is easily accomplished using a standard voltage comparator, such as a CMOS dynamic comparator, on the output lines. With binary inputs driving the input lines, the “sneak path” problem in the analog crossbar array is entirely avoided. The architecture uses an approximate version of the backpropagation learning technique to train the weights in the “weight memory structures” in order to minimize a cost function. During the backward pass, the approximate backpropagation learning uses ternary errors and a hardware-friendly approximation of the gradient of the binary neurons.
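The sketch below captures, at a functional level, why binary outputs make virtual-ground current sensing unnecessary: with the signed weights held as differential conductance pairs, only the sign of each weighted sum is needed, which is exactly what a comparator across the corresponding output line pair provides. The differential bookkeeping and the names here are illustrative assumptions, not circuit-level details from the disclosure.

```python
import numpy as np

def binary_forward(x_bits, G_pos, G_neg):
    """Binary-neuron inference with differentially encoded weights.

    x_bits      : binary input activations, elements in {-1, +1}
    G_pos, G_neg: non-negative conductance arrays; the effective signed
                  weight of each synapse is (G_pos - G_neg)
    Because the outputs are binary, only the *sign* of each weighted sum
    is needed.  In hardware, that sign is taken directly by a dynamic
    comparator across the differential output line pair, so no
    virtual-ground node (and no explicit current read-out) is required.
    """
    net = (G_pos - G_neg).T @ x_bits        # analog-domain MAC result
    return np.where(net >= 0, 1, -1)        # comparator against zero

rng = np.random.default_rng(0)
G_pos = rng.uniform(0, 1e-6, size=(4, 3))   # illustrative conductances
G_neg = rng.uniform(0, 1e-6, size=(4, 3))
x = np.array([1, -1, 1, 1])
print(binary_forward(x, G_pos, G_neg))      # binary output activations
```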
The inventive approach represents the first truly integrated RRAM-CMOS realization that supports fully autonomous on-line deep learning. The ternary truncated error backpropagation architecture offers a hardware-friendly approximation of the true gradient of the error in the binary neurons. Through the use of binary neurons, ternary errors, approximate gradients, and analog-domain MACs, the developed device achieves compact and power-efficient learning and inference in multi-layer networks.
In one aspect of the invention, a neural network architecture for inference and learning includes a plurality of network modules, each network module comprising a combination of CMOS neural circuits and RRAM synaptic crossbar memory structures interconnected by bit lines and source lines, each network module having an input port and an output port, wherein weights are stored in the crossbar memory structures, and wherein learning is effected using approximate backpropagation with ternary errors. The CMOS neural circuits include a source line block having dynamic comparators, so that inference is effected by clamping pairs of bit lines in a differential manner and comparing, within the dynamic comparators, the voltages on each differential source line pair to obtain a binary output activation for the output neurons. The comparison may be performed in parallel across all source line pairs. In a preferred embodiment, the use of binary outputs obviates the need for virtual ground nodes.
The architecture may further include a plurality of switches disposed within the bit lines and source lines between adjacent network modules, so that closing a switch in bit lines between adjacent network modules creates a layer with additional input neurons and closing a switch in source lines between adjacent network modules creates a layer with additional output neurons. A plurality of routing switches may be configured to connect input ports and output ports of the network modules to flow binary activations forward and binary errors backward.
In another aspect of the invention, a neural network architecture configured for inference and learning includes a plurality of network modules arranged in an array, where each network module is configured to implement one or more layers, or parts of layers, of binary neurons via a combination of CMOS neural circuits and a conductance crossbar array configured to store synapse element weights, wherein crossbar intersections within the crossbar array define interconnects between lines of neurons of consecutive layers in the network structure, and wherein the synapse element weights are trained using backpropagation with ternary truncated updates. The crossbar intersections correspond to intersections between bit lines and source lines, and the CMOS neural circuits include a source line block having dynamic comparators, so that inference can be effected by clamping pairs of bit lines in a differential manner and comparing, within the dynamic comparators, the voltages on each differential source line pair to obtain a binary output activation for the output neurons. The comparison can be performed in parallel across all source line pairs. The use of binary outputs allows sneak paths to be avoided without relying on virtual ground nodes. A plurality of switches may be disposed within the bit lines and source lines between adjacent network modules so that closing a switch in bit lines between adjacent network modules creates a layer with additional input neurons and closing a switch in source lines between adjacent network modules creates a layer with additional output neurons. A plurality of routing switches may be provided to connect input ports and output ports of the network modules to flow binary activations forward and binary errors backward.
In still another aspect of the invention, a compute-in-memory CMOS architecture includes a combination of neural circuits implemented in complementary metal-oxide semiconductor (CMOS) technology and synaptic conductance crossbar memory structures implemented in resistive nonvolatile random-access memory (RRAM) technology. The crossbar memory structures store weight parameters of a neural network in the conductances of synapse elements at crossbar intersection points, wherein the crossbar intersection points correspond to interconnects between lines of neurons of consecutive layers in the network. The crossbar intersection points correspond to intersections between bit lines and source lines, and the CMOS neural circuits include a source line block having dynamic comparators, so that inference can be effected by clamping pairs of bit lines in a differential manner and comparing, within the dynamic comparators, the voltages on each differential source line pair to obtain a binary output activation for the output neurons. The comparison can be performed in parallel across all source line pairs. The use of binary outputs allows sneak paths to be avoided without relying on virtual ground nodes.
The architecture can be used to form an array of network modules, where each module includes the combination of neural circuits implemented in CMOS technology and synaptic conductance crossbar memory structure implemented in RRAM technology, and a plurality of switches is disposed within the bit lines and source lines between adjacent network modules so that closing a switch in bit lines between adjacent network modules creates a layer with additional input neurons and closing a switch in source lines between adjacent network modules creates a layer with additional output neurons. A plurality of routing switches may be configured to connect input ports and output ports of the network modules to flow binary activations forward and binary errors backward.
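At the level of the stored weight matrices, the effect of these inter-module switches can be pictured as block concatenation: per the description above, closing bit-line switches yields a layer with additional input neurons, and closing source-line switches yields a layer with additional output neurons. The sketch below shows only this matrix-level picture, with names invented here; it does not model the switch circuitry itself.

```python
import numpy as np

def merge_bit_line_switches(G_a, G_b):
    """Model of closing the bit-line switches between two adjacent
    network modules: the merged layer keeps the same output neurons and
    gains additional input neurons, i.e., the conductance matrices are
    concatenated along the input (row) dimension."""
    return np.vstack([G_a, G_b])

def merge_source_line_switches(G_a, G_b):
    """Model of closing the source-line switches between two adjacent
    network modules: the merged layer keeps the same input neurons and
    gains additional output neurons, i.e., concatenation along the
    output (column) dimension."""
    return np.hstack([G_a, G_b])

G_a = np.full((4, 2), 1.0e-6)   # module A: 4 input lines x 2 output lines
G_b = np.full((4, 2), 2.0e-6)   # module B: same shape
print(merge_bit_line_switches(G_a, G_b).shape)     # (8, 2): more inputs
print(merge_source_line_switches(G_a, G_b).shape)  # (4, 4): more outputs
```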
The inventive architecture is applicable to virtually all domains of industrial activity and product development that are now heavily investing in deep learning and artificial intelligence (DL/AI) technology to automate the range of functionalities offered to the customer. Self-learning microchips fill an important gap between the bulky, power-hungry central/graphical processing unit (CPU/GPU) clusters running DL/AI algorithms in the cloud and the need for ultra-low-power internet-of-things (IoT) devices running on the edge.
FIG. is a block diagram of the basic building blocks of a network module (NM) according to an embodiment of the invention.
According to an embodiment of the inventive architecture, an array of network modules is assembled using CMOS technology. Each network module contains an N×M array of conductance-based memory elements 18 arranged in a crossbar configuration, and the modules are connected to one another through switches.
The network module (NM) 20 implements a whole layer, or part of a layer, of a neural network with N/2 input neurons and M/2 output neurons; two input neurons (x1, x2) and two output neurons (y1, y2) are included in the example shown in
Forward pass (inference): Referring still to the same figure, the binary input activations (x1, x2) are applied by clamping the pairs of bit lines 24 in a differential manner. The resulting currents accumulate on the source lines 22, and the dynamic comparators in the source line (SL) block 29 compare the voltages on each differential source line pair, in parallel, to obtain the binary output activations of the output neurons (y1, y2), without requiring a virtual ground on the output lines.
Backward pass: The NMs 20 collectively implement an approximate version of backpropagation learning where errors from the top layer are backpropagated down the stack of layers and used to update the weights of the memory elements. The approximation has two components: approximating the back-propagating errors by a ternary value (−1, 0, or 1), and approximating the zero gradient of the neuron's binary activation function by a non-zero value that depends on the neuron's activation and the error arriving at the neuron.
1) The NM 20 receives binary errors (−1, +1) in a bit-serial fashion through the VL line 28. The binary errors are stored in latches in the SL block 29.
2) The NM 20 carries out an XOR operation between a neuron's activation bit and the error bit to obtain the update bit. If the update bit is 0, the activation and the error have the same sign and changing the neuron's activation is not required to reduce the error. Otherwise, if the update bit is 1, the error bit has a different sign than the activation and the neuron's output needs to change to reduce the error. The ternary error is obtained from the update bit and the binary error bit: if the update bit is 0, the ternary error is 0; otherwise the ternary error is +1 if the binary error is +1 and −1 if the binary error is −1.
3) The ternary errors calculated in the previous step at each output neuron (for example, y1 and y2) are used to clamp the differential source lines corresponding to each neuron, as shown in
4) In the forward step and all the previous steps in the backward pass, the applied voltages are small enough to avoid perturbing the conductances of the memory elements 18, i.e., to avoid perturbing the weights. In this step, voltages are applied simultaneously on the BLs 24 and the SLs 22 so as to update the conductance elements' values based on the activations of the input neurons (x1 and x2) and the ternary errors at the output neurons (y1 and y2). An algorithmic sketch of these backward-pass steps is given below.
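A compact algorithmic sketch of steps 1–4 follows (Python; the function names, sign conventions, and the learning-rate handling are assumptions made here for illustration): the update bit is the XOR of the activation bit and the error bit, the ternary error is derived from it, and the conductance-encoded weights receive a truncated, outer-product-style update driven by the binary input activations and the ternary output errors.

```python
import numpy as np

def ternary_error(y_bits, err_bits):
    """Steps 1-2: derive the ternary error at each output neuron.

    y_bits, err_bits: arrays with elements in {-1, +1} (activation bit
    and back-propagated binary error bit).  The update bit is the XOR
    of the two sign bits; if it is 0 (signs agree) the ternary error is
    0, otherwise the ternary error equals the binary error (+1 or -1).
    """
    update_bit = (y_bits != err_bits).astype(int)   # XOR of sign bits
    return update_bit * err_bits                    # values in {-1, 0, +1}

def apply_weight_update(W, x_bits, t_err, lr=0.01):
    """Steps 3-4: truncated weight update (sign convention illustrative).

    Voltages applied jointly on the bit lines (input activations x_bits)
    and the source lines (ternary errors t_err) nudge each
    conductance-encoded weight by an amount proportional to x_i * e_j;
    entries with a zero ternary error are left untouched.
    """
    return W + lr * np.outer(x_bits, t_err)

# Toy example with 2 input and 2 output neurons (x1, x2, y1, y2)
W = np.zeros((2, 2))
x = np.array([+1, -1])
y = np.array([+1, +1])          # forward-pass binary activations
e = np.array([-1, +1])          # binary errors from the layer above
t = ternary_error(y, e)         # -> [-1, 0]
print(apply_weight_update(W, x, t))
```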
The CMOS neural network architecture disclosed herein provides for inference and learning using weights stored in crossbar memory structures, where learning is achieved using approximate backpropagation with ternary errors. The inventive approach provides an efficient inference stage in which dynamic comparators are used to compare voltages across differential wire pairs. The use of binary outputs allows sneak paths to be avoided without having to rely on clamping to a virtual ground.
This application claims the benefit of the priority of U.S. Provisional Application No. 62/700,782, filed Jul. 19, 2018, which is incorporated herein by reference.
PCT Information: Filing Document PCT/US2019/042690, Filing Date Jul. 19, 2019, Country WO.
Provisional Application: No. 62/700,782, filed Jul. 2018, Country US.