Aspects of the disclosure are related to the field of artificial neuron circuitry in artificial neural networks.
Artificial neural networks (ANN) can be formed from individual artificial neurons that are emulated using software, integrated hardware, or other discrete elements. Neuromorphic computing, which can employ ANNs, focuses on using electronic components such as analog/digital circuits in integrated systems to mimic the human brain and to attempt a greater understanding of the neuro-biological architecture of the neural system. Neuromorphic computing emphasizes implementing models of neural systems to understand how the morphology of individual neurons, synapses, circuits, and architectures leads to desirable computations. Such bio-inspired computing offers enormous potential for very low power consumption and high parallelism.
Many neuromorphic computing projects have been carried out, including BrainScaleS, SpiNNaker, and the IBM TrueNorth, which use semiconductor-based random-access memory to emulate the behavior of biological neurons. More recently, emerging non-volatile memory devices, including phase change memory, resistive memory, and magnetic random-access memory, have been proposed to emulate biological neurons as well. Resistive memory technologies in particular have become possible using new materials that have alterable resistance or conductance properties which persist after application of an electric voltage or current.
Unfortunately, various noise effects can occur during neural network operations for neuromorphic computing systems that employ non-volatile memory devices to emulate biological neurons. These noise effects can be significant when designing hardware components for machine learning, among other ANN applications. Also, these sources of noise can have detrimental effects on ANN inference and training operations.
Enhanced techniques and circuitry are presented herein for artificial neural networks. These artificial neural networks are formed from artificial neurons, which in the implementations herein comprise a memory array having non-volatile memory elements. Neural connections among the artificial neurons are formed by interconnect circuitry coupled to control lines of the memory array to subdivide the memory array into a plurality of layers of the artificial neural network. Control circuitry is configured to transmit a plurality of iterations of an input value on input control lines of a first layer of the artificial neural network for inference operations by at least one or more additional layers. The control circuitry is also configured to apply an averaging function across output values successively presented on output control lines of a last layer of the artificial neural network from each iteration of the input value.
Many aspects of the disclosure can be better understood with reference to the following drawings. While several implementations are described in connection with these drawings, the disclosure is not limited to the implementations disclosed herein. On the contrary, the intent is to cover all alternatives, modifications, and equivalents.
Artificial neural networks (ANN) have been developed to process sets of complex data using techniques deemed similar to biological neurons. Biological neurons characteristically produce an output in response to various synaptic inputs to the neuron cell body, and some forms of artificial neurons attempt to emulate this behavior. Complex networks of artificial neurons can thus be formed, using artificial neural connections among artificial neurons as well as properties of these artificial neurons to process large sets of data or perform tasks too complex for conventional data processors, such as machine learning.
ANNs can be formed from individual artificial neurons that are emulated using software, or from integrated hardware and discrete circuit elements. As discussed herein, artificial neurons can comprise individual memory elements, such as non-volatile memory elements, or might be represented using other types of memory elements or software elements. Artificial neurons are interconnected using artificial neural connections, which are referred to herein as neural connections for clarity. These neural connections are designed to emulate biological neural synapses and axons which interconnect biological neurons. These neural connections can comprise electrical interconnects, such as wires, traces, circuitry, and various discrete or integrated logic or optical interconnects. When memory elements are employed to form artificial neurons, then these neural connections can be formed in part by control lines of any associated memory array. These control lines can include input control lines that introduce data to artificial neurons, and output control lines which receive data from artificial neurons. In specific implementations, the control lines may comprise word lines and bit lines of a memory array.
Various types of ANNs have been developed, which typically relate to topologies for connecting artificial neurons as well as how data is processed or propagated through an ANN. For example, feedforward ANNs propagate data through sequential layers of artificial neurons in a ‘forward’ manner, which excludes reverse propagation and loops. Fully-connected ANNs have layers of artificial neurons, and each artificial neuron is connected to all artificial neurons of a subsequent layer. Convolutional neural networks (CNNs) are formed by multiple layers of artificial neurons which are fully connected and propagate data in a feed-forward manner.
The process of propagating and processing data through an ANN to produce a result is typically referred to as inference. However, many ANNs must first be trained before data sets can be processed through the ANN. This training process can establish connectivity among individual artificial neurons as well as data processing properties of each artificial neuron. The data processing properties of artificial neurons can be referred to as weights or synaptic weights. Synaptic weights indicate a strength or amplitude of a connection between two artificial neurons. This can correspond to an amount of influence that firing a first artificial neuron has on another artificial neuron.
Various implementations have been developed to form ANNs that execute machine learning tasks, among other data processing tasks within an ANN framework. For example, a conventional central processing unit (CPU) can typically process very complex instructions efficiently but can be limited in the amount of parallelism achieved. However, in machine learning computation, especially training tasks, the basic operation is vector-matrix multiplication, which is a simple task performed an enormous number of times. A graphics processing unit (GPU), which has started to gain favor over CPUs, uses a parallel architecture and can handle many sets of very simple instructions. Another emerging implementation uses an application-specific integrated circuit (ASIC), which can implement a tensor processing unit (TPU) that is efficient at executing one specific task. As machine learning becomes integrated into more applications, interest has grown in making special-purpose circuitry that can efficiently handle machine learning tasks.
Another concern for implementing machine learning is electrical power consumption. A machine learning task can take GPUs or TPUs up to hundreds of watts to execute. In contrast, the human brain can execute similar cognitive tasks by using only around 20 watts. Such power-hungry disadvantages have inspired the study of biologically-inspired or brain-inspired approaches, such as neuromorphic computing, to deal with machine learning limitations.
Neuromorphic computing can employ ANNs, and focuses on using electronic components such as analog/digital circuits in very-large-scale-integration (VLSI) systems to attempt to mimic the human brain, especially in trying to understand and learn from the neuro-biological architecture of the neural system. Neuromorphic computing emphasizes implementing models of neural systems and understanding how the morphology of individual neurons, synapses, circuits, and architecture leads to desirable computations. Such biologically inspired computing offers enormous potential for very low power consumption and high parallelism. Related research has been done to study spiking neural networks and synaptic learning rules such as spike-timing dependent plasticity. Many neuromorphic computing projects have been carried on for several years, including BrainScaleS, SpiNNaker, and the IBM TrueNorth, which use SRAM or SDRAM to hold synaptic weights.
More recently, emerging non-volatile memory devices, including phase change memory (PCM), resistive random-access memory (RRAM or ReRAM), and magnetic random-access memory (MRAM) formed from magnetic tunnel junctions (MTJs), have been proposed to be used to emulate synaptic weights as well. These devices fall into the broad category of memristor technology and can offer very high density and connectivity due to a correspondingly small footprint. Resistive memory technologies, such as those in the aforementioned memristor category, have become possible using new materials which have alterable resistance states or conductance states that persist after application of an electric voltage or current. Memristors and other related resistive memory devices typically comprise electrical components which relate electric charge to magnetic flux linkage, where an electrical resistance of a memristor depends upon a previous electrical current or voltage passed by the memristor.
Non-volatile memory (NVM) elements representing synaptic weights of artificial neural networks will be considered below, although the enhanced circuitry and techniques can be applied across other circuit types and ANN topologies. Individual NVM elements can be formed into large arrays interconnected via control lines coupled to the NVM elements. In some examples, these control lines can include bit line and word line arrangements, but the control lines, in other embodiments, can include other elements and interfaces with other memory array arrangements. In the examples herein, non-volatile memory (NVM) arrays are employed to implement various types of ANNs. Specifically, resistive memory elements are organized into addressable arrays of artificial neurons used to form an ANN. Control line connections can be used not only to write and read the NVM elements in an array, but also to logically subdivide the NVM array into logical sub-units of an ANN, referred to as layers. These layers may each comprise an arbitrary quantity of NVM elements, which typically is determined by a desired quantity of artificial neurons, or nodes, for each layer. Typically, the arbitrary quantity of NVM elements is the same in each layer, but other embodiments may use different numbers of NVM elements in each layer. In some examples, nodes of each layer can comprise entire memory pages of an NVM array, or might span multiple memory pages. Furthermore, nodes of a layer might instead only employ a subset of the NVM elements for particular memory pages, and thus a single memory page might be shared among layers. In further examples, the NVM elements might not employ traditional memory page organization, and instead comprise a ‘flat’ array of column/row addressable elements.
As mentioned above, artificial neural networks can be formed using large collections of artificial neurons organized into distinct layers of artificial neurons. These layers can be combined into an arrangement called a deep neural network, among other arrangements. Deep neural networks typically include an input layer, an output layer, and one or more intermediate layers between the input and output layers. These intermediate layers are referred to as hidden layers. Deep neural networks are popular in machine learning, especially for image classification, object detection, or speech recognition applications. Deep neural networks are one of the most widely used deep learning techniques. Deep feedforward neural networks like convolutional neural networks (CNNs) or multi-layer perceptrons (MLPs) are suitable for processing static patterns such as images. Recurrent deep neural networks like long short-term memory (LSTM) networks are good at processing temporal data like speech.
Various noise effects can occur during deep neural network training and inference for neuromorphic computing, as well as for other ANN operations. These noise effects can be significant when designing hardware components for machine learning. Two types of noise, forward propagation noise and weight update noise, will be discussed in more detail below. These sources of noise can have detrimental effects on inference operations and may also be detrimental to training operations in some instances. In the enhanced circuitry and techniques presented herein, a pipelining approach can reduce at least forward propagation noise in artificial neural networks. Advantageously, these enhanced pipelined ANNs can increase classification accuracies of inference operations, potentially approaching ideal levels comparable to Modified National Institute of Standards and Technology database (MNIST) test results obtained when no noise is present.
Sources of noise in various circuitry, such as forward propagation noise and weight update noise, are now discussed. The basic training operations of a deep feedforward neural network, such as a multi-layer perceptron, can be classified into several categories: forward propagation, computing cost, backward propagation, and parameter updates. The basic inference operations include: forward propagation, feeding a resultant logit vector into a “softmax” layer, and determining a prediction as the result with the highest probability. A softmax layer is employed in artificial neural networks to present a result that is normalized over a target numerical range. For example, a probability can be presented from values of 0 to 1, and a softmax layer can interpret output values from an artificial neural network and normalize these output values over the scale of 0 to 1. Other scales and normalization functions can be applied in a softmax layer.
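As an illustrative aside, the following Python sketch shows one way a softmax layer can normalize a logit vector over a 0-to-1 probability scale as described above; the numeric logit values are hypothetical and not taken from any particular implementation herein.

```python
# Minimal sketch of a softmax layer normalizing a logit vector to a 0-to-1 scale.
import numpy as np

def softmax(logits):
    # Subtract the maximum logit for numerical stability before exponentiating.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

logits = np.array([2.0, 1.0, 0.1])        # hypothetical logit vector from an output layer
probabilities = softmax(logits)            # values normalized over the 0-to-1 scale
print(probabilities, probabilities.sum())  # probabilities sum to 1.0
```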
One source of noise is weight update noise. This source of noise can come from artificial neurons that store synaptic weights in an NVM array. These artificial neurons can be formed by memory devices that have variation effects when synaptic weight updates are made. During training, the synaptic weights are updated during each training epoch. During inference, the synaptic weights are updated only once, when previously trained synaptic weights are programmed from software or a storage device into the initial array. Mitigation solutions to weight update noise are beyond the scope of this discussion.
Another source of noise is forward propagation noise. Forward propagation noise can arise at the circuit and device level, which might affect both the training and inference stages of operating artificial neural networks. Specifically, in a deep neural network with several fully connected layers, forward propagation is conducted in every layer by calculating a vector-matrix multiplication of values input to a layer and stored weights. The input values can comprise an input image or activations from a previous layer. In the NVM array examples herein, input values are represented by voltages fed into input control lines comprising word lines of the NVM array, and stored weights are represented by present conductance values or conductance states of NVM elements in the NVM array. An NVM array utilized in this manner can be referred to as a weight memory array. The vector-matrix multiplication result of each layer is then read out from associated output control lines of the NVM array in the form of electrical current values. Forward propagation noise might arise from analog-to-digital converter (ADC) processes, among other sources. In the ANNs discussed herein, ADCs can connect to electrical current outputs from output control lines and convert the analog electrical current outputs to digital representations for transfer to digital peripheral circuits.
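The following Python sketch illustrates, under simplifying assumptions, the weight memory array read described above: input voltages on word lines multiply NVM conductance states, and the per-bit-line current sums form the vector-matrix multiplication result. The conductance and voltage values are hypothetical examples, not parameters of any specific array.

```python
# Illustrative model of a weight memory array read as a vector-matrix multiplication:
# word-line voltages encode the input value, conductance states encode synaptic weights,
# and each bit-line current is the sum of voltage * conductance contributions.
import numpy as np

conductances = np.array([[1.0e-6, 2.0e-6],   # siemens; one row per word line,
                         [0.5e-6, 1.5e-6],   # one column per bit line
                         [2.0e-6, 0.2e-6]])
word_line_voltages = np.array([0.2, 0.0, 0.2])   # volts; encodes the input value

bit_line_currents = word_line_voltages @ conductances
print(bit_line_currents)   # analog currents later digitized by sense amplifiers/ADCs
```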
Forward propagation noise comprises signal noise that arises during forward propagation operations, which is typically of the Gaussian noise type, among others. This noise can include analog and digital noise introduced by various circuit elements of an ANN, such as layer interconnect circuitry, ADC circuit elements, and other circuit elements. Forward propagation noise can be mathematically represented at the input of the activation function: Wx+b, also called the pre-activation parameter. W is the weight matrix, x is the activations from a previous layer (or input data of a first layer), and b is the bias vector. Without the noise, the activation function of the linear part of a layer forward propagation should be:
ƒ(Wx+b)
After adding the forward propagation noise, the activation function becomes:
ƒ(Wx+b+Z), Z~N(0, σ²), where σ=β(Wx+b)
β is the forward propagation noise expressed as a percentage.
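A minimal numerical sketch of this noise model follows; it assumes a ReLU activation for ƒ, and the W, x, b, and β values are illustrative assumptions only.

```python
# Sketch of the noisy forward propagation f(Wx + b + Z), Z ~ N(0, sigma^2),
# with sigma = beta * (Wx + b); a ReLU stands in for the activation function f.
import numpy as np

rng = np.random.default_rng(0)

def noisy_forward(W, x, b, beta):
    pre_activation = W @ x + b                    # Wx + b, the pre-activation parameter
    sigma = beta * pre_activation                 # sigma = beta * (Wx + b)
    Z = rng.normal(0.0, np.abs(sigma))            # Z ~ N(0, sigma^2), elementwise
    return np.maximum(0.0, pre_activation + Z)    # f(Wx + b + Z), f = ReLU here

W = np.array([[0.5, -0.2], [0.1, 0.3]])
x = np.array([1.0, 2.0])
b = np.array([0.05, -0.05])
print(noisy_forward(W, x, b, beta=0.10))          # 10% forward propagation noise
```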
The effect of forward propagation noise on training and inference can be seen in graph 600 of
In some instances, forward propagation noise can be reduced by modifying an associated neural network training method. However, in the examples herein, neural network training is not modified, and an enhanced pipelining technique on inference is employed. This example pipelining can improve inference accuracy by reducing forward propagation noise within deep neural networks. As the domain of artificial intelligence is an emerging application for non-volatile memory (NVM), these pipelining examples herein can improve performance of an associated neural network. Pipelining can also reduce total run time, as discussed below.
Turning now to circuit structures that can be used to implement enhanced artificial neural networks,
Also shown in
Control circuitry 130 comprises various circuitry and processing elements used for introduction of input data to memory array 110 and interpretation of output data presented by memory array 110. The circuitry and processing elements can include activation functions, softmax processing elements, logit vector averaging circuitry, forward propagation noise reduction function circuitry, and storage circuitry. Control circuitry 130 can provide instructions, commands, or data over control lines 161 to interconnect circuitry 120. Control circuitry 130 can receive resultant data determined by memory array 110 over lines 162. Interconnect circuitry 120 can apply any adjustments or signal interpretations to the signaling presented by output control lines 164 before transfer to control circuitry 130. Output data can be transmitted to one or more external systems, such as a host system, over link 160. Moreover, input data can originate over link 160 from one or more external systems before training or inference by the ANN.
Control circuitry can also include one or more memory elements or storage elements, indicated in
Memory array 110 comprises an array of memory devices, specifically non-volatile memory devices. In this example, these NVM devices comprise memristor-class memory devices, such as memristors, ReRAM, MRAM, PCM, or other device technologies. The memory devices may be connected into an array of columns and rows of memory devices accessible using selected word lines and bit lines. However, other memory cell arrangements might be employed and accessed using input control lines 163 and output control lines 164. Memory array 110 can be used to implement a single layer of an artificial neural network, or instead might implement a multi-layer ANN. Each layer of an ANN is comprised of a plurality of nodes or artificial neurons. Each artificial neuron corresponds to at least one NVM element in memory array 110. In operation, individual NVM elements in memory array 110 store synaptic weights, loaded from memory 131 by control circuitry 130, with values established at least by training operations.
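As a hedged illustration of how trained synaptic weights might be mapped onto conductance states before programming into memory array 110, the following Python sketch uses an assumed linear mapping and an assumed conductance range; neither is specified by the implementations above.

```python
# Illustrative mapping of trained synaptic weights onto NVM conductance states.
# The linear scaling and the conductance range (in siemens) are assumptions.
import numpy as np

def weights_to_conductances(weights, g_min=1e-7, g_max=1e-5):
    w = np.asarray(weights, dtype=float)
    scaled = (w - w.min()) / (w.max() - w.min())   # normalize weights to [0, 1]
    return g_min + scaled * (g_max - g_min)        # map onto the conductance range

trained_weights = np.array([[-0.4, 0.9], [0.2, -0.1]])   # hypothetical trained weights
print(weights_to_conductances(trained_weights))
```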
Layers, as used herein, refer to any collection or set of nodes which share a similar data propagation phase or stage in an ANN interconnection scheme. For example, nodes of a layer typically share similar connection properties with regard to preceding layers and subsequent layers. However, layering in an ANN, in certain embodiments, may be a logical organization of nodes of the ANN, and the layers can differ depending upon the ANN topology, size, and implementation. Input layers comprise a first layer of an artificial neural network which receives input data, input values, or input vectors for introduction to the ANN. Typically, an input layer will have a quantity of input nodes which corresponds to a size or length of an input value/vector. These input nodes will then be connected to a subsequent layer according to a connection style, such as fully connected or partially connected, and the like. Layers that lie in between an input layer and an output layer are referred to as intermediate layers or ‘hidden’ layers. Hidden layers and hidden nodes are referred to as ‘hidden’ because they are not directly accessible for input or output from external systems. Various interconnection styles can be employed for nodes in hidden layers as well, such as fully connected or partially connected. Finally, an output layer comprises a final layer, or last layer, of nodes of the ANN which receives values from a last hidden layer or a last intermediate layer and presents these values as outputs from the ANN. The quantity of nodes in the output layer typically corresponds to a size or length of an output value. These output values are commonly referred to as logits or logit vectors, and relate to a prediction made by the ANN after inference processes have propagated through the various hidden layers. The logit vectors can be further processed in an additional layer, referred to commonly as a softmax layer, which scales the logit vectors according to a predetermined output scale, such as a probability scale from 0 to 1, among others.
In one example operation, ANN 140 can be operated in a non-pipelined manner. In this non-pipelined example, a single instance of input data is introduced at input layer 141 and propagates through hidden layers before an output value is presented at output layer 145. Output layer 145 can pass this output value to an optional softmax process or softmax layer which normalizes the output value before transfer to an external system as a prediction result. The total time to propagate this single instance of input data through ANN 140 takes ‘m’ time steps, one for each layer of ANN 140. However, propagating a single instance of the input through ANN 140 might lead to an increased influence of forward propagation noise in the output value from each layer of ANN 140.
ANN 140 can also be operated in an enhanced pipelined manner. Configuration 101 illustrates a pipelined operation of ANN 140. In configuration 101, several layers have been established which comprise one or more artificial neurons. These layers can include input layer 141, one or more hidden layers 142-144, and output layer 145. The interconnection among these layers can vary according to implementation, and the pipelining techniques can apply across various amounts of layer interconnection.
In the pipelined operation, a particular input value can be propagated through the layers of ANN 140 as part of an inference operation. However, more than one instance of this input value can be iteratively introduced to input layer 141. Specifically, control circuitry 130 presents an input value more than one time to input layer 141. ANN 140 produces an output value for each instance or iteration of the input value propagated through ANN 140. As shown in configuration 101, output values T1, T2, and Tn can result from the same input value introduced to ANN 140 a target quantity of times. However, output values T1, T2, and Tn will typically vary even though output values T1, T2, and Tn result from the same input value. This variation is due in part to forward propagation noise which can arise in circuitry between layers of ANN 140.
Although each output value might be employed by control system 130 or one or more external systems, in this example noise reduction function 150 is employed. Noise reduction function 150 stores or buffers each output value produced from a particular input value, which might span several instances or iterations of the same input data. After a target quantity of iterations completes, then noise reduction function 150 executes a noise reduction process to reduce at least the forward propagation noise in the output values. Noise reduction function 150 thus produces a noise-reduced result for ANN 140.
As used herein, noise reduction function refers to a style, form, or type of digital, optical, or analog electrical signal noise reducing process, function, or feature, and associated circuits, circuit elements, software elements, and the like that perform the noise reduction function. In one example, a noise reduction function comprises an averaging function applied across more than one output value or more than one set of output values. In certain example noise reduction functions, various weightings or scaling among the output values might be applied to give preference to one or more of the output values over time. Other noise reduction functions can include, but are not limited to, various noise filters, ‘companding’ (compression/expansion) functions, noise limiter functions, linear or non-linear filters, smoothing filters, high-pass or low-pass filters, Gaussian filtering, wavelet filters, statistical filters, machine-learning based filtering functions, or anisotropic diffusion, among others.
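The following Python sketch illustrates the averaging style of noise reduction function named above, including an optional weighting among output values; the function name and the example logit vectors are assumptions for illustration.

```python
# Sketch of an averaging noise reduction function across successive output values,
# with optional per-iteration weightings to favor some iterations over others.
import numpy as np

def noise_reduction_average(output_values, weights=None):
    """Average output vectors T1..Tn produced from repeated runs of the same input."""
    outputs = np.asarray(output_values)             # shape: (n_runs, n_outputs)
    if weights is None:
        return outputs.mean(axis=0)                 # plain averaging function
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()               # normalize the weightings
    return weights @ outputs                        # weighted average

T = [[2.1, 0.9, 0.2], [1.8, 1.1, 0.3], [2.0, 1.0, 0.1]]   # noisy logit vectors
print(noise_reduction_average(T))                          # equal weighting
print(noise_reduction_average(T, weights=[1, 2, 2]))       # favor later iterations
```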
Turning now to an additional discussion on the operation of elements of
In operation, control circuitry 130 transmits (201) an input value to input layer 141 of artificial neural network (ANN) 140. The input value might comprise a digital representation of image data or a portion of image data, among other data to be processed by ANN 140. This input value can be processed by ANN 140 according to the synaptic weights and neural connections that form the various layers of ANN 140 in a process called inference. To initiate this inference process, control circuitry 130 transfers the input value over links 161 for presentation to artificial neurons comprising input layer 141 formed in memory array 110. Interconnect circuitry 120 presents the input value as a vector of input voltages over at least a portion of input control lines 163 that correspond to NVM elements in input layer 141. The input voltages can vary depending upon the requirements of the memory array technology, but typically comprise a binary representation of the input value.
Control circuitry 130 and interconnect circuitry 120 continue to present (202) the input value for a target quantity of iterations. Each iteration comprises a period of time for the input data to propagate through input layer 141 to a subsequent layer of ANN 140, such as hidden layer 142 in
ANN 140 propagates (203) successive input value iterations through hidden layers of ANN 140. As seen in
However, the simplified view of ANN 140 in configuration 101 of
Each individual layer will process a layer-specific input value introduced on associated input control lines to produce a layer-specific result on associated output control lines. The layer-specific result will depend in part on the connectivity among layers, established by interconnection configurations in interconnect circuitry 120 and control circuitry 130. The layer-specific result will also depend in part on the synaptic weights stored in the individual NVM elements. The synaptic weights for each layer are programmed by control circuitry 130, such as from synaptic weights stored in memory 131. When resistive memory elements are employed, the synaptic weights can be stored as conductance values or conductance states, which comprise the memory values stored in each NVM memory element. Each NVM element might have a plurality of input connections from a preceding layer, which are represented by input voltages on corresponding input control lines. Forward propagation operations in each layer are thus conducted by calculating a vector matrix multiplication of input voltages on corresponding input control lines for each NVM element and stored synaptic weights. This vector matrix multiplication result is presented on output control lines of the layer as analog electrical current values.
Circuit elements in interconnect circuitry 120 and control circuitry 130 convert received output control line currents in an analog format into digital representations. First, output control lines can be coupled to sense amplifier circuitry to convert the electrical currents into voltage representations. Then, analog-to-digital converter (ADC) circuitry can convert the electrical voltage representations into digital representations. Various operations can be performed on these digital representations, such as when the present digital representations comprise output values for ANN 140 from output layer 145. Also, various activation functions might be applied. If the digital representations correspond to an intermediate layer, such as a hidden layer, then the digital representations might be presented onto input control lines of a subsequent layer for propagation operations by that subsequent layer. Noise can be introduced by any of the circuit elements involved in the layer interconnection and intermediate value sensing/conversion processes discussed above. This noise comprises forward propagation noise, and can reduce an accuracy in a final result produced by ANN 140.
To reduce the influence of forward propagation noise, the pipelined approach shown in configuration 101 is employed. This pipelined approach produces several output values (T1, T2, . . . Tn), which can all vary due to variations in forward propagation noise encountered by each successive instance of an input value. Control circuit 130 receives these output values and determines (204) a result by applying noise reduction function 150 across values presented by output layer 145 of ANN 140. This result comprises a noise-reduced result, which applies a noise reduction function over output values T1, T2, . . . Tn. In some examples, the noise reduction function comprises an averaging function applied over all output values resultant from a particular input value. However, the noise reduction function might be another function that assigns weights or confidence levels among different instances of output values according to various factors, such as estimated noise for each instance, level of interconnect employed among the layers, number of layer neurons for each layer, anticipated noise levels in ADC circuitry, or other factors, including combinations thereof.
This noise-reduced result can then be transferred for use in various applications. For example, when image data is used as the input value, then this result might be used in machine learning applications, image processing or image recognition applications, or other applications. Moreover, the result might be a partial result which is combined with other results from other input values pipelined through ANN 140.
In further operations, control circuit 130 can select a target quantity of propagations through the artificial neural network for each instance of input data or input values. For example, control circuit 130 can be configured to select a target quantity of propagations for an averaging function to bring a forward propagation noise of the artificial neural network to below a threshold level. Control circuit 130 can be configured to select a quantity of the successive instances to reduce the forward propagation noise and reach at least a target inference accuracy in the result, or select the target quantity of iterations to reduce forward propagation noise of the artificial neural network and reach a target inference accuracy in the result. Example target inference accuracy which relates to a target quantity of successive instances of an input value can be seen in graph 601 of
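One hedged sketch of how a target quantity of iterations might be selected is shown below; it assumes the averaged forward propagation noise falls off roughly as 1/√k for independent noise per run, which is an illustrative model rather than the selection rule of any specific implementation.

```python
# Illustrative selection of the target quantity of iterations 'k': pick the smallest
# k whose estimated residual noise (assumed to scale as 1/sqrt(k)) falls below a
# threshold chosen to reach a target inference accuracy.
import math

def select_iteration_count(noise_fraction, noise_target, k_max=64):
    for k in range(1, k_max + 1):
        residual = noise_fraction / math.sqrt(k)   # estimated noise after averaging k runs
        if residual <= noise_target:
            return k
    return k_max

print(select_iteration_count(noise_fraction=0.20, noise_target=0.09))   # -> 5 runs
```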
Turning now to another implementation of an artificial neural network,
As discussed herein, artificial neural networks can be implemented in hardware. This hardware can generate noise from associated circuits or devices used to implement the artificial neural network. In an artificial neural network with several fully connected layers, forward propagation is conducted in every layer by calculating a vector matrix multiplication of stored weights and input values, where the input values are either from an input value presented to an input layer or intermediate activations from a previous layer.
Forward propagation can take the mathematical form of calculating f(w*X+b), where f is an activation function, X is the input, w is the synaptic weights, and b is the biases. The input values are usually represented by voltages fed into word lines of a layer, and stored weights are represented by conductance states or conductance values in a weight memory array. A weight memory array might comprise an array of NVM devices, such as memristors, coupled via associated word lines and bit lines. The vector-matrix multiplication is read out from the bit lines in the form of current values. The forward propagation noise can be introduced by at least analog-to-digital converters (ADCs) which connect a present layer output from bit lines to digital peripheral circuits, among other circuit elements. Distorted results can appear after the aforementioned vector-matrix multiplication due in part to this circuit and device noise introduced during the forward pass. This forward propagation noise can harm inference accuracy of the ANN.
Turning now to a discussion on the elements of
Elements of an individual exemplary layer are shown in
Word line decoder and driver digital-to-analog converters (DACs) 311 receive input data over links 340 from a preceding layer of an ANN, or from a control system when used in an input layer. When the input data is received in a digital format, DACs can be included to convert this digital format into analog voltages used to drive word lines 341. Word line decoder elements can be included to drive specific word lines associated with the present layer. When a large memory array, such as an NVM array, is employed, then many layers might share the same NVM array. Subsets of the memory elements of the NVM array can correspond to individual layers, and thus word line decoders can use an address or control signal to only drive input values onto corresponding word lines of the particular layer.
Non-volatile memory (NVM) synaptic weight array 312 comprises an array of memory elements, such as resistive memory elements, memristors, MRAM elements, PCM elements, or others. Moreover, memory elements of NVM synaptic weight array 312 are configured to store values corresponding to synaptic weights of particular layers of an ANN. These values can be pre-loaded before inference operations by a control system. The synaptic weights can be determined during a training process for the associated ANN initiated by a control system, or might be established by software models or algorithmic processes. Each layer can have a corresponding set of synaptic weights. Synaptic weight refers to a strength or amplitude of connection between two artificial neurons, also referred to as nodes. Synaptic weight corresponds to the amount of influence that a biological neuron has on the firing of another.
Column multiplexer (mux) 313 is employed in read operations to select bit lines of NVM synaptic weight array 312 for each layer. Column mux 313 can select among bit lines 342 to present values read from selected bit lines on links 343. Analog or digital accumulation circuit 314 receives read values on links 343 from column mux 313 and can buffer or store values temporarily before conversion into a digital format by multi-bit sense amplifiers or analog-to-digital converters (ADCs) 315. When sense amplifiers are employed, the sense amplifiers can sense read values presented on the bit lines and condition or convert these read values received over links 344 into logic levels, perform current-to-voltage conversion processes, or condition read values, among other operations. ADCs can convert analog representations of the read values into digital representations which represent the read values, such as those read values converted by sense amplifier portions. ADCs can output the digital representations over links 345 for input to one or more activation functions 316 or for input to one or more subsequent layers when activation functions 316 are not employed.
Typically, an activation function provides a behavioral definition for an artificial neuron, referred to as a node. The digital representations received over links 345 can be used as inputs to the activation function which define an output of each artificial neuron. As the activation function can define behavior among artificial neurons in response to inputs, any result from the activation function is considered an output of the artificial neuron. The outputs of the activation function are then used to drive another layer of artificial neurons over links 346. When the activation function is on a last layer or output layer, then the output of the activation function can be considered an output of the ANN.
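The following short Python sketch illustrates an activation function defining node outputs from digitized read values, which then drive a subsequent layer; the sigmoid choice and the sample values are assumptions for illustration.

```python
# Sketch of an activation function defining node outputs: digital representations
# from the ADCs are passed through the activation, and the results drive the next layer.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

adc_outputs = np.array([0.8, -1.2, 2.5, 0.0])   # digitized read values, one per node
node_outputs = sigmoid(adc_outputs)              # each artificial neuron's output
print(node_outputs)                              # inputs for the subsequent layer
```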
In operation, layers of an ANN will be interconnected according to layer connections, and data will propagate through the ANN and be altered according to synaptic weights and activation functions of each layer and associated nodes. Layers of
Specifically,
As mentioned herein, nodes each comprise an artificial neuron, and are represented by nodes 460 in
When used in pipelining techniques described herein, the same input vector 401 might be presented to ANN 400 for more than one iteration. As output logit vectors 455 are generated by ANN 400 at output layer 450, averaging function 470 can buffer each logit vector 455 over the course of the iterations of the same input vector. Once the predetermined quantity of iterations completes propagation and computation through ANN 400, then a noise-reduced result can be presented to softmax layer 480. In
Before inference operations by ANN 400, indicated by propagation and computation operations in
In
During inference, the examples herein run a deep neural network ‘k’ (k>1) times for each input image, and logit vector noise is averaged out over the ‘k’ outputs before feeding the result into a final softmax layer to get a final prediction probability. Specifically, an average is taken of the logit vectors at the output layer (before a final softmax layer) to get a final prediction probability. Thus, each input image can be run ‘k’ times instead of just 1 time to increase accuracy in the final prediction probability.
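A minimal Python sketch of this inference procedure follows: the network is run ‘k’ times on the same input, the ‘k’ logit vectors are averaged, and the average is fed into a final softmax. The noisy forward pass below is a hypothetical stand-in for the hardware array, not the actual circuitry.

```python
# Run the network k times per input image, average the noisy logit vectors,
# then apply the final softmax to obtain the prediction probability.
import numpy as np

rng = np.random.default_rng(1)
true_logits = np.array([3.0, 1.0, 0.5])          # what a noise-free pass would produce

def noisy_inference(_image, beta=0.15):
    # Hypothetical noisy forward pass standing in for the hardware array.
    return true_logits + rng.normal(0.0, beta * np.abs(true_logits))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

k = 5
logit_runs = [noisy_inference(None) for _ in range(k)]
averaged = np.mean(logit_runs, axis=0)           # average taken before the softmax layer
prediction = int(np.argmax(softmax(averaged)))   # class with the highest probability
print(prediction)
```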
To reduce total run time, a pipelining approach is presented in
Continuing with the discussion regarding pipelining in an artificial neural network, this enhanced pipelining technique differs from other computing pipelining techniques. Data is introduced to the artificial neural network pipeline via an input layer. A predetermined number ‘n’ of internal hidden layers is used within the neural network pipeline, followed by an output layer. The number of hidden layers ‘n’ can be selected based on the application, implementation, complexity, and depth of the neural network, and can vary from the number of layers shown in
A selected run time or run period is used for applying the same input data to the neural network pipeline to reduce forward propagation noise. In the examples herein, the quantity ‘k’ is used to generalize the number of sequential inputs or ‘runs’ of the same input data to the artificial neural network pipeline. The number ‘k’ can be selected based on a desired accuracy or based on expected forward propagation noise levels.
Graph 610 of
This specific test scenario of graph 610 is run on a one-layer fully connected neural network for MNIST classification. The neural network is run ‘k’ times to take the average of the logit vectors before the softmax layer. By using only one run per image, severe inference accuracy degradation can be seen at large noise, indicated by the first data point in the plot of graph 610. By using more runs per image with the pipelining approach, accuracy can be improved as seen. After using the pipelining approach by running each input image ‘k’ (k>1) times, classification accuracy improves quickly when ‘k’ (the number of runs) increases.
As mentioned above for graph 600, test predictions can be conducted using different levels of Gaussian noise that is added during forward propagation at inference. The weights are trained offline and have been programmed into the memory array before running the inference. Graph 600 shows a decreasing trend in classification accuracy with increasing noise level. In graph 600, the number of inference runs for a particular input value is fixed at 1. However, graph 601 illustrates results from using the pipelining approach discussed herein. After using the pipelining approach by running each input image for k (k>1) times, classification accuracy improves quickly when k (number of runs) increases. Graph 601 indicates that k=5 is sufficient to bring the accuracy up to a desired value, although a greater number of runs can be used for greater accuracy.
By using the pipelining approach, each instance of image data is run on the network ‘k’ (k>1) times, and the noise effect can be statistically minimized by averaging results from the ‘k’ runs. This pipelining approach can increase the classification accuracy while keeping the inference run time relatively short. As a further example of pipelining, inference processes run a deep neural network ‘k’ (k>1) times for each input image to average out the noisy logit vectors before feeding the result into a final softmax layer to get the final prediction probability.
For example, when ‘k’ is selected as 6, then six cycles of data introduction occur for the same input data to the input layer. Four (4) hidden layers might be employed in this example as a value for ‘n’. The same input data is introduced on successive cycles of the neural network pipeline, namely six times in this example. As the data propagates through the four hidden layers of the neural network pipeline, eventually six different output values are obtained at an output layer. This timeframe is approximately 6+4=10 cycles or time steps (i.e., ~k+n).
The six different output values can vary among each other, even though same input data is introduced to the neural network pipeline. This variation can be due to forward propagation noise inherent in the neural network, among other noise sources. The output values are buffered or otherwise held until 10 cycles are completed through the neural network pipeline. Once all 10 cycles have been completed, then the six different output values are averaged together to form a result. This result comprises an average output value from the six output values, which arise from the same input data introduced into the neural network pipeline.
Thus, instead of transmitting the individual output values as independent results, as done in many computing pipelines, the neural network pipeline discussed herein combines many output values over a selected ‘k’ number of runs, such as the six output values mentioned above. An averaging function or other mathematical function can be applied across all of the output values for the same input data to establish a result. Forward propagation noise is advantageously reduced in the result.
Many modern machine learning hardware applications focus on inference, or edge machines, and many networks are trained offline in GPUs or TPUs. Advantageously, the examples herein can use pipelining to tolerate forward propagation noise during inference. During inference, the examples herein run a deep neural network ‘k’ (k>1) times for each input image to average out the noisy logit vectors before feeding the result into the final softmax layer to get the final prediction probability. To reduce total run time, a pipelining approach is presented. Suppose there are ‘n’ hidden layers in the deep neural network, and n+1 time steps are needed to finish one run for one input image. To save on the total run time of running each input image ‘k’ times with the pipelining approach, execution of the (m−1)th layer of the (r+1)th run begins while the mth layer of the rth run is executing. Under this pipelining scenario, only n+k time steps are needed to run an input image on a deep neural network with ‘n’ hidden layers ‘k’ times.
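The following Python sketch models this pipelined schedule under the assumption of one layer per time step; it shows that ‘k’ runs through a network with ‘n’ hidden layers complete in roughly n+k time steps.

```python
# Model of the pipelined schedule: while run r evaluates layer m, run r+1
# evaluates layer m-1, so each new run starts one time step after the previous one.
def pipelined_schedule(n_layers, k_runs):
    schedule = {}                                    # time step -> [(run, layer), ...]
    for run in range(k_runs):
        for layer in range(n_layers):
            t = run + layer                          # run r begins one step after run r-1
            schedule.setdefault(t, []).append((run, layer))
    return schedule

# n=4 hidden layers plus an output layer (5 stages per run), k=6 runs.
sched = pipelined_schedule(n_layers=5, k_runs=6)
print(len(sched))                                    # ~n+k = 10 time steps in total
```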
As discussed herein, implementing machine learning in hardware can be expected to encounter many noise sources coming from the circuits or the devices. The examples herein relate to the forward propagation noise that can be caused by periphery circuitry. For training of neural networks, weight update noise can be tolerated better than forward propagation noise, although both noise types can harm training performance to some significant extent. Inference comprises using a trained neural network to determine predictions based on input data. The pipelined approach is presented herein to address the forward propagation noise issue for at least inference operations.
The examples herein discuss various example structures, arrangements, configurations, and operations for enhanced artificial neural networks and associated artificial neuron circuits. One example arrangement comprises using pipelining to reduce forward propagation noise. During inference operations, a pipelined neural network is run ‘k’ times for each input image to average out forward propagation noise before feeding the result into a final layer to get a final prediction probability. Specifically, an average is taken of logit vectors presented at an output layer of the neural network to get a final prediction probability with reduced forward propagation noise.
Another example arrangement comprises using a pipelined neural network to increase result generation speeds. Without enhanced pipelining, a fully-connected neural network with ‘n’ hidden layers might need (n+2)*k time steps to run, where the neural network is run ‘k’ times for each input image. Advantageously, with the enhanced pipelining, the neural network needs only about (n+k) time steps to run. Thus, the pipelined neural networks herein are run multiple times for one input image, and pipelining is used to reduce the total run time.
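A quick arithmetic check of this comparison, using illustrative values of n and k, is shown below.

```python
# Illustrative run-time comparison: (n+2)*k time steps without pipelining
# versus about (n+k) time steps with the enhanced pipelining.
n, k = 4, 6
print((n + 2) * k)   # ~36 time steps without pipelining
print(n + k)         # ~10 time steps with pipelining
```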
In one example implementation, a circuit comprising a feedforward artificial neural network is provided. This feedforward artificial neural network comprises an input layer, an output layer, and ‘n’ hidden layers between the input layer and output layer. An input circuit is configured to introduce input data to the input layer for propagation through at least the ‘n’ hidden layers. An output circuit is configured to calculate an average of ‘k’ logit vectors presented at the output layer for the input data to produce a result. The input circuit can be further configured to introduce the input data to the input layer for a ‘k’ number of iterations, where each iteration of the ‘k’ number of iterations comprises waiting until a previous introduction of the input data has propagated through at least one of the ‘n’ hidden layers. Moreover, a method of operating the example circuit can be provided. The method includes running the feedforward artificial neural network for the ‘k’ number of iterations with the input data, and averaging the ‘k’ logit vectors presented at the output layer resultant from the input data to reduce forward propagation noise associated with processing the input data with the feedforward artificial neural network.
In another example implementation, a circuit comprising a feedforward artificial neural network comprises an input layer, an output layer, and one or more hidden layers between the input layer and output layer. An input control circuit is configured to introduce iterations of an input value to the input layer for propagation through at least the one or more hidden layers. An output control circuit is configured to calculate an average of output values presented at the output layer from the iterations of the input value to produce a result. The input control circuit can be configured to introduce the input value to the input layer for a target quantity of iterations, where each iteration of the target quantity of iterations comprises waiting until a previous introduction of the input value has propagated through at least one hidden layer.
The example circuit can further comprise a memory element coupled to the output layer that is configured to store at least the output values from the target quantity of iterations for calculation of the average. The input control circuit can be configured to select the target quantity of iterations to reduce forward propagation noise in the result and reach a target inference accuracy in the result.
In yet another example implementation, an artificial neural network is presented. This artificial neural network comprises a means for pipelining a target quantity of instances of a same input value through at least a hidden layer of the artificial neural network, a means for producing a series of output values resultant from the target quantity of instances of the same input value propagating through at least the hidden layer, and a means for applying a propagation noise reduction function across the series of output values to determine a result. The artificial neural network can also comprise means for selecting the target quantity to mitigate forward propagation noise of the artificial neural network and reach a target inference accuracy in the result. The propagation noise reduction function might comprise an averaging function.
Examples of computing system 701 include, but are not limited to, computers, smartphones, tablet computing devices, laptops, desktop computers, hybrid computers, rack servers, web servers, cloud computing platforms, cloud computing systems, distributed computing systems, software-defined networking systems, and data center equipment, as well as any other type of physical or virtual machine, and other computing systems and devices, as well as any variation or combination thereof.
Computing system 701 may be implemented as a single apparatus, system, or device or may be implemented in a distributed manner as multiple apparatuses, systems, or devices. Computing system 701 includes, but is not limited to, processing system 702, storage system 703, software 705, communication interface system 707, and user interface system 708. Processing system 702 is operatively coupled with storage system 703, communication interface system 707, and user interface system 708.
Processing system 702 loads and executes software 705 from storage system 703. Software 705 includes artificial neural network (ANN) environment 720, which is representative of the processes discussed with respect to the preceding Figures. When executed by processing system 702 to implement and enhance ANN operations, software 705 directs processing system 702 to operate as described herein for at least the various processes, operational scenarios, and sequences discussed in the foregoing implementations. Computing system 701 may optionally include additional devices, features, or functionality not discussed for purposes of brevity.
Referring still to
Storage system 703 may comprise any computer readable storage media readable by processing system 702 and capable of storing software 705, and capable of optionally storing synaptic weights 710. Storage system 703 may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of storage media include random access memory, read only memory, magnetic disks, optical disks, flash memory, virtual memory and non-virtual memory, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, resistive storage devices, magnetic random access memory devices, phase change memory devices, or any other suitable non-transitory storage media.
In addition to computer readable storage media, in some implementations storage system 703 may also include computer readable communication media over which at least some of software 705 may be communicated internally or externally. Storage system 703 may be implemented as a single storage device, but may also be implemented across multiple storage devices or sub-systems co-located or distributed relative to each other. Storage system 703 may comprise additional elements, such as a controller, capable of communicating with processing system 702 or possibly other systems.
Software 705 may be implemented in program instructions and among other functions may, when executed by processing system 702, direct processing system 702 to operate as described with respect to the various operational scenarios, sequences, and processes illustrated herein. For example, software 705 may include program instructions for enhanced pipelined ANN operations using multiple instances of input data to reduce noise in results of the ANN, among other operations.
In particular, the program instructions may include various components or modules that cooperate or otherwise interact to carry out the various processes and operational scenarios described herein. The various components or modules may be embodied in compiled or interpreted instructions, or in some other variation or combination of instructions. The various components or modules may be executed in a synchronous or asynchronous manner, serially or in parallel, in a single threaded environment or multi-threaded, or in accordance with any other suitable execution paradigm, variation, or combination thereof. Software 705 may include additional processes, programs, or components, such as operating system software or other application software, in addition to or that include ANN environment 720. Software 705 may also comprise firmware or some other form of machine-readable processing instructions executable by processing system 702.
In general, software 705 may, when loaded into processing system 702 and executed, transform a suitable apparatus, system, or device (of which computing system 701 is representative) overall from a general-purpose computing system into a special-purpose computing system customized to facilitate enhanced pipelined ANN operations using multiple instances of input data to reduce noise in results of the ANN. Indeed, encoding software 705 on storage system 703 may transform the physical structure of storage system 703. The specific transformation of the physical structure may depend on various factors in different implementations of this description. Examples of such factors may include, but are not limited to, the technology used to implement the storage media of storage system 703 and whether the computer-storage media are characterized as primary or secondary storage, as well as other factors.
For example, if the computer readable storage media are implemented as semiconductor-based memory, software 705 may transform the physical state of the semiconductor memory when the program instructions are encoded therein, such as by transforming the state of transistors, capacitors, or other discrete circuit elements constituting the semiconductor memory. A similar transformation may occur with respect to magnetic or optical media. Other transformations of physical media are possible without departing from the scope of the present description, with the foregoing examples provided only to facilitate the present discussion.
ANN environment 720 includes one or more software elements, such as OS 721 and applications 722. These elements can describe various portions of computing system 701 with which elements of artificial neural networks or external systems can interface or interact. For example, OS 721 can provide a software platform on which application 722 is executed and allows for enhanced pipelined ANN operations using multiple instances of input data to reduce noise in results of the ANN.
In one example, NVM array service 724 implements and executes training operations of ANNs to determine synaptic weights for artificial neurons. NVM array service 724 can interface with NVM elements to load and store synaptic weights for use in inference operations. Moreover, NVM array service 724 can establish layers among NVM elements to implement layers and nodes of an ANN, such as by controlling interconnect circuitry. In further examples, NVM array service 724 receives intermediate values from intermediate or hidden layers and provides these intermediate values to subsequent layers.
In another example, ANN pipelining service 725 controls operation of a pipelined ANN as described herein. For example, ANN pipelining service 725 can implement one or more activation functions for layers of an ANN. ANN pipelining service 725 can also buffer output values after inference of individual input values to a pipelined ANN. ANN pipelining service 725 can apply one or more noise reduction functions, such as averaging functions, to the buffered output values to produce noise-reduced results. ANN pipelining service 725 can implement softmax layers or softmax functions as well. Moreover, ANN pipelining service 725 can determine thresholds for noise levels based on target quantities of iterations for input values introduced to an ANN. ANN pipelining service 725 can also receive input values from one or more external systems for introduction to a pipelined ANN, and provide noise-reduced results to the one or more external systems.
Communication interface system 707 may include communication connections and devices that allow for communication with other computing systems (not shown) over communication networks (not shown). Communication interface system 707 might also communicate with portions of hardware-implemented ANNs, such as with layers of ANNs, NVM-implemented weight arrays, or other ANN circuitry. Examples of connections and devices that together allow for inter-system communication may include NVM memory interfaces, network interface cards, antennas, power amplifiers, RF circuitry, transceivers, and other communication circuitry. The connections and devices may communicate over communication media to exchange communications or data with other computing systems or networks of systems, such as metal, glass, air, or any other suitable communication media.
User interface system 708 is optional and may include a keyboard, a mouse, a voice input device, or a touch input device for receiving input from a user. Output devices such as a display, speakers, web interfaces, terminal interfaces, and other types of output devices may also be included in user interface system 708. User interface system 708 can provide output and receive input over a data interface or network interface, such as communication interface system 707. User interface system 708 may also include associated user interface software executable by processing system 702 in support of the various user input and output devices discussed above. Separately or in conjunction with each other and other hardware and software elements, the user interface software and user interface devices may support a graphical user interface, a natural user interface, or any other type of user interface.
Communication between computing system 701 and other computing systems (not shown) may occur over a communication network or networks and in accordance with various communication protocols, combinations of protocols, or variations thereof. Examples include intranets, internets, the Internet, local area networks, wide area networks, wireless networks, wired networks, virtual networks, software defined networks, data center buses, computing backplanes, or any other type of network, combination of network, or variation thereof. The aforementioned communication networks and protocols are well known and need not be discussed at length here. However, some communication protocols that may be used include, but are not limited to, the Internet protocol (IP, IPv4, IPv6, etc.), the transmission control protocol (TCP), and the user datagram protocol (UDP), as well as any other suitable communication protocol, variation, or combination thereof.
The included descriptions and figures depict specific embodiments to teach those skilled in the art how to make and use the best mode. For the purpose of teaching inventive principles, some conventional aspects have been simplified or omitted. Those skilled in the art will appreciate variations from these embodiments that fall within the scope of the disclosure. Those skilled in the art will also appreciate that the features described above can be combined in various ways to form multiple embodiments. As a result, the invention is not limited to the specific embodiments described above, but only by the claims and their equivalents.
This application hereby claims the benefit of and priority to U.S. Provisional Patent Application 62/693,615, titled “USE OF PIPELINING TO IMPROVE NEURAL NETWORK INFERENCE ACCURACY,” filed Jul. 3, 2018, which is hereby incorporated by reference in its entirety.