This disclosure relates generally to analog resistive processing systems for neuromorphic computing, and techniques for calibrating computations performed on analog resistive processing systems. Information processing systems such as neuromorphic computing systems and artificial neural network systems are utilized in various applications such as machine learning and inference processing for cognitive recognition and computing. Such systems are hardware-based systems that generally include a large number of highly interconnected processing elements (referred to as “artificial neurons”) which operate in parallel to perform various types of computations. The artificial neurons (e.g., pre-synaptic neurons and post-synaptic neurons) are connected using artificial synaptic devices which provide synaptic weights that represent connection strengths between the artificial neurons. The synaptic weights can be implemented using an array of resistive processing unit (RPU) cells having tunable resistive memory devices (e.g., tunable conductance), wherein the conductance states of the RPU cells are encoded or otherwise mapped to the synaptic weights.
Exemplary embodiments of the disclosure provide techniques for automatically calibrating matrix-vector operations performed on a resistive processing unit system. In an exemplary embodiment, a system comprises a processor, and a resistive processing unit coupled to the processor. The resistive processing unit comprises an array of cells, wherein the cells respectively comprise resistive memory devices, wherein at least a portion of the resistive memory devices are programmable to store weight values of a given matrix in the array of cells. The processor is configured to store the given matrix in the array of cells of the resistive processing unit, and perform a calibration process to generate a first set of calibration parameters for calibrating forward pass matrix-vector multiplication operations performed on the stored matrix in the array of cells of the resistive processing unit, and a second set of calibration parameters for calibrating backward pass matrix-vector multiplication operations performed on a transpose of the stored matrix in the array of cells of the resistive processing unit.
Other embodiments will be described in the following detailed description of exemplary embodiments, which is to be read in conjunction with the accompanying figures.
Embodiments of the disclosure will now be described in further detail with regard to systems and methods for automatically calibrating matrix-vector operations performed on a resistive processing unit system. It is to be understood that the various features shown in the accompanying drawings are schematic illustrations that are not drawn to scale. Moreover, the same or similar reference numbers are used throughout the drawings to denote the same or similar features, elements, or structures, and thus, a detailed explanation of the same or similar features, elements, or structures will not be repeated for each of the drawings. Further, the term “exemplary” as used herein means “serving as an example, instance, or illustration.” Any embodiment or design described herein as “exemplary” is not to be construed as preferred or advantageous over other embodiments or designs.
Further, it is to be understood that the phrase “configured to” as used in conjunction with a circuit, structure, element, component, or the like, performing one or more functions or otherwise providing some functionality, is intended to encompass embodiments wherein the circuit, structure, element, component, or the like, is implemented in hardware, software, and/or combinations thereof, and in implementations that comprise hardware, wherein the hardware may comprise discrete circuit elements (e.g., transistors, inverters, etc.), programmable elements (e.g., application specific integrated circuit (ASIC) chips, field-programmable gate array (FPGA) chips, etc.), processing devices (e.g., central processing units (CPUs), graphics processing units (GPUs), etc.), one or more integrated circuits, and/or combinations thereof. Thus, by way of example only, when a circuit, structure, element, component, etc., is defined to be configured to provide a specific functionality, it is intended to cover, but not be limited to, embodiments where the circuit, structure, element, component, etc., is comprised of elements, processing devices, and/or integrated circuits that enable it to perform the specific functionality when in an operational state (e.g., connected or otherwise deployed in a system, powered on, receiving an input, and/or producing an output), as well as cover embodiments when the circuit, structure, element, component, etc., is in a non-operational state (e.g., not connected nor otherwise deployed in a system, not powered on, not receiving an input, and/or not producing an output) or in a partial operational state.
In general, the artificial neural network 124 comprises a plurality of layers which comprise the artificial neurons 126, wherein the layers include an input layer, an output layer, and one or more hidden model layers between the input and output layers. Each layer is connected to another layer using an array of artificial synaptic devices which provide synaptic weights that represent connection strengths between artificial neurons in one layer with the artificial neurons in another layer. The input layer of the artificial neural network 124 comprises artificial input neurons, which receive initial data that is input to the artificial neural network for further processing by subsequent hidden model layers of artificial neurons. The hidden layers perform various computations, depending on the type and framework of the artificial neural network 124. The output layer (e.g., classification layer) implements an activation function and produces the classification/prediction results for given inputs.
More specifically, depending on the type of artificial neural network, the layers of the artificial neural network 124 can include functional layers including, but not limited to, fully connected layers, activation layers, convolutional layers, pooling layers, normalization layers, etc. As is known in the art, a fully connected layer in a neural network is a layer in which all the inputs from the layer are connected to every activation unit of the next layer. An activation layer in a neural network comprises activation functions which define how a weighted sum of an input is transformed into an output from a node or nodes in a layer of the network. For example, activation functions include, but are not limited to, a rectifier or ReLU activation function, a sigmoid activation function, a hyperbolic tangent (tanH) activation function, a softmax activation function, etc.
In some embodiments, the digital processing system 110 performs various methods through execution of program code by the processors 112. The processors 112 may include various types of processors that perform processing functions based on software, hardware, firmware, etc. For example, the processors 112 may comprise any number and combination of CPUs, ASICs, FPGAs, GPUs, microprocessing units (MPUs), deep learning accelerators (DLAs), artificial intelligence (AI) accelerators, and other types of specialized processors or coprocessors that are configured to execute one or more fixed functions. The digital processing system 110 executes various processes including, but not limited to, an autocalibration process 130 (which comprises a weight extraction process 132 and a calibration parameters computation process 134), an artificial neural network configuration process 136, and an artificial neural network training process 138.
The autocalibration process 130 implements methods that are configured to automatically perform a calibration process to generate (i) a first set of calibration parameters for calibrating forward pass matrix-vector multiplication operations performed on a stored matrix in an RPU array, and (ii) a second set of calibration parameters for calibrating backward pass matrix-vector multiplication operations performed on a transpose of the stored matrix in the RPU array. The first and second sets of calibration parameters (alternatively, correction parameters) are applied to forward pass and backward pass matrix-vector multiplications performed by RPU arrays during neural network training operations. The first set of calibration parameters comprises a first set of offset correction parameters, and a first set of scaling correction parameters. The second set of calibration parameters comprises a second set of offset correction parameters, and a second set of scaling correction parameters. The first set and the second set of calibration parameters are utilized to ensure that the encoded weights of a given RPU array are the same or substantially the same for the forward and backward pass training operations despite the existence of non-idealities (e.g., hardware offsets and mismatches) of the RPU system hardware, which would otherwise result in disparities between the encoded weights of a given RPU array for the forward and backward pass training operations.
In some embodiments, the calibration parameters are automatically determined by performing the weight extraction process 132 and the calibration parameters computation process 134. The weight extraction process 132 implements methods that are configured to enable accurate extraction of weight values of a given weight matrix W stored in a given RPU array, despite the non-idealities of the RPU system hardware. More specifically, in some embodiments, the weight extraction process 132 is configured to (i) perform a first weight extraction process to extract an effective forward weight matrix (denoted herein as WF), and (ii) perform a second weight extraction process to extract an effective backward weight matrix (denoted herein as WB). The forward and backward weight matrices WF and WB are utilized to determine the first set and the second set of calibration parameters for calibrating forward and backward pass matrix-vector multiplications performed on the given RPU array.
As explained in further detail below, such weight extraction techniques are configured to compute a matrix of effective forward and backward weight values from the RPU hardware, which correspond to the stored weight matrix values of the matrix W and the transpose matrix WT, wherein the computation of the effective forward and backward weight values is configured to compensate for non-idealities associated with the RPU hardware. In effect, the effective forward and backward weight values WF and WB characterize the effective behavior of the RPU hardware with respect to, e.g., forward pass and backward pass matrix-vector multiplication operations performed by the RPU hardware on a stored weight matrix W and corresponding transpose matrix WT in the given RPU array. Exemplary modes of operation of the weight extraction process 132 will be discussed in further detail below.
In some embodiments, as explained in further detail below, the weight extraction process 132 is configured to compute a set of offset correction parameters for forward operations performed on the given RPU array (denoted herein as OF), and a set of offset correction parameters for backward operations performed on the given RPU array (denoted herein as OB). In addition, the calibration parameters computation process 134 utilizes the effective forward and backward weight matrices WF and WB, which are computed by the weight extraction process 132, to automatically determine scaling calibration parameters. As explained in further detail below, in some embodiments, the calibration parameters computation process 134 implements a multivariate linear regression optimization to compute: SF WF−WB SB=0, where SF and SB each comprise a diagonal matrix (more generally, a scaling matrix). The scaling matrix SF comprises computed scaling correction parameters which are applied to matrix-vector computations in the forward pass direction, and the scaling matrix SB comprises computed scaling correction parameters which are applied to matrix-vector computations in the backward pass direction. Exemplary modes of operation of the calibration parameters computation process 134 will be discussed in further detail below.
The artificial neural network configuration process 136 implements methods for configuring the neural cores 122 of the neuromorphic computing system 120 to implement an architecture of an artificial neural network in RPU hardware, which is trained by executing the artificial neural network training process 138. For example, in some embodiments, the artificial neural network configuration process 136 includes methods for configuring the neuromorphic computing system 120 (e.g., RPU system) to perform hardware accelerated computation operations that will be needed to perform a model training process (e.g., the backpropagation process). For example, in some embodiments, the artificial neural network configuration process 136 communicates with a programming interface of the neuromorphic computing system 120 to configure one or more artificial neurons and a routing system of the neuromorphic computing system 120 to allocate and configure one or more neural cores to (i) implement one or more interconnected RPU arrays for storing initial weight matrices, and to (ii) perform in-memory computations (e.g., matrix-vector computations, outer product computations, etc.) needed to implement the training process. Furthermore, in some embodiments, the autocalibration process 130 is configured to operate in conjunction with the artificial neural network configuration process 136 to configure the RPU system to apply the offset correction parameters and the scaling correction parameters, which were computed by the autocalibration process 130, for calibrating forward pass and backward pass matrix-vector multiplication operations that are performed by the RPU system during the training process. The type of training process that is implemented depends on the type and size of the artificial neural network to be trained. Model training methods generally include data parallel training methods (data parallelism) and model parallel training methods (model parallelism), which can be implemented at least in part in the analog domain using a network of interconnected RPU compute nodes.
In some embodiments, the artificial neural network training process 138 implements a backpropagation process for training an artificial neural network. As is known in the art, the backpropagation process comprises three repeating processes including (i) a forward process, (ii) a backward process, and (iii) a model parameter update process. During the digital training process, training data are randomly sampled into mini-batches, and the mini-batches are input to the model to traverse the model in two phases: forward and backward passes. The forward pass generates predictions and calculates errors between the predictions and the ground truth. The backward pass backpropagates errors through the model to obtain gradients to update model weights. The forward and backward cycles mainly involve performing matrix-vector multiplication operations in forward and backward directions. The weight update involves performing incremental weight updates for weight values of the synaptic weight matrices of the neural network model being trained. The processing of a given mini-batch via the forward and backward phases is referred to as an iteration, and an epoch is defined as performing the forward-backward pass through an entire training dataset. The training process iterates multiple epochs until the model converges to a convergence criterion. In some embodiments, a stochastic gradient descent (SGD) process is utilized to train artificial neural networks using the backpropagation method in which an error gradient with respect to each model parameter (e.g., weight) is calculated using the backpropagation algorithm.
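As a point of reference, the following is a minimal numpy sketch (with illustrative dimensions, data, and learning rate, all arbitrary assumptions) of one backpropagation iteration for a single fully connected layer, showing the forward matrix-vector multiply, the backward transpose multiply, and the outer-product weight update that the RPU hardware accelerates:

```python
import numpy as np

# One illustrative backpropagation iteration for a single fully connected layer.
rng = np.random.default_rng(0)
m, n = 4, 8                       # output and input dimensions (illustrative)
W = rng.standard_normal((m, n))   # synaptic weight matrix
x = rng.standard_normal(n)        # input activation vector
target = rng.standard_normal(m)   # ground-truth vector
eta = 0.01                        # global learning rate

y = W @ x                         # forward pass: y = Wx
err = y - target                  # error between prediction and ground truth
delta = W.T @ err                 # backward pass: backpropagated error W^T err
W -= eta * np.outer(err, x)       # update: incremental outer-product rule
```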
In some embodiments, the computing system 100 is implemented using an RPU computing system, an exemplary embodiment of which will now be described.
The RPU system 300 further comprises peripheral circuitry 320 coupled to the row control lines RL1, RL2, . . . , RLm, as well as peripheral circuitry 330 coupled to the column control lines CL1, CL2, . . . , CLn. More specifically, the peripheral circuitry 320 comprises blocks of peripheral circuitry 320-1, 320-2, . . . , 320-m (collectively peripheral circuitry 320) connected to respective row control lines RL1, RL2, . . . , RLm, and the peripheral circuitry 330 comprises blocks of peripheral circuitry 330-1, 330-2, . . . , 330-n (collectively, peripheral circuitry 330) connected to respective column control lines CL1, CL2, . . . , CLn. Further, each block of peripheral circuitry 320-1, 320-2, . . . , 320-m is connected to data input/output (I/O) interface circuitry 325, and each block of peripheral circuitry 330-1, 330-2, . . . , 330-n is connected to data I/O interface circuitry 335. The RPU system 300 further comprises control signal circuitry 340 which comprises various types of circuit blocks such as power, clock, bias and timing circuitry to provide power distribution and control signals and clocking signals for operation of the peripheral circuitry 320 and 330 of the RPU system 300. While the row control lines RL and column control lines CL are each shown as a single line for ease of illustration, it is to be understood that each row and column control line can comprise two or more control lines connected to the RPU cells 310 in the respective rows and columns, depending on the implementation.
In some embodiments, each RPU cell 310 in the RPU system 300 comprises a resistive memory element with a tunable conductance. For example, the resistive memory elements of the RPU cells 310 can be implemented using resistive devices such as resistive switching devices (interfacial or filamentary switching devices), ReRAM, memristor devices, phase change memory (PCM) devices, and other types of resistive memory devices having a tunable conductance (or tunable resistance level) which can be programmatically adjusted within a range of a plurality of different conductance levels to tune the values (e.g., matrix values, synaptic weights, etc.) of the RPU cells 310. In some embodiments, the variable conductance elements of the RPU cells 310 can be implemented using ferroelectric devices such as ferroelectric field-effect transistor devices. Furthermore, in some embodiments, the RPU cells 310 can be implemented using an analog CMOS-based framework in which each RPU cell 310 comprises a capacitor and a read transistor. With the analog CMOS-based framework, the capacitor serves as a memory element of the RPU cell 310 and stores a weight value in the form of a capacitor voltage, and the capacitor voltage is applied to a gate terminal of the read transistor to modulate a channel resistance of the read transistor based on the level of the capacitor voltage, wherein the channel resistance of the read transistor represents the conductance of the RPU cell and is correlated to a level of a read current that is generated based on the channel resistance.
For certain applications, some or all of the RPU cells 310 within the RPU array 305 comprise respective conductance values that are mapped to respective numerical matrix values of a given matrix W (e.g., computational matrix or synaptic weight matrix, etc.) that is stored in the RPU array 305. For example, for an artificial neural network application, some or all of the RPU cells 310 within the RPU array 305 serve as artificial synaptic devices that are encoded with synaptic weights of a synaptic array which connects two layers of artificial neurons of the artificial neural network. More specifically, in an exemplary embodiment, the RPU array 305 comprises an array of artificial synaptic devices which connect artificial pre-synaptic neurons (e.g., artificial neurons of an input layer or hidden layer of the artificial neural network) and artificial post-synaptic neurons (e.g., artificial neurons of a hidden layer or output layer of the artificial neural network), wherein the artificial synaptic devices provide synaptic weights that represent connection strengths between the pre-synaptic and post-synaptic neurons.
The peripheral circuitry 320 and 330 comprises various circuit blocks that are configured to perform functions such as, e.g., programming the conductance values of the RPU cells 310 to store encoded values (e.g., matrix values, synaptic weights, etc.), reading the programmed states of the RPU cells 310, and performing functions to support analog, in-memory computation operations such as matrix-vector multiply functions, matrix-matrix multiply functions, outer product update operations, etc., as discussed herein. For example, in some embodiments, each block of peripheral circuitry 320-1, 320-2, . . . , 320-m comprises corresponding pulse-width modulation (PWM) circuitry and associated driver circuitry, and readout circuitry for each row of RPU cells 310 of the RPU array 305. Similarly, each block of peripheral circuitry 330-1, 330-2, . . . , 330-n comprises corresponding PWM circuitry and associated driver circuitry, and readout circuitry for each column of RPU cells 310 of the RPU array 305.
The PWM circuitry and associated pulse driver circuitry of the peripheral circuitry 320 and 330 is configured to generate and apply PWM read pulses to the rows and columns of the array of RPU cells 310 in response to digital input vector values (read input values) that are received during different operations (e.g., forward pass and backward pass training operations). In some embodiments, the PWM circuitry implements digital-to-analog (D/A) converter circuitry which is configured to receive a digital input vector (to be applied to rows or columns) and convert the elements of the digital input vector into analog input vector values that are represented by input voltages of varying pulse width. In some embodiments, a time-encoding scheme is used wherein input vectors are represented by fixed amplitude Vin=1 V pulses with a tunable duration (e.g., pulse duration is a multiple of 1 ns and is proportional to the value of the input vector). The input voltages applied to rows (or columns) generate output vector values on the columns (or rows) which are represented by output currents, wherein the output currents are processed by the readout circuitry.
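As one illustrative sketch of such a time-encoding scheme, assuming unipolar inputs in [0, 1], an 80 ns integration time, and a 1 ns pulse quantization step (all hypothetical parameter choices, not hardware specifications):

```python
import numpy as np

def encode_pulse_durations(x, t_meas_ns=80, t_step_ns=1):
    """Map digital input values in [0, 1] to pulse durations in nanoseconds.

    A value of 1 maps to the full integration time t_meas_ns, and durations
    are quantized to multiples of t_step_ns (illustrative assumptions).
    """
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    return np.round(x * t_meas_ns / t_step_ns) * t_step_ns

# encode_pulse_durations([0.5, 1.0]) -> array([40., 80.])
```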
For example, in some embodiments, the readout circuitry of the peripheral circuitry 320 and 330 comprises current integrator circuitry and analog-to-digital (A/D) converter circuitry to integrate read currents (IREAD) which are output and accumulated from the rows and columns of connected RPU cells 310 and convert the integrated currents into digital values (read output values) for subsequent computation. In particular, the currents generated by the RPU cells 310 are summed on the columns (or rows) and the summed current is integrated over a measurement time, tmeas, by the readout circuitry of the peripheral circuitry 320 and 330. In some embodiments, each current integrator comprises an operational amplifier that integrates the current output from a given column (or row) (or differential currents from pairs of RPU cells implementing negative and positive weights) on a capacitor, and an analog-to-digital (A/D) converter that converts the integrated current (e.g., an analog value) to a digital value.
The data I/O interface circuitry 325 and 335 are configured to interface with digital processing cores, wherein the digital processing cores are configured to process digital I/O vectors to the RPU system 300 and route data between different RPU arrays. The data I/O interface circuitry 325 and 335 are configured to receive external control signals and data from digital processing cores and provide the received control signals and data to the peripheral circuitry 320 and 330, receive digital read output values from peripheral circuitry 320 and 330, and send the digital read output values to a digital processing core for processing. In some embodiments, the digital processing cores implement non-linear function circuitry which calculates activation functions (e.g., sigmoid neuron function, softmax, etc.) and other arithmetical operations on data that is to be provided to a next or previous layer of an artificial neural network.
In some embodiments, the RPU system 300 comprises noise and bound management circuitry which is configured to dynamically condition (e.g., via scaling) input vectors and output vectors to overcome issues related to noise and signal saturation when performing analog matrix-vector multiplication operations on an RPU array. In some embodiments, the data I/O interface circuitry 325 and 335 implement noise and bound management circuitry. For example, an input vector having digital values which are relatively small can be scaled up by the noise and bound management circuitry before performing a matrix-vector multiplication operation on the RPU array. The scaling up of the input vector values prevents the output signals that are generated as a result of vector-matrix multiplication operation from being too small and not readily detectable or quantizable in instances where the readout circuitry is configured with an output signal bound (e.g., operating signal range) which is not optimal for processing small signals outside the operating signal range. For instance, the output signal bound is a result of the current integrator circuits of the readout circuitry having fixed size integration capacitors, or the ADC circuits of the readout circuitry having a fixed ADC resolution, etc. In such instances, the analog output signals that are relatively small (e.g., close to zero) will be quantized to zero because of the finite ADC resolution.
Moreover, an input vector having digital values which are relatively large can be scaled down by the noise and bound management circuitry before performing a matrix-vector multiplication operation on the RPU array. The scaling down of the digital values of the input vector prevents saturation of the readout circuitry. In particular, the output signals generated by the matrix-vector multiplication operations include analog voltages which are bounded by signal range limits imposed by the readout circuitry, wherein the readout circuitry is bounded in a given signal range, −β, . . . , β, as a result of (i) a saturation voltage of the operational amplifiers of the current integrator circuits (wherein a gain of the current integrator circuits is based on the size of the integration capacitors), and/or (ii) the ADC resolution and/or gain of the ADC circuits of the readout circuitry. In this regard, scaling down the values of the digital input signals can prevent saturation of the readout circuitry by ensuring that matrix-vector compute results of the RPU system are within the range of an acceptable voltage swing, thus overcoming the bound problem.
In some RPU configurations, the noise and bound management circuitry implements dynamic schemes in which input and output scaling parameters are dynamically computed, during runtime, based on, e.g., maximum values of the digital input vectors. In some embodiments, the noise and bound management circuitry implements the dynamic schemes disclosed in U.S. Ser. No. 15/838,992, filed Dec. 12, 2017, entitled “Noise and Bound Management for RPU Array,” which is now U.S. Pat. No. 10,360,283, which is commonly assigned, and the disclosure of which is incorporated herein by reference. Such dynamic schemes are typically used in instances where the analog RPU system is configured to perform analog computations that are needed for training an artificial neural network, wherein the input vectors for forward pass operations can be relatively large, and wherein the input error vectors for backward pass operations can be relatively small.
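The following is a minimal sketch of one such dynamic scheme, assuming a hypothetical analog_mvm callable that stands in for the RPU matrix-vector multiply: the scale factor is computed from the maximum absolute input value, applied before the analog operation, and undone in the digital domain.

```python
import numpy as np

def mvm_with_bound_management(analog_mvm, x, ):
    """Sketch of a dynamic noise/bound management scheme (illustrative):
    scale the input by its maximum absolute value before the analog
    matrix-vector multiply, then rescale the digital output."""
    alpha = np.max(np.abs(x))
    if alpha == 0.0:
        return analog_mvm(x)        # all-zero input needs no scaling
    y = analog_mvm(x / alpha)       # inputs now lie within the DAC/ADC range
    return alpha * y                # undo the scaling in the digital domain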
For training an artificial neural network using RPU hardware, the RPU system 300 can be configured to perform a backpropagation training process which, as noted above, includes multiple iterations of (i) a forward pass operation, (ii) a backward pass operation, and (iii) a synaptic weight update operation. During the training process, batches of training data are input to the artificial neural network to traverse the neural network in two phases: forward and backward passes. The forward pass operation generates predictions and calculates errors between the predictions and the ground truth. The backward pass operation backpropagates errors through the model to obtain gradients to update model weights. The forward pass and backward pass operations mainly involve performing matrix-vector multiplication operations in forward and backward directions. The synaptic weight update operation involves performing incremental updates of synaptic weight values of synaptic weight matrices of the artificial neural network being trained.
Exemplary methods for configuring the RPU system 300 to perform forward pass and backward pass operations for training an artificial neural network will now be discussed in further detail.
The peripheral circuitry 420 and 430 comprises switching circuitry (not specifically shown) which is configured to selectively connect the rows and columns of the RPU array 405 to the respective line driver and readout circuitry for performing forward pass and backward pass operations.
More specifically, in some embodiments, as noted above, the column DAC circuits 432-1, 432-2, . . . , 432-n are configured to perform a digital-to-analog conversion process using a time-encoding scheme where the elements x1, x2, . . . , xn of the input vector x are represented by fixed amplitude pulses (e.g., V=1V) with a tunable duration, wherein the pulse duration is a multiple of a prespecified time period (e.g., 1 nanosecond) and is proportional to the value of the elements x1, x2, . . . , xn of the input vector x. For example, a given digital input value of 0.5 can be represented by a voltage pulse of 40 ns, while a digital input value of 1 can be represented by a voltage pulse of 80 ns (e.g., a digital input value of 1 can be encoded to an analog voltage pulse with a pulse duration that is equal to the integration time Tmeas of the readout circuitry).
To perform a matrix-vector multiplication, the analog input voltages V1, V2, . . . , Vn (e.g., pulses), are applied to the column lines C1, C2, . . . , Cn, wherein each RPU cell 410 generates a corresponding read current IREAD=Vj×Gij (based on Ohm's law), wherein Vj denotes the analog input voltage applied to the given RPU cell 410 on the given column j and wherein Gij denotes the conductance value of the given RPU cell 410 (at the given row i and column j). The read currents generated by the RPU cells 410 on each row are summed together (based on Kirchhoff's current law) to generate aggregate read currents on the respective rows.
The resulting aggregate read currents I1, I2, . . . , Im at the output of the respective rows R1, R2, . . . , Rm are input to respective row readout circuits 424-1, 424-2, . . . , 424-m. The aggregate read currents I1, I2, . . . , Im are integrated by the respective current integrator circuits 426-1, 426-2, . . . , 426-m to generate respective output voltages, which are quantized by the respective ADC circuits 428-1, 428-2, . . . , 428-m to generate a resulting output vector y=[y1, y2, . . . , ym], which represents the result of the matrix-vector multiplication operation y=Wx (or I=GV).
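To make the current-summation model concrete, the following is a minimal numpy sketch of the forward analog matrix-vector multiplication chain described above, in which per-cell read currents obey Ohm's law, row currents sum per Kirchhoff's current law, and the integrated result is quantized by an ADC; the conductance matrix, signal bound, and ADC resolution are illustrative assumptions (real hardware typically uses differential cell pairs to represent signed weights):

```python
import numpy as np

def analog_forward_mvm(G, V, adc_levels=256, y_bound=10.0):
    """Illustrative model of y = Wx on an RPU array: G holds per-cell
    conductances (encoding weights), V holds the analog input voltages
    applied to the columns. Read currents I = Vj * Gij are summed along
    each row, clipped to the readout bound, and quantized by the ADC."""
    i_rows = G @ V                           # aggregate read current per row
    y = np.clip(i_rows, -y_bound, y_bound)   # integrator output is bounded
    step = 2.0 * y_bound / (adc_levels - 1)  # uniform ADC quantization step
    return np.round(y / step) * step         # quantized digital output vector

# Example: a 3x4 conductance matrix multiplied by a 4-element input vector.
G = np.array([[0.1, 0.2, 0.0, 0.3],
              [0.0, 0.5, 0.4, 0.1],
              [0.2, 0.0, 0.3, 0.2]])
V = np.array([1.0, 0.5, 0.25, 1.0])
print(analog_forward_mvm(G, V))
```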
As data propagates forward through layers of the neural network, vector-matrix multiplications are performed, wherein the hidden neurons/nodes take the inputs, perform a non-linear transformation, and then send the results to the next weight matrix. This process continues until the data reaches an output layer of the artificial neural network comprising output neurons/nodes. The output neurons/nodes evaluate classification errors, and generate classification error signals which are propagated back through the artificial neural network using backward pass operations. The error signals can be determined as a difference between the results of the forward inference classification (estimated labels) and the correct labels at the output layer of the artificial neural network.
After the backward pass operation is completed on the given RPU array 405, a weight update process is performed to tune the conductance values of the RPU cells 410 (and thus update the weight values of the given synaptic weight matrix W) based on the forward-propagated digital input vector x=[x1, x2, . . . , xn] and the backward-propagated digital error vector xerr.
In some embodiments, the weight update operation involves updating the weight matrix W in the given RPU array 405 by performing an outer product of the two vectors x and xerr that were applied to the RPU array 405 in the forward and the backward pass cycles. In particular, implementing the weight update for the given RPU array 405 involves performing a vector-vector outer product operation which consists of a multiplication operation and an incremental weight update to be performed locally in each RPU cell 410, i.e., wij←wij+ηxi×xerr_j, where wij represents the weight value for the ith row and the jth column (for simplicity the layer index is omitted), where xi is the activity at the input neuron (ith row), xerr_j is the error computed by the output neuron (and input to the jth column), and where η denotes a global learning rate. In some embodiments, to determine the product xi×xerr_j for the weight update operation, stochastic translator circuitry in the peripheral circuitry 420 and 430 can be utilized to generate stochastic bit streams that represent the input signals xi and xerr_j. The stochastic bit streams for the input signals xi and xerr_j are applied to the rows and columns of the RPU cells 410 in the RPU array, wherein the conductance of a given RPU cell 410 will change depending on the coincidence of the xi and xerr_j stochastic pulse streams input to the given RPU cell 410. The vector cross product operations for the weight update operation are implemented based on the known concept that coincidence detection (using an AND logic gate operation) of stochastic streams representing real numbers is equivalent to a multiplication operation.
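A minimal numpy sketch of this pulse-coincidence principle follows; the bit-stream length, probability mapping, and digital sign handling are illustrative assumptions rather than a description of the actual translator circuitry:

```python
import numpy as np

def stochastic_outer_update(W, x, x_err, eta, bl=100, seed=1):
    """Sketch of a stochastic pulse-coincidence weight update (illustrative
    scheme). Each value is translated into a random bit stream of length bl;
    a cell is updated only when its row and column bits coincide (logical
    AND), so for values with magnitude <= 1 the expected update approximates
    eta * outer(x, x_err)."""
    rng = np.random.default_rng(seed)
    dw = eta / bl                                                     # per-coincidence increment
    row_bits = rng.random((x.size, bl)) < np.abs(x)[:, None]          # P(bit) = |x_i|
    col_bits = rng.random((x_err.size, bl)) < np.abs(x_err)[:, None]  # P(bit) = |xerr_j|
    coincidences = row_bits.astype(int) @ col_bits.astype(int).T      # AND + count
    sign = np.outer(np.sign(x), np.sign(x_err))                       # signs handled digitally here
    return W + sign * dw * coincidences
```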
In some embodiments where complex matrices are implemented (e.g., a complex matrix which comprises a real part and an imaginary part), the RPU framework can be adapted to separately store and process the real part and the imaginary part of the complex matrix.
As noted above, exemplary embodiments of the disclosure comprise automated calibration techniques which are configured to determine correction parameters that are applied to analog matrix-vector multiplication operations for forward pass and backward pass operations. The correction parameters serve to compensate for differences in actual effective weight values that are realized in the forward and backward pass operations as a result of offsets and mismatches introduced by the RPU hardware when performing the analog matrix-vector operations. Such automated calibration methods take into consideration that while the conductance values of the RPU cells of a given RPU array can be programmed to encode weight values of a weight matrix W that is stored in the RPU array, the actual effective weight values of the stored weight matrix W (which are effectively read when performing forward pass or backward pass operations) can differ from the encoded weight values as a result of various types of offsets and mismatches, etc., of the RPU hardware (e.g., peripheral circuitry).
For example, when performing a matrix-vector multiplication operation using the RPU system 400 configured to perform a forward pass operation, the output vector y that is generated by the RPU hardware does not exactly equal Wx, but rather can be represented as y=Wx+b+f(x)+noise, wherein b denotes a linear error component, f(x) denotes a non-linear error component, and noise denotes a cycle-to-cycle noise component of the RPU hardware.
More specifically, the error component b collectively represents linear errors (e.g., offsets) associated with the RPU hardware, such as offsets of the current integrator circuits and ADC circuits of the readout circuitry, and other offsets and mismatches of the peripheral circuitry.
Further, the error component f(x) collectively represents non-linear behaviors of the RPU hardware resulting from, e.g., degraded performance of the operational amplifiers or power supplies, non-linearities of the current mirrors, ADCs, integration capacitors, resistances, etc. The error component noise denotes cycle-to-cycle noise of the RPU hardware such as thermal noise or hardware drift, etc.
In this regard, techniques that read the weight values of an RPU array row-by-row, or which otherwise attempt to read the actual conductance values of the RPU cells, result in the extraction of inaccurate weight values due to such error components, wherein the extracted weight values do not match the true encoded/programmed weights. In other words, the effective weight values of the weight matrix W stored in the RPU array are encoded based on the entire RPU hardware, e.g., the programmed/encoded conductance values of the RPU cells, and the various offsets and mismatches of the RPU hardware. The various offsets and mismatches of the RPU hardware (linear error components b) do not affect the actual analog matrix-vector multiplication operation y=Wx, but rather only affect the effective weight values W that are encoded by the RPU hardware as a whole.
While various techniques can be used to calibrate the RPU hardware to compensate for such linear error components b, it is extremely difficult to calibrate the RPU hardware so that the effective weight values of the weight matrix W realized in the forward pass and backward pass operations are the same.
In this regard, an automated calibration process is performed to determine correction parameters that are used to calibrate the forward weights WF against the backward weights WB, which are realized for a given weight matrix W stored in a given RPU array, to thereby ensure that the forward weights WF and the backward weights WB encode the same weight matrix when performing forward and backward pass matrix-vector multiplication operations on the given RPU array.
In accordance with exemplary embodiments, the weight extraction process 132 is configured to accurately extract weight values from RPU hardware despite non-idealities of the RPU hardware. In general, the weight extraction process 132 implements optimization techniques to minimize errors in the weight values of a weight matrix W, which are read from a given RPU array (which stores the weight matrix W) by utilizing a linear transformation between (i) a set of input vectors x that are applied to the given RPU array, and (ii) a corresponding set of output vectors y that are generated by the RPU hardware performing matrix-vector multiplication operations. More specifically, techniques are provided to extract effective forward weight values WF and effective backward weight values WB from the RPU hardware in which the computation of the effective forward and backward weight values WF and WB is configured to compensate/correct the non-idealities associated with the RPU hardware.
For example, in some embodiments, the effective forward and backward weight values WF and WB comprise values that minimize an objective function such as a multivariate linear regression function. In this regard, in some embodiments, the effective forward and backward weight values WF and WB of a given weight matrix W stored in an RPU array are determined by performing a multivariate linear regression computation based on (i) a set of input vectors x that are applied to a given RPU array in forward and backward directions, and (ii) a corresponding set of output vectors y that are generated by the RPU hardware performing matrix-vector multiplication operations in the forward and backward directions.
In some embodiments, the multivariate linear regression computation is configured to relate the set of input vectors x and corresponding set of resulting output vectors y to the given weight matrix W stored in an RPU array such that y=W x+b. In this regard, a multivariate linear regression computation allows for an accurate estimation of the effective forward and backward weight values WF and WB of the given weight matrix W stored in an RPU array, wherein the computation of the effective forward and backward weight values WF and WB compensates/corrects the error component b (e.g., linear offset errors) of the RPU hardware and, thus, provides a true measure of the matrix-vector multiplication performance of the RPU hardware in the forward and backward directions.
The matrix-vector multiplication operations in the forward direction, i.e., yFi=WxFi, result in a set of vector pairs, {xFi, yFi}i=1s, comprising s pairs of vectors xFi and yFi (or s observations), which are utilized by the forward weight determination process 610 to compute a matrix of effective forward weight values WF 616 for the m×n weight matrix W stored in the matrix-vector multiplication hardware block 600. In some embodiments, the forward weight determination process 610 generates (i) a first matrix XF of size n×s in which each column of the first matrix XF comprises a corresponding one of the input vectors {xFi}i=1s and (ii) a second matrix YF of size m×s in which each column of the second matrix YF comprises a corresponding one of the resulting output vectors {yFi}i=1s.
In some embodiments, the forward weight determination process 610 computes the effective forward weight values WF of a given weight matrix W stored in the matrix-vector multiplication hardware block 600 by performing a multivariate linear regression computation based on the first matrix XF and the second matrix YF. In some embodiments, the multivariate linear regression computation is performed using an ordinary least squares (OLS) estimator process which is configured to estimate the parameters in a regression model by minimizing the sum of the squared residuals over W, i.e., minimizing ∥YF−W XF∥2.
For example, in some embodiments, when the matrix-vector multiplication hardware block 600 is configured to compute yFi=WxFi (forward direction), the forward weight determination process 610 computes the matrix of effective forward weight values WF as:
WF=[(XF XFT)−1 XF YFT]T    Eqn. 1
wherein WF denotes an OLS estimator, the matrix XF comprises a matrix of regressor variables, the matrix YF comprises a matrix of values of a response variable, and wherein YFT denotes a transpose of the matrix YF. In the above exemplary embodiment, where the weight matrix W is a m×n matrix and the matrix XF is a n×s matrix, the computation of the matrix XF XFT in Eqn. 1 yields an n×n matrix. In this regard, to properly compute the inverse matrix (XF XFT)−1, the rank of the matrix XF XFT in Eqn. 1 should be equal to n, wherein the rank of a matrix is defined as the maximum number of linearly independent row vectors in the matrix.
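For illustration, the following numpy sketch applies the Eqn. 1 estimator to synthetic data (illustrative sizes and noise level); np.linalg.solve is used in place of an explicit matrix inverse for numerical stability:

```python
import numpy as np

# Minimal sketch of the Eqn. 1 OLS estimator on synthetic data: the columns
# of XF are the s input vectors and the columns of YF are the s measured
# output vectors. Dimensions, noise, and data are illustrative assumptions.
rng = np.random.default_rng(2)
m, n, s = 4, 6, 60
W_true = rng.standard_normal((m, n))                   # weights encoded in hardware
XF = rng.standard_normal((n, s))                       # regressor matrix (n x s)
YF = W_true @ XF + 0.01 * rng.standard_normal((m, s))  # noisy hardware outputs (m x s)

WF = np.linalg.solve(XF @ XF.T, XF @ YF.T).T           # OLS estimate per Eqn. 1
assert np.allclose(WF, W_true, atol=0.1)               # recovers the effective weights
```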
Another factor that should be considered in Eqn. 1 for accurately computing WF is the sensitivity of WF based on the condition number of the matrix XF XFT for inversion. A condition number for a matrix and computational task measures how sensitive the resulting solution is to perturbations in the input data and to roundoff errors made during the solution process. In some embodiments, it is preferable that the condition number of the matrix XF XFT be equal to 1, or as close as possible to 1. Ideally, the matrix XF XFT will be an identity matrix I. In this regard, the matrix XF XFT should be well-conditioned in order to more accurately compute the inverse matrix (XF XFT)−1. In some embodiments, the set of input vectors xFi which make up the matrix XF can be selected to achieve a well-conditioned matrix XF XFT for inversion.
The matrix-vector multiplication operations in the backward direction, i.e., yBi=WTxBi, result in a set of vector pairs, {xBi, yBi}i=1s, comprising s pairs of vectors xBi and yBi (or s observations), which are utilized by the backward weight determination process 620 to compute a matrix of effective backward weight values WB 626 for the m×n weight matrix W stored in the matrix-vector multiplication hardware block 600. In some embodiments, the backward weight determination process 620 generates (i) a first matrix XB of size m×s in which each column of the first matrix XB comprises a corresponding one of the input vectors {xBi}i=1s and (ii) a second matrix YB of size n×s in which each column of the second matrix YB comprises a corresponding one of the resulting output vectors {yBi}i=1s.
In some embodiments, the backward weight determination process 620 computes the effective backward weight values WB of the given weight matrix W stored in the matrix-vector multiplication hardware block 600 by performing a multivariate linear regression computation based on the first matrix XB and the second matrix YB. In some embodiments, the multivariate linear regression computation is performed using an ordinary least squares (OLS) estimator process which is configured to estimate the parameters in a regression model by minimizing the sum of the squared residuals over W, i.e., minimizing ∥YB−WT XB∥2.
For example, in some embodiments, when the matrix-vector multiplication hardware block 600 is configured to compute yBi=WTxBi (backward pass direction), the backward weight determination process 620 computes the matrix of effective backward weight values WB as:
WB=[(XB XBT)−1 XB YBT]    Eqn. 2,
wherein WB denotes an OLS estimator, the matrix XB comprises a matrix of regressor variables, the matrix YB comprises a matrix of values of a response variable, and wherein YBT denotes a transpose of the matrix YB. In the above exemplary embodiment, where the transposed weight matrix WT is a n×m matrix and the matrix XB is a m×s matrix, the computation of the matrix XB XBT in Eqn. 2 yields an m×m matrix. In this regard, to properly compute the inverse matrix (XB XBT)−1, the rank of the matrix XB XBT in Eqn. 2 should be equal to m, where (as noted above) the rank of a matrix is defined as the maximum number of linearly independent row vectors in the matrix. In addition, in some embodiments, as discussed above, the set of input vectors xBi which make up the matrix XB is preferably selected to achieve a well-conditioned matrix XB XBT for inversion.
As discussed above, the forward weight determination process 610 and the backward weight determination process 620 are configured to determine the effective forward weight values WF and the effective backward weight values WB of the given weight matrix W stored in the matrix-vector multiplication hardware block 600. As explained in further detail below, the effective forward and backward weight matrices WF and WB are utilized to determine the offset and scaling calibration parameters for the forward pass and backward pass operations.
More specifically, as noted above, the forward weight determination process 610 is configured to determine the effective forward weight values WF for forward pass matrix-vector multiplication operations by taking into consideration that the linear errors in the RPU hardware actually result in the computation of y=W x+bF, where bF denotes a bias term for the forward operation which is caused by various offset errors in the RPU hardware. Similarly, the backward weight determination process 620 is configured to determine the effective backward weight values WB for backward pass matrix-vector multiplication operations by taking into consideration that the linear errors in the RPU hardware actually result in the computation of y=WT x+bB, where bB denotes a bias term for the backward pass operation which is caused by various offset errors in the RPU hardware. The forward weight determination process 610 and the backward weight determination process 620 can be configured to determine the respective bias terms bF and bB, wherein such bias terms are then utilized to determine a set of offset correction parameters OF and OB that are to be applied during forward and backward pass operations.
For example, assume that the forward weight determination process 610 is performed on the RPU system 400 configured to perform forward pass operations. In some embodiments, to enable determination of the bias terms bF, the RPU array 405 is configured with a dummy column of RPU cells which are initially encoded with weight values of “0”, and each input vector xFi is augmented with an additional element of value “1” which is applied to the dummy column. In this instance, the effective weight values extracted for the dummy column of the forward weight matrix WF represent the respective bias terms bF of the respective rows of the RPU array.
In addition, assume that the backward weight determination process 620 is performed on the RPU system 400 configured to perform backward pass operations. In this instance, the RPU array 405 is configured with a dummy row of RPU cells which are initially encoded with weight values of “0”, and each input vector xBi is augmented with an additional element of value “1” which is applied to the dummy row, wherein the effective weight values extracted for the dummy row of the backward weight matrix WB represent the respective bias terms bB of the respective columns of the RPU array.
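A minimal numpy sketch of this dummy-column technique for the forward direction follows (synthetic data; the backward direction is symmetric with a dummy row): augmenting each input vector with a trailing “1” makes the OLS estimate pick up an extra column that holds the per-row bias terms bF, from which OF=−bF.

```python
import numpy as np

# Extracting the forward bias terms bF with a dummy column (illustrative).
rng = np.random.default_rng(3)
m, n, s = 4, 6, 60
W = rng.standard_normal((m, n))
bF = 0.1 * rng.standard_normal(m)                    # hidden per-row hardware offsets
X = rng.standard_normal((n, s))
Y = W @ X + bF[:, None]                              # hardware computes y = Wx + bF

X_aug = np.vstack([X, np.ones((1, s))])              # append the all-ones dummy input
W_aug = np.linalg.solve(X_aug @ X_aug.T, X_aug @ Y.T).T
WF, bF_est = W_aug[:, :n], W_aug[:, n]               # last column holds the bias terms
OF = -bF_est                                         # offset correction: OF + bF = 0
assert np.allclose(bF_est, bF, atol=1e-6)
```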
In some embodiments, the calibration parameters computation process 630 implements a multivariate linear regression optimization to compute:
SF WF−WB SB=0    Eqn. 3,
where SF denotes a forward scaling matrix, and SB denotes a backward scaling matrix, wherein SF and SB each comprise a diagonal matrix (more generally, a scaling matrix). A diagonal matrix is a matrix in which the matrix values outside the main diagonal are all zero, and the matrix values of the main diagonal can either be zero or nonzero. The forward scaling matrix SF comprises a set of scaling correction parameters which are applied to the forward pass matrix-vector computations, and the backward scaling matrix SB comprises a set of scaling correction parameters which are applied to backward pass matrix-vector computations.
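The disclosure specifies the optimization objective but not a particular solver; the following numpy sketch shows one possible formulation, in which each matrix entry of Eqn. 3 contributes a homogeneous linear equation in the unknown diagonal entries, and the solution is read off from the singular vector with the smallest singular value:

```python
import numpy as np

def solve_scaling_matrices(WF, WB):
    """One possible way to solve Eqn. 3, SF WF - WB SB = 0, for diagonal SF
    and SB (an illustrative formulation). Each entry yields the homogeneous
    equation sF_i * WF[i, j] - WB[i, j] * sB_j = 0 in the unknowns
    u = [sF, sB]; the minimizer of ||A u|| is the right singular vector
    associated with the smallest singular value."""
    m, n = WF.shape
    A = np.zeros((m * n, m + n))
    for i in range(m):
        for j in range(n):
            A[i * n + j, i] = WF[i, j]        # coefficient of sF_i
            A[i * n + j, m + j] = -WB[i, j]   # coefficient of sB_j
    u = np.linalg.svd(A)[2][-1]               # null-space direction
    u = u / u[0]                              # fix the free overall scale
    return np.diag(u[:m]), np.diag(u[m:])

# Self-check with planted scalings (WB constructed to satisfy Eqn. 3 exactly):
rng = np.random.default_rng(4)
WF = rng.standard_normal((3, 5))
sF, sB = rng.uniform(0.8, 1.2, 3), rng.uniform(0.8, 1.2, 5)
WB = (sF[:, None] * WF) / sB[None, :]
SF, SB = solve_scaling_matrices(WF, WB)
assert np.allclose(SF @ WF, WB @ SB)
```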
In view of the above, an exemplary autocalibration process flow, which is performed by the weight extraction process 132 and the calibration parameters computation process 134 on a given weight matrix W stored in an RPU array, will now be described.
As an initial step, the weight extraction process obtains a first set of input vectors {xF1, xF2, . . . , xFs} comprising s input vectors (block 701), which are to be utilized for performing forward pass matrix-vector multiplication operations using the stored weight matrix W in the RPU array. The number of elements of each input vector will depend on the dimensions of the stored weight matrix W. In some embodiments, the first set of input vectors comprises a set of random vectors which are configured to provide a high entropy input. For example, in some embodiments, the set of input vectors comprises a set of linearly independent vectors. The vectors in a given set of input vectors are deemed to be linearly independent if no vector in the given set is a linear combination of the other vectors in the set. By way of example, in some embodiments, the set of input vectors can be obtained from rows of a Hadamard matrix, which is a square matrix having entries of either +1 or −1, wherein the rows of the Hadamard matrix are mutually orthogonal (i.e., all rows are orthogonal to each other and are therefore linearly independent). In some embodiments, the number s of input vectors that are utilized for the weight extraction process will vary depending on, e.g., the size of the stored weight matrix W. For example, assuming that the weight matrix W has a matrix size of m×n, the number of input vectors s can be on the order of 10×n or greater, or 10×m or greater.
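As an illustrative sketch of such an input-vector selection (sizes are arbitrary, and the scipy dependency is an assumption of convenience), rows of a Hadamard matrix yield a matrix XF for which XF XFT is a scaled identity, i.e., a condition number of exactly 1:

```python
import numpy as np
from scipy.linalg import hadamard

n = 8                              # Hadamard order must be a power of 2
H = hadamard(n)                    # square matrix with +1 / -1 entries
reps = 10                          # reuse each row ~10x (s on the order of 10*n)
X = np.tile(H.T, reps)             # input vectors as columns, shape (8, 80)

# Orthogonal rows give X X^T = reps * n * I, so it is trivially invertible.
print(np.linalg.cond(X @ X.T))     # -> 1.0 (well-conditioned for inversion)
```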
Furthermore, as noted above, to enable computation of forward pass offset correction parameters, each input vector xFi={xF1, xF2, . . . xFs} will have an additional element of value “1” added to the input vector, which is applied to a dummy row or dummy column of the RPU array, depending on how the RPU array is configured for forward pass operations (e.g., whether the input vectors are input to the columns or rows of the RPU array). As further noted above, in some embodiments, the dummy row or dummy column will be initially encoded with weight values of “0”.
The weight extraction process sequentially inputs each input vector xFi to the RPU system to perform forward pass matrix-vector multiplication by multiplying the weight matrix W stored in the RPU array by each input vector xFi to obtain a first set of output vectors (block 702). More specifically, as noted above, the matrix-vector multiplication operations in the forward direction, i.e., yFi=WxFi, result in a set of vector pairs, {xFi, yFi}i=1s, comprising s pairs of vectors xFi and yFi. The weight extraction process performs a computation using the set of vector pairs, {xFi, yFi}i=1s, to determine an effective forward weight matrix WF (block 703). For example, in some embodiments, as discussed above, the effective forward weight matrix WF is computed by performing the multivariate linear regression computation of Eqn. 1 using the set of vector pairs.
In some embodiments, the inverse matrix (XF XFT)−1 of Eqn. 1 can be computed in the digital domain using any suitable matrix inversion process to compute an estimate of the inverse matrix. For example, in some embodiments, the matrix inversion process is implemented using a Neumann series process and/or a Newton iteration process to compute an approximation of the inverse matrix (XF XFT)−1, which exemplary methods are known to those of ordinary skill in the art. In some embodiments, the matrix inversion process is performed using the hardware acceleration computing techniques as disclosed in U.S. patent application Ser. No. 17/134,814, filed on Dec. 28, 2020, entitled “Matrix Inversion Using Analog Resistive Crossbar Array Hardware,” which is commonly assigned and fully incorporated herein by reference.
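For illustration, the following is a minimal sketch of a Newton (Newton-Schulz) iteration for approximating a matrix inverse in the digital domain; the initial guess shown is a standard choice that guarantees convergence for a nonsingular matrix, and the iteration count is an illustrative assumption rather than the disclosure's prescribed method:

```python
import numpy as np

def newton_inverse(A, iters=30):
    """Newton-Schulz iteration X_{k+1} = X_k (2I - A X_k) for approximating
    A^-1. X_0 = A^T / (||A||_1 * ||A||_inf) guarantees convergence; the
    iteration converges quadratically once the residual is small."""
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(A.shape[0])
    for _ in range(iters):
        X = X @ (2 * I - A @ X)
    return X

# Quick self-check on a well-conditioned matrix (illustrative).
A = np.random.default_rng(5).standard_normal((4, 4)) + 4 * np.eye(4)
assert np.allclose(newton_inverse(A) @ A, np.eye(4), atol=1e-8)
```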
After computing the effective forward weight matrix WF, the weight extraction process will determine a set of offset correction parameters OF for forward pass matrix-vector multiplication operations based on the effective forward weight values of the weights in the dummy column (or row) of the forward weight matrix WF (block 704). As noted above, the weight values of the weights in the dummy column (or row) of the forward weight matrix WF would represent the respective bias terms bF (offset values) of the respective rows (or columns) of the RPU array. The offset correction parameters OF for the forward pass operation would have values that are determined to negate the respective bias terms bF to thereby ensure that offset errors for the forward pass operations would be corrected to “0” (e.g., OF+bF=0).
Following completion of the forward pass matrix-vector multiplication operations (in block 702), the weight extraction process proceeds to obtain a second set of input vectors {xB1, xB2, . . . , xBs} comprising s input vectors (block 705), which are to be utilized for performing backward pass matrix-vector multiplication operations using the stored transpose WT of the weight matrix W in the RPU array. The number of elements of each input vector xBi will depend on the dimensions of the stored weight matrix W. In some embodiments, assuming the stored weight matrix W is a square matrix, the second set of input vectors can be the same as the first set of input vectors used for the forward pass matrix-vector multiplication operations. In other embodiments, the second set of input vectors comprises a set of random vectors which are configured to provide a high entropy input. For example, in some embodiments, the set of input vectors comprises a set of linearly independent vectors, which can be obtained from rows of a Hadamard matrix. As noted above, the number s of input vectors that are utilized for the weight extraction process will vary depending on, e.g., the size of the stored weight matrix W. For example, assuming that the weight matrix W has a matrix size of m×n, the second set of input vectors can have a number of input vectors s on the order of 10×n or greater, or 10×m or greater.
Furthermore, as noted above, to enable computation of backward pass offset correction parameters, each input vector xBi={xB1, xB2, . . . xBs} will have an additional element of value “1” added to the input vector, which is applied to a dummy row or dummy column of the RPU array, depending on how the RPU array is configured for backward pass operations (e.g., whether the input vectors are input to the columns or rows of the RPU array). As further noted above, in some embodiments, the dummy row or dummy column will be initially encoded with weight values of “0”.
The weight extraction process sequentially inputs each input vector xBi to the RPU system to perform backward pass matrix-vector multiplication by multiplying the transpose WT of the weight matrix W stored in the RPU array by each input vector xBi to obtain a second set of output vectors (block 706). More specifically, as noted above, the matrix-vector multiplication operations in the backward direction, i.e., yBi=WTxBi, result in a set of vector pairs, {xBi, yBi}i=1s, comprising s pairs of vectors xBi and yBi. The weight extraction process performs a computation using the set of vector pairs, {xBi, yBi}i=1s, to determine an effective backward weight matrix WB (block 707). For example, in some embodiments, as discussed above, the effective backward weight matrix WB is computed by performing the multivariate linear regression computation of Eqn. 2 using the set of vector pairs.
After computing the effective backward weight matrix WB, the weight extraction process will determine a set of offset correction parameters OB for backward pass matrix-vector multiplication operations based on the effective backward weight values of the weights in the dummy row (or column) of the backward weight matrix WB (block 708). As noted above, the weight values of the weights in the dummy row (or column) of the backward weight matrix WB would represent the respective bias terms bB (offset values) of the respective columns (or rows) of the RPU array. The offset correction parameters OB for the backward pass operation would have values that are determined to negate the respective bias terms bB to thereby ensure that offset errors for the backward pass operations would be corrected to “0” (e.g., OB+bB=0).
After computing the effective forward and backward weight matrices WF and WB, the autocalibration process 130 performs an optimization computation (e.g., Eqn. 3) using the effective forward and backward weight matrices WF and WB to determine (i) a set of scaling correction parameters SF for forward pass matrix-vector multiplication operations performed by the RPU array, and (ii) a set of scaling correction parameters SB for backward pass matrix-vector multiplication operations performed by the RPU array (block 709). The autocalibration process 130 will then configure the RPU system to enable the RPU to apply the determined offset correction parameters and scaling correction parameters to forward pass and backward pass matrix-vector multiplication operations performed by the RPU array (block 710).
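Eqn. 3 is not reproduced in this portion of the description; purely as an illustrative stand-in for whatever optimization the autocalibration process 130 actually employs, the following sketch fits a single scale factor per row of each effective matrix against the target matrix W stored in the array:

    def per_row_scales(W_eff, W_target):
        # Scalar s_i per row minimizing ||s_i * W_eff[i, :] - W_target[i, :]||^2.
        num = np.sum(W_eff * W_target, axis=1)
        den = np.sum(W_eff * W_eff, axis=1)
        return num / np.where(den == 0.0, 1.0, den)

    SF = per_row_scales(WF, W)     # forward scaling correction parameters
    SB = per_row_scales(WB, W.T)   # backward scaling correction parameters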
In some embodiments, the RPU system is configured to apply the offset correction parameters and the scaling correction parameters to the output vectors that are generated as a result of the forward and backward pass matrix-vector multiplication operations. For example, in some embodiments, the artificial neurons, which process the output vectors generated by the RPU array for forward pass and backward pass operations, are configured to apply the offset and scaling correction parameters to the output vectors before performing, e.g., NLF operations. In other embodiments, the RPU system is configured (i) to apply the offset and scaling correction parameters for the forward pass operation to the input vectors before performing a forward pass matrix-vector multiplication operation, and (ii) to apply the offset and scaling correction parameters for the backward pass operation to the input error vectors before performing a backward pass matrix-vector multiplication operation.
In some embodiments where the RPU system comprises noise and bound management circuitry as described above, the autocalibration process can configure the noise and bound management circuitry to apply the offset correction parameters and the scaling correction parameters to the input vectors or output vectors for calibrating the forward and backward pass matrix-vector multiplication operations. In such embodiments, the offset correction parameters and the scaling correction parameters are applied to the input vectors or output vectors in addition to the dynamic scaling up and scaling down that is performed by the noise and bound management circuitry as described above to overcome noise and signal bound issues.
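One possible ordering of the dynamic scaling and the calibration corrections is sketched below for the forward direction; the bound value, the helper names, and the placement of each step are assumptions for purposes of illustration only:

    def calibrated_matvec(rpu_matvec, x, offsets, scales, bound=1.0):
        # Scale the input down so it fits within the signal bound, undo
        # that scaling digitally on the output, then apply the offset and
        # scaling correction parameters computed by the autocalibration.
        alpha = max(float(np.max(np.abs(x))) / bound, 1.0)
        y = rpu_matvec(x / alpha) * alpha
        return scales * (y + offsets)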
For example, in some embodiments, the digital processing system 110 communicates with a programming interface of the neuromorphic computing system 120 to configure one or more artificial neurons and a routing system of the neuromorphic computing system 120 to allocate and configure one or more neural cores to (i) implement one or more interconnected RPU arrays for storing initial weight matrices and (ii) perform in-memory computations (e.g., matrix-vector computations, outer product computations, etc.) needed to implement the training process and weight extraction process. In some embodiments, the number of RPU arrays that are allocated and interconnected to configure the artificial synapses of the artificial neural network will vary depending on, e.g., the number of neural network layers (which can be 10 or more layers for deep neural networks), the number and sizes of the synaptic weight arrays that are needed for connecting the neural network layers, the size of the RPU arrays, etc. For example, if each RPU array has a size of 4096×4096, then one RPU array can be configured to store the values of a given m×n weight matrix W, where m and n are 4096 or less. In some embodiments, when the given m×n weight matrix W is smaller than the physical RPU array on which the given m×n weight matrix W is stored, any unused RPU cells can be set to zero and/or unused inputs to the RPU array can be padded by “zero” voltages. In some embodiments, when the size of the given m×n weight matrix W is greater than the size of a single RPU array, then multiple RPU arrays can be operatively interconnected to form a synaptic weight array which is large enough to store the values of the given m×n weight matrix W.
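By way of example only, mapping a smaller-than-array matrix onto one 4096×4096 RPU array with zero padding, as described above, could be realized as in the following illustrative sketch (the helper itself is hypothetical):

    TILE = 4096   # physical RPU array size used in the example above

    def pad_to_tile(W):
        # Zero-pad an m x n matrix into one TILE x TILE array; unused
        # cells stay at zero, and unused inputs are driven with "zero"
        # voltages by construction.
        m, n = W.shape
        assert m <= TILE and n <= TILE, "larger matrices span multiple arrays"
        padded = np.zeros((TILE, TILE))
        padded[:m, :n] = W
        return padded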
Furthermore, in some embodiments, the autocalibration process 130 is configured to operate in conjunction with the artificial neural network configuration process 136 to configure the RPU system to apply the offset correction parameters and the scaling correction parameters, which were computed by the autocalibration process 130. For example, as noted above, in some embodiments, the artificial neurons of the neural network layers can be configured to apply the offset correction parameters and the scaling correction parameters to input vectors prior to performing forward pass or backward pass matrix-vector multiplication operations. In some embodiments, the artificial neurons of the neural network layers can be configured to apply the offset correction parameters and the scaling correction parameters to output vectors that are generated as a result of performing forward pass or backward pass matrix-vector multiplication operations. In some embodiments, the noise and bound management circuitry of the RPU arrays can be configured to apply the offset correction parameters and the scaling correction parameters to input or output vectors that are processed or generated by the RPU arrays.
Following the initial configuration of the RPU system to implement the architecture of the artificial neural network to be trained, the digital processing system 110 invokes the artificial neural network training process 138 to commence a training process (block 800). For ease of discussion, the process flow of
An initial step of the training process involves storing initial synaptic weight values in the RPU array (block 801). In addition, the digital processing system 110 of computing system 100 obtains a set of training data, such as a MNIST (Modified National Institute of Standards and Technology) dataset, for use in training the artificial neural network. The set of training data is converted to a set of input vectors that are applied to the input layer of the artificial neural network. As part of the training process, an input vector would be applied to the input layer of the neural network and then propagated through the neural network as part of a forward pass iteration. In this process, the input vectors to a given synaptic weight matrix in the RPU array would represent the input activity of the specific layer connected to the input of the synaptic weight matrix.
During a given forward pass iteration of the training process, an input vector x received from an upstream layer (e.g., input layer) would be input to the RPU array which stores the given synaptic weight matrix W (block 802), and a forward pass matrix-vector multiplication operation is performed by multiplying the synaptic weight matrix W stored in the given RPU array by the input vector x to generate a resulting output vector y=Wx (block 803). In some embodiments, the calibration parameters for the forward pass matrix-vector multiplication operation are applied to the output vector y (block 804).
More specifically, in some embodiments, the forward pass matrix-vector multiplication operation is calibrated by applying the set of offset correction parameters OF (computed for the given RPU array) to the respective element values of the output vector y=[y1, y2, . . . , ym], followed by applying the set of scaling correction parameters SF to the offset-corrected element values of the output vector y. By way of example, referring to the exemplary embodiment of
Next, during a given backward pass iteration of the training process, an input error vector xerr received from a downstream layer (e.g., output layer, or downstream hidden layer) would be input to the RPU array (block 805), and a backward pass matrix-vector multiplication operation is performed by multiplying the transpose WT of the synaptic weight matrix W stored in the given RPU array by the input error vector xerr to generate a resulting output error vector yerr=WTxerr (block 806).
In some embodiments, the calibration parameters for the backward pass matrix-vector multiplication operation are applied to the output error vector yerr (block 807). More specifically, in some embodiments, the backward pass matrix-vector multiplication operation is calibrated by applying the set of offset correction parameters OB (computed for the given RPU array) to the respective element values of the output error vector yerr, followed by applying the set of scaling correction parameters SB to the offset-corrected element values of the output error vector yerr. By way of example, referring to the exemplary embodiment of
Following the forward pass and backward pass operations, a weight update process is performed to update the synaptic weight values of the weight matrix W stored in the RPU array (block 808). As noted above, the weight update process can be implemented by performing an analog vector-vector outer product operation between the input vector x and the input error vector xerr that were input to the RPU array for the given iteration of the backpropagation training process, the details of which are known to those of ordinary skill in the art.
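By way of illustration only, the rank-one update described above could take the following digital form (on the RPU hardware itself the update is performed in place in the analog domain; the gradient-descent sign convention and the learning rate eta are assumptions):

    def outer_product_update(W, x, x_err, eta=0.01):
        # Rank-one update between the input vector x (length n) and the
        # input error vector x_err (length m) for an m x n weight matrix.
        return W - eta * np.outer(x_err, x)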
The iterative training process (blocks 802-808) is repeated for the remaining input vectors associated with the obtained training dataset until a convergence criterion is met, indicating completion of the training process (block 809). When the training process is complete (affirmative determination in block 809), the training process is terminated (block 810).
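Tying the foregoing sketches together, a single training iteration (blocks 802-808) could be mocked up in the digital domain as follows, where rpu_forward, rpu_backward, and error_from_downstream are hypothetical stand-ins for the analog array operations and the downstream layer:

    y = SF * (rpu_forward(x) + OF)            # blocks 802-804: calibrated forward pass
    x_err = error_from_downstream(y)          # error vector received for backpropagation
    y_err = SB * (rpu_backward(x_err) + OB)   # blocks 805-807: calibrated backward pass
    W = outer_product_update(W, x, x_err)     # block 808: rank-one weight update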
Exemplary embodiments of the present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be accomplished as one step, executed concurrently, substantially concurrently, in a partially or wholly temporally overlapping manner, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
These concepts are illustrated with reference to
Computer system/server 912 may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system/server 912 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
In
The bus 918 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnects (PCI) bus.
The computer system/server 912 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 912, and it includes both volatile and non-volatile media, removable and non-removable media.
The system memory 928 can include computer system readable media in the form of volatile memory, such as random-access memory (RAM) 930 and/or cache memory 932. The computer system/server 912 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 934 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus 918 by one or more data media interfaces. As depicted and described herein, memory 928 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
The program/utility 940, having a set (at least one) of program modules 942, may be stored in memory 928 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. Program modules 942 generally carry out the functions and/or methodologies of embodiments of the disclosure as described herein.
Computer system/server 912 may also communicate with one or more external devices 914 such as a keyboard, a pointing device, a display 924, etc., one or more devices that enable a user to interact with computer system/server 912, and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 912 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 922. Still yet, computer system/server 912 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 920. As depicted, network adapter 920 communicates with the other components of computer system/server 912 via bus 918. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 912. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, SSD drives, and data archival storage systems, etc.
Additionally, it is to be understood that although this disclosure includes a detailed description on cloud computing, implementation of the teachings recited herein are not limited to a cloud computing environment. Rather, embodiments of the present invention are capable of being implemented in conjunction with any other type of computing environment now known or later developed.
Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service. This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.
Characteristics are as follows:
On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider.
Broad network access: capabilities are available over a network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
Service Models are as follows:
Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based e-mail). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS): the capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
Infrastructure as a Service (IaaS): the capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
Deployment Models are as follows:
Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
Community cloud: the cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on-premises or off-premises.
Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds).
A cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure that includes a network of interconnected nodes.
Referring now to
Referring now to
Hardware and software layer 1160 includes hardware and software components. Examples of hardware components include: mainframes 1161; RISC (Reduced Instruction Set Computer) architecture based servers 1162; servers 1163; blade servers 1164; storage devices 1165; and networks and networking components 1166. In some embodiments, software components include network application server software 1167 and database software 1168.
Virtualization layer 1170 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers 1171; virtual storage 1172; virtual networks 1173, including virtual private networks; virtual applications and operating systems 1174; and virtual clients 1175.
In one example, management layer 1180 may provide the functions described below. Resource provisioning 1181 provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment. Metering and Pricing 1182 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may include application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. User portal 1183 provides access to the cloud computing environment for consumers and system administrators. Service level management 1184 provides cloud computing resource allocation and management such that required service levels are met. Service Level Agreement (SLA) planning and fulfillment 1185 provide pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.
Workloads layer 1190 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which may be provided from this layer include: mapping and navigation 1191; software development and lifecycle management 1192; virtual classroom education delivery 1193; data analytics processing 1194; transaction processing 1195; and various functions 1196 for performing hardware accelerated computing and analog in-memory computations using an RPU system with RPU arrays, wherein such computations include, but are not limited to, weight extraction computations, autocalibration operations, matrix-vector multiplication operations, vector-vector outer product operations, neural network training operations, etc., based on the exemplary methods and functions discussed above in conjunction with, e.g.,
The descriptions of the various embodiments of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.