Embodiments of the invention relate to analog neural network computing.
A deep neural network (DNN) is a neural network with an input layer, an output layer, and one or more hidden layers between the input layer and the output layer. Each layer performs operations on one or more tensors. A tensor is a mathematical object that can be zero-dimensional (a.k.a. a scalar), one-dimensional (a.k.a. a vector), two-dimensional (a.k.a. a matrix), or multi-dimensional. The operations performed by the layers are numerical computations including, but not limited to: convolution, deconvolution, fully-connected operations, normalization, activation, pooling, resizing, element-wise arithmetic, concatenation, slicing, etc. Some of the layers apply filter weights to a tensor, such as in a convolution operation.
Neural network computing is computation-intensive and often incurs high power consumption. Thus, neural network inference on edge devices needs to be fast and low-power. Well-designed analog circuits, compared to digital circuits, can speed up inference and improve energy efficiency. However, analog circuits are more vulnerable to circuit non-idealities, such as process variation, than their digital counterparts. Circuit non-idealities degrade the accuracy of neural network computing, and it is costly and often infeasible to re-train a neural network to suit every manufactured chip. Thus, it is a challenge to improve the accuracy of analog neural network computing.
In one embodiment, a method is provided for calibrating an analog circuit to perform neural network computing. According to the method, calibration input is provided to a pre-trained neural network that includes at least a given layer having pre-trained weights stored in the analog circuit. The analog circuit performs tensor operations of the given layer using the pre-trained weights. Statistics of calibration output from the analog circuit are calculated. Normalization operations to be performed during neural network inference at a normalization layer that follows the given layer are determined, where the normalization operations incorporate the statistics of the calibration output. A configuration of the normalization operations is written into memory while the pre-trained weights are kept unchanged.
In another embodiment, a method of analog circuit calibration is provided for neural network computing. The method comprises the steps of: performing, by the analog circuit, tensor operations on calibration input using pre-trained weights stored in the analog circuit to generate calibration output of a given layer of a neural network; receiving a configuration of a normalization layer that follows the given layer; and performing neural network inference including the tensor operations of the given layer using the pre-trained weights and the normalization operations of the normalization layer. The normalization layer is defined by the normalization operations that incorporate statistics of the calibration output.
In yet another embodiment, a device is provided to perform neural network computing. The device includes an analog circuit to store pre-trained weights of at least a given layer of a neural network. The analog circuit is operative to generate calibration output from the given layer by performing tensor operations on calibration input using the pre-trained weights during calibration, and to perform neural network inference including the tensor operations of the given layer using the pre-trained weights. The device also includes a digital circuit to receive a configuration of a normalization layer that follows the given layer, and to perform normalization operations of the normalization layer during the neural network inference. The normalization layer is defined by the normalization operations that incorporate statistics of the calibration output.
Other aspects and features will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that different references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean at least one. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown in detail in order not to obscure the understanding of this description. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
Embodiments of the invention provide a device and methods for calibrating an analog circuit to improve the accuracy of analog neural network computations. The device may include both an analog circuit and a digital circuit for performing neural network computations according to a deep neural network (DNN) model. The DNN model includes a first set of layers (“A-layers”) mapped to the analog circuit and a second set of layers (“D-layers”) mapped to the digital circuit. Each layer is defined by corresponding operations. For example, a convolution layer is defined by its filter weights and convolution parameters. The DNN model is pre-trained before being loaded onto devices. However, analog circuits fabricated on different chips may have different non-ideal characteristics. Thus, the same set of pre-trained filter weights and parameters may cause different analog circuits to generate different outputs. The calibration described herein removes or reduces the variations across different chips.
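By way of illustration only, such a partition may be represented as a simple mapping from layer names to target circuits. The following Python sketch is hypothetical; the layer names and the mapping structure are not part of any embodiment:

```python
# Hypothetical partition of a DNN model into analog-mapped layers
# ("A-layers") and digital-mapped layers ("D-layers").
dnn_partition = {
    "conv1": "analog",   # A-layer: weights stored in the analog circuit
    "norm1": "digital",  # D-layer: normalization layer following conv1
    "conv2": "analog",   # A-layer
    "norm2": "digital",  # D-layer
    "fc":    "digital",  # D-layer
}

a_layers = [name for name, target in dnn_partition.items() if target == "analog"]
```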
The calibration is performed offline, after DNN training, on the output of each A-layer. During the calibration process, calibration input is fed into the DNN and the statistics of the calibration output of each A-layer are collected. The calibration input may be a subset of the training data used for the DNN training. The calibration is different from re-training because the parameters and weights learned in the training remain unchanged during and after the calibration.
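A minimal Python sketch of this statistics-collection step follows, assuming a hypothetical interface run_a_layer(name, batch) that feeds a calibration batch through the network and returns the named A-layer's output as measured from the analog circuit:

```python
import numpy as np

def collect_statistics(run_a_layer, a_layer_names, calibration_batches):
    # For each A-layer, gather its calibration output over all batches
    # and record the mean and variance of the measured activations.
    stats = {}
    for name in a_layer_names:
        outputs = np.concatenate(
            [run_a_layer(name, batch) for batch in calibration_batches], axis=0
        )
        stats[name] = {"mean": outputs.mean(), "var": outputs.var()}
    return stats
```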
In some embodiments, the statistics of each A-layer's calibration output are used to modify or replace some of the operations defined in the DNN model. The statistics may be used to modify a batch normalization (BN) layer that is located immediately after an A-layer in the DNN model. Alternatively, the statistics may be used to define a set of multiply-and-add operations that apply to the output of an A-layer. In the following description, the term “normalization layer” refers to the layer that is located immediately after an A-layer and applies normalization operations to the output of the A-layer. The normalization operations are determined based on the statistics of the calibration output of the A-layer. After the calibration and the configuration of normalization layers, the device carries out inference according to the calibrated DNN model that includes the normalization layers.
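The following sketch shows one way, assuming an affine (multiply-and-add) normalization, that the calibration statistics could be folded into scale and shift terms. The function name and the affine form are assumptions; gamma and beta stand in for learned affine parameters of an existing BN layer, which the calibration leaves unchanged:

```python
import numpy as np

def derive_normalization(calib_mean, calib_var, gamma=1.0, beta=0.0, eps=1e-5):
    # Fold measured statistics into y = scale * x + shift, which equals
    # gamma * (x - calib_mean) / sqrt(calib_var + eps) + beta.
    scale = gamma / np.sqrt(calib_var + eps)
    shift = beta - calib_mean * scale
    return scale, shift
```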
In one embodiment, the tensor operations performed by the A-layers and the D-layers may be convolution operations. The convolutions performed by an A-layer and a D-layer may be the same or different types of convolutions. For example, an A-layer may perform normal convolutions and a D-layer may perform depth-wise convolutions, or vice versa. In the following, the channel dimension of a tensor is the same as its depth dimension. Suppose that a convolution layer receives an input tensor of M channels and produces an output tensor of N channels, where M and N may be the same number or different numbers. In a “normal convolution” where N filters are used, each filter convolves with all M channels of the input tensor to produce M outputs, and the M outputs are summed up to generate one of the N channels of the output tensor. In a “depth-wise convolution,” M=N and there is a one-to-one correspondence between the M filters used in the convolution and the M channels of the input tensor: each filter convolves with one channel of the input tensor to produce one channel of the output tensor.
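The channel bookkeeping of the two convolution types can be illustrated with the following Python sketch. For brevity it uses 1x1 kernels, so each filter weight is a scalar per channel; real filters additionally have spatial extent:

```python
import numpy as np

def normal_conv_1x1(x, filters):
    # x: (M, H, W); filters: (N, M). Each of the N filters convolves
    # with all M input channels; the M partial results are summed into
    # one of the N output channels.
    M, H, W = x.shape
    N = filters.shape[0]
    out = np.zeros((N, H, W))
    for n in range(N):
        for m in range(M):
            out[n] += filters[n, m] * x[m]
    return out

def depthwise_conv_1x1(x, filters):
    # x: (M, H, W); filters: (M,). One filter per channel (M == N);
    # each filter sees only its own input channel.
    out = np.zeros_like(x)
    for m in range(x.shape[0]):
        out[m] = filters[m] * x[m]
    return out
```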
In one embodiment, the digital circuit 110 is coupled to a memory 130, which may include memory devices such as dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, and other non-transitory machine-readable storage media; e.g., volatile or non-volatile memory devices. To simplify the illustration, the memory 130 is represented as one block; however, it is understood that the memory 130 may represent a hierarchy of memory components such as cache memory, system memory, solid-state or magnetic storage devices, etc. The digital circuit 110 executes instructions stored in the memory 130 to perform operations such as tensor operations and normalization operations for one or more neural network layers.
In one embodiment, the device 100 also includes a controller 140 to schedule and assign operations defined in a DNN model to the digital circuit 110 and the analog circuit 120. In one embodiment, the controller 140 may be part of the digital circuit 110. In one embodiment, the device 100 also includes a calibration circuit 150 for performing calibration of the analog circuit 120. The calibration circuit 150 is illustrated in dashed outlines to indicate that its location may vary. The calibration circuit 150 may be on the same chip as the analog circuit 120; alternatively, the calibration circuit 150 may be on a different chip from the analog circuit 120, but in the same device 100. In yet another embodiment, the calibration circuit 150 may be in another system or device, such as a computer or a server.
The device 100 may also include a network interface 160 for communicating with another system or device via a wired and/or wireless network. It is understood that the device 100 may include additional components not shown in FIG. 1.
The DNN model 200 in FIG. 2 is an example of a DNN that includes a first set of layers (“A-layers”) mapped to the analog circuit 120 and a second set of layers (“D-layers”) mapped to the digital circuit 110.
The calculation of the statistics may be performed by an on-chip processor or circuit; alternatively, the calculation may be performed by off-chip hardware or another device such as a computer or server. At step 450, for each A-layer, the statistics are incorporated into normalization operations that define a normalization layer following the A-layer in the DNN. Non-limiting examples of the normalization operations will be provided with reference to FIGS. 5 and 6.
The normalization layer 500 is defined by normalization operations that apply to a tensor (represented by a cube 550 in solid outlines) output from the A-layer 510. During calibration, this tensor is referred to as the calibration output or calibration output activation. The tensor has a height dimension (H), a width dimension (W), and a depth dimension (C) that is also referred to as a channel dimension. The normalization operations transform each xi (represented by an elongated cube in dashed outlines) into x̂i. Both xi and x̂i extend across the entire depth dimension C. In the example of FIG. 5, the normalization operations incorporate statistics (e.g., the mean and the standard deviation) of the calibration output of the A-layer 510.
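A hedged sketch of this transform, assuming an affine form with scale and shift derived from the calibration statistics, is as follows; the per-location loop makes explicit that each xi spans all C channels at one (h, w) position:

```python
import numpy as np

def normalize_per_location(activation, scale, shift):
    # activation: (C, H, W). Each xi is the length-C vector at one
    # spatial location; it is mapped to x̂i = scale * xi + shift.
    C, H, W = activation.shape
    out = np.empty_like(activation)
    for h in range(H):
        for w in range(W):
            xi = activation[:, h, w]
            out[:, h, w] = scale * xi + shift
    return out
```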
The normalization layer 600 is defined by normalization operations that apply to a tensor (represented by each cube 650 in solid outlines) output from the A-layer 510. During calibration, this tensor is referred to as the calibration output or calibration output activation. The tensor has a height dimension (H), a width dimension (W), and a depth dimension (C) that is also referred to as a channel dimension. The normalization operations transform each Fk,i,j (represented by one slice of an elongated cube in dashed outlines) into F̂k,i,j, where the running index k identifies a specific channel. Both Fk,i,j and F̂k,i,j are per-channel tensors. In the example of FIG. 6, the normalization operations incorporate per-channel statistics (e.g., a per-channel mean and standard deviation) of the calibration output of the A-layer 510.
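The per-channel counterpart, again under the assumption of an affine form, applies one scale and shift pair per channel k:

```python
import numpy as np

def normalize_per_channel(activation, scale, shift):
    # activation: (C, H, W); scale, shift: length-C vectors derived from
    # per-channel calibration statistics. Channel k of the output is
    # F̂k = scale[k] * Fk + shift[k].
    return scale[:, None, None] * activation + shift[:, None, None]
```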
The method 700 begins at step 710 when a calibration circuit sends calibration input to a pre-trained neural network that includes at least a given layer having pre-trained weights stored in the analog circuit. At step 720, the calibration circuit calculates statistics of calibration output from the analog circuit, which performs tensor operations of the given layer on the calibration input using the pre-trained weights. At step 730, the calibration circuit determines normalization operations to be performed during neural network inference at a normalization layer that follows the given layer. The normalization operations incorporate the statistics of the calibration output. At step 740, the calibration circuit writes a configuration of the normalization operations into memory. The pre-trained weights remain unchanged after the calibration.
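Put together, steps 710 through 740 might be driven by a routine such as the following sketch; analog_layers (with a run method) and write_config are hypothetical interfaces to the calibration circuit and the memory, and the affine normalization form is an assumption:

```python
import numpy as np

def calibrate(analog_layers, calibration_batches, write_config, eps=1e-5):
    for layer in analog_layers:
        # Steps 710-720: feed calibration input through the A-layer and
        # measure statistics of the analog circuit's output.
        out = np.concatenate([layer.run(b) for b in calibration_batches])
        mean, var = out.mean(), out.var()
        # Step 730: fold the statistics into multiply-and-add parameters
        # for the normalization layer that follows this A-layer.
        scale = 1.0 / np.sqrt(var + eps)
        shift = -mean * scale
        # Step 740: persist the normalization configuration; the
        # pre-trained weights in the analog circuit stay unchanged.
        write_config(layer.name, {"scale": scale, "shift": shift})
```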
The method 800 begins at step 810 when the analog circuit performs tensor operations on calibration input using pre-trained weights that are stored in the analog circuit. By performing the tensor operations, the analog circuit generates calibration output of a given layer of a neural network. At step 820, the device receives a configuration of a normalization layer that follows the given layer. The normalization layer is defined by normalization operations that incorporate statistics of the calibration output. At step 830, the device performs neural network inference including the tensor operations of the given layer using the pre-trained weights and the normalization operations of the normalization layer.
In one embodiment, during the neural network inference, the analog circuit is assigned to perform the tensor operations of the given layer using the pre-trained weights, and a digital circuit in the device is assigned to perform the normalization operations of the normalization layer.
Various functional components or blocks have been described herein. As will be appreciated by persons skilled in the art, the functional blocks will preferably be implemented through circuits (either dedicated circuits or general-purpose circuits, which operate under the control of one or more processors and coded instructions), which will typically comprise transistors that are configured in such a way as to control the operation of the circuitry in accordance with the functions and operations described herein.
The operations of the flow diagrams of FIGS. 7 and 8 have been described with reference to the exemplary embodiments. However, it should be understood that the operations of the flow diagrams can be performed by embodiments of the invention other than those discussed, and the embodiments discussed herein can perform operations different than those discussed with reference to the flow diagrams.
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, and can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.
This application claims the benefit of U.S. Provisional Application No. 63/139,463 filed on Jan. 20, 2021, the entirety of which is incorporated by reference herein.