Neural networks for embedded devices

Information

  • Patent Grant
  • 11983630
  • Patent Number
    11,983,630
  • Date Filed
    Thursday, January 19, 2023
  • Date Issued
    Tuesday, May 14, 2024
Abstract
A neural network architecture is used that reduces the processing load of implementing the neural network. This network architecture may thus be used for reduced-bit processing devices. The architecture may limit the number of bits used for processing and reduce processing to prevent data overflow at individual calculations of the neural network. To implement this architecture, the number of bits used to represent inputs at levels of the network and the related filter masks may also be modified to ensure the number of bits of the output does not overflow the resulting capacity of the reduced-bit processor. To additionally reduce the load for such a network, the network may implement a “starconv” structure that permits the incorporation of nearby nodes in a layer to balance processing requirements and permit the network to learn from context of other nodes.
Description
BACKGROUND

This disclosure generally relates to the deployment of deep neural networks for embedded or internet-of-things (IoT) devices.


Neural networks are often used to perform various tasks, particularly image analysis tasks such as object recognition, facial recognition, or segmentation. Typically, such networks are implemented on relatively complex processors that support high-precision operations with a large bit-length, such as 32-bit floating-point operations to multiply and sum data at the various layers of a neural network. These processors may be too complex or expensive for inexpensive devices, such as IoT devices, which may include inexpensive processors with a more limited bit-length, preventing such devices from effectively implementing neural networks. In addition to reduced-bit processing, these devices may also use reduced-bit storage, further limiting their ability to implement neural network structures.


SUMMARY

A neural network architecture is used that reduces the processing load of implementing the neural network. This network architecture may thus be used for reduced-bit processing devices. The architecture may limit the number of bits used for processing and reduce processing to prevent data overflow at individual calculations of the neural network. To implement this architecture, the number of bits used to represent inputs at levels of the network and the related filter masks may also be modified to ensure the number of bits of the output does not overflow the resulting capacity of the reduced-bit processor. To additionally reduce the load for such a network, the network may implement a “starconv” structure that permits the incorporation of nearby nodes in a layer to balance processing requirements and permit the network to learn from context of other nodes.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates the star-shaped convolution filter (star-conv), according to one embodiment.



FIG. 2 illustrates the star-shuffle neural network block, according to one embodiment.



FIG. 3 illustrates an example StarNet deep neural network architecture, according to one embodiment.



FIG. 4 illustrates example equations for quantization and dequantization, according to one embodiment.



FIG. 5 illustrates example equations for determining quantization parameters, according to one embodiment.



FIG. 6 illustrates example equations for collapsing the quantization equations of adjacent layers, according to one embodiment.



FIG. 7 illustrates an example process for generating a neural network structure including input layers and filters, according to one embodiment.





The figures depict various embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.


DETAILED DESCRIPTION

Motivation


Computer implementations of deep neural networks (DNNs) commonly use floating-point arithmetic. As used herein, a deep neural network is a computer model that generates a set of outputs based on a set of inputs using a plurality of processing layers between the inputs and outputs. These processing layers may be “hidden” in the sense that they are not directly exposed during use, and they represent arithmetic processes that together generate the set of outputs from the set of inputs. Individual nodes in these layers are typically connected by weights, each representing how strongly a value in a prior node affects a current node. As an example, to process an image, the pixels of the image may be represented as an input layer. A subsequent layer may apply various filters, such as a convolutional filter, to a window of pixels in the input layer to generate values for that layer. This is often performed with floating-point arithmetic to increase the precision with which values are represented within the network. However, low-cost and low-power computer processors (such as those used in internet-of-things devices) often do not support floating-point arithmetic, so integer arithmetic must be used instead.


Further, while computer implementations of neural networks commonly use 32-bit arithmetic, low-power computer processors often run most efficiently (i.e. lowest power usage and/or highest throughput) using 8-bit arithmetic.


This presents a need for effective neural networks for use in lower-bit arithmetic and storage (e.g., 8-bit arithmetic and 8-bit storage) that is not well-addressed by existing frameworks.


Design Constraints


When the goal is to get the best tradeoff of speed, energy-efficiency, and accuracy, the optimal DNN architecture (sometimes called a “topology” or “neural structure”) varies depending on the processing platform that it will be deployed on.


This disclosure relates to implementing neural network architectures on a reduced-bit architecture, which may include reduced-bit (e.g., 8-bit) arithmetic and storage.


As one example architecture, a processing platform is a system-on-chip (SOC) that has multiple types of processing cores. Some of the cores on the SOC are general-purpose central processing unit (CPU) cores that support 8-, 16-, and 32-bit computations, but these CPU cores are relatively slow and comparatively energy-intensive. The SOC also has specialized digital signal processing (DSP) cores that enable fast, energy-efficient, and highly parallel computations. These DSP cores typically support efficient computation only on 8-bit signed integers. The network architecture discussed herein may be implemented on such DSP cores while rarely (or never) using the CPU cores.


The main data type supported in the DSP cores is the 8-bit signed integer.


Some processors support what is called “saturating arithmetic.” In saturating arithmetic on 8-bit signed integers X and Y, the result of X+Y is clamped to a maximum of 127. For example, if X=120 and Y=120, a saturating addition of X+Y gives 127. With non-saturating arithmetic, however, X+Y overflows and wraps around, so the result of X+Y would be −16 (i.e., negative 16). The DNN architectures discussed herein are implemented with processors using non-saturating arithmetic. Thus, overflow happens when a result exceeds the maximum or minimum value that can be represented by the number of bits in the registers used to perform arithmetic operations.


To avoid generating incorrect numerical results, however, the arithmetic should not overflow. This is particularly challenging when using 8-bit storage and 8-bit arithmetic. For example, when multiplying two large 8-bit numbers, such as 125 and 126, the correct result is 15750, but the largest value representable in a signed 8-bit number is +127.
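The two overflow behaviors described above can be emulated in Python, whose integers are unbounded, to make the 120 + 120 and 125 × 126 examples concrete. This is a minimal sketch: the helper names `wrap_int8` and `saturate_int8` are illustrative, and the wrapping version assumes two's-complement hardware, which is typical but not stated in the text.

```python
INT8_MIN, INT8_MAX = -128, 127


def wrap_int8(v: int) -> int:
    """Non-saturating (two's-complement) int8: keep only the low 8 bits."""
    return ((v + 128) % 256) - 128


def saturate_int8(v: int) -> int:
    """Saturating int8: clamp the exact result into [-128, 127]."""
    return max(INT8_MIN, min(INT8_MAX, v))


x, y = 120, 120
print(saturate_int8(x + y))  # 127
print(wrap_int8(x + y))      # -16
print(wrap_int8(125 * 126))  # -122: the exact product 15750 is lost
```

The wrapped product is not merely clipped but lands on an unrelated value, which is why the architecture aims to rule out overflow entirely rather than tolerate it.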


Division is an expensive arithmetic operation (requiring more computational cycles than multiplication or additions). Accordingly, effective use of an 8-bit architecture rarely or never uses division.


The bit-shift operator may be used instead. A bit-shift requires fewer computational cycles than division, and for division by powers of two, a right shift can be used in place of division to produce the same result (for negative values, an arithmetic right shift rounds toward negative infinity).
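A small Python check of the shift-for-division substitution, assuming two's-complement integers with an arithmetic right shift (Python's `>>` on `int` behaves this way). The second helper, `div_pow2_trunc`, is a hypothetical stand-in for C-style division, which truncates toward zero; it shows where the two conventions disagree for negative inputs.

```python
def div_pow2_shift(x: int, k: int) -> int:
    """floor(x / 2**k) computed with an arithmetic right shift, no divide."""
    return x >> k


def div_pow2_trunc(x: int, k: int) -> int:
    """Division by 2**k that truncates toward zero, as C's `/` does."""
    q = abs(x) >> k
    return q if x >= 0 else -q


for x in (100, 7, -7, -100):
    # shift result, Python floor division, truncating division
    print(x, div_pow2_shift(x, 1), x // 2, div_pow2_trunc(x, 1))
```

Only the odd negative dividend differs: -7 shifted right by one gives -4, while truncating division gives -3. Code ported to a DSP should pick one convention consistently.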


Elementary Components of StarNet


A family of neural network architectures, generally termed “StarNet,” is disclosed to effectively implement neural networks on such reduced-bit architectures. In one embodiment, the components and processes described below may refer to or may be performed by an online system in communication with devices including reduced-bit architectures, such as internet-of-things (IoT) devices.


To avoid overflows while performing 8-bit computations, StarNet applies the following techniques.


Neural networks commonly use convolution filters that each perform thousands of calculations (e.g., a 3×3×512 filter has 4608 elements and performs 4608 multiply-accumulate operations). The result of a 4608-element convolution will overflow for many possible input activations when computed using 8-bit arithmetic. Consider the case where the input activations consist of all ones and the filter (e.g., the weights for combining prior-layer values) consists of all ones (i.e., every element of the filter has a numerical value of one). The output of the convolution in this example is 4608, which is much too large to be represented in 8-bit arithmetic and would therefore overflow and produce an incorrect numerical result.


To effectively implement a neural network on a reduced-bit architecture, the DNN is structured to have fewer elements per filter, such as 32 elements per filter. In one embodiment, the StarNet DNN architecture for 8-bit arithmetic and 8-bit storage has a maximum of 32 elements per filter.


Even with a 32-element filter, 8-bit arithmetic can still overflow. For example, consider the case where the input activations consist of all ones and the filter consists of all fives (i.e., every element of the filter has a numerical value of five). In this case, the correct output of the convolution is 160, but the maximum representable value in an 8-bit signed integer is 127, so the result overflows.


To avoid overflow, the network architecture may use various approaches to keep the possible filter outputs within the range of representable output values. The particular approach may vary, even within a given network model, based on the number of elements in the filter. In one implementation, linear quantization is used to bin floating-point values of filters and activations into a low-bit-width integer representation. In one linear quantization scheme, the range of the linear bins is determined by analyzing the maximum and minimum numerical values observed in the tensors of the neural network, considering the dimensions of the filters, and then selecting maximum and minimum values for the bins such that the output cannot overflow.


In the case of a 32-element filter, input activations can be quantized to 2 bits plus the sign bit; we abbreviate this to (2+s). Weights can be quantized to (1+s). The maximum value of an activation is then 3 (the largest number representable in (2+s) arithmetic), and the maximum value of a weight is 1 (the largest number representable in (1+s) arithmetic). So, the largest possible output value is 32*3*1=96, which is smaller than 127 and therefore does not overflow during 8-bit arithmetic. Since values are stored in 8 bits in this example, the stored activations and weights each use only a subset of those 8 bits.


In the case of a 16-element filter, input activations are represented as (3+s), with a maximum value of 7, and weights are represented as (1+s), with a maximum value of 1. The maximum output value of this convolution is 16*7*1=112, which is less than 127 and therefore does not overflow.


In the case of an 8-element filter, input activations are represented as (2+s), with a maximum value of 3, and weights are represented as (2+s), with a maximum value of 3. The maximum output value of this convolution is 8*3*3=72, which is less than 127 and therefore does not overflow.
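The overflow arithmetic in the three cases above, and in the 3×3×512 example earlier (3 × 3 × 512 = 4608 elements), can be checked mechanically. This sketch assumes that a (b+s) value ranges symmetrically over [−(2^b − 1), 2^b − 1], which is what the stated maxima of 3, 7, and 1 imply; with an asymmetric two's-complement range such as [−4, 3] for (2+s), the most negative products would be larger and the bound would need to be recomputed.

```python
INT8_MAX = 127


def max_magnitude(bits: int) -> int:
    """Largest |value| in a symmetric (bits+s) representation, e.g. 3 for (2+s)."""
    return (1 << bits) - 1


def worst_case_output(n_elements: int, act_bits: int, weight_bits: int) -> int:
    """Largest |sum of products| when every activation and weight is at max magnitude."""
    return n_elements * max_magnitude(act_bits) * max_magnitude(weight_bits)


def fits_int8(n_elements: int, act_bits: int, weight_bits: int) -> bool:
    # Symmetric inputs give a symmetric bound, so checking +127 suffices.
    return worst_case_output(n_elements, act_bits, weight_bits) <= INT8_MAX


configs = [
    ("32-element, act (2+s), weight (1+s)", 32, 2, 1),
    ("16-element, act (3+s), weight (1+s)", 16, 3, 1),
    ("8-element, act (2+s), weight (2+s)", 8, 2, 2),
    ("3x3x512, all-ones values", 3 * 3 * 512, 1, 1),
]
for name, n, a, w in configs:
    print(f"{name}: worst case {worst_case_output(n, a, w)}, fits int8: {fits_int8(n, a, w)}")
```

The first three configurations report 96, 112, and 72 and fit; the 3×3×512 case reports 4608 and does not.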


The 32-, 16-, and 8-element filters discussed so far are 1×1×Channels filters, where 32, 16, or 8 is the number of channels in the filter. Note that the number of channels in the input activations can be larger than the number of channels in a filter. This is accomplished using what are called group convolutions. Group convolutions have a hyperparameter called group-length. If the input activations have 1024 channels, and group-length is set to 32, then each filter will span a 32-channel subset of the 1024 input channels.
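One way to picture a group convolution with 1×1 filters is the following pure-Python sketch at a single spatial position. The layout is an assumption for illustration: `weights[g]` holds the filters for input group `g`, each with `group_length` weights, and the outputs of all groups are concatenated. With 1024 input channels and a group-length of 32, each filter therefore performs only 32 multiply-accumulates.

```python
from typing import List


def grouped_pointwise(x: List[int], weights: List[List[List[int]]],
                      group_length: int) -> List[int]:
    """Grouped 1x1 convolution at one pixel.

    x: one activation per input channel.
    weights[g]: list of filters for input group g; each filter has
        `group_length` weights and only sees the channels of group g.
    """
    if len(x) % group_length != 0:
        raise ValueError("channel count must be a multiple of group_length")
    n_groups = len(x) // group_length
    if len(weights) != n_groups:
        raise ValueError("need one list of filters per group")
    out: List[int] = []
    for g in range(n_groups):
        segment = x[g * group_length:(g + 1) * group_length]
        for filt in weights[g]:
            if len(filt) != group_length:
                raise ValueError("each filter must have group_length weights")
            # Multiply-accumulate over this group's channels only.
            out.append(sum(a * w for a, w in zip(segment, filt)))
    return out


# 8 input channels in 2 groups of 4; one filter per group.
x = [0, 1, 2, 3, 4, 5, 6, 7]
weights = [[[1, 1, 1, 1]], [[1, 0, 0, 0]]]
print(grouped_pointwise(x, weights, group_length=4))  # [6, 4]
```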


Convolutional neural networks commonly have some layers with filters of size 1×1×Channels and other layers with filters of size 3×3×Channels. A 3×3 filter with a group-length of 1 has 9 elements. A good representation of a 9-element convolution using signed, non-saturating 8-bit arithmetic is to represent weights as (2+s) and input activations as (2+s). In this configuration, the maximum output is 3*3*9=81.


However, with the goal of minimizing the number of elements (and thus being able to represent filters and activations with more bits), a convolution over a 2D spatial neighborhood can be performed with fewer elements. Rather than a 3×3 filter, the idea is to use a “star-shaped” filter. (See FIG. 1.) Here, with respect to a pixel at location (x,y), the filter has weights that correspond with (x,y) and with the pixels immediately above, below, left, and right of (x,y). Unlike a traditional 3×3 convolution, however, a star-shaped filter either omits or zeroes out the weights that correspond with the upper-left, upper-right, lower-left, and lower-right diagonal elements with respect to location (x,y). Thus, in the example shown in FIG. 1, the star-shaped filter has only 5 elements. With only 5 elements, the weights can be represented as (3+s) and the activations as (2+s), giving a maximum output of 5*7*3=105, which is less than 127. More generally, a star-shaped filter can refer to any non-rectangular filter in which only a subset of the elements in the filter have non-zero values or are accounted for in the neural network structure. While the example shown in FIG. 1 illustrates a star-shaped filter with a single channel, in other embodiments, each position of the star-shaped filter may be associated with additional elements along the depth of the filter that correspond to one or more channels. Henceforth, this star-shaped filter will be known as “star-conv,” and a 1×1×Channels filter will be known as “1×1-conv.”
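A minimal sketch of a single-channel star-conv, assuming stride 1 and zero padding at the borders (neither is specified in the text). It makes the 5-tap pattern explicit and reproduces the worst case 5 × 3 × 7 = 105 for (2+s) activations and (3+s) weights. The names `STAR_OFFSETS` and `star_conv2d` are illustrative.

```python
from typing import List, Tuple

# (dy, dx) offsets of the five star taps: centre, up, down, left, right.
# The four diagonal taps of a 3x3 window are simply not present.
STAR_OFFSETS: Tuple[Tuple[int, int], ...] = ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))


def star_conv2d(image: List[List[int]], weights: List[int]) -> List[List[int]]:
    """Single-channel 5-element star convolution, stride 1, zero padding."""
    if len(weights) != len(STAR_OFFSETS):
        raise ValueError("a star filter has exactly 5 weights")
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for (dy, dx), wt in zip(STAR_OFFSETS, weights):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:  # taps outside the image read 0
                    acc += image[yy][xx] * wt
            out[y][x] = acc
    return out


# Worst case from the text: activations at the (2+s) max of 3,
# weights at the (3+s) max of 7.
worst = star_conv2d([[3] * 3 for _ in range(3)], [7] * 5)
print(worst[1][1])  # 105, which still fits below 127
```

Compared with a 3×3 filter, each output needs 5 multiply-accumulates instead of 9, which is what frees the extra weight bit.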


Note that all of the aforementioned filters have a value of group-length that is greater than 1. When a series of convolution layers each have a group-length greater than 1, the network may effectively split into several independent neural networks that do not share data for several layers in a row, because subsets of channels are processed independently for several layers. This reduces representational power. To address this, StarNet adopts a “shuffle” layer, which interleaves the ordering of channels to enable communication across what would otherwise be a collection of independent neural networks. For example, a shuffle layer may receive a set of input values arranged with respect to a plurality of channels. At the shuffle layer, the neural network structure may interleave the ordering of the channels to increase representational power.
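The text does not specify the exact interleaving pattern. A common choice, used here as an assumption, is the ShuffleNet-style shuffle: view the channels as an (n_groups × channels_per_group) grid, transpose it, and flatten. Each group of the next grouped layer then receives channels from every group of the previous one. The function name `channel_shuffle` is illustrative.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], n_groups: int) -> List[T]:
    """Interleave channels: 1st channel of every group, then the 2nd, and so on."""
    per_group, rem = divmod(len(channels), n_groups)
    if rem:
        raise ValueError("channel count must be divisible by n_groups")
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(n_groups)]


# Channels 0-3 came from group A and 4-7 from group B.
print(channel_shuffle(list(range(8)), n_groups=2))  # [0, 4, 1, 5, 2, 6, 3, 7]
```

After the shuffle, the first four channels, [0, 4, 1, 5], mix outputs of both previous groups, so a following grouped layer no longer sees only one group's data.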


The Star-Shuffle Block


The StarNet family of DNN architectures uses a recurring block called the star-shuffle block. This block consists of the following ordering of neural network layers: {1×1-conv, relu, star-conv, relu, shuffle}.


The design of the star-shuffle block enables it to capture 2D spatial context (using star-conv), to mix information across nearby channels (using a 1×1-conv with a group-length of no more than 32), and to combine information across far-away channels (using the shuffle layer). All of this is accomplished while performing all computation using non-saturating signed 8-bit arithmetic.
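Putting the pieces together, a star-shuffle block can be sketched as {1×1-conv, relu, star-conv, relu, shuffle} on a channels-first integer tensor. This is a simplified illustration under several assumptions not fixed by the text: the grouped 1×1-conv keeps the channel count unchanged, star-conv is applied per channel (depthwise) with zero padding, the shuffle uses the ShuffleNet-style transpose, and quantization is omitted. All function names are hypothetical.

```python
from typing import List

Tensor = List[List[List[int]]]  # channels-first: t[c][y][x]
STAR = ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))  # centre, up, down, left, right


def pointwise_grouped(t: Tensor, filters: List[List[int]], group_length: int) -> Tensor:
    """Grouped 1x1-conv keeping C channels; output o reads input group o // group_length."""
    if len(t) % group_length or len(filters) != len(t):
        raise ValueError("expects C filters and C divisible by group_length")
    h, w = len(t[0]), len(t[0][0])
    out: Tensor = []
    for o, filt in enumerate(filters):
        g0 = (o // group_length) * group_length  # first input channel of o's group
        out.append([[sum(filt[k] * t[g0 + k][y][x] for k in range(group_length))
                     for x in range(w)] for y in range(h)])
    return out


def relu(t: Tensor) -> Tensor:
    return [[[max(0, v) for v in row] for row in ch] for ch in t]


def star_depthwise(t: Tensor, filters: List[List[int]]) -> Tensor:
    """One 5-tap star filter per channel, stride 1, zero padding."""
    h, w = len(t[0]), len(t[0][0])
    out: Tensor = []
    for ch, wts in zip(t, filters):
        out.append([[sum(ch[y + dy][x + dx] * wt
                         for (dy, dx), wt in zip(STAR, wts)
                         if 0 <= y + dy < h and 0 <= x + dx < w)
                     for x in range(w)] for y in range(h)])
    return out


def shuffle(t: Tensor, group_length: int) -> Tensor:
    """Interleave channel groups so the next grouped layer sees every group."""
    n_groups = len(t) // group_length
    return [t[g * group_length + i] for i in range(group_length) for g in range(n_groups)]


def star_shuffle_block(t: Tensor, pw: List[List[int]], star: List[List[int]],
                       group_length: int) -> Tensor:
    # Layer order from the text: {1x1-conv, relu, star-conv, relu, shuffle}
    x = relu(pointwise_grouped(t, pw, group_length))
    x = relu(star_depthwise(x, star))
    return shuffle(x, group_length)
```

For example, four 1×1 channels [1, 2, 3, 4] with all-ones 1×1 filters, group-length 2, and centre-only star filters come out as [3, 7, 3, 7]: each group is summed, then the shuffle interleaves the two groups.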


Quantization Mechanism


To quantize a number from a generic 8-bit (7+s) representation to a lower-bit representation, e.g. (2+s), bins are generated as described in the section “Quantization Binning Process” below. The quantization method has a preprocessing step and a runtime step, which are described in the following.


The preprocessing step generates a set of bins that are used during the runtime step of quantization. This set of bins can be described using “quantization parameters,” which describe the bins. Each layer in the neural network has two sets of quantization parameters: “activation quantization parameters” which describe the binning of input and output values of the layer, and “layer quantization parameters” which describe the binning of the parameters of the layer itself. The parameters of a particular layer may refer to the weights of filters associated with the particular layer.


Each set of bins has two processes associated with it. One is called the “quantization” process, where generic 8-bit (7+s) representations are processed into a lower-bit (2+s) representation. The other process, called the “dequantization” process, is the inverse, where the lower-bit (2+s) representation is transformed back into the 8-bit (7+s) representations. Each binning process describes its own mechanism for quantization and dequantization.


To finalize the preprocessing step, the layer parameters are binned according to the quantization process using the layer quantization parameters. These are referred to as “quantized layer parameters.” For example, a filter with a set of trained weights $V_{R,\text{weights}}$ may be quantized using the equation:

$$V_{Q,\text{weights}} = \frac{V_{R,\text{weights}}}{A_{\text{weights}}} - B_{\text{weights}}$$

where $A_{\text{weights}}$ and $B_{\text{weights}}$ are the layer quantization parameters, and $V_{Q,\text{weights}}$ are the quantized layer parameters.


During runtime, each layer first applies the quantization binning process using the activation quantization parameters to its input if the input is not quantized. For example, a layer with a set of input values $V_{R,\text{input}}$ may be quantized to a quantized input using the equation:

$$V_{Q,\text{input}} = \frac{V_{R,\text{input}}}{A_{\text{input}}} - B_{\text{input}}$$

where $A_{\text{input}}$ and $B_{\text{input}}$ are the activation quantization parameters.


The parameters associated with this quantization binning process are attached to the input, and the input is fed into the layer. This layer then applies its standard operation using the quantized layer parameters. For example, using the example above, a quantized output may be generated by the equation:

$$\text{Quantized Output} = f_{V_{Q,\text{weights}}}\left(V_{Q,\text{input}}\right)$$

where $f_{V_{Q,\text{weights}}}(\cdot)$ denotes an operation on the quantized input using the quantized layer parameters. For example, this may be a dot product between the filter and the quantized input.


Next, the layer applies the dequantization process using the layer quantization parameters. For example, the quantized output may first be dequantized to an output:

$$V_{R,\text{output}} = A_{\text{weights}} \cdot \left(\text{Quantized Output} + B_{\text{weights}}\right)$$

Then, the dequantization process uses the activation quantization parameters that are attached to the original input to dequantize the output. For example, the dequantization of the output may be given by:

$$\text{Dequantized Output} = A_{\text{input}} \cdot \left(V_{R,\text{output}} + B_{\text{input}}\right)$$
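The runtime steps above can be traced end to end in a few lines of Python. This is a sketch under stated assumptions: quantization rounds to the nearest integer and clips to a symmetric (b+s) range (the text gives only the affine map), the layer operation f is a dot product, and the parameter values are made up. The example deliberately uses zero offsets (B = 0), for which the two dequantization steps reduce to A_input · A_weights · Σ q_w q_x and recover the floating-point dot product up to rounding.

```python
import math
from typing import List


def quantize(values: List[float], a: float, b: float, bits: int) -> List[int]:
    """V_Q = V_R / A - B, rounded and clipped to the symmetric (bits+s) range."""
    qmax = (1 << bits) - 1
    return [max(-qmax, min(qmax, round(v / a - b))) for v in values]


def dequantize(v: float, a: float, b: float) -> float:
    """Inverse map: V_R = A * (V_Q + B)."""
    return a * (v + b)


# Preprocessing: quantize the trained weights once with the layer parameters.
weights = [0.5, -0.25, 0.25, -0.5]
A_W, B_W = 0.25, 0.0
q_weights = quantize(weights, A_W, B_W, bits=2)      # [2, -1, 1, -2]

# Runtime: quantize the incoming activations with the activation parameters.
inputs = [0.3, 0.6, 0.9, 0.0]
A_IN, B_IN = 0.3, 0.0
q_inputs = quantize(inputs, A_IN, B_IN, bits=2)      # [1, 2, 3, 0]

# Layer operation f on quantized values: an integer dot product.
quantized_output = sum(w * x for w, x in zip(q_weights, q_inputs))  # 3

# Dequantize with the layer parameters, then with the input's parameters.
v_r_output = dequantize(quantized_output, A_W, B_W)
dequantized_output = dequantize(v_r_output, A_IN, B_IN)

reference = sum(w * x for w, x in zip(weights, inputs))
print(dequantized_output, reference, math.isclose(dequantized_output, reference))
```

Only the dot product runs on small integers; the two dequantization multiplies happen once per output rather than once per element.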

Quantization Binning Process


This quantization binning approach creates a set of bins implicitly based on a quantization equation and its corresponding dequantization equation, which are described by two parameters, “A” and “B” as shown in FIG. 4.


To solve for the activation quantization parameters, a dataset is passed through the neural network one example at a time and a set of output values is collected for each layer in the neural network. For each set of output values associated with a layer, the minimum and maximum output values are identified. The minimum and maximum output values are plugged into the dequantization equation, along with the selected bit-width, to produce the system of equations pictured in FIG. 5. This system of equations is solved to find the activation quantization parameters associated with each layer.


This same process is applied to the parameters of the StarNet instance being quantized. For each layer, the minimum and maximum parameter values are passed into the quantization equation, along with the selected bit-width, to produce the system of equations pictured in FIG. 5. This system of equations is solved to find the layer quantization parameters associated with each layer.
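FIG. 5 is not reproduced here, so the following is only one plausible reading: it assumes the observed minimum and maximum should dequantize to the most negative and most positive codes of a symmetric (b+s) range, −(2^b − 1) and 2^b − 1. Under that assumption, the two equations A(−q_max + B) = v_min and A(q_max + B) = v_max have the closed-form solution coded below. The `calibrate` helper mirrors the dataset pass described for activations; `layer` is a hypothetical callable standing in for one layer's forward computation.

```python
import math
from typing import Callable, Iterable, List, Tuple


def solve_quant_params(v_min: float, v_max: float, bits: int) -> Tuple[float, float]:
    """Solve A*(-qmax + B) = v_min and A*(qmax + B) = v_max for (A, B)."""
    if v_max <= v_min:
        raise ValueError("need v_max > v_min")
    qmax = (1 << bits) - 1
    a = (v_max - v_min) / (2 * qmax)   # from subtracting the two equations
    b = (v_min + v_max) / (2 * a)      # from adding the two equations
    return a, b


def calibrate(layer: Callable[[List[float]], List[float]],
              dataset: Iterable[List[float]], bits: int) -> Tuple[float, float]:
    """Run examples one at a time, track min/max of the layer outputs, solve for A, B."""
    lo, hi = math.inf, -math.inf
    for example in dataset:
        for v in layer(example):
            lo, hi = min(lo, v), max(hi, v)
    return solve_quant_params(lo, hi, bits)


a, b = solve_quant_params(0.0, 6.0, bits=2)
print(a, b)  # 1.0 3.0: code -3 maps back to 0.0 and code 3 to 6.0
```

The same solver serves both cases in the text: feed it activation extremes gathered by `calibrate` for activation quantization parameters, or the min/max of a layer's weights for layer quantization parameters.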


Optimizations can be applied to the quantization method above. In particular, we describe a “quantization collapsing” mechanism by which the quantization equations for adjacent layers and activations can be collapsed into a single equation. The mathematical transformation is shown in FIG. 6, where the quantization operations of adjacent bins are collapsed, and the corresponding dequantization operations are collapsed as well. This reduces the number of operations between each quantization and dequantization by a factor of two. Alternatively, both intermediate operations can be left out entirely, using only the initial quantization equation and the final dequantization equation.
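Since FIG. 6 is not reproduced, here is a short derivation of the collapse under the assumption that each bin set uses the affine maps given in the equations above, V_Q = V_R/A − B and V_R = A(V_Q + B). Dequantizing a code q_1 with parameters (A_1, B_1) and immediately requantizing with (A_2, B_2) gives a single affine map:

```latex
q_2 = \frac{A_1\,(q_1 + B_1)}{A_2} - B_2
    = \underbrace{\frac{A_1}{A_2}}_{\text{scale}}\, q_1
    + \underbrace{\left(\frac{A_1 B_1}{A_2} - B_2\right)}_{\text{offset}}
```

The scale and offset can be precomputed offline, so one multiply and one add replace the separate dequantize and quantize steps. If the scale A_1/A_2 happens to be a power of two, the multiply can itself become a bit-shift.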


In various embodiments, by using maximum values that correspond to the values representable by quantized representations (e.g., a maximum value of 7 for a (3+s) representation), calculations such as division can more often be performed with a bit-shift operator, reducing computational complexity and time in the reduced-bit representation and execution.


StarNet Neural Network Family


Various DNNs can be formed using the star-block. As used herein, a StarNet is a DNN containing one or more star-block modules. In the following, one example implementation of a StarNet neural network architecture is described. In this example, called “StarNet-A,” the DNN is tasked with ingesting an RGB image and classifying the image into one of 1024 categories. See FIG. 3 for a summary of the StarNet-A DNN architecture that is described in the following. With the exception of the first convolution layer in StarNet-A, all layers of StarNet-A can be implemented using only 8-bit arithmetic and 8-bit storage.


The first layer of StarNet-A is a star-conv layer, which is applied to the input image. While the inputs to most layers can be quantized without losing accuracy, one exception is the input image: quantizing it does reduce accuracy. Therefore, the first layer is computed with 8-bit inputs, 16-bit arithmetic, and 16-bit temporary storage for activations. In one implementation, this first layer is computed on the CPU of an IoT system-on-chip (SOC), while all subsequent layers of StarNet-A are computed on an energy-efficient accelerator on the same SOC. A rectified linear unit (relu) follows the first star-conv layer, and the first star-conv layer has a stride of 2.


Next, StarNet-A has a series of 2 star-block modules, the details of which are described in FIG. 3. These star-block modules are followed by a downsampling operation, which is implemented using max-pooling with a stride of 2.


After that, there are 6 more star-block modules, a max-pool, 12 more star-block modules, a max-pool, and finally 12 more star-block modules. After each downsampling operation (e.g. max-pool), the number of filters is increased.


The first series of 2 star-block modules and the next series of 6 star-block modules use a group length of 8 for their 1×1-conv filters. To avoid overflow, the input activations and the weights for the 1×1-convs are represented using (2+s) bits, and these bits are contained in 128-bit outputs. The rationale for using (2+s) is: the maximum value of a (2+s) number is 3, the group length is 8, so the maximum output value is 3*3*8=72, which is smaller than 127 and therefore does not overflow when the output value is represented in 8-bits.


The next two series of 12 star-block modules have a group length of 16 and 32, respectively. Care is taken to develop a quantization scheme for these modules that does not overflow when using 8-bit storage and 8-bit arithmetic. The particular quantization scheme is shown in FIG. 3.
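The StarNet-A layout described in the preceding paragraphs can be summarized as data. This sketch records only what the text states (block counts, 1×1-conv group lengths, where max-pools occur, and the stride-2 first layer); per-stage channel widths are given in FIG. 3 and are left out rather than guessed. The resolution helper assumes each stride-2 step halves the height and width with floor rounding, and the 224×224 input is an arbitrary example, not a size given in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    n_blocks: int       # star-shuffle blocks in the stage
    group_length: int   # group length of the stage's 1x1-convs
    pool_after: bool    # stride-2 max-pool follows the stage


STARNET_A: List[Stage] = [
    Stage(n_blocks=2, group_length=8, pool_after=True),
    Stage(n_blocks=6, group_length=8, pool_after=True),
    Stage(n_blocks=12, group_length=16, pool_after=True),
    Stage(n_blocks=12, group_length=32, pool_after=False),
]


def total_blocks(stages: List[Stage]) -> int:
    return sum(s.n_blocks for s in stages)


def output_resolution(h: int, w: int, stages: List[Stage],
                      first_stride: int = 2) -> Tuple[int, int]:
    """Spatial size after the stride-2 first star-conv and every max-pool."""
    h, w = h // first_stride, w // first_stride
    for s in stages:
        if s.pool_after:
            h, w = h // 2, w // 2
    return h, w


print(total_blocks(STARNET_A))                 # 32
print(output_resolution(224, 224, STARNET_A))  # (14, 14)
```

Overall, the network downsamples by a factor of 16 (one stride-2 convolution and three stride-2 max-pools) across 32 star-block modules.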


After the final star-block module, global average pooling is applied. This has the effect of reducing a H×W×Channels tensor of output activations down to a 1×1×Channels vector of output activations. In StarNet-A, the final star-block module has 1024 output channels, so the final output vector (after applying global average pooling) is a 1024-dimensional vector.


When running StarNet-A on an image, the output channel with the largest value identifies the category that StarNet-A predicts is contained in the image.
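Global average pooling followed by picking the largest channel can be sketched as below (channels-first nested lists; the tensor values are made up for illustration). Because every channel is averaged over the same H×W grid, dividing by H·W does not change which channel is largest, so a division-free variant can compare the raw sums instead, which may suit hardware where division is expensive.

```python
from typing import List

Tensor = List[List[List[float]]]  # t[c][y][x]


def global_average_pool(t: Tensor) -> List[float]:
    """Reduce an HxWxC tensor (stored channels-first) to a length-C vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in t]


def predict(t: Tensor) -> int:
    pooled = global_average_pool(t)
    return max(range(len(pooled)), key=pooled.__getitem__)


def predict_without_division(t: Tensor) -> int:
    # Same H*W for every channel, so comparing sums gives the same argmax.
    sums = [sum(sum(row) for row in ch) for ch in t]
    return max(range(len(sums)), key=sums.__getitem__)


activations = [
    [[1, 1], [1, 1]],   # mean 1.0
    [[0, 4], [4, 0]],   # mean 2.0
    [[3, 0], [0, 0]],   # mean 0.75
]
print(predict(activations), predict_without_division(activations))  # 1 1
```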


In another implementation, the final layers of StarNet-A can be configured to produce an activation grid that represents a semantic segmentation mask of a whole image.


In another implementation, the input to StarNet-A includes a depth map.



FIG. 7 illustrates an example process for generating a neural network structure including input layers and filters, according to one embodiment. The online system determines 702 a bit length of a set of registers of the device used to perform arithmetic operations. For example, the registers may have an 8-bit architecture. The online system determines 704 a first integer representation for one or more input layers of the neural network structure and a second integer representation for one or more filters. The first integer representation may be associated with a first range of integer values, and the second integer representation may be associated with a second range of integer values. Thus, each element of the input layers, when quantized, may take an integer value within the minimum-to-maximum range defined by the first integer representation. Similarly, each element of the filters, when quantized, may take an integer value within the minimum-to-maximum range defined by the second integer representation.


The online system generates 706 dimensionalities of the one or more input layers and the one or more filters. The dimensionalities are determined such that an output value generated by combining elements of an input layer as maximum values of the first integer representation with elements of a corresponding filter as maximum values of the second integer representation does not overflow the bit length of the set of registers. The online system generates 708 the neural network structure with the determined dimensionalities.
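Steps 702 through 706 can be sketched as a small search: given the register bit length and the two integer representations, find the largest number of elements whose worst-case accumulation stays within the signed register range. The sketch assumes symmetric (b+s) ranges as in the earlier examples, and the helper names are hypothetical. Rounding the result down to a power of two reproduces the 32-, 16-, and 8-element filters discussed above, although the text does not say that this is how those sizes were chosen.

```python
def register_max(bit_length: int) -> int:
    """Largest value of a signed register with the given bit length."""
    return (1 << (bit_length - 1)) - 1


def max_filter_elements(bit_length: int, act_bits: int, weight_bits: int) -> int:
    """Most elements n such that n * max|activation| * max|weight| <= register max."""
    per_mac = ((1 << act_bits) - 1) * ((1 << weight_bits) - 1)
    return register_max(bit_length) // per_mac


def round_down_pow2(n: int) -> int:
    if n < 1:
        raise ValueError("no filter size fits")
    return 1 << (n.bit_length() - 1)


for act_bits, weight_bits in ((2, 1), (3, 1), (2, 2)):
    n = max_filter_elements(8, act_bits, weight_bits)
    print(f"act ({act_bits}+s), weight ({weight_bits}+s): up to {n} elements, "
          f"power of two {round_down_pow2(n)}")
```

For an 8-bit register this yields at most 42, 18, and 14 elements, which round down to 32, 16, and 8. The 5-element star-conv with (2+s) activations and (3+s) weights also fits, since up to 6 elements are allowed.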


SUMMARY

The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.


Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.


Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.


Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.


Embodiments of the invention may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may include any embodiment of a computer program product or other data combination described herein.


Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.


Terminology

“Output Activation”: The output data produced by a layer of a deep neural network.


“Input Activation”: The input data provided to a layer of a deep neural network.


“Weight”: A learned parameter in a DNN.


“Filter”: A collection of weights organized in a specific pattern (e.g. a 3×3×256 convolution filter).


“Group-Length”: The number of channels in a convolution filter.

Claims
  • 1. A method of generating a neural network structure including one or more input layers each associated with one or more filters, the method comprising: determining, for an architecture of a device, a bit length of a set of registers of the device used to perform arithmetic operations; determining a first integer representation for the one or more input layers and a second integer representation for the one or more filters, the first integer representation associated with a first range of integer values and the second integer representation associated with a second range of integer values; generating dimensionalities of the one or more input layers and the one or more filters, the dimensionalities determined such that an output value generated by combining elements of an input layer as maximum values of the first integer representation with elements of a corresponding filter as maximum values of the second integer representation does not overflow the bit length of the registers; and generating the neural network structure with the determined dimensionalities, wherein for each individual layer which forms the neural network structure, activations are quantized using activation parameters for the individual layer and the weights are quantized using layer parameters for the individual layer, wherein the activations and the activation parameters are provided as input to the individual layer, and wherein quantized output associated with the individual layer is dequantized using (1) the input activation parameters and (2) the layer parameters.
  • 2. The method of claim 1, wherein generating the dimensionalities comprises generating the one or more filters for a corresponding input layer as star-shaped filters.
  • 3. The method of claim 1, further comprising: receiving a set of input values corresponding to the elements of an input layer in the one or more input layers which form activations for the input layer, and a set of weights corresponding to the elements of a filter in the one or more filters with the generated dimensionalities; quantizing the set of input values by assigning each input value to a corresponding integer value in the first integer representation; quantizing the set of weights by assigning each weight to a corresponding integer value in the second integer representation; and combining the set of input values and the set of weights to generate a quantized output.
  • 4. The method of claim 3, wherein the neural network structure includes a shuffle layer placed after the corresponding input layer, the method further comprising: receiving another set of input values at the shuffle layer, wherein the another set of input values are arranged with respect to a plurality of channels; and interleaving ordering of the plurality of channels at the shuffle layer.
  • 5. The method of claim 3, wherein quantizing the set of input values comprises: obtaining a dataset including a plurality of data instances; propagating the plurality of data instances through the neural network structure to obtain input values at the input layer; identifying a lower bound value and an upper bound value from the input values obtained at the input layer; and dividing a range between the lower bound value and the upper bound value into a plurality of bins each assigned to a corresponding integer value in the first integer representation.
  • 6. The method of claim 3, wherein quantizing the set of weights comprises: identifying a lower bound value and an upper bound value from the set of weights; and dividing a range between the lower bound value and the upper bound value into a plurality of bins each assigned to a corresponding integer value in the second integer representation.
  • 7. The method of claim 1, wherein the bit length of the set of registers are 8 bits, and the arithmetic operations are performed using 8-bit arithmetic.
  • 8. A non-transitory computer-readable medium containing instructions for execution on a processor, the instructions comprising: determining, for an architecture of a device, a bit length of a set of registers of the device used to perform arithmetic operations; determining a first integer representation for the one or more input layers and a second integer representation for the one or more filters, the first integer representation associated with a first range of integer values and the second integer representation associated with a second range of integer values; generating dimensionalities of the one or more input layers and the one or more filters, the dimensionalities determined such that an output value generated by combining elements of an input layer as maximum values of the first integer representation with elements of a corresponding filter as maximum values of the second integer representation does not overflow the bit length of the registers; and generating the neural network structure with the determined dimensionalities, wherein for each individual layer which forms the neural network structure, activations are quantized using activation parameters for the individual layer and the weights are quantized using layer parameters for the individual layer, wherein the activations and the activation parameters are provided as input to the individual layer, and wherein quantized output associated with the individual layer is dequantized using (1) the input activation parameters and (2) the layer parameters.
  • 9. The non-transitory computer-readable medium of claim 8, wherein generating the dimensionalities comprises generating the one or more filters for a corresponding input layer as star-shaped filters.
  • 10. The non-transitory computer-readable medium of claim 8, the instructions further comprising: receiving a set of input values corresponding to the elements of an input layer in the one or more input layers which form activations for the input layer, and a set of weights corresponding to the elements of a filter in the one or more filters with the generated dimensionalities; quantizing the set of input values by assigning each input value to a corresponding integer value in the first integer representation; quantizing the set of weights by assigning each weight to a corresponding integer value in the second integer representation; and combining the set of input values and the set of weights to generate a quantized output.
  • 11. The non-transitory computer-readable medium of claim 10, wherein the neural network structure includes a shuffle layer placed after the corresponding input layer, the instructions further comprising: receiving another set of input values at the shuffle layer, wherein the another set of input values are arranged with respect to a plurality of channels; and interleaving ordering of the plurality of channels at the shuffle layer.
  • 12. The non-transitory computer-readable medium of claim 10, wherein quantizing the set of input values comprises: obtaining a dataset including a plurality of data instances; propagating the plurality of data instances through the neural network structure to obtain input values at the input layer; identifying a lower bound value and an upper bound value from the input values obtained at the input layer; and dividing a range between the lower bound value and the upper bound value into a plurality of bins each assigned to a corresponding integer value in the first integer representation.
  • 13. The non-transitory computer-readable medium of claim 10, wherein quantizing the set of weights comprises: identifying a lower bound value and an upper bound value from the set of weights; and dividing a range between the lower bound value and the upper bound value into a plurality of bins each assigned to a corresponding integer value in the second integer representation.
  • 14. The non-transitory computer-readable medium of claim 8, wherein the bit length of the set of registers are 8 bits, and the arithmetic operations are performed using 8-bit arithmetic.
  • 15. A system comprising: a processor configured to execute instructions; a computer-readable medium containing instructions for execution on the processor, the instructions causing the processor to perform steps of: determining, for an architecture of a device, a bit length of a set of registers of the device used to perform arithmetic operations; determining a first integer representation for the one or more input layers and a second integer representation for the one or more filters, the first integer representation associated with a first range of integer values and the second integer representation associated with a second range of integer values; generating dimensionalities of the one or more input layers and the one or more filters, the dimensionalities determined such that an output value generated by combining elements of an input layer as maximum values of the first integer representation with elements of a corresponding filter as maximum values of the second integer representation does not overflow the bit length of the registers; and generating the neural network structure with the determined dimensionalities, wherein for each individual layer which forms the neural network structure, activations are quantized using activation parameters for the individual layer and the weights are quantized using layer parameters for the individual layer, wherein the activations and the activation parameters are provided as input to the individual layer, and wherein quantized output associated with the individual layer is dequantized using (1) the input activation parameters and (2) the layer parameters.
  • 16. The system of claim 15, wherein generating the dimensionalities comprises generating the one or more filters for a corresponding input layer as star-shaped filters.
  • 17. The system of claim 15, the instructions further comprising: receiving a set of input values corresponding to the elements of an input layer in the one or more input layers which form activations for the input layer, and a set of weights corresponding to the elements of a filter in the one or more filters with the generated dimensionalities; quantizing the set of input values by assigning each input value to a corresponding integer value in the first integer representation; quantizing the set of weights by assigning each weight to a corresponding integer value in the second integer representation; and combining the set of input values and the set of weights to generate a quantized output.
  • 18. The system of claim 17, wherein the neural network structure includes a shuffle layer placed after the corresponding input layer, the instructions further comprising: receiving another set of input values at the shuffle layer, wherein the another set of input values are arranged with respect to a plurality of channels; and interleaving ordering of the plurality of channels at the shuffle layer.
  • 19. The system of claim 17, wherein quantizing the set of input values comprises: obtaining a dataset including a plurality of data instances; propagating the plurality of data instances through the neural network structure to obtain input values at the input layer; identifying a lower bound value and an upper bound value from the input values obtained at the input layer; and dividing a range between the lower bound value and the upper bound value into a plurality of bins each assigned to a corresponding integer value in the first integer representation.
  • 20. The system of claim 15, wherein the bit length of the set of registers are 8 bits, and the arithmetic operations are performed using 8-bit arithmetic.
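Claim 1 ties the dimensionalities of the input layers and filters to the register bit length: the worst-case sum of products, with every input and weight at its extreme integer value, must still fit in the register. The following is a minimal sketch of that bound, assuming a signed two's-complement accumulator of the stated width and a plain multiply-accumulate over every filter element. The function names and the example ranges (2-bit unsigned activations, 2-bit signed weights) are hypothetical and are not taken from the claims.

```python
def max_abs_product(act_range: tuple[int, int], weight_range: tuple[int, int]) -> int:
    """Largest |a * w| for integers a, w inside the given inclusive ranges.

    |a * w| is maximised at the range endpoints, so checking the corners suffices.
    """
    return max(abs(a * w) for a in act_range for w in weight_range)


def max_accumulated_terms(register_bits: int,
                          act_range: tuple[int, int],
                          weight_range: tuple[int, int]) -> int:
    """Number of worst-case products that can be summed without overflow.

    Assumes a signed two's-complement register; the conservative limit
    2**(bits-1) - 1 is applied to the magnitude of both positive and negative sums.
    """
    limit = 2 ** (register_bits - 1) - 1        # e.g. 127 for an 8-bit register
    worst = max_abs_product(act_range, weight_range)
    if worst == 0:
        raise ValueError("ranges admit only zero products; no bound is needed")
    return limit // worst


def max_group_length(register_bits: int,
                     act_range: tuple[int, int],
                     weight_range: tuple[int, int],
                     spatial_taps: int) -> int:
    """Largest channel count per filter so that taps * channels products fit."""
    return max_accumulated_terms(register_bits, act_range, weight_range) // spatial_taps


if __name__ == "__main__":
    acts = (0, 3)      # hypothetical 2-bit unsigned activations
    weights = (-2, 1)  # hypothetical 2-bit signed weights
    print(max_accumulated_terms(8, acts, weights))  # products that fit in 8 bits
    print(max_group_length(8, acts, weights, 9))    # channels allowed for a 3x3 filter
    print(max_group_length(8, acts, weights, 1))    # channels allowed for a 1x1 filter
```

Under these assumptions the worst single product is 6, so at most 21 products fit in an 8-bit accumulator, and a 3×3 filter could have a Group-Length of at most 2. This illustrates the trade-off the claim describes between filter size and the integer ranges chosen for inputs and weights.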
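Claims 3, 5 and 6 describe quantization by taking a lower and an upper bound (for activations, observed by propagating a dataset through the network) and dividing that range into bins, each assigned an integer. Claim 1 adds that a layer's quantized output is dequantized using the activation parameters and the layer parameters. The sketch below assumes uniform bins, reconstruction at bin centres, and dequantization by expanding the product of the two affine mappings. `BinQuantizer` and `quantized_dot` are hypothetical names, and the claims do not specify this particular formula.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinQuantizer:
    """Uniform bins over [lower, upper]; bin k is represented by integer k (hypothetical)."""
    lower: float
    upper: float
    bits: int

    def __post_init__(self) -> None:
        if self.upper <= self.lower:
            raise ValueError("upper bound must exceed lower bound")

    @classmethod
    def calibrate(cls, observed: Sequence[float], bits: int) -> "BinQuantizer":
        # Bounds come from observed values, e.g. activations recorded while
        # propagating a calibration dataset through the network.
        return cls(min(observed), max(observed), bits)

    @property
    def num_bins(self) -> int:
        return 2 ** self.bits

    @property
    def step(self) -> float:
        return (self.upper - self.lower) / self.num_bins

    def quantize(self, x: float) -> int:
        k = math.floor((x - self.lower) / self.step)
        return min(max(k, 0), self.num_bins - 1)  # clamp outliers and x == upper

    def dequantize(self, k: int) -> float:
        return self.lower + (k + 0.5) * self.step  # centre of bin k


def quantized_dot(acts: Sequence[float], weights: Sequence[float],
                  aq: BinQuantizer, wq: BinQuantizer) -> float:
    """Dot product accumulated on integer bin indices, then dequantized.

    With x = A + s_a*q and w = B + s_w*p (A, B being the centres of bin 0):
    sum(x*w) = n*A*B + A*s_w*sum(p) + B*s_a*sum(q) + s_a*s_w*sum(q*p).
    """
    if len(acts) != len(weights):
        raise ValueError("acts and weights must have the same length")
    q = [aq.quantize(x) for x in acts]
    p = [wq.quantize(w) for w in weights]
    acc = sum(qi * pi for qi, pi in zip(q, p))  # pure integer accumulation
    a0, b0 = aq.dequantize(0), wq.dequantize(0)
    n = len(q)
    return (n * a0 * b0 + a0 * wq.step * sum(p)
            + b0 * aq.step * sum(q) + aq.step * wq.step * acc)
```

For example, `BinQuantizer.calibrate([0.0, 1.0, 2.0, 4.0], bits=2)` yields four bins of width 1.0, so 1.5 maps to integer 1 and 4.0 is clamped into bin 3. In this scheme only `acc` requires the long integer multiply-accumulate; the remaining terms are per-layer corrections computed from the two quantizers' parameters.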
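Claim 4 places a shuffle layer after an input layer to interleave the ordering of its channels. The claims do not fix the permutation. The sketch below assumes a ShuffleNet-style channel shuffle, which views the channels as a groups-by-channels-per-group grid and reads it out column by column. `shuffle_order` and `shuffle_channels` are hypothetical names.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def shuffle_order(num_channels: int, groups: int) -> list[int]:
    """Permutation that interleaves channels across `groups` equal groups."""
    if groups <= 0 or num_channels % groups:
        raise ValueError("num_channels must be a multiple of a positive group count")
    per_group = num_channels // groups
    # Channel c of group g sits at index g * per_group + c; emit position c of
    # every group before moving on to position c + 1.
    return [g * per_group + c for c in range(per_group) for g in range(groups)]


def shuffle_channels(channels: Sequence[T], groups: int) -> list[T]:
    """Reorder per-channel data (e.g. feature maps) using the interleaved order."""
    return [channels[i] for i in shuffle_order(len(channels), groups)]


if __name__ == "__main__":
    # Two groups of three channels: [a0 a1 a2 | b0 b1 b2] -> a0 b0 a1 b1 a2 b2
    print(shuffle_channels(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2))
```

In ShuffleNet this reordering lets a following grouped convolution draw on channels from every earlier group. The claims do not say whether the shuffle layer serves the same purpose here.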
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of, and claims priority to, U.S. patent application Ser. No. 16/559,483 titled “NEURAL NETWORKS FOR EMBEDDED DEVICES” and filed on Sep. 3, 2019. U.S. patent application Ser. No. 16/559,483 claims the benefit of U.S. Provisional Patent Application No. 62/726,396, filed Sep. 3, 2018. The above-recited applications are hereby incorporated by reference herein in their entirety.

GOVERNMENT SUPPORT CLAUSE

This invention was made with government support under SBIR Phase II Grant Award No. 1758546 awarded by the National Science Foundation. The government has certain rights to the invention.

Foreign Referenced Citations (270)
Number Date Country
2019261735 Jun 2020 AU
2019201716 Oct 2020 AU
2021101172 May 2021 AU
110599537 Dec 2010 CN
102737236 Oct 2012 CN
103366339 Oct 2013 CN
104835114 Aug 2015 CN
103236037 May 2016 CN
103500322 Aug 2016 CN
106419893 Feb 2017 CN
106504253 Mar 2017 CN
107031600 Aug 2017 CN
107169421 Sep 2017 CN
107507134 Dec 2017 CN
107885214 Apr 2018 CN
108122234 Jun 2018 CN
107133943 Jul 2018 CN
107368926 Jul 2018 CN
105318888 Aug 2018 CN
108491889 Sep 2018 CN
108647591 Oct 2018 CN
108710865 Oct 2018 CN
105550701 Nov 2018 CN
108764185 Nov 2018 CN
108845574 Nov 2018 CN
108898177 Nov 2018 CN
109086867 Dec 2018 CN
107103113 Jan 2019 CN
109215067 Jan 2019 CN
109359731 Feb 2019 CN
109389207 Feb 2019 CN
109389552 Feb 2019 CN
106779060 Mar 2019 CN
109579856 Apr 2019 CN
109615073 Apr 2019 CN
106156754 May 2019 CN
106598226 May 2019 CN
106650922 May 2019 CN
109791626 May 2019 CN
109901595 Jun 2019 CN
109902732 Jun 2019 CN
109934163 Jun 2019 CN
109948428 Jun 2019 CN
109949257 Jun 2019 CN
109951710 Jun 2019 CN
109975308 Jul 2019 CN
109978132 Jul 2019 CN
109978161 Jul 2019 CN
110060202 Jul 2019 CN
110069071 Jul 2019 CN
110084086 Aug 2019 CN
110096937 Aug 2019 CN
110111340 Aug 2019 CN
110135485 Aug 2019 CN
110197270 Sep 2019 CN
110310264 Oct 2019 CN
110321965 Oct 2019 CN
110334801 Oct 2019 CN
110399875 Nov 2019 CN
110414362 Nov 2019 CN
110426051 Nov 2019 CN
110473173 Nov 2019 CN
110516665 Nov 2019 CN
110543837 Dec 2019 CN
110569899 Dec 2019 CN
110599864 Dec 2019 CN
110619282 Dec 2019 CN
110619283 Dec 2019 CN
110619330 Dec 2019 CN
110659628 Jan 2020 CN
110688992 Jan 2020 CN
107742311 Feb 2020 CN
110751265 Feb 2020 CN
110751280 Feb 2020 CN
110826566 Feb 2020 CN
107451659 Apr 2020 CN
108111873 Apr 2020 CN
110956185 Apr 2020 CN
110966991 Apr 2020 CN
111027549 Apr 2020 CN
111027575 Apr 2020 CN
111047225 Apr 2020 CN
111126453 May 2020 CN
111158355 May 2020 CN
107729998 Jun 2020 CN
108549934 Jun 2020 CN
111275129 Jun 2020 CN
111275618 Jun 2020 CN
111326023 Jun 2020 CN
111428943 Jul 2020 CN
111444821 Jul 2020 CN
111445420 Jul 2020 CN
111461052 Jul 2020 CN
111461053 Jul 2020 CN
111461110 Jul 2020 CN
110225341 Aug 2020 CN
111307162 Aug 2020 CN
111488770 Aug 2020 CN
111507952 Aug 2020 CN
111539514 Aug 2020 CN
111565318 Aug 2020 CN
111582216 Aug 2020 CN
111598095 Aug 2020 CN
108229526 Sep 2020 CN
111693972 Sep 2020 CN
106558058 Oct 2020 CN
107169560 Oct 2020 CN
107622258 Oct 2020 CN
111767801 Oct 2020 CN
111768002 Oct 2020 CN
111783545 Oct 2020 CN
111783971 Oct 2020 CN
111797657 Oct 2020 CN
111814623 Oct 2020 CN
111814902 Oct 2020 CN
111860499 Oct 2020 CN
111881856 Nov 2020 CN
111882579 Nov 2020 CN
111897639 Nov 2020 CN
111898507 Nov 2020 CN
111898523 Nov 2020 CN
111899227 Nov 2020 CN
112101175 Dec 2020 CN
112101562 Dec 2020 CN
112115953 Dec 2020 CN
112132261 Dec 2020 CN
111062973 Jan 2021 CN
111275080 Jan 2021 CN
112183739 Jan 2021 CN
112232497 Jan 2021 CN
112288658 Jan 2021 CN
112308095 Feb 2021 CN
112308799 Feb 2021 CN
112313663 Feb 2021 CN
112329552 Feb 2021 CN
112348783 Feb 2021 CN
111899245 Mar 2021 CN
112463078 Mar 2021 CN
112488291 Mar 2021 CN
112686384 Apr 2021 CN
202017102235 May 2017 DE
202017102238 May 2017 DE
102017116017 Jan 2019 DE
102018130821 Jun 2020 DE
102019008316 Aug 2020 DE
1 437 004 Jul 2004 EP
1215626 Sep 2008 EP
2 179 589 Apr 2010 EP
2 465 093 Jun 2012 EP
2228666 Sep 2012 EP
2 567 347 Mar 2013 EP
2420408 May 2013 EP
2 618 559 Jul 2013 EP
2 723 069 Apr 2014 EP
2741253 Jun 2014 EP
3115772 Jan 2017 EP
3 198 557 Dec 2017 EP
3285485 Feb 2018 EP
3 295 424 Mar 2018 EP
3 320 486 May 2018 EP
3 185 758 Oct 2018 EP
2863633 Feb 2019 EP
3113080 May 2019 EP
3 494 514 Jun 2019 EP
3525132 Aug 2019 EP
3531689 Aug 2019 EP
3 535 692 Sep 2019 EP
3537340 Sep 2019 EP
3543917 Sep 2019 EP
3 598 874 Jan 2020 EP
3608840 Feb 2020 EP
3 616 119 Mar 2020 EP
3 657 387 May 2020 EP
2396750 Jun 2020 EP
3664020 Jun 2020 EP
3 673 233 Jul 2020 EP
3690712 Aug 2020 EP
3690742 Aug 2020 EP
3 718 048 Oct 2020 EP
3 729 002 Oct 2020 EP
3722992 Oct 2020 EP
3 732 618 Nov 2020 EP
3690730 Nov 2020 EP
3739486 Nov 2020 EP
3501897 Dec 2020 EP
3751455 Dec 2020 EP
3 766 023 Jan 2021 EP
3783527 Feb 2021 EP
2402572 Aug 2005 GB
2548087 Sep 2017 GB
2577485 Apr 2020 GB
2517270 Jun 2020 GB
2578262 Aug 1998 JP
3941252 Jul 2007 JP
4282583 Jun 2009 JP
4300098 Jul 2009 JP
2015004922 Jan 2015 JP
5863536 Feb 2016 JP
6044134 Dec 2016 JP
6525707 Jun 2019 JP
2019101535 Jun 2019 JP
2020101927 Jul 2020 JP
2020173744 Oct 2020 JP
100326702 Feb 2002 KR
101082878 Nov 2011 KR
101738422 May 2017 KR
101969864 Apr 2019 KR
101996167 Jul 2019 KR
102022388 Aug 2019 KR
102043143 Nov 2019 KR
102095335 Mar 2020 KR
102097120 Apr 2020 KR
1020200085490 Jul 2020 KR
102189262 Dec 2020 KR
1020200142266 Dec 2020 KR
200630819 Sep 2006 TW
I294089 Mar 2008 TW
I306207 Feb 2009 TW
WO 02052835 Jul 2002 WO
WO 15134900 Sep 2015 WO
WO 16032398 Mar 2016 WO
WO 16048108 Mar 2016 WO
WO 16207875 Dec 2016 WO
WO 17158622 Sep 2017 WO
WO 17214507 Dec 2017 WO
WO 19005547 Jan 2019 WO
WO 19067695 Apr 2019 WO
WO 19089339 May 2019 WO
WO 19092456 May 2019 WO
WO 19099622 May 2019 WO
WO 19122952 Jun 2019 WO
WO 19125191 Jun 2019 WO
WO 19126755 Jun 2019 WO
WO 19144575 Aug 2019 WO
WO 19182782 Sep 2019 WO
WO 19191578 Oct 2019 WO
WO 19216938 Nov 2019 WO
WO 19220436 Nov 2019 WO
WO 20006154 Jan 2020 WO
WO 20012756 Jan 2020 WO
WO 20025696 Feb 2020 WO
WO 20034663 Feb 2020 WO
WO 20056157 Mar 2020 WO
WO 20076356 Apr 2020 WO
WO 20097221 May 2020 WO
WO 20101246 May 2020 WO
WO 20120050 Jun 2020 WO
WO 20121973 Jun 2020 WO
WO 20131140 Jun 2020 WO
WO 20139181 Jul 2020 WO
WO 20139355 Jul 2020 WO
WO 20139357 Jul 2020 WO
WO 20142193 Jul 2020 WO
WO 20146445 Jul 2020 WO
WO 20151329 Jul 2020 WO
WO 20157761 Aug 2020 WO
WO 20163455 Aug 2020 WO
WO 20167667 Aug 2020 WO
WO 20174262 Sep 2020 WO
WO 20177583 Sep 2020 WO
WO 20185233 Sep 2020 WO
WO 20185234 Sep 2020 WO
WO 20195658 Oct 2020 WO
WO 20198189 Oct 2020 WO
WO 20198779 Oct 2020 WO
WO 20205597 Oct 2020 WO
WO 20221200 Nov 2020 WO
WO 20240284 Dec 2020 WO
WO 20260020 Dec 2020 WO
WO 20264010 Dec 2020 WO
Non-Patent Literature Citations (10)
Entry
Machine Translation of Chinese Patent Application CN 115861767 A, filed 2022. (Year: 2022).
‘Quantized Memory-Augmented Neural Networks’ by Park et al., from the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18). (Year: 2018).
‘Going Deeper with Embedded FPGA Platform for Convolutional Neural Network’ by Qiu et al., FPGA'16, Feb. 21-23, 2016. (Year: 2016).
‘Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference’ by Jacob et al., Dec. 15, 2017. (Year: 2017).
‘Filter Shaping for Convolutional Neural Networks’ by Li et al., published as a conference paper at ICLR 2017. (Year: 2017).
‘A Neural Network Implementation on an Inexpensive Eight Bit Microcontroller’ by Cotton et al., copyright 2008, IEEE. (Year: 2008).
‘A Neural Network Implementation on Embedded Systems’ by Nicholas Jay Cotton, Aug. 9, 2010. (Year: 2010).
‘Moving Convolutional Neural Networks to Embedded Systems: the AlexNet and VGG-16 case’ by Alippi et al., Apr. 2018. (Year: 2018).
‘Software-Hardware Codesign for Efficient Neural Network Acceleration’ by Guo, copyright 2017, IEEE. (Year: 2017).
‘PACT: Parameterized Clipping Activation for Quantized Neural Networks’ by Choi, Jul. 17, 2018. (Year: 2018).
Related Publications (1)
Number Date Country
20230237331 A1 Jul 2023 US
Provisional Applications (1)
Number Date Country
62726396 Sep 2018 US
Continuations (1)
Number Date Country
Parent 16559483 Sep 2019 US
Child 18156628 US