Neural networks for embedded devices

Description

BACKGROUND

This disclosure generally relates to the deployment of deep neural networks for embedded or internet-of-things (IoT) devices.

Neural networks are often used to perform various tasks, particularly for image analysis, such as object recognition, facial recognition, or segmentation. In more typical implementations, such networks are implemented on relatively complex processors, which may include operations with a high level of precision and with significant bit-length, such as 32-bit floating point operations to multiply and sum data at various layers of a neural network. These processors may be too complex or expensive for use in inexpensive devices, such as IOT devices that may include inexpensive processors having a more limited bit-length, preventing such IOT devices from effectively implementing neural networks. In addition to reduced-bit processing, these devices may also implement reduced-bit storage, further limiting the working capacity of such devices to successfully implement neural network structures.

SUMMARY

A neural network architecture is used that reduces the processing load of implementing the neural network. This network architecture may thus be used for reduced-bit processing devices. The architecture may limit the number of bits used for processing and reduce processing to prevent data overflow at individual calculations of the neural network. To implement this architecture, the number of bits used to represent inputs at levels of the network and the related filter masks may also be modified to ensure the number of bits of the output does not overflow the resulting capacity of the reduced-bit processor. To additionally reduce the load for such a network, the network may implement a “star-conv” structure that permits the incorporation of nearby nodes in a layer to balance processing requirements and permit the network to learn from the context of other nodes.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the star-shaped convolution filter (star-conv), according to one embodiment.

FIG. 2 illustrates the star-shuffle neural network block, according to one embodiment.

FIG. 3 illustrates an example StarNet deep neural network architecture, according to one embodiment.

FIG. 4 illustrates example equations for quantization and dequantization, according to one embodiment.

FIG. 5 illustrates example equations for determining quantization parameters, according to one embodiment.

FIG. 6 illustrates example equations for collapsing adjacent quantization equations, according to one embodiment.

FIG. 7 illustrates an example process for generating a neural network structure including input layers and filters, according to one embodiment.

The figures depict various embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.

DETAILED DESCRIPTION

Motivation

Computer implementations of deep neural networks (DNNs) commonly use floating-point arithmetic. As used herein, a deep neural network is a computer model that generates a set of outputs based on a set of inputs using a plurality of processing layers between the inputs and outputs. These processing layers may be “hidden” in the sense that the processing layers are not directly exposed during use, and represent arithmetic processes that together generate the set of outputs from the set of inputs. Individual nodes in these layers are typically connected by weights representing a weight of a value in a prior node that affects a current node. As an example, to process an image, the pixels of an image may be represented as an input layer. A subsequent layer may apply various filters, such as a convolutional filter, to a window of pixels in the input layer to generate values for that layer. This is often performed with floating-point arithmetic to increase precision in representing values within the network. However, low-cost and low-power computer processors (such as those used in internet-of-things devices) often do not provide support for floating-point arithmetic, and integer arithmetic must be used instead.

Further, while computer implementations of neural networks commonly use 32-bit arithmetic, low-power computer processors often run most efficiently (i.e. lowest power usage and/or highest throughput) using 8-bit arithmetic.

This presents a need for effective neural networks for use in lower-bit arithmetic and storage (e.g., 8-bit arithmetic and 8-bit storage) that is not well-addressed by existing frameworks.

Design Constraints

When the goal is to get the best tradeoff of speed, energy-efficiency, and accuracy, the optimal DNN architecture (sometimes called a “topology” or “neural structure”) varies depending on the processing platform that it will be deployed on.

This disclosure relates to implementing neural network architectures on a reduced-bit architecture, which may include reduced-bit (e.g., 8-bit) arithmetic and storage.

As one example architecture, a processing platform is a system-on-chip (SOC) that has multiple types of processing cores. Some of the cores on the SOC are general-purpose central processing unit (CPU) cores that support 8-, 16-, and 32-bit computations, but these CPU cores are relatively slow and comparatively energy-intensive. The SOC also has specialized digital signal processing (DSP) cores that enable fast, energy-efficient, and highly-parallel computations. These DSP cores typically only support efficient 8-bit signed integer computations. The network architecture discussed herein may be implemented on such DSP cores while rarely (or never) using the CPU cores.

The main data type supported in the DSP cores is the 8-bit signed integer.

Some processors support what is called “saturating arithmetic.” In saturating arithmetic, for 8-bit signed integers, if variables X and Y are of type signed int, the maximum value of X+Y is 127. For example, if X=120 and Y=120, a saturating addition of X+Y would give the result of 127. However, with non-saturating arithmetic, X+Y typically overflows such that the result of X+Y would be −16 (i.e. negative 16). The DNN architectures discussed herein are implemented with processors using non-saturating arithmetic. Thus, overflow happens when the result exceeds the maximum value or minimum value that can be represented by the number of bits on the register that are used to perform arithmetic operations.
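
As an illustrative, non-limiting sketch, the difference between saturating and non-saturating addition of 8-bit signed integers may be modeled as follows. The helper names wrap_int8 and saturate_int8 are hypothetical and are introduced only for this example; the sketch assumes two's-complement wraparound for the non-saturating case.

```python
def wrap_int8(value):
    """Model non-saturating arithmetic: wrap into [-128, 127] (two's complement)."""
    return (value + 128) % 256 - 128


def saturate_int8(value):
    """Model saturating arithmetic: clamp into [-128, 127]."""
    return max(-128, min(127, value))


x, y = 120, 120
saturating_sum = saturate_int8(x + y)  # clamps to the maximum value, 127
wrapping_sum = wrap_int8(x + y)        # 240 wraps around to -16
```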

However, to avoid generating incorrect numerical results, the arithmetic should not overflow. This is particularly challenging when using 8-bit storage and 8-bit arithmetic. For example, when multiplying two large 8-bit numbers (e.g., 125 and 126), the correct result is 15750, but the largest value representable in a signed 8-bit number is +127.

Division is an expensive arithmetic operation (requiring more computational cycles than multiplication or additions). Accordingly, effective use of an 8-bit architecture rarely or never uses division.

The bit-shift operator may be used. Bit-shift requires fewer computational cycles than division. For division by powers-of-two, the bit-shift operator can be used in place of division to produce the same results.
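
As a hedged illustration, the equivalence between a right shift and division by a power of two may be checked over the full signed 8-bit range. The function name divide_by_power_of_two is hypothetical. This sketch relies on Python semantics, in which both `>>` and `//` round toward negative infinity; in languages whose integer division truncates toward zero (such as C), the two can differ for negative operands that are not exact multiples of the divisor.

```python
def divide_by_power_of_two(value, exponent):
    """Divide by 2**exponent using an arithmetic right shift."""
    return value >> exponent


# Compare the shift against floor division for every signed 8-bit value.
all_match = all(
    divide_by_power_of_two(value, exponent) == value // (2 ** exponent)
    for value in range(-128, 128)
    for exponent in range(8)
)
```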

Elementary Components of StarNet

A family of neural network architectures, generally termed “StarNet,” is disclosed to effectively implement neural networks on such reduced-bit architectures. In one embodiment, the components and processes described below may refer to or may be performed by an online system in communication with devices including reduced-bit architectures, such as internet-of-things (IoT) devices.

To avoid overflows while performing 8-bit computations, StarNet applies the following techniques.

Neural networks commonly use convolution filters that each perform thousands of calculations (e.g. a 3×3×512 filter has 4608 elements and performs 4608 multiply-accumulate operations). The result of a 4608-element convolution will overflow with many possible input activations when computed using 8-bit arithmetic. Consider the case where the input activations consist of all ones and the filter (e.g., weights for combining prior layer values) consists of all ones (i.e. every element of the filter has a numerical value of one). The output of a convolution calculation in this example is the number 4608, which is much too large to be represented in 8-bit arithmetic and therefore would overflow and provide incorrect numerical results.

To effectively implement a neural network in reduced-bit architecture, the DNN is structured to have fewer elements per filter, such as 32 elements per filter. In one embodiment, the StarNet DNN architecture for 8-bit arithmetic and 8-bit storage has a maximum of 32 elements per filter.

Even when using a 32-element filter, 8-bit arithmetic can still overflow. For example, consider the case where the input activations consist of all ones and the filter consists of all fives (i.e. every element of the filter has a numerical value of five). In this case, the correct output of the convolution calculation is 160, but again the maximum representable value in an 8-bit signed integer is 127, so this overflows.
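
The two overflow examples above may be reproduced with a small model of a non-saturating 8-bit multiply-accumulate. This is a sketch only: the helper names wrap_int8 and dot_int8 are hypothetical, and wrapping every intermediate result to 8 bits is an assumption about how such an accumulator behaves.

```python
def wrap_int8(value):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return (value + 128) % 256 - 128


def dot_int8(activations, weights):
    """Multiply-accumulate in which every intermediate result wraps to 8 bits."""
    accumulator = 0
    for activation, weight in zip(activations, weights):
        accumulator = wrap_int8(accumulator + wrap_int8(activation * weight))
    return accumulator


# A 3x3x512 filter of all ones applied to activations of all ones.
large_count = 3 * 3 * 512
large_exact = sum([1] * large_count)
large_wrapped = dot_int8([1] * large_count, [1] * large_count)

# A 32-element filter of all fives applied to activations of all ones.
small_exact = 32 * 5
small_wrapped = dot_int8([1] * 32, [5] * 32)
```

In both cases the wrapped result differs from the exact result, which is the incorrect numerical behavior that the filter-size and bit-width limits described below are meant to rule out.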

To avoid overflow, the network architecture may use various approaches to reduce the possible filter outputs within the range of the output values. The particular approach may vary, including within a given network model, based on the number of elements in the filter. In one implementation, linear quantization is used to bin floating-point values of filters and activations into a low-bit width integer representation. In one linear quantization scheme, the range of values of the linear bins is determined by analyzing the maximum and minimum numerical values that are observed in tensors of the neural network, looking at the dimensions of the filters, and then selecting maximum and minimum values for the bins such that the output cannot overflow.

In the case of a 32-element filter, input activations can be quantized to 2 bits plus the sign bit; we abbreviate this to (2+s). And, weights can be quantized to (1+s). So, the maximum value of an activation is 3 (which is the largest number representable in (2+s) arithmetic), and the maximum value of a weight is 1 (which is the largest number representable in (1+s) arithmetic). So, the largest possible output value is 32*3*1=96, which is smaller than 127 and therefore does not overflow during 8-bit arithmetic. Since values are stored in 8-bits in this example the storage of the activations and weights uses a subset of those 8 bits.

In the case of a 16-element filter, input activations are represented as (3+s), with a maximum value of 7, and weights are represented as (1+s), with a maximum value of 1. The maximum output value of this convolution is 16*7*1=112, which is less than 127 and therefore does not overflow.

In the case of an 8-element filter, input activations are represented as (2+s), with a maximum value of 3, and weights are represented as (2+s), with a maximum value of 3. The maximum output value of this convolution is 8*3*3=72, which is less than 127 and therefore does not overflow.
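
The worst-case bounds for the three configurations above may be checked with a short calculation. The function names are hypothetical, and the sketch assumes that a (k+s) representation has a maximum magnitude of 2^k − 1, consistent with the maxima quoted above.

```python
INT8_MAX = 127


def max_magnitude(bits):
    """Largest value representable with `bits` magnitude bits plus a sign bit."""
    return (1 << bits) - 1


def worst_case_output(num_elements, activation_bits, weight_bits):
    """Largest possible sum when every activation and weight is at its maximum."""
    return num_elements * max_magnitude(activation_bits) * max_magnitude(weight_bits)


# filter elements -> (activation magnitude bits, weight magnitude bits)
configs = {32: (2, 1), 16: (3, 1), 8: (2, 2)}
worst_cases = {
    elements: worst_case_output(elements, act_bits, weight_bits)
    for elements, (act_bits, weight_bits) in configs.items()
}
all_fit = all(value <= INT8_MAX for value in worst_cases.values())
```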

The 32-, 16-, and 8-element filters discussed so far are 1×1×Channels filters, where 32, 16, or 8 is the number of channels in the filter. Note that the number of channels in the input activations can be larger than the number of channels in a filter. This is accomplished using what are called group convolutions. Group convolutions have a hyperparameter called group-length. If the input activations have 1024 channels, and group-length is set to 32, then each filter will span a 32-channel subset of the 1024 input channels.
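
One way to picture a group convolution with 1×1×Channels filters is sketched below for a single pixel. The function name grouped_pointwise_conv and the data layout (one list of filters per group) are assumptions made for illustration and are not taken from the disclosure.

```python
def grouped_pointwise_conv(pixel, filters_by_group, group_length):
    """Apply 1x1 filters, each spanning only one group of input channels.

    pixel: list of channel values at one spatial location.
    filters_by_group: for each group, a list of filters (each `group_length` weights).
    """
    outputs = []
    for group_index, group_filters in enumerate(filters_by_group):
        start = group_index * group_length
        window = pixel[start:start + group_length]  # channels seen by this group
        for weights in group_filters:
            outputs.append(sum(a * w for a, w in zip(window, weights)))
    return outputs


pixel = [1] * 1024                          # 1024 input channels, all ones
group_length = 32
num_groups = len(pixel) // group_length     # 32 groups of 32 channels
filters_by_group = [[[1] * group_length] for _ in range(num_groups)]
outputs = grouped_pointwise_conv(pixel, filters_by_group, group_length)
```

Each output here sums only 32 products even though the input has 1024 channels, which is what keeps the worst-case sum small.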

Convolutional neural networks commonly have some layers with filters of size 1×1×Channels and other layers with filters of size 3×3×Channels. In a 3×3 filter with a group-length of 1, there are 9 elements. A good representation of a 9-element convolution using (signed, non-saturating) 8-bit arithmetic is to represent weights as (2+s) and input activations as (2+s). In this configuration, the maximum output is 3*3*9=81, which is less than 127 and therefore does not overflow.

However, with the goal of minimizing the number of elements (and thus being able to represent filters and activations with more bits), the following is a way to perform a convolution with a 2D spatial resolution while using fewer elements. Rather than a 3×3 filter, the idea is to use a “star-shaped” filter. (See FIG. 1.) Here, with respect to a pixel at location (x,y), the filter has weights that correspond with (x,y) and also with the pixels to the immediate top, bottom, left, and right of (x,y). However, unlike a traditional 3×3 convolution, a star-shaped filter may omit, or zero out, the weights that correspond with the upper-left, upper-right, lower-left, and lower-right diagonal elements with respect to location (x,y). Thus, in the example shown in FIG. 1, the star-shaped filter has only 5 elements. With only 5 elements, the weights can be represented as (3+s) and the activations can be represented as (2+s), giving a maximum output of 5*7*3=105, which is less than 127 and therefore does not overflow. More generally, the star-shaped filter can refer to non-rectangular filters in which only a subset of elements in the filter have non-zero values or are accounted for in the neural network structure. While the example shown in FIG. 1 illustrates a star-shaped filter with a single channel, in other embodiments, each position of the star-shaped filter may be associated with additional elements along the depth of the filter that correspond to one or more channels. Henceforth, this star-shaped filter will be known as “star-conv,” and the 1×1×Channels filter will be known as “1×1-conv.”
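
A single-channel star-shaped filter of the kind described above may be sketched as follows. The names STAR_OFFSETS and star_conv_at are hypothetical, and treating out-of-bounds neighbors as zero is an assumption; the disclosure does not specify the padding behavior.

```python
# (dy, dx) offsets: center, top, bottom, left, right. Diagonals are omitted.
STAR_OFFSETS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def star_conv_at(image, y, x, weights):
    """Evaluate a 5-element star-shaped filter centered on image[y][x]."""
    height, width = len(image), len(image[0])
    total = 0
    for dy, dx in STAR_OFFSETS:
        ny, nx = y + dy, x + dx
        if 0 <= ny < height and 0 <= nx < width:  # zero padding (assumption)
            total += image[ny][nx] * weights[(dy, dx)]
    return total


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
weights = {offset: 1 for offset in STAR_OFFSETS}
center = star_conv_at(image, 1, 1, weights)  # 5 + 2 + 8 + 4 + 6
corner = star_conv_at(image, 0, 0, weights)  # 1 + 4 + 2, two neighbors fall outside
```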

Note that all of the aforementioned filters have a value of group-length that is greater than 1. When a series of convolution layers have a group-length of greater than 1, several independent neural networks may effectively be formed that do not share data for several layers in a row, because subsets of channels are processed independently for several layers. This leads to a reduction in representational power. To address this, StarNet adopts the “shuffle” layer, which interleaves the ordering of channels to enable communication across what would otherwise be a collection of independent neural networks. For example, a shuffle layer may receive a set of input values that are arranged with respect to a plurality of channels. At the shuffle layer, the neural network structure may interleave the ordering of the channels to increase representational power.
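
One possible interleaving for a shuffle layer is sketched below. The disclosure does not fix a particular permutation, so the function shuffle_channels and the specific ordering it produces (each output group takes one channel from every input group in turn) are assumptions for illustration.

```python
def shuffle_channels(channels, num_groups):
    """Interleave channels so that neighbors come from different input groups."""
    group_length, remainder = divmod(len(channels), num_groups)
    if remainder != 0:
        raise ValueError("channel count must be divisible by num_groups")
    return [
        channels[group * group_length + index]
        for index in range(group_length)
        for group in range(num_groups)
    ]


# Eight channels in two groups: [0, 1, 2, 3] and [4, 5, 6, 7].
shuffled = shuffle_channels(list(range(8)), num_groups=2)
```

After the shuffle, a following group convolution with a group length of 4 would see channels drawn from both original groups.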

The Star-Shuffle Block

The StarNet family of DNN architectures uses a recurring block called the star-shuffle block. This block consists of the following ordering of neural network layers: {1×1-conv, relu, star-conv, relu, shuffle}.

The design of the star-shuffle block enables it to see a 2D spatial resolution (using star-conv), to mix information across nearby channels (using 1×1-conv with group-length of no more than 32), and to combine information across far-away channels (using the shuffle layer). All of this is accomplished while performing all computation using non-saturating signed 8-bit arithmetic.

Quantization Mechanism

To quantize a number from a generic 8-bit (7+s) representation to a lower-bit representation, e.g. (2+s), bins are generated as described in the section “Quantization Binning Process” below. The quantization method has a preprocessing step and a runtime step, which are described in the following.

The preprocessing step generates a set of bins that are used during the runtime step of quantization. This set of bins can be described using “quantization parameters,” which describe the bins. Each layer in the neural network has two sets of quantization parameters: “activation quantization parameters” which describe the binning of input and output values of the layer, and “layer quantization parameters” which describe the binning of the parameters of the layer itself. The parameters of a particular layer may refer to the weights of filters associated with the particular layer.

Each set of bins has two processes associated with it. One is called the “quantization” process, where generic 8-bit (7+s) representations are processed into a lower-bit (2+s) representation. The other process, called the “dequantization” process, is the inverse, where the lower-bit (2+s) representation is transformed back into the 8-bit (7+s) representations. Each binning process describes its own mechanism for quantization and dequantization.

To finalize the preprocessing step, the layer parameters are binned according to the quantization process using the layer quantization parameters. These are referred to as “quantized layer parameters.” For example, a filter with a set of trained weights V_R,weights may be quantized using the equation:

$V_{Q, weights} = \frac{V_{R, weights}}{A_{weights}} - B_{weights}$

where A_weights and B_weights are the layer quantization parameters, and V_Q,weights are the quantized layer parameters.

During runtime, each layer first applies the quantization binning process using the activation quantization parameters to its input if the input is not quantized. For example, a layer with a set of input values V_R,input may be quantized to a quantized input using the equation:

$V_{Q, input} = \frac{V_{R, input}}{A_{input}} - B_{input}$

where A_input and B_input are the activation quantization parameters.

The parameters associated with this quantization binning process are attached to the input, and the input is fed into the layer. This layer then applies its standard operation using the quantized layer parameters. For example, using the example above, a quantized output may be generated by the equation:

Quantized Output=f_V_Q,weights(V_Q,input)

where f_V_Q,weights(⋅) denotes an operation on the quantized input using the quantized layer parameters. For example, this may be a dot product between the filter and the quantized input.

Next, the layer applies the dequantization process using the layer quantization parameters. For example, the quantized output may first be dequantized to an output:

V_R,output=A_weights·(Quantized Output+B_weights).

Then, the dequantization process uses the activation quantization parameters that are attached to the original input to dequantize the output. For example, the dequantization of the output may be given by:

Dequantized Output=A_input·(V_R,output+B_input)
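
The quantization and dequantization equations above form an inverse pair, which may be sketched as follows for a single value. The function names are hypothetical, and rounding to the nearest integer is an assumption; the equations as written do not state how fractional results are mapped to integer bins.

```python
def quantize(value, scale, offset):
    """V_Q = V_R / A - B, rounded to an integer bin (rounding is an assumption)."""
    return round(value / scale - offset)


def dequantize(quantized, scale, offset):
    """V_R = A * (V_Q + B), the inverse of the quantization equation."""
    return scale * (quantized + offset)


scale, offset = 0.5, 1.0                         # illustrative values for A and B
quantized = quantize(2.0, scale, offset)         # 2.0 / 0.5 - 1.0 = 3
restored = dequantize(quantized, scale, offset)  # 0.5 * (3 + 1.0) = 2.0

# A value that falls between bins is recovered to within half a bin width.
approx = dequantize(quantize(1.9, scale, offset), scale, offset)
```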

Quantization Binning Process

This quantization binning approach creates a set of bins implicitly based on a quantization equation and its corresponding dequantization equation, which are described by two parameters, “A” and “B” as shown in FIG. 4.

To solve for the activation quantization parameters, a dataset is passed through the neural network one example at a time and a set of output values is collected for each layer in the neural network. For each set of output values associated with a layer, the minimum and maximum output values are identified. The minimum and maximum output values are plugged into the dequantization equation, along with the selected bit-width, to produce the system of equations pictured in FIG. 5. This system of equations is solved to find the activation quantization parameters associated with each layer.

This same process occurs with the parameters of the StarNet instance being quantized. Each layer has its minimum and maximum parameter passed into the quantization equation, along with the selected bit-width, to produce the system of equations pictured in FIG. 5. This system of equations is solved to find the layer quantization parameters associated with each layer.
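
FIG. 5 is not reproduced here, so the following is only one consistent reading of the system of equations: the observed minimum and maximum are required to dequantize from the smallest and largest integer bins, respectively. The function name, the symmetric bin range of ±(2^k − 1) for a (k+s) representation, and the resulting closed-form solution are assumptions made for this sketch.

```python
def solve_quantization_parameters(min_value, max_value, magnitude_bits):
    """Solve min = A * (q_min + B) and max = A * (q_max + B) for A and B."""
    if max_value <= min_value:
        raise ValueError("max_value must exceed min_value")
    q_max = (1 << magnitude_bits) - 1
    q_min = -q_max                       # symmetric range (assumption)
    scale = (max_value - min_value) / (q_max - q_min)   # A
    offset = min_value / scale - q_min                  # B
    return scale, offset


# Observed values in [-1.5, 1.5], quantized to (2+s), i.e. integer bins -3..3.
scale, offset = solve_quantization_parameters(-1.5, 1.5, magnitude_bits=2)

# Observed values in [0, 6] with the same bit-width.
shifted_scale, shifted_offset = solve_quantization_parameters(0.0, 6.0, magnitude_bits=2)
```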

Optimizations can be applied to the quantization method above. In particular, a “quantization collapsing” mechanism is described by which quantization equations for adjacent layers and activations can be collapsed into a single equation. The mathematical transformation is shown in FIG. 6, where the quantization operation of adjacent bins is collapsed, and the corresponding dequantization operations are collapsed as well. This reduces the number of operations between each quantization and dequantization by a factor of two. Alternatively, both of the intermediate operations can be left out, using only the initial quantization equation and the final dequantization equation.
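
FIG. 6 is not reproduced here. As a hedged illustration of why such a collapse is possible, suppose two adjacent quantization steps each take the form given earlier, with parameters (A_1, B_1) and (A_2, B_2); these symbols are assumptions introduced for this derivation. Composing the two steps gives:

$V_{2} = \frac{1}{A_{2}} \left( \frac{V_{0}}{A_{1}} - B_{1} \right) - B_{2} = \frac{V_{0}}{A_{1} A_{2}} - \left( \frac{B_{1}}{A_{2}} + B_{2} \right)$

which has the same form as a single quantization with parameters $A' = A_{1} A_{2}$ and $B' = \frac{B_{1}}{A_{2}} + B_{2}$. The corresponding dequantization steps collapse in the same way:

$V_{0} = A_{1} \left( A_{2} (V_{2} + B_{2}) + B_{1} \right) = A' (V_{2} + B')$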

In various embodiments, by using maximum values that correspond to the values representable by quantized representations (e.g., a maximum value of 7 for a (3+s) representation), calculations such as division can be performed more often with a bit-shift operator, reducing computational complexity and time in the reduced-bit representation and execution.

StarNet Neural Network Family

Various DNNs can be formed using the star-block. As used herein, a StarNet is a DNN containing one or more star-block modules. In the following, one example implementation of a StarNet neural network architecture is described. In this example, called “StarNet-A,” the DNN is tasked with ingesting an RGB image and classifying the image into one of 1024 categories. See FIG. 3 for a summary of the StarNet-A DNN architecture that is described in the following. With the exception of the first convolution layer in StarNet-A, all layers of StarNet-A can be implemented using only 8-bit arithmetic and 8-bit storage.

The first layer of StarNet-A is a star-conv layer, which is applied to an input image. While the inputs to most layers can be quantized without losing accuracy, one exception to this is that quantizing the input image does damage accuracy. Therefore, the first layer is computed with 8-bit inputs, 16-bit arithmetic, and 16-bit temporary storage for activations. In one implementation, this first layer is computed on the CPU of an IOT system-on-chip (SOC), while all subsequent layers of StarNet-A are computed on an energy-efficient accelerator that is on the same SOC. A rectified linear unit (relu) follows the first star-conv layer, and the first star-conv layer has a stride of 2.

Next, StarNet-A has a series of 2 star-block modules, the details of which are described in FIG. 3. These star-block modules are followed by a downsampling operation, which is implemented using max-pooling with a stride of 2.

After that, there are 6 more star-block modules, a max-pool, 12 more star-block modules, a max-pool, and finally 12 more star-block modules. After each downsampling operation (e.g. max-pool), the number of filters is increased.

The first series of 2 star-block modules and the next series of 6 star-block modules use a group length of 8 for their 1×1-conv filters. To avoid overflow, the input activations and the weights for the 1×1-convs are represented using (2+s) bits, and these bits are contained in 128-bit outputs. The rationale for using (2+s) is: the maximum value of a (2+s) number is 3, the group length is 8, so the maximum output value is 3*3*8=72, which is smaller than 127 and therefore does not overflow when the output value is represented in 8-bits.

The next two series of 12 star-block modules have a group length of 16 and 32, respectively. Care is taken to develop a quantization scheme for these modules that does not overflow when using 8-bit storage and 8-bit arithmetic. The particular quantization scheme is shown in FIG. 3.

After the final star-block module, global average pooling is applied. This has the effect of reducing an H×W×Channels tensor of output activations down to a 1×1×Channels vector of output activations. In StarNet-A, the final star-block module has 1024 output channels, so the final output vector (after applying global average pooling) is a 1024-dimensional vector.

When running StarNet-A on an image, the output channel with the largest value among the 1024 output channels identifies the category that StarNet-A predicts is contained in the image.
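
The pooling and category selection described above may be sketched on a toy tensor with three channels. The function names are hypothetical, the channels-first layout is chosen only for readability, and floating-point averaging is used for clarity rather than as a statement about how the division is performed on a reduced-bit device.

```python
def global_average_pool(activations):
    """Reduce each H x W channel grid to its mean value.

    activations: list of channels, each a list of rows (channels-first layout).
    """
    pooled = []
    for channel in activations:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


def predict_category(pooled):
    """Return the index of the channel with the largest pooled value."""
    return max(range(len(pooled)), key=lambda index: pooled[index])


activations = [
    [[1, 1], [1, 1]],  # channel 0, mean 1.0
    [[4, 0], [0, 4]],  # channel 1, mean 2.0
    [[3, 3], [3, 3]],  # channel 2, mean 3.0
]
pooled = global_average_pool(activations)
category = predict_category(pooled)
```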

In another implementation, the final layers of StarNet-A can be configured to produce an activation grid that represents a semantic segmentation mask of a whole image.

In another implementation, the input to StarNet-A includes a depth map.

FIG. 7 illustrates an example process for generating a neural network structure including input layers and filters, according to one embodiment. The online system determines 702 a bit length of a set of registers of the device used to perform arithmetic operations. For example, the registers may have an 8-bit architecture. The online system determines 704 a first integer representation for one or more input layers of the neural network structure and a second integer representation for one or more filters. The first integer representation may be associated with a first range of integer values and the second integer representation may be associated with a second range of integer values. Thus, each element in the input layers when quantized, may have a minimum to maximum range of integer values defined by the first integer representation. Similarly, each element in the filters when quantized, may have a minimum to maximum range of integer values defined by the second integer representation.

The online system generates 706 dimensionalities of the one or more input layers and the one or more filters. The dimensionalities are determined such that an output value generated by combining elements of an input layer as maximum values of the first integer representation with elements of a corresponding filter as maximum values of the second integer representation does not overflow the bit length of the set of registers. The online system generates 708 the neural network structure with the determined dimensionalities.
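
As a non-limiting sketch of steps 702 through 706, the largest filter size that cannot overflow may be derived from the register bit length and the two integer representations. The function names are hypothetical, and the sketch assumes signed registers and (k+s) representations with a maximum magnitude of 2^k − 1.

```python
def signed_register_max(register_bits):
    """Largest value a signed register of the given bit length can hold."""
    return (1 << (register_bits - 1)) - 1


def max_filter_elements(register_bits, activation_bits, weight_bits):
    """Largest element count whose worst-case output fits in the register."""
    activation_max = (1 << activation_bits) - 1  # maximum of the first representation
    weight_max = (1 << weight_bits) - 1          # maximum of the second representation
    return signed_register_max(register_bits) // (activation_max * weight_max)


# 8-bit registers with the representations used in the examples above.
limit_2s_1s = max_filter_elements(8, activation_bits=2, weight_bits=1)
limit_3s_1s = max_filter_elements(8, activation_bits=3, weight_bits=1)
limit_2s_2s = max_filter_elements(8, activation_bits=2, weight_bits=2)
limit_2s_3s = max_filter_elements(8, activation_bits=2, weight_bits=3)
```

The filter sizes used in the earlier examples (32, 16, 8 or 9, and 5 elements) each fall at or below the corresponding limit.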

SUMMARY

The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.

Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.

Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.

Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.

Embodiments of the invention may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may include any embodiment of a computer program product or other data combination described herein.

Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

Terminology

“Output Activation”: The output data produced by a layer of a deep neural network.

“Input Activation”: The input data provided to a layer of a deep neural network.

“Weight”: A learned parameter in a DNN.

“Filter”: A collection of weights organized in a specific pattern (e.g. a 3×3×256 convolution filter).

“Group-Length”: The number of channels in a convolution filter.

Claims

1. A method of generating a neural network structure including one or more input layers each associated with one or more filters, the method comprising: determining, for an architecture of a device, a bit length of a set of registers of the device used to perform arithmetic operations; determining a first integer representation for the one or more input layers and a second integer representation for the one or more filters, the first integer representation associated with a first range of integer values and the second integer representation associated with a second range of integer values; generating dimensionalities of the one or more input layers and the one or more filters, the dimensionalities determined such that an output value generated by combining elements of an input layer as maximum values of the first integer representation with elements of a corresponding filter as maximum values of the second integer representation does not overflow the bit length of the registers; and generating the neural network structure with the determined dimensionalities, wherein for each individual layer which forms the neural network structure, activations are quantized using activation parameters for the individual layer and the weights are quantized using layer parameters for the individual layer, wherein the activations and the activation parameters are provided as input to the individual layer, and wherein quantized output associated with the individual layer is dequantized using (1) the input activation parameters and (2) the layer parameters.
2. The method of claim 1, wherein generating the dimensionalities comprises generating the one or more filters for a corresponding input layer as star-shaped filters.
3. The method of claim 1, further comprising: receiving a set of input values corresponding to the elements of an input layer in the one or more input layers which form activations for the input layer, and a set of weights corresponding to the elements of a filter in the one or more filters with the generated dimensionalities; quantizing the set of input values by assigning each input value to a corresponding integer value in the first integer representation; quantizing the set of weights by assigning each weight to a corresponding integer value in the second integer representation; and combining the set of input values and the set of weights to generate a quantized output.
4. The method of claim 3, wherein the neural network structure includes a shuffle layer placed after the corresponding input layer, the method further comprising: receiving another set of input values at the shuffle layer, wherein the another set of input values are arranged with respect to a plurality of channels; and interleaving ordering of the plurality of channels at the shuffle layer.
5. The method of claim 3, wherein quantizing the set of input values comprises: obtaining a dataset including a plurality of data instances; propagating the plurality of data instances through the neural network structure to obtain input values at the input layer; identifying a lower bound value and an upper bound value from the input values obtained at the input layer; and dividing a range between the lower bound value and the upper bound value into a plurality of bins each assigned to a corresponding integer value in the first integer representation.
6. The method of claim 3, wherein quantizing the set of weights comprises: identifying a lower bound value and an upper bound value from the set of weights; and dividing a range between the lower bound value and the upper bound value into a plurality of bins each assigned to a corresponding integer value in the second integer representation.
7. The method of claim 1, wherein the bit length of the set of registers is 8 bits, and the arithmetic operations are performed using 8-bit arithmetic.
8. A non-transitory computer-readable medium containing instructions for execution on a processor, the instructions comprising: determining, for an architecture of a device, a bit length of a set of registers of the device used to perform arithmetic operations; determining a first integer representation for the one or more input layers and a second integer representation for the one or more filters, the first integer representation associated with a first range of integer values and the second integer representation associated with a second range of integer values; generating dimensionalities of the one or more input layers and the one or more filters, the dimensionalities determined such that an output value generated by combining elements of an input layer as maximum values of the first integer representation with elements of a corresponding filter as maximum values of the second integer representation does not overflow the bit length of the registers; and generating the neural network structure with the determined dimensionalities, wherein for each individual layer which forms the neural network structure, activations are quantized using activation parameters for the individual layer and the weights are quantized using layer parameters for the individual layer, wherein the activations and the activation parameters are provided as input to the individual layer, and wherein quantized output associated with the individual layer is dequantized using (1) the input activation parameters and (2) the layer parameters.
9. The non-transitory computer-readable medium of claim 8, wherein generating the dimensionalities comprises generating the one or more filters for a corresponding input layer as star-shaped filters.
10. The non-transitory computer-readable medium of claim 8, the instructions further comprising: receiving a set of input values corresponding to the elements of an input layer in the one or more input layers which form activations for the input layer, and a set of weights corresponding to the elements of a filter in the one or more filters with the generated dimensionalities; quantizing the set of input values by assigning each input value to a corresponding integer value in the first integer representation; quantizing the set of weights by assigning each weight to a corresponding integer value in the second integer representation; and combining the set of input values and the set of weights to generate a quantized output.
11. The non-transitory computer-readable medium of claim 10, wherein the neural network structure includes a shuffle layer placed after the corresponding input layer, the instructions further comprising: receiving another set of input values at the shuffle layer, wherein the another set of input values are arranged with respect to a plurality of channels; and interleaving ordering of the plurality of channels at the shuffle layer.
12. The non-transitory computer-readable medium of claim 10, wherein quantizing the set of input values comprises: obtaining a dataset including a plurality of data instances; propagating the plurality of data instances through the neural network structure to obtain input values at the input layer; identifying a lower bound value and an upper bound value from the input values obtained at the input layer; and dividing a range between the lower bound value and the upper bound value into a plurality of bins each assigned to a corresponding integer value in the first integer representation.
13. The non-transitory computer-readable medium of claim 10, wherein quantizing the set of weights comprises: identifying a lower bound value and an upper bound value from the set of weights; and dividing a range between the lower bound value and the upper bound value into a plurality of bins each assigned to a corresponding integer value in the second integer representation.
14. The non-transitory computer-readable medium of claim 8, wherein the bit length of the set of registers is 8 bits, and the arithmetic operations are performed using 8-bit arithmetic.
15. A system comprising: a processor configured to execute instructions; a computer-readable medium containing instructions for execution on the processor, the instructions causing the processor to perform steps of: determining, for an architecture of a device, a bit length of a set of registers of the device used to perform arithmetic operations; determining a first integer representation for the one or more input layers and a second integer representation for the one or more filters, the first integer representation associated with a first range of integer values and the second integer representation associated with a second range of integer values; generating dimensionalities of the one or more input layers and the one or more filters, the dimensionalities determined such that an output value generated by combining elements of an input layer as maximum values of the first integer representation with elements of a corresponding filter as maximum values of the second integer representation does not overflow the bit length of the registers; and generating the neural network structure with the determined dimensionalities, wherein for each individual layer which forms the neural network structure, activations are quantized using activation parameters for the individual layer and the weights are quantized using layer parameters for the individual layer, wherein the activations and the activation parameters are provided as input to the individual layer, and wherein quantized output associated with the individual layer is dequantized using (1) the input activation parameters and (2) the layer parameters.
16. The system of claim 15, wherein generating the dimensionalities comprises generating the one or more filters for a corresponding input layer as star-shaped filters.
17. The system of claim 15, the instructions further comprising: receiving a set of input values corresponding to the elements of an input layer in the one or more input layers which form activations for the input layer, and a set of weights corresponding to the elements of a filter in the one or more filters with the generated dimensionalities; quantizing the set of input values by assigning each input value to a corresponding integer value in the first integer representation; quantizing the set of weights by assigning each weight to a corresponding integer value in the second integer representation; and combining the set of input values and the set of weights to generate a quantized output.
18. The system of claim 17, wherein the neural network structure includes a shuffle layer placed after the corresponding input layer, the instructions further comprising: receiving another set of input values at the shuffle layer, wherein the another set of input values are arranged with respect to a plurality of channels; and interleaving ordering of the plurality of channels at the shuffle layer.
19. The system of claim 17, wherein quantizing the set of input values comprises: obtaining a dataset including a plurality of data instances; propagating the plurality of data instances through the neural network structure to obtain input values at the input layer; identifying a lower bound value and an upper bound value from the input values obtained at the input layer; and dividing a range between the lower bound value and the upper bound value into a plurality of bins each assigned to a corresponding integer value in the first integer representation.
20. The system of claim 15, wherein the bit length of the set of registers is 8 bits, and the arithmetic operations are performed using 8-bit arithmetic.
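
By way of illustration only, the overflow condition recited in claims 1, 8, and 15 and the range-based quantization recited in claims 5, 6, 12, 13, and 19 may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `max_terms_without_overflow` and `quantize_to_bins` are hypothetical, the accumulator is assumed to be a single two's-complement register when signed, the combination is assumed to be a sum of products, and the bins are assumed to be of equal width (the claims do not fix any of these choices).

```python
def max_terms_without_overflow(register_bits, input_max, weight_max, signed=True):
    """Largest count of input*weight products whose worst-case sum still fits.

    Worst case: every input element equals input_max and every filter element
    equals weight_max (the maximum values of the two integer representations).
    """
    if register_bits < 2 or input_max < 1 or weight_max < 1:
        raise ValueError("expected register_bits >= 2 and positive maxima")
    # Largest value the register can hold (two's complement assumed if signed).
    if signed:
        limit = (1 << (register_bits - 1)) - 1
    else:
        limit = (1 << register_bits) - 1
    return limit // (input_max * weight_max)


def quantize_to_bins(values, lower, upper, num_bins):
    """Assign each value to one of num_bins equal-width bins over [lower, upper].

    Returns integer bin indices in 0..num_bins-1; values outside the range
    are clamped to the nearest end bin.
    """
    if num_bins < 1 or upper < lower:
        raise ValueError("expected num_bins >= 1 and upper >= lower")
    width = (upper - lower) / num_bins
    result = []
    for v in values:
        index = int((v - lower) / width) if width > 0 else 0
        result.append(min(max(index, 0), num_bins - 1))
    return result


if __name__ == "__main__":
    # 8-bit signed register, inputs and weights each with maximum value 3.
    print(max_terms_without_overflow(8, 3, 3))
    # Four bins over the observed range [0.0, 1.0].
    print(quantize_to_bins([0.0, 0.5, 1.0], 0.0, 1.0, 4))
```

Under these assumptions, an 8-bit signed register holds at most 127, so with inputs and weights each bounded by 3 a filter may combine at most 14 products (14 × 9 = 126) before the worst-case sum could overflow.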

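The channel interleaving performed by the shuffle layer of claims 4, 11, and 18 may likewise be sketched as below. The claims state only that the ordering of the channels is interleaved; the group-based scheme shown here (split the channels into equal groups, then take one channel from each group in turn) is an assumption chosen for illustration, and `shuffle_channels` is a hypothetical name.

```python
def shuffle_channels(channels, groups):
    """Interleave a list of channels drawn from `groups` equal-sized groups.

    The channels are treated as `groups` consecutive blocks; the output takes
    the first channel of each block, then the second of each block, and so on.
    """
    n = len(channels)
    if groups < 1 or n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


if __name__ == "__main__":
    # Six channels in two groups: [0, 1, 2] and [3, 4, 5].
    print(shuffle_channels([0, 1, 2, 3, 4, 5], 2))
```

With six channels in two groups, the sketch reorders channels 0 to 5 as 0, 3, 1, 4, 2, 5, so that a later grouped operation sees channels from both original groups.
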
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of, and claims priority to, U.S. patent application Ser. No. 16/559,483 titled “NEURAL NETWORKS FOR EMBEDDED DEVICES” and filed on Sep. 3, 2019. U.S. patent application Ser. No. 16/559,483 claims the benefit of U.S. Provisional Patent Application No. 62/726,396, filed Sep. 3, 2018. The above-recited applications are hereby incorporated by reference herein in their entirety.

GOVERNMENT SUPPORT CLAUSE

This invention was made with government support under SBIR Phase II Grant Award No. 1758546 awarded by the National Science Foundation. The government has certain rights to the invention.

202017102235	May 2017	DE
202017102238	May 2017	DE
102017116017	Jan 2019	DE
102018130821	Jun 2020	DE
102019008316	Aug 2020	DE
1437004	Jul 2004	EP
1215626	Sep 2008	EP
2179589	Apr 2010	EP
2465093	Jun 2012	EP
2228666	Sep 2012	EP
2567347	Mar 2013	EP
2420408	May 2013	EP
2618559	Jul 2013	EP
2723069	Apr 2014	EP
2741253	Jun 2014	EP
3115772	Jan 2017	EP
3198557	Dec 2017	EP
3285485	Feb 2018	EP
3295424	Mar 2018	EP
3320486	May 2018	EP
3185758	Oct 2018	EP
2863633	Feb 2019	EP
3113080	May 2019	EP
3494514	Jun 2019	EP
3525132	Aug 2019	EP
3531689	Aug 2019	EP
3535692	Sep 2019	EP
3537340	Sep 2019	EP
3543917	Sep 2019	EP
3598874	Jan 2020	EP
3608840	Feb 2020	EP
3616119	Mar 2020	EP
3657387	May 2020	EP
2396750	Jun 2020	EP
3664020	Jun 2020	EP
3673233	Jul 2020	EP
3690712	Aug 2020	EP
3690742	Aug 2020	EP
3718048	Oct 2020	EP
3729002	Oct 2020	EP
3722992	Oct 2020	EP
3732618	Nov 2020	EP
3690730	Nov 2020	EP
3739486	Nov 2020	EP
3501897	Dec 2020	EP
3751455	Dec 2020	EP
3766023	Jan 2021	EP
3783527	Feb 2021	EP
2402572	Aug 2005	GB
2548087	Sep 2017	GB
2577485	Apr 2020	GB
2517270	Jun 2020	GB
2578262	Aug 1998	JP
3941252	Jul 2007	JP
4282583	Jun 2009	JP
4300098	Jul 2009	JP
2015004922	Jan 2015	JP
5863536	Feb 2016	JP
6044134	Dec 2016	JP
6525707	Jun 2019	JP
2019101535	Jun 2019	JP
2020101927	Jul 2020	JP
2020173744	Oct 2020	JP
100326702	Feb 2002	KR
101082878	Nov 2011	KR
101738422	May 2017	KR
101969864	Apr 2019	KR
101996167	Jul 2019	KR
102022388	Aug 2019	KR
102043143	Nov 2019	KR
102095335	Mar 2020	KR
102097120	Apr 2020	KR
1020200085490	Jul 2020	KR
102189262	Dec 2020	KR
1020200142266	Dec 2020	KR
200630819	Sep 2006	TW
I294089	Mar 2008	TW
I306207	Feb 2009	TW
WO 02052835	Jul 2002	WO
WO 15134900	Sep 2015	WO
WO 16032398	Mar 2016	WO
WO 16048108	Mar 2016	WO
WO 16207875	Dec 2016	WO
WO 17158622	Sep 2017	WO
WO 17214507	Dec 2017	WO
WO 19005547	Jan 2019	WO
WO 19067695	Apr 2019	WO
WO 19089339	May 2019	WO
WO 19092456	May 2019	WO
WO 19099622	May 2019	WO
WO 19122952	Jun 2019	WO
WO 19125191	Jun 2019	WO
WO 19126755	Jun 2019	WO
WO 19144575	Aug 2019	WO
WO 19182782	Sep 2019	WO
WO 19191578	Oct 2019	WO
WO 19216938	Nov 2019	WO
WO 19220436	Nov 2019	WO
WO 20006154	Jan 2020	WO
WO 20012756	Jan 2020	WO
WO 20025696	Feb 2020	WO
WO 20034663	Feb 2020	WO
WO 20056157	Mar 2020	WO
WO 20076356	Apr 2020	WO
WO 20097221	May 2020	WO
WO 20101246	May 2020	WO
WO 20120050	Jun 2020	WO
WO 20121973	Jun 2020	WO
WO 20131140	Jun 2020	WO
WO 20139181	Jul 2020	WO
WO 20139355	Jul 2020	WO
WO 20139357	Jul 2020	WO
WO 20142193	Jul 2020	WO
WO 20146445	Jul 2020	WO
WO 20151329	Jul 2020	WO
WO 20157761	Aug 2020	WO
WO 20163455	Aug 2020	WO
WO 20167667	Aug 2020	WO
WO 20174262	Sep 2020	WO
WO 20177583	Sep 2020	WO
WO 20185233	Sep 2020	WO
WO 20185234	Sep 2020	WO
WO 20195658	Oct 2020	WO
WO 20198189	Oct 2020	WO
WO 20198779	Oct 2020	WO
WO 20205597	Oct 2020	WO
WO 20221200	Nov 2020	WO
WO 20240284	Dec 2020	WO
WO 20260020	Dec 2020	WO
WO 20264010	Dec 2020	WO

Non-Patent Literature Citations (10)

Entry
Machine Translation of Chinese Patent Application CN 115861767 A, filed 2022. (Year: 2022).
‘Quantized Memory-Augmented Neural Networks’ by Park et al., from the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18). (Year: 2018).
‘Going Deeper with Embedded FPGA Platform for Convolutional Neural Network’ by Qiu et al., FPGA'16, Feb. 21-23, 2016. (Year: 2016).
‘Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference’ by Jacob et al., Dec. 15, 2017. (Year: 2017).
‘Filter Shaping for Convolutional Neural Networks’ by Li et al., published as a conference paper at ICLR 2017. (Year: 2017).
‘A Neural Network Implementation on an Inexpensive Eight Bit Microcontroller’ by Cotton et al., copyright 2008, IEEE. (Year: 2008).
‘A Neural Network Implementation on Embedded Systems’ by Nicholas Jay Cotton, Aug. 9, 2010. (Year: 2010).
‘Moving Convolutional Neural Networks to Embedded Systems: the AlexNet and VGG-16 case’ by Alippi et al., Apr. 2018. (Year: 2018).
‘Software-Hardware Codesign for Efficient Neural Network Acceleration’ by Guo, copyright 2017, IEEE. (Year: 2017).
‘PACT: Parameterized Clipping Activation for Quantized Neural Networks’ by Choi, Jul. 17, 2018. (Year: 2018).

Related Publications (1)

	Number	Date	Country
	20230237331 A1	Jul 2023	US

Provisional Applications (1)

	Number	Date	Country
	62726396	Sep 2018	US

Continuations (1)

	Number	Date	Country
Parent	16559483	Sep 2019	US
Child	18156628		US

Abstract