Optimizing neural network structures for embedded systems

Information

  • Patent Grant
  • Patent Number
    12,079,723
  • Date Filed
    Tuesday, March 14, 2023
  • Date Issued
    Tuesday, September 3, 2024
Abstract
A model training and implementation pipeline trains models for individual embedded systems. The pipeline iterates through multiple models and estimates the performance of each model. During a model generation stage, the pipeline translates the description of the model, together with the model parameters, into an intermediate representation in a language that is compatible with a virtual machine. The intermediate representation is agnostic to, or independent of, the configuration of the target platform. During a model performance estimation stage, the pipeline evaluates the performance of the models without training them. Based on the analysis of the performance of the untrained models, a subset of models is selected. The selected models are then trained and the performance of the trained models is analyzed. Based on the analysis of the performance of the trained models, a single model is selected for deployment to the target platform.
Description
BACKGROUND

This invention relates generally to autonomous control systems for vehicles, and more particularly to the generation and application of machine-learned models used in autonomous control systems for vehicles.


Autonomous control systems are systems that guide vehicles (e.g., automobiles, trucks, vans) without direct guidance by human operators. Autonomous control systems analyze the surrounding physical environment in various ways to guide vehicles in a safe manner. For example, an autonomous control system may detect and/or track objects in the physical environment, and responsive to a detected object, guide the vehicle away from the object such that collision with the object can be avoided. As another example, an autonomous control system may detect boundaries of lanes on the road such that the vehicle can be guided within the appropriate lane with the flow of traffic.


Oftentimes, autonomous control systems use computer models to perform algorithms for analyzing the surrounding environment and performing detection and control operations. For example, the autonomous control system uses a computer model to detect pedestrians on the street using images captured by an onboard camera. The computer models are trained from data sets containing information that resembles potential environments the autonomous control system would encounter during operation. However, training the models is a time-consuming task, sometimes requiring multiple days to complete. Furthermore, when generating a model for use in a new platform, a designer of the model may want to explore multiple different architectures, or multiple different configurations of the same architecture.


SUMMARY OF THE INVENTION

A model training and implementation pipeline trains models for individual embedded systems by generating an intermediate representation of a model for interpretation on the embedded system. The pipeline includes a model generation stage and a model performance estimation stage. The pipeline iterates through multiple models and estimates the performance of the models to determine whether the models can be applied by the target platform. The models are generated based on the performance of models generated during previous iterations. For example, if the pipeline determines that a model cannot be applied by the target platform with a desired performance, the pipeline generates a new model with a reduced complexity. During the model generation stage, the pipeline translates the description of the model together with the model parameters into an intermediate representation in a language that is compatible with a virtual machine. The intermediate representation is agnostic to, or independent of, the configuration of the target platform. That is, as long as a virtual machine is designed for a platform, the platform is able to apply the model by executing the intermediate representation of the model through the virtual machine. The intermediate representation specifies a set of operations and the order in which the operations are to be performed. The intermediate representation may be a graph representation where nodes in the graph correspond to variables used by the model and the branches connecting the nodes represent operations to be performed on the variables.


To generate the intermediate representation, a graph representation of the model is generated and information about the variables used by the model is propagated through the graph representation. Using the graph representation, the memory utilization of the model graph is estimated and the operations of the model graph are optimized. Furthermore, the data allocation for the variables used by the model and the operations performed by the model are scheduled.


During the model performance estimation stage, the pipeline evaluates the performance of the models without training the models. For instance, the model is generated using default or randomized parameters. Based on the analysis of the performance of the untrained models, a subset of models that perform within the specified performance is selected. The selected models are then trained and the performance of the trained models is analyzed. In some embodiments, different performance parameters are tested after the models have been trained. For example, the trained models are evaluated based on their accuracy in addition to their performance characteristics. That is, if models for identifying road hazards are being tested, the accuracy of the models in detecting various road hazards in test images is evaluated.


Based on the analysis of the performance of the trained models, a single model is selected for deployment to the target platform. The intermediate representation of the trained model is then stored in the storage medium of the target system together with a set of kernels for implementing the model and a virtual machine for compiling and executing the model using the set of kernels. The virtual machine is a software module that enables a computer to run or execute programs that are written in the language of the intermediate representation. The virtual machine translates the intermediate representation that is written in the intermediate language into the machine code of the processor included in the computer by selecting and applying kernels to implement the intermediate representation.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1A is an example network environment for autonomous control, in accordance with an embodiment.



FIGS. 1B-1D are example computer architectures for use in the autonomous control system, in accordance with an embodiment.



FIG. 2 is a flow diagram of a process for generating a machine-learned computer model, in accordance with an embodiment.



FIG. 3A is a block diagram of the model compiler, in accordance with an embodiment.



FIGS. 3B-3D illustrate two ways of optimizing a model graph, in accordance with an embodiment.



FIG. 4 is a block diagram of the virtual machine, in accordance with an embodiment.



FIG. 5 is a tree representation of the kernels available for a backend device, in accordance with an embodiment.



FIG. 6 is a flow diagram of a process for generating an intermediate representation of a machine-learned computer model to be executed by a virtual machine, in accordance with an embodiment.



FIG. 7 is a flow diagram of a process for executing the intermediate representation of the machine-learned computer model, in accordance with an embodiment.



FIG. 8 is a flow diagram of a process for generating and selecting a model architecture, in accordance with an embodiment.



FIG. 9 illustrates a deployment system architecture of the machine-learned model in the autonomous control system, in accordance with an embodiment.





The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.


DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT


FIG. 1A is an example network environment 100 for autonomous control, in accordance with an embodiment. The network environment 100 includes an autonomous control system 110, a sensor collection system 150, and a model generation system 140 coupled to a network 120.


The autonomous control system 110 guides vehicles based on information related to the surrounding environment received from the one or more sensors attached to the vehicles. The vehicles are any means of conveyance or transport in or by which someone or something can travel from one place to another, and may include automobiles, trucks, vans, robotic transports, and the like. The autonomous control system 110 may guide a vehicle through one or more trips from one destination to another. For example, the autonomous control system 110 may guide a ride-sharing vehicle (e.g., a taxi) from a passenger's point of pick-up to their desired destination. Though described herein as an autonomous vehicle, the control decisions of the autonomous control system may provide semi-autonomous control rather than complete control of the vehicle, for example to supplement or override user control, or as a primary means of control that can be overridden by a user. In addition, although the autonomous control system 110 is described herein as a system that guides vehicles, the autonomous control system 110 may also guide other systems such as robotic arms or manufacturing equipment.


One or more sensors are attached to the vehicles to gather information used to generate the control of the vehicle. The sensors are devices that detect information related to the physical environment. The information can be captured through many forms. For example, the sensors may be imaging sensors that capture scenes of the physical environment through a series of one or more images. In such an example, other vehicles proximate to the vehicle of the autonomous control system, stationary and moving objects such as trees, fire hydrants, lamp posts, and the like may be captured in the images. As another example, the sensors may be geo-locational sensors, and more specifically global positioning system (GPS) sensors that detect the position of the sensor (and its attached vehicle) relative to a map of the physical environment. As yet another example, the sensors may be microphones that detect sounds in the environment in the form of audio signals. As defined herein, sensor data of a sensor denotes the readings of the environment collected by the sensor that characterize how the sensor perceives the environment.


The one or more sensors may include high-capacity sensors that have certain improved characteristics over other sensors. For example, high-capacity imaging sensors may generate sensor data having improved characteristics, such as increased resolution, data collection time, sharpness, field-of-view, and the like, compared to other sensors. As another example, high-capacity geo-locational sensors may pinpoint the location of the sensor more accurately than others. As another example, some high-capacity sensors are able to detect information at a level of accuracy or precision that other sensors cannot. For example, light detection and ranging (LIDAR) sensors can measure the distance from the sensor to an object at a level of accuracy that is difficult to achieve for image sensors. Alternatively, more-sophisticated LIDAR sensors may generate greater precision data than less-sophisticated LIDAR sensors. In general, high-capacity sensors tend to be complex, expensive, and bulky. Moreover, it may be difficult for an owner (or a manufacturer) of a vehicle to purchase and install high-capacity sensors separately on his or her vehicle.


On the other hand, due to their high capacity, only a few or even a single high-capacity sensor may be needed to collect a substantial amount of information on the physical environment for accurate performance of the autonomous control system 110. For example, a single LIDAR sensor on a vehicle can capture a 360-degree field-of-view of the physical environment through high-resolution signals that may be alone sufficient for accurate performance of the autonomous control system 110.


The one or more sensors may also include replacement sensors that have smaller capacity than high-capacity sensors, but may be more readily available than high-capacity sensors in that they are portable, easier to install, and relatively inexpensive. For example, many vehicles are now manufactured with sensors at the front and/or back of the car that provide real-time sensor data of the surroundings such that the operator can detect objects to avoid collisions with them. However, these sensors have a limited field-of-view that captures only a portion of the environment at the front and/or back of the vehicle. As another example, portable radio detection and ranging (RADAR) sensors may be able to detect distance of objects better than imaging sensors, but still may not have the accuracy of a high-capacity LIDAR sensor. As another example, portable cameras are easy to install on windshield or dashboard areas of the vehicle, but may lack the resolution and field-of-view of LIDAR sensors.


In contrast to high-capacity sensors, each sensor in a set of replacement sensors may provide fragments of information on the surrounding environment in different formats of sensor data and with lower precision. However, the combination of sensor data may contain information comparable to that generated from high-capacity sensors. For example, a vehicle may have an RGB camera with a first resolution at the back of the vehicle, a greyscale camera with a second resolution at the dashboard of the vehicle, another RGB camera with a third resolution at the left and right sides of the vehicle, and a portable RADAR sensor. Individually, each camera has a fragmented field-of-view limited to one among the front, back, and sides of the vehicle in different resolutions and color, and the portable RADAR sensor has sub-optimal distance measurements (with respect to the high-capacity sensors).


The autonomous control system 110 performs various detection and control algorithms based on sensor data of the physical environment to guide the vehicles in a safe and efficient manner. For example, the autonomous control system 110 may detect various objects (e.g., lamp post, cars) that are proximate to a vehicle in the captured sensor data of the environment, and guide the vehicle away from the objects to prevent collision of the vehicle with the objects. As another example, the autonomous control system 110 may detect boundaries of lanes on the road such that the vehicle can be guided within the appropriate lane with the flow of traffic.


In one embodiment, various functions of the autonomous control system 110 are performed through machine-learned computer models. In one embodiment, the machine-learned models are neural network models such as feed-forward networks, convolutional neural networks (CNN), deep neural networks (DNN), recurrent neural networks (RNN), self-organizing maps (SOM), and the like, that are generated and trained by the model generation system 140 based on training data sets.


The model generation system 140 constructs and trains machine-learned models based on sensor information provided by the sensor collection system 150. The trained machine-learned models perform various functions, such as simulating sensor data, estimating sensor quality, and other detection and control algorithms for use by the autonomous control system 110. The model generation system 140 trains the models based on training data sets. The training data sets contain information resembling potential environments the autonomous control system 110 would encounter during operation. For example, a computer model for detecting pedestrians on the street may learn different representations of people from a data set containing various images of pedestrians. A sufficient amount of training data generally leads to improved performance of computer models. However, gathering training data can be costly and time-consuming. Moreover, some characteristics of environments that are important for the computer models to learn may not be included in existing training data.


The sensor collection system 150 is attached to one or more data collection vehicles, and includes one or more sensors. The sensor collection system 150 collects training information related to the physical environment using the various sensors, such that relationships can be learned between sensor data from the different sensors available to the sensor collection system, and such that the sensor data may be used to learn an appropriate interpretation of the environment or for control of the vehicle.


The one or more sensors of the sensor collection system 150 can include active sensors and passive sensors. A passive sensor observes the environment. Passive sensors can include cameras, microphones, vibration sensors, and the like. Passive sensors include a receiver that detects and measures various forms of energy that are naturally emitted from the physical environment or constituents of the physical environment across various locations of the environment. As an example, when the sensor is a camera, the sensor data is a time series of pixel data indicating intensities of detected light. That is, a time series of pictures is acquired. Each picture is divided into pixels and each pixel may have one or more intensity values associated with it depending on whether the camera is a greyscale camera or a color camera. For example, when the camera is a color camera describing a color of a pixel in red, green, and blue, the intensity value for each is typically an integer, such as an 8, 10, or 12-bit integer specifying the intensity of the red, green, or blue portion of the spectrum. If the resolution of the picture were 100×100 pixels (having 10,000 total pixels), for every picture, there would be 3 separate channels of 10,000 pixels each.


When the sensor is a microphone, the sensor data is a time series of air pressure values. In one embodiment, the time series of air pressure values is converted into a spectrogram. A spectrogram shows a time series of frequency components (strengths), i.e., a collection of frequency strengths for each time period. The spectrogram is generated from the initial sound waves by a time-windowed discrete Fourier transform, also sometimes called a “Gabor Transform.” The size of the sensor data can be adjusted by adjusting the number of frequencies and/or the size of the time step used in the windowed Fourier transform.
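
As a rough illustration of this transformation, the following sketch computes a spectrogram from an air-pressure time series with a windowed DFT. The window size and hop length are the illustrative knobs that control the number of frequencies and the size of the time step; the function name and default values are assumptions for the example, not part of the system described here.

    import numpy as np

    def spectrogram(pressure, window_size=256, hop=128):
        # Windowed DFT ("Gabor transform") of an air-pressure time series.
        # window_size sets the number of frequency bins; hop sets the time step.
        window = np.hanning(window_size)
        frames = []
        for start in range(0, len(pressure) - window_size + 1, hop):
            frame = pressure[start:start + window_size] * window
            # Magnitude of the one-sided DFT gives the strength of each frequency.
            frames.append(np.abs(np.fft.rfft(frame)))
        return np.stack(frames)  # shape: (time steps, window_size // 2 + 1)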


When the sensor is a vibration sensor, the sensor data is a time series of physical displacements of the vibration sensor in the system. The vibration sensor is typically attached to, or near, a particular component of the system to represent vibration of that component. Similarly to the microphone, in one embodiment, the time series of physical displacements is converted into a spectrogram, and the number of frequencies used in the Fourier transform can be adjusted.


The one or more sensors may include active sensors. Active sensors emit energy and then measure the energy that is reflected back to one or more receivers in the sensor. The reflected energy allows active sensors to probe for environmental information that may not otherwise be readily detected passively at the sensor. For example, active sensors may estimate distances of objects from the sensor better than passive sensors. Active sensors include both a transmitter and a receiver of energy, in contrast to passive sensors that use only receivers. Active sensors can include ultrasound sensors, RADAR sensors, active infrared (IR) sensors, LIDAR sensors, and the like. Usually, ultrasound sensors emit ultrasound waves, RADAR sensors emit microwaves, LIDAR sensors emit laser pulses in the near-IR or visible range, and IR sensors emit IR waves.


In one instance, the sensor data includes depth measurements that measure how far away an object is from the sensor. Specifically, the depth is measured by triggering a timer when the energy is emitted and detecting the amount of time needed for the receiver to detect the reflected energy. The traveling speed of the energy can be used to calculate the depth of objects at various locations in the environment by emitting energy signals in the direction of the objects. In another instance, the sensor data also includes intensity measurements that measure the intensity of the reflected energy detected at the receiver of the sensor. These intensity values may be represented as 8 or 16-bit integer values.
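
The time-of-flight depth calculation described above reduces to a simple formula; the sketch below is an illustration only, and the constant and example timing are assumptions rather than values taken from this disclosure.

    # Round-trip time of flight: the emitted pulse travels to the object and back,
    # so the depth is half the distance covered at the propagation speed.
    SPEED_OF_LIGHT = 299_792_458.0  # m/s, for LIDAR or active IR; ultrasound would use the speed of sound

    def depth_from_time_of_flight(round_trip_seconds, speed=SPEED_OF_LIGHT):
        return speed * round_trip_seconds / 2.0

    # Example: a return detected 800 nanoseconds after emission corresponds to an
    # object roughly 120 meters from the sensor.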


For many types of active sensors, the sensor data is a collection of data points with reference to the sensor in a three-dimensional (3D) coordinate system (“point cloud” measurements) such as, for example, a spherical coordinate system or a Cartesian coordinate system. Each value designates the measurement of the actively-transmitted signal at the receiver (e.g., depth or reflected intensity). The number of data points in the point cloud is related to the resolution of the sensor. Further, even for a given sensor, the number of data points varies depending on factors such as what portion of the environment is within the sensor's range.


For example, when the sensor is a LIDAR sensor, the sensor data may include a point cloud of intensity measurements and a point cloud of reflectance measurements. Specifically, a narrow beam laser is pointed in a specific, known direction. This known direction can be identified as a pair of angles including a polar angle θ and an azimuth angle φ with reference to the sensor. The polar angle θ ranges from the upward direction (0 degrees) to the downward direction (180 degrees), while the azimuth angle φ ranges from the forward direction (0 degrees) to the backward direction (360 degrees).


By actively emitting energy across the entire field-of-view, a set of measurements for depth and/or intensity can be collected for different values of (r, θ, φ), where r denotes the depth measurement of an object (e.g., ground, cars, trees) to the sensor and θ, φ together denote the known direction of the object. Thus, a 3D view of the environment can be mapped to a point cloud representing objects in the environment by using the returned depth and intensity thereof.
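
Using the angle convention given above (θ measured from the upward direction, φ from the forward direction), a point cloud measurement (r, θ, φ) can be mapped to Cartesian coordinates with the standard spherical-to-Cartesian conversion. The sketch below is illustrative only; the function name is an assumption for the example.

    import numpy as np

    def spherical_to_cartesian(r, theta_deg, phi_deg):
        # theta is the polar angle from the upward (z) direction; phi is the azimuth.
        theta, phi = np.radians(theta_deg), np.radians(phi_deg)
        x = r * np.sin(theta) * np.cos(phi)
        y = r * np.sin(theta) * np.sin(phi)
        z = r * np.cos(theta)
        return x, y, z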


In one embodiment, point cloud measurements are collected with rotational scanning. For example, multiple laser beams (e.g. 64 laser beams) can be emitted from a rotating drum, enabling multiple measurements across various values of θ. In this case, θ and φ are pre-determined by the position of the rotating drum and which of the multiple beams emitted the light, while r is measured based on the time-of-flight of the energy beam as discussed above.


In another embodiment, the point cloud measurements are collected by linear scanning in the (x,y) space. In such implementations, the light source is aimed at one or more mirrors. The mirrors, which may be microscopic mirrors (e.g. MEMS mirrors), can be manipulated programmatically, causing the energy beam to be steered. While mirror-based steering could potentially implement almost any scanning pattern, in practice these systems are usually used to implement grid-like scanning patterns that follow the Cartesian coordinate system.


In yet another embodiment, the point cloud measurements are collected through a phased array. A phased array is typically implemented with no moving parts. Instead, a phased array is made up of multiple transmitters at the same frequency but with different phase delay. A beam-like radiation pattern is achieved by the constructive and destructive interference of these multiple beams. The results of this approach can be viewed in polar coordinates or Cartesian coordinates.


Active sensors such as RADAR and LIDAR may output sparse representations of the environment. This sparsity can arise for a few reasons. For example, most active sensors have a minimum and maximum range at which they can reliably receive a returned signal. For example, a LIDAR sensor specifies a minimum usable return range of 0.9 meters and a maximum usable return range of 120 meters. When objects and the ground plane are outside of this range, no return is received, and therefore the returns comprise a sparse point cloud. As another example, even when objects are within range, occlusions such as rain or fog can lead to diffraction of a LIDAR sensor's laser beams. This can lead to fewer returns, which can cause the point cloud to be sparser compared to the point clouds that are generated in dry weather.



FIGS. 1B-1D are example computer architectures for use in the autonomous control system, in accordance with an embodiment. FIG. 1B is an example computer architecture that includes a central processing unit (CPU) 150, a graphics processing unit (GPU) 160, a digital signal processor (DSP) 165, main memory 170, video memory 175, storage 180, and sensors 190. The CPU 150 is an electronic circuit that performs arithmetic, logical, control, and input/output operations as specified by instructions loaded into main memory 170. The CPU 150 may, for example, be an x86 based processor, an x64 based processor, or an ARM based processor. The CPU has an instruction set that is used to instruct the CPU on performing specific operations on data stored in registers. In some embodiments, the CPU includes multiple cores (e.g., 4 cores), each core capable of executing the entire instruction set of the CPU.


The GPU 160 is a specialized electronic circuit designed to efficiently perform specific operations or mathematical functions. GPU 160 is designed to perform highly parallel operations such as matrix or vector operations. Furthermore, GPU 160 may be designed to more effectively perform parallel floating point operations. The GPU 160 typically has a larger number of compute units than the CPU 150. The GPU has more compute units than the number of cores of the CPU, but each GPU compute unit is not capable of performing every operation a CPU core is capable of performing. For example, a GPU that has 100 cores is capable of multiplying each element of a 10 by 10 matrix by a scalar value in a single cycle, whereas a dual core CPU may perform the same computation in 50 or more cycles.


The DSP 165 is a specialized electronic circuit that is optimized for performing operations used in digital signal processing. In some embodiments, the DSP includes a vector floating point co-processor for performing vector operations more efficiently than a CPU.


The main memory 170 stores a series of instructions to be executed by the CPU or the GPU. The main memory further stores data to be used by the CPU. For example, the main memory includes a segment for storing the result of calculations performed by the CPU. The video memory 175 stores information to be used by the GPU. In some embodiments, since the GPU may perform certain complex calculations faster than the CPU, the video memory is faster at loading and storing data than the main memory. In one embodiment, the system may combine both the main memory and the video memory in a single unit. As such, both the CPU and the GPU may share the same memory module for storing the data used by the respective processors. The main memory and the video memory may be implemented as a dynamic random-access memory (DRAM), such as double data rate synchronous DRAM (DDR SDRAM).


The storage 180 stores persistent data to be kept between power cycles of the autonomous control system. For example, storage 180 stores the program to be executed by the autonomous control system and the settings/parameters used by the autonomous control system. The storage 180 may be implemented as a hard disk drive (HDD) or a solid state drive (SSD).



FIG. 1C is an example computer architecture that includes an accelerated processing unit (APU) 155, main memory 170, video memory 175, storage 180, and sensors 190. That is, the example of FIG. 1C includes an APU instead of a discrete CPU and a discrete GPU. The APU 155 includes CPU cores and GPU compute units in a single die or chip. In some embodiments, the APU includes separate dies for the CPU and GPU connected together via an interposer.



FIG. 1D is an example computer architecture that includes a low power CPU 150, memory 170, storage 180, and sensors 190. That is, the example of FIG. 1D includes a single memory module instead of dedicated main and video memory. Furthermore, the example of FIG. 1D only includes a low power CPU and does not include a GPU.


Different architectures and different computer configurations within a same architecture may have different capabilities. For example, an architecture that includes a GPU is capable of performing matrix operations more efficiently than an architecture that does not include a GPU. Furthermore, an architecture that provides 8 GB of memory compared to an architecture that provides 4 GB of memory is capable of including more data in memory and thus may use more complex data structures. Further, a model that uses 32 GB/s of memory bandwidth is able to access more model parameters and/or temporary variables than a model that uses 16 GB/s of memory bandwidth. As such, a model that operates with a specific performance in a first platform having a first computer configuration may not operate with acceptable performance in a second platform having a second computer configuration. As such, to deploy models across various platforms, different models are generated that are tailored to the capabilities of the respective platforms.



FIG. 2 is a block diagram of the model generation system 140 for generating a machine-learned computer model, in accordance with an embodiment. The model generation system is used to select and train models for deployment to a variety of different embedded processors like the ones shown in FIGS. 1B-1D. The system includes a model generator 210, a model compiler 230, a virtual machine 240, a code executor/scheduler 250, an embedded processor 270, and a performance evaluator 280. In general, using an iterative process, a computer model is generated based on the performance of a model generated in previous iterations of the process. To determine the performance of the model, the model is converted to an intermediate representation 235 that can be interpreted by the virtual machine 240. Using the intermediate representation, the virtual machine 240 generates machine code 245 for executing the operations for applying the model. The performance evaluator 280 then analyzes the intermediate representation and the generated machine code to estimate or measure the performance of the model as performed by the target system.


The model generator 210 generates machine-learned models based on input from the model compiler 230 and the performance evaluator 280. The model 220 generated by the model generator 210 includes a model description and a set of model parameters. In some embodiments, the model generator 210 generates an initial default model and modifies the model based on information received from the model compiler 230 and the performance evaluator 280. For instance, the model generator may modify the model to have a fewer number of layers if the performance evaluator 280 indicates that the model uses an amount of memory that is larger than the available memory in the target platform. In another example, the model generator 210 may increase the complexity of the model if the performance evaluator 280 indicates that an estimated frame rate of the model is higher than 60 frames per second.


The model compiler 230 receives the model 220 generated by the model generator 210 and generates an intermediate representation 235 of the model. The intermediate representation is a platform-agnostic representation of the operations to be performed for using the model 220. The model compiler 230 translates the model description and model parameters into a set of operations that are compatible with the virtual machine 240. A detailed description of the model compiler 230 is provided below in conjunction with FIG. 3A.


The virtual machine 240 receives the intermediate representation 235 that includes platform-agnostic operations and generates machine code 245 that includes platform-specific instructions for running the set of operations described in the intermediate representation 235. The virtual machine identifies an operation specified in the intermediate representation 235 and selects a kernel 260 that implements the operation. In some embodiments, the kernels 260 are pieces of code implemented in the assembly language of the embedded processor 270 of the platform for which the model is being built. A detailed description of the virtual machine 240 is provided below in conjunction with FIG. 4.


The code executor 250 instructs the processor 270 to execute the instructions included in machine code 245. The embedded processor 270 may be a CPU, a GPU, a digital signal processor (DSP), or another domain-specific processor, or a combination thereof. In some embodiments, the code executor 250 manages the hardware resources such as memory allocation and instruction execution scheduling. In some embodiments, the code executor 250 is part of the operating system of the platform using the model.


The performance evaluator 280 estimates the performance of the model generated by the model generator 210. The performance evaluator 280 estimates the performance based on the model description and model parameters provided by the model generator 210. In some embodiments, the performance evaluator 280 estimates the performance of the model mathematically. For example, a matrix multiplication with input matrices of size N×N utilizes O(N³) floating point operations and O(N²) memory accesses.
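
A minimal sketch of this kind of mathematical estimate for the matrix-multiplication example follows; the helper name and the exact constants are illustrative assumptions, chosen to match the O(N³) operation and O(N²) memory-access counts above.

    def estimate_matmul_cost(n):
        # Each of the n*n output elements is a length-n dot product, giving O(n^3)
        # floating point operations, while the operands and result occupy O(n^2) memory.
        flops = 2 * n ** 3             # n multiplies and n adds per output element
        memory_accesses = 3 * n ** 2   # read A, read B, write C
        return flops, memory_accesses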


The performance evaluator 280 additionally measures the performance of the model based on the output of the virtual machine 240. The performance evaluator 280 empirically measures the performance of the model by profiling the machine code 245.


In some embodiments, the performance evaluator 280 determines a latency in completing the operations of the model, a throughput or frame rate at which the operations of the model can be finished, an amount of power used by the target system implementing the model, and an amount of resources (e.g., memory and processor usage) consumed by the target system implementing the model.


The performance evaluator determines the throughput by determining the number of times per second the model can be applied by a target system. In some embodiments, the performance evaluator 280 determines if the operations of the model can be performed 60 times per second by the target embedded processor 270. In some embodiments, the performance evaluator 280 determines a number of operations to be performed and compares the determined number of operations to a maximum number of operations per second the embedded processor 270 is capable of performing. For instance, the performance evaluator 280 determines that a GPU has 1.8 TFLOPS (or 1.8×10¹² floating point operations per second) of computing capability, and the model is performed using 20×10⁹ floating point operations. As such, the model is capable of being performed at a 90 frames per second (FPS) rate. In yet another example, the performance evaluator 280 instructs the embedded processor to execute the operations implementing the model and empirically measures a frame rate at which the processor completes the operations implementing the model.
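
The frame-rate estimate in the example above reduces to a simple division; the sketch below reuses the same illustrative numbers and is not a definitive implementation of the performance evaluator.

    device_flops_per_second = 1.8e12   # 1.8 TFLOPS of computing capability
    model_flops_per_frame = 20e9       # 20x10^9 floating point operations per application

    estimated_fps = device_flops_per_second / model_flops_per_frame   # 90.0
    meets_target = estimated_fps >= 60                                # True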


The performance evaluator 280 empirically determines a latency as the amount of time used by a target system to complete the execution of the model. The latency may not be directly correlated with the throughput of the model as the execution of the model may be overlapped. That is, a next execution of the model may be started before a previous execution of the model is finished.


The performance evaluator 280 mathematically determines a number of naïve floating point operations (FLOPS) as the total number of FLOPS used by the model when implemented with default kernels. The naïve FLOPS are estimated using the model description and model parameters generated by the model generator.


The performance evaluator 280 uses static analysis to determine a number of optimized FLOPS as the number of FLOPS used by the model based on the machine code 245 generated by the virtual machine 240.


The performance evaluator 280 mathematically determines a naïve memory allocation as the total memory used by the model for all the model-parameters and temporary variables. The naïve memory allocation is estimated based on the model 220 as generated by the model generator 210.


The performance evaluator 280 determines an optimized memory allocation as the amount of memory used by the model after the allocation of the model-parameters and temporary variables has been scheduled. The optimized memory allocation is measured based on the intermediate representation 235 generated by the model compiler 230. In other embodiments, the optimized memory allocation is measured based on the machine code 245 generated by the virtual machine 240. The optimized memory allocation is lower than the naïve memory allocation, for example, when the memory used to store temporary variables is re-allocated once the temporary variables are no longer needed.


The performance evaluator 280 mathematically determines a naïve memory bandwidth as the total memory bandwidth used by the model for all the model-parameters and temporary variables. The naïve memory bandwidth is estimated based on the model 220 as generated by the model generator 210.


The performance evaluator 280 empirically determines an optimized memory bandwidth as the memory bandwidth used by the model after the allocation of the model-parameters and temporary variables, as well as the operations executed by the model, have been scheduled. The optimized memory bandwidth is measured based on the machine code 245 generated by the virtual machine 240. The optimized memory bandwidth is lower than the naïve memory bandwidth, for example, when operations to be performed by the model are fused.



FIG. 3A is a block diagram of the model compiler, in accordance with an embodiment. The model compiler includes a graph parser 310, a memory estimator 320, a graph optimizer 330, a tensor scheduler 340, and an operation scheduler 350.


The graph parser 310 maps the computations of the model 220 to a model graph that includes information about the operations to be performed when using the model. In some embodiments, the model graph is a tree structure that includes nodes corresponding to the data used by the model and branches specifying the operations applied to the nodes.


The memory estimator 320 determines an amount of memory used for storing the data used in the model graph. The memory estimator determines an amount of memory used by each of the nodes of the model graph based on the data shape and data type of the leaf nodes and the operations performed on each of the nodes. For example, if a first node has a 224×224×3 shape and a float32 type, and a second node has a 224×224×3 shape and a float32 type, the memory estimator 320 would determine that a third node corresponding to the concatenation of the first and second nodes has a 224×224×6 shape and a float32 type. As used herein, the shape of the node represents the dimensionality of the node and the size of the node in each of the dimensions. That is, the shape of the node is the n-dimensional shape (e.g., n-dimensional parallelotope). As such, a 224×224×3 shape represents a matrix with 224 elements in a first dimension, 224 elements in a second dimension, and 3 elements in a third dimension. In some embodiments, the memory estimator 320 propagates the data shape and type from the leaf nodes of the model graph through each of the branches until the memory estimator has determined the data shape and type for every node of the model graph.
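
A minimal sketch of the kind of shape and memory propagation described above, using the concatenation example; the helper names and the dtype table are illustrative assumptions, not the memory estimator's actual interface.

    import numpy as np

    DTYPE_BYTES = {"float32": 4, "float16": 2, "int8": 1}

    def concat_shape(shape_a, shape_b, axis):
        # Shape inference for a concatenation node: all dimensions except the
        # concatenation axis must match, and that axis is summed.
        axis = axis % len(shape_a)
        assert shape_a[:axis] == shape_b[:axis] and shape_a[axis + 1:] == shape_b[axis + 1:]
        out = list(shape_a)
        out[axis] = shape_a[axis] + shape_b[axis]
        return tuple(out)

    def node_bytes(shape, dtype):
        return int(np.prod(shape)) * DTYPE_BYTES[dtype]

    out_shape = concat_shape((224, 224, 3), (224, 224, 3), axis=2)   # (224, 224, 6)
    out_size = node_bytes(out_shape, "float32")                      # 1,204,224 bytes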


The graph optimizer 330 performs optimizations on the model graph generated by the graph parser 310. For example, the graph optimizer 330 may fuse operations that can be performed by a single instruction. For instance, the graph optimizer 330 may fuse a multiplication and an addition operation into a single fused multiply-accumulate (FMA) operation. In another example, the graph optimizer 330 may fuse a batch-normalization layer with a convolution step. FIGS. 3B-3D illustrate two ways of optimizing a model graph. FIG. 3B illustrates a model graph for the operation

a*0.251+c+d


Where a, c, and d are variables used by the model. The model graph first multiplies variable a with the static value 0.251, adds c to the result of the multiplication, and finally adds d to the result of the addition. FIG. 3C illustrates an optimized model graph where the first two operations are fused into a fused multiply-accumulate (FMA) operation. FIG. 3D illustrates a second optimization where the multiplication operation between a and 0.251 is performed in parallel with an add operation between c and d. As such, the model graphs of FIGS. 3C and 3D are performed in 2 compute cycles instead of the 3 compute cycles of the model graph of FIG. 3B.
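
A sketch of the multiply-add fusion on a flattened list of graph operations follows; the dictionary-based representation and function name are illustrative assumptions rather than the graph structure actually used by the model compiler.

    def fuse_multiply_add(ops):
        # Peephole pass: a multiply whose result feeds directly into an add is
        # replaced by a single fused multiply-accumulate (FMA) operation.
        fused, i = [], 0
        while i < len(ops):
            op = ops[i]
            nxt = ops[i + 1] if i + 1 < len(ops) else None
            if (op["op"] == "mul" and nxt is not None and nxt["op"] == "add"
                    and op["out"] in nxt["in"]):
                other = [x for x in nxt["in"] if x != op["out"]][0]
                fused.append({"op": "fma", "in": op["in"] + [other], "out": nxt["out"]})
                i += 2
            else:
                fused.append(op)
                i += 1
        return fused

    graph = [{"op": "mul", "in": ["a", "0.251"], "out": "t0"},
             {"op": "add", "in": ["t0", "c"], "out": "t1"},
             {"op": "add", "in": ["t1", "d"], "out": "t"}]
    # fuse_multiply_add(graph) yields an FMA of (a, 0.251, c) followed by the add of d,
    # matching the two-cycle schedule of FIG. 3C.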


The tensor scheduler 340 identifies a memory bottleneck in the model graph. Based on the identified bottleneck, the model compiler may determine which operations not to schedule concurrently so as to reduce the amount of memory that is being used concurrently at any given point in time. In some embodiments, the tensor scheduler allocates a portion of the available memory for each of the tensors used by the model 220. In one embodiment, the tensor scheduler 340 receives a maximum amount of memory available in the target platform system. In other embodiments, the tensor scheduler 340 minimizes the amount of memory concurrently used at any given point. For example, the tensor scheduler 340 identifies the memory bottleneck of the model and determines if the operations involved in the memory bottleneck can be split to reduce the amount of memory being used. For example, if the memory bottleneck includes a dot product of a first 100 element vector with a second 100 element vector, the tensor scheduler may split the dot product into a first dot product of two 50 element vectors and a subsequent second dot product of two 50 element vectors, thus reducing the amount of temporary memory used to hold the intermediate results of the dot products. In another example, the memory bottleneck may involve the following operation:






t = Σ_{i=0..99} a_i × b_i + Σ_{i=0..99} c_i × d_i

The operations may be performed as follows:

    • for i=0 to 99: p_i = a_i × b_i
    • for i=0 to 99: q_i = c_i × d_i
      t = Σ p_i + Σ q_i


This implementation uses at least 200 additional tensors to store the p_i and q_i intermediate results. Instead, to reduce the amount of memory used, the tensor scheduler may perform the calculation as:

    • for i=0 to 99: t_i = a_i × b_i
      t = Σ t_i
    • for i=0 to 99: t_i = c_i × d_i
      t = t + Σ t_i


As such, the memory bottleneck may be implemented using an additional 101 tensors to store the t_i intermediate results. Even if a GPU is able to perform 200 multiplications in parallel, the tensor scheduler 340 may select not to perform the 200 multiplications in parallel, but instead break the operations into two sets of 100 parallel multiplications to reduce the amount of memory used. This optimization is referred to as “working set reduction.”
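
A sketch of the working set reduction using the sums above; numpy arrays stand in for the tensors and the function names are illustrative, so this is an illustration of the idea rather than the tensor scheduler's actual mechanism.

    import numpy as np

    def reduce_naive(a, b, c, d):
        # Materializes two full intermediate vectors (p and q) before summing.
        p = a * b
        q = c * d
        return np.sum(p) + np.sum(q)

    def reduce_working_set(a, b, c, d):
        # Reuses a single temporary vector t, roughly halving the intermediate
        # storage at the cost of serializing the two element-wise multiplications.
        t = a * b
        total = np.sum(t)
        t = c * d
        return total + np.sum(t)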


The tensor scheduler 340 further identifies when a tensor is no longer used by the model and reallocates the memory space occupied by the tensors that are no longer used to store new tensors generated by the model. For example, a model may perform the following operation:

t=(a+b)×c


Where a, b, c, and t are tensors. To perform this operation, the compiler determines that the tensors a and b are to be first added and stored as a new result tensor, and the new result tensor is to be multiplied with tensor c. This operation may be performed as:

t1←a+b
t←t1×c


To perform the above operations, five tensors (a, b, c, t1, and t) are used. The tensor scheduler may determine that tensor t1 is no longer used after the operation has been performed. Thus, the tensor scheduler may reallocate the memory space used by tensor t1 to store the result of the multiplication operation as follows:

t1←a+b
t1←t1×c


As such, one less tensor is used to perform the operation, reducing the amount of memory concurrently being used. This optimization performed by the tensor scheduler 340 is referred to as “in-place optimization.”
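
An illustrative sketch of the in-place optimization with numpy buffers (not the scheduler's actual mechanism; the shapes are arbitrary):

    import numpy as np

    a, b, c = (np.ones((224, 224), dtype=np.float32) for _ in range(3))

    # Without in-place optimization: separate buffers for the intermediate and the
    # result, so five tensors (a, b, c, t1, t) are live at once.
    t1 = a + b
    t = t1 * c

    # With in-place optimization: once t1 is no longer needed, its buffer is reused
    # for the result, so only four tensors are ever allocated.
    t1 = a + b
    np.multiply(t1, c, out=t1)   # t1 now holds the final result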


The operation scheduler 350 determines an order for the operations in the model graph. The operation scheduler 350 performs a cost determination to determine a rate of usage of the processor of the target system. In some embodiments, the operation scheduler determines a number of operations of each type (e.g., a number of add operations, a number of multiply operations, a number of convolution operations, etc.) at various points in the model graph. Based on the determined number of operations, the operation scheduler 350 determines the order in which to perform those operations.



FIG. 4 is a block diagram of the virtual machine, in accordance with an embodiment. The virtual machine 240 includes a kernel selector 420 and a code compiler 430.


The kernel selector 420 identifies a kernel for implementing an operation. As used herein, kernels are implementations of the various operations performed by the virtual machine. The kernels are specifically designed for a platform. In some embodiments, kernels are implemented in the machine language or assembly language of a processor of the target platform. In some embodiments, the kernel selector 420 uses an execution tree that is generated during a startup sequence of the target system based on the kernels available. Multiple kernels may be available for a given operation. For example, multiple kernels, each providing a different implementation of the convolution operation, are available to the virtual machine 240. In particular, the virtual machine 240 includes one or more kernels for implementing the convolution operation using matrix multiplication, one or more kernels for implementing the convolution operation using a fast Fourier transform, and one or more kernels implementing the convolution operation using integration. In one embodiment, the kernels are grouped by implementation family, each family used for operands with specific characteristics. Moreover, each specific kernel may further include an indication of a set of characteristics of the operands the kernel is optimized for. For example, a kernel may be optimized for operands with a data type of float32. In another example, a kernel may be optimized for tensors with a shape of 224×224×1. In some embodiments, the kernels additionally include a default kernel for implementing the operation when a specialized kernel is not available.


The kernels 260 may be implemented by software engineers who are domain experts for particular computing platforms. A “kernel” represents a specific implementation of a specific algorithm (e.g., convolution). The domain expert annotates each kernel with information about the range of inputs that kernel is optimal for. For example, consider the SGEMM operation (Single Precision Floating General Matrix Multiply), which represents multiplying a matrix A with a matrix B to produce a matrix C. One kernel for this operation is called “row-major” and another kernel is called “column-major.” These two kernels are optimized, respectively, for the case where matrix A has each row stored contiguously in memory and the case where matrix A has each column stored contiguously in memory. Each of these kernels is written by a domain expert, and the domain expert annotates each kernel with information about its preferred input configuration. A kernel is then selected by matching a particular input against each kernel's input configuration. So in the SGEMM case, when an input is stored in a “row-major” format, the “row-major” kernel algorithm is selected.
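
A minimal sketch of annotation-based kernel selection for the SGEMM example; the registry, kernel names, and matching rule are illustrative assumptions rather than the kernel selector's actual data structures.

    SGEMM_KERNELS = [
        {"name": "sgemm_row_major", "layout": "row-major"},
        {"name": "sgemm_col_major", "layout": "column-major"},
        {"name": "sgemm_default", "layout": None},   # fallback when no annotation matches
    ]

    def select_kernel(input_layout, kernels=SGEMM_KERNELS):
        # Match the operand's characteristics against each kernel's annotated
        # preferred input configuration; fall back to the default kernel.
        for kernel in kernels:
            if kernel["layout"] == input_layout:
                return kernel
        return next(k for k in kernels if k["layout"] is None)

    select_kernel("row-major")   # selects the row-major SGEMM kernel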


In another example, multiple kernels for an algorithm can be written that have different tradeoffs between memory bandwidth usage and FLOPs usage. For example, a series of convolutions can be implemented in a “tiled” manner. For the top-left of the image, convolution layers 1, 2, and 3 are performed. Then, the algorithm moves on to performing convolution layers 1, 2, and 3 on the top-right of the image, and so on. This increases cache locality and therefore reduces the memory bandwidth consumed. However, in the border zone between the top-right and top-left of the image, some information needs to be recomputed. As such, this example uses less memory bandwidth but more FLOPs. The tradeoff between FLOPs usage and memory bandwidth usage depends on the particulars of the computing platform on which the software will be executed. Therefore, it is useful to have multiple kernels for each algorithm, and to have a kernel selector that can choose which kernels to use for a specific neural network and a specific computing platform.



FIG. 5 is a tree representation of the kernels available for a specific target platform, in accordance with an embodiment. The tree representation of FIG. 5 includes three operations Op1, Op2, and Op3 that are available for the target platform. Within each operation, the target platform includes multiple families implementing the operation. For example, operation Op2 includes three different implementations Imp1, Imp2, and Imp3. Furthermore, each implementation of the operation includes multiple kernels optimized for operands with specific characteristics. For example, implementation Imp2 of operation Op2 includes three different kernels for implementing the operation.


During startup of the system, the kernel selector 420 identifies the available kernels and generates an execution tree based on the information associated with each of the kernels. In some embodiments, the kernel selector 420 first identifies the available implementation families and generates an execution tree to first select an implementation family from the available implementation families, and then to select a kernel within the selected implementation family.


The code compiler 430 generates the machine code for instructing the target system to perform the instructions included in the selected kernels. In some embodiments, the code compiler 430 generates a binary file containing the machine code for executing the operations of the model. In other embodiments, the code compiler 430 temporarily stores the machine instructions for executing the operations of the model in an executable segment of the memory.


Compilation of Machine-Learned Model



FIG. 6 is a flow diagram of a process for generating an intermediate representation of a machine-learned computer model to be executed by a virtual machine, in accordance with an embodiment.


The graph parser 310 generates 610 a model graph of the model to be compiled. The model compiler 230 generates the model graph based on the description of the model and the model parameters provided by the model generator 210. The model graph includes multiple nodes, each representing a data variable, connected to each other by branches representing operations on the data variables.


Information about the variables used by the model is propagated 620 through the model graph. In some embodiments, the information about the variables is propagated from the leaf nodes through the model graph based on the operations associated with each of the branches of the model graph. In some embodiments, for each of the nodes of the graph, a data shape and a data type are determined for the data variable associated with the node. Based on the information about the variables used by the model, the amount of memory used by the model is estimated 630.


The graph optimizer 330 optimizes 640 the operations for applying the model. The tensor scheduler 340 identifies the memory bottleneck for the model graph and schedules 650 the tensors used by the model. The operation scheduler 350 schedules 660 the operations of the model graph based on the scheduling of the tensors. In some embodiments, the model compiler 230 performs multiple iterations of the tensor and operation scheduling. That is, after the operations have been scheduled, the scheduling of the tensors may be further modified to improve the performance of the model. In some embodiments, these steps are repeated until the scheduling of the tensors and operations does not change between iterations of the scheduling steps.


Execution of Machine-Learned Model



FIG. 7 is a flow diagram of a process for executing the intermediate representation of the machine-learned computer model, in accordance with an embodiment.


An operation is retrieved 710 from the intermediate representation of the model. A type of operation is identified 720 for the retrieved operation. The type and shape of the operands are also identified 730 for the retrieved operation.


Based on the identified type of operation and the identified type and shape of the operands of the retrieved operation, an implementation family for performing the retrieved operation is selected. Furthermore, based on the characteristics of the operands, a kernel is selected from the kernels included in the selected implementation family. In some embodiments, an execution tree is traversed to identify the implementation family and the kernel. The execution tree may be generated during startup of the system, or may be pre-generated when the system is built or updated. For example, when a new kernel is deployed to the system, a new execution tree is provided with the new kernel, or the target system is instructed to re-build the execution tree.


Machine code is then generated for instructing a processor to execute the instructions specified by the selected kernel. In some embodiments, the selected kernel is adapted to the shape and type of the operands. The generated machine code is then used for instructing an embedded processor to perform the retrieved operation.


Generation and Selection of Machine-Learned Model



FIG. 8 is a flow diagram of a process for generating and selecting a model architecture, in accordance with an embodiment.


The model generator 210 generates 810 a model. In some embodiments, the model generator 210 generates a first model based on a preset model generation scheme.


The performance evaluator 280 estimates 815 the performance of the model generated by the model generator 210. For example, the performance evaluator estimates the naïve FLOPS, naïve memory allocation, and naïve memory bandwidth. If the estimated performance on any of these three metrics is lower than a specified performance, the process advances to step 860, where a new model is generated by the model generator 210 based on the performance of the previous model.


The model compiler 230 generates 820 an intermediate representation of the model generated by the model generator. The intermediate representation of the model is agnostic to the platform the model will be used in. Moreover, the model compiler 230 generates the intermediate representation of the model before the model has been trained. That is, the model compiler 230 generates the intermediate representation of the model based on default or randomized parameters. As such, the system is able to test multiple models without having to wait for the models to be trained, thus reducing the amount of computing power and time used to test and select a model for use in the target system.


The virtual machine 240 generates 830 machine code from the intermediate representation of the model. In some embodiments, the virtual machine 240 generates machine code for the entire model. In other embodiments, the virtual machine 240 generates machine code for portions of the model.


Based on the generated machine code, the performance evaluator 280 measures 840 the performance of the model. In some embodiments, the performance evaluator 280 emulates the machine code for determining the expected performance of the model. In other embodiments, the performance evaluator 280 instructs a physical system to perform the instructions included in the machine code and evaluates the performance of the model as the instructions are executed. In other embodiments, the performance evaluator 280 directly determines an expected performance of the model based on the machine code and information known about the target platform, such as amount of memory available in the target platform and computing power of the embedded processor.


The performance evaluator 280 determines a latency in performing the machine code, a frame rate at which the machine code can be executed, an amount of power for executing the machine code, and an amount of resources used by executing the machine code. In some embodiments, the performance evaluator 280 determines if the machine code can be performed within a specified performance (e.g., with a 10 ms latency and a 60 frames per second frame rate). In other embodiments, the performance evaluator 280 determines a score for the model.
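For illustration, the following minimal sketch shows both options, a pass/fail check against a specified performance and a single scalar score; the thresholds and weights are assumptions made for the example.

```python
# Minimal sketch of the pass/fail check and scoring options described above.
# The thresholds and score weights are illustrative assumptions.

def meets_spec(perf, spec):
    """Pass/fail check, e.g. at most 10 ms latency and at least 60 FPS."""
    return (perf["latency_ms"] <= spec["max_latency_ms"]
            and perf["fps"] >= spec["min_fps"]
            and perf["power_w"] <= spec["max_power_w"]
            and perf["mem_bytes"] <= spec["max_mem_bytes"])


def score(perf, weights):
    """Collapse the measurements into a single score (higher is better)."""
    return (weights["fps"] * perf["fps"]
            - weights["latency"] * perf["latency_ms"]
            - weights["power"] * perf["power_w"]
            - weights["memory"] * perf["mem_bytes"] / 1e6)


perf = {"latency_ms": 8.4, "fps": 72.0, "power_w": 6.5, "mem_bytes": 48e6}
spec = {"max_latency_ms": 10.0, "min_fps": 60.0,
        "max_power_w": 10.0, "max_mem_bytes": 64e6}
weights = {"fps": 1.0, "latency": 2.0, "power": 1.0, "memory": 0.1}
print(meets_spec(perf, spec), round(score(perf, weights), 2))
```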


Based on the performance of the model as determined by the performance evaluator 280, the model generator 210 generates a new model and steps 820 through 840 are repeated. The model generator 210 may further generate the new model based on the intermediate representation of the previous model. For example, if the performance evaluator 280 determines that the model cannot be performed at a frame rate of 60 FPS, the model generator 210 generates a new model that includes fewer layers in the neural network. In some embodiments, the system generates new models until a model that meets a desired performance is generated. In other embodiments, the system generates a set number of models and selects a subset of models with the highest performance for further testing. In one embodiment, heuristics are used to select a subset of models that perform within the desired performance.
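For illustration, the following minimal sketch shows one plausible form of this search loop: measure a candidate, regenerate a smaller model if it misses the target frame rate, and keep the best-performing candidates for training. The generation rule, measurement stub, and stopping criteria are assumptions made for the example.

```python
# Minimal sketch of the iterative search described above: generate a model,
# measure it, and if it misses the target frame rate, propose a smaller model
# (fewer layers). The generation rule, measurement stub, and stopping criteria
# are illustrative assumptions.
import random


def generate_model(num_layers, width):
    return {"num_layers": num_layers, "width": width}


def measure_fps(model):
    # Stand-in for steps 820-840: deeper/wider models run more slowly.
    return 400.0 / (model["num_layers"] * model["width"] / 64.0) + random.uniform(-2, 2)


def search(target_fps=60.0, max_candidates=8):
    candidates = []
    model = generate_model(num_layers=12, width=128)
    for _ in range(max_candidates):
        fps = measure_fps(model)
        candidates.append((fps, model))
        if fps >= target_fps:
            break
        # Step 860: regenerate with reduced complexity (here: fewer layers).
        model = generate_model(max(2, model["num_layers"] - 2), model["width"])
    # Keep the best-performing subset for training.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:3]


for fps, model in search():
    print(f"{model} -> {fps:.1f} FPS")
```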


If the model performs with at least a desired performance, the model is trained 850. In some embodiments, the model is further evaluated 855 after the model has been trained. After training the model, an intermediate representation of the trained model is generated, and the trained model is tested. The system may select and train multiple models that perform within the desired performance characteristics. The selected models are tested and one of the tested models is selected for deployment to the target platform. In some embodiments, the selected models are evaluated 855 for accuracy and the most accurate model is selected for deployment to the target platform. In some embodiments, if the accuracy of the model is lower than a specified performance, the process advances to step 860, where a new model is generated by the model generator 210 based on the performance or accuracy of the previous model.
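For illustration, the following minimal sketch shows this final selection stage in a plausible form: the shortlisted models are trained, their accuracy is evaluated, and either the most accurate model is chosen for deployment or a new model is requested. The training and evaluation stubs are assumptions made for the example.

```python
# Minimal sketch of the final selection stage: train the shortlisted models,
# evaluate accuracy, and either deploy the most accurate one or fall back to
# generating a new model. The training and evaluation stubs are illustrative
# assumptions.

def train(model):
    # Stand-in for step 850.
    return dict(model, trained=True)


def evaluate_accuracy(model):
    # Stand-in for step 855; in this stub, smaller models score slightly lower.
    return 0.90 - 0.01 * (12 - model["num_layers"])


def select_for_deployment(shortlist, min_accuracy=0.85):
    trained = [train(m) for m in shortlist]
    scored = [(evaluate_accuracy(m), m) for m in trained]
    best_accuracy, best_model = max(scored, key=lambda s: s[0])
    if best_accuracy < min_accuracy:
        return None  # back to step 860: generate a new model
    return best_model


shortlist = [{"num_layers": n, "width": 128} for n in (6, 8, 10)]
print(select_for_deployment(shortlist))
```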


Deployment System Architecture of Machine-Learned Model



FIG. 9 illustrates a deployment system architecture of the machine-learned model in the autonomous control system, in accordance with an embodiment.


The deployment system stores the intermediate representation of the trained model generated by the model generation system 140. The intermediate representation is stored in the storage module 180 (e.g., a hard disk drive or a solid state drive) of the autonomous control system. The storage module 180 further stores code for performing the functions of the virtual machine.


The virtual machine 240 generates machine code 245 from the intermediate representation 235 stored in the storage module 180. The virtual machine 240 generates the machine code 245 using the kernels stored in the storage module 180. The code executor 250 receives the machine code 245 generated by the virtual machine and instructs the embedded processor to execute the set of instructions listed in the machine code 245. The processor, such as the CPU 150 or GPU 160 of the autonomous control system, executes the generated machine code to apply the model to data captured by the sensors 190. For example, the GPU 160 performs the generated machine code using images captured by an imaging sensor of the autonomous control system.
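For illustration, the following minimal sketch shows the deployment path in a plausible form: the stored intermediate representation is lowered to kernel-call instructions by a virtual machine and executed against newly captured sensor data. The class and function names are assumptions made for the example.

```python
# Minimal sketch of the deployment path described above: the stored
# intermediate representation is loaded, lowered to kernel calls by the
# virtual machine, and executed against freshly captured sensor data.
# All class and function names here are illustrative assumptions.
import json


class VirtualMachine:
    def __init__(self, kernel_table):
        self.kernel_table = kernel_table  # kernels stored alongside the IR

    def generate_machine_code(self, ir):
        return [{"call": self.kernel_table[op["op"]],
                 "args": op["inputs"], "dst": op["output"]}
                for op in ir["operations"]]


class CodeExecutor:
    def execute(self, machine_code, inputs):
        # Stand-in for dispatching instructions to the embedded CPU/GPU.
        outputs = dict(inputs)
        for instr in machine_code:
            outputs[instr["dst"]] = f"{instr['call']}({', '.join(instr['args'])})"
        return outputs


ir = {"operations": [{"op": "conv2d", "inputs": ["image", "w0"], "output": "act0"},
                     {"op": "dense", "inputs": ["act0", "w1"], "output": "detections"}]}
vm = VirtualMachine({"conv2d": "conv_gemm_f32", "dense": "gemm_f32"})
machine_code = vm.generate_machine_code(ir)
result = CodeExecutor().execute(machine_code, {"image": "camera_frame_0"})
print(json.dumps(result, indent=2))
```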


CONCLUSION

The foregoing description of the embodiments has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the patent rights to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.


Some portions of this description describe the embodiments in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.


Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.


Embodiments may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.


Embodiments may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may include any embodiment of a computer program product or other data combination described herein.


Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the patent rights. It is therefore intended that the scope of the patent rights be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the patent rights, which is set forth in the following claims.

Claims
  • 1. A method for generating a machine-learned model comprising: generating an untrained model; generating an intermediate representation of the untrained model, the intermediate representation in an intermediate language compatible with a virtual machine; evaluating the performance of the untrained model, wherein evaluating the performance includes at least one of determining a latency in applying the untrained model in a target system, determining a frequency at which the untrained model can be applied in the target system, determining an amount of resources used by the untrained model, and determining an amount of power consumed by the target system using the untrained model; iteratively generating and evaluating new untrained models, a new untrained model generated based on a performance of a previous model; selecting a subset of models based on a performance of the generated new untrained models; training the selected subset of models, thereby generating trained models; evaluating an accuracy for each of the trained models; and selecting a trained model based on the performance evaluation of the trained models for deployment to the target system.
  • 2. The method of claim 1, wherein determining an amount of resources used by the untrained model comprises: determining a number of floating point operations used by the untrained model when implemented with default kernels.
  • 3. The method of claim 1, wherein determining an amount of resources used by the untrained model comprises: determining a number of floating point operations used by the untrained model when implemented with optimized kernels.
  • 4. The method of claim 1, wherein determining an amount of resources used by the untrained model comprises: determining a total amount of memory used by the untrained model; and determining a total memory bandwidth used by the untrained model.
  • 5. The method of claim 1, wherein determining an amount of resources used by the untrained model comprises: determining an amount of memory used by the untrained model after parameters and variables used by the untrained model have been scheduled; and determining a memory bandwidth used by the untrained model after the parameters and variables used by the untrained model have been scheduled and after operations of the untrained model have been scheduled.
  • 6. The method of claim 1, further comprising generating an intermediate representation of the trained models.
  • 7. The method of claim 1, wherein selecting the subset of models comprises selecting a first subset of models that perform with at least a specified performance.
  • 8. The method of claim 4, further comprising reducing a number of untrained models based on heuristics to identify the models to be trained.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/703,837, filed Jul. 26, 2018, which is incorporated by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under SBIR Phase II Grant Award No. 1758546 awarded by the National Science Foundation. The government has certain rights in the invention.

Related Publications (1)
Number Date Country
20230289599 A1 Sep 2023 US
Provisional Applications (1)
Number Date Country
62703837 Jul 2018 US
Divisions (1)
Number Date Country
Parent 16522411 Jul 2019 US
Child 18183515 US