The disclosed implementations relate generally to neural networks, and more specifically to hybrid neural network hardware containing an initial fixed portion (e.g., analog) and a second flexible portion (e.g., digital).
Conventional hardware has failed to keep pace with innovation in neural networks and the growing popularity of machine learning based applications. The complexity of neural networks continues to outpace CPU and GPU computational power as digital microprocessor advances are plateauing. Neuromorphic processors based on spiking neural networks, such as Loihi and TrueNorth, are limited in their applications. For GPU-like architectures, power and speed are limited by data transmission speed. Data transmission can consume up to 80% of chip power and can significantly impact the speed of calculations. Edge applications demand low power consumption, but there are currently no known performant hardware implementations that have the needed low power consumption (e.g., consume less than 50 milliwatts of power).
The neural network training process presents unique challenges for hardware realization of neural networks. A trained neural network is used for specific inferencing tasks, such as classification or regression. Once a neural network is trained, a hardware equivalent is manufactured. When the neural network is retrained, the hardware manufacturing process is repeated, driving up costs. Although some reconfigurable hardware solutions exist, such hardware cannot be easily mass produced, and costs a lot more (e.g., five times more) than hardware that is not reconfigurable. Conventional neuromorphic analog signal processors have fixed weights, which cannot be adjusted after a chip is manufactured.
Accordingly, there is a need for methods, circuits and/or interfaces that address at least some of the deficiencies identified above. Analog circuits that model trained neural networks and are manufactured according to the techniques described herein can provide improved performance per watt, can be useful in implementing hardware solutions in edge environments, and can tackle a variety of applications, such as drone navigation and autonomous cars. The cost advantage provided by these manufacturing methods and/or analog network architectures is even more pronounced with larger neural networks. Also, analog hardware implementations of neural networks provide improved parallelism and neuromorphism. Moreover, neuromorphic analog components are less sensitive to noise and temperature changes than their digital counterparts.
Chips manufactured according to the techniques described herein provide an order of magnitude improvement over conventional systems in size, power, and performance, and are ideal for edge environments, including for retraining purposes. Such analog neuromorphic chips can be used to implement edge computing applications or in Internet-of-Things (IoT) environments. Due to the analog hardware, initial processing (e.g., formation of descriptors for image recognition), which can consume over 80-90% of power, can be moved onto a chip, thereby decreasing energy consumption and network load for new applications.
A hybrid approach to neuromorphic computing is described herein, according to some implementations. Similar to a human brain, an artificial neural network can include a fixed part and a flexible part. The flexible part can be changed for a new classification or regression task. According to some implementations, a hybrid neuromorphic analog signal processor combines (i) a fixed part to support fixed weights and (ii) a flexible part responsible for classification or regression. The flexible part may operate on output produced by the fixed part. The flexible part can change based on updated needs after manufacturing. The flexible part may be implemented as arrays of memristors and/or arrays of SuperFlash memory with some determined architecture.
In machine learning, after several hundred training cycles (sometimes referred to as epochs), a deep convolutional neural network typically maintains fixed weights and structure for the first 80-90% of the layers. In the following cycles, only a few last layers that are responsible for classification or regression continue to change weights. This property is also used in transfer learning. This property may be used for implementing the hybrid architecture described herein. A fixed neural network that is responsible for pattern detection (embeddings) is combined with a following flexible algorithm (e.g., a flexible neural network) that is responsible for pattern interpretation. According to some implementations, a hybrid core includes a fixed neuromorphic analog core that is configured to generate embeddings. This part consumes ultra-low power and provides low latency. The hybrid core also includes a flexible part that can be used for final classification or regression.
In some implementations, a hardware apparatus includes an analog circuit and a classifier or regression circuit. The analog circuit corresponds to a portion of a trained neural network. The analog circuit is configured to obtain one or more analog signals from one or more sensors and compute an analog output based on the one or more analog signals. The classifier or regression circuit is coupled to the analog circuit. The classifier or regression circuit is configured to (1) obtain an input signal based on the analog output and (2) apply a machine learning model to the input signal to either (i) classify the input signal according to a plurality of discrete categories or (ii) assign an output on a predefined continuous scale.
In some implementations, the classifier or regression circuit comprises a digital circuit and the hardware apparatus further includes an analog-to-digital converter (ADC) coupled to the analog circuit. The ADC is configured to receive and convert the analog output to a digital input.
In some implementations, the analog output comprises a set of latent embeddings and the classifier or regression circuit applies the machine learning model to the latent embeddings.
In some implementations, the analog circuit comprises a plurality of operational amplifiers and a plurality of resistors. Resistance values of the plurality of resistors are based on weights of neurons in the portion of the trained neural network. The resistors are configured to connect the plurality of operational amplifiers. In some implementations, the analog circuit comprises sputtered resistors in a backend-of-the-line (BEOL).
In some implementations, the classifier or regression circuit comprises one or more digital computing units selected from the group consisting of: CPUs, GPUs, RISCs, FPGAs, and ASICs.
In some implementations, the classifier or regression circuit comprises a processor that is further configured to perform as a digital controller, providing signals to one or more interfaces and multiplexing power within the hardware apparatus.
In some implementations, the classifier or regression circuit comprises a compute-in-memory component and one or more programmable memory tiles.
In some implementations, the classifier or regression circuit comprises a network of memristors.
In some implementations, the trained neural network is an autoencoder comprising an encoder portion, having a plurality of hidden layers that compute a respective representation of each input vector in a lower dimensional space than an input space of the respective input vector, and a decoder portion that reconstructs the respective input vector. The analog circuit corresponds to the encoder portion and the classifier or regression circuit corresponds to the decoder portion.
In some implementations, the classifier or regression circuit is reconfigurable to train the machine learning model for a new set of inputs that is different from a set of inputs used to train the trained neural network.
In some implementations, the one or more sensors include at least one analog sensor. The analog sensor is a microphone, a piezoelectric sensor, a PPG sensor, an IMU sensor, a chemical sensor, a Lidar sensor, a Radar sensor, or a CMOS matrix sensor.
In some implementations, the analog circuit is configured to generate embeddings that encode types of human activity, and the analog signal comprises three-axis accelerometer signals.
In some implementations, the analog circuit is configured to generate compressed data that encodes vibration sensor data based on vibration features from vibration sensors, and the analog signal comprises three-axis accelerometer signals. In some implementations, the vibration sensors are configured to be placed in machinery, cars, tracks, railway cars, wind turbines, or oil and gas pumps, and the analog signal is obtained wirelessly from the vibration sensors.
In some implementations, the analog circuit is configured to generate embeddings that encode a first set of keywords, and the classifier or regression circuit is configured to be retrained for a second set of keywords that is distinct from the first set of keywords.
In some implementations, the analog circuit is configured to generate pseudo-labels for unlabeled data for self-supervised representation learning.
In another aspect, a method is provided for splitting neural networks. The method includes obtaining a multi-layered neural network that includes a plurality of layers of neurons. The method also includes selecting a set of layers of the multi-layered neural network. The set of layers includes a first layer of neurons and ends with a candidate layer of neurons. The method also includes generating embeddings output by the candidate layer of neurons by inputting a set of input vectors to the multi-layered neural network. The method also includes training a classifier or regressor to classify or regress the embeddings. The method also includes evaluating the classifier or regressor using a test set to determine a performance metric for the classification or regression. The method also includes, in accordance with a determination that the performance metric for the classification is above a predetermined threshold, repeating selecting a new set of layers based on the set of layers, generating new embeddings using the new set of layers, training the classifier or regressor to classify or regress the new embeddings, and evaluating the classifier using the test set, until the performance metric is below the predetermined threshold.
In some implementations, selecting the set of layers and selecting the new set of layers are based on determining if (i) number of operations, (ii) number of neurons, and (iii) dimension of resulting embeddings, are below respective predetermined threshold values.
In some implementations, selecting the set of layers and selecting the new set of layers are based on calculating energy per operation by simulating the multi-layered neural network.
In some implementations, selecting the set of layers and selecting the new set of layers are based on estimating energy per operation based on supply voltage, propagation time and average working current per neuron, for the multi-layered neural network.
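The selection criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed procedure: the `SplitLimits` container, the function names, and all numeric values are hypothetical, and the energy estimate assumes the simple relation E = V x I x t (supply voltage times average working current per neuron times propagation time), which is one plausible reading of the estimate described above.

```python
from dataclasses import dataclass


@dataclass
class SplitLimits:
    """Hypothetical threshold values for a candidate set of layers."""
    max_operations: int
    max_neurons: int
    max_embedding_dim: int


def energy_per_operation(supply_voltage_v, avg_current_a, propagation_time_s):
    """Estimate energy in joules as E = V * I * t (an assumed model)."""
    return supply_voltage_v * avg_current_a * propagation_time_s


def within_limits(operations, neurons, embedding_dim, limits):
    """True only if all three quantities are below their thresholds."""
    return (
        operations < limits.max_operations
        and neurons < limits.max_neurons
        and embedding_dim < limits.max_embedding_dim
    )


# Illustrative numbers only; none of them come from the disclosure.
limits = SplitLimits(max_operations=1_000_000, max_neurons=5_000, max_embedding_dim=64)
feasible = within_limits(operations=250_000, neurons=1_200, embedding_dim=16, limits=limits)
energy_j = energy_per_operation(supply_voltage_v=1.8, avg_current_a=2e-6, propagation_time_s=1e-6)
```

A candidate set of layers that fails `within_limits` can be skipped before any classifier is trained on its embeddings, which keeps the search inexpensive.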
In some implementations, the method further includes repeating the steps for a predetermined number of iterations.
In some implementations, the method further includes using a new classifier for classifying the new embeddings after repeating the steps a predetermined number of iterations.
In some implementations, the plurality of layers of neurons includes the first layer of neurons for receiving inputs, and wherein each layer of neurons of the plurality of layers of neurons is connected to a subsequent layer of neurons of the plurality of layers of neurons.
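The iterative splitting method described above can be sketched as follows. This is a hedged illustration under stated assumptions: the function `search_split`, the callback `score_candidate` (which stands for generating embeddings at the candidate layer, training the classifier or regressor, and evaluating it on the test set), and the default rule of moving the candidate one layer earlier are all hypothetical choices. The disclosure does not specify how the new set of layers is derived from the previous one.

```python
from typing import Callable, Optional


def search_split(
    start: int,
    score_candidate: Callable[[int], float],
    threshold: float,
    next_candidate: Callable[[int], int] = lambda k: k - 1,
    max_iters: Optional[int] = None,
) -> Optional[int]:
    """Return the last candidate layer whose metric stayed above threshold.

    A candidate k means "layers 1..k form the fixed part". Returns None if
    even the starting candidate does not exceed the threshold.
    """
    best = None
    candidate = start
    iterations = 0
    while candidate >= 1:
        # Embeddings -> train classifier/regressor -> evaluate on test set.
        metric = score_candidate(candidate)
        if metric <= threshold:
            break  # performance dropped to or below the threshold: stop
        best = candidate
        iterations += 1
        if max_iters is not None and iterations >= max_iters:
            break  # optional cap on the number of repetitions
        candidate = next_candidate(candidate)
    return best


# Toy metric table standing in for real training runs (made-up values).
toy_accuracy = {5: 0.95, 4: 0.94, 3: 0.91, 2: 0.80, 1: 0.60}
split_layer = search_split(5, toy_accuracy.__getitem__, threshold=0.90)
```

In this toy run the search stops at candidate layer 2, whose accuracy falls below the threshold, and reports layer 3 as the last acceptable split point.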
In some implementations, a computer system has one or more processors and memory. The memory stores one or more programs configured for execution by the one or more processors. The one or more programs include instructions for performing any of the methods described herein.
In some implementations, a non-transitory computer readable storage medium stores one or more programs configured for execution by a computer system having one or more processors and memory. The one or more programs include instructions for performing any of the methods described herein.
Thus, methods, systems, and devices are disclosed that are used for hardware realization of neural networks.
For a better understanding of the aforementioned systems and methods, as well as additional systems and methods for hybrid fixed/flexible implementations of neural networks, reference should be made to the Description of Implementations below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
Reference will now be made to implementations, examples of which are illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without requiring these specific details.
Some implementations realize neural networks in hardware by splitting the neural network into two parts. A first fixed part includes fixed weights and is realized using an analog circuit. In some implementations, the fixed circuit is implemented using a neuromorphic analog signal processor. A second flexible part includes programmable weights. In some implementations, the second flexible part is realized using a digital processor, which may be included as part of the neuromorphic analog signal processor chip or may be an external processor or device. In some implementations, the second flexible part uses arrays of memristors and/or arrays of SuperFlash memory with some determined architecture.
In this way, the advantages of a neuromorphic analog signal processor, such as low latency and high power efficiency, may be combined with the flexibility of the second flexible part.
Described herein are example hardware and techniques for splitting neural networks into the two parts, according to some implementations.
In machine learning, after many training cycles (sometimes referred to as epochs), a deep convolutional neural network model maintains fixed weights and structure for the first 80-90% of the layers. In the following training cycles, weights change only in the last few of the neural network layers (e.g., layers responsible for classification). This property is also used in transfer learning techniques. This property or characteristic of neural networks is a basis for the hybrid hardware described herein. In some implementations, a fixed neural network is responsible for pattern detection (creating a dense set of latent embeddings or descriptors). This is combined with a following flexible algorithm. In some implementations, this algorithm includes an additional flexible neural network responsible for pattern interpretation, according to the nature of the application.
Embeddings (also referred to as latent embeddings or descriptors) are a representation containing densely packed information about sensory input. Embeddings are formed by a neural network similar to a biological nervous system. Embeddings can be found in visual neurobiology. For example, the retina in the eye compresses and encodes visual sensory signals for the visual cortex. The visual cortex is then able to classify and extract meaning for further decision making. Embeddings are formed in hidden layers of a neural network. Embeddings contain significant information about input data. Embeddings are used as input data for further efficient processing, such as data classification and interpretation.
Some implementations include (i) a fixed neuromorphic analog core configured to generate embeddings with ultra-low power and low latency, and (ii) a flexible digital core for final classification or regression.
A large number of machine learning tasks require flexible weights. The fixed part 102 may be implemented using sputtered resistors on the BEOL of the chip. In some implementations, the flexible part 104 is realized using a digital micro-controller unit (MCU) coupled to a neuromorphic analog signal processor (the fixed part). In some implementations, the flexible part is realized using a RISC-V processor, which may be an integral part of the neuromorphic analog signal processor. The flexible part can be a neural network or an algorithm, such as k-nearest neighbors (KNN).
A classification or regression task may be viewed as having two stages. In the first stage, $v = G(x, W_G)$, where $x$ is the input data, $x \in \mathbb{R}^N$ ($\mathbb{R}$ is the set of real numbers and $N$ is the number of dimensions of the input data), $G$ is a neural network for building embeddings, $W_G$ is a set of trainable parameters of $G$, and $v$ is an embedding with $v \in \mathbb{R}^M$ ($M$ is the number of dimensions of the output data). Typically, $M$ is much smaller than $N$. In the second stage, $y = C(v, W_C)$, where $C$ is a classifier, $W_C$ is a set of trainable parameters of $C$, and $y$ is a classification result corresponding to the input vector $x$. In some cases, $C$ is a discrete classifier, assigning one of a set of categorical values to each input vector $x$. In some cases, $C$ computes a regression value based on the set of embeddings $v$, assigning a classification result on a continuous scale. A regression calculator can be considered as a continuous classifier.
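The two stages can be illustrated with a short NumPy sketch. It is an assumption-laden toy, not the disclosed hardware: the single ReLU layer used for G, the random weights, the dimensions N = 128, M = 16, and K = 3 classes, and the function names are all hypothetical. The read-only weight array only mimics the idea that the parameters of G are frozen, while the parameters of C remain free to change.

```python
import numpy as np


def make_fixed_encoder(n_in, m_out, seed=0):
    """Build G with frozen weights W_G (random here, for illustration only)."""
    rng = np.random.default_rng(seed)
    w_g = rng.standard_normal((m_out, n_in)) / np.sqrt(n_in)
    w_g.setflags(write=False)  # frozen, like weights fixed at manufacturing

    def encode(x):
        # First stage: v = G(x, W_G), modeled as one ReLU layer.
        return np.maximum(w_g @ x, 0.0)

    return encode


def classify(v, w_c, b_c):
    """Discrete second stage: index of the largest linear score."""
    return int(np.argmax(w_c @ v + b_c))


def regress(v, w_r, b_r):
    """Continuous second stage: one value on a continuous scale."""
    return float(w_r @ v + b_r)


N, M, K = 128, 16, 3  # hypothetical sizes, with M much smaller than N
encode = make_fixed_encoder(N, M)
rng = np.random.default_rng(1)
x = rng.standard_normal(N)          # input vector x in R^N
v = encode(x)                       # embedding v in R^M
w_c, b_c = rng.standard_normal((K, M)), np.zeros(K)
y = classify(v, w_c, b_c)           # classification result for x
```

Only `w_c` and `b_c` (or `w_r` and `b_r` for regression) would be retrained for a new task; `encode` stays unchanged.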
In machine learning, a large portion (e.g., 80-90%) of the neural network weights are typically unchanged after several epochs of training. Transfer learning can be used to realize such neural networks. For example, a new neural network may be implemented with a fixed part for the large portion of the network and a flexible part (e.g., the remaining 10-20%), which can be trained separately. Some implementations combine fixed parts of neural networks (e.g., using resistors) and flexible parts of network (e.g., realized using a digital processor, either in an MCU of the device, at a RISC-V processor, an FPGA, or a CPU), as part of a neuromorphic analog signal processor chip. In some implementations, the flexible part of a network is implemented using conventional technologies and programming languages used in software development, such as Python, C/C++, assembly code, and specialized frameworks, such as TensorFlow or Torch. The implementation is then run on conventional digital computing units, such as a CPU, a GPU, a RISC processor, or an FPGA, depending on the target device and/or application.
Example Methods for Splitting a Neural Network into Fixed and Flexible Parts
According to some implementations, in the first stage, an autoencoder is trained. The number of training epochs is determined by the reconstruction error calculated between output and input vectors. After training the autoencoder, its encoder is used to transform an input vector onto the representation space. In the second stage, a classifier for a particular task is trained. The space for this task is the same as the space in which the autoencoder was trained. All of the vectors from this task space are transformed onto the representation space by means of the encoder. The classifier is built in this representation space. The classifier processes vectors after transformation by the encoder. The number of training epochs for the classifier is determined by the resulting accuracy. In this way, an encoder is trained once and implemented in a fixed part of the system. A classifier is trained for a particular task and implemented in a flexible part of the system.
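The two training stages can be sketched with NumPy. To keep the example short and deterministic, a linear autoencoder solved in closed form by singular value decomposition is swapped in for the trained autoencoder, and a nearest-centroid rule is swapped in for the task classifier. Both substitutions, the synthetic data, and all names are assumptions made for illustration.

```python
import numpy as np


def fit_linear_encoder(x_train, m):
    """Stage 1 stand-in: linear autoencoder fitted via SVD (PCA)."""
    mean = x_train.mean(axis=0)
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    basis = vt[:m]  # (m, n) projection onto the representation space

    def encode(x):
        return (x - mean) @ basis.T

    def decode(v):
        return v @ basis + mean

    return encode, decode


def fit_centroid_classifier(v_train, labels):
    """Stage 2 stand-in: nearest-centroid classifier on the embeddings."""
    classes = np.unique(labels)
    centroids = np.stack([v_train[labels == c].mean(axis=0) for c in classes])

    def predict(v):
        dist = np.linalg.norm(v[:, None, :] - centroids[None, :, :], axis=2)
        return classes[np.argmin(dist, axis=1)]

    return predict


# Synthetic task data: three well-separated clusters in a 32-D input space.
rng = np.random.default_rng(0)
centers = rng.standard_normal((3, 32)) * 5.0
labels = rng.integers(0, 3, size=300)
x = centers[labels] + 0.1 * rng.standard_normal((300, 32))

encode, decode = fit_linear_encoder(x, m=4)            # trained once (fixed part)
reconstruction_error = float(np.mean((decode(encode(x)) - x) ** 2))
predict = fit_centroid_classifier(encode(x), labels)   # per task (flexible part)
accuracy = float(np.mean(predict(encode(x)) == labels))
```

The encoder is fitted once and never revisited. A classifier for a different task from the same input space would call `fit_centroid_classifier` again on `encode(x)` with the new labels.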
In some implementations, self-supervised representation learning (SSRL) provides deep feature learning without the requirement of large, annotated data sets. In some implementations, a binary classification neural network is trained to compare pairs of input vectors. When both inputs are the same or transformations of the same base vector, the classifier outputs a first class (e.g., “1”). For two fully different input vectors, the classifier outputs a second class (e.g., “0”).
Typically, this type of neural network contains two branches (one for each of two inputs) with shared weights for processing two input vectors. The embeddings obtained at the output layer of these branches are then fed to a classifier for comparing them. These parts are trained in an end-to-end manner. The number of training epochs is determined by a target binary classification error. The branches generating the embeddings play the same role as encoders in the autoencoder described above. Similar to the example method described above for autoencoders, a classifier is separately trained for a particular downstream classification task. The number of training epochs for the classifier is determined by the classification or regression accuracy. A branch generating an embedding is trained once and implemented in a fixed part of the system. A classifier is trained for a particular task and implemented in a flexible part of the system.
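The pair-based training data for such a two-branch network can be sketched as follows. This is a minimal illustration under assumptions: the additive-noise `augment` transformation, the single tanh layer used as the shared branch, and the array sizes are hypothetical, and the comparison classifier and its end-to-end training are omitted.

```python
import numpy as np


def augment(x, rng, noise=0.01):
    """Hypothetical transformation of a base vector: small additive noise."""
    return x + noise * rng.standard_normal(x.shape)


def make_pairs(x, rng):
    """Build (a, b, y): y = 1 for two views of one vector, y = 0 otherwise."""
    n = len(x)
    pos_a, pos_b = augment(x, rng), augment(x, rng)
    # An offset in [1, n-1] guarantees a vector is never paired with itself.
    other = (np.arange(n) + rng.integers(1, n, size=n)) % n
    a = np.concatenate([pos_a, x])
    b = np.concatenate([pos_b, x[other]])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    return a, b, y


def shared_branch(x, w):
    """Both inputs pass through the same weights w (shared branch)."""
    return np.tanh(x @ w)


rng = np.random.default_rng(0)
x = rng.standard_normal((10, 8))        # ten unlabeled 8-D input vectors
a, b, y = make_pairs(x, rng)
w = rng.standard_normal((8, 4))         # shared branch weights
emb_a, emb_b = shared_branch(a, w), shared_branch(b, w)
```

After training, only the shared branch would be kept as the fixed part; `emb_a` and `emb_b` show the embeddings that the comparison classifier would receive.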
In the hybrid process flow 166, on the other hand, the analog signal 162 from the one or more sensors 142 is input into a neuromorphic analog signal processor 154, which may implement the analog circuit described above. The neuromorphic analog signal processor 154 generates embeddings 172 (which may be similar to the embeddings 170). The embeddings are input into a classification or analysis circuit 156 (sometimes referred to as the classifier), which may be similar in functionality to the digital classification module 150. Unlike the standard process flow 168, a majority of the computations use the hybrid hardware 160, including the neuromorphic analog signal processor 154 and the classification or analysis circuit 156. The algorithm based decision module 158 for the hybrid process flow 166 may be similar to the algorithm based decisions module 152 for the standard process flow 168. The neuromorphic analog signal processor 154 has ultra-low power consumption and/or very low latency. The classification and decision making algorithms can be digital. These modules may be implemented with dramatically reduced resources and power consumption, and/or using an ultra-small micro-controller unit (MCU) core. When a neural network is simulated in a digital processor (from a CPU to a GPU and/or a Tensor Processing Unit (TPU)), most resources are used for primary data processing to extract embeddings (see the extraction process 148 in the standard process flow 168). When the input signal is data-rich, primary data processing consumes up to 80% of the capacity or computational resources. Classification and decision making consume fewer resources. Primary data processing is fixed after training. Classification and decision making constantly improve and change in the course of ongoing data accumulation and learning.
The memory 206 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices. In some implementations, the memory includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. In some implementations, the memory 206 includes one or more storage devices remotely located from one or more processing units 112. The memory 206, or alternatively the non-volatile memory within the memory 206, includes a non-transitory computer readable storage medium. In some implementations, the memory 206, or the non-transitory computer readable storage medium of the memory 206, stores the following programs, modules, and data structures, or a subset or superset thereof:
Operations of the modules and the data structures shown and described above in reference to
The flexible part can be realized in hardware using different methods. In some implementations, the flexible part is performed by a RISC-V processor, which may be an integral part of an analog neuromorphic signal processor. In this case, the flexible part can be a digital controller that provides signals to interfaces and multiplexes power signals within the analog neuromorphic signal processor. Some implementations use in-memory computing and/or programmable memory tiles (e.g., flash memory, memristors, or other types of programmable memory). Some implementations use a CPU to perform a neural network for classification or a classification algorithm. The fixed part is typically compute-bound, whereas classification tends to be less resource intensive.
Some implementations separate neural networks into a fixed part and a flexible part, realize the fixed part using resistors for weights, manufacture the resistors on the BEOL, and realize the flexible part at a coupled MCU or a RISC-V. The fixed part and the flexible part are an integral part of a neuromorphic analog signal processor chip, according to some implementations.
Some implementations split a neural network into a fixed part and a flexible part using transfer learning techniques. For example, a convolutional neural network is trained for data classification. The convolutional part calculates feature representations (embeddings) of the input data. These embeddings are further processed by classifiers specifically trained for other classification or regression tasks from the same data space. The weights of the portion of the neural network that produces the embeddings are fixed.
Some implementations train a deep neural network for input data feature representation using an autoencoder, a self-supervised representation learning system, or a generative adversarial network. The weights of the neural network are used to implement a fixed part of the system. Some implementations use transfer learning techniques to train fixed parts of the neural networks and then train the flexible parts. Some implementations generate embeddings using a fixed part of the network. The embeddings are analyzed by a flexible part using algorithms or neural network based analysis.
Some implementations generate embeddings for human activity recognition, based on 3-axis accelerometer signals. Some implementations use an autoencoder neural network. The autoencoder encodes various types of human activity as strings of 16 bytes (embeddings) and then decodes them without loss of accuracy. Some implementations use an analyzer neural network to decode human activities encoded in the embeddings.
In some implementations, the encoder part is implemented using fixed neurons of a neuromorphic analog signal processor chip. The flexible part (the analyzer) is implemented using a digital processor. In some implementations, the digital processor is an external CPU or a RISC-V processor of the neuromorphic analog signal processor chip. In some implementations, the digital processor is used for input, output, and power management. In some implementations, embeddings are obtained using a fixed analog part of the neuromorphic analog signal processor chip. In some cases, this constitutes approximately 90% of the whole workload. In some implementations, activity recognition is performed using a digital analyzer (typically 10% of the workload) implemented using a conventional CPU.
Embeddings generated by an encoder neural network serve an important purpose. For example, if a user practices a new physical activity (e.g., riding a bicycle), a unique descriptor will be formed. The descriptor is likely to be different from other classes of embeddings. In a multi-dimensional space (e.g., a space with 16 dimensions, each dimension corresponding to a different feature), the embedding is likely to be compact and specific for bicycling. Once the user marks this activity as bicycling, that activity can be recognized next time as bicycling. In some implementations, new classes are encoded even if the classes are not present during teaching of the neural network.
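Enrolling a new activity class in the flexible part, without touching the encoder, can be sketched as follows. This is an illustrative assumption rather than the disclosed analyzer: the `CentroidAnalyzer` class, its nearest-centroid rule, and the hand-made 16-dimensional embeddings are hypothetical stand-ins for the output of the fixed encoder.

```python
import numpy as np


class CentroidAnalyzer:
    """Hypothetical flexible part: one centroid per user-labeled activity."""

    def __init__(self):
        self.centroids = {}

    def enroll(self, label, embeddings):
        """Store the mean embedding of examples the user marked with label."""
        self.centroids[label] = np.asarray(embeddings, dtype=float).mean(axis=0)

    def predict(self, embedding):
        """Return the label whose centroid is closest to the embedding."""
        v = np.asarray(embedding, dtype=float)
        return min(self.centroids, key=lambda k: np.linalg.norm(v - self.centroids[k]))


analyzer = CentroidAnalyzer()
# Made-up 16-D embeddings standing in for the fixed encoder's output.
analyzer.enroll("walking", np.zeros((5, 16)))
analyzer.enroll("running", np.full((5, 16), 4.0))
before = analyzer.predict(np.full(16, -3.5))   # closest known class so far

# The user marks a compact, previously unseen cluster as bicycling.
analyzer.enroll("bicycling", np.full((5, 16), -4.0))
after = analyzer.predict(np.full(16, -3.5))
```

Before enrollment the unseen activity falls into the nearest existing class; after the user labels a few embeddings, the same input is recognized as the new class.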
In some implementations, neuromorphic analog signal processors based on the techniques described herein are used for predictive maintenance applications, such as vibration control. Typically, a large amount of data flows from vibration sensors, which are placed in machinery, cars and tracks, railway cars, wind turbines, and oil and gas pumps. The data may be transferred wirelessly to analysis equipment. This large data flow shortens the battery life of battery-operated sensors.
Some implementations compress the data flow from vibration sensors (e.g., by a factor of 1000) using the encoder/decoder techniques described above. The resulting embeddings are transmitted over a Long Range (LoRa) link. An advantage of the techniques described herein is that it is possible to create new classes that describe features of vibration sensors, even if the network was not taught to distinguish such types of features.
Some implementations use an encoder network to obtain fixed weights, which are used to implement a fixed part of a neuromorphic analog signal processor. In this way, it is possible to obtain a whole manifold of different vibration features from different vibration sensors. The different vibration features can then be analyzed by a flexible part digital analyzer, which will recognize cases of malfunction of the machine. Embeddings and encoders have an advantage that the techniques can be applied independently of the type of sensor signal.
Keyword spotting typically requires recognition of different sets of words (e.g., for different languages). A neuromorphic analog signal processor needs to be adaptable. Changing the chip architecture for each new set of words is not practical. Accordingly, some implementations include a fixed part (which performs approximately 90% of the computations) and a flexible part (which performs the remaining approximately 10% of the computations). The fixed part distinguishes between different words from a certain set of data. For other sets of words, the fixed part can generate embeddings. A second flexible network is implemented to distinguish between different sets of words.
According to some implementations, a hardware apparatus includes an analog circuit (e.g., the fixed part 102, which is the circuit comprising the operational amplifiers 120 interconnected using the resistors 118) configured to receive one or more analog signals from one or more sensors, and compute an analog output based on the one or more analog signals, by performing a portion of a neural network. In some implementations, the one or more sensors are integrated (not shown) into the hardware apparatus. For example, the one or more sensors may be connected to the resistors 118 (see
The hardware apparatus also includes a classifier or regression circuit (e.g., the flexible part 104 in
In some implementations, the classifier or regression circuit includes a digital circuit, examples of which are described above in reference to
In some implementations, the analog output represents embeddings and the classifier or regression circuit uses the embeddings to classify or regress the analog output.
In some implementations, the analog circuit includes a plurality of operational amplifiers 120 and a plurality of resistors 118. Resistance values of the plurality of resistors are based on weights of the portion of the trained neural network. The plurality of resistors is configured to connect the plurality of operational amplifiers. In some implementations, the analog circuit includes sputtered resistors formed on the backend-of-the-line (BEOL).
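One way to picture resistance values derived from weights is the ideal op-amp summing stage, where each input contributes in proportion to the ratio of a feedback resistor to its input resistor. The sketch below assumes that mapping (|w| = R_f / R_i), a hypothetical 1 MΩ feedback resistor, and an abstracted treatment of weight signs. The disclosure does not state the actual mapping, so every name and value here is illustrative.

```python
import numpy as np


def weights_to_resistances(weights, r_feedback=1.0e6, min_abs_weight=1e-3):
    """Map each weight to an input resistance via |w| = R_f / R_i (assumed).

    Weights with magnitude below min_abs_weight are treated as open
    circuits (infinite resistance). Signs are returned separately because
    routing of negative weights is abstracted away in this sketch.
    """
    w = np.asarray(weights, dtype=float)
    magnitude = np.abs(w)
    resistances = np.full(w.shape, np.inf)
    keep = magnitude >= min_abs_weight
    resistances[keep] = r_feedback / magnitude[keep]
    return resistances, np.sign(w)


def summing_stage_output(v_in, resistances, signs, r_feedback=1.0e6):
    """Ideal weighted sum: sum_i sign_i * v_i * R_f / R_i."""
    v = np.asarray(v_in, dtype=float)
    return float(np.sum(signs * v * r_feedback / resistances))


weights = [0.5, -2.0, 0.0]
resistances, signs = weights_to_resistances(weights)
v_out = summing_stage_output([1.0, 1.0, 1.0], resistances, signs)
```

With unit inputs, the stage output equals the sum of the weights, which confirms that the resistor values reproduce the neuron's weighted sum under the ideal op-amp assumption.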
In some implementations, the classifier or regression circuit includes one or more digital computing units, such as CPUs, GPUs, RISC processors, FPGAs, and ASICs.
In some implementations, the classifier or regression circuit includes a processor that is further configured to perform as a digital controller that provides signals to one or more interfaces and multiplexes power within the hardware apparatus. For example, the CPU in
In some implementations, the classifier or regression circuit includes a compute-in-memory component and one or more programmable memory tiles.
In some implementations, the classifier or regression circuit includes a network of memristors.
In some implementations, the classifier or regression circuit includes a processor configured to perform a neural network for data classification or regression. The neural network is distinct from the trained neural network. For example, the fixed part implements the first set of layers 108 of the neural network 114, whereas the flexible part implements a classifier that is different from the neural network 114.
In some implementations, the neural network is an autoencoder including an encoder portion and a decoder portion. The encoder portion performs nonlinear transformations in hidden layers. The analog circuit corresponds to the encoder portion of the autoencoder and is configured to compute a representation of the input vector in a lower dimensional space than an input space of the input vector.
In some implementations, the classifier or regression circuit is reconfigurable to train the machine learning model for a new set of inputs that is different from the set of inputs used to train the neural network.
In some implementations, the analog circuit is configured to generate embeddings that encode one or more types of human activity. The one or more analog signals include three-axis accelerometer signals.
In some implementations, the analog circuit is configured to generate compressed data that encodes vibration sensor data based on vibration features from vibration sensors. The one or more analog signals include three-axis accelerometer signals. In some implementations, the vibration sensors are configured to be placed in machinery, cars, tracks, railway cars, wind turbines, or oil and gas pumps, and the one or more analog signals are obtained wirelessly from the vibration sensors.
In some implementations, the analog circuit is configured to generate embeddings that encode a first set of keywords. The classifier or regression circuit is configured to be retrained for a second set of keywords that is distinct from the first set of keywords.
In some implementations, the analog circuit is configured to generate pseudo-labels for unlabeled data for self-supervised representation learning.
Example Method for Splitting Neural Networks into Fixed and Flexible Parts
The method also includes evaluating (410) (e.g., by the evaluation module 226) the classifier or regression model using a test set to determine an accuracy level and/or a performance level. The test set is drawn from a dataset of samples. Each sample is input data. For each sample, there is typically a real-world output, which is considered the ground truth. The dataset typically includes a test set (used for validation) and a larger training set. Any general labeled dataset (e.g., CIFAR, COCO, or ImageNet) can be used. The predetermined threshold specifies a target performance metric (e.g., 95% accuracy for determining whether there is a car in an image). One goal is to optimize for power efficiency and flexibility. Typically, the larger the digital part, the more flexible the system is, but the lower its power efficiency.
The method also includes, when the accuracy level (or performance metric) does not meet a predetermined threshold, repeating (412) (e.g., by the neural network splitting module 214) selecting a new set of initial layers based on the set of layers, generating new embeddings using the new set of initial layers, training the classifier or regression model according to the new embeddings, and evaluating the classifier or regression model using the test set, until the accuracy level or performance metric meets the predetermined threshold.
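The repeat-until-threshold loop described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `find_split`, the callbacks `embed`, `train`, and `evaluate`, and the caller-supplied order of candidate prefix lengths are all hypothetical and are not taken from the described modules.

```python
def find_split(candidate_sizes, layers, embed, train, evaluate, threshold):
    """Try candidate sets of initial layers until the metric meets the threshold.

    candidate_sizes: prefix lengths to try, in the caller's preferred order.
    embed(initial_layers) -> embeddings   (hypothetical callback)
    train(embeddings)     -> model        (hypothetical callback)
    evaluate(model)       -> score on the test set (hypothetical callback)
    Returns (k, model, score) for the first candidate that meets the
    threshold, or None if no candidate does.
    """
    for k in candidate_sizes:
        initial_layers = layers[:k]          # select a set of initial layers
        embeddings = embed(initial_layers)   # generate embeddings with them
        model = train(embeddings)            # train classifier/regression model
        score = evaluate(model)              # evaluate using the test set
        if score >= threshold:
            return k, model, score
    return None
```

Passing the candidate order in from the caller keeps the sketch neutral about whether the search starts from the largest or the smallest fixed part.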
In some implementations, the neural network splitting module 214 reduces the number of layers for the analog part, provided the classification or regression performance metric is above a threshold. A goal is for the analog part to be as large as possible, and the flexible part to be as small as possible, while providing the flexibility to classify inputs for a given domain or a given application.
In some implementations, selecting the set of initial layers and selecting the new set of initial layers are based on determining if (i) the number of operations, (ii) the number of neurons, and (iii) the dimension of resulting embeddings, are below respective predetermined threshold values.
In some implementations, selecting the set of initial layers and selecting the new set of initial layers are based on calculating energy per operation by simulating the neural network. Commercial software, such as Cadence Virtuoso, may be used for the simulation. A classifier cannot be smaller than some predetermined size, so the analog or fixed part has to be at least some size. Typically, the smaller the classifier, the smaller the set of distinct classes that can be classified.
In some implementations, selecting the set of initial layers and selecting the new set of initial layers are based on estimating energy per operation based on supply voltage, propagation time, and average working current per neuron, for the neural network.
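One simple way to form such an estimate is to treat energy as power multiplied by time, so that the energy for one neuron evaluation is roughly the supply voltage times the average working current times the propagation time. The sketch below follows that assumption. The function name and the optional `ops_per_neuron` divisor (for converting a per-neuron figure to a per-operation figure) are hypothetical and are not specified in the source.

```python
def energy_per_operation(supply_voltage_v, propagation_time_s,
                         avg_current_a, ops_per_neuron=1):
    """Estimate energy per operation in joules as V * I * t / ops_per_neuron.

    Assumes constant supply voltage and average current over the
    propagation time (a simplification for illustration).
    """
    if ops_per_neuron <= 0:
        raise ValueError("ops_per_neuron must be positive")
    energy_per_neuron = supply_voltage_v * avg_current_a * propagation_time_s
    return energy_per_neuron / ops_per_neuron
```

For example, with hypothetical values of 1 V, 5 microamperes, and 2 microseconds, the estimate is about 10 picojoules per neuron evaluation.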
In some implementations, the method further includes repeating the steps for a predetermined number of iterations (e.g., 5 times).
In some implementations, the method further includes using a new classifier for classifying the new embeddings after repeating the steps a predetermined number of iterations.
In some implementations, the plurality of layers of neurons includes the first layer of neurons for receiving inputs. Each layer of neurons of the plurality of layers of neurons is connected to a subsequent layer of neurons of the plurality of layers of neurons.
Example Application of Method for Splitting Neural Networks into Fixed and Flexible Parts
When selecting layers for the fixed part, some implementations take into account not only the performance metric (e.g., accuracy) of the classifier or regression model, but also the complexity of the fixed part (e.g., the number of operations, and/or the number of neurons), as well as the dimension of the embeddings.
Suppose a multi-layered neural network has 5 layers and processes input data samples of length 10. Also, suppose that the first layer includes 10 neurons, the second layer includes 20 neurons, the third layer includes 15 neurons, the fourth layer includes 10 neurons, and the last layer includes 5 neurons. Further suppose that a particular task has the following requirements: the complexity of the fixed part has to be less than 650 operations, the number of neurons has to be less than 50 neurons, and the embedding dimensionality has to be less than or equal to 20. If four layers of this neural network are selected, then they produce embeddings of dimension 10 (the number of neurons in the fourth layer). The computational complexity of these four layers is defined by the number of operations to execute in order to calculate the embedding: (10 times 10) plus (10 times 20) plus (20 times 15) plus (15 times 10) results in 750 operations. The number of neurons is the sum of neurons in the selected layers, which is 10 plus 20 plus 15 plus 10, which is 55 neurons. These parameters do not meet the requirements: the number of operations (750) is not less than 650, and the number of neurons (55) is not less than 50. So the method removes the fourth layer from the selection. For the remaining three layers, the complexity is (10 times 10) plus (10 times 20) plus (20 times 15), which is 600 operations. The number of neurons is 10 plus 20 plus 15, which is 45, and the embedding dimension is 15. These parameters satisfy the requirements, so the three layers are considered for the fixed part. The method includes generating embeddings for input data using these three layers and training a classifier for these embeddings. Suppose the accuracy of the classifier after training is 98%.
Subsequently, the method removes the third layer and checks the requirements for the first two layers. The complexity is now (10 times 10) plus (10 times 20), which is 300 operations. The number of neurons is 10 plus 20, which is 30, and the embedding dimension is 20. These parameters also satisfy the requirements, so these two layers are considered for the fixed part. The method includes generating embeddings for input data using these two layers as candidate layers for splitting, and training a classifier on these embeddings. Now suppose the accuracy of the second classifier is 95%. The classification accuracy for the case where three layers are selected is higher than for the case where two layers are selected, so the method selects three layers for the fixed part (i.e., the first three layers).
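The arithmetic in this example can be reproduced with a short sketch. The helper names `prefix_stats` and `meets_requirements` are hypothetical, and the accuracy values are the supposed figures from the example rather than measured results.

```python
def prefix_stats(input_len, layer_sizes, k):
    """Return (operations, neurons, embedding_dim) for the first k layers."""
    sizes = [input_len] + list(layer_sizes[:k])
    # One multiply per connection between consecutive fully connected layers.
    ops = sum(a * b for a, b in zip(sizes, sizes[1:]))
    neurons = sum(layer_sizes[:k])
    return ops, neurons, layer_sizes[k - 1]


def meets_requirements(stats, max_ops=650, max_neurons=50, max_dim=20):
    """Check the example's limits: ops < 650, neurons < 50, dim <= 20."""
    ops, neurons, dim = stats
    return ops < max_ops and neurons < max_neurons and dim <= max_dim


LAYER_SIZES = [10, 20, 15, 10, 5]
ASSUMED_ACCURACY = {3: 0.98, 2: 0.95}  # supposed values from the example

# Keep only the candidate prefixes that satisfy the requirements.
feasible = [k for k in (4, 3, 2)
            if meets_requirements(prefix_stats(10, LAYER_SIZES, k))]
# Among feasible candidates, pick the one with the highest assumed accuracy.
best = max(feasible, key=lambda k: ASSUMED_ACCURACY[k])
```

Here the four-layer candidate is rejected (750 operations, 55 neurons), the three-layer and two-layer candidates are feasible, and `best` is 3, matching the selection described above.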
In some implementations, there are restrictions on the configuration of the analog part. Accordingly, some implementations select candidate layers for the analog part in order to satisfy these restrictions and maximize the classification accuracy of the classifier that classifies the embeddings. These restrictions may include energy consumption, the number of neurons in the analog part, and the embedding dimension. Meeting these restrictions depends on the parameters (the number of operations, the number of neurons, and the number of neurons in the last layer) of the layers selected for the fixed part.
The terminology used in the description of the invention herein is for the purpose of describing particular implementations only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.
The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various implementations with various modifications as are suited to the particular use contemplated.
This application is a continuation-in-part of U.S. application Ser. No. 17/189,109, filed Mar. 1, 2021, entitled “Analog Hardware Realization of Neural Networks,” which is a continuation of PCT Application No. PCT/RU2020/000306, filed Jun. 25, 2020, entitled “Analog Hardware Realization of Neural Networks,” each of which is incorporated by reference herein in its entirety. U.S. application Ser. No. 17/189,109 is also a continuation-in-part of PCT Application PCT/EP2020/067800, filed Jun. 25, 2020, entitled “Analog Hardware Realization of Neural Networks,” which is incorporated by reference herein in its entirety.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/RU2020/000306 | Jun 2020 | US
Child | 17189109 | | US

Relation | Number | Date | Country
---|---|---|---
Parent | 17189109 | Mar 2021 | US
Child | 18196412 | | US
Parent | PCT/EP2020/067800 | Jun 2020 | US
Child | PCT/RU2020/000306 | | US