The disclosed implementations relate generally to neural networks, and more specifically to systems and methods for hardware realization of trained neural networks for sound signal processing, classification, and enhancement.
Conventional hardware has failed to keep pace with innovation in neural networks and the growing popularity of machine learning based applications. The complexity of neural networks continues to outpace CPU and GPU computational power as advances in digital microprocessors plateau. Neuromorphic processors based on spiking neural networks, such as Loihi and TrueNorth, are limited in their applications. For GPU-like architectures, power consumption and speed are limited by data transmission: data transmission can consume up to 80% of chip power and can significantly slow calculations. Edge applications demand low power consumption, but there are currently no known performant hardware implementations that consume less than 50 milliwatts of power.
Memristor-based architectures that use cross-bar technology remain impractical for manufacturing recurrent and feed-forward neural networks. Memristor-based cross-bars suffer from high latency and current leakage during operation, and there are reliability issues in manufacturing them, especially when neural networks have both negative and positive weights. For large, high-dimensional neural networks with many neurons, memristor-based cross-bars cannot be used for simultaneous propagation of different signals, which in turn complicates summation of signals when neurons are represented by operational amplifiers. Furthermore, memristor-based analog integrated circuits have a number of limitations, such as a small number of resistive states, first-cycle problems when forming memristors, complexity of channel formation when training the memristors, unpredictable dependency on the dimensions of the memristors, slow operation, and drift of resistance state.
Additionally, the training process required for neural networks presents unique challenges for their hardware realization. A trained neural network is used for specific inferencing tasks, such as classification. Once a neural network is trained, a hardware equivalent is manufactured; when the neural network is retrained, the hardware manufacturing process must be repeated, driving up costs. Although some reconfigurable hardware solutions exist, such hardware cannot be easily mass produced and costs substantially more (e.g., five times more) than non-reconfigurable hardware. Further, edge environments, such as smart-home applications, rarely require re-programmability: 85% of all applications of neural networks do not require any retraining during operation, so on-chip learning provides limited benefit. Furthermore, edge applications involve noisy environments, which can make reprogrammable hardware unreliable.
Voice transmissions make up the majority of communications between humans and with human-machine interfaces, substantially surpassing video and typed communications. Clarity of voice must be maintained while voice signals are compressed or digitized for transmission. Traditionally, various noise suppression and noise filtering methods and apparatuses process unclear voice signals and remove at least some of the unwanted noise. Some conventional techniques use microphones that capture ambient noise and generate sounds that effectively cancel the unwanted noises detected around a listener. Such techniques are most prevalent in headphones, specifically noise-cancelling headphones. Other techniques suppress certain noises based on the spectral qualities of specific noise sources, or use more elaborate algorithms, such as Markov processes, Fast Fourier Transform methods, and various noise-detecting adaptive algorithms.
More recently, neural networks have been used to analyze signals containing a mix of voice and noise and to extract mostly voice-containing signals based on features specific to voice. Such neural networks need to be trained and are implemented substantially as programs running on powerful computers that consume substantial electric and computing power. Conventional solutions are often limited by their training data and feature sets, fail to provide real-time processing, and are limited to processing specific recorded voice signals. Currently, voice communications are predominantly performed via cellular or land-line phones, and conventional equipment lacks the computing power and/or electrical power to effectively process voice signals and suppress unwanted noises. Even with sophisticated noise-cancelling technologies, the types of noises that can be effectively suppressed are substantially limited. Unwanted disturbances, such as dog barks, door slams, emergency sirens, and car honks, are common and unpredictable interferences that remain background noise for the purpose of transmitting clear voice signals.
Non-voice noises or signals can originate in the vicinity of a speaker, near the microphone, or at another device used to transform sound into electrical signals. Such noises are generally referred to as background noises at origin. For such noise signals, any background conversations, or voices of persons farther from the microphone, could be considered noise. Other non-voice noises can originate during processing and transmission of the signals, such as compression, analog-to-digital conversion, spectrum limitation, or breakdown into packets limited by length, spectrum, or information size. Such noises occur during transmission, as well as during the corresponding reverse steps. When several voice/noise signals are mixed together, such as in conference calls or multi-person communications, the noises associated with each signal are mixed as well, further complicating the task of voice clarification. In addition, when the voice signals are further processed to produce the actual sound near the ear of the recipient (e.g., for human-to-human communications), via speakers, headphones, or other apparatuses, further noises or unwanted signals may be introduced by the ambient environment near the recipient. Although voice commands have become popular, wearable devices lack advanced sound processing capabilities. Conventional devices need an Internet connection and have power limitations, and there are security concerns with Internet-connected voice processing devices.
Accordingly, there is a need for methods, circuits, and/or interfaces that address at least some of the deficiencies identified above. Analog circuits that model trained neural networks, manufactured according to the techniques described herein, provide improved performance per watt, are useful for implementing hardware solutions in edge environments, and can tackle a variety of applications, such as drone navigation and autonomous cars. The cost advantages provided by the proposed manufacturing methods and/or analog network architectures are even more pronounced with larger neural networks. Analog hardware implementations of neural networks also provide improved parallelism and neuromorphism, and neuromorphic analog components are less sensitive to noise and temperature changes than their digital counterparts. A neurovoice processor is described herein, according to some implementations. The processor consumes little power (e.g., 200 microwatts or less) and can work locally (without the Internet), thereby providing privacy.
Chips manufactured according to the techniques described herein provide order-of-magnitude improvements over conventional systems in size, power, and performance, and are ideal for edge environments, including for retraining purposes. Such analog neuromorphic chips can be used to implement edge computing applications or operate in Internet-of-Things (IoT) environments. Because of the analog hardware, initial processing (e.g., formation of descriptors for image recognition), which can consume 80-90% of power, can be moved onto the chip, decreasing energy consumption and network load, which can open new markets for applications.
Various edge applications can benefit from such analog hardware. For video processing, the techniques described herein enable direct connection to CMOS sensors without a digital interface. Other video processing applications include road sign recognition for automobiles, camera-based true depth and/or simultaneous localization and mapping for robots, room access control without a server connection, and always-on solutions for security and healthcare. Such chips can be used for processing data from radars and lidars, and for low-level data fusion. The techniques can also be used to implement battery management features for large battery packs, sound/voice processing without connection to data centers, voice recognition on mobile devices, wake-up speech instructions for IoT sensors, translators that translate one language to another, large IoT sensor arrays with low signal intensity, and/or configurable process controls with hundreds of sensors.
Neuromorphic analog chips can be mass produced after standard software-based neural network simulation and training, according to some implementations. A client's neural network can be easily ported, regardless of its structure, with customized chip design and production. Moreover, a library of ready-to-make on-chip solutions (network emulators) is provided, according to some implementations. Such solutions require only training and a single lithographic mask change, after which chips can be mass produced. For example, during chip production, only part of the lithography mask set needs to be changed.
The techniques described herein can be used to design and/or manufacture an analog neuromorphic integrated circuit that is mathematically equivalent to a trained neural network (either feed-forward or recurrent). According to some implementations, the process begins with a trained neural network that is first converted into a transformed network comprised of standard elements. Operation of the transformed network is simulated using software with known models representing the standard elements. The software simulation is used to determine the individual resistance values for each of the resistors in the transformed network. Lithography masks are laid out based on the arrangement of the standard elements in the transformed network. Each of the standard elements is laid out in the masks using an existing library of circuits corresponding to the standard elements, to simplify and speed up the process. In some implementations, the resistors are laid out in one or more masks separate from the masks containing the other elements (e.g., operational amplifiers) of the transformed network. In this manner, if the neural network is retrained, only the masks containing the resistors, or other fixed-resistance elements, representing the new weights of the retrained neural network need to be regenerated, which simplifies and speeds up the process. The lithography masks are then sent to a fab for manufacturing the analog neuromorphic integrated circuit.
In one aspect, a method is provided for hardware realization of neural networks, according to some implementations. The method includes obtaining a neural network topology and weights of a trained neural network. The method also includes transforming the neural network topology to an equivalent analog network of analog components. The method also includes computing a weight matrix for the equivalent analog network based on the weights of the trained neural network. Each element of the weight matrix represents a respective connection between analog components of the equivalent analog network. The method also includes generating a schematic model for implementing the equivalent analog network based on the weight matrix, including selecting component values for the analog components.
In some implementations, generating the schematic model includes generating a resistance matrix for the weight matrix. Each element of the resistance matrix corresponds to a respective weight of the weight matrix and represents a resistance value.
In some implementations, the method further includes obtaining new weights for the trained neural network, computing a new weight matrix for the equivalent analog network based on the new weights, and generating a new resistance matrix for the new weight matrix.
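This flow, from trained weights to a weight matrix to a resistance matrix, can be sketched compactly in code. The sketch below is a minimal illustration, assuming each signed weight is realized by a pair of resistors against a fixed feedback resistance r_plus (so that w ≈ r_plus/R1 − r_plus/R2); the function names are illustrative, not part of any described library.

```python
import numpy as np

def compute_weight_matrix(trained_weights, scale=1.0):
    """Map trained per-layer weights onto connections of the equivalent
    analog network (identity mapping plus optional scaling into the
    analog operating range)."""
    return [scale * np.asarray(W, dtype=float) for W in trained_weights]

def weight_to_resistances(W, r_plus=1e5):
    """Encode each signed weight as a resistor pair (R1, R2) such that
    w ~ r_plus/R1 - r_plus/R2; np.inf marks an absent resistor."""
    with np.errstate(divide="ignore"):
        R1 = np.where(W > 0, r_plus / np.abs(W), np.inf)
        R2 = np.where(W < 0, r_plus / np.abs(W), np.inf)
    return R1, R2

# Retraining changes only the weights, so only the resistance matrices
# (and hence only the resistor masks) need to be regenerated:
for W in compute_weight_matrix([np.array([[0.5, -1.2], [2.0, 0.0]])]):
    R1, R2 = weight_to_resistances(W)
    print(R1, R2, sep="\n")
```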
In some implementations, the neural network topology includes one or more layers of neurons, each layer of neurons computing respective outputs based on a respective mathematical function, and transforming the neural network topology to the equivalent analog network of analog components includes: for each layer of the one or more layers of neurons: (i) identifying one or more function blocks, based on the respective mathematical function, for the respective layer. Each function block has a respective schematic implementation with block outputs that conform to outputs of a respective mathematical function; and (ii) generating a respective multi-layer network of analog neurons based on arranging the one or more function blocks. Each analog neuron implements a respective function of the one or more function blocks, and each analog neuron of a first layer of the multi-layer network is connected to one or more analog neurons of a second layer of the multi-layer network.
In some implementations, the one or more function blocks include one or more basic function blocks selected from the group consisting of: (i) a weighted summation block with a block output Vout = ReLU(Σ wi·Vi + bias), where ReLU is the Rectified Linear Unit activation function or a similar activation function, Vi represents an i-th input, wi represents a weight corresponding to the i-th input, bias represents a bias value, and Σ is a summation operator; (ii) a signal multiplier block with a block output Vout = coeff·Vi·Vj, where Vi represents an i-th input, Vj represents a j-th input, and coeff is a predetermined coefficient; (iii) a sigmoid activation block with a block output Vout = A / (1 + e^(−B·Vin)), where Vin represents an input, and A and B are predetermined coefficient values of the sigmoid activation block; (iv) a hyperbolic tangent activation block with a block output Vout = A·tanh(B·Vin), where Vin represents an input, and A and B are predetermined coefficient values; and (v) a signal delay block with a block output U(t) = V(t − dt), where t represents a current time period, V represents the block input at a preceding time period t − dt, and dt is a delay value.
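The input-output behavior of these basic function blocks can be stated directly in code, which is convenient for the software simulation step described above. A minimal sketch, modeling the blocks at the mathematical level only (no op-amp circuit detail) and assuming the sigmoid form given above:

```python
import numpy as np

def weighted_sum_block(v, w, bias):
    """Weighted summation block: Vout = ReLU(sum_i w_i*V_i + bias)."""
    return np.maximum(0.0, np.dot(w, v) + bias)

def multiplier_block(v_i, v_j, coeff):
    """Signal multiplier block: Vout = coeff*V_i*V_j."""
    return coeff * v_i * v_j

def sigmoid_block(v_in, A, B):
    """Sigmoid activation block: Vout = A / (1 + exp(-B*V_in))."""
    return A / (1.0 + np.exp(-B * v_in))

def tanh_block(v_in, A, B):
    """Hyperbolic tangent activation block: Vout = A*tanh(B*V_in)."""
    return A * np.tanh(B * v_in)

class DelayBlock:
    """Signal delay block: U(t) = V(t - dt), one time step of storage."""
    def __init__(self, initial=0.0):
        self.state = initial

    def step(self, v_in):
        out, self.state = self.state, v_in
        return out

print(weighted_sum_block(np.array([0.2, -0.4]), np.array([1.0, 2.0]), 0.5))
```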
In some implementations, identifying the one or more function blocks includes selecting the one or more function blocks based on a type of the respective layer.
In some implementations, the neural network topology includes one or more layers of neurons, each layer of neurons computing respective outputs based on a respective mathematical function, and transforming the neural network topology to the equivalent analog network of analog components includes: (i) decomposing a first layer of the neural network topology to a plurality of sub-layers, including decomposing a mathematical function corresponding to the first layer to obtain one or more intermediate mathematical functions. Each sub-layer implements an intermediate mathematical function; and (ii) for each sub-layer of the first layer of the neural network topology: (a) selecting one or more sub-function blocks, based on a respective intermediate mathematical function, for the respective sub-layer; and (b) generating a respective multilayer analog sub-network of analog neurons based on arranging the one or more sub-function blocks. Each analog neuron implements a respective function of the one or more sub-function blocks, and each analog neuron of a first layer of the multilayer analog sub-network is connected to one or more analog neurons of a second layer of the multilayer analog sub-network.
In some implementations, the mathematical function corresponding to the first layer includes one or more weights, and decomposing the mathematical function includes adjusting the one or more weights such that combining the one or more intermediate functions results in the mathematical function.
In some implementations, the method further includes: (i) generating an equivalent digital network of digital components for one or more output layers of the neural network topology; and (ii) connecting the output of one or more layers of the equivalent analog network to the equivalent digital network of digital components.
In some implementations, the analog components include a plurality of operational amplifiers and a plurality of resistors, each operational amplifier represents an analog neuron of the equivalent analog network, and each resistor represents a connection between two analog neurons.
In some implementations, selecting component values of the analog components includes performing a gradient descent method to identify possible resistance values for the plurality of resistors.
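A simple instance of this: relax each weight into a nonnegative pair (w1, w2) with w = w1 − w2, run gradient descent on the pair, and then snap the implied resistances r_f/w1 and r_f/w2 to the nearest values of a manufacturable series. The sketch below assumes weights are realized as r_f/R1 − r_f/R2 and uses the E24 series mentioned later in this section; the specific loss and learning rate are illustrative.

```python
import numpy as np

E24 = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4, 2.7, 3.0,
       3.3, 3.6, 3.9, 4.3, 4.7, 5.1, 5.6, 6.2, 6.8, 7.5, 8.2, 9.1]

def e24_grid(r_min=1e5, r_max=1e6):
    """All E24 nominal resistances within [r_min, r_max]."""
    decades = 10.0 ** np.arange(np.floor(np.log10(r_min)),
                                np.ceil(np.log10(r_max)) + 1)
    grid = np.unique(np.outer(decades, E24).ravel())
    return grid[(grid >= r_min) & (grid <= r_max)]

def fit_resistor_pairs(w, r_f=1e5, steps=500, lr=0.1):
    """Gradient descent on the dimensionless pair (w1, w2), with
    w1, w2 >= 0 and w ~ w1 - w2 = r_f/R1 - r_f/R2, followed by
    snapping R = r_f/w to the nearest manufacturable value."""
    w = np.asarray(w, dtype=float)
    w1 = np.maximum(w, 0.0) + 0.1          # rough feasible initialization
    w2 = w1 - w
    for _ in range(steps):
        err = (w1 - w2) - w                # gradient of 0.5*err**2
        w1 = np.maximum(w1 - lr * err, 0.0)
        w2 = np.maximum(w2 + lr * err, 0.0)
    grid = e24_grid()

    def snap(r):
        idx = np.argmin(np.abs(grid[None, :] - r.ravel()[:, None]), axis=1)
        return grid[idx].reshape(r.shape)

    floor = r_f / grid[-1]                 # keeps R within the grid range
    return snap(r_f / np.maximum(w1, floor)), snap(r_f / np.maximum(w2, floor))

R1, R2 = fit_resistor_pairs(np.array([[0.8, -0.3], [1.5, 0.0]]))
print(R1)
print(R2)
```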
In some implementations, the neural network topology includes one or more GRU or LSTM neurons, and transforming the neural network topology includes generating one or more signal delay blocks for each recurrent connection of the one or more GRU or LSTM neurons.
In some implementations, the one or more signal delay blocks are activated at a frequency that matches a predetermined input signal frequency for the neural network topology.
In some implementations, the neural network topology includes one or more layers of neurons that perform unlimited (unbounded) activation functions, and transforming the neural network topology includes applying one or more transformations selected from the group consisting of: (i) replacing the unlimited activation functions with limited activation functions; and (ii) adjusting connections or weights of the equivalent analog network such that, for predetermined one or more inputs, the difference in output between the trained neural network and the equivalent analog network is minimized.
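As an illustration of the first transformation, an unbounded ReLU can be replaced with a hard-clipped variant, and the output mismatch on predetermined inputs can then be measured and minimized. The clip level of 5 matches the "signal limit of 5" that appears in implementations described later; the toy one-layer model is an assumption.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def clipped_relu(x, limit=5.0):
    """Limited activation: output never exceeds the analog signal limit."""
    return np.minimum(relu(x), limit)

def output_mismatch(W, inputs, limit=5.0):
    """Mean absolute difference between the unlimited and limited layers
    on predetermined inputs; weight rescaling can then be tuned to
    minimize this quantity."""
    return np.mean(np.abs(relu(inputs @ W) - clipped_relu(inputs @ W, limit)))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
X = rng.normal(size=(100, 8))
print(output_mismatch(W, X))
```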
In some implementations, the method further includes generating one or more lithographic masks for fabricating a circuit implementing the equivalent analog network of analog components based on the resistance matrix.
In some implementations, the method further includes: (i) obtaining new weights for the trained neural network; (ii) computing a new weight matrix for the equivalent analog network based on the new weights; (iii) generating a new resistance matrix for the new weight matrix; and (iv) generating a new lithographic mask for fabricating the circuit implementing the equivalent analog network of analog components based on the new resistance matrix.
In some implementations, the trained neural network is trained using software simulations to generate the weights.
In another aspect, a method for hardware realization of neural networks is provided, according to some implementations. The method includes obtaining a neural network topology and weights of a trained neural network. The method also includes calculating one or more connection constraints based on analog integrated circuit (IC) design constraints. The method also includes transforming the neural network topology to an equivalent sparsely connected network of analog components satisfying the one or more connection constraints. The method also includes computing a weight matrix for the equivalent sparsely connected network based on the weights of the trained neural network. Each element of the weight matrix represents a respective connection between analog components of the equivalent sparsely connected network.
In some implementations, transforming the neural network topology to the equivalent sparsely connected network of analog components includes deriving a possible input connection degree Ni and output connection degree No, according to the one or more connection constraints.
In some implementations, the neural network topology includes at least one densely connected layer with K inputs and L outputs and a weight matrix U. In such cases, transforming the at least one densely connected layer includes constructing the equivalent sparsely connected network with K inputs, L outputs, and ⌈logNi K⌉ layers.
In some implementations, the neural network topology includes at least one densely connected layer with K inputs and L outputs and a weight matrix U. In such cases, transforming the at least one densely connected layer includes constructing the equivalent sparsely connected network with K inputs, L outputs, and M ≥ max(⌈logNi K⌉, ⌈logNo L⌉) layers.
In some implementations, the neural network topology includes a single sparsely connected layer with K inputs and L outputs, a maximum input connection degree of Pi, a maximum output connection degree of Po, and a weight matrix U, where absent connections are represented with zeros. In such cases, transforming the single sparsely connected layer includes constructing the equivalent sparsely connected network with K inputs, L outputs, and M ≥ max(⌈logNi Pi⌉, ⌈logNo Po⌉) layers.
In some implementations, the neural network topology includes a convolutional layer with K inputs and L outputs. In such cases, transforming the neural network topology to the equivalent sparsely connected network of analog components includes decomposing the convolutional layer into a single sparsely connected layer with K inputs, L outputs, a maximum input connection degree of Pi, and a maximum output connection degree of Po, where Pi ≤ Ni and Po ≤ No.
In some implementations, the method further includes generating a schematic model for implementing the equivalent sparsely connected network utilizing the weight matrix.
In some implementations, the neural network topology includes a recurrent neural layer. In such cases, transforming the neural network topology to the equivalent sparsely connected network of analog components includes transforming the recurrent neural layer into one or more densely or sparsely connected layers with signal delay connections.
In some implementations, the neural network topology includes a recurrent neural layer. In such cases, transforming the neural network topology to the equivalent sparsely connected network of analog components includes decomposing the recurrent neural layer into several layers, where at least one of the layers is equivalent to a densely or sparsely connected layer with K inputs, L outputs, and a weight matrix U, where absent connections are represented with zeros.
In some implementations, the neural network topology includes K inputs, a weight vector U ∈ R^K, and a single layer perceptron with a calculation neuron having an activation function F. In such cases, transforming the neural network topology to the equivalent sparsely connected network of analog components includes: (i) deriving a connection degree N for the equivalent sparsely connected network according to the one or more connection constraints; (ii) calculating a number of layers m for the equivalent sparsely connected network using the equation m = ⌈logN K⌉; and (iii) constructing the equivalent sparsely connected network with the K inputs, the m layers, and the connection degree N. The equivalent sparsely connected network includes one or more respective analog neurons in each layer of the m layers; each analog neuron of the first m−1 layers implements an identity transform, and an analog neuron of the last layer implements the activation function F of the calculation neuron of the single layer perceptron. Also, in such cases, computing the weight matrix for the equivalent sparsely connected network includes calculating a weight vector W for connections of the equivalent sparsely connected network by solving a system of equations based on the weight vector U. The system of equations includes K equations with S variables, and S is computed using the equation S = Σi=1,m ⌈K/N^(i−1)⌉.
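A sketch of this pyramid construction: compute the layer widths for K inputs and connection degree N, count the S connection weights, and take the simplest solution of the underdetermined system by placing the target weights U on the first layer and setting every deeper connection weight to 1, so the identity-transform neurons just accumulate partial sums. The list-based representation of the network is an illustrative choice.

```python
import math

def pyramid(K, N):
    """Layer widths of a pyramid network with K inputs and connection
    degree N; the number of layers equals ceil(log_N(K))."""
    sizes, width = [], K
    while width > 1:
        width = math.ceil(width / N)   # each analog neuron merges <= N signals
        sizes.append(width)
    return len(sizes), sizes

def connection_count(K, N, m):
    """S = sum_{i=1..m} ceil(K / N^(i-1)): K equations in S variables."""
    return sum(math.ceil(K / N ** (i - 1)) for i in range(1, m + 1))

def pyramid_weights(U, N):
    """One solution of the underdetermined system: first-layer weights
    carry U, every deeper connection weight is 1."""
    m, sizes = pyramid(len(U), N)
    layers, width = [list(U)], sizes[0]
    for s in sizes[1:]:
        layers.append([1.0] * width)
        width = s
    return layers

m, sizes = pyramid(K=100, N=4)         # m = 4, widths [25, 7, 2, 1]
print(m, sizes, connection_count(100, 4, m))
```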
In some implementations, the neural network topology includes K inputs, a single layer perceptron with L calculation neurons, and a weight matrix V that includes a row of weights for each calculation neuron of the L calculation neurons. In such cases, transforming the neural network topology to the equivalent sparsely connected network of analog components includes: (i) deriving a connection degree N for the equivalent sparsely connected network according to the one or more connection constraints; (ii) calculating a number of layers m for the equivalent sparsely connected network using the equation m = ⌈logN K⌉; (iii) decomposing the single layer perceptron into L single layer perceptron networks, each including a respective calculation neuron of the L calculation neurons; and (iv) for each single layer perceptron network of the L single layer perceptron networks: (a) constructing a respective equivalent pyramid-like sub-network for the respective single layer perceptron network with the K inputs, the m layers, and the connection degree N, where the equivalent pyramid-like sub-network includes one or more respective analog neurons in each layer of the m layers, each analog neuron of the first m−1 layers implements an identity transform, and an analog neuron of the last layer implements the activation function of the respective calculation neuron corresponding to the respective single layer perceptron network; and (b) constructing the equivalent sparsely connected network by concatenating the equivalent pyramid-like sub-networks, including concatenating the inputs of the equivalent pyramid-like sub-networks for the L single layer perceptron networks to form an input vector with L*K inputs. Also, in such cases, computing the weight matrix for the equivalent sparsely connected network includes, for each single layer perceptron network of the L single layer perceptron networks: (i) setting a weight vector U = Vi, the i-th row of the weight matrix V corresponding to the respective calculation neuron of the respective single layer perceptron network; and (ii) calculating a weight vector Wi for connections of the respective equivalent pyramid-like sub-network by solving a system of equations based on the weight vector U, where the system of equations includes K equations with S variables, and S is computed using the equation S = Σi=1,m ⌈K/N^(i−1)⌉.
In some implementations, the neural network topology includes K inputs and a multi-layer perceptron with S layers, where each layer i of the S layers includes a corresponding set of Li calculation neurons and a corresponding weight matrix Vi that includes a row of weights for each calculation neuron of the Li calculation neurons. In such cases, transforming the neural network topology to the equivalent sparsely connected network of analog components includes: (i) deriving a connection degree N for the equivalent sparsely connected network according to the one or more connection constraints; (ii) decomposing the multi-layer perceptron into Q = Σi=1,S(Li) single layer perceptron networks, each including a respective calculation neuron of the Q calculation neurons, where decomposing the multi-layer perceptron includes duplicating one or more inputs of the K inputs that are shared by the Q calculation neurons; (iii) for each single layer perceptron network of the Q single layer perceptron networks: (a) calculating a number of layers m for a respective equivalent pyramid-like sub-network using the equation m = ⌈logN Ki,j⌉, where Ki,j is the number of inputs for the respective calculation neuron in the multi-layer perceptron; and (b) constructing the respective equivalent pyramid-like sub-network for the respective single layer perceptron network with Ki,j inputs, the m layers, and the connection degree N, where the equivalent pyramid-like sub-network includes one or more respective analog neurons in each layer of the m layers, each analog neuron of the first m−1 layers implements an identity transform, and an analog neuron of the last layer implements the activation function of the respective calculation neuron corresponding to the respective single layer perceptron network; and (iv) constructing the equivalent sparsely connected network by concatenating the equivalent pyramid-like sub-networks, including concatenating the inputs of the equivalent pyramid-like sub-networks for the Q single layer perceptron networks to form an input vector with Q*Ki,j inputs. Also, in such cases, computing the weight matrix for the equivalent sparsely connected network includes, for each single layer perceptron network of the Q single layer perceptron networks: (i) setting a weight vector U = Vij, the i-th row of the weight matrix Vj corresponding to the respective calculation neuron of the respective single layer perceptron network, where j is the corresponding layer of the respective calculation neuron in the multi-layer perceptron; and (ii) calculating a weight vector Wi for connections of the respective equivalent pyramid-like sub-network by solving a system of equations based on the weight vector U, where the system of equations includes Ki,j equations with S variables, and S is computed using the equation S = Σk=1,m ⌈Ki,j/N^(k−1)⌉.
In some implementations, the neural network topology includes a Convolutional Neural Network (CNN) with K inputs and S layers, where each layer i of the S layers includes a corresponding set of Li calculation neurons and a corresponding weight matrix Vi that includes a row of weights for each calculation neuron of the Li calculation neurons. In such cases, transforming the neural network topology to the equivalent sparsely connected network of analog components includes: (i) deriving a connection degree N for the equivalent sparsely connected network according to the one or more connection constraints; (ii) decomposing the CNN into Q = Σi=1,S(Li) single layer perceptron networks, each including a respective calculation neuron of the Q calculation neurons, where decomposing the CNN includes duplicating one or more inputs of the K inputs that are shared by the Q calculation neurons; (iii) for each single layer perceptron network of the Q single layer perceptron networks: (a) calculating a number of layers m for a respective equivalent pyramid-like sub-network using the equation m = ⌈logN Ki,j⌉, where j is the corresponding layer of the respective calculation neuron in the CNN and Ki,j is the number of inputs for the respective calculation neuron in the CNN; and (b) constructing the respective equivalent pyramid-like sub-network for the respective single layer perceptron network with Ki,j inputs, the m layers, and the connection degree N, where the equivalent pyramid-like sub-network includes one or more respective analog neurons in each layer of the m layers, each analog neuron of the first m−1 layers implements an identity transform, and an analog neuron of the last layer implements the activation function of the respective calculation neuron corresponding to the respective single layer perceptron network; and (iv) constructing the equivalent sparsely connected network by concatenating the equivalent pyramid-like sub-networks, including concatenating the inputs of the equivalent pyramid-like sub-networks for the Q single layer perceptron networks to form an input vector with Q*Ki,j inputs. Also, in such cases, computing the weight matrix for the equivalent sparsely connected network includes, for each single layer perceptron network of the Q single layer perceptron networks: (i) setting a weight vector U = Vij, the i-th row of the weight matrix Vj corresponding to the respective calculation neuron of the respective single layer perceptron network, where j is the corresponding layer of the respective calculation neuron in the CNN; and (ii) calculating a weight vector Wi for connections of the respective equivalent pyramid-like sub-network by solving a system of equations based on the weight vector U, where the system of equations includes Ki,j equations with S variables, and S is computed using the equation S = Σk=1,m ⌈Ki,j/N^(k−1)⌉.
In some implementations, the neural network topology includes K inputs, a layer Lp with K neurons, a layer Ln with L neurons, and a weight matrix W ∈ R^(L×K), where R is the set of real numbers, each neuron of the layer Lp is connected to each neuron of the layer Ln, and each neuron of the layer Ln performs an activation function F, such that the output of the layer Ln is computed using the equation Yo = F(W·x) for an input x. In such cases, transforming the neural network topology to the equivalent sparsely connected network of analog components includes performing a trapezium transformation that includes: (i) deriving a possible input connection degree NI > 1 and a possible output connection degree NO > 1, according to the one or more connection constraints; and (ii) in accordance with a determination that K·L < L·NI + K·NO, constructing a three-layered analog network that includes a layer LAp with K analog neurons performing an identity activation function, a layer LAh with M = max(⌈K·NO/NI⌉, ⌈L·NI/NO⌉) analog neurons performing an identity activation function, and a layer LAo with L analog neurons performing the activation function F, such that each analog neuron in the layer LAp has NO outputs, each analog neuron in the layer LAh has not more than NI inputs and NO outputs, and each analog neuron in the layer LAo has NI inputs. Also, in such cases, computing the weight matrix for the equivalent sparsely connected network includes generating sparse weight matrices Wo and Wh by solving the matrix equation Wo·Wh = W, which includes K·L equations in K·NO + L·NI variables, so that the total output of the layer LAo is calculated using the equation Yo = F(Wo·Wh·x). The sparse weight matrix Wh ∈ R^(M×K) represents connections between the layers LAp and LAh, and the sparse weight matrix Wo ∈ R^(L×M) represents connections between the layers LAh and LAo.
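Numerically, the factorization Wo·Wh = W can be found by alternating least squares over the two sparse factors. The sketch below fixes simple banded sparsity patterns with the required fan-in (n_i, playing the role of NI) and fan-out (n_o, playing the role of NO) for a given hidden width M, and alternates solving for one factor while the other is held fixed; the banded pattern and the ALS solver are illustrative assumptions, not the only way to satisfy the connection constraints.

```python
import numpy as np

def banded_mask(rows, cols, per_row):
    """Sparsity pattern with `per_row` nonzeros per row, spread evenly."""
    mask = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        start = int(round(r * cols / rows))
        mask[r, [(start + k) % cols for k in range(per_row)]] = True
    return mask

def factor_trapezium(W, M, n_i, n_o, iters=200):
    """Approximate W (L x K) as Wo (L x M) @ Wh (M x K), where each row
    of Wo has n_i nonzeros (inputs of an LAo neuron) and each column of
    Wh has n_o nonzeros (outputs of an LAp neuron)."""
    L, K = W.shape
    rng = np.random.default_rng(0)
    mo = banded_mask(L, M, n_i)
    mh = banded_mask(K, M, n_o).T          # n_o nonzeros per column
    Wo = rng.normal(size=(L, M)) * mo
    Wh = rng.normal(size=(M, K)) * mh
    for _ in range(iters):
        for r in range(L):                 # solve each row of Wo
            idx = np.flatnonzero(mo[r])
            Wo[r, idx] = np.linalg.lstsq(Wh[idx].T, W[r], rcond=None)[0]
        for c in range(K):                 # solve each column of Wh
            idx = np.flatnonzero(mh[:, c])
            Wh[idx, c] = np.linalg.lstsq(Wo[:, idx], W[:, c], rcond=None)[0]
    return Wo, Wh

W = np.random.default_rng(1).normal(size=(6, 8))   # L=6, K=8
Wo, Wh = factor_trapezium(W, M=12, n_i=4, n_o=4)   # 6*8 < 6*4 + 8*4
print(np.max(np.abs(Wo @ Wh - W)))
```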
In some implementations, performing the trapezium transformation further includes, in accordance with a determination that K·L ≥ L·NI + K·NO: (i) splitting the layer Lp to obtain a sub-layer Lp1 with K′ neurons and a sub-layer Lp2 with (K−K′) neurons, such that K′·L < L·NI + K′·NO; (ii) for the sub-layer Lp1 with K′ neurons, performing the constructing and generating steps; and (iii) for the sub-layer Lp2 with K−K′ neurons, recursively performing the splitting, constructing, and generating steps.
In some implementations, the neural network topology includes a multilayer perceptron network. In such cases, the method further includes, for each pair of consecutive layers of the multilayer perceptron network, iteratively performing the trapezium transformation and computing the weight matrix for the equivalent sparsely connected network.
In some implementations, the neural network topology includes a recurrent neural network (RNN) that includes (i) a calculation of linear combination for two fully connected layers, (ii) element-wise addition, and (iii) a non-linear function calculation. In such cases, the method further includes performing the trapezium transformation and computing the weight matrix for the equivalent sparsely connected network, for (i) the two fully connected layers, and (ii) the non-linear function calculation.
In some implementations, the neural network topology includes a long short-term memory (LSTM) network or a gated recurrent unit (GRU) network that includes (i) a calculation of linear combination for a plurality of fully connected layers, (ii) element-wise addition, (iii) a Hadamard product, and (iv) a plurality of non-linear function calculations. In such cases, the method further includes performing the trapezium transformation and computing the weight matrix for the equivalent sparsely connected network, for (i) the plurality of fully connected layers, and (ii) the plurality of non-linear function calculations.
In some implementations, the neural network topology includes a convolutional neural network (CNN) that includes (i) a plurality of partially connected layers and (ii) one or more fully-connected layers. In such cases, the method further includes: (i) transforming the plurality of partially connected layers to equivalent fully-connected layers by inserting missing connections with zero weights; and (ii) for each pair of consecutive layers of the equivalent fully-connected layers and the one or more fully-connected layers, iteratively performing the trapezium transformation and computing the weight matrix for the equivalent sparsely connected network.
In some implementations, the neural network topology includes K inputs, L output neurons, and a weight matrix U ∈ R^(L×K), where R is the set of real numbers and each output neuron performs an activation function F. In such cases, transforming the neural network topology to the equivalent sparsely connected network of analog components includes performing an approximation transformation that includes: (i) deriving a possible input connection degree NI > 1 and a possible output connection degree NO > 1, according to the one or more connection constraints; (ii) selecting a parameter p from the set {0, 1, . . . , ⌈logNI K⌉}; (iii) constructing a pyramid neural network with p layers followed by a trapezium neural network; (iv) generating weights for the pyramid neural network, including (a) setting the weights of each neuron of the pyramid neural network based on the weight matrix U for all weights j of the neuron except ki, and (b) setting all other weights of the pyramid neural network to 1; and (v) generating weights for the trapezium neural network, including (a) setting the weights of each neuron i of the first layer of the trapezium neural network based on the weight matrix U and the pyramid weights, and (b) setting the other weights of the trapezium neural network to 1.
In some implementations, the neural network topology includes a multilayer perceptron with the K inputs, S layers, and Li=1,S calculation neurons in the i-th layer, and weight matrices Ui=1,S ∈ R^(Li×Li−1), where L0 = K. In such cases, the approximation transformation and the weight matrix computation are performed iteratively for each layer of the multilayer perceptron.
In another aspect, a method is provided for hardware realization of neural networks, according to some implementations. The method includes obtaining a neural network topology and weights of a trained neural network. The method also includes transforming the neural network topology to an equivalent analog network of analog components including a plurality of operational amplifiers and a plurality of resistors. Each operational amplifier represents an analog neuron of the equivalent analog network, and each resistor represents a connection between two analog neurons. The method also includes computing a weight matrix for the equivalent analog network based on the weights of the trained neural network. Each element of the weight matrix represents a respective connection. The method also includes generating a resistance matrix for the weight matrix. Each element of the resistance matrix corresponds to a respective weight of the weight matrix and represents a resistance value.
In some implementations, generating the resistance matrix for the weight matrix includes: (i) obtaining a predetermined range of possible resistance values {Rmin, Rmax} and selecting an initial base resistance value Rbase within the predetermined range; (ii) selecting a limited-length set of resistance values, within the predetermined range, that provides the most uniform distribution of possible weights within the range [−Rbase, Rbase] over all combinations {Ri, Rj} from the limited-length set of resistance values; (iii) selecting a resistance value R+ = R−, from the limited-length set of resistance values, either for each analog neuron or for each layer of the equivalent analog network, based on the maximum weight wmax of the incoming connections and bias of each neuron or each layer, such that R+ = R− is the resistance value in the set closest to Rbase·wmax; and (iv) for each element w of the weight matrix, selecting a respective first resistance value R1 and a respective second resistance value R2 that minimize the error between w and the weight realized by the pair {R1, R2}, evaluated over all possible values of R1 and R2 within the predetermined range of possible resistance values and accounting for a predetermined relative tolerance value rerr for the resistances.
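A brute-force version of step (iv) is small enough to run exhaustively over the limited resistor set. The sketch below assumes the weight realized by a pair {R1, R2} is w ≈ R+/R1 − R−/R2 and scores each candidate pair by its nominal error plus a tolerance term proportional to rerr, which is one plausible reading of the selection criterion.

```python
import itertools
import numpy as np

def select_pair(w, r_set, r_pos, r_neg, rerr=0.01):
    """Pick (R1, R2) from the limited resistor set minimizing the nominal
    weight error plus a worst-case contribution of resistor tolerance."""
    best, best_err = None, np.inf
    for r1, r2 in itertools.product(r_set, repeat=2):
        w_hat = r_pos / r1 - r_neg / r2
        err = abs(w - w_hat) + rerr * (r_pos / r1 + r_neg / r2)
        if err < best_err:
            best, best_err = (r1, r2), err
    return best, best_err

r_set = [1e5, 1.5e5, 2.2e5, 3.3e5, 4.7e5, 6.8e5, 1e6]  # subset of one E24 decade
pair, err = select_pair(w=0.8, r_set=r_set, r_pos=2.2e5, r_neg=2.2e5)
print(pair, err)
```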
In some implementations, the predetermined range of possible resistance values includes resistances according to the nominal E24 series in the range of 100 kΩ to 1 MΩ.
In some implementations, R+ and R− are chosen independently for each layer of the equivalent analog network.
In some implementations, R+ and R− are chosen independently for each analog neuron of the equivalent analog network.
In some implementations, a first one or more weights of the weight matrix and a first one or more inputs represent one or more connections to a first operational amplifier of the equivalent analog network. In such cases, the method further includes, prior to generating the resistance matrix: (i) modifying the first one or more weights by a first value; and (ii) configuring the first operational amplifier to multiply, by the first value, a linear combination of the first one or more weights and the first one or more inputs, before performing an activation function.
In some implementations, the method further includes: (i) obtaining a predetermined range of weights; and (ii) updating the weight matrix according to the predetermined range of weights such that the equivalent analog network produces output similar to that of the trained neural network for the same input.
In some implementations, the trained neural network is trained so that each layer of the neural network topology has quantized weights.
In some implementations, the method further includes retraining the trained neural network to reduce sensitivity to errors in the weights or the resistance values that cause the equivalent analog network to produce different output compared to the trained neural network.
In some implementations, the method further includes retraining the trained neural network so as to minimize weights in any layer that exceed the mean absolute weight for that layer by more than a predetermined threshold.
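One common way to achieve such robustness, used here as an illustrative stand-in for the retraining step, is to inject multiplicative weight noise of the same magnitude as the expected resistor tolerance during additional training epochs; the 1% tolerance and the two-layer toy model below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_forward(x, W1, W2, tol=0.01):
    """Forward pass with multiplicative weight perturbations emulating
    resistor tolerance; training against this objective favors weights
    whose output changes little under small resistance errors."""
    W1n = W1 * (1.0 + tol * rng.standard_normal(W1.shape))
    W2n = W2 * (1.0 + tol * rng.standard_normal(W2.shape))
    return np.maximum(0.0, x @ W1n) @ W2n

x = np.ones((1, 4))
W1 = rng.standard_normal((4, 8))
W2 = rng.standard_normal((8, 2))
print(noisy_forward(x, W1, W2))
```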
In another aspect, a method is provided for hardware realization of neural networks, according to some implementations. The method includes obtaining a neural network topology and weights of a trained neural network. The method also includes transforming the neural network topology to an equivalent analog network of analog components including a plurality of operational amplifiers and a plurality of resistors. Each operational amplifier represents an analog neuron of the equivalent analog network, and each resistor represents a connection between two analog neurons. The method also includes computing a weight matrix for the equivalent analog network based on the weights of the trained neural network. Each element of the weight matrix represents a respective connection. The method also includes generating a resistance matrix for the weight matrix. Each element of the resistance matrix corresponds to a respective weight of the weight matrix. The method also includes pruning the equivalent analog network, based on the resistance matrix, to reduce the number of operational amplifiers or resistors, to obtain an optimized analog network of analog components.
In some implementations, pruning the equivalent analog network includes substituting, with conductors, resistors corresponding to one or more elements of the resistance matrix that have resistance values below a predetermined minimum threshold resistance value.
In some implementations, pruning the equivalent analog network includes removing one or more connections of the equivalent analog network corresponding to one or more elements of the resistance matrix that are above a predetermined maximum threshold resistance value.
In some implementations, pruning the equivalent analog network includes removing one or more connections of the equivalent analog network corresponding to one or more elements of the weight matrix that are approximately zero.
In some implementations, pruning the equivalent analog network further includes removing one or more analog neurons of the equivalent analog network without any input connections.
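These resistance-based pruning rules and the dead-neuron rule compose naturally. Below is a minimal sketch operating on one layer's resistance matrix, where np.inf marks an absent connection and 0.0 marks a direct conductor; the thresholds are illustrative.

```python
import numpy as np

def prune_layer(R, r_min=1e4, r_max=5e6):
    """Apply pruning rules to a layer's resistance matrix R (neurons x inputs):
    - resistances below r_min are replaced with conductors (0 ohms),
    - resistances above r_max are removed (np.inf = no connection),
    - neurons (rows) left without any input connection are dropped."""
    R = R.copy()
    R[R < r_min] = 0.0                     # substitute with a conductor
    R[R > r_max] = np.inf                  # remove high-resistance connection
    has_input = ~np.all(np.isinf(R), axis=1)
    return R[has_input], has_input         # keep only neurons with inputs

R = np.array([[2e3, np.inf, 2e5],
              [np.inf, np.inf, np.inf],    # dead neuron: no inputs at all
              [1e7, 3e5, np.inf]])
R_pruned, kept = prune_layer(R)
print(R_pruned, kept, sep="\n")
```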
In some implementations, pruning the equivalent analog network includes: (i) ranking analog neurons of the equivalent analog network based on detecting use of the analog neurons when making calculations for one or more data sets; (ii) selecting one or more analog neurons of the equivalent analog network based on the ranking; and (iii) removing the one or more analog neurons from the equivalent analog network.
In some implementations, detecting use of the analog neurons includes: (i) building a model of the equivalent analog network using modelling software; and (ii) measuring propagation of analog signals by using the model to generate calculations for the one or more data sets.
In some implementations, detecting use of the analog neurons includes: (i) building a model of the equivalent analog network using modelling software; and (ii) measuring output signals of the model by using the model to generate calculations for the one or more data sets.
In some implementations, detecting use of the analog neurons includes: (i) building a model of the equivalent analog network using modelling software; and (ii) measuring power consumed by the analog neurons by using the model to generate calculations for the one or more data sets.
In some implementations, the method further includes subsequent to pruning the equivalent analog network, and prior to generating one or more lithographic masks for fabricating a circuit implementing the equivalent analog network, recomputing the weight matrix for the equivalent analog network and updating the resistance matrix based on the recomputed weight matrix.
In some implementations, the method further includes, for each analog neuron of the equivalent analog network: (i) computing a respective bias value for the respective analog neuron based on the weights of the trained neural network, while computing the weight matrix; (ii) in accordance with a determination that the respective bias value is above a predetermined maximum bias threshold, removing the respective analog neuron from the equivalent analog network; and (iii) in accordance with a determination that the respective bias value is below a predetermined minimum bias threshold, replacing the respective analog neuron with a linear junction in the equivalent analog network.
In some implementations, the method further includes reducing the number of neurons of the equivalent analog network, prior to generating the weight matrix, by increasing the number of connections from one or more analog neurons of the equivalent analog network.
In some implementations, the method further includes pruning the trained neural network to update the neural network topology and the weights of the trained neural network, prior to transforming the neural network topology, using pruning techniques for neural networks, so that the equivalent analog network includes fewer than a predetermined number of analog components.
In some implementations, the pruning is performed iteratively taking into account accuracy or a level of match in output between the trained neural network and the equivalent analog network.
In some implementations, the method further includes, prior to transforming the neural network topology to the equivalent analog network, performing network knowledge extraction.
In another aspect, an integrated circuit is provided, according to some implementations. The integrated circuit includes an analog network of analog components fabricated by a method that includes: (i) obtaining a neural network topology and weights of a trained neural network; (ii) transforming the neural network topology to an equivalent analog network of analog components including a plurality of operational amplifiers and a plurality of resistors. Each operational amplifier represents a respective analog neuron, and each resistor represents a respective connection between a respective first analog neuron and a respective second analog neuron; (iii) computing a weight matrix for the equivalent analog network based on the weights of the trained neural network. Each element of the weight matrix represents a respective connection; (iv) generating a resistance matrix for the weight matrix. Each element of the resistance matrix corresponds to a respective weight of the weight matrix; (v) generating one or more lithographic masks for fabricating a circuit implementing the equivalent analog network of analog components based on the resistance matrix; and (vi) fabricating the circuit based on the one or more lithographic masks using a lithographic process.
In some implementations, the integrated circuit further includes one or more digital-to-analog converters configured to generate analog input for the equivalent analog network of analog components based on one or more digital signals.
In some implementations, the integrated circuit further includes an analog signal sampling module configured to process 1-dimensional or 2-dimensional analog inputs with a sampling frequency based on the number of inferences performed by the integrated circuit.
In some implementations, the integrated circuit further includes a voltage converter module to scale analog signals down or up to match the operational range of the plurality of operational amplifiers.
In some implementations, the integrated circuit further includes a tact signal processing module configured to process one or more frames obtained from a CCD camera.
In some implementations, the trained neural network is a long short-term memory (LSTM) network. In such cases, the integrated circuit further includes one or more clock modules to synchronize signal tacts and to allow time series processing.
In some implementations, the integrated circuit further includes one or more analog-to-digital converters configured to generate a digital signal based on the output of the equivalent analog network of analog components.
In some implementations, the integrated circuit further includes one or more signal processing modules configured to process 1-dimensional or 2-dimensional analog signals obtained from edge applications.
In some implementations, the trained neural network is trained, using training datasets containing signals of arrays of gas sensors exposed to different gas mixtures, for selective sensing of different gases in a gas mixture containing predetermined amounts of the gases to be detected. In such cases, the neural network topology is a 1-Dimensional Deep Convolutional Neural Network (1D-DCNN) designed for detecting 3 binary gas components based on measurements by 16 gas sensors, and includes 16 sensor-wise 1-D convolutional blocks, 3 shared or common 1-D convolutional blocks, and 3 dense layers. In such cases, the equivalent analog network includes: (i) a maximum of 100 input and output connections per analog neuron, (ii) delay blocks to produce delays by any number of time steps, (iii) a signal limit of 5, (iv) 15 layers, (v) approximately 100,000 analog neurons, and (vi) approximately 4,900,000 connections.
In some implementations, the trained neural network is trained, using training datasets containing thermal aging time series data for different MOSFETs, for predicting remaining useful life (RUL) of a MOSFET device. In such cases, the neural network topology includes 4 LSTM layers with 64 neurons in each layer, followed by two dense layers with 64 neurons and 1 neuron, respectively. In such cases, the equivalent analog network includes: (i) a maximum of 100 input and output connections per analog neuron, (ii) a signal limit of 5, (iii) 18 layers, (iv) between 3,000 and 3,200 analog neurons, and (v) between 123,000 and 124,000 connections.
In some implementations, the trained neural network is trained, using training datasets containing time series data, including discharge and temperature data recorded during continuous usage of different commercially available Li-Ion batteries, for monitoring the state of health (SOH) and state of charge (SOC) of Lithium-Ion batteries for use in battery management systems (BMS). In such cases, the neural network topology includes an input layer, 2 LSTM layers with 64 neurons in each layer, followed by an output dense layer with 2 neurons for generating SOC and SOH values. In such cases, the equivalent analog network includes: (i) a maximum of 100 input and output connections per analog neuron, (ii) a signal limit of 5, (iii) 9 layers, (iv) between 1,200 and 1,300 analog neurons, and (v) between 51,000 and 52,000 connections.
In some implementations, the trained neural network is trained, using training datasets containing time series data, including discharge and temperature data recorded during continuous usage of different commercially available Li-Ion batteries, for monitoring the state of health (SOH) of Lithium-Ion batteries for use in battery management systems (BMS). In such cases, the neural network topology includes an input layer with 18 neurons, a simple recurrent layer with 100 neurons, and a dense layer with 1 neuron. In such cases, the equivalent analog network includes: (i) a maximum of 100 input and output connections per analog neuron, (ii) a signal limit of 5, (iii) 4 layers, (iv) between 200 and 300 analog neurons, and (v) between 2,200 and 2,400 connections.
In some implementations, the trained neural network is trained, using training datasets containing speech commands, for identifying voice commands. In such cases, the neural network topology is a Depthwise Separable Convolutional Neural Network (DS-CNN) layer with 1 neuron. In such cases, the equivalent analog network includes: (i) a maximum of 100 input and output connections per analog neuron, (ii) a signal limit of 5, (iii) 13 layers, (iv) approximately 72,000 analog neurons, and (v) approximately 2.6 million connections.
In some implementations, the trained neural network is trained, using training datasets containing photoplethysmography (PPG) data, accelerometer data, temperature data, and electrodermal response signal data for different individuals performing various physical activities for predetermined periods of time, together with reference heart rate data obtained from an ECG sensor, for determining pulse rate during physical exercise based on PPG sensor data and 3-axis accelerometer data. In such cases, the neural network topology includes two Conv1D layers, each with 16 filters and a kernel of 20, performing time series convolution, two LSTM layers, each with 16 neurons, and two dense layers with 16 neurons and 1 neuron, respectively. In such cases, the equivalent analog network includes: (i) delay blocks to produce delays by any number of time steps, (ii) a maximum of 100 input and output connections per analog neuron, (iii) a signal limit of 5, (iv) 16 layers, (v) between 700 and 800 analog neurons, and (vi) between 12,000 and 12,500 connections.
In some implementations, the trained neural network is trained to classify different objects based on pulsed Doppler radar signals. In such cases, the neural network topology includes a multi-scale LSTM neural network.
In some implementations, the trained neural network is trained to perform human activity type recognition based on inertial sensor data. In such cases, the neural network topology includes three channel-wise convolutional networks, each with a convolutional layer of 12 filters and a kernel dimension of 64, each followed by a max pooling layer, and two common dense layers of 1024 neurons and N neurons, respectively, where N is a number of classes. In such cases, the equivalent analog network includes: (i) delay blocks to produce delays by any number of time steps, (ii) a maximum of 100 input and output connections per analog neuron, (iii) an output layer of 10 analog neurons, (iv) a signal limit of 5, (v) 10 layers, (vi) between 1,200 and 1,300 analog neurons, and (vii) between 20,000 and 21,000 connections.
In some implementations, the trained neural network is further trained to detect abnormal patterns of human activity based on accelerometer data that is merged with heart rate data using a convolution operation.
In another aspect, a method is provided for generating libraries for hardware realization of neural networks. The method includes obtaining a plurality of neural network topologies, each neural network topology corresponding to a respective neural network. The method also includes transforming each neural network topology to a respective equivalent analog network of analog components. The method also includes generating a plurality of lithographic masks for fabricating a plurality of circuits, each circuit implementing a respective equivalent analog network of analog components.
In some implementations, the method further includes obtaining a new neural network topology and weights of a trained neural network. The method also includes selecting one or more lithographic masks from the plurality of lithographic masks based on comparing the new neural network topology to the plurality of neural network topologies. The method also includes computing a weight matrix for a new equivalent analog network based on the weights. The method also includes generating a resistance matrix for the weight matrix. The method also includes generating a new lithographic mask for fabricating a circuit implementing the new equivalent analog network based on the resistance matrix and the one or more lithographic masks.
In some implementations, the new neural network topology includes a plurality of subnetwork topologies, and selecting the one or more lithographic masks is further based on comparing each subnetwork topology with each network topology of the plurality of network topologies.
In some implementations, one or more subnetwork topologies of the plurality of subnetwork topologies fails to compare with any network topology of the plurality of network topologies. In such cases, the method further includes: (i) transforming each subnetwork topology of the one or more subnetwork topologies to a respective equivalent analog subnetwork of analog components; and (ii) generating one or more lithographic masks for fabricating one or more circuits, each circuit of the one or more circuits implementing a respective equivalent analog subnetwork of analog components.
In some implementations, transforming a respective network topology to a respective equivalent analog network includes: (i) decomposing the respective network topology to a plurality of subnetwork topologies; (ii) transforming each subnetwork topology to a respective equivalent analog subnetwork of analog components; and (iii) composing each equivalent analog subnetwork to obtain the respective equivalent analog network.
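A minimal, self-contained Python sketch of this decompose/transform/compose flow follows. The topology format (a list of layer descriptors) and all function names are illustrative assumptions, not the actual modules described herein.

```python
# Hypothetical sketch of the decompose/transform/compose flow described above.

def decompose(topology):
    # (i) Identify each layer of the topology as its own subnetwork topology.
    return [[layer] for layer in topology]

def transform_subnetwork(subnetwork):
    # (ii) Stand-in for transforming one subnetwork topology into an
    # equivalent analog subnetwork (e.g., op-amps plus resistors).
    return {"analog_layers": subnetwork, "components": "opamps+resistors"}

def compose(analog_subnetworks):
    # (iii) Chain the analog subnetworks into the full equivalent analog network.
    return {"pipeline": analog_subnetworks}

topology = [{"type": "dense", "units": 16}, {"type": "dense", "units": 1}]
analog_net = compose([transform_subnetwork(s) for s in decompose(topology)])
print(analog_net)
```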
In some implementations, decomposing the respective network topology includes identifying one or more layers of the respective network topology as the plurality of subnetwork topologies.
In some implementations, each circuit is obtained by: (i) generating schematics for a respective equivalent analog network of analog components; and (ii) generating a respective circuit layout design based on the schematics.
In some implementations, the method further includes combining one or more circuit layout designs prior to generating the plurality of lithographic masks for fabricating the plurality of circuits.
In another aspect, a method is provided for optimizing the energy efficiency of analog neuromorphic circuits, according to some implementations. The method includes obtaining an integrated circuit implementing an analog network of analog components including a plurality of operational amplifiers and a plurality of resistors. The analog network represents a trained neural network, each operational amplifier represents a respective analog neuron, and each resistor represents a respective connection between a respective first analog neuron and a respective second analog neuron. The method also includes generating inferences using the integrated circuit for a plurality of test inputs, including simultaneously transferring signals from one layer to a subsequent layer of the analog network. The method also includes, while generating inferences using the integrated circuit: (i) determining if a level of signal output of the plurality of operational amplifiers is equilibrated; and (ii) in accordance with a determination that the level of signal output is equilibrated: (a) determining an active set of analog neurons of the analog network influencing signal formation for propagation of signals; and (b) turning off power for one or more analog neurons of the analog network, distinct from the active set of analog neurons, for a predetermined period of time.
In some implementations, determining the active set of analog neurons is based on calculating delays of signal propagation through the analog network.
In some implementations, determining the active set of analog neurons is based on detecting the propagation of signals through the analog network.
In some implementations, the trained neural network is a feed-forward neural network, and the active set of analog neurons belong to an active layer of the analog network, and turning off power includes turning off power for one or more layers prior to the active layer of the analog network.
In some implementations, the predetermined period of time is calculated based on simulating propagation of signals through the analog network, accounting for signal delays.
In some implementations, the trained neural network is a recurrent neural network (RNN), and the analog network further includes one or more analog components other than the plurality of operational amplifiers, and the plurality of resistors. In such cases, the method further includes, in accordance with a determination that the level of signal output is equilibrated, turning off power, for the one or more analog components, for the predetermined period of time.
In some implementations, the method further includes turning on power for the one or more analog neurons of the analog network after the predetermined period of time.
In some implementations, determining if the level of signal output of the plurality of operational amplifiers is equilibrated is based on detecting if one or more operational amplifiers of the analog network is outputting more than a predetermined threshold signal level.
In some implementations, the method further includes repeating the turning off for the predetermined period of time and turning on the active set of analog neurons for the predetermined period of time, while generating the inferences.
In some implementations, the method further includes, in accordance with a determination that the level of signal output is equilibrated, for each inference cycle: (i) during a first time interval, (a) determining a first layer of analog neurons of the analog network influencing signal formation for propagation of signals, and (b) turning off power for a first one or more analog neurons of the analog network, prior to the first layer, for the predetermined period of time; and (ii) during a second time interval subsequent to the first time interval, turning off power for a second one or more analog neurons, including the first layer of analog neurons and the first one or more analog neurons of the analog network, for the predetermined period of time.
In some implementations, the one or more analog neurons consist of analog neurons of a first one or more layers of the analog network, and the active set of analog neurons consist of analog neurons of a second layer of the analog network, and the second layer of the analog network is distinct from layers of the first one or more layers.
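To make this layer-wise power-gating schedule concrete, the following sketch computes, for a feed-forward analog network, when each layer becomes the active layer and which earlier layers may be powered off at that moment, consistent with the approach described above. All per-layer delay values are hypothetical.

```python
# Illustrative power-gating schedule: once the signal front has passed a
# layer (after its cumulative propagation delay), every layer before the
# active layer can be powered off for a predetermined period of time.

layer_delays_us = [2.0, 2.0, 2.0, 2.0]   # per-layer settling delays (assumed)

cumulative, t = [], 0.0
for d in layer_delays_us:
    t += d
    cumulative.append(t)

for i, t_active in enumerate(cumulative):
    powered_off = list(range(i))          # layers strictly before the active one
    print(f"t={t_active:.1f}us: layer {i} active, power off layers {powered_off}")
```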
According to some implementations, a method and apparatus are provided for clearing a voice signal of undesired noise, in order to clarify transmission for the benefit of a recipient (e.g., a human or a machine interface) receiving the clarified signal. The techniques described herein can be applied at the origin, at interim transmission point(s), or near the recipient. To perform the voice clarification, a neural network is trained to separate the voice signal from other unwanted signals in an input signal. The neural network is transformed into an equivalent analog neural network using techniques described herein, and the transformed equivalent analog network is implemented in the form of an analog integrated circuit.
In another aspect, a method is provided for analog hardware realization of trained convolutional neural networks for voice clarity. The method includes obtaining a neural network topology and weights of a trained neural network. The method also includes transforming the neural network topology into an equivalent analog network of analog components. The method also includes computing a weight matrix for the equivalent analog network based on the weights of the trained neural network. Each element of the weight matrix represents one or more connections between analog components of the equivalent analog network. For example, for dense layers, one weight matrix element represents a single connection, whereas for convolutional layers, one weight matrix element represents multiple connections. To further illustrate, suppose a layer multiplies N input signals by a single weight value w. In this case, the input layer size is N, the output layer size is N, and there are N connections, each with weight w. In this way, one weight value represents multiple connections. The method also includes generating a schematic model for implementing the equivalent analog network based on the weight matrix, including selecting component values for the analog components.
In some implementations, the neural network topology includes a Fourier transformation layer and an inverse Fourier transformation layer. Fourier transformation layers are useful for voice-based applications because a voice signal is naturally characterized by its frequencies, which can be extracted with an FFT (Fast Fourier Transform). In some implementations, FFTs are implemented using a dense layer.
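Because a discrete Fourier transform is a linear operation, it can indeed be expressed as a dense layer with fixed weights. The following NumPy sketch (sizes illustrative) builds such a dense-layer weight matrix, splitting real and imaginary parts so that all signals stay real-valued, as an analog implementation would require.

```python
import numpy as np

# A DFT is a linear map, so it can be realized as a dense layer with fixed
# (non-trainable) weights. The complex DFT matrix is split into real and
# imaginary parts to keep every signal real-valued.

N = 8
n = np.arange(N)
dft = np.exp(-2j * np.pi * np.outer(n, n) / N)   # N x N DFT matrix

# One dense layer with 2N outputs: first N give Re(FFT), last N give Im(FFT).
W = np.vstack([dft.real, dft.imag])              # (2N, N) fixed weight matrix

x = np.random.randn(N)
out = W @ x
np.testing.assert_allclose(out[:N] + 1j * out[N:], np.fft.fft(x), atol=1e-9)
```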
In some implementations, the neural network topology includes one or more of: a convolutional layer, a max-pooling layer, and a densely connected layer.
In some implementations, the neural network topology includes a convolutional layer. In such cases, transforming the neural network topology includes, for each output of the convolutional layer: (i) defining dependency relations between the respective output and a related subset of inputs, where the related subset of inputs is defined by the filters, kernel, padding, and strides parameters of the convolutional layer; and (ii) defining a respective subset of weights according to the dependency relations of the respective output. Transforming further includes constructing a layer of analog neurons such that (i) each analog neuron corresponds to a respective output of the convolutional layer, (ii) each analog neuron is connected to the related subset of inputs of a previous layer of analog neurons of the equivalent analog network, and (iii) incoming connections for each analog neuron are weighted according to the respective subset of weights of the corresponding output of the convolutional layer.
To illustrate these steps, consider a convolutional layer, where each output depends on a subset of inputs and each subset of inputs affects a subset of outputs. In other words, this relationship is a many-to-many relation, unlike the all-to-all relationship in a dense layer. For a single batch, suppose there is a matrix of inputs (for 1D convolution, this is a two-dimensional matrix; for 2D convolution, a three-dimensional matrix) and a matrix of outputs with the same number of dimensions as the matrix of inputs. The first and second dimensions are sometimes called spatial dimensions, and the last dimension is sometimes called channels. Suppose a kernel size (e.g., a one-dimensional kernel for 1D convolution, a two-dimensional kernel for 2D convolution) is provided. A weight matrix is defined with size (kernel size × channels × filters). Here, kernel size and filters are parameters of the weight matrix, and channels is determined by the input shape. For each filter F, the subset of the weight matrix [:, :, F] is applied over the input data and slid along the spatial dimensions with a step defined by the strides parameter. For example, for a 1D convolution with a kernel of size 3, a spatial dimension X of 7, and stride = 2, the weight matrix is applied over spatial coordinates {1,2,3}, {3,4,5}, {5,6,7} and over the channel dimension. Each time the weight matrix is applied, a single output element is calculated. That element's connections take their weights from the weight matrix subset and connect to the corresponding subset of the input data, as described earlier (see the sketch following this paragraph). For more filters, some implementations leave the input as is and duplicate the kernel and output as many times as there are filters, with different kernel values. It is noted that the description here is provided for illustration purposes only, and various other implementations are possible.
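The following sketch reproduces the dependency relations of the 1D example above (X = 7, kernel size 3, stride 2, one channel, one filter); the kernel values are illustrative.

```python
import numpy as np

# Each output element connects to input subset {1,2,3}, {3,4,5}, {5,6,7}
# (1-based), and the same 3 kernel weights are reused at every position.

X, K, stride = 7, 3, 2
kernel = np.array([0.5, -1.0, 0.25])             # illustrative kernel weights
x = np.arange(1.0, X + 1)                        # inputs 1..7

outputs, connections = [], []
for start in range(0, X - K + 1, stride):
    idx = list(range(start, start + K))          # related subset of inputs
    connections.append([i + 1 for i in idx])     # 1-based, for readability
    outputs.append(float(kernel @ x[idx]))

print(connections)   # [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
print(outputs)       # one output element per kernel application
```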
In some implementations, the neural network topology includes a max-pooling layer. In such instances, transforming the neural network topology includes generating a multi-layer network of analog neurons, for the max-pooling layer, subject to maximum input counts. In some implementations, generating the multi-layer network of analog neurons includes generating a two-input schematic comprising two SNMs (schematic neuron models) arranged in two layers, where an SNM of the last layer has a maximum of two inputs. In some implementations, generating the multi-layer network of analog neurons includes generating a three-input schematic comprising three SNMs arranged in three layers, where an SNM of the last layer has a maximum of three inputs. In some implementations, generating the multi-layer network of analog neurons includes generating a four-input schematic comprising four SNMs arranged in three layers, where an SNM of the last layer has a maximum of four inputs. In some implementations, the method further includes transforming the max-pooling layer into a calculation tree in which each node of the calculation tree is selected from the group consisting of: a two-input schematic comprising two SNMs arranged in two layers, where an SNM of the last layer has a maximum of two inputs; a three-input schematic comprising three SNMs arranged in three layers, where an SNM of the last layer has a maximum of three inputs; and a four-input schematic comprising four SNMs arranged in three layers, where an SNM of the last layer has a maximum of four inputs. In some implementations, the method further includes minimizing the number of layers of the calculation tree. In some implementations, the method further includes prioritizing use of four-input SNMs over use of three-input SNMs and two-input SNMs.
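A possible greedy decomposition consistent with prioritizing four-input SNMs and minimizing the number of tree layers is sketched below; the grouping strategy is an assumption for illustration.

```python
# Greedy sketch of the max-pooling calculation tree: group the pool inputs
# into nodes of at most four inputs (preferring four-input SNM schematics)
# and repeat until a single root node remains.

def build_max_tree(num_inputs, fan_in=4):
    layers = []
    current = num_inputs
    while current > 1:
        nodes = -(-current // fan_in)    # ceiling division: four-input nodes first
        layers.append(nodes)
        current = nodes
    return layers

# A 9-input max-pooling window: 3 nodes in the first tree layer, then 1 root.
print(build_max_tree(9))   # [3, 1]
```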
In some implementations, the method further includes (i) defining an analog neuron of a last layer of the multi-layer analog network to perform an activation function other than ReLU, and (ii) defining all other neurons of the multi-layer analog network to perform ReLU without changing final output of the multi-layer network.
In some implementations, each layer of the trained neural network computes respective outputs based on a respective mathematical function. In such cases, transforming the neural network topology to the equivalent analog network of analog components includes, for each layer of the trained neural network: (i) identifying one or more function blocks, based on the respective mathematical function, for the respective layer, where each function block has a respective schematic implementation with block outputs that conform to outputs of the respective mathematical function; and (ii) generating a respective multi-layer network of analog neurons based on arranging the one or more function blocks, where each analog neuron implements a respective function of the one or more function blocks, and each analog neuron of a first layer of the respective multi-layer network is connected to one or more analog neurons of a second layer of the respective multi-layer network. In some implementations, the one or more function blocks include a weighted summation block with a block output $V^{out} = \mathrm{ReLU}(\sum_i w_i \cdot V_i^{in} + bias)$, where ReLU is a Rectified Linear Unit activation function or a similar activation function, $V_i^{in}$ represents the i-th input, $w_i$ represents the weight corresponding to the i-th input, bias represents a bias value, and $\Sigma$ is the summation operator. In some implementations, the one or more function blocks include a weighted summation block with a block output $V^{out} = \mathrm{ReLU\_X}(\sum_i w_i \cdot V_i^{in} + bias)$, where ReLU_X is a Rectified Linear Unit activation function, or a similar activation function, that limits the output signal to the positive value X, with $V_i^{in}$, $w_i$, bias, and $\Sigma$ as defined above.
In some implementations, the neural network topology includes a convolutional layer having K inputs and L outputs. In such cases, transforming the neural network topology to the equivalent analog network includes: deriving a possible input connection degree Ni and output connection degree No, according to one or more connection constraints based on analog integrated circuit (IC) design constraints; and decomposing the convolutional layer into a single sparsely connected layer with K inputs, L outputs, a maximum input connection degree Pi, and a maximum output connection degree Po, where Pi ≤ Ni and Po ≤ No.
In some implementations, the analog components include a plurality of operational amplifiers and a plurality of resistors. Each operational amplifier represents an analog neuron of the equivalent analog network, and each resistor represents a connection between two analog neurons. Generating the schematic model includes generating a resistance matrix from the weight matrix. Each element of the resistance matrix (i) represents a respective resistance value and (ii) corresponds to a respective weight of the weight matrix. Selecting component values of the analog components includes performing a gradient descent method to identify possible resistance values for the plurality of resistors.
In some implementations, the method further includes: generating an equivalent digital network of digital components for one or more output layers of the neural network topology; and connecting output of one or more layers of the equivalent analog network to the equivalent digital network of digital components.
In another aspect, a system is provided for hardware realization of neural networks. The system includes one or more processors and memory storing one or more programs configured for execution by the one or more processors. The one or more programs include instructions for: obtaining a neural network topology and weights of a trained neural network; transforming the neural network topology into an equivalent analog network of analog components; computing a weight matrix for the equivalent analog network based on the weights of the trained neural network, where each element of the weight matrix represents one or more connections between analog components of the equivalent analog network; and generating a schematic model for implementing the equivalent analog network based on the weight matrix, including selecting component values for the analog components.
In another aspect, a voice-transmission device is provided, and includes an integrated circuit for voice clarification. The integrated circuit includes an analog network of analog components fabricated by a method comprising the steps of: obtaining a neural network topology and weights of a trained neural network; transforming the neural network topology into an equivalent analog network of analog components; computing a weight matrix for the equivalent analog network based on the weights of the trained neural network, wherein each element of the weight matrix represents one or more connections between analog components of the equivalent analog network; generating a schematic model for implementing the equivalent analog network based on the weight matrix, including selecting component values for the analog components; and fabricating the circuit, according to the schematic model, using a lithographic process.
In some implementations of the voice-transmission device, generating the schematic model further includes: (i) generating a resistance matrix for the weight matrix, where each element of the resistance matrix corresponds to a respective weight of the weight matrix; and (ii) generating one or more lithographic masks for fabricating the circuit implementing the equivalent analog network of analog components based on the resistance matrix.
In some implementations of the voice-transmission device, the voice-transmission device is integrated into a cell phone.
In some implementations of the voice-transmission device, input from a microphone of the cell phone is input to the integrated circuit.
In some implementations of the voice-transmission device, output from the integrated circuit is input to a speaker of the cell phone.
In some implementations of the voice-transmission device, the integrated circuit is coupled to one or more other noise cancelling devices.
In some implementations of the voice-transmission device, the integrated circuit is coupled to one or more noise reduction software programs executing on the voice transmission device.
In some implementations, a computer system has one or more processors, memory, and a display. The memory stores one or more programs configured for execution by the one or more processors, and the one or more programs include instructions for performing any of the methods described herein.
In some implementations, a non-transitory computer readable storage medium stores one or more programs configured for execution by a computer system having one or more processors, memory, and a display. The one or more programs include instructions for performing any of the methods described herein.
Thus, methods, systems, and devices are disclosed that are used for hardware realization of trained neural networks.
For a better understanding of the aforementioned systems, methods, and devices, as well as additional systems, methods, and devices, reference should be made to the Description of Implementations below, in conjunction with the following drawings, in which like reference numerals refer to corresponding parts throughout the figures.
Reference will now be made to implementations, examples of which are illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without requiring these specific details.
In some implementations, components of the system 100 described above are implemented in one or more computing devices or server systems as computing modules.
Some implementations include one or more optional modules 244, as shown in the accompanying drawings.
Some implementations include a lithographic mask generation module 248 that further includes lithographic masks 250 for resistances (corresponding to connections), and/or lithographic masks for analog components (e.g., operational amplifiers, multipliers, delay blocks, etc.) other than the resistances (or connections). In some implementations, lithographic masks are generated based on the chip design layout, following chip design using Cadence, Synopsys, or Mentor Graphics software packages. Some implementations use a design kit from a silicon wafer manufacturing plant (sometimes called a fab). Lithographic masks are intended to be used in the particular fab that provides the design kit (e.g., a TSMC 65 nm design kit). The lithographic mask files that are generated are used to fabricate the chip at the fab. In some implementations, the chip design in the Cadence, Mentor Graphics, or Synopsys software packages is generated semi-automatically from the SPICE or Fast SPICE (Mentor Graphics) software packages. In some implementations, a user with chip design skill drives the conversion from the SPICE or Fast SPICE circuit into the Cadence, Mentor Graphics, or Synopsys chip design. Some implementations combine Cadence design blocks for a single neuron unit, establishing proper interconnects between the blocks.
Some implementations include a library generation module 254 that further includes libraries of lithographic masks 256. Examples of library generation are described below.
Some implementations include an Integrated Circuit (IC) fabrication module 258 that further includes Analog-to-Digital Conversion (ADC), Digital-to-Analog Conversion (DAC), or other similar interfaces 260, and/or fabricated ICs or models 262. Example integrated circuits and/or related modules are described below.
Some implementations include an energy efficiency optimization module 264 that further includes an inferencing module 266, a signal monitoring module 268, and/or a power optimization module 270. Examples of energy efficiency optimizations are described below.
Each of the above identified executable modules, applications, or sets of procedures may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various implementations. In some implementations, the memory 214 stores a subset of the modules and data structures identified above. Furthermore, in some implementations, the memory 214 stores additional modules or data structures not described above.
In the description above and below, a math neuron is a mathematical function that receives one or more weighted inputs and produces a scalar output. In some implementations, a math neuron can have memory (e.g., a long short-term memory (LSTM) neuron or a recurrent neuron). A trivial neuron is a math neuron that performs the function of an 'ideal' mathematical neuron, $V^{out} = f(\sum_i V_i^{in} \cdot \omega_i + bias)$, where $f(x)$ is an activation function. An SNM is a schematic model with analog components (e.g., operational amplifiers, resistors $R_1, \ldots, R_n$, and other components) representing a specific type of math neuron (for example, a trivial neuron) in schematic form. The SNM output voltage is represented by a corresponding formula that depends on the K input voltages and the SNM component values: $V^{out} = g(V_1^{in}, \ldots, V_K^{in}, R_1, \ldots, R_n)$. According to some implementations, with properly selected component values, the SNM formula is equivalent to the math neuron formula with a desired set of weights. In some implementations, the weight set is fully determined by the resistors used in an SNM. A target (analog) neural network 304 (sometimes called a T-network) is a set of math neurons that have a defined SNM representation, together with weighted connections between them, forming a neural network. A T-network follows several restrictions, such as an inbound limit (a maximum number of inbound connections for any neuron within the T-network), an outbound limit (a maximum number of outbound connections for any neuron within the T-network), and a signal range (e.g., all signals should be inside a pre-defined signal range). T-transformation (322) is the process of converting a desired neural network, such as MobileNet, to a corresponding T-network. A SPICE model 306 is a SPICE neural network model of a T-network 304, where each math neuron is substituted with one or more corresponding SNMs. A Cadence NN model 310 is a Cadence model of the T-network 304, where each math neuron is substituted with one or more corresponding SNMs. Also, as described herein, two networks L and M have mathematical equivalence if, for all neuron outputs of these networks, $|V_i^L - V_i^M| < \varepsilon$, where $\varepsilon$ is relatively small (e.g., between 0.1% and 1% of the operating voltage range). Two networks L and M have functional equivalence if, for a given validation input data set $\{I_1, \ldots, I_n\}$, the classification results are mostly the same, i.e., $P(L(I_k) = M(I_k)) = 1 - \varepsilon$, where $\varepsilon$ is relatively small.
Some implementations store the layout or the organization of the input neural networks including number of neurons in each layer, total number of neurons, operations or activation functions of each neuron, and/or connections between the neurons, in the memory 214, as the neural network topology 224.
Some implementations use Keras training that converges in approximately 1000 iterations and results in weights for the connections. In some implementations, the weights are stored in the memory 214 as part of the weights 222. In the following example, the data format is ‘Neuron [1st link weight, 2nd link weight, bias]’.
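A minimal Keras sketch consistent with this description follows; the task (an XOR-like problem with two inputs) and the architecture are assumptions for illustration. The final loop prints per-neuron weights in the ‘Neuron [w1, w2, bias]’ format above.

```python
import numpy as np
from tensorflow import keras

# Illustrative training run: a tiny two-input network trained with Keras,
# followed by extraction of per-neuron [w1, w2, bias] values.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(2, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1000, verbose=0)          # converges in ~1000 iterations

for layer in model.layers:
    weights = layer.get_weights()                # [kernel, bias] for Dense layers
    if not weights:
        continue
    w, b = weights
    for j in range(w.shape[1]):
        print(f"Neuron [{w[0, j]:.4f}, {w[1, j]:.4f}, {b[j]:.4f}]")
```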
Next, to compute resistor values for the connections between the neurons, some implementations compute the resistor range. Some implementations set resistor nominal values (R+, R−) of 1 MΩ, a possible resistor range of 100 kΩ to 1 MΩ, and the E24 nominal series. Some implementations compute the w1, w2, and wbias resistor values for each connection as follows: for each weight value wi (e.g., the weights 222), evaluate all possible (Ri−, Ri+) resistor pair options within the chosen nominal series, and choose the resistor pair that produces the minimal error value.
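The sketch below illustrates such a per-weight pair search. The weight model w = Rf/R+ − Rf/R− follows the connection-weight expressions given later in this description; the feedback value Rf = 1 MΩ and the exact error metric (absolute weight error) are assumptions for illustration.

```python
import itertools

# E24 nominal series, spanning 100 kOhm .. 1 MOhm as stated above.
E24 = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4, 2.7, 3.0,
       3.3, 3.6, 3.9, 4.3, 4.7, 5.1, 5.6, 6.2, 6.8, 7.5, 8.2, 9.1]
nominals = [m * 1e5 for m in E24] + [1e6]

RF = 1e6                                          # assumed feedback resistance

def best_pair(w):
    # Choose (R_minus, R_plus) minimizing |w - (RF/R_plus - RF/R_minus)|.
    return min(itertools.product(nominals, nominals),
               key=lambda rr: abs(w - (RF / rr[1] - RF / rr[0])))

r_minus, r_plus = best_pair(0.35)
print(r_minus, r_plus, RF / r_plus - RF / r_minus)
```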
The following table provides example values for the weights w1, w2, and bias, for each connection, according to some implementations.
Before describing examples of transformation, it is worth noting some of the advantages of the transformed neural networks over conventional architectures. As described herein, the input trained neural networks are transformed to pyramid- or trapezium-shaped analog networks. Advantages of pyramids or trapezia over cross-bars include lower latency, simultaneous analog signal propagation, the possibility of manufacture using standard integrated circuit (IC) design elements (including resistors and operational amplifiers), high parallelism of computation, high accuracy (e.g., accuracy increases with the number of layers, relative to conventional methods), tolerance of errors in each weight and/or at each connection (e.g., pyramids balance the errors), low RC delay (low resistance-capacitance delay related to propagation of a signal through the network), and/or the ability to manipulate biases and functions of each neuron in each layer of the transformed network. Also, a pyramid is an excellent computation block in itself, since it is a multi-level perceptron that can model any neural network with one output. Networks with several outputs are implemented using different pyramid or trapezium geometries, according to some implementations. A pyramid can be thought of as a multi-layer perceptron with one output and several layers (e.g., N layers), where each neuron has n inputs and 1 output. Similarly, a trapezium is a multilayer perceptron where each neuron has n inputs and m outputs. Each trapezium is a pyramid-like network where each neuron has n inputs and m outputs, and n and m are limited by IC analog chip design limitations, according to some implementations.
Some implementations perform lossless transformation of any trained neural network into subsystems of pyramids or trapezia. Thus, pyramids and trapezia can be used as universal building blocks for transforming any neural network. An advantage of pyramid- or trapezia-based neural networks is the possibility of realizing any neural network using standard IC analog elements (e.g., operational amplifiers, resistors, and signal delay lines in the case of recurrent neurons) using standard lithography techniques. It is also possible to restrict the weights of transformed networks to some interval; in other words, lossless transformation is performed with weights limited to some predefined range, according to some implementations. Another advantage of using pyramids or trapezia is the high degree of parallelism in signal processing, or the simultaneous propagation of analog signals, which increases the speed of calculations and provides lower latency. Moreover, many modern neural networks are sparsely connected networks and are much better (e.g., more compact, with low RC values and an absence of leakage currents) when transformed into pyramids than into cross-bars; pyramid and trapezia networks are relatively more compact than cross-bar based memristor networks.
Furthermore, analog neuromorphic trapezia-like chips possess a number of properties not typical for analog devices. For example, the signal-to-noise ratio does not increase with the number of cascades in the analog chip, external noise is suppressed, and the influence of temperature is greatly reduced. Such properties make trapezia-like analog neuromorphic chips analogous to digital circuits. For example, individual neurons, based on operational amplifiers, level the signal, operate at frequencies of 20,000-100,000 Hz, and are not influenced by noise or signals with frequencies higher than the operational range, according to some implementations. Trapezia-like analog neuromorphic chips also perform filtration of the output signal due to peculiarities in how operational amplifiers function. Such a trapezia-like analog neuromorphic chip suppresses in-phase (common-mode) noise. Due to the low-ohmic outputs of the operational amplifiers, the noise is also significantly reduced. Due to the leveling of the signal at each operational amplifier output and the synchronous work of the amplifiers, the drift of parameters caused by temperature does not influence the signals at the final outputs. A trapezia-like analog neuromorphic circuit is tolerant of errors and noise in input signals and of deviations in resistor values corresponding to weight values in the neural network. Trapezia-like analog neuromorphic networks are also tolerant of any kind of systemic error, such as an error in resistor value settings, if the error is the same for all resistors, due to the very nature of analog neuromorphic trapezia-like circuits based on operational amplifiers.
Example Transformations with Target Neurons with N Inputs and 1 Output
In some implementations, the example transformations described herein are performed by the neural network transformation module 226 that transform trained neural networks 220, based on the mathematical formulations 230, the basic function blocks 232, the analog component models 234, and/or analog design constraints 236, to obtain the transformed neural networks 228.
Single Layer Perceptron with One Output
Suppose a single layer perceptron SLP(K, 1) includes K inputs and one output neuron with activation function F. Suppose further that $U \in \mathbb{R}^K$ is a vector of weights for SLP(K, 1). The following algorithm, Neuron2TNN1, constructs a T-neural network from T-neurons with N inputs and 1 output (referred to as TN(N, 1)).
1. Construct an input layer for the T-NN by including all inputs from SLP(K, 1).
2. If K > N, then:
3. Else (i.e., if K ≤ N):
$w_j^1 = u_j,\ j = 1, \ldots, K$
4. Set l = 1.
5. If $m_l > N$:
6. Else (i.e., if $m_l \le N$):
$w_j^{l+1} = 1$
7. Repeat steps 5 and 6.
Here, $\lceil x \rceil$ denotes the minimum integer that is no less than x. The number of layers in the T-NN constructed by means of the algorithm Neuron2TNN1 is $h = \lceil \log_N K \rceil$. The total number of weights in the T-NN is:
Layer 1 (e.g., layer 1002):
Layers i = 2, 3, . . . , h (e.g., layers 1004, 1006, 1008, and 1010):
The output value of the T-NN is calculated according to the following formula:
$y = F(W_m W_{m-1} \cdots W_2 W_1 x)$
The output for the first layer is calculated as an output vector according to the following formula:
Multiplying the obtained vector by the weight matrix of the second layer:
Every subsequent layer outputs a vector whose components are equal to a linear combination of the components of some sub-vector of x.
Finally, the T-NN's output is equal to:
This is the same value as the one calculated by SLP(K, 1) for the same input vector x, so the output values of SLP(K, 1) and the constructed T-NN are equal.
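The construction can be checked numerically. The sketch below (bias omitted; the activation F assumed to be ReLU; intermediate pyramid neurons treated as pure summators with unit weights, per the unit-weight upper layers above) splits a K-input neuron into a pyramid of neurons with at most N inputs each and verifies that the output matches the original SLP(K, 1).

```python
import numpy as np

# Sketch of Neuron2TNN1: a single neuron with K inputs and weights u is
# replaced by a pyramid of neurons with at most N inputs each. First-layer
# connections carry the original weights; upper layers sum with unit weights;
# the activation F is applied only at the root.

def neuron2tnn1(u, x, N, F=lambda s: max(s, 0.0)):
    signals = [w * v for w, v in zip(u, x)]       # weighted first-layer inputs
    while len(signals) > 1:
        # Each pyramid neuron sums at most N incoming signals (weights = 1).
        signals = [sum(signals[i:i + N]) for i in range(0, len(signals), N)]
    return F(signals[0])

K, N = 10, 3
u, x = np.random.randn(K), np.random.randn(K)
direct = max(float(u @ x), 0.0)                   # original SLP(K, 1) output
assert abs(neuron2tnn1(u, x, N) - direct) < 1e-9  # the outputs are equal
```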
Single Layer Perceptron with Several Outputs
Suppose there is a single layer perceptron SLP(K, L) with K inputs and L output neurons, each neuron performing an activation function F. Suppose further that $U \in \mathbb{R}^{L \times K}$ is a weight matrix for SLP(K, L). The following algorithm, Layer2TNN1, constructs a T-neural network from neurons TN(N, 1).
1. For every output neuron i = 1, . . . , L:
2. Construct the PTNN by composing all TNNi into one neural net:
The output of the PTNN is equal to the output of SLP(K, L) for the same input vector, because the outputs of every pair SLPi(K, 1) and TNNi are equal.
Suppose a multilayer perceptron (MLP) includes K inputs, S layers, and Li calculation neurons in the i-th layer, represented as MLP(K, S, L1, . . . , LS). Suppose $U_i \in \mathbb{R}^{L_i \times L_{i-1}}$ is a weight matrix for the i-th layer, where $L_0 = K$.
The following is an example algorithm to construct a T-neural network from neurons TN(N, 1), according to some implementations.
1. For every layer i = 1, . . . , S:
2. Construct the MTNN by stacking all PTNNi into one neural net; the output of PTNNi-1 is set as the input for PTNNi.
The output of the MTNN is equal to the output of MLP(K, S, L1, . . . , LS) for the same input vector, because the outputs of every pair SLPi(Li-1, Li) and PTNNi are equal.
Example T-Transformations with Target Neurons with NI Inputs and NO Outputs
In some implementations, the example transformations described herein are performed by the neural network transformation module 226 that transform trained neural networks 220, based on the mathematical formulations 230, the basic function blocks 232, the analog component models 234, and/or the analog design constraints 236, to obtain the transformed neural networks 228.
Example Transformation of Single Layer Perceptron with Several Outputs
Suppose a single layer perceptron SLP(K, L) includes K inputs and L output neurons, each neuron performing an activation function F. Suppose further that $U \in \mathbb{R}^{L \times K}$ is a weight matrix for SLP(K, L). The following algorithm constructs a T-neural network from neurons TN(NI, NO), according to some implementations.
According to some implementations, output of the PTNNX is calculated by means of the same formulas as for PTNN (described above), so the outputs are equal.
Suppose a multilayer perceptron (MLP) includes K inputs, S layers, and Li calculation neurons in the i-th layer, represented as MLP(K, S, L1, . . . , LS). Suppose $U_i \in \mathbb{R}^{L_i \times L_{i-1}}$ is a weight matrix for the i-th layer, where $L_0 = K$.
1. For every layer i=1, . . . ,S:
2. Construct MTNNX by stacking all PTNNXi into one neural net:
According to some implementations, the output of the MTNNX is equal to the output of MLP(K, S, L1, . . . , LS) for the same input vector, because the outputs of every pair SLPi(Li-1, Li) and PTNNXi are equal.
A Recurrent Neural Network (RNN) contains backward connections that allow information to be saved.
Data processing in an RNN is performed by means of the following formula:
$h_t = f(W^{(hh)} h_{t-1} + W^{(hx)} x_t)$
In the equation above, $x_t$ is the current input vector, and $h_{t-1}$ is the RNN's output for the previous input vector $x_{t-1}$. This expression consists of several operations: calculation of a linear combination for two fully connected layers, $W^{(hh)} h_{t-1}$ and $W^{(hx)} x_t$; element-wise addition; and non-linear function calculation (f). The first and third operations can be implemented by a trapezium-based network (one fully connected layer is implemented by a pyramid-based network, a special case of trapezium networks). The second operation is a common operation that can be implemented in networks of any structure.
In some implementations, the RNN's layer without recurrent connections is transformed by means of the Layer2TNNX algorithm described above. After the transformation is completed, recurrent links are added between related neurons. Some implementations use delay blocks, described below, to implement the recurrent links.
A Long Short-Term Memory (LSTM) neural network is a special case of an RNN. An LSTM network's operations are represented by the following equations:
$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$;
$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$;
$D_t = \tanh(W_D [h_{t-1}, x_t] + b_D)$;
$C_t = f_t \times C_{t-1} + i_t \times D_t$;
$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$; and
$h_t = o_t \times \tanh(C_t)$.
In the equations above, $W_f$, $W_i$, $W_D$, and $W_o$ are trainable weight matrices; $b_f$, $b_i$, $b_D$, and $b_o$ are trainable biases; $x_t$ is the current input vector; $h_{t-1}$ is the internal state of the LSTM calculated for the previous input vector $x_{t-1}$; and $o_t$ is the output for the current input vector. In the equations, the subscript t denotes a time instance t, and the subscript t−1 denotes the time instance t−1.
There are several types of operations utilized in these expressions: (i) calculation of linear combination for several fully connected layers, (ii) elementwise addition, (iii) Hadamard product, and (iv) non-linear function calculation (e.g., sigmoid (σ) and hyperbolic tangent (tanh)). Some implementations implement the (i) and (iv) operations by a trapezium-based network (one fully connected layer is implemented by a pyramid-based network, a special case of trapezium networks). Some implementations use networks of various structures for the (ii) and (iii) operations which are common operations.
A layer in an LSTM without recurrent connections is transformed by using the Layer2TNNX algorithm described above, according to some implementations. After the transformation is completed, recurrent links are added between related neurons, according to some implementations.
A Gated Recurrent Unit (GRU) neural network is a special case of an RNN. A GRU's operations are represented by the following expressions:
$z_t = \sigma(W_z x_t + U_z h_{t-1})$;
$r_t = \sigma(W_r x_t + U_r h_{t-1})$;
$j_t = \tanh(W x_t + r_t \times U h_{t-1})$; and
$h_t = z_t \times h_{t-1} + (1 - z_t) \times j_t$.
In the equations above, $x_t$ is the current input vector, and $h_{t-1}$ is the output calculated for the previous input vector $x_{t-1}$.
The operation types used in a GRU are the same as the operation types for LSTM networks (described above), so a GRU is transformed to trapezium-based networks following the principles described above for LSTMs (e.g., using the Layer2TNNX algorithm), according to some implementations.
In general, Convolutional Neural Networks (CNNs) include several basic operations, such as convolution (a set of linear combinations of an image's (or internal map's) fragments with a kernel), activation function, and pooling (e.g., max, mean, etc.). Every calculation neuron in a CNN follows the general processing scheme of a neuron in an MLP: a linear combination of some inputs with a subsequent calculation of an activation function. So a CNN is transformed using the MLP2TNNX algorithm described above for multilayer perceptrons, according to some implementations.
Conv1D is a convolution performed over the time coordinate.
In some implementations, convolutional layers are represented by trapezia-like neurons, and a fully connected layer is represented by a cross-bar of resistors. Some implementations use cross-bars and calculate the resistance matrix for the cross-bars.
Example Approximation Algorithm for Single Layer Perceptron with Multiple Outputs
In some implementations, the example transformations described herein are performed by the neural network transformation module 226 that transform trained neural networks 220, and/or the analog neural network optimization module 246, based on the mathematical formulations 230, the basic function blocks 232, the analog component models 234, and/or the analog design constraints 236, to obtain the transformed neural networks 228.
Suppose a single layer perceptron SLP(K, L) includes K inputs and L output neurons, each output neuron performing an activation function F. Suppose further that $U \in \mathbb{R}^{L \times K}$ is a weight matrix for SLP(K, L). The following is an example of constructing a T-neural network from neurons TN(NI, NO) using the approximation algorithm Layer2TNNX_Approx, according to some implementations. The algorithm applies the Layer2TNN1 algorithm (described above) at the first stage in order to decrease the number of neurons and connections, and subsequently applies Layer2TNNX to process the input of the decreased size. The outputs of the resulting neural net are calculated using shared weights of the layers constructed by the Layer2TNN1 algorithm. The number of these layers is determined by the value p, a parameter of the algorithm. If p is equal to 0, then only the Layer2TNNX algorithm is applied and the transformation is equivalent. If p > 0, then p layers have shared weights and the transformation is approximate.
Approximation Algorithm for Multilayer Perceptron with Several Outputs
Suppose a multilayer perceptron (MLP) includes K inputs, S layers, and Li calculation neurons in the i-th layer, represented as MLP(K, S, L1, . . . , LS). Suppose further that $U_i \in \mathbb{R}^{L_i \times L_{i-1}}$ is a weight matrix for the i-th layer, where $L_0 = K$.
In some implementations, the example transformations described herein are performed by the neural network transformation module 226 that transform trained neural networks 220, and/or the analog neural network optimization module 246, based on the mathematical formulations 230, the basic function blocks 232, the analog component models 234, and/or the analog design constraints 236, to obtain the transformed neural networks 228.
This section describes example methods of compression of transformed neural networks, according to some implementations. Some implementations compress analog pyramid-like neural networks in order to minimize the number of operational amplifiers and resistors necessary to realize the analog network on a chip. In some implementations, the method of compression of analog neural networks is pruning, similar to pruning in software neural networks. There are nevertheless some peculiarities in the compression of pyramid-like analog networks, which are realized as analog IC chips in hardware. Since the number of elements, such as operational amplifiers and resistors, defines the weights in analog neural networks, it is crucial to minimize the number of operational amplifiers and resistors to be placed on the chip. This also helps minimize the power consumption of the chip. Modern neural networks, such as convolutional neural networks, can be compressed 5-200 times without significant loss of accuracy. Often, whole blocks in modern neural networks can be pruned without significant loss of accuracy. The transformation of dense neural networks into sparsely connected pyramid-, trapezia-, or cross-bar-like neural networks presents opportunities to prune the sparsely connected pyramid- or trapezia-like analog networks, which are then represented by operational amplifiers and resistors in analog IC chips. In some implementations, such techniques are applied in addition to conventional neural network compression techniques. In some implementations, the compression techniques are applied based on the specific architecture of the input neural network and/or the transformed neural networks (e.g., pyramids versus trapezia versus cross-bars).
For example, since the networks are realized by means of analog elements, such as operational amplifiers, some implementations determine the current that flows through each operational amplifier when the standard training dataset is presented, and thereby determine whether a knot (an operational amplifier) is needed for the whole chip or not. Some implementations analyze the SPICE model of the chip and determine the knots and connections where no current is flowing and no power is consumed. Some implementations determine the current flow through the analog IC network and thus determine the knots and connections that are then pruned. Besides, some implementations remove a connection if its weight is negligibly small, and/or substitute a resistor with a direct connection if the weight of the connection is very high (a large weight corresponds to a very small resistance). Some implementations prune a knot if all connections leading to the knot have weights lower than a predetermined threshold (e.g., close to 0), delete connections where an operational amplifier always provides zero at its output, and/or change an operational amplifier to a linear junction if the amplifier gives a linear function without amplification.
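The following sketch illustrates such a pruning pass over a weight matrix of a transformed network; the thresholds are hypothetical.

```python
import numpy as np

# Illustrative pruning rules: drop near-zero-weight connections, flag very
# large weights for replacement by a direct connection, and prune a knot
# (analog neuron) when all of its incoming weights fall below the threshold.

W = np.array([[0.0004, 0.8, 12.0],
              [0.0002, 0.0001, 0.0003]])          # rows = knots, cols = inputs

LOW, HIGH = 1e-3, 10.0
pruned = np.abs(W) < LOW                          # connections to remove
direct = np.abs(W) > HIGH                         # resistors -> direct connectors
dead_knots = pruned.all(axis=1)                   # knots with no surviving inputs

print("prune connections:", np.argwhere(pruned).tolist())
print("direct connectors:", np.argwhere(direct).tolist())
print("prune knots:", np.flatnonzero(dead_knots).tolist())
```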
Some implementations apply compression techniques specific to the pyramid, trapezia, or cross-bar types of neural networks. Some implementations generate pyramids or trapezia with a larger number of inputs (than without compression), thus minimizing the number of layers in the pyramid or trapezia. Some implementations generate a more compact trapezia network by maximizing the number of outputs of each neuron.
In some implementations, the example computations described herein are performed by the weight matrix computation or weight quantization module 238 (e.g., using the resistance calculation module 240) that compute the weights 272 for connections of the transformed neural networks, and/or corresponding resistance values 242 for the weights 272.
This section describes an example of generating an optimal resistor set for a trained neural network, according to some implementations. An example method is provided for converting connection weights to resistor nominals for implementing the neural network (sometimes called an NN model) on a microchip with possibly fewer resistor nominals and possibly higher allowed resistor variance.
Suppose a test set ‘Test’ includes around 10,000 values of the input vector (x and y coordinates), with both coordinates varying in the range [0, 1] with a step of 0.01. Suppose the network NN output for a given input X is Out = NN(X). Suppose further that the input value class is found as follows: Class_nn(X) = NN(X) > 0.61 ? 1 : 0.
The following compares a mathematical network model M with a schematic network model S. The schematic network model includes a possible resistor variance of rv and processes the ‘Test’ set, each time producing a different vector of output values S(Test) = Out_s. The output error is defined by the following equation:
Classification error is defined by the following equation:
Some implementations set the desired classification error as no more than 1%.
Suppose another network O produces output values with a constant shift versus the relevant M output values; there would then be a classification error between O and M. To keep the classification error below 1%, this shift should be in the range [−0.045, 0.040]. Thus, the possible output error for S is 45 mV.
The possible weight error is determined by analyzing the dependency between the weight/bias relative error over the whole network and the output error. The charts 1710 and 1720, shown in the accompanying drawings, illustrate this dependency.
A resistor set, together with an {R+, R−} pair chosen from this set, has a value function over the required weight range [−wlim; wlim] with some degree of resistor error r_err. In some implementations, the value function of a resistor set is calculated as follows:
Some implementations iteratively search for an optimal resistor set by consecutively adjusting each resistor value in the set by a learning rate value. In some implementations, the learning rate changes over time. In some implementations, an initial resistor set is chosen as uniform (e.g., [1; 1; . . . ; 1]), with minimum and maximum resistor values chosen to be within a two-orders-of-magnitude range (e.g., [1; 100] or [0.1; 10]). Some implementations choose R+ = R−. In some implementations, the iterative process converges to a local minimum. In one case, the process resulted in the following set: [0.17, 1.036, 0.238, 0.21, 0.362, 1.473, 0.858, 0.69, 5.138, 1.215, 2.083, 0.275]. This is a locally optimal resistor set of 12 resistors for the weight range [−2; 2] with rmin = 0.1 (minimum resistance), rmax = 10 (maximum resistance), and r_err = 0.001 (the estimated error in the resistance). Some implementations do not use the whole available range [rmin; rmax] for finding a good local optimum; only part of the available range (e.g., in this case [0.17; 5.13]) is used. The resistor set values are relative, not absolute. In this case, a relative value range of 30 is enough for the resistor set.
In one instance, the following resistor set of length 20 was obtained for the abovementioned parameters: [0.300, 0.461, 0.519, 0.566, 0.648, 0.655, 0.689, 0.996, 1.006, 1.048, 1.186, 1.222, 1.261, 1.435, 1.488, 1.524, 1.584, 1.763, 1.896, 2.02]. In this example, the value 1.763 is also the R− = R+ value. This set was subsequently used to produce weights for the NN, producing the corresponding model S. The model S's mean square output error was 11 mV, given a relative resistor error close to zero, so the set of 20 resistors is more than required. The maximum error over a set of input data was calculated to be 33 mV. In one instance, S together with DAC and ADC converters with 256 levels was analyzed as a separate model, and the result showed a 14 mV mean square output error and a 49 mV max output error. An output error of 45 mV on the NN corresponds to a relative recognition error of 1%. The 45 mV output error value also corresponds to 0.01 relative or 0.01 absolute weight error, which is acceptable. The maximum weight modulus in the NN is 1.94. In this way, an optimal (or near-optimal) resistor set is determined using the iterative process, based on the desired weight range [−wlim; wlim], the relative resistor error, and the possible resistor range.
Typically, a very broad resistor set is not very beneficial (e.g., a range of 1 to 1.5 orders of magnitude is enough) unless different precision is required within different layers or parts of the weight spectrum. For example, suppose weights are in the range [0, 1], but most of the weights are in the range [0, 0.001]; then better precision is needed within that range. In the example described above, given a relative resistor error close to zero, the set of 20 resistors is more than sufficient for quantizing the NN network with the given precision. In one instance, on the set of resistors [0.300, 0.461, 0.519, 0.566, 0.648, 0.655, 0.689, 0.996, 1.006, 1.048, 1.186, 1.222, 1.261, 1.435, 1.488, 1.524, 1.584, 1.763, 1.896, 2.02] (note that the values are relative), an average S output error of 11 mV was obtained.
In some implementations, the example computations described herein are performed by the weight matrix computation or weight quantization module 238 (e.g., using the resistance calculation module 240) that compute the weights 272 for connections of the transformed neural networks, and/or corresponding resistance values 242 for the weights 272.
This section describes an example process for quantizing resistor values corresponding to weights of a trained neural network, according to some implementations. The example process substantially simplifies the process of manufacturing chips using analog hardware components for realizing neural networks. As described above, some implementations use resistors to represent neural network weights and/or biases for operational amplifiers that represent analog neurons. The example process described here specifically reduces the complexity in lithographically fabricating sets of resistors for the chip. With the procedure of quantizing the resistor values, only select values of resistances are needed for chip manufacture. In this way, the example process simplifies the overall process of chip manufacture and enables automatic resistor lithographic mask manufacturing on demand.
The following equations determine the weights based on resistor values. The voltage at the output of a neuron is determined by the following equation: $V^{out} = f(\sum_i w_i \cdot V_i^{in} + w_{bias})$, where f is the neuron's activation function. The weight of each connection is determined by the following equation: $w_i = R_{feedback}/R_i^+ - R_{feedback}/R_i^-$.
The following example optimization procedure quantizes the value of each resistance and minimizes the error of the neural network output, according to some implementations:
Some implementations use an iterative approach for the resistor set search. Some implementations select an initial (random or uniform) set {R1, . . . , Rn} within the defined range. Some implementations select one of the elements of the resistor set as the R− = R+ value. Some implementations alter each resistor within the set by the current learning rate value as long as such alterations produce a 'better' set (according to a value function). This process is repeated for all resistors within the set and with several different learning rate values, until no further improvement is possible.
Some implementations define the value function of a resistor set as a measure of the weight error achievable over the required weight range.
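The sketch below implements this iterative search. Since the value function itself is not reproduced here, a stand-in is assumed: the mean best-achievable pair error |w − (RF/Rp − RF/Rn)| over a sampled weight range, using the parameters of the 12-resistor example above (weight range [−2; 2], rmin = 0.1, rmax = 10, relative units).

```python
# Iterative resistor-set search with a multiplicative per-resistor step and
# a decaying learning rate; the value function is an assumed stand-in.

RF = 1.0
WLIM, RMIN, RMAX = 2.0, 0.1, 10.0
WEIGHTS = [-WLIM + 2 * WLIM * k / 100 for k in range(101)]

def value(rset):
    # Mean best-achievable weight error using pairs (Rn, Rp) from the set.
    pairs = [RF / rp - RF / rn for rp in rset for rn in rset]
    return sum(min(abs(w - p) for p in pairs) for w in WEIGHTS) / len(WEIGHTS)

rset = [1.0] * 12                                 # uniform initial set
best = value(rset)
for lr in (0.5, 0.2, 0.05):                       # learning rate changes over time
    improved = True
    while improved:
        improved = False
        for i in range(len(rset)):
            for step in (1 + lr, 1 / (1 + lr)):   # try each resistor up and down
                trial = list(rset)
                trial[i] = min(max(trial[i] * step, RMIN), RMAX)
                v = value(trial)
                if v < best:
                    rset, best, improved = trial, v, True

print(sorted(round(r, 3) for r in rset), round(best, 4))
```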
Suppose the required weight range [−wlim; wlim] for a model is set to [−5; 5], and the other parameters include N = 20, r_err = 0.1%, rmin = 100 kΩ, and rmax = 5 MΩ. Here, rmin and rmax are the minimum and maximum values for resistances, respectively.
In one instance, the following resistor set of length 20 was obtained for the abovementioned parameters: [0.300, 0.461, 0.519, 0.566, 0.648, 0.655, 0.689, 0.996, 1.006, 1.048, 1.186, 1.222, 1.261, 1.435, 1.488, 1.524, 1.584, 1.763, 1.896, 2.02] MΩ, with R− = R+ = 1.763 MΩ.
Some implementations determine Rn and Rp using an iterative algorithm, such as the algorithm described above. Some implementations set Rp = Rn (the tasks of determining Rn and Rp are symmetrical, and the two quantities typically converge to a similar value). Then, for each weight wi, some implementations select a pair of resistances {Rni, Rpi} that minimizes the estimated weight error value:
Some implementations subsequently use the {Rni; Rpi; Rn; Rp} values to implement the neural network schematics. In one instance, the schematics produced a mean square output error (sometimes called the S mean square output error, described above) of 11 mV and a max error of 33 mV over a set of 10,000 uniformly distributed input data samples, according to some implementations. In one instance, the S model was analyzed along with digital-to-analog converters (DAC) and analog-to-digital converters (ADC) with 256 levels as a separate model. The model produced a 14 mV mean square output error and a 49 mV max output error on the same data set, according to some implementations. DACs and ADCs have levels because they convert analog values to bit values and vice versa: 8 bits of digital value equals 256 levels, so precision cannot be better than 1/256 for an 8-bit ADC.
Some implementations calculate the resistance values for analog IC chips, when the weights of connections are known, based on Kirchhoff's circuit laws and basic principles of operational amplifiers (described below with reference to the accompanying figures).
Some implementations manufacture resistors in a lithography layer where resistors are formed as cylindrical holes in the SiO2 matrix, and the resistance value is set by the diameter of the hole. Some implementations use amorphous TaN, TiN, CrN, or tellurium as the highly resistive material to make high-density resistor arrays. Certain ratios of Ta to N, Ti to N, and Cr to N provide the high resistance needed for ultra-dense arrays of high-resistivity elements. For example, across TaN, Ta5N6, and Ta3N5, the higher the ratio of N to Ta, the higher the resistivity. Some implementations use Ti2N, TiN, CrN, or Cr5N, and determine the ratios accordingly. TaN deposition is a standard procedure used in chip manufacturing and is available at all major foundries.
In some implementations, operational amplifiers, such as the example described above, are used as the basic element of integrated circuits for hardware realization of neural networks. In some implementations, the operational amplifiers occupy approximately 40 square microns each and are fabricated according to the 45 nm node standard.
In some implementations, activation functions, such as ReLU, hyperbolic tangent, and sigmoid functions, are represented by operational amplifiers with a modified output cascade. For example, a ReLU, sigmoid, or tangent function is realized as an output cascade of an operational amplifier (sometimes called an OpAmp) using corresponding well-known analog schematics, according to some implementations.
In the examples described above and below, in some implementations, the operational amplifiers are substituted by inverters, current mirrors, two-quadrant or four-quadrant multipliers, and/or other analog functional blocks that allow a weighted summation operation.
Similar transformations occur with the other signals in the circuit.
The current mirror (transistors M1 21052, M2 21053, M3 21054, and M4 21056) powers the left portion of the four-quadrant multiplier circuit, made with transistors M5 21058, M6 21060, M7 21062, M8 21064, M9 21066, and M10 21068. The current mirror on transistors M25 21098, M26 21100, M27 21102, and M28 21104 powers the right portion of the four-quadrant multiplier, made with transistors M29 21106, M30 21108, M31 21110, M32 21112, M33 21114, and M34 21116. The multiplication result is taken from the resistor Ro 21022, connected in parallel to the transistor M3 21054, and the resistor Ro 21188, connected in parallel to the transistor M28 21104, and is supplied to the adder on U3 21044. The output of U3 21044 is supplied to an adder with a gain of 7.1, assembled on U5 21048, the second input of which is compensated by the reference voltage set by the resistors R1 21024 and R2 21026 and the buffer U4 21046.
The sigmoid function is formed by adding the corresponding reference voltages on a differential module assembled on the transistors M1 2266 and M2 2268. A current mirror for the differential stage is assembled with an active-regulation operational amplifier U3 2254 and the NMOS transistor M3 2270. The signal from the differential stage is taken from the NMOS transistor M2 and the resistor R5 2220 and is input to the adder U2 2252. The output signal sigm_out 2210 is taken from the output of the adder U2 2252.
The weights of the connections of a single neuron (with two inputs and one output) are set by the resistor ratios: w1=(R feedback/R1+)−(R feedback/R1−); w2=(R feedback/R2+)−(R feedback/R2−); wbias=(R feedback/Rbias+)−(R feedback/Rbias−); equivalently, w1=(Rp*K amp/R1+)−(Rn*K amp/R1−); w2=(Rp*K amp/R2+)−(Rn*K amp/R2−); wbias=(Rp*K amp/Rbias+)−(Rn*K amp/Rbias−), where K amp=R1ReLU/R2ReLU. R feedback=100 kΩ is used only for calculating w1, w2, and wbias. According to some implementations, example values include: R feedback=100 kΩ, Rn=Rp=Rcom=10 kΩ, K amp ReLU=1+90 k/10 k=10, w1=(10 k*10/22.1 k)−(10 k*10/21.5 k)=−0.126276, w2=(10 k*10/75 k)−(10 k*10/71.5 k)=−0.065268, and wbias=(10 k*10/71.5 k)−(10 k*10/78.7 k)=0.127953.
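These example weight values can be checked directly from the stated resistances; a minimal sketch (resistances in kΩ, with the gain value 10 taken from the K amp ReLU computation above):

```python
K_AMP = 10.0               # from K amp ReLU = 1 + 90k/10k = 10
RP = RN = 10.0             # Rp = Rn = 10 kOhm

def weight(r_plus, r_minus):
    # w = (Rp * Kamp / R+) - (Rn * Kamp / R-), resistances in kOhm
    return RP * K_AMP / r_plus - RN * K_AMP / r_minus

print(round(weight(22.1, 21.5), 6))  # w1    = -0.126276
print(round(weight(75.0, 71.5), 6))  # w2    = -0.065268
print(round(weight(71.5, 78.7), 6))  # wbias =  0.127953
```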
The input of the negative link adder of the neuron (M1-M17) is received from the positive link adder of the neuron (M17-M32) through the Rcom resistor.
The method also includes transforming (2710) the neural network topology to an equivalent analog network of analog components.
In some implementations, the equivalent analog network includes one or more activation blocks, including: a sigmoid activation block, where Vin represents an input, and A and B are predetermined coefficient values (e.g., A=−0.1; B=11.3) of the sigmoid activation block; a hyperbolic tangent activation block (2742) with a block output Vout=A*tanh(B*Vin), where Vin represents an input, and A and B are predetermined coefficient values (e.g., A=0.1, B=−10.1); and a signal delay block (2744) with a block output U(t)=V(t−dt), where t represents a current time period, V(t−dt) represents an output of the signal delay block for a preceding time period t−dt, and dt is a delay value.
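A minimal sketch of the hyperbolic tangent and signal delay blocks as transfer functions; the sample-based FIFO buffer is an illustrative way to model the delay dt (the sigmoid block is analogous, with its own coefficient values):

```python
import math

def tanh_block(v_in, A=0.1, B=-10.1):
    # Hyperbolic tangent activation block: Vout = A * tanh(B * Vin)
    return A * math.tanh(B * v_in)

class DelayBlock:
    """Signal delay block: U(t) = V(t - dt), modeled with a FIFO buffer
    holding dt samples (dt expressed in sample periods)."""
    def __init__(self, dt_samples=1):
        self.buf = [0.0] * dt_samples
    def step(self, v_in):
        self.buf.append(v_in)        # store the newest sample
        return self.buf.pop(0)       # emit the sample from dt periods ago

# Example: a one-sample delay outputs [0.0, 1.0, 2.0] for inputs [1.0, 2.0, 3.0]
d = DelayBlock(1)
print([d.step(v) for v in (1.0, 2.0, 3.0)])
```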
The method also includes generating (2714) a schematic model for implementing the equivalent analog network based on the weight matrix, including selecting component values for the analog components.
The method also includes calculating (28008) one or more connection constraints based on analog integrated circuit (IC) design constraints (e.g., the constraints 236). For example, the IC design constraints can set a current limit (e.g., 1 A), and the neuron schematics and operational amplifier (OpAmp) design can set the OpAmp output current in the range [0-10 mA], which limits output neuron connections to 100. This means that the neuron has 100 outputs that allow current to flow to the next layer through 100 connections; because the current at the output of the operational amplifier is limited to 10 mA, some implementations use a maximum of 100 outputs (0.1 mA times 100 = 10 mA). Without this constraint, some implementations use current repeaters to increase the number of outputs to more than 100, for example.
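The connection-degree arithmetic in this example reduces to a one-line calculation; a sketch with the stated currents:

```python
def max_outputs(opamp_limit_ma=10.0, per_connection_ma=0.1):
    # An OpAmp output limited to 10 mA, at ~0.1 mA per connection,
    # supports at most 10 / 0.1 = 100 outgoing connections.
    return int(opamp_limit_ma / per_connection_ma)

print(max_outputs())  # 100
```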
The method also includes transforming (28010) the neural network topology (e.g., using the neural network transformation module 226) to an equivalent sparsely connected network of analog components satisfying the one or more connection constraints.
In some implementations, transforming the neural network topology includes deriving (28012) a possible input connection degree NI and output connection degree NO, according to the one or more connection constraints.
In some implementations, transforming the neural network topology includes constructing a layer LAp with K analog neurons and a layer LAh with M analog neurons performing an identity activation function, and a layer LAo with L analog neurons performing the activation function F, such that each analog neuron in the layer LAp has NO outputs, each analog neuron in the layer LAh has not more than NI inputs and NO outputs, and each analog neuron in the layer LAo has NI inputs. In some such cases, computing (28148) the weight matrix for the equivalent sparsely connected network includes generating (2850) sparse weight matrices Wo and Wh by solving a matrix equation Wo·Wh=W that includes K·L equations in K·NO+L·NI variables, so that the total output of the layer LAo is calculated using the equation Yo=F(Wo·Wh·x). The sparse weight matrix Wh∈RM×K represents connections between the layers LAp and LAh, and the sparse weight matrix Wo∈RL×M represents connections between the layers LAh and LAo.
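One way to approach the matrix equation Wo·Wh=W under these degree constraints is alternating least squares over fixed random sparsity masks; the following is a sketch under that assumption, not the claimed solution procedure:

```python
import numpy as np

def sparse_factor(W, M, NO, NI, iters=100, seed=0):
    """Approximate W (L x K) as Wo @ Wh, with Wh (M x K) having at most NO
    nonzeros per column and Wo (L x M) at most NI nonzeros per row."""
    rng = np.random.default_rng(seed)
    L, K = W.shape
    # Fixed random sparsity masks respecting the connection degrees.
    mask_h = np.zeros((M, K), dtype=bool)
    for k in range(K):                      # each LAp neuron: NO outputs
        mask_h[rng.choice(M, size=min(NO, M), replace=False), k] = True
    mask_o = np.zeros((L, M), dtype=bool)
    for l in range(L):                      # each LAo neuron: NI inputs
        mask_o[l, rng.choice(M, size=min(NI, M), replace=False)] = True
    Wh = rng.standard_normal((M, K)) * mask_h
    Wo = rng.standard_normal((L, M)) * mask_o
    for _ in range(iters):                  # alternating least squares
        for l in range(L):                  # rows of Wo given Wh
            idx = np.flatnonzero(mask_o[l])
            Wo[l, idx] = np.linalg.lstsq(Wh[idx].T, W[l], rcond=None)[0]
        for k in range(K):                  # columns of Wh given Wo
            idx = np.flatnonzero(mask_h[:, k])
            Wh[idx, k] = np.linalg.lstsq(Wo[:, idx], W[:, k], rcond=None)[0]
    return Wo, Wh
```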
In some implementations, the transformation includes (i) generating weights for the pyramid neural network, including setting designated weights according to a corresponding equation for all weights j of the neuron except ki, and setting all other weights of the pyramid neural network to 1; and (ii) generating (28194) weights for the trapezium neural network, including setting the weights of each neuron i of the first layer of the trapezium neural network (considering the whole network, this is the (p+1)-th layer) according to a corresponding equation, and setting the other weights of the trapezium neural network to 1.
The method includes obtaining (2906) a neural network topology (e.g., the topology 224) and weights (e.g., the weights 222) of a trained neural network (e.g., the networks 220). In some implementations, weight quantization is performed during training. In some implementations, the trained neural network is trained (2908) so that each layer of the neural network topology has quantized weights (e.g., each weight takes a particular value from a list of discrete values, such as each layer having only three weight values: +1, 0, and −1).
The method also includes transforming (2910) the neural network topology (e.g., using the neural network transformation module 226) to an equivalent analog network of analog components including a plurality of operational amplifiers and a plurality of resistors. Each operational amplifier represents an analog neuron of the equivalent analog network, and each resistor represents a connection between two analog neurons.
The method also includes computing (2912) a weight matrix for the equivalent analog network based on the weights of the trained neural network. Each element of the weight matrix represents a respective connection.
The method also includes generating (2914) a resistance matrix for the weight matrix. Each element of the resistance matrix corresponds to a respective weight of the weight matrix and represents a resistance value.
In some implementations, generating the resistance matrix includes: selecting a limited length set of resistance values and a base resistance value Rbase such that the realizable weights lie within the range [−Rbase, Rbase] for all combinations of {Ri, Rj} within the limited length set of resistance values. In some implementations, some weight values fall outside this range, but the square average distance between weights within this range is minimized; (iii) selecting (2922) a resistance value R+=R−, from the limited length set of resistance values, either for each analog neuron or for each layer of the equivalent analog network, based on the maximum weight of incoming connections and bias wmax of each neuron or each layer, such that R+=R− is the closest resistor set value to Rbase*wmax. In some implementations, R+ and R− are chosen (2924) independently for each layer of the equivalent analog network. In some implementations, R+ and R− are chosen (2926) independently for each analog neuron of the equivalent analog network; and (iv) for each element of the weight matrix, selecting (2928) a respective first resistance value R1 and a respective second resistance value R2 that minimize an error (e.g., err = |w − (R+/R1 − R−/R2)|, taking the relative tolerance rerr into account) for all possible values of R1 and R2 within the predetermined range of possible resistance values. w is the respective element of the weight matrix, and rerr is a predetermined relative tolerance value for the possible resistance values.
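A sketch of this pair selection, assuming the weight realized by a pair is w_est = R+/R1 − R−/R2 (consistent with the single-neuron weight equations above) and an illustrative worst-case sensitivity penalty for the tolerance rerr:

```python
def select_pair(w, rset, r_plus, r_minus, r_err=0.001):
    """Pick {R1, R2} from the nominal set minimizing the estimated weight error
    for a target weight w, with w_est = R+/R1 - R-/R2 and a worst-case
    sensitivity penalty for the relative resistor tolerance r_err."""
    best = None
    for r1 in rset:
        for r2 in rset:
            w_est = r_plus / r1 - r_minus / r2
            err = abs(w - w_est) + r_err * (r_plus / r1 + r_minus / r2)
            if best is None or err < best[0]:
                best = (err, r1, r2)
    return best  # (estimated error, R1, R2)
```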
The method includes obtaining (3006) a neural network topology (e.g., the topology 224) and weights (e.g., the weights 222) of a trained neural network (e.g., the networks 220).
The method also includes transforming (3008) the neural network topology (e.g., using the neural network transformation module 226) to an equivalent analog network of analog components including a plurality of operational amplifiers and a plurality of resistors. Each operational amplifier represents an analog neuron of the equivalent analog network, and each resistor represents a connection between two analog neurons.
The method also includes pruning (3014) the equivalent analog network to reduce the number of operational amplifiers or resistors, based on the resistance matrix, to obtain an optimized analog network of analog components.
The method also includes transforming (3106) the neural network topology (e.g., using the neural network transformation module 226) to an equivalent analog network of analog components including a plurality of operational amplifiers and a plurality of resistors (for recurrent neural networks, the analog components also include signal delay lines, multipliers, and Tanh and Sigmoid analog blocks). Each operational amplifier represents a respective analog neuron, and each resistor represents a respective connection between a respective first analog neuron and a respective second analog neuron.
The method also includes computing (3108) a weight matrix for the equivalent analog network based on the weights of the trained neural network. Each element of the weight matrix represents a respective connection.
The method also includes generating (3110) a resistance matrix for the weight matrix. Each element of the resistance matrix corresponds to a respective weight of the weight matrix.
The method also includes generating (3112) one or more lithographic masks (e.g., generating the masks 250 and/or 252 using the mask generation module 248) for fabricating a circuit implementing the equivalent analog network of analog components based on the resistance matrix, and fabricating (3114) the circuit (e.g., the ICs 262) based on the one or more lithographic masks using a lithographic process.
Some implementations include components that are not integrated into the chip (i.e., external elements connected to the chip) selected from the group consisting of: voice recognition, video signal processing, image sensing, temperature sensing, pressure sensing, radar processing, LIDAR processing, battery management, MOSFET circuit current and voltage sensing, accelerometers, gyroscopes, magnetic sensors, heart rate sensors, gas sensors, volume sensors, liquid level sensors, GPS satellite signals, human body conductance sensors, gas flow sensors, concentration sensors, pH meters, and IR vision sensors.
Examples of analog neuromorphic integrated circuits manufactured according to the processes described above are provided in the following section, according to some implementations.
In some implementations, a neuromorphic IC is manufactured according to the processes described above. The neuromorphic IC can be used for keyword spotting.
The input network is a neural network with 2-D convolutional and 2-D depthwise convolutional layers, with an input audio mel-spectrogram of size 49×10. In some implementations, the network includes 5 convolutional layers, 4 depthwise convolutional layers, an average pooling layer, and a final dense layer.
In some implementations, the networks are pre-trained to recognize 10 short spoken keywords (“yes”, “no”, “up”, “down”, “left”, “right”, “on”, “off”, “stop”, “go”) from the Google Speech Commands Dataset, with a recognition accuracy of 94.4%.
In some implementations, the Integrated Circuit is manufactured based on a Depthwise Separable Convolutional Neural Network (DS-CNN) for voice command identification. In some implementations, the original DS-CNN network is T-transformed with the following parameters: maximum input and output connections per neuron=100, and a signal limit of 5. In some implementations, the resulting T-network had the following properties: 13 layers, approximately 72,000 neurons, and approximately 2.6 million connections.
In one instance, a keyword spotting network is transformed to a T-network, according to some implementations. The network is a neural network of 2-D convolutional and 2-D depthwise convolutional layers, with an input audio spectrogram of size 49×10. The network consists of 5 convolutional layers, 4 depthwise convolutional layers, an average pooling layer, and a final dense layer. The network is pre-trained to recognize 10 short spoken keywords (“yes”, “no”, “up”, “down”, “left”, “right”, “on”, “off”, “stop”, “go”) from the Google Speech Commands Dataset (https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html). There are 2 additional classes that correspond to 'silence' and 'unknown'. The network output is a softmax of length 12.
The trained neural network (the input to the transformation) had a recognition accuracy of 94.4%, according to some implementations. In the neural network topology, each convolutional layer is followed by a BatchNorm layer and a ReLU layer, the ReLU activations are unbounded, and the network included around 2.5 million multiply-add operations.
After transformation, the transformed analog network was tested with a test set of 1000 samples (100 of each spoken command). All test samples are also used as test samples in the original dataset. The original DS-CNN network gave close to a 5.7% recognition error for this test set. The network was converted to a T-network of trivial neurons. BatchNormalization layers in 'test' mode produce a simple linear signal transformation, so they can be interpreted as a weight multiplier plus an additional bias. Convolutional, AveragePooling, and Dense layers are T-transformed quite straightforwardly. The softmax activation function was not implemented in the T-network and was applied to the T-network output separately.
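The BatchNorm interpretation noted above is a standard folding identity; a minimal sketch, assuming an output-channel-first kernel layout and an illustrative eps value:

```python
import numpy as np

def fold_batchnorm(kernel, bias, gamma, beta, mean, var, eps=1e-3):
    """Fold an inference-mode BatchNorm (y = gamma*(x-mean)/sqrt(var+eps) + beta)
    into the preceding layer's weights and bias."""
    scale = gamma / np.sqrt(var + eps)                        # per-channel multiplier
    kernel_f = kernel * scale.reshape(-1, *([1] * (kernel.ndim - 1)))
    bias_f = (bias - mean) * scale + beta                     # additional bias
    return kernel_f, bias_f
```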
The resulting T-network had 12 layers, including an input layer, approximately 72,000 neurons, and approximately 2.5 million connections.
Various examples of setting network limitations for the transformed network are described herein, according to some implementations. Because the ReLU activations used in the network are unbounded, some implementations use a signal limit on each layer.
This could potentially affect mathematical equivalence. For this reason, some implementations use a signal limit of 5 on all layers, which corresponds to a power voltage of 5 in relation to the input signal range.
For quantizing the weights, some implementations use a nominal set of 30 resistors [0.001, 0.003, 0.01, 0.03, 0.1, 0.324, 0.353, 0.436, 0.508, 0.542, 0.544, 0.596, 0.73, 0.767, 0.914, 0.985, 0.989, 1.043, 1.101, 1.149, 1.157, 1.253, 1.329, 1.432, 1.501, 1.597, 1.896, 2.233, 2.582, 2.844].
Some implementations select R− and R+ values (see the description above) separately for each layer. For each layer, some implementations select the value that delivers the most weight accuracy. In some implementations, all the weights (including bias) in the T-network are subsequently quantized (e.g., set to the closest value that can be achieved with the chosen resistors).
Some implementations convert the output layer as follows. The output layer is a dense layer that does not have a ReLU activation. The layer has a softmax activation, which is not implemented in the T-conversion and is left to the digital part, according to some implementations. Some implementations perform no additional conversion.
A modular structure of converted neural networks is described herein, according to some implementations. Each module of a modular-type neural network is obtained by transformation of (the whole or a part of) one or more trained neural networks. In some implementations, the one or more trained neural networks are subdivided into parts and subsequently transformed into equivalent analog networks. A modular structure is typical for some currently used neural networks, and modular division of neural networks corresponds to a trend in neural network development. Each module can have an arbitrary number of inputs, or connections of input neurons to output neurons of a connected module, and an arbitrary number of outputs connected to input layers of a subsequent module. In some implementations, a library of preliminarily transformed modules (or a seed list of modules) is developed, including lithographic masks for the manufacture of each module. A final chip design is obtained as a combination of (or by connecting) the preliminarily developed modules. Some implementations perform commutation between the modules. In some implementations, the neurons and connections within a module are translated into the chip design using ready-made module design templates. This significantly simplifies the manufacture of the chip, which is accomplished by just connecting corresponding modules.
Some implementations generate libraries of ready-made T-converted neural networks and/or T-converted modules. For example, a layer of a CNN network is one modular building block, an LSTM chain is another building block, and so on. Larger neural networks also have a modular structure (e.g., an LSTM module and a CNN module). In some implementations, libraries of neural networks are more than by-products of the example processes and can be sold independently. For example, a third party can manufacture a neural network starting with the analog circuits, schematics, or designs in the library (e.g., using CADENCE circuits, files, and/or lithography masks). Some implementations generate T-converted neural networks (e.g., networks transformable to CADENCE or similar software) for typical neural networks, and the converted neural networks (or the associated information) are sold to a third party. In some instances, a third party chooses not to disclose the structure and/or purpose of the initial neural network, but uses the conversion software (e.g., the SDK described above) to convert the initial network into trapezia-like networks and passes the transformed networks to a manufacturer to fabricate them, with a matrix of weights obtained using one of the processes described above, according to some implementations. As another example, where a library of ready-made networks is generated according to the processes described herein, corresponding lithographic masks are generated, and a customer can train one of the available network architectures for its task, perform the lossless transformation (sometimes called T-transformation), and provide the weights to a manufacturer for fabricating a chip for the trained neural network.
In some implementations, the modular structure concept is also used in the manufacture of multi-chip systems or multi-level 3D chips, where each layer of the 3D chip represents one module. In the case of 3D chips, the connections of outputs of modules to the inputs of connected modules are made by standard interconnects that provide ohmic contacts between different layers in multi-layer 3D chip systems. In some implementations, the analog outputs of certain modules are connected to the analog inputs of connected modules through interlayer interconnects. In some implementations, the modular structure is used to make multi-chip processor systems as well. A distinctive feature of such multi-chip assemblies is the analog signal data lines between different chips. The analog commutation schemes typical for compressing several analog signals into one data line, and the corresponding de-commutation of analog signals at the receiver chip, are accomplished using standard schemes of analog signal commutation and de-commutation developed in analog circuitry.
One main advantage of a chip manufactured according to the techniques described above is that analog signal propagation can be broadened to multi-layer chips or multi-chip assemblies, where all signal interconnects and data lines transfer analog signals, without a need for analog-to-digital or digital-to-analog conversion. In this way, analog signal transfer and processing can be extended to 3D multi-layer chips or multi-chip assemblies.
The method includes obtaining (3206) a plurality of neural network topologies (e.g., the topologies 224), each neural network topology corresponding to a respective neural network (e.g., a neural network 220).
The method also includes transforming (3208) each neural network topology (e.g., using the neural network transformation module 226) to a respective equivalent analog network of analog components.
Example Methods for Optimizing Energy Efficiency of Neuromorphic Analog Integrated Circuits
The method includes obtaining (3306) an integrated circuit (e.g., the ICs 262) implementing an analog network (e.g., the transformed analog neural network 228) of analog components including a plurality of operational amplifiers and a plurality of resistors. The analog network represents a trained neural network (e.g., the neural networks 220), each operational amplifier represents a respective analog neuron, and each resistor represents a respective connection between a respective first analog neuron and a respective second analog neuron.
The method also includes generating (3308) inferences (e.g., using the inferencing module 266) using the integrated circuit for a plurality of test inputs, including simultaneously transferring signals from one layer to a subsequent layer of the analog network. In some implementations, the analog network has a layered structure, with the signals coming simultaneously from the previous layer to the next one; during the inference process, the signals propagate through the circuit layer by layer.
The method also includes, while generating inferences using the integrated circuit, determining (3310) whether the level of signal output of the plurality of operational amplifiers is equilibrated (e.g., using the signal monitoring module 268). Operational amplifiers go through a transient period after receiving inputs (e.g., a period that lasts less than 1 millisecond from transient to plateau signal), after which the level of signal is equilibrated and does not change. In accordance with a determination that the level of signal output is equilibrated, the method also includes: (i) determining (3312) an active set of analog neurons of the analog network influencing signal formation for the propagation of signals. The active set of neurons need not be part of a layer or layers; in other words, the determination step works regardless of whether the analog network includes layers of neurons; and (ii) turning off power (3314) (e.g., using the power optimization module 270) for one or more analog neurons of the analog network, distinct from the active set of analog neurons, for a predetermined period of time. For example, some implementations switch off power (e.g., using the power optimization module 270) for operational amplifiers that are in layers behind the active layer (to where the signal has propagated at the moment) and that do not influence the signal formation on the active layer. This can be calculated based on the RC delays of signal propagation through the IC. All the layers behind the operational (or active) layer are switched off to save power, so the propagation of signals through the chip is like surfing: the wave of signal formation propagates through the chip, and all layers that are not influencing signal formation are switched off. In some implementations, for layer-by-layer networks, where the signal propagates layer to layer, the method further includes decreasing power consumption before the layer corresponding to the active set of neurons, because there is no need for amplification before that layer.
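A schematic simulation of this 'surfing' power schedule, assuming a uniform per-layer settling time and a small power-on window ahead of the wave (both values illustrative):

```python
def power_schedule(n_layers, settle_ms=1.0, window=2):
    """Yield (time_ms, powered_layers): only the active layer and the next
    window-1 layers ahead stay on; layers behind the signal wave are off."""
    for active in range(n_layers):
        t = active * settle_ms
        powered = list(range(active, min(active + window, n_layers)))
        yield t, powered

for t, on in power_schedule(5):
    print(f"t={t:.1f} ms  powered layers: {on}")
```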
Some implementations include means for delaying and/or controlling signal propagation from layer to layer of the resulting hardware-implemented neural network.
An example transformation of MobileNet v.1 into an equivalent analog network is described herein, according to some implementations. In some implementations, single analog neurons are generated and then converted into SPICE schematics, with a transformation of weights from MobileNet into resistor values. The MobileNet v.1 architecture is depicted in the Table shown in the accompanying figures.
In some implementations, the resulting transformed network included 30 layers, including an input layer, approximately 104,000 analog neurons, and approximately 11 million connections. After transformation, the average output absolute error (calculated over 100 random samples) of the transformed network versus MobileNet v.1 was 4.9e-8.
As every convolutional layer and the other layers of MobileNet have the ReLU6 activation function, the output signal on each layer of the transformed network is also limited by the value 6. As part of the transformation, the weights are brought into accordance with a resistor nominal set. Under each nominal set, different weight values are possible. Some implementations use the resistor nominal sets e24, e48, and e96, within the range of [0.1-1] MΩ. Given that the weight ranges for each layer vary, and that for most layers the weight values do not exceed 1-2, some implementations decrease the R− and R+ values in order to achieve more weight accuracy. In some implementations, the R− and R+ values are chosen separately for each layer from the set [0.05, 0.1, 0.2, 0.5, 1] MΩ. In some implementations, for each layer, the value that delivers the most weight accuracy is chosen. Then all the weights (including bias) in the transformed network are 'quantized', i.e., set to the closest value that can be achieved with the used resistors. In some implementations, this reduced the transformed network accuracy versus the original MobileNet according to the Table shown below, which lists the mean square error of the transformed network when using different resistor sets, according to some implementations.
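The e24/e48/e96 nominal sets follow the standard E-series geometric spacing; a sketch that generates approximate nominals within one decade (note that the standardized E24 values deviate slightly from the pure geometric formula):

```python
def e_series(n, decade=0.1):
    # k-th nominal in a decade starting at `decade`: decade * 10**(k/n)
    return [round(decade * 10 ** (k / n), 4) for k in range(n)]

e24 = e_series(24)   # approximate; standard E24 uses slightly adjusted values
e48 = e_series(48)
e96 = e_series(96)
print(e24[:5])       # [0.1, 0.1101, 0.1212, 0.1334, 0.1468]  (MOhm)
```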
Some implementations provide a method for fabricating a neuromorphic Integrated Circuit for voice clarification, using the techniques described above. Various types of trained neural networks can be used for this purpose. For example, a neural network can be trained to identify only one voice, suppressing and removing everything else. In particular, the neural network can identify the voice that is the closest to the microphone. As another example, a neural network can be trained to identify several voices, suppressing and removing everything else. Voices can be identified and preserved regardless of their distance from the microphone(s). Alternatively, voices can be prioritized by their distances from the microphone(s) and given different weights in the output signal, based on their respective distances from the microphone. As another alternative, voices can be identified and preserved regardless of their relative strength (e.g., volume). As yet another alternative, voices can be prioritized by their relative strength and be given different weights in the output signal, based on their respective relative strengths. A neural network can process the signal originating from the microphone(s). Such a signal may include analog and/or digital signals. A neural network can process an analog and/or a digital signal that is transmitted over a transmission medium and received by the neural network. Such a signal can be transmitted across wireless or digital/Internet networks for the purposes of phone communication. Such a signal can also be input after pre- and post-processing of the original voice(s), either before the signal is ready to be transmitted, or after the signal has been transmitted and delivered to the recipient. As another example, a neural network can process a signal that is a mix of several voice signals, with associated noises. In particular, such a mix can be delivered to the recipient from several different sources. Such a signal can be pre- and post-processed by different methods for different components. As yet another example, a neural network can process a signal that is a mix of several external voice signals, with associated noises, combined with the recipient's own voice(s). In particular, such a mix can be delivered to the recipient from several different sources, including the recipient's own voice overlapped with the recipient's own noises. Such a signal can be pre- and/or post-processed by different methods for different components. The clarification of voice(s) can be performed on the combined signal. As another example, a neural network can process a signal that includes voice(s) from the recipient side. In particular, such a signal can be processed before it is transmitted to the other party. Such a signal can be processed by the neural network before it is pre- and/or post-processed by different methods prior to transmission.
Example Methods for Extracting Voice from Inbound or Outbound Analog Noisy Signal
Described herein are example techniques for the extraction of voice from a noisy signal, both inbound and outbound, where noise can be either stationary or non-stationary, using a neuromorphic analog Integrated Circuit. Such a circuit implements a noise suppression neural network at the hardware level. The circuit design of the analog neuromorphic Integrated Circuit is realized by converting (using techniques described above) a noise suppression (or voice extraction) neural network.
As described in the Background and Summary sections, the task of extracting voice from a noisy signal is of great importance for communication in smartphones, smartwatches, notebooks, and other voice transmitting devices. There are conventional realizations of noise cancellation or active noise suppression using a dual-microphone scheme, where the signal from one microphone is used to cancel noise at a main microphone. But these solutions do not cancel all noise, especially non-stationary noise. There are also filters that can filter out stationary noise from an inbound or outbound analog signal. There are also software realizations of neural networks that extract voice from a noisy signal by converting part of the signal using a Fourier transformation, thereby reducing components that are not similar to voice. These products are realized as software applications, which can be installed on smartphones or notebook computers, and can effectively suppress noise coming from a microphone. However, such applications require high computational power and consequently lead to higher power consumption. Also, such applications require powerful processors, which cannot be installed in earbuds or other miniature devices.
Described herein are techniques for voice extraction using a specially designed Integrated Circuit, realized from a trained neural network. The Integrated Circuit is realized as a hardware solution and is represented by a set of operational amplifiers and resistors, connected in such a way that the resulting neuromorphic hardware chip operates similarly to the initial neural network (e.g., the neural network realized in software), with the absolute error not exceeding a maximum threshold percentage (e.g., 1% absolute) from the error corresponding to the software neural network. The schematics of the Integrated Circuit are obtained using the techniques described above, thus ensuring full equivalency between the analog neuromorphic hardware realization of the neural network and its initial software neural network model. The analog Integrated Circuit may be used for voice extraction from noisy analog inbound or outbound signals, with low latency and low power consumption.
In some implementations, the hardware realization of a voice extraction neural network can be used to process both inbound and outbound noisy signals. In some implementations, the Integrated Circuit has direct analog input and is placed adjacent to a microphone or a speaker of a smartphone, smartwatch, earbuds, notebook computer, or similar device. The Integrated Circuit provides telecommunication voice transfer, extracting voice from noisy analog signals. Such a solution suppresses both stationary and non-stationary noise from inbound or outbound analog signals (e.g., signals from a microphone or signals directed to a speaker or earbuds) and is characterized by excellent noise suppression, unlike conventional methods.
The resulting hardware realization of a voice extraction algorithm is characterized by low power operation, small latency, and small die area, which makes the analog hardware realization an advantageous solution for noise reduction in smartphones, earbuds, notebook computers, tablets, and other voice transmitting devices, in comparison with software neural network voice extraction algorithms. The small die area makes it possible to include the Integrated Circuit in true wireless stereo (TWS) earbuds or other miniature devices. Such analog Integrated Circuits may also be used for two-way voice extraction (noise reduction) in notebook PCs or smartphones, where a neuromorphic analog integrated circuit is installed both at the analog output of the microphone and at the analog input of the speaker or earbuds.
Some implementations obtain a convolutional neural network with 1D convolutions (e.g., as described in “Single Channel Speech Enhancement Using A Convolutional Neural Network,” by T. Kounovsky and J. Malek, 2017), an example of which is shown in the accompanying figures.
Some implementations include a supervisor switch (sometimes referred to as a digital switch, an artificial intelligence (AI) sound supervisor, or an AI sound supervisor switch) for sound processing, multiplexing, and/or controlling one or more analog neural cores that provide voice, music, and/or acoustic event detection, voice extraction from a noisy stream, voice command recognition, and/or speaker identification. In some implementations, a neurovoice processor includes one or more neuromorphic analog signal processors.
Described above are example techniques for voice extraction using a specially designed Integrated Circuit, realized from a trained neural network. The Integrated Circuit is realized as a hardware solution and is represented by a set of operational amplifiers and resistors, connected in such a way that the resulting neuromorphic hardware chip operates similarly to the initial neural network (e.g., the neural network realized in software), with the absolute error not exceeding a maximum threshold percentage (e.g., 1% absolute) from the error corresponding to the software neural network. The schematics of the Integrated Circuit are obtained using example techniques described above, thus ensuring full equivalency of analog neuromorphic hardware realization of the neural network and its initial software neural network model. The analog Integrated Circuit may be used for voice extraction from noisy analog inbound or outbound signals, with low latency and low power consumption.
In some implementations, the hardware realization of a voice extraction neural network is used to process both inbound and outbound noisy signals. In some implementations, the Integrated Circuit has a direct analog input and is placed adjacent to a microphone or a speaker of a smartphone, smartwatch, earbuds, notebook computer, or similar device. The Integrated Circuit provides telecommunication voice transfer, extracting voice from noisy analog signals. Such a solution suppresses both stationary and non-stationary noise from inbound or outbound analog signals (e.g., signals from a microphone or signals directed to a speaker or earbuds) and is characterized by excellent noise suppression, unlike conventional methods. The figures described above provide example networks and transformation techniques for generating and/or fabricating circuits and devices using analog components, based on trained neural networks, according to some implementations.
Some implementations include a music voice acoustic event detector, which is an AI-based low-power hardware programmed neuromorphic module, which analyzes a sound stream and provides a digital logical signal as output, to indicate if voice, music, or an acoustic event are present in the sound stream. Some implementations include a voice extractor, which is another AI-based neuromorphic low power module configured to enhance voice in an audio stream, and/or suppress noises. Some implementations include a keyword spotting module, which is an AI-based neuromorphic low power module that recognizes several spoken words in speech. Some implementations include a wake word detection module, which is an AI-based neuromorphic low power module that recognizes one or more wake words. Some implementations include a speaker identification module, which is an analog neuromorphic low power module that authenticates a speaker.
A sound supervisor (sometimes referred to as a supervisor switch or an AI sound supervisor) orchestrates the voice processing blocks of a system to provide different features as output. There are several examples where the systems, methods, and techniques described herein may be advantageous. Consider a hearing aid where a user hears sounds enhanced for their hearing, as configured by certain digital signal processing (DSP) presets. Voices are cleared, but background sound is left in suppressed form. When there is no voice, a user needs just the environment sounds. Suppose the user wants to listen to music. With conventional systems, the user has to manually change a hearing aid DSP preset to hear the music unchanged, and has to manually change the system settings back to a voice preset after the music. Another problem with true wireless stereo (TWS) devices that are controlled by voice is that voice appears in the sound stream: the music should be suppressed and the voice enhanced in order to recognize voice commands. In other cases, noise should be suppressed. Conventional systems rely on micro-controllers or DSPs, but such devices consume a high amount of power and are not suitable for wearable devices.
In some implementations, the AI sound supervisor 3910 controls the operation and timing of the system 3900. Some implementations pass audio to the neuromorphic analog signal processor cores according to different operating modes of the system. For example, the system 3900 operates in several operation modes defining how input signals pass through, how and in what order the signals are passed to the neuromorphic analog signal processor cores, and/or how the signals are conditioned on output. The AI sound supervisor 3910 controls one or more aspects of these operation modes. Some implementations pass input audio with suppressed volume, or unchanged input audio, according to the operating mode of the system. In some implementations, a logging module (e.g., a module implemented in the AI sound supervisor 3910) continuously logs audio parameters (e.g., audio volume at different stages, before and after signal conditioning), configuration (e.g., operating mode configuration for the system), and states of activated neural cores (e.g., configuration for a specific core, such as a threshold for false/true decisions, a sensitivity threshold for silence, a sensitivity for voice processing, and/or a sensitivity for word detection). In some implementations, the AI sound supervisor switch 3910 modifies and routes audio streams between neuromorphic analog signal processor cores. In some implementations, the AI sound supervisor switch 3910 downsamples audio to 16 kilohertz (kHz) before processing, stores an input audio stream, stores the voice extractor processed audio stream 3920, and/or selects input or voice extractor processed audio for wake word detection or keyword spotting. In some implementations, the AI sound supervisor 3910 combines input audio and the voice extractor-processed audio stream according to its configuration and signals. In some implementations, the AI sound supervisor 3910 upsamples the audio stream for output.
In some implementations, the AI sound supervisor 3910 is a digital logic unit connected to several neural cores to provide flexible conditional sound processing depending on whether voice, music, or an alarm sound is present. In some implementations, the AI sound supervisor 3910 is based on an AI music/voice/acoustic event detection module (e.g., the MVED 3902), which recognizes voice, music, and/or acoustic events in the input audio stream. Depending on the task, a voice is recognized and enhanced in the audio stream and/or other sounds are suppressed. In other configurations, music is detected and a device preset is changed. For example, a hearing aid has presets for voice and music, with different digital signal processor (DSP) configurations for comfortable hearing. With conventional devices, such presets have to be configured by a user. In contrast, devices based on the techniques described herein can use the output signal 3930 to select presets for voice or music automatically. In some implementations, the AI sound supervisor 3910 is a digital logic unit, connected to neuromorphic analog signal processor cores, which operate depending on flags generated by an AI music/voice/acoustic event detector. If there is only ambient noise at the input, as determined by the MVED 3902, some implementations translate the ambient sounds or suppress them using a voice extractor module (e.g., the VE 3904). Some implementations use speaker identification based on the music voice acoustic event detector module (e.g., the MVED 3902), which enables personalization modes to be added to listening devices. In some implementations, the AI sound supervisor 3910 also upsamples or downsamples the sound stream based on the task. The neuromorphic analog signal processors described above typically operate with 16 kHz signals.
In some implementations, the AI music/voice/acoustic event detection module 3902 analyzes an audio stream (e.g., with a latency of approximately 4 milliseconds or less) and sets an output flag for the AI sound supervisor 3910 to recognize.
Voice commands have become popular but are still generally unavailable in wearable devices without an Internet connection because of power limitations. There is also a security concern with using Internet-connected voice processing devices. The neurovoice processor described herein has a very low power consumption (e.g., 200 microwatts or less) and works locally, thus providing privacy, according to some implementations.
In some implementations, the AI sound supervisor 3910 transforms a sound stream into a modified stream, where voice, noise and music are enhanced or suppressed depending on the operation mode, which may be user configurable or automatically configured.
Some implementations include a music/voice/acoustic event detection module (e.g., the MVED 3902), which provides output to the sound supervisor 3910 (a digital control unit). In some implementations, the music/voice/acoustic event detection module 3902 analyzes a sound stream and provides a flag to indicate music, voice, or an acoustic event, with a latency of 4 milliseconds or less, which allows fast real-time control of sound streams.
Some implementations dispatch sounds using the AI sound supervisor 3910 in real time, due to the low latency of the MVED module 3902. For example, the MVED 3902 may take between 16 and 200 milliseconds to detect voice, music, or an acoustic event. In some implementations, supervised analog neuromorphic blocks for autonomous sound processing consume ultra-low power (e.g., 200 microwatts or less) when compared to digital processors.
In some implementations, a hearing aid device is provided. The device includes separate DSP presets for voice, music, and no-voice conditions. Voice presets can be used to increase voice frequencies in an audio spectrogram. For certain hearing disorders, specific frequencies need to be increased. Music presets typically do not include an increase in frequencies. Conventional devices cannot distinguish between these conditions, so a user must switch between them manually. In some implementations, a hearing aid sound supervisor is responsible for voice extraction in the sound stream based on the MVED module 3902. In some implementations, if there is a voice, the noise is suppressed by means of the voice extraction module 3904 and a voice hearing aid preset is activated. If there is no voice in the input stream, ambient sound is transmitted as the output signal. If there is music, a music hearing aid preset is activated. Some implementations operate using wake word detection (e.g., the WWD 3906) and keyword spotting modules (e.g., the KWS 3908). Some implementations use neuromorphic analog signal processor-based voice and music activity detection in a hearing aid device, which can automatically change DSP presets. Voice extraction is activated optionally to enhance voice quality if noise is detected, according to some implementations. In some implementations, the neurovoice processor continuously monitors and logs sound parameters, its configuration, and its state, to enable future device personalization for the hearing specifics of the current user. Some implementations provide speaker identification to ensure that only the device owner can control the device.
There are true wireless stereo (TWS) models that are controlled by voice commands. Such models can be processed using the apparatus described herein. Moreover, some implementations provide a switch between music and voice: if a voice is recognized at the input of the TWS earphones, voice extraction is switched on. In some implementations, the AI-based music voice acoustic event detection module detects voice in the input stream and provides the signal to the AI sound supervisor 3910, which suppresses the music and amplifies the voice to make commands clear for a keyword spotting module (e.g., the KWS 3908) or a wake-word detection module (e.g., the WWD 3906), or it switches on the voice extraction module to suppress noise in the input voice stream. The MVED module provides speaker identification to ensure that only the device owner can control the device.
Some implementations provide passive noise cancellation and active noise cancellation (ANC) or environmental noise cancellation (ENC) features that isolate the user from all environmental sounds. A user can miss important voice information in this mode. A sound supervisor core can detect voice using an ANC/ENC microphone, to trigger a watchdog signal and instruct a subsystem to switch off ANC/ENC and mute music playback. Some implementations perform voice extraction for voice clarification (examples of which are described above).
According to some implementations, a hardware apparatus 3900 is provided. The hardware apparatus 3900 includes a digital switch 3910 coupled to a plurality of analog neuromorphic cores (e.g., cores corresponding to the MVED 3902, the VE 3904, the WWD 3906, and/or the KWS 3908). The digital switch 3910 is configured to obtain one or more sound streams 3912 from one or more sound sources. The digital switch 3910 is also configured to transmit data based on the one or more sound streams to the plurality of analog neuromorphic cores. The digital switch 3910 is also configured to receive output from the plurality of analog neuromorphic cores. The digital switch 3910 is also configured to output one or more modified sound streams based on the output received from the plurality of analog neuromorphic cores. Each analog neuromorphic core comprises a respective analog network of analog components and is configured to (i) receive respective input data from the digital switch 3910, (ii) perform a respective voice-related function, and (iii) transmit a respective output to the digital switch 3910, for the one or more sound streams.
In some implementations, the digital switch 3910 is further configured to switch on or off at least one of the plurality of analog neuromorphic cores.
In some implementations, the analog components include a plurality of operational amplifiers and a plurality of resistors. Each operational amplifier represents an analog neuron, and each resistor represents a connection between two analog neurons.
In some implementations, the plurality of analog neuromorphic cores includes (i) a first core (e.g., a core corresponding to the MVED 3902) configured to detect music, voice and acoustic events, in the one or more sound streams, and (ii) a second core (e.g., a core corresponding to the VE 3904) configured to extract and/or enhance voice in the one or more sound streams.
In some implementations, the digital switch 3910 is further configured to: initially transmit the data based on the one or more sound streams to the first core; and in response to receiving, from the first core, a signal (e.g., the output 3916) detecting a voice, subsequently transmit data based on the signal to the second core, and, in response, receive an enhanced voice signal (e.g., the output 3920) from the second core. In some implementations, the digital switch 3910 is further configured to inter-operate with a neuromorphic analog core configured to perform voice extraction, a neuromorphic analog core configured to perform voice activation detection, a neuromorphic analog core configured to perform wake-word detection, a neuromorphic analog core configured to perform keyword spotting, in any order, and/or any combinations thereof.
In some implementations, the plurality of analog neuromorphic cores further includes (i) a third core (e.g., a core corresponding to the KWS 3908) configured to spot keywords in the one or more sound streams, and (ii) a fourth core (e.g., a core corresponding to the WWD 3906) configured to detect wake words in the one or more sound streams.
In some implementations, the digital switch 3910 is further configured to: transform the one or more sound streams to normalize volume (e.g., automatic gain control), to obtain a sound stream suitable for processing in an analog neuromorphic core; and transmit the sound stream to the analog neuromorphic core. For example, input audio may have a low volume that is not suitable for voice extraction. To improve voice extraction quality, input audio volume is increased and passed to the voice extraction module.
In some implementations, the digital switch 3910 is further configured to log information related to audio parameters, configuration, and one or more states of activated neural cores amongst the plurality of analog neuromorphic cores.
In some implementations, the digital switch 3910 is further configured to: down-sample the one or more streams to 16 kHz, to obtain down-sampled data; transmit the down-sampled data to the plurality of analog neuromorphic cores; and up-sample the audio stream output from the plurality of analog neuromorphic cores, for output (e.g., the output 3932).
In some implementations, the plurality of analog neuromorphic cores includes (i) a first core configured to detect music, voice, and acoustic events, in the one or more sound streams, and (ii) a second core configured to extract and/or enhance voice in the one or more sound streams. The digital switch is further configured to: initially transmit the data based on the one or more sound streams to the first core; in response to receiving, from the first core, a signal detecting a voice, subsequently transmit data based on the signal to the second core, and, in response, receive an enhanced voice signal from the second core with noises suppressed; and in response to receiving, from the first core, a signal detecting no voice, subsequently output ambient noise in the one or more sound streams.
In some implementations, the plurality of analog neuromorphic cores includes a first core configured to detect a voice, music, or no voice. The system includes different operating modes, for voice, music, and no voice conditions. The digital switch 3910 is further configured to: initially transmit the data based on the one or more sound streams to the first core; and in response to receiving, from the first core, a signal detecting a voice, music, or no voice, select a different operating mode based on the signal.
In some implementations, the plurality of analog neuromorphic cores includes a first core configured to detect the voice of a user. The digital switch 3910 is further configured to: initially transmit the data based on the one or more sound streams to the first core; and in response to receiving, from the first core, a signal detecting the voice of the user, only then activate another core of the plurality of analog neuromorphic cores.
In some implementations, the plurality of analog neuromorphic cores includes (i) a first core configured to detect voice, (ii) a second core configured to enhance voice signals, and (iii) a third core configured to either spot keywords or detect wake words. In some implementations, the plurality of analog neuromorphic cores includes separate cores for spotting keywords and wake word detection, and the core for keyword spotting reacts to an output of the core for wake word detection. The digital switch 3910 is further configured to: initially transmit the data based on the one or more sound streams to the first core; in response to receiving, from the first core, a signal detecting a voice, subsequently transmit data based on the signal to a second core, and, in response, receive an enhanced voice signal from the second core with music suppressed and voice amplified; and in response to receiving, from the second core, the enhanced voice signal, transmit the enhanced voice signal to the third core for either spotting keywords or detecting wake words.
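A high-level sketch of this routing logic, with the cores modeled as callables (the interfaces and flag names are stand-ins for illustration, not the actual core APIs):

```python
def supervise_frame(frame, mved, ve, kws):
    """Route one audio frame through the supervisor: detect voice (MVED),
    enhance it (VE), then pass the enhanced signal to keyword spotting (KWS)."""
    flags = mved(frame)               # e.g., {'voice': True, 'music': False}
    if not flags.get('voice'):
        return frame, None            # no voice: pass ambient sound through
    enhanced = ve(frame)              # suppress music/noise, amplify voice
    keyword = kws(enhanced)           # spot a keyword or wake word, if any
    return enhanced, keyword
```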
In some implementations, the plurality of analog neuromorphic cores includes one or more cores including: (i) a first core that implements a trained neural network (e.g., a depthwise separable convolutional neural network (DS-CNN)) trained to spot keywords, (ii) a second core that implements a trained neural network (e.g., a recurrent neural network (RNN)) trained to detect wake words, (iii) a third core that implements a trained neural network (e.g., a recurrent neural network (RNN)) trained for voice activity detection, and/or (iv) a fourth core that implements a trained neural network (e.g., a recurrent neural network) to extract voice from noisy sound streams.
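For illustration, a small depthwise-separable CNN of the kind mentioned for keyword spotting might look as follows in PyTorch; the layer sizes, block count, and ten-keyword output are assumptions for this sketch, not taken from the disclosure.

```python
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """One DS-CNN block: per-channel 3x3 convolution, then a 1x1 mixing convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return torch.relu(self.pointwise(torch.relu(self.depthwise(x))))

class DSCNNKeywordSpotter(nn.Module):
    """Maps a (batch, 1, mel_bins, frames) spectrogram to keyword logits."""
    def __init__(self, n_keywords=10):
        super().__init__()
        self.stem = nn.Conv2d(1, 64, 3, padding=1)
        self.blocks = nn.Sequential(*[DepthwiseSeparable(64, 64) for _ in range(4)])
        self.head = nn.Linear(64, n_keywords)

    def forward(self, x):
        x = torch.relu(self.stem(x))
        x = self.blocks(x)
        x = x.mean(dim=(2, 3))          # global average pool over frequency and time
        return self.head(x)

logits = DSCNNKeywordSpotter()(torch.randn(1, 1, 40, 98))  # 40 mel bins, 98 frames
```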
In another aspect, a method is provided for sound signal processing. The method includes, at the digital switch 3910 coupled to a plurality of analog neuromorphic cores (e.g., cores corresponding to the MVED 3902, the VE 3904, the WWD 3906, and/or the KWS 3908): obtaining one or more sound streams from one or more sound sources; transmitting data based on the one or more sound streams to the plurality of analog neuromorphic cores; receiving output from the plurality of analog neuromorphic cores; and outputting one or more modified sound streams based on the output received from the plurality of analog neuromorphic cores. Each analog neuromorphic core includes a respective analog network of analog components. Each analog neuromorphic core receives input data from the digital switch; performs a respective voice-related function; and transmits a respective output to the digital switch, for the one or more sound streams.
In some implementations, the method further includes, at the digital switch 3910, switching on or off at least one of the plurality of analog neuromorphic cores.
In some implementations, the analog components include a plurality of operational amplifiers and a plurality of resistors, where each operational amplifier represents an analog neuron, and each resistor represents a connection between two analog neurons.
In some implementations, the method includes, at a core of the plurality of analog neuromorphic cores, detecting music, voice, and acoustic events, in the one or more sound streams.
In some implementations, the method further includes, at a core of the plurality of analog neuromorphic cores, extracting and/or enhancing voice in the one or more sound streams.
In some implementations, the order of operation/control (e.g., voice activity detection followed by voice enhancement) changes adaptively. For example, initially voice activity detection is performed followed by voice enhancement; subsequently, voice enhancement is performed first and voice activity detection second; following that, the system returns to the original order of processing, and so on. The digital switch changes control based on the application, the environment in which the hardware apparatus is operating, and/or user preferences.
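A minimal sketch of such adaptive reordering follows; the noisy-environment flag used to trigger the swap is one hypothetical example of an environmental condition the digital switch might act on.

```python
def process_adaptive(frame, vad, enhance, noisy_environment: bool):
    if noisy_environment:               # enhance first so the VAD sees cleaner audio
        cleaned = enhance(frame)
        return cleaned if vad(cleaned) else None
    if vad(frame):                      # default order: detect first, then enhance
        return enhance(frame)
    return None
```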
In some implementations, the method further includes, at the digital switch 3910: initially transmitting the data based on the one or more sound streams to a first core for detecting music, voice, and acoustic events, in the one or more sound streams; and in response to receiving, from the first core, a signal detecting a voice, subsequently transmitting data based on the signal to a second core, and, in response, receiving an enhanced voice signal from the second core.
In some implementations, the method further includes, at the digital switch 3910: initially transmitting the data based on the one or more sound streams to a first core for enhancing voice signals in the one or more sound streams; and in response to receiving, from the first core, an enhanced voice signal, subsequently transmitting data based on the signal to a second core, and, in response, receiving, from the second core, a signal detecting music, voice, and acoustic events.
In some implementations, the method further includes, at a core of the plurality of analog neuromorphic cores: detecting wake words in the one or more sound streams.
In some implementations, the method further includes, at a core of the plurality of analog neuromorphic cores, spotting keywords in the one or more sound streams. Typically, detecting wake words is followed by keyword spotting.
In some implementations, the digital switch 3910 transmits the sound stream directly to a wake word detection and/or keyword spotting neural network, without voice activity detection and/or voice extraction.
The present application discloses subject matter in accordance with the following numbered clauses:
(A1) A hardware apparatus comprising: a digital switch coupled to a plurality of analog neuromorphic cores, the digital switch configured to: obtain one or more sound streams from one or more sound sources; transmit data based on the one or more sound streams to the plurality of analog neuromorphic cores; receive output from the plurality of analog neuromorphic cores; and output one or more modified sound streams based on the output received from the plurality of analog neuromorphic cores; and the plurality of analog neuromorphic cores, each analog neuromorphic core comprising a respective analog network of analog components and configured to (i) receive respective input data from the digital switch, (ii) perform a respective voice-related function, and (iii) transmit a respective output to the digital switch, for the one or more sound streams.
(A2) The hardware apparatus as recited in clause (A1), wherein the digital switch is further configured to switch on or off at least one of the plurality of analog neuromorphic cores.
(A3) The hardware apparatus as recited in any of clauses (A1)-(A2), wherein the analog components include a plurality of operational amplifiers and a plurality of resistors, wherein each operational amplifier represents an analog neuron, and each resistor represents a connection between two analog neurons.
(A4) The hardware apparatus as recited in any of clauses (A1)-(A3), wherein the plurality of analog neuromorphic cores includes (i) a first core configured to detect music, voice and acoustic events, in the one or more sound streams, and (ii) a second core configured to extract and/or enhance voice in the one or more sound streams.
(A5) The hardware apparatus as recited in clause (A4), wherein the digital switch is further configured to: initially transmit the data based on the one or more sound streams to the first core; and in response to receiving, from the first core, a signal detecting a voice, subsequently transmit data based on the signal to the second core, and, in response, receive an enhanced voice signal from the second core.
(A6) The hardware apparatus as recited in clause (A4), wherein the plurality of analog neuromorphic cores further includes (i) a third core configured to spot keywords in the one or more sound streams, and (ii) a fourth core configured to detect wake words in the one or more sound streams.
(A7) The hardware apparatus as recited in any of clauses (A1)-(A6), wherein the digital switch is further configured to: transform the one or more sound streams to normalize volume, to obtain a sound stream suitable for processing in an analog neuromorphic core; and transmit the sound stream to the analog neuromorphic core.
(A8) The hardware apparatus as recited in any of clauses (A1)-(A7), wherein the digital switch is further configured to: log information related to audio parameters, configuration, and one or more states of activated neural cores amongst the plurality of analog neuromorphic cores.
(A9) The hardware apparatus as recited in any of clauses (A1)-(A8), wherein the digital switch is further configured to: down-sample the one or more sound streams to 16 kHz, to obtain down-sampled data; transmit the down-sampled data to the plurality of analog neuromorphic cores; and up-sample the audio stream output from the plurality of analog neuromorphic cores, for output.
(A10) The hardware apparatus as recited in any of clauses (A1)-(A9), wherein the plurality of analog neuromorphic cores includes (i) a first core configured to detect music, voice and acoustic events, in the one or more sound streams, and (ii) a second core configured to extract and/or enhance voice in the one or more sound streams, wherein the digital switch is further configured to: initially transmit the data based on the one or more sound streams to the first core; in response to receiving, from the first core, a signal detecting a voice, subsequently transmit data based on the signal to the second core, and, in response, receive an enhanced voice signal from the second core with noises suppressed; and in response to receiving, from the first core, a signal detecting no voice, subsequently output ambient noise in the one or more sound streams.
(A11) The hardware apparatus as recited in any of clauses (A1)-(A10), wherein the plurality of analog neuromorphic cores includes a first core configured to detect a voice, music, or no voice, wherein the digital switch is further configured to: initially transmit the data based on the one or more sound streams to the first core; and in response to receiving, from the first core, a signal detecting a voice, music, or no voice, select a different operating mode based on the signal.
(A12) The hardware apparatus as recited in any of clauses (A1)-(A11), wherein the plurality of analog neuromorphic cores includes a first core configured to detect a voice of a user, wherein the digital switch is further configured to: initially transmit the data based on the one or more sound streams to the first core; and in response to receiving, from the first core, a signal detecting the voice of the user, only then activate another core of the plurality of analog neuromorphic cores.
(A13) The hardware apparatus as recited in any of clauses (A1)-(A12), wherein the plurality of analog neuromorphic cores includes (i) a first core configured to detect a voice, (ii) a second core configured to enhance voice signals, and (iii) a third core configured to either spot keywords or detect wake words, wherein the digital switch is further configured to: initially transmit the data based on the one or more sound streams to the first core; in response to receiving, from the first core, a signal detecting a voice, subsequently transmit data based on the signal to a second core, and, in response, receive an enhanced voice signal from the second core with music suppressed and voice amplified; and in response to receiving, from the second core, the enhanced voice signal, transmit the enhanced voice signal to the third core for either spotting keywords or detecting wake words.
(A14) The hardware apparatus as recited in any of clauses (A1)-(A13), wherein the plurality of analog neuromorphic cores includes one or more cores selected from the group consisting of: (i) a first core that implements a trained neural network trained to spot keywords, (ii) a second core that implements a trained neural network trained to detect wake words, (iii) a third core that implements a trained neural network trained for voice activity detection, and (iv) a fourth core that implements a trained neural network to extract voice from noisy sound streams.
(A15) The hardware apparatus as recited in clause (A14), wherein (i) the first core implements a trained depthwise separable convolutional neural network (DS-CNN) trained to spot keywords, (ii) the second core implements a trained recurrent neural network (RNN) trained to detect wake words, (iii) the third core implements a trained recurrent neural network (RNN) trained for voice activity detection, and (iv) the fourth core implements a trained recurrent neural network trained to extract voice from noisy sound streams.
(B1) A method comprising: at a digital switch coupled to a plurality of analog neuromorphic cores: obtaining one or more sound streams from one or more sound sources; transmitting data based on the one or more sound streams to the plurality of analog neuromorphic cores; receiving output from the plurality of analog neuromorphic cores; and outputting one or more modified sound streams based on the output received from the plurality of analog neuromorphic cores; and at each of the plurality of analog neuromorphic cores, each analog neuromorphic core comprising a respective analog network of analog components: receiving respective input data from the digital switch; performing a respective voice-related function; and transmitting a respective output to the digital switch, for the one or more sound streams.
(B2) The method as recited in clause (B1), further comprising: at the digital switch: switching on or off at least one of the plurality of analog neuromorphic cores.
(B3) The method as recited in any of clauses (B1)-(B2), wherein the analog components include a plurality of operational amplifiers and a plurality of resistors, wherein each operational amplifier represents an analog neuron, and each resistor represents a connection between two analog neurons.
(B4) The method as recited in any of clauses (B1)-(B3), further comprising: at a core of the plurality of analog neuromorphic cores: detecting music, voice and acoustic events, in the one or more sound streams.
(B5) The method as recited in any of clauses (B1)-(B4), further comprising: at a core of the plurality of analog neuromorphic cores: extracting and/or enhancing voice in the one or more sound streams.
(B6) The method as recited in any of clauses (B1)-(B5), further comprising: at the digital switch: initially transmitting the data based on the one or more sound streams to a first core for detecting music, voice, and acoustic events, in the one or more sound streams; and in response to receiving, from the first core, a signal detecting a voice, subsequently transmitting data based on the signal to a second core, and, in response, receiving an enhanced voice signal from the second core.
(B7) The method as recited in any of clauses (B1)-(B6), further comprising: at the digital switch: initially transmitting the data based on the one or more sound streams to a first core for enhancing voice signals in the one or more sound streams; and in response to receiving, from the first core, an enhanced voice signal, subsequently transmitting data based on the signal to a second core, and, in response, receiving, from the second core, a signal detecting music, voice, and acoustic events.
(B8) The method as recited in any of clauses (B1)-(B7), further comprising: at a core of the plurality of analog neuromorphic cores: detecting wake words in the one or more sound streams.
(B9) The method as recited in any of clauses (B1)-(B8), further comprising: at a core of the plurality of analog neuromorphic cores: spotting keywords in the one or more sound streams.
The terminology used in the description of the invention herein is for the purpose of describing particular implementations only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.
The foregoing description, for purposes of explanation, has been provided with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various implementations with various modifications as are suited to the particular use contemplated.
This application is a continuation-in-part of and claims priority to U.S. application Ser. No. 17/196,960, filed Mar. 9, 2021, entitled “Analog Hardware Realization of Trained Neural Networks for Voice Clarity,” which is a continuation-in-part of and claims priority to U.S. application Ser. No. 17/189,109, filed Mar. 1, 2021, entitled “Analog Hardware Realization of Neural Networks,” which is a continuation of PCT Application PCT/RU2020/000306, filed Jun. 25, 2020, entitled “Analog Hardware Realization of Neural Networks,” each of which is incorporated by reference herein in its entirety. U.S. application Ser. No. 17/189,109 is also a continuation-in-part of PCT Application PCT/EP2020/067800, filed Jun. 25, 2020, entitled “Analog Hardware Realization of Neural Networks,” which is incorporated by reference herein in its entirety.
Related U.S. Application Data

Continuation:

| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/RU2020/000306 | Jun 2020 | US |
| Child | 17189109 | | US |

Continuations-in-part:

| | Number | Date | Country |
|---|---|---|---|
| Parent | 17196960 | Mar 2021 | US |
| Child | 18093315 | | US |
| Parent | 17189109 | Mar 2021 | US |
| Child | 17196960 | | US |
| Parent | PCT/EP2020/067800 | Jun 2020 | US |
| Child | PCT/RU2020/000306 | | US |