The present invention relates generally to analog computers. More specifically, it relates to techniques for designing analog computers that implement machine learning computations.
Recently, machine learning has had notable success in performing complex information processing tasks, such as computer vision and machine translation, which were intractable through traditional methods. However, the computing requirements of these applications are increasing exponentially, motivating efforts to develop new, specialized hardware platforms for fast and efficient execution of machine learning models.
Analog computing is one attractive approach to novel machine learning hardware, wherein the computation is performed by naturally evolving a physical system. Analog machine learning hardware platforms could potentially be faster and more energy-efficient than their digital counterparts. However, the realization of analog computer implementation of machine learning has thus far proved elusive because (1) one must identify a physical system capable of performing the necessary computation, and (2) one must be able to train the physical system on a given machine learning task.
The inventors have identified a formal correspondence between the dynamics of wave-based physical systems and the computation in recurrent neural networks (RNNs) and exploited this correspondence to develop techniques for the design of analog computing platforms that implement RNNs. Using a simulation of a physical wave system, physical parameters of the system are trained to learn complex features in temporal data, using training techniques for neural networks. The physical system simulation is trained on a machine learning task using inverse design techniques, which optimize the physical characteristics of the system in the context of numerical simulations.
The dynamic evolution of waves in the trained physical system implements an analog computation of an RNN on the temporal data. RNNs are one of the most important machine learning models and have been widely used to perform tasks such as natural language processing and time-series prediction, which involve processing of sequential data.
A wave-based physical system constructed according to the trained design can passively process signals and information in their native domain, without analog-to-digital conversion. Compared to conventional digital-computer implemented RNNs, such an analog computer implemented RNN has an improved processing speed, energy efficiency, and compactness. Furthermore, the approach is general to wave-based physical systems, so that the physical system implementing the RNN may be realized in physical systems supporting optical, acoustic, hydraulic, or geophysical wave propagation.
Applications of these analog computer implemented RNNs can be envisioned as hardware with improved computational performance on machine learning problems involving sequential data. Examples include time-series prediction and classification, natural language processing, machine translation, speech recognition, and genetic sequence analysis. The generality of the approach leads to applications in a wide range of fields, including optics, audio/acoustics, medicine, biology, and finance.
Embodiments of this invention can be deployed as methods, computer algorithms or code, hardware processors executing program instructions, algorithms, or code, as well as systems incorporating such methods, algorithms, code, processors, or the like.
Embodiments of the invention have advantages over prior approaches to analog computing for machine learning, such as reservoir computing, because these prior approaches do not provide the ability to train the physical system, which is crucial for implementing models such as RNNs. The approach of this invention uses inverse design techniques during numerical modeling to design the physical system, e.g., its material patterning, which can be realized using 3D printing, photolithography, and other fabrication techniques. Furthermore, this approach provides an analog computational implementation of an RNN, which is a specific and sophisticated model for handling sequential data.
In one aspect, the invention provides a method of designing an analog computer that implements a trained recurrent neural network, the method comprising: simulating a wave-based physical system using a computational simulation, wherein the computational simulation comprises: a wave propagation domain, a boundary layer that approximates a boundary condition, a source of waves, probes for measuring properties of propagated waves, a material within a central region of the wave propagation domain, and a discretized numerical model of a differential equation describing dynamics of wave propagation in the physical system; training the simulation with sequential training data, wherein the training comprises: inputting samples of the training data at the source in batches, computing for each batch measured properties of propagated waves at the probes, evaluating for each batch a loss function between the measured properties of propagated waves at the probes and a correct classification, and minimizing the loss function with respect to physical characteristics of the material within a central region of the simulation domain using gradient-based optimization.
The physical characteristics may comprise a material density distribution of the material within a central region of the simulation domain. The simulating may comprise a low-pass spatial filtering applied to a wave speed distribution to implement training regularization. The simulating and training may be implemented using a machine learning computing platform.
The wave-based physical system may be an acoustic, hydraulic, or optical system. The boundary layer may be an absorbing boundary layer, in which case the boundary condition is an open boundary condition. Alternatively, the boundary layer may be a reflecting boundary layer, in which case the boundary condition is a closed boundary condition. The probes for measuring properties of propagated waves may be point probes or spatially extended probes. The measured properties of propagated waves may comprise time-integrated power or field amplitude.
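For illustration only, the training recited above might be sketched in PyTorch (which later parts of this description also use) as follows. The routine forward_simulate — a differentiable simulation of the wave system returning normalized probe measurements — together with the grid size, the source and probe coordinates, and the data batches are assumptions of this sketch rather than elements of the claimed method; a sketch of forward_simulate itself appears later in this description.

```python
import torch

# Assumed available: forward_simulate(density, signal, src, probes) runs the
# discretized wave model and returns a normalized vector of probe measurements
# (sketched later in this description); train_batches yields (signals, labels);
# src and probes are hypothetical source/probe grid coordinates.
Ny, Nx = 101, 121                                   # illustrative grid size
density = torch.full((Ny, Nx), 0.5, requires_grad=True)   # trainable material
optimizer = torch.optim.Adam([density], lr=4e-4)

for signals, labels in train_batches:
    outputs = torch.stack([forward_simulate(density, s, src, probes)
                           for s in signals])       # probe outputs per sample
    # Loss between measured probe outputs and the correct classification:
    loss = torch.nn.functional.nll_loss(torch.log(outputs + 1e-12), labels)
    optimizer.zero_grad()
    loss.backward()                                 # gradients via autodiff
    optimizer.step()                                # gradient-based optimization
```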
Underlying the techniques of the present invention is an insight into the formal correspondence between the dynamics of wave-based physical systems and the computation in recurrent neural networks (RNNs). This correspondence will now be described in relation to the standard RNN update equations,

$$h_t = \sigma^{(h)}\left(W^{(h)} \cdot h_{t-1} + W^{(x)} \cdot x_t\right) \tag{1}$$

$$y_t = \sigma^{(y)}\left(W^{(y)} \cdot h_t\right), \tag{2}$$

where $x_t$ and $y_t$ are the input and output at step $t$, $h_t$ is the hidden state, the $W$ are dense trainable weight matrices, and $\sigma^{(h)}(\cdot)$ and $\sigma^{(y)}(\cdot)$ are nonlinear activation functions. Eq. 1 and Eq. 2 are represented diagrammatically in the accompanying drawings.
The operation prescribed by Eq. 1 and Eq. 2, when applied to each element of an input sequence, can be described by the directed graph shown in the accompanying drawings.
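For concreteness, a minimal PyTorch sketch of this update follows; the tanh and sigmoid activations stand in for the generic σ(h)(·) and σ(y)(·) and are illustrative choices, not prescribed by Eq. 1 and Eq. 2.

```python
import torch

class VanillaRNNCell(torch.nn.Module):
    """One step of Eq. 1 and Eq. 2 with dense trainable weight matrices."""
    def __init__(self, n_in: int, n_hidden: int, n_out: int):
        super().__init__()
        self.W_h = torch.nn.Linear(n_hidden, n_hidden, bias=False)  # W^(h)
        self.W_x = torch.nn.Linear(n_in, n_hidden, bias=False)      # W^(x)
        self.W_y = torch.nn.Linear(n_hidden, n_out, bias=False)     # W^(y)

    def forward(self, x_t, h_prev):
        h_t = torch.tanh(self.W_h(h_prev) + self.W_x(x_t))   # Eq. 1
        y_t = torch.sigmoid(self.W_y(h_t))                    # Eq. 2
        return y_t, h_t

# Applying the cell to each element of an input sequence, carrying h_t forward,
# realizes the directed-graph unrolling described above.
```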
We now discuss the formal correspondence between the dynamics in the RNN as described by Eq. 1 and Eq. 2, and the dynamics of a wave-based physical system.
As an illustration, the dynamics of a scalar wave field distribution u(x, y, z, t) are governed by the second-order partial differential equation,

$$\frac{\partial^2 u}{\partial t^2} = c^2 \cdot \nabla^2 u + f, \tag{3}$$

where $\nabla^2 = \partial^2/\partial x^2 + \partial^2/\partial y^2 + \partial^2/\partial z^2$ is the Laplacian operator, c = c(x, y, z) is the spatial distribution of the wave speed, and f = f(x, y, z, t) is a source term.
To make the correspondence with the RNN more exact, the continuous physical system is represented in discrete time. A finite-difference discretization of Eq. 3, with a temporal step size of Δt, results in the recurrence relation,

$$\frac{u_{t+1} - 2 u_t + u_{t-1}}{\Delta t^2} = c^2 \cdot \nabla^2 u_t + f_t. \tag{4}$$
Here, the subscript t indicates the value of the scalar field at a fixed time step. The wave system's hidden state is defined as the concatenation of the field distributions at the current and immediately preceding time steps, $h_t \equiv [u_t, u_{t-1}]^T$, where the fields are flattened into vectors $u_t$ and $u_{t-1}$ represented on a discretized grid over the spatial domain. Then, the update of the wave equation may be written as
$$h_t = A(h_{t-1}) \cdot h_{t-1} + P^{(i)} \cdot x_t \tag{5}$$

$$y_t = \left(P^{(o)} \cdot h_t\right)^2, \tag{6}$$
where xt and yt describe the input signal and output signal, respectively, of the wave equation, where the sparse matrix A describes the update of the wave fields ut and ut-1 without a source, and where P(i) and P(o) are linear operators that describe connections between the hidden state and the input and output of the wave equation. These discretized dynamics are represented diagrammatically in the accompanying drawings.
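As an illustrative sketch (not a prescribed implementation), the recurrence of Eq. 4, which the sparse matrix A of Eq. 5 encodes, can be written directly in PyTorch on a two-dimensional grid; the five-point stencil and the grid spacing dx are assumptions of this sketch.

```python
import torch

def laplacian(u: torch.Tensor, dx: float) -> torch.Tensor:
    """Discrete Laplacian via a five-point stencil, applied on interior cells."""
    lap = torch.zeros_like(u)
    lap[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
                       - 4.0 * u[1:-1, 1:-1]) / dx ** 2
    return lap

def wave_step(u_t, u_prev, c, f_t, dt: float, dx: float):
    """One step of Eq. 4 solved for the next field:
    u_{t+1} = 2 u_t - u_{t-1} + dt^2 (c^2 * laplacian(u_t) + f_t)."""
    return 2.0 * u_t - u_prev + dt ** 2 * (c ** 2 * laplacian(u_t, dx) + f_t)
```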
For sufficiently large field strengths, the dependence of A on ht-1 can be achieved through an intensity-dependent wave speed of the form c = clin + ut²·cnl, where cnl is exhibited in regions of material with a nonlinear response. In practice, this form of nonlinearity is encountered in a wide variety of wave physics, including shallow water waves, nonlinear optical materials via the Kerr effect, and acoustically in bubbly fluids and soft materials. Like the σ(y)(·) activation function in the standard RNN, a nonlinear relationship between the hidden state, ht, and the output, yt, of the wave equation is typical in wave physics when the output corresponds to a wave intensity measurement, as we assume here for Eq. 6.
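Continuing the sketch above (and reusing its laplacian), the intensity-dependent wave speed can be inserted directly into the update; the parameter names are illustrative.

```python
def nonlinear_wave_step(u_t, u_prev, c_lin, c_nl, f_t, dt: float, dx: float):
    """Wave update with intensity-dependent speed c = c_lin + u_t**2 * c_nl,
    which makes the matrix A of Eq. 5 depend on the hidden state."""
    c = c_lin + u_t ** 2 * c_nl               # material nonlinearity
    return 2.0 * u_t - u_prev + dt ** 2 * (c ** 2 * laplacian(u_t, dx) + f_t)
```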
Like the standard RNN, the connections between the hidden state ht and the input and output xt and yt are also defined by linear operators, given by P(i) and P(o). These matrices define the injection and measurement points within the spatial domain. Unlike the standard RNN, where the input and output matrices are dense, the input and output matrices of the wave equation are sparse because they are non-zero only at the location of injection and measurement points. Moreover, these matrices are unchanged by the training process.
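In code, this sparsity means the input and output touch only a few grid cells rather than multiplying the full hidden state by dense matrices; the coordinate format below is an illustrative assumption.

```python
import torch

def apply_Pi(f: torch.Tensor, src: tuple, x_t: torch.Tensor) -> torch.Tensor:
    """Sparse P^(i): the input x_t enters the source term at one cell only."""
    f = f.clone()
    f[src] = f[src] + x_t
    return f

def apply_Po(u: torch.Tensor, probes: list) -> torch.Tensor:
    """Sparse P^(o): read the field only at the fixed, untrained probe cells."""
    return torch.stack([u[p] for p in probes])
```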
Most importantly, the trainable free parameter of the wave equation is the distribution of the wave speed, c(x, y, z). In practical terms, this corresponds to the physical configuration and layout of materials within the domain that influence wave propagation. Thus, when modeled numerically in discrete time, the wave equation defines an operation that corresponds to that of an RNN, with the trainable wave-speed distribution playing the role of the trainable weights.
Similarly to the RNN, the full time dynamics of the wave equation may be represented as a directed graph of discrete time steps of the continuous physical system, as shown in the accompanying drawings.
Based on the formal correspondence between the dynamics of wave-based physical systems and the computation in recurrent neural networks (RNNs), an analog computer that implements a trained recurrent neural network can be designed as follows.
A wave-based physical system, which for example may be an acoustic, hydraulic, or optical system, is simulated using a computational simulation such as a machine learning computing platform. As illustrated in the accompanying drawings, the computational simulation comprises a wave propagation domain, a boundary layer that approximates a boundary condition, a source 124 of waves, probes 126, 128, 130 for measuring properties of propagated waves, a material 132 within a central region 134 of the wave propagation domain, and a discretized numerical model of a differential equation describing the dynamics of wave propagation in the physical system.
This simulation is trained with sequential training data to minimize a loss function with respect to physical characteristics of the material 132 that is distributed within a central region 134 of the simulation domain using gradient-based optimization. The trained physical characteristics of the material may be, for example, a material density distribution of the material. The training is performed by inputting training samples of the training data at the source 124 in batches, computing for each batch measured properties of propagated waves at the probes 126, 128, 130, and evaluating for each batch the loss function between the measured properties of propagated waves at the probes and a correct classification of each sample in the training data.
As a concrete illustrative example, we now describe how an inverse-designed inhomogeneous medium can perform vowel classification on raw audio signals as their waveforms scatter and propagate through it, achieving performance comparable to a standard digital implementation of a recurrent neural network.
The analog computer is designed by simulating the physical system and training its inhomogeneous material distribution so that audio signals input into the system propagate through the distribution and produce distinct classifying signals at the probes, depending on the input vowel. The training in this illustrative example uses a training dataset consisting of 930 raw audio recordings of 10 vowel classes from 45 different male speakers and 48 different female speakers. For the learning task, we select a subset of 279 recordings corresponding to three vowel classes contained in the words had, hayed, and heed, respectively.
The procedure for training the vowel recognition system is as follows. First, each vowel waveform is downsampled from its original recording, with a 16 kHz sampling rate, to a sampling rate of 10 kHz. Next, the entire dataset of (3 classes)×(45 males+48 females)=279 vowel samples is divided into 5 groups of approximately equal size.
Cross-validated training is performed with 4 of the 5 sample groups forming a training set and the remaining group forming a testing set. Independent training runs are performed with each of the 5 groups serving as the testing set, and the metrics are averaged over all training runs. Each training run is performed for 30 epochs using the Adam optimization algorithm with a learning rate of 0.0004. During each epoch, every sample vowel sequence from the training set is windowed to a length of 1000 samples taken from the center of the sequence. This limits the computational cost of the training procedure by reducing the length of time through which gradients must be tracked.
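A sketch of this cross-validation and windowing procedure follows; waveforms and labels are assumed to hold the 279 downsampled recordings (e.g., resampled with torchaudio.functional.resample) and their class indices, and train_one_run is a hypothetical routine that performs one 30-epoch Adam training run and returns its metrics.

```python
import torch

def center_window(x: torch.Tensor, length: int = 1000) -> torch.Tensor:
    """Crop a waveform to `length` samples taken from the center of the sequence."""
    start = max(0, (x.shape[-1] - length) // 2)
    return x[..., start:start + length]

# Divide the 279 samples into 5 groups of approximately equal size, then rotate
# which group serves as the testing set; metrics are averaged over the 5 runs.
perm = torch.randperm(len(waveforms))
folds = [perm[i::5] for i in range(5)]
metrics = []
for k in range(5):
    test_idx = folds[k].tolist()
    train_idx = torch.cat([folds[j] for j in range(5) if j != k]).tolist()
    train_set = [(center_window(waveforms[i]), labels[i]) for i in train_idx]
    test_set = [(center_window(waveforms[i]), labels[i]) for i in test_idx]
    metrics.append(train_one_run(train_set, test_set, epochs=30, lr=4e-4))
```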
All windowed samples from the training set are run through the simulation in batches of 9, and the categorical cross entropy loss is computed between the output probe probability distribution and the correct one-hot vector for each vowel sample. To encourage the optimizer to produce a binarized distribution of the wave speed with relatively large feature sizes, the optimizer minimizes this loss function with respect to a material density distribution, ρ(x, y), within the central region of the simulation domain.
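One plausible way to realize this parameterization, consistent with the low-pass spatial filtering mentioned earlier, is to blur the density distribution and then project it toward the two material speeds; the filter size and projection steepness beta below are illustrative assumptions, not values taken from the example.

```python
import torch
import torch.nn.functional as F

def density_to_speed(rho: torch.Tensor, c0: float = 1.0, c1: float = 0.5,
                     kernel_size: int = 5, beta: float = 100.0) -> torch.Tensor:
    """Map a trainable density rho(x, y) to a wave-speed distribution. The blur
    acts as a low-pass spatial filter (regularizing toward large feature sizes),
    and the steep sigmoid pushes each pixel toward the binary values c0 or c1."""
    kernel = torch.ones(1, 1, kernel_size, kernel_size) / kernel_size ** 2
    rho_smooth = F.conv2d(rho[None, None], kernel, padding=kernel_size // 2)[0, 0]
    mix = torch.sigmoid(beta * (rho_smooth - 0.5))  # ~0 or ~1 almost everywhere
    return c0 + (c1 - c0) * mix
```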
The frequency content of the three vowel classes after downsampling to 10 kHz is shown in the accompanying drawings.
As shown in the accompanying drawings, the audio waveform of each vowel, represented by x(i), is injected by the source 208 at a single grid cell on the left side of the domain, emitting waveforms which propagate through a trainable region 210 with a distribution of the wave speed that is optimized during the training process. Three probe points 212 are defined on the right-hand side of this region, each assigned to one of the three vowel classes. To determine the system's output, y(i), the time-integrated power at each probe is measured.
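A sketch of this measurement, assuming the simulated field history has been recorded as a tensor of shape (T, Ny, Nx) and the probes are given as grid coordinates:

```python
import torch

def probe_outputs(u_history: torch.Tensor, probes: list) -> torch.Tensor:
    """Time-integrated power at each probe (Eq. 6 summed over time), normalized
    so the outputs form a probability distribution over the vowel classes."""
    power = torch.stack([(u_history[:, py, px] ** 2).sum()
                         for (py, px) in probes])
    return power / power.sum()
```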
Using automatic differentiation, the gradient of the loss function with respect to the density of material in the trainable region 210 is computed. The material density is updated iteratively, using gradient-based stochastic optimization techniques, until convergence. For the illustrative purposes of this numerical demonstration, we consider binarized systems made of two materials: a background material with a normalized wave speed c0=1.0, and a second material with c1=0.5. We assume that the second material has a nonlinear parameter, cnl=−30, while the background material has a linear response. In practice, the wave speeds would be selected to correspond to different materials to be used in the physical realization of the design. For example, in an acoustic setting the material distribution could consist of air, where the sound speed is 331 m/s, and porous silicone rubber, where the sound speed is 150 m/s.
At the beginning of the training, the initial distribution of the wave speed may be selected to correspond to a uniform region of material with a speed which is midway between those of the two materials. This choice of starting structure allows the optimizer to shift the density of each pixel toward either one of the two materials to produce a binarized structure made of only those two materials. To train the system, we perform back-propagation through the model of the wave equation to compute the gradient of the cross entropy loss function of the measured outputs with respect to the density of material in each pixel of the trainable region. Then, we use this gradient information to update the material density using the Adam optimization algorithm, repeating until convergence on a final structure.
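Assembling the earlier sketches, the differentiable forward pass that this back-propagation traverses might look as follows. It reuses the hypothetical density_to_speed and wave_step functions from above; for brevity it treats the whole domain as trainable, whereas the example restricts training to the central region, and it omits the absorbing boundary layer.

```python
import torch

def forward_simulate(rho, signal, src, probes, dt=1.0, dx=1.0):
    """Density -> speed map -> time-stepped wave equation -> normalized probe
    power. Every time step stays in the autodiff graph, so gradients of the
    loss flow back to the material density rho."""
    c = density_to_speed(rho)                 # trainable material pattern
    u_prev = torch.zeros_like(c)
    u = torch.zeros_like(c)
    power = torch.zeros(len(probes))
    for t in range(signal.shape[0]):
        f = torch.zeros_like(c)
        f[src] = signal[t]                    # inject the input sample (P^(i))
        u_prev, u = u, wave_step(u, u_prev, c, f, dt, dx)
        power = power + torch.stack([u[p] ** 2 for p in probes])  # Eq. 6
    return power / power.sum()                # output probability vector
```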
Numerical modeling and simulation of the wave equation physics was performed using a custom package written in Python. The software was developed on top of the popular machine learning library PyTorch to compute the gradients of the loss function with respect to the material distribution via reverse-mode automatic differentiation. In the context of inverse design in the fields of physics and engineering, this method of gradient computation is commonly referred to as the adjoint variable method and has the computational cost of performing one additional simulation. We note that related approaches to numerical modeling using machine learning frameworks have been proposed previously for full-wave inversion of seismic datasets. The code for performing numerical simulations and training of the wave equation, as well as generating the figures presented in this description, may be found online at http://www.github.com/fancompute/wavetorch/.
We now discuss vowel recognition training results in relation to the accompanying drawings.
The techniques presented here have a number of favorable qualities that make them a promising candidate for designing analog computers that process temporally-encoded information. Unlike the standard RNN, the update of the wave equation from one time step to the next enforces a nearest-neighbor coupling between elements of the hidden state through the Laplacian operator, which is represented by the sparse matrix A in Eq. 5.
We have shown that the dynamics of the wave equation are conceptually equivalent to those of a recurrent neural network. This conceptual connection opens up the opportunity for a new class of analog hardware platform, in which evolving time dynamics play a significant role in both the physics and the dataset. While we have focused on the most general example of wave dynamics, characterized by a scalar wave equation, our results can be readily extended to other wave-like physics. Such an approach of using physics to perform computation is envisioned to provide a new platform for analog machine learning devices that can perform computation far more naturally and efficiently than their digital counterparts. The generality of the approach implies that many physical systems can be used for performing RNN-like computations on dynamic signals, such as those in optics, acoustics, or seismics.
Those skilled in the art will recognize, in light of the present description of the invention and examples given, that there are many possible variations. For example, the inventors envision that, with minor modifications to the example discussed above, closed boundary conditions may be used instead of open boundary conditions. From a simulation and training perspective, the change would simply require removing the absorbing layer, which can be done by modifying the loss coefficient for the wave propagation outside of the central design region. From a physical perspective, using a reflective/closed boundary condition would mean that the injected signal bounces around the system far more readily. From one point of view, this might help the training process because the system can have greater 'memory' of input signals from earlier time steps. From another perspective, this could hurt training because much of this signal may be irrelevant to the training task. The choice of boundary condition, or the presence of loss more generally, is thus an engineering problem that can be explored in future studies and applications; there are arguments for both approaches, or a hybrid approach.
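To make the boundary-condition choice concrete, a loss (damping) term b can be added to the discretized wave equation: b > 0 in a layer near the edges absorbs outgoing waves (open boundary), while b = 0 everywhere leaves a reflecting, closed boundary. The quadratic damping profile below is a common but illustrative choice, not a prescription; laplacian is reused from the earlier sketch.

```python
import torch

def damped_wave_step(u_t, u_prev, c, b, f_t, dt: float, dx: float):
    """Leapfrog update of the damped wave equation
    d2u/dt2 + 2 b du/dt = c^2 lap(u) + f."""
    num = (2.0 * u_t - (1.0 - b * dt) * u_prev
           + dt ** 2 * (c ** 2 * laplacian(u_t, dx) + f_t))
    return num / (1.0 + b * dt)

def absorbing_layer(Ny: int, Nx: int, width: int = 20, b_max: float = 0.5):
    """Damping coefficient rising quadratically from 0 in the interior to b_max
    at the domain edges; width = 0 gives b = 0 everywhere (closed boundary)."""
    b = torch.zeros(Ny, Nx)
    for d in range(width):
        val = b_max * ((width - d) / width) ** 2
        b[d, :].clamp_(min=val)
        b[-1 - d, :].clamp_(min=val)
        b[:, d].clamp_(min=val)
        b[:, -1 - d].clamp_(min=val)
    return b
```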
The inventors also envision that, with minor modifications, the model output probes may be extended probe regions measuring various properties of the waves. In the example discussed above, the output of the model was a vector of length 3, where each element was related to the probability of the audio signal belonging to one of the three vowel classes. One can instead use many other, more complicated output models. For example, we could consider a model where the output is a two-dimensional image, in which the wave power at each point in the device sets the brightness of the image as a function of x and y. This would be one example of a spatially extended probe region; a sketch of such a readout appears after the next paragraph.
Furthermore, while we chose to integrate the signal power over time (giving a single number for each probe output), we could instead use the time-dependent power measurement P(t) at each probe as the output. For example, we could input a time signal I(t) into the analog processor and measure the power over time at a receiver P(t), so that the device acts as a nonlinear filter I(t)→P(t). As a concrete application, we could input audio from a male voice as I(t) and have the model output a female-sounding voice as P(t).
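Both variations amount to simple changes in the measurement stage of the earlier sketches; given a recorded field history of shape (T, Ny, Nx), hypothetically:

```python
import torch

def power_image(u_history: torch.Tensor) -> torch.Tensor:
    """Spatially extended probe: time-integrated power at every grid cell,
    read out as a 2-D image over (x, y)."""
    return (u_history ** 2).sum(dim=0)        # shape (Ny, Nx)

def power_trace(u_history: torch.Tensor, py: int, px: int) -> torch.Tensor:
    """Time-dependent output P(t) at a single receiver cell, so the device
    acts as a nonlinear filter I(t) -> P(t)."""
    return u_history[:, py, px] ** 2          # shape (T,)
```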
This application claims priority from U.S. Provisional Patent Application 62/836,328 filed Apr. 19, 2019, which is incorporated herein by reference.
This invention was made with Government support under contract FA9550-17-1-0002 awarded by the United States Air Force, and under contract N00014-17-1-3030 awarded by the Department of Defense. The Government has certain rights in the invention.