The present techniques relate generally to the field of multiwavelet-based operators for differential equations.
Many natural and human-built systems (e.g., aerospace, complex fluids, neuro-glia information processing) exhibit complex dynamics characterized by partial differential equations (PDEs). For example, the design of wings and airplanes that are robust to turbulence can utilize learning of complex PDEs. Along the same lines, complex fluids (gels, emulsions) are multiphasic materials characterized by a macroscopic behavior modeled by non-linear PDEs. Understanding their variations in viscosity as a function of the shear rate is useful for many engineering projects. Moreover, modelling the dynamics of continuous and discrete cyber and physical processes in complex cyber-physical systems can be achieved through PDEs.
Learning PDEs (e.g., mappings between infinite-dimensional spaces of functions) from trajectories of variables generally utilizes machine learning techniques, such as deep neural networks (NNs). Towards this end, one stream of work aims at parameterizing the solution map as a deep NN. One issue, however, is that the NN is tied to a resolution during training and, therefore, may not generalize well to other resolutions, thus requiring retraining (and possible modifications of the model) for every set of discretizations. In parallel, another stream of work focuses on constructing the PDE solution function as a NN architecture. This approach, however, is designed to work with one instance of a PDE and, therefore, upon changing the coefficients associated with the PDE, the model has to be re-trained. Additionally, the approach is not a completely data-dependent one and, hence, cannot be made oblivious to the knowledge of the underlying PDE structure. Finally, the stream of work closest to the problem we investigate is represented by the “Neural Operators”. Being a completely data-driven approach, the neural operators method aims at learning the operator map without having knowledge of the underlying PDEs. The neural operators have also demonstrated the capability of discretization-independence. Obtaining the data for learning the operator map, however, could be prohibitively expensive or time consuming (e.g., measuring aircraft performance under different initial conditions). To better solve the problem of learning PDE operators from scarce and noisy data, we would ideally exploit fundamental properties of the operators that have implications for data-efficient representation.
These and other aspects and implementations are discussed in detail below. The foregoing information and the following detailed description include illustrative examples of various aspects and implementations, and provide an overview or framework for understanding the nature and character of the claimed aspects and implementations. The drawings provide illustration and a further understanding of the various aspects and implementations, and are incorporated in and constitute a part of this specification. Aspects can be combined and it will be readily appreciated that features described in the context of one aspect of the invention can be combined with other aspects. Aspects can be implemented in any convenient form. As used in the specification and in the claims, the singular form of ‘a’, ‘an’, and ‘the’ include plural referents unless the context clearly dictates otherwise.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The accompanying drawings are not intended to be drawn to scale. Like reference numbers and designations in the various drawings indicate like elements. For purposes of clarity, not every component may be labeled in every drawing. In the drawings:
We present here some technical preliminaries that are used in the present disclosure. The literature for some of the topics is vast, and we list only the properties that are useful specifically for the techniques described herein.
Our intuition is to transform the problem of learning a PDE to a domain where a compact representation of the operator exists. Under a mild assumption that the operator's kernel is smooth except at finitely many singularities, the multiwavelets, with their vanishing moments property, sparsify the kernel in their projection with respect to (w.r.t.) a measure. Therefore, learning an operator kernel in the multiwavelet domain is feasible and data efficient. The wavelets have a rich history in signal processing and are popular in audio and image compression. For multiwavelets, the orthogonal polynomials (OPs) w.r.t. a measure emerge as a natural basis for the multiwavelet subspace, and an appropriate scale/shift provides a sequence of subspaces which captures the locality at various resolutions. We generalize and exploit the multiwavelets concept to work with arbitrary measures, which opens up new possibilities to design a series of models for operator learning from complex data streams.
We incorporate the multiwavelet filters derived using a variety of OP bases into our operator learning model, and show that the proposed architecture outperforms the existing neural operators. Our main contributions are as follows: (i) Based on some fundamental properties of the integral operator's kernel, we develop a multiwavelet-based model which learns the operator map efficiently. (ii) For the 1-D datasets of the non-linear Korteweg-de Vries and Burgers equations, we observe an order of magnitude improvement in the relative L2 error (as described herein). (iii) We demonstrate that the proposed model is consistent with the theoretical properties of the pseudo-differential operator (as described herein). (iv) We show that the proposed multiwavelet-based model is robust towards the fluctuation strength of the input signal (as described herein). (v) Next, we demonstrate the applicability to higher dimensions with the 2-D Darcy flow equation (as described herein), and finally show that the proposed approach can learn at lower resolutions and generalize to higher resolutions.
We start by defining the problem of operator learning as described herein. We define the multiwavelet transform for the proposed operator learning problem and derive the transformation operations across different scales. Then the proposed operator learning model is outlined. Finally, we list some useful properties of the operators which lead to an efficient implementation of multiwavelet-based models.
Given two functions a(x) and u(x) with x∈D, the operator is a map T such that Ta=u. Formally, let A and U be two Sobolev spaces H^{s,p} (s>0, p≥1); then the operator T is such that T: A→U. The Sobolev spaces are useful in the analysis of partial differential equations (PDEs), and we restrict our attention to s>0 and p=2. Note that, for s=0, H^{0,p} coincides with L^p, and ƒ∈H^{0,p} does not necessarily have derivatives in L^p. We choose p=2 in order to be able to define projections with respect to (w.r.t.) measures μ in a Hilbert space structure.
We take the operator T as an integral operator with the kernel K: D×D→L^2 such that

(Ta)(x) = ∫_D K(x, y) a(y) dy = u(x), x∈D.  (1)
For the case of inhomogeneous linear PDEs, Lu=ƒ, with ƒ being the forcing function and L being the differential operator, the associated kernel is commonly termed the Green's function. In our case, we do not put the restriction of linearity on the operator. From eq. (1), it is apparent that learning the complete kernel K(.,.) would solve the operator map problem, but it is not necessarily a numerically feasible solution. Indeed, a better approach would be to exploit possible useful properties (as described herein) such that a compact representation of the kernel can be made. For an efficient representation of the operator kernel, we can determine an appropriate subspace (or sequence of subspaces), and projection tools to map to such spaces.
Norm with respect to measures: Projecting a given function onto a fixed basis may utilize a measure-dependent distance. For two functions ƒ and g, we take the inner product w.r.t. measure μ as ⟨ƒ, g⟩_μ = ∫ƒ(x)g(x)dμ(x), and the associated norm as ∥ƒ∥_μ = ⟨ƒ, ƒ⟩_μ^{1/2}. We now discuss the next ingredient, which refers to the subspaces that can be used to project the kernel.
In this section, we briefly overview the concept of multiwavelets and extend it to work with non-uniform measures at each scale. The multiwavelet transform synergizes the advantages of orthogonal polynomials (OPs) as well as the wavelets concepts, both of which have a rich history in signal processing. The properties of wavelet bases like (i) vanishing moments and (ii) orthogonality can effectively be used to create a system of coordinates in which a wide class of operators (as described herein) have a nice representation. Multiwavelets go a few steps further, providing a fine-grained representation using OPs while also acting as a basis on a finite interval. For the rest of this section, we restrict our attention to the interval [0, 1]; however, the transformation to any finite interval [a, b] can be straightforwardly obtained by an appropriate shift and scale.
Multi Resolution Analysis: We begin by defining the space of piecewise polynomial functions, for k∈ℕ and n∈ℤ_+∪{0}, as V_n^k = {ƒ | deg(ƒ)<k for x∈(2^{−n}l, 2^{−n}(l+1)), 0≤l<2^n, and ƒ=0 elsewhere}. Similarly, we define the sequence of measures μ_0, μ_1, . . . such that ƒ∈V_n^k is measurable w.r.t. μ_n, and the norm of ƒ is taken as ∥ƒ∥ = ⟨ƒ, ƒ⟩_{μ_n}^{1/2}. Next, the multiwavelet subspace W_n^k is defined for n∈ℤ_+∪{0}, such that

V_n^k ⊕ W_n^k = V_{n+1}^k,  V_n^k ⊥ W_n^k.  (3)
For a given OP basis for V_0^k as ϕ_0, ϕ_1, . . . , ϕ_{k−1} w.r.t. measure μ_0, a basis of the subsequent spaces V_n^k, n≥1, can be obtained by shift and scale (hence the name, multi-scale) operations of the original basis as follows:

ϕ_{jl}^n(x) = 2^{n/2} ϕ_j(2^n x − l), l = 0, 1, . . . , 2^n−1,  (4)
where μ_n is obtained as the collection of shifts and scales of μ_0, accordingly.
Multiwavelets: For the multiwavelet subspace W_0^k, the orthonormal basis (of piecewise polynomials) is taken as ψ_0, ψ_1, . . . , ψ_{k−1} such that ⟨ψ_i, ψ_j⟩_{μ_0} = 0 for i≠j and 1 otherwise. From eq. (3), V_n^k ⊥ W_n^k, and since V_n^k spans the polynomials of degree less than k, we conclude that the first k moments of each ψ_i vanish, i.e.,

∫_0^1 ψ_i(x) x^j dμ_0(x) = 0, j = 0, 1, . . . , k−1.  (5)

Similarly to eq. (4), a basis for the multiwavelet subspace W_n^k is obtained by shift and scale of ψ_i as ψ_{jl}^n(x) = 2^{n/2} ψ_j(2^n x − l), and the ψ_{jl}^n are orthonormal w.r.t. measure μ_n, e.g., ⟨ψ_{jl}^n, ψ_{j'l'}^n⟩_{μ_n} = δ_{jj'}δ_{ll'}.
Note: Since V_1^k = V_0^k ⊕ W_0^k from eq. (3), therefore, for a given basis ϕ_i of V_0^k w.r.t. measure μ_0 and ϕ_{jl}^1 as a basis for V_1^k w.r.t. μ_1, a set of basis ψ_i can be obtained by applying Gram-Schmidt Orthogonalization using appropriate measures. We refer the reader to the supplementary materials for the detailed procedure.
Note: Since V_0^k and W_0^k live in V_1^k, ϕ_i and ψ_i can be written as linear combinations of the basis of V_1^k. We term these linear coefficients multiwavelet decomposition filters (H^{(0)}, H^{(1)}, G^{(0)}, G^{(1)}), since they transform a fine scale n=1 to a coarse scale n=0. A uniform-measure (μ_0) version is discussed in the prior literature, and we extend it to any arbitrary measure by including the correction terms Σ^{(0)} and Σ^{(1)}. We refer to the supplementary materials for the complete details. The capability of using non-uniform measures enables us to apply the same approach to any OP basis with finite domain, for example, Chebyshev, Gegenbauer, etc.
For a given ƒ(x), the multiscale and multiwavelet coefficients at the scale n are defined as s_l^n = [⟨ƒ, ϕ_{il}^n⟩_{μ_n}]_{i=0}^{k−1} and d_l^n = [⟨ƒ, ψ_{il}^n⟩_{μ_n}]_{i=0}^{k−1}, respectively, such that s_l^n, d_l^n ∈ ℝ^k for l = 0, 1, . . . , 2^n−1, and the stacked coefficients s^n, d^n ∈ ℝ^{k×2^n}. The decomposition filters transform the coefficients from the finer scale n+1 to the coarser scale n as

s_l^n = H^{(0)} s_{2l}^{n+1} + H^{(1)} s_{2l+1}^{n+1},  (6)
d_l^n = G^{(0)} s_{2l}^{n+1} + G^{(1)} s_{2l+1}^{n+1},  (7)

and the reconstruction relations invert this mapping as

s_{2l}^{n+1} = Σ^{(0)}(H^{(0)T} s_l^n + G^{(0)T} d_l^n),  (8)
s_{2l+1}^{n+1} = Σ^{(1)}(H^{(1)T} s_l^n + G^{(1)T} d_l^n).  (9)

The wavelet (and also multiwavelet) transformation can be straightforwardly extended to multiple dimensions using tensor products of the bases. For our purpose, a function ƒ defined over a d-dimensional domain has multiscale, multiwavelet coefficients s_l^n, d_l^n ∈ ℝ^{k×. . .×k×2^{nd}}, and H^{(0)}Σ^{(0)} (and similarly the others) are replaced with their d-times Kronecker products.
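By way of illustration, the decomposition and reconstruction relations above admit a direct vectorized implementation. The following is a minimal Python sketch, assuming the filter matrices H0, H1, G0, G1 and correction matrices S0, S1 have already been computed for a chosen OP basis; all function and variable names are illustrative only.

```python
import numpy as np

def decompose(s_fine, H0, H1, G0, G1):
    """One decomposition step: coefficients at scale n+1 -> scale n.

    s_fine: array of shape (2**(n+1), k) holding s_l^{n+1} for l = 0..2**(n+1)-1.
    Returns (s_coarse, d_coarse), each of shape (2**n, k).
    """
    s_even, s_odd = s_fine[0::2], s_fine[1::2]       # s_{2l}^{n+1}, s_{2l+1}^{n+1}
    s_coarse = s_even @ H0.T + s_odd @ H1.T          # eq. (6)
    d_coarse = s_even @ G0.T + s_odd @ G1.T          # eq. (7)
    return s_coarse, d_coarse

def reconstruct(s_coarse, d_coarse, H0, H1, G0, G1, S0, S1):
    """One reconstruction step: coefficients at scale n -> scale n+1 (eqs. (8)-(9))."""
    s_even = (s_coarse @ H0 + d_coarse @ G0) @ S0.T  # s_{2l}^{n+1}
    s_odd = (s_coarse @ H1 + d_coarse @ G1) @ S1.T   # s_{2l+1}^{n+1}
    s_fine = np.empty((2 * s_coarse.shape[0], s_coarse.shape[1]))
    s_fine[0::2], s_fine[1::2] = s_even, s_odd
    return s_fine
```

For a uniform measure (Legendre basis), S0 and S1 reduce to identity matrices and the reconstruction step exactly inverts the decomposition step.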
Non-Standard Form: The multiwavelet representation of the operator kernel K(x, y) can be obtained by an appropriate tensor product of the multiscale and multiwavelet bases. One issue, however, in this approach is that the bases at various scales are coupled because of the tensor product. To untangle the bases at various scales, we use a trick, as proposed in the prior literature, called the non-standard wavelet representation. The extra mathematical price paid for the non-standard representation actually serves as a ground for reducing the proposed model complexity (as described herein), thus providing data efficiency. For the operator under consideration T with integral kernel K(x, y), let us denote T_n as the projection of T on V_n^k, which is obtained by projecting the kernel K onto the basis ϕ_{jl}^n w.r.t. measure μ_n. If P_n is the projection operator such that P_n ƒ = Σ_{j,l} ⟨ƒ, ϕ_{jl}^n⟩_{μ_n} ϕ_{jl}^n, then the finest-scale operator can be expanded telescopically into contributions from each scale,
where Q_i = P_i − P_{i−1} and L is the coarsest scale under consideration (L≥0). From eq. (3), it is apparent that Q_i is the multiwavelet operator. Next, we denote A_i = Q_i T Q_i, B_i = Q_i T P_{i−1}, C_i = P_{i−1} T Q_i, and T̄ = P_L T P_L.
where (Us_l^n, Ud_l^n) and (s_l^n, d_l^n) are the multiscale, multiwavelet coefficients of u and a, respectively, and L is the coarsest scale under consideration. With these mathematical concepts, we now proceed to define our multiwavelet-based operator learning model as described herein.
Based on the discussion as described herein, we propose a multiwavelet-based model (MWT), as shown in the drawings. The goal of the MWT model is to map the multiwavelet transform of the input (s_l^N) to the output (Us_l^N) at the finest scale N. The model includes at least two parts: (i) Decomposition (dec), and (ii) Reconstruction (rec). The dec acts as a recurrent network, and at each iteration the input is s^{n+1}. Using (6)-(7), the input is used to obtain multiscale and multiwavelet coefficients at a coarser level, s^n and d^n, respectively. Next, to compute the multiscale/multiwavelet coefficients of the output u, we approximate the non-standard kernel decomposition from (11) using four neural networks (NNs) A, B, C and T̄.
At each iteration, the filters in the dec module downsample the input, but compared to popular techniques (e.g., maxpool), the input is only transformed to a coarser multiscale/multiwavelet space. By virtue of its design, since the non-standard wavelet representation does not have inter-scale interactions, it allows us to reuse the same kernel NNs A, B, C at different scales. A follow-up advantage of this approach is that the model is resolution independent, since the recurrent structure of dec is input invariant, and for a different input size M, only the number of iterations would possibly change, up to a maximum of log2 M. The reuse of A, B, C across scales also enables us to learn an expressive model with fewer parameters (θ_A, θ_B, θ_C, θ_T̄).
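As an illustration of the recurrent dec/rec structure described above, the following PyTorch-style sketch shows one possible organization of a single MWT layer. The class layout and the use of single linear layers for A, B, C, T̄ are simplifying assumptions rather than the exact architecture, and the decompose/reconstruct callables are assumed to apply the fixed multiwavelet filter matrices as in eqs. (6)-(9).

```python
import torch
import torch.nn as nn

class MWTLayer(nn.Module):
    """Single multiwavelet operator layer: decompose, apply kernel NNs, reconstruct."""

    def __init__(self, k, width, L=0):
        super().__init__()
        self.L = L                                    # coarsest scale retained
        # A, B, C act on the multiwavelet/multiscale coefficients at every scale;
        # Tbar acts only at the coarsest scale (illustrative single-layer kernels).
        self.A = nn.Linear(k * width, k * width)
        self.B = nn.Linear(k * width, k * width)
        self.C = nn.Linear(k * width, k * width)
        self.Tbar = nn.Linear(k * width, k * width)

    def forward(self, s, decompose, reconstruct):
        # s: (batch, 2**N, k*width) multiscale coefficients at the finest scale N
        Ud_list, Us_list = [], []
        while s.shape[1] > 2 ** self.L:               # dec: iterate to coarser scales
            s, d = decompose(s)                       # eqs. (6)-(7) with fixed filters
            Ud_list.append(self.A(d) + self.B(s))     # kernel NNs reused at every scale
            Us_list.append(self.C(d))
        s = self.Tbar(s)                              # coarsest-scale map
        for Ud, Us in reversed(list(zip(Ud_list, Us_list))):  # rec: back to the finest scale
            s = reconstruct(s + Us, Ud)               # eqs. (8)-(9) with fixed filters
        return s
```

Because the loop depends only on the input length, the same trained layer can be applied to inputs of a different resolution, only changing the number of iterations.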
The dec/rec modules use filter matrices which are fixed beforehand; therefore, this part may not utilize training processes. The model does not, however, work for any arbitrary choice of fixed matrices H, G. We show, as described herein, that for randomly selected matrices the model does not learn, which validates that careful construction of the filter matrices may be necessary.
This section outlines definitions of the integral kernels that are useful for an efficient compression of the operators through multiwavelets. We then discuss a fundamental property of the pseudo-differential operator.
Definition 1. Calderón-Zygmund Operator. The integral operators that have a kernel K(x, y) which is smooth away from the diagonal and satisfies the following bounds.
Smooth functions with decaying derivatives are gold for the multiwavelet transform. Note that smoothness implies a Taylor series expansion, and the multiwavelet transform with sufficiently large k zeroes out the initial k terms of the expansion due to the vanishing moments property (5). This is how the multiwavelet transform sparsifies the kernel (as described herein).
The next property points out that, with the input/output being single-dimensional functions, for any pseudo-differential operator (with smooth coefficients), the singularity at the diagonal is also well-characterized.
Property 1. Smoothness of Pseudo-Differential Operator. For the integral kernel K(x, y) of a pseudo-differential operator, K(x, y)∈C^∞ ∀x≠y, and for x=y, K(x, y)∈C^{T−1}, where T+1 is the highest derivative order in the given pseudo-differential equation.
Property 1 implies that, for the class of pseudo-differential operators and any set of basis with the initial J vanishing moments, the projection of the kernel onto such bases will have the diagonal dominating the non-diagonal entries, exponentially, if J>T−1 [21]. For the case of a multiwavelet basis with k OPs, J=k (from eq. (5)). Therefore, k>T−1 sparsifies the kernel projection onto multiwavelets, for a fixed number of bits of precision ϵ. We see the implication of Property 1 on our proposed model as described herein.
In this section, we evaluate the multiwavelet-based model (MWT) on several PDE datasets. We show that the proposed MWT model not only exhibits orders of magnitude higher accuracy when compared against the state-of-the-art (Sota) approaches but also works consistently well under different input conditions without parameter tuning. From a numerical perspective, we take the data as point-wise evaluations of the input and output functions. Specifically, we have the dataset (a_i, u_i) with a_i=a(x_i), u_i=u(x_i) for x_1, x_2, . . . , x_M∈D, where the x_i are an M-point discretization of the domain D. Unless stated otherwise, the training set is of size 1000 while the test set is of size 200.
Model architectures: Unless otherwise stated, the NNs A, B and C in the proposed model are configured as described herein.
From a mathematical viewpoint, the dec and rec modules in the proposed model perform the multiwavelet decomposition and reconstruction described herein.
Benchmark models: We compare our MWT model, using two different OP bases (Leg, Chb), with other neural operators. Specifically, we consider the graph neural operator (GNO), the multipole graph neural operator (MGNO), the LNO which makes a low-rank (r) representation of the operator kernel K(x, y) (also similar to the unstacked DeepONet), and the Fourier neural operator (FNO). We experiment on three established datasets set up in the work on FNO (Burgers' equation (1-D), Darcy Flow (2-D), and Navier-Stokes equation (time-varying 2-D)). In addition, we also experiment with the Korteweg-de Vries equation (1-D). For the 1-D cases, a modified FNO with careful parameter selection and removal of Batch-normalization layers results in better performance compared with the original FNO, and we use it in our experiments. The MWT model demonstrates the highest accuracy in all the experiments. The MWT model also shows the ability to learn the function mapping from lower-resolution data and to generalize to higher resolutions.
All the models (including ours) are trained for a total of 500 epochs using the Adam optimizer with an initial learning rate (LR) of 0.001. The LR decays after every 100 epochs by a factor of γ=0.5. The loss function is taken as the relative L2 error. All of the experiments are performed on a single Nvidia V100 32 GB GPU, and the results are averaged over a total of 3 seeds.
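The training recipe above (Adam, initial LR 0.001, decay by γ=0.5 every 100 epochs, relative L2 loss) can be sketched as follows; model and data_loader are placeholders, and the batching details are assumptions.

```python
import torch

def relative_l2(pred, target, eps=1e-8):
    """Relative L2 error, averaged over the batch."""
    num = torch.norm(pred.flatten(1) - target.flatten(1), dim=1)
    den = torch.norm(target.flatten(1), dim=1) + eps
    return (num / den).mean()

def train(model, data_loader, epochs=500, device="cuda"):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)
    model.to(device)
    for epoch in range(epochs):
        for a, u in data_loader:                 # (input function, output function) samples
            a, u = a.to(device), u.to(device)
            optimizer.zero_grad()
            loss = relative_l2(model(a), u)
            loss.backward()
            optimizer.step()
        scheduler.step()                         # LR decays by 0.5 every 100 epochs
```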
The Korteweg-de Vries (KdV) equation was first proposed by Boussinesq and rediscovered by Korteweg and de Vries [25]. KdV is a 1-D non-linear PDE that may be used to describe non-linear shallow-water waves. For a given field u(x, t), the dynamics takes the following form:
The task for the neural operator is to learn the mapping of the initial condition u_0(x) to the solution u(x, t=1). We generate the initial condition as a Gaussian random field according to u_0 ∼ 𝒩(0, 7^4(−Δ+7^2 I)^{−2.5}) with periodic boundary conditions. The equation is numerically solved using the chebfun package [29] with a resolution of 2^10, and datasets with lower resolutions are obtained by sub-sampling the highest-resolution dataset.
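A minimal sketch of drawing such Gaussian random field samples spectrally on a periodic grid is shown below, assuming the covariance 7^4(−Δ+7^2 I)^{−2.5} is diagonal in the Fourier basis with −Δ having eigenvalues (2πk)^2; the normalization conventions of the original data generation may differ.

```python
import numpy as np

def sample_u0(s=1024, tau=7.0, alpha=2.5, rng=np.random.default_rng()):
    """Draw u0 ~ N(0, tau^4 (-Laplacian + tau^2 I)^(-alpha)) on a periodic grid of size s."""
    k = np.fft.fftfreq(s, d=1.0 / s)                 # integer wavenumbers
    std = tau ** 2 * ((2 * np.pi * k) ** 2 + tau ** 2) ** (-alpha / 2)   # per-mode std
    coeffs = std * (rng.standard_normal(s) + 1j * rng.standard_normal(s))
    coeffs[0] = 0.0                                  # zero-mean field
    return np.fft.ifft(coeffs).real * s ** 0.5       # real part enforces a real-valued field
```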
Varying resolution: The experimental results for the KdV equation at different input resolutions s are shown in Table 1. We see that, compared to any of the benchmarks, our proposed MWT Leg exhibits the lowest relative error, lower by nearly an order of magnitude. Even at a resolution of 64, the relative error is low, which means that a sparse dataset with a coarse resolution of 64 can be used for the neural operator to learn the function mapping between infinite-dimensional spaces.
Varying fluctuations: We now vary the smoothness of the input function u_0(x) by controlling the parameter λ, where low values of λ imply more frequent fluctuations and λ→0 approaches the Brownian motion limit. To isolate the importance of incorporating the multiwavelet transformation, we use the same convolution operation as in FNO, e.g., Fourier transform-based convolution with different modes k_m (only single-layer) for A, B, C. We see, as described herein, that the proposed model remains robust across fluctuation strengths of the input.
We test the ability of the proposed MWT model to capture the theoretical properties of the pseudo-differential operator in this section. Towards that, we consider the Euler-Bernoulli equation that models the vertical displacement of a finite length beam over time. A Fourier transform version of the beam equation with the constraint of both ends being clamped is as follows
where u(x) is the Fourier transform of the time-varying beam displacement, ω is the frequency, and ƒ(x) is the applied force. The Euler-Bernoulli equation is a pseudo-differential equation with the maximum derivative order T+1=4. We take the task of learning the map from ƒ to u.
The 1-D Burgers' equation is a non-linear PDE occurring in various areas of applied mathematics. For a given field u(x, t) and diffusion coefficient ν, the 1-D Burgers' equation reads:
The task for the neural operator is to learn the mapping of the initial condition u(x, t=0) to the solution u(x, t=1). To compare with the various advanced neural operators under the same conditions, we use the previously published Burgers' data and results. The initial condition is sampled as a Gaussian random field with u_0 ∼ 𝒩(0, 5^4(−Δ+5^2 I)^{−2}) and periodic boundary conditions, where Δ is the Laplacian, meaning the initial conditions are sampled by drawing their first several coefficients from a Gaussian distribution. In the Burgers' equation, ν is set to 0.1. The equation is solved with resolution 2^13, and the data at lower resolutions are obtained by sub-sampling the highest-resolution dataset.
The results of the experiments on Burgers' equation for different resolutions are shown herein.
Darcy flow, formulated by Darcy, is one of the basic relationships of hydrogeology, describing the flow of a fluid through a porous medium. We experiment on the steady-state 2-d Darcy flow equation on the unit box, where it takes the following form:

−∇·(a(x)∇u(x)) = ƒ(x), x∈(0,1)^2, with u(x)=0 for x∈∂(0,1)^2.
We set up the experiments to learn the operator mapping the coefficient a(x) to the solution u(x). The coefficients are generated according to a ∼ 𝒩(0, (−Δ+3^2 I)^{−2}), where Δ is the Laplacian with zero Neumann boundary conditions. The values of a(x) are thresholded to achieve ellipticity. The solutions u(x) are obtained by using a 2nd-order finite difference scheme on a 512×512 grid. Datasets of lower resolution are sub-sampled from the original dataset.
The results of the experiments on Darcy flow for different resolutions are shown in Table 2. MWT Leg again obtains the lowest relative error compared to the other neural operators at various resolutions. We also perform an additional experiment in which the multiwavelet filters H^{(i)}, G^{(i)}, i=0,1, are replaced with random values (properly normalized). We see in Table 2 that MWT Rnd does not learn the operator map; in fact, its performance is worse than that of all the other models. This signifies the importance of a careful choice of the filter matrices.
Full results for these experiments are provided in the supplementary materials.
Navier-Stokes Equation: The Navier-Stokes (NS) equations are 2-d time-varying PDEs modeling viscous, incompressible fluids. The proposed MWT model performs a 2-d multiwavelet transform of the velocity u, while using a single-layered 3-d convolution for A, B and C to learn dependencies across space-time. We have observed that the proposed MWT Leg is on par with the Sota on the NS equations in Appendix D.1.
Prediction at high resolution: We show that the MWT model trained at lower resolutions for various datasets (for example, training with s=256 for Burgers) can predict the output at a finer resolution s=2048, with a relative error of 0.0226, thus eliminating the need for expensive sampling. Training and testing with s=2048 yields a relative error of 0.00189. The full experiment is discussed in Appendix D.2.
Train/evaluation with different sampling rules: We study the operator learning behavior when the training and evaluation datasets are obtained using random functions from different generating rules. In Appendix D.4.2, the training is done with a squared exponential kernel but the evaluation is done on a different generating rule with controllable parameter λ.
We address the problem of data-driven learning of the operator that maps between two function spaces. Motivated by fundamental properties of the integral kernel, we found that multiwavelets constitute a natural basis to represent the kernel sparsely. After generalizing the multiwavelets to work with arbitrary measures, we proposed a series of models to learn the integral operator. These techniques may be used to design efficient neural operators utilizing properties of the kernels and the suitable basis. These techniques may be used to solve many engineering and biological problems, such as aircraft wing design, complex fluid dynamics, metamaterials design, cyber-physical systems, and neuron-neuron interactions that are modeled by complex PDEs.
The wavelets represent sets of functions that result from dilation and translation of a single function, often termed the 'mother function' or 'mother wavelet'. For a given mother wavelet ψ(x), the resulting wavelets are written as
where a, b are the dilation and translation factors, respectively, and D is the domain of the wavelets under consideration. We are interested in compactly supported wavelets, i.e., D is a finite interval [l, r], and we also take ψ∈L^2. Non-compact wavelets are left for future consideration. Without loss of generality, we provide examples that utilize the finite domain D=[0,1]; the extension to any [l, r] can be done simply by a suitable shift and scale.
From a numerical perspective, discrete values of a, b (the Discrete Wavelet Transform) are more useful, and hence we take a=2^{−j}, j=0,1, . . . , L−1, where L is the finite number of scales up to which the dilation occurs and the dilation factor is 2. For a given value of a=2^{−j}, the values of b can be chosen as b=na, n=0,1, . . . , 2^j−1. The resulting wavelets are now expressed as ψ_{j,n}(x)=2^{j/2}ψ(2^j x−n), n=0,1, . . . , 2^j−1, with x∈[n2^{−j}, (n+1)2^{−j}]. Given a mother wavelet function, the dilation and translation operations for three scales (L=3) are shown in the drawings.
where cj,n are the discrete wavelet transform coefficients.
The next set of ingredients that are useful to us is the family of orthogonal polynomials (OPs). Specifically, the OPs in the present disclosure will serve as the mother wavelets or span the 'mother subspace'. Therefore, we are interested in OPs that are non-zero over a finite domain and zero almost everywhere (a.e.) else. For a given measure μ that defines the OPs, a sequence of OPs P_0(x), P_1(x), . . . satisfies deg(P_i)=i and ⟨P_i, P_j⟩_μ=0, ∀i≠j, where ⟨P_i, P_j⟩_μ=∫P_i(x)P_j(x)dμ. Therefore, a sequence of OPs is useful as it can act as a set of basis for the space of polynomials of degree < d, using P_0(x), . . . , P_{d−1}(x).
A popular set of OPs are the hypergeometric polynomials (also known as Jacobi polynomials). Among them, the common choices are the Legendre, Chebyshev, and Gegenbauer (which generalize Legendre and Chebyshev) polynomials. These polynomials are defined on the finite interval [−1,1] and are useful for the techniques described herein. Other OPs are the Laguerre and Hermite polynomials, which are defined over a non-finite domain. Such OPs can be used to extend the present techniques to non-compact wavelets. We now review some defining properties of the Legendre and Chebyshev polynomials.
The Legendre polynomials are defined with respect to (w.r.t.) a uniform weight function w_L(x)=1 for −1≤x≤1, or w_L(x)=1_{[−1,1]}(x), such that
For our purpose, we shift and scale the Legendre polynomials such that they are defined over [0,1] as Pi(2x−1), and the corresponding weight function as wL(2x−1).
Derivatives: The Legendre polynomials satisfy the recurrence relationship

(2i+1)P_i(x) = P′_{i+1}(x) − P′_{i−1}(x),

which allows us to express the derivative as a linear combination of lower-degree polynomials as follows:

P′_i(x) = (2i−1)P_{i−1}(x) + (2i−5)P_{i−3}(x) + (2i−9)P_{i−5}(x) + · · · ,

where the summation ends at either P_0(x) or P_1(x), with P_0(x)=1 and P_1(x)=x.
Basis: A set of orthonormal basis functions for the space of polynomials of degree < d defined over the interval [0,1] is obtained using shifted Legendre polynomials as

ϕ_i(x) = √(2i+1) P_i(2x−1), i = 0, 1, . . . , d−1,

which are orthonormal w.r.t. the weight function w(x)=w_L(2x−1), such that ∫_0^1 ϕ_i(x)ϕ_j(x)w(x)dx = δ_{ij}.
The Chebyshev polynomials are two sets of polynomial sequences (first and second kind), T_i and U_i. We take the polynomial of the first kind T_i(x) of degree i, which is defined w.r.t. the weight function w_Ch(x)=1/√(1−x²) for −1≤x≤1, as
After applying the scale and shift to the Chebyshev polynomials such that their domain is limited to [0,1], we get Ti(2x−1) and the associated weight function as wCh(2x−1) such that Ti(2x−1) are orthogonal w.r.t. wCh(2x−1) over the interval [0,1].
Derivatives: The Chebyshev polynomials of the first kind satisfy the following recurrence relationships
The derivative of T_i(x) can be written as the following summation of lower-degree polynomials
where the series ends at either T_0(x)=1 or T_1(x)=x. Alternatively, the derivative of T_i(x) can also be written as T′_i(x)=iU_{i−1}(x), where U_i(x) is the Chebyshev polynomial of the second kind of degree i.
Basis: A set of orthonormal basis functions for the space of polynomials of degree up to d and domain [0,1] is obtained using the shifted Chebyshev polynomials, suitably normalized w.r.t. the weight function w_Ch(2x−1), or equivalently w.r.t. a tilted version of this weight, as discussed herein.
Roots: Another useful property of Chebyshev polynomials is that they can be expressed as trigonometric functions; specifically, T_n(cos θ)=cos(nθ). The roots are also well-characterized in the interval [−1,1]. For T_n(x), the n roots x_1, . . . , x_n are given by x_i = cos((2i−1)π/(2n)), i = 1, . . . , n.
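These identities are easy to check numerically; a small sketch using NumPy's Chebyshev utilities:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

n = 5
roots = np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))   # roots of T_n on [-1, 1]
Tn = C.Chebyshev.basis(n)
assert np.allclose(Tn(roots), 0.0, atol=1e-12)                    # T_n vanishes at its roots
theta = np.linspace(0.0, np.pi, 7)
assert np.allclose(Tn(np.cos(theta)), np.cos(n * theta))          # T_n(cos t) = cos(n t)
```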
The multiwavelets exploit the advantages of both the wavelets and the OPs, as described herein. For a given function ƒ, instead of projecting the function onto a single wavelet function (wavelet transform), the multiwavelets go one step further and project the function onto a subspace of degree-restricted polynomials. In the spirit of the wavelet transform, in multiwavelets a sequence of wavelet bases is constructed as scaled/shifted versions of the basis of the coarsest-scale polynomial subspace.
We present a measure-dependent version of the multiwavelets which opens up a family of multiwavelet-based models for operator learning. Below, we provide a detailed mathematical formulation for developing multiwavelets using any set of OPs with measures which can be non-uniform. To be able to develop compactly supported multiwavelets, we have restricted ourselves to the family of OPs which are non-zero only over a finite interval. The extension to non-compact wavelets could be done by using OPs which are non-zero over the complete/semi-infinite range of the real axis (for example, Laguerre or Hermite polynomials). As examples, we present the expressions for Legendre polynomials, which use a uniform measure, and Chebyshev polynomials, which use a non-uniform measure, as described herein. These techniques can be readily extended to other families of OPs, such as the Gegenbauer polynomials.
The linear inhomogeneous pseudo-differential equations Lu = ƒ have an operator L which takes the following form

(Lu)(x) = Σ_{α∈A} a_α(x) D^α u(x),  (22)

where A is a subset of the natural numbers ℕ∪{0} and x∈ℝ^n. The order of the equation is denoted by the highest integer in the set A. The simplest and most useful case of pseudo-differential operators is the one in which a_α(x)∈C^∞. In the pseudo-differential operators literature, it is often convenient to have a symbolic representation for the pseudo-differential operator. First, the Fourier transform of a function ƒ is taken as ƒ̂(ξ)=∫ƒ(x)e^{−i2πξx}dx. The pseudo-differential operator over a function ƒ is then defined as

(T_a ƒ)(x) = ∫ a(x, ξ) ƒ̂(ξ) e^{i2πξx} dξ,

where the operator T_a is parameterized by the symbol a(x, ξ), which for the differential equation (22) is given by a(x, ξ) = Σ_{α∈A} a_α(x)(i2πξ)^α. The Euler-Bernoulli equation as discussed herein has A = {0, 4}.
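For a constant-coefficient case such as the Euler-Bernoulli operator with A={0,4}, the symbol is diagonal in the Fourier basis, so the operator (and its inverse, where the symbol does not vanish) can be applied with FFTs. The following is a minimal sketch under the convention ƒ̂(ξ)=∫ƒ(x)e^{−i2πξx}dx; it uses periodic boundaries for simplicity rather than the clamped ends described herein, and the coefficient values are illustrative.

```python
import numpy as np

def solve_euler_bernoulli(f, omega, length=1.0):
    """Solve d^4 u/dx^4 - omega^2 u = f on a periodic grid by dividing by the symbol."""
    n = f.shape[0]
    xi = np.fft.fftfreq(n, d=length / n)            # frequencies matching exp(i 2 pi xi x)
    symbol = (2 * np.pi * xi) ** 4 - omega ** 2     # symbol a(x, xi) for A = {0, 4}
    u_hat = np.fft.fft(f) / symbol                  # assumes the symbol never vanishes on the grid
    return np.fft.ifft(u_hat).real

# Example: a smooth forcing on [0, 1)
x = np.linspace(0.0, 1.0, 256, endpoint=False)
u = solve_euler_bernoulli(np.sin(2 * np.pi * x), omega=3.0)
```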
Below, we discuss in detail the multiwavelet filters as presented herein. First, we introduce some mathematical terminology that is useful for the multiwavelet filters and then review a few useful tools.
Measures: Functions are expressed w.r.t. a basis usually by using measures μ, which in general could be non-uniform. Intuitively, the measure provides weights to the different locations over which the specified basis is defined. For a measure μ, let us consider the Radon-Nikodym derivative

w(x) = dμ(x)/dλ(x),

where dλ:=dx is the Lebesgue measure. In other words, the measure-dependent integrals ∫ƒdμ(x) can now be defined as ∫ƒ(x)w(x)dx.
Basis: A set of orthonormal basis functions w.r.t. measure μ are ϕ_0, . . . , ϕ_{k−1} such that ⟨ϕ_i, ϕ_j⟩_μ=δ_{ij}. With the weighting function w(x), which is the Radon-Nikodym derivative w.r.t. the Lebesgue measure, the orthonormality condition can be re-written as ∫ϕ_i(x)ϕ_j(x)w(x)dx=δ_{ij}.
The basis can also be appended with a multiplicative function called the tilt X(x), such that for a set of basis ϕ_i which is orthonormal w.r.t. μ with weighting function w(x), a new set of basis ϕ_iX is orthonormal w.r.t. a measure having weighting function w/X². We will see that for OPs like Chebyshev, as discussed herein, a proper choice of the tilt X(x) simplifies the analysis.
Projections: For a given set of basis ϕ_i defined w.r.t. measure μ and corresponding weight function w(x), the inner products are defined such that they induce a measure-dependent Hilbert space structure. Next, for a given function ƒ in this space, the projections onto the basis polynomials are defined as c_i=∫ƒ(x)ϕ_i(x)w(x)dx.
Gaussian quadrature is a set of tools useful for approximating definite integrals of the following form:

∫_a^b ƒ(x)w(x)dx ≈ Σ_{i=1}^{n} w_i ƒ(x_i),  (23)

where the w_i are scalar weight coefficients and the x_i are n locations chosen appropriately. For an n-point quadrature, eq. (23) is exact for functions ƒ that are polynomials of degree ≤ 2n−1. This is useful, as we will see below.
From the result in [64], it can be argued that, for a class of OPs P_i defined w.r.t. weight function w(x) over the interval [a, b], with x_1, x_2, . . . , x_n the roots of P_n, if the weights w_i are chosen appropriately, then eq. (23) holds with equality for any ƒ that is a polynomial of degree ≤ 2n−1. The weight coefficients can also be written in a closed-form expression [1] as follows:

w_i = (a_n / a_{n−1}) ∫_a^b P_{n−1}(x)² w(x)dx / (P′_n(x_i) P_{n−1}(x_i)),  (24)

where a_n is the coefficient of x^n in P_n. Thus, the integral in (23) can be computed using a family of OPs defined w.r.t. the weight function w(x). Depending on the class of OPs chosen, the Gaussian quadrature formula can be derived accordingly using eq. (24). For common choices of OPs, the corresponding names for the quadrature are 'Gaussian-Legendre', 'Gaussian-Chebyshev', 'Gaussian-Laguerre', etc.
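As an example, Gauss-Legendre nodes and weights (available in NumPy on [−1, 1]) can be shifted to [0, 1] and used to integrate polynomial products exactly up to degree 2n−1; a small sketch, with illustrative function names:

```python
import numpy as np

def gauss_legendre_01(n):
    """n-point Gauss-Legendre rule shifted from [-1, 1] to [0, 1]."""
    x, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (x + 1.0), 0.5 * w

def inner_product(f, g, n=10):
    """Approximate the uniform-measure inner product <f, g> on [0, 1]."""
    x, w = gauss_legendre_01(n)
    return np.sum(w * f(x) * g(x))

# Exact for polynomial integrands of degree <= 2n - 1, e.g. x^3 * x^2 with n = 3:
assert np.isclose(inner_product(lambda x: x ** 3, lambda x: x ** 2, n=3), 1.0 / 6.0)
```

An analogous rule for the Chebyshev weight is available through np.polynomial.chebyshev.chebgauss.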
Gram-Schmidt Orthogonalization (GSO) is a common technique for deriving (i) a set of vectors in a subspace that is orthogonal to (ii) another given set of vectors. We briefly write the GSO procedure for obtaining a set of orthonormal polynomials w.r.t. measures which in general are different for the polynomials in sets (i) and (ii). We consider a given subspace of polynomials of degree < k, V_0, and another subspace of polynomials of degree < k, V_1, such that V_0⊂V_1, and we wish to obtain a set of orthonormal basis for the subspace of polynomials of degree < k, W_0, such that V_0⊥W_0 and W_0⊂V_1. It is apparent that, if dim(W_0)=n, dim(V_0)=m and dim(V_1)=p, then m+n≤p.
Let (ψ_0, . . . , ψ_{n−1}) be a set of basis of the polynomial subspace W_0, (ϕ_0^{(0)}, . . . , ϕ_{m−1}^{(0)}) be a set of basis for V_0, and (ϕ_0^{(1)}, . . . , ϕ_{p−1}^{(1)}) be a set of basis for V_1. We take the basis ψ_i and ϕ_i^{(0)} to be defined w.r.t. the same measure μ_0, while the ϕ_i^{(1)} are defined w.r.t. a different measure μ_1. A set of ψ_i can be obtained by iteratively applying the following procedure for i=0,1, . . . , n−1. The procedure in (25) results in a set of orthonormal basis of W_0 such that ⟨ψ_i, ψ_j⟩_{μ_0}=δ_{ij} and ⟨ψ_i, ϕ_j^{(0)}⟩_{μ_0}=0.
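A minimal sketch of this measure-weighted orthogonalization, assuming the functions are represented by their values at quadrature nodes (so that inner products reduce to weighted sums) and that the existing basis is already orthonormal w.r.t. its own measure; names are illustrative:

```python
import numpy as np

def weighted_gram_schmidt(candidates, against, w_new, w_old):
    """Orthogonalize `candidates` against `against` and among themselves.

    candidates, against: arrays of shape (num_funcs, num_nodes) of values at quadrature nodes.
    w_new: quadrature weights for the measure of the new basis (inner products among psi_i).
    w_old: quadrature weights for the measure used against the existing basis phi_i^(0).
    """
    basis = []
    for f in candidates:
        v = f.copy()
        for phi in against:                        # remove components along V0 (measure mu_0)
            v = v - np.sum(w_old * v * phi) * phi
        for psi in basis:                          # remove components along earlier psi_j
            v = v - np.sum(w_new * v * psi) * psi
        v = v / np.sqrt(np.sum(w_new * v * v))     # normalize w.r.t. the new measure
        basis.append(v)
    return np.array(basis)
```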
Using the mathematical preliminaries and tools discussed previously herein, we are now in a position to present detailed derivations of the measure-dependent multiwavelet filters. We start by deriving the general filter expressions. Expressions for Legendre polynomials and then for Chebyshev polynomials are presented below.
The ‘multiwavelet filters’ play the role of transforming the multiwavelet coefficients from one scale to another. Let us revisit Section 2.2, where we defined a space of piecewise polynomial functions, for k∈ℕ and n∈ℤ_+∪{0}, as V_n^k. The dim(V_n^k)=2^n k, and for subsequent n each subspace is contained in the next, e.g., V_{n−1}^k ⊂ V_n^k. Now, if ϕ_0, . . . , ϕ_{k−1} are a set of basis polynomials for V_0^k w.r.t. measure μ_0, then we know that a set of basis for V_1^k can be obtained by scale and shift of the ϕ_i as ϕ_{jl}^1 = 2^{1/2}ϕ_j(2x−l), l=0,1, with the measure accordingly denoted μ_1. For a given function ƒ, its multiwavelet coefficients for projections over V_0^k are taken as s_{0,i}^0 = ⟨ƒ, ϕ_i⟩_{μ_0}, and those for V_1^k are taken as s_{l,i}^1 = ⟨ƒ, ϕ_{il}^1⟩_{μ_1}.
Let us begin by considering a simple scenario. Since V_0^k⊂V_1^k, the bases are related as
It is straightforward to see that if ϕ_i and ϕ_{il}^1 are defined w.r.t. the same measure, i.e., μ_0=μ_1 almost everywhere (a.e.), then the filters transforming the multiwavelet coefficients from the higher to the lower scale are exactly equal to the subspace mapping coefficients a_{ij}^{(0)}, a_{ij}^{(1)} (by taking the inner product with ƒ on both sides in (27)). However, this is not the case in general, e.g., the measures w.r.t. which the bases are defined at each scale are not necessarily the same. To remedy this issue, and to generalize the multiwavelet filters, we now present a general measure-variant version of the multiwavelet filters.
We note that solving for filters H that satisfy eq. (26) indeed solves the general case of the n+1→n scale transformation, which can be obtained by a simple change of variables as s_{l,i}^n = Σ_{j=0}^{k−1} H_{ij}^{(0)} s_{2l,j}^{n+1} + Σ_{j=0}^{k−1} H_{ij}^{(1)} s_{2l+1,j}^{n+1}. Now, for solving (26), we consider the following equation,

where the weighting function is the Radon-Nikodym derivative of the corresponding measure, as discussed herein, and we have also defined dλ:=dx. We observe that eq. (26) can be obtained from (28) by simply integrating against ƒ on both sides.
Next, we observe an important fact about multiwavelets (or wavelets in general): the advantages offered by multiwavelets rely on their ability to project a function locally. One way to achieve this is by computing basis functions which are dilations/translations of a fixed mother wavelet, for example,
We re-write eq. (28), by substituting ϕ_i(2x)←ϕ̃_i(2x), ϕ_i(2x−1)←ϕ̃_i(2x−1) and μ_1←μ̃_1, in its most useful form for the present techniques as follows
Thus, the filter coefficients can be looked upon as subspace projection coefficients, with a proper choice of tilted basis. Note that eq. (33) is now equivalent to (27) but is an outcome of a different back-end machinery. Since √2ϕ_j(2x), √2ϕ_j(2x−1) are orthonormal bases for V_1^k, we have
and hence we obtain the filter coefficients as follows
For a given set of basis of V_0^k as ϕ_0, . . . , ϕ_{k−1} defined w.r.t. measure/weight function w(x), the filter coefficients H can be derived by solving eq. (29). In a similar way, if ψ_0, . . . , ψ_{k−1} is the basis for the multiwavelet subspace W_0^k w.r.t. measure μ_0, such that V_0^k⊕W_0^k=V_1^k, and the projection of the function ƒ over W_0^k is denoted by d_{0,i}^0 = ⟨ƒ, ψ_i⟩_{μ_0}, then the analogous relation with filter coefficients G is given in eq. (32).
Again using a change of variables, we get d_{l,i}^n = Σ_{j=0}^{k−1} G_{ij}^{(0)} s_{2l,j}^{n+1} + Σ_{j=0}^{k−1} G_{ij}^{(1)} s_{2l+1,j}^{n+1}. To solve for G in (32), similar to eq. (29), the measure-variant multiwavelet basis transformation (with the appropriate tilt) is written as
Similar to eq. (30)-(31), the filter coefficients G can be obtained from (33) as follows
Since ⟨ϕ_i, ϕ_j⟩_{μ_0}=δ_{ij}, ⟨ψ_i, ψ_j⟩_{μ_0}=δ_{ij}, and ⟨ϕ_i, ψ_j⟩_{μ_0}=0, the bases at scale 0 are orthonormal w.r.t. μ_0; however, the scaled bases of V_1^k need not be orthonormal w.r.t. μ_0, which motivates the correction matrices defined below.
Let us define the filter matrices as H^{(l)}=[H_{ij}^{(l)}]∈ℝ^{k×k} and G^{(l)}=[G_{ij}^{(l)}]∈ℝ^{k×k} for l=0,1. Also, we define the correction matrices Σ^{(0)}=[Σ_{ij}^{(0)}], Σ^{(1)}=[Σ_{ij}^{(1)}] such that
Now, we can write that
Rearranging, we can finally express the relationships between the filter matrices and the correction matrices as follows
The discussion till now is related to ‘decomposition’ or transformation of multiwavelet transform coefficients from higher to lower scale. However, the other direction, e.g., ‘reconstruction’ or transformation from lower to higher scale can also be obtained from (41). First, note that the general form of eq. (26), (32) can be written in the matrix format as
Next, we observe that Σ^{(0)}, Σ^{(1)} ≻ 0, which follows from their definition. Therefore, eq. (41) can be inverted to get the following form
Finally, by using (43), we can essentially invert the eq. (42) to get
In the following section, we derive the filters H, G in (42), (44) for different polynomial bases.
The basis for V_0^k is chosen as the normalized shifted Legendre polynomials of degree less than k w.r.t. the weight function w_L(2x−1)=1_{[0,1]}(x), as previously discussed. For example, the first three bases are

ϕ_0(x) = 1,  ϕ_1(x) = √3(2x−1),  ϕ_2(x) = √5(6x²−6x+1).
For deriving a set of basis ψ_i of W_0^k using GSO, we need to evaluate the corresponding integrals, which can be done efficiently using Gaussian quadrature.
Gaussian-Legendre Quadrature: The integrals involved in the GSO procedure, and the computations of H, G, can be done efficiently using the Gaussian quadrature discussed previously. Since the basis functions ϕ_i, ψ_i are polynomials, the quadrature summation is exact. For a given k basis of the subspace V_0^k, deg(ϕ_iϕ_j)<2k−1 as well as deg(ϕ_iψ_j)<2k−1, and therefore a k-point quadrature can be used for expressing the integrals. Next, we take the interval [a, b] = [0,1], and the OPs for approximation in the Gaussian quadrature as the shifted Legendre polynomials P_k(2x−1). The weight coefficients ω_i can be written as
where xi are the k roots of Pk(2x−1) and ak can be expressed in terms of ak−1 using the recurrence relationship of Legendre polynomials from Section A.2.1.
A set of basis for V_1^k is √2ϕ_i(2x) and √2ϕ_i(2x−1), with weight functions w_L(4x−1)=1_{[0,1/2]}(x) and w_L(4x−3)=1_{[1/2,1]}(x), respectively. We now use the GSO procedure, as discussed previously, to obtain a set of basis ψ_0, . . . , ψ_{k−1} for W_0^k. We use the Gaussian-Legendre quadrature formulas for computing the inner products. As an example, the inner products are computed as follows
where ϕi(2xi)=0 for xi>0.5.
With shifted Legendre polynomials as basis for V03, the multiwavelet bases for W03 are
Next, we compute the filter matrices, but first note that since the weighting function for the Legendre polynomial basis is w_L(x)=1_{[0,1]}(x), the matrices Σ^{(0)}, Σ^{(1)} in eq. (39) are just identity matrices, because of the orthonormality of the bases √2ϕ_i(2x) and √2ϕ_i(2x−1) w.r.t. 1_{[0,1/2]}(x) and 1_{[1/2,1]}(x), respectively. The filter coefficients can be computed using Gaussian-Legendre quadrature as follows
and similarly other coefficients can be obtained in eq. (30)-(31), (34)-(35). As an example, for k=3, following the outlined procedure, the filter coefficients are derived as follows
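A minimal numerical sketch of this computation for the Legendre case with k=3 is shown below. It evaluates the projection integrals with a 2k-point Gauss-Legendre rule (as suggested herein to avoid a node at x=0.5) and checks the row-orthonormality of [H^{(0)} | H^{(1)}] implied by eq. (41) when Σ^{(0)}=Σ^{(1)}=I. Computing G^{(0)}, G^{(1)} additionally requires the GSO-derived ψ_i and is omitted here.

```python
import numpy as np
from numpy.polynomial import legendre as L

K = 3                                               # number of OP basis functions

def phi(i, x):
    """Normalized shifted Legendre basis on [0, 1]: sqrt(2i+1) * P_i(2x - 1)."""
    return np.sqrt(2 * i + 1) * L.Legendre.basis(i)(2 * x - 1)

# 2k-point Gauss-Legendre rule mapped to [0, 1/2]
t, w = np.polynomial.legendre.leggauss(2 * K)
x_half, w_half = 0.25 * (t + 1.0), 0.25 * w

H0 = np.zeros((K, K))
H1 = np.zeros((K, K))
for i in range(K):
    for j in range(K):
        # H0_ij = <phi_i(x), sqrt(2) phi_j(2x)> over [0, 1/2];
        # H1_ij uses phi_j(2x - 1) on [1/2, 1], shifted back to [0, 1/2] by x -> x + 0.5.
        H0[i, j] = np.sqrt(2.0) * np.sum(w_half * phi(i, x_half) * phi(j, 2 * x_half))
        H1[i, j] = np.sqrt(2.0) * np.sum(w_half * phi(i, x_half + 0.5) * phi(j, 2 * x_half))

# With a uniform measure the correction matrices are identities, so [H0 | H1] has orthonormal rows.
assert np.allclose(H0 @ H0.T + H1 @ H1.T, np.eye(K), atol=1e-12)
```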
We choose the basis for V_0^k as the shifted Chebyshev polynomials of the first kind from degree 0 to k−1. The weighting function for the shifted Chebyshev polynomials is w_Ch(2x−1)=1/√(1−(2x−1)²), as discussed herein.
The first three bases using Chebyshev polynomials are as follows
The Gaussian quadrature for the Chebyshev polynomials is used to evaluate the integrals that appear in the GSO procedure as well as in the computations of the filters H, G.
Gaussian-Chebyshev Quadrature: The basis functions ϕi, ψi resulting from the use of shifted Chebyshev polynomials are also polynomials with degree of their products such that deg (ϕiϕj)<2k−1 and deg(ϕiψj)<2k−1, therefore a k-point quadrature can be used for evaluating the integrals that have products of bases. Upon taking the interval [a, b] as [0,1], and using the canonical OPs as shifted Chebyshev polynomials, the weight coefficients are written as
where the x_i are the k roots of T_k(2x−1), (a) uses the fact that a_n/a_{n−1}=2 from the recurrence relationship of the Chebyshev polynomials, as discussed herein, and assumes k>1 for the squared integral. For (b), we first note that T_k(cos θ)=cos(kθ); hence, T′_k(cos θ)=k sin(kθ)/sin(θ). Since the x_i are the roots of T_k(2x−1), therefore,
Substituting the xi, we get T′k(2xi−1)Tk−1(2xi−1)=k.
A set of basis for V_1^k is √2ϕ_i(2x) and √2ϕ_i(2x−1), with weight functions w_Ch(4x−1)=1/√(1−(4x−1)²) and w_Ch(4x−3)=1/√(1−(4x−3)²), respectively. We now use the GSO procedure as previously outlined to obtain a set of basis ψ_0, . . . , ψ_{k−1} for W_0^k. We use the Gaussian-Chebyshev quadrature formulas for computing the inner products. As an example, the inner products are computed as follows
where ϕi(2xi)=0 for xi>0.5.
With shifted Chebyshev polynomials as basis for V03, the multiwavelet bases for W03 are derived as
Next, we compute the filter and the correction matrices. The filter coefficients can be computed using Gaussian-Chebyshev quadrature as follows
and similarly, other coefficients can be obtained in eq. (30)-(31), (34)-(35). Using the outlined procedure for Chebyshev based OP basis, for k=3, the filter and the corrections matrices are derived as
The numerical computations of the filter matrices are done using Gaussian quadrature, as discussed herein, for the Legendre and Chebyshev polynomials, respectively. For odd k, a root of the canonical polynomial (either Legendre or Chebyshev) would be exactly 0.5. Since the multiwavelet bases ψ_i for W_0^k are discontinuous at 0.5, the quadrature sum can lead to an unexpected result due to the finite precision of the roots x_i. One solution is to add a small number ε, ε>0, to the x_i to avoid the singularity. Another solution, which we have used, is to perform a k̃-point quadrature, where k̃=2k. Note that any higher-order quadrature sum would work as long as it is greater than k, and we choose an even value to avoid a root at the singularity (x=0.5).
To check the validity of the numerically computed filter coefficients from the Gaussian quadrature, we can use eq. (41). In a k-point quadrature, the summation involves polynomials of degree up to k, and we found that for large values of k, for example k>20, the filter matrices tend to diverge from the mathematical constraint of (41). Note that this is not due to the involved mathematics but to the precision offered by floating-point values. For these examples, we found values of k in the range [1,6] to be most useful.
We present numerical evaluation of the proposed multiwavelets-based models on an additional dataset of Navier-Stokes below. Next, we present numerical results for prediction at finer resolutions with the use of lower-resolution trained models. Subsequently, we present additional results on the evaluation of multiwavelets on pseudo-differential equations.
The Navier-Stokes equations describe the motion of viscous fluid substances and can be used to model ocean currents, weather, and air flow. We experiment on the 2-d Navier-Stokes equation for a viscous, incompressible fluid in vorticity form on the unit torus, where it takes the following form:
We set up the experiments to learn the operator mapping the vorticity w up to time 10 to the vorticity w at a later time T>10. The task for the neural operator is thus to map the first 10 time units to the last T−10 time units of the vorticity w. To compare with the state-of-the-art model FNO and other configurations under the same conditions, we use the same previously published Navier-Stokes data and results. The initial condition is sampled as a Gaussian random field where
with periodic boundary conditions. The forcing function is ƒ(x)=0.1(sin(2π(x_1+x_2))+cos(2π(x_1+x_2))). The experiments are conducted with (1) viscosity ν=1e−4, final time T=50, and number of training pairs N=1000; (2) ν=1e−4, T=30, N=1000; (3) ν=1e−4, T=30, N=10000; and (4) ν=1e−5, T=20, N=1000. The datasets are generated on a 256×256 grid and are subsampled to 64×64.
We see in Table 3 that the proposed MWT Leg outperforms the existing neural operators as well as the other deep NN benchmarks. The MWT models use a 2-d multiwavelet transform with k=3 for the vorticity w, and 3-d convolutions in the A, B, C NNs for estimating the time-correlated kernels. The MWT models (both Leg and Chb) are trained for 500 epochs for all the experiments except for the N=10000, T=30, ν=1e−4 case, where the models are trained for 200 epochs. Note that, similar to FNO-2D, a time-recurrent version of the MWT models could also be trained and would most likely improve the resulting L2 error for the low-data setups such as N=1000, ν=1e−4 and N=1000, ν=1e−5. However, these experiments consider the 3-d convolution (for A, B, C) version.
The proposed multiwavelet-based operator learning model is resolution-invariant by design. Upon learning an operator map between the function spaces, the proposed models have the ability to generalize beyond the training resolution. In this section, we evaluate the resolution extension property of the MWT models using the Burgers' equation dataset as described herein. A pipeline for the experiment is shown in the drawings.
Similar to the experiments presented previously for the Euler-Bernoulli equation, we now present an additional result on a different pseudo-differential equation. We modify the Euler-Bernoulli beam to a 3rd order PDE as follows
where u(x) is the Fourier transform of the time-varying displacement, ω=215 is the frequency, and ƒ(x) is the external forcing function. Eq. (51) is not known to have a physical meaning like the Euler-Bernoulli equation; however, from a simulation point of view it can be used as a canonical PDE. A sample force function (input) and the solution of the PDE in (51) are shown in the drawings.
We present additional results for the KdV equation with different numbers of OP basis functions k. First, we demonstrate the operator learning when the input is sampled from a squared exponential kernel. Second, we experiment on the learning behavior of the neural operators when the train and test samples are generated from different random sampling schemes.
We sample the input u_0(x) from a squared exponential kernel and solve the KdV equation in a setting similar to that mentioned previously. Due to the periodic boundary conditions, a periodic version of the squared exponential kernel [60] is used, as follows,

where P is the domain length and ℓ is the smoothing parameter of the kernel. The random input function is sampled from 𝒩(0, K_m), with K_m being the kernel matrix, by taking P=1 (domain length) and ℓ=0.5 to avoid sharp peaks in the sampled function. The results for the neural operators (similar to Table 1) are shown in Table 5. We see that the MWT models perform better than the existing neural operators at all resolutions.
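A minimal sketch of sampling such inputs, assuming the common periodic squared exponential form k(x, x′)=exp(−2 sin²(π|x−x′|/P)/ℓ²); the variance scaling used for the actual dataset may differ.

```python
import numpy as np

def sample_periodic_se(s=1024, P=1.0, ell=0.5, jitter=1e-8, rng=np.random.default_rng()):
    """Draw a sample from N(0, K_m) with a periodic squared exponential kernel."""
    x = np.linspace(0.0, P, s, endpoint=False)
    dist = np.abs(x[:, None] - x[None, :])
    K = np.exp(-2.0 * np.sin(np.pi * dist / P) ** 2 / ell ** 2)
    chol = np.linalg.cholesky(K + jitter * np.eye(s))   # jitter for numerical stability
    return chol @ rng.standard_normal(s)
```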
The experiments implementing the present techniques, and also those in other neural operator works, have used datasets such that the train and test samples are generated by sampling the input function using the same rule. For example, in KdV, a complete dataset is first generated by randomly sampling the inputs u_0(x) from 𝒩(0, 7^4(−Δ+7^2 I)^{−2.5}) and then splitting the dataset into train/test. This setting is useful when dealing with systems whose future function samples have similar patterns, e.g., smoothness, periodicity, or presence of peaks. However, from the viewpoint of learning the operator between the function spaces, this is not a general setting. The corresponding results are described herein.
The numerical values for the Burgers' equation experiment are presented herein.
For example, the computing system 2605 may communicate with one or more interfaces, such as display interfaces, to present the results of any computations or calculations, or to provide insight into the differential equations learned using the MWT model 2620, as described herein. The MWT model 2620 can be any of the MWT implementations as described herein. The processor 2610 can execute processor-executable instructions to carry out the functionalities described herein. The MWT model 2620 can be used to learn differential equations for a variety of applications. The MWT model 2620 can be, for example, the MWT model described herein.
At step 2702, the method can include receiving input data. The input data may be sparse, and can depend on the particular application for which differential equations are being learned. The input data may be sorted data (e.g., in a particular sequence, such as an ordered time-series sequence of data, etc.). The input data may be received via a computer network, or may be provided via one or more communications interfaces or from a suitable storage medium.
At step 2704, the method can include identifying filter functions. The filter functions can be any type of the multiwavelet filter functions described herein (e.g., the filter functions H and G). In some implementations, the filter functions may be selected or identified based on the type of differential equations being learned (e.g., a predetermined or preselected set of filters, etc.). The filters can be any type of function that can take items of the data as input. The filter functions may be used in further steps of the method 2700.
At step 2706, the method can include transforming the dataset into subset(s). To transform the data into subsets, the filters can be applied to (e.g., executed over) the input data (or one or more of the subsets of data generated during an iteration of the method 2700). When executing the filter functions over the data, the data may be split into one or more subsets (e.g., by bisecting the set of data into two equal subsets, etc.).
At step 2708, the method can include processing the subset(s) with one or more model(s). The models can be any type of neural network model, such as a deep neural network, a convolutional neural network, a recurrent neural network, or a fully connected neural network, among others. The models can be, for example, the NNs A, B, C, and T̄, as described herein. Processing the data can include providing one or more of the transformed subsets as input to one or more of the models to generate sets of output data. The output data for each iteration can be stored and ultimately combined in STEP 2712 to produce final output data. Each of the models may have the same hyperparameters or may have different hyperparameters. The models may be selected or trained for various applications, as described herein.
At step 2710, the method can include determining whether there is additional data to process. For example, if the transformed data (e.g., which may be a sequence of data) includes enough data to be bisected into two groups of data, the set of data may be bisected and subsequently treated as the input data for a subsequent iteration at STEP 2706. The selected subset of information (e.g., as described herein) can be provided as the input to the next iteration of the method 2700.
At step 2712, the method can include summing to generate an output. The summing process can follow the reconstruction portion of the model described herein, combining the outputs stored during each iteration to produce the final output data.
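The flow of steps 2702-2712 can be illustrated with the following self-contained sketch, which uses simple Haar-like averaging/differencing as a stand-in for the multiwavelet filter functions and identity functions as stand-in models; it is a schematic of the control flow, not the MWT implementation itself.

```python
import numpy as np

def method_2700(input_data, models, coarse_model, coarsest_len=2):
    """Schematic of steps 2702-2712 with Haar-like filters standing in for H, G."""
    data = np.asarray(input_data, dtype=float)           # step 2702: receive input data
    stored = []
    while data.shape[0] > coarsest_len:                  # step 2710: more data to process?
        avg = (data[0::2] + data[1::2]) / np.sqrt(2.0)   # steps 2704-2706: apply filter functions,
        diff = (data[0::2] - data[1::2]) / np.sqrt(2.0)  # splitting the data into two subsets
        stored.append(models["detail"](diff) + models["coarse"](avg))   # step 2708: process subsets
        data = avg                                       # the coarser subset feeds the next iteration
    out = coarse_model(data)                             # coarsest-scale processing
    for contribution in reversed(stored):                # step 2712: combine the stored outputs by
        up = np.empty(2 * out.shape[0])                  # inverting the filter split (rec)
        up[0::2] = (out + contribution) / np.sqrt(2.0)
        up[1::2] = (out - contribution) / np.sqrt(2.0)
        out = up
    return out

# Toy usage with identity "models" on an 8-point input.
result = method_2700(np.arange(8.0), {"detail": lambda d: d, "coarse": lambda s: s}, lambda s: s)
```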
The central processing unit 2821 is any logic circuitry that responds to and processes instructions fetched from the main memory unit 2822. In many implementations, the central processing unit 2821 is provided by a microprocessor unit, such as: those manufactured by Intel Corporation of Mountain View, California; those manufactured by International Business Machines of White Plains, New York; those manufactured by Advanced Micro Devices of Sunnyvale, California; or those manufactured by Advanced RISC Machines (ARM). The computing device 2800 may be based on any of these processors, or any other processors capable of operating as described herein.
Main memory unit 2822 may be one or more memory chips capable of storing data and allowing any storage location to be directly accessed by the microprocessor 2821, such as any type or variant of Static random access memory (SRAM), Dynamic random access memory (DRAM), Ferroelectric RAM (FRAM), NAND Flash, NOR Flash and Solid State Drives (SSD). The main memory 2822 may be based on any of the above described memory chips, or any other available memory chips capable of operating as described herein. In the implementation shown in
A wide variety of I/O devices 2830a-830n may be present in the computing device 2800. Input devices include keyboards, mice, trackpads, trackballs, microphones, dials, touch pads, touch screen, and drawing tablets. Output devices include video displays, speakers, inkjet printers, laser printers, projectors and dye-sublimation printers. The I/O devices may be controlled by an I/O controller 2823 as shown in
Referring again to
Furthermore, the computing device 2800 may include a network interface 2818 to interface to the network 2804 through a variety of connections including, but not limited to, standard telephone lines, LAN or WAN links (e.g., 802.11, T1, T3, 56kb, X.25, SNA, DECNET), broadband connections (e.g., ISDN, Frame Relay, ATM, Gigabit Ethernet, Ethernet-over-SONET), wireless connections, or some combination of any or all of the above. Connections can be established using a variety of communication protocols (e.g., TCP/IP, IPX, SPX, NetBIOS, Ethernet, ARCNET, SONET, SDH, Fiber Distributed Data Interface (FDDI), RS232, IEEE 802.11, IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, IEEE 802.11n, IEEE 802.11ac, IEEE 802.11ad, CDMA, GSM, WiMax and direct asynchronous connections). In one implementation, the computing device 2800 communicates with other computing devices 2800′ via any type and/or form of gateway or tunneling protocol such as Secure Socket Layer (SSL) or Transport Layer Security (TLS). The network interface 2818 may include a built-in network adapter, network interface card, PCMCIA network card, card bus network adapter, wireless network adapter, USB network adapter, modem or any other device suitable for interfacing the computing device 2800 to any type of network capable of communication and performing the operations described herein.
In some implementations, the computing device 2800 may include or be connected to one or more display devices 2824a-2824n. As such, any of the I/O devices 2830a-2830n and/or the I/O controller 2823 may include any type and/or form of suitable hardware, software, or combination of hardware and software to support, enable or provide for the connection and use of the display device(s) 2824a-2824n by the computing device 2800. For example, the computing device 2800 may include any type and/or form of video adapter, video card, driver, and/or library to interface, communicate, connect or otherwise use the display device(s) 2824a-2824n. In one implementation, a video adapter may include multiple connectors to interface to the display device(s) 2824a-2824n. In other implementations, the computing device 2800 may include multiple video adapters, with each video adapter connected to the display device(s) 2824a-2824n. In some implementations, any portion of the operating system of the computing device 2800 may be configured for using multiple displays 2824a-2824n. One ordinarily skilled in the art will recognize and appreciate the various ways and implementations that a computing device 2800 may be configured to have one or more display devices 2824a-2824n.
In further implementations, an I/O device 2830 may be a bridge between the system bus 2850 and an external communication bus, such as a USB bus, an Apple Desktop Bus, an RS-232 serial connection, a SCSI bus, a FireWire bus, a FireWire 800 bus, an Ethernet bus, an AppleTalk bus, a Gigabit Ethernet bus, an Asynchronous Transfer Mode bus, a FibreChannel bus, a Serial Attached SCSI bus, a USB connection, or an HDMI bus.
This technical solution is also directed to explicitly embedding the exponential operators in the neural operator architecture for dealing with IVP-like datasets. The exponential operators are non-linear, and therefore, this removes the requirement of having multi-cell linear integral operator layers. Obtaining abundant training data, however, is seldom feasible for expensive real-world experiments or for ongoing issues such as COVID-19 prediction. This technical solution is therefore helpful in providing data-efficient analytics, and is useful in dealing with scarce and noisy datasets. Using the Padé approximation, the exponential of a given operator can be computed with pre-defined coefficients and a recurrent polynomial mechanism.
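The following is a minimal sketch of computing exp(L)v through the Padé [m/m] approximant using only repeated applications of the operator. For illustration, L is taken here as a small dense matrix so that the result can be checked against a reference, and the denominator is applied through a truncated fixed-point (Neumann-type) iteration; that iteration is one inversion-free option assumed for this sketch, not necessarily the recurrent scheme of the present techniques, and it converges only when the operator's norm is sufficiently small (e.g., after time-step scaling).

```python
import math
import torch

def pade_coeffs(m: int) -> list:
    # Coefficients of the numerator p_m(x) of the [m/m] Padé approximant of exp(x);
    # the denominator is q_m(x) = p_m(-x).
    return [math.factorial(2 * m - j) * math.factorial(m)
            / (math.factorial(2 * m) * math.factorial(j) * math.factorial(m - j))
            for j in range(m + 1)]

def pade_exp_apply(L: torch.Tensor, v: torch.Tensor, m: int = 4, neumann_steps: int = 30) -> torch.Tensor:
    c = pade_coeffs(m)
    # Recurrently accumulate p_m(L) v and q_m(L) v using only operator applications.
    w, p, q = v.clone(), c[0] * v, c[0] * v
    for j in range(1, m + 1):
        w = L @ w                                  # w = L^j v
        p = p + c[j] * w
        q = q + ((-1) ** j) * c[j] * w
    # Solve q_m(L) y = p without inverting: y_{k+1} = p + (I - q_m(L)) y_k.
    y = p.clone()
    for _ in range(neumann_steps):
        w, qy = y.clone(), c[0] * y                # recompute q_m(L) y via operator applications
        for j in range(1, m + 1):
            w = L @ w
            qy = qy + ((-1) ** j) * c[j] * w
        y = p + (y - qy)
    return y

# Quick check against a dense reference (small norm so the iteration converges):
L = 0.1 * torch.randn(6, 6)
v = torch.randn(6)
print(torch.norm(pade_exp_apply(L, v) - torch.linalg.matrix_exp(L) @ v))  # should be small
```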
This technical solution can: (i) for the IVPs, embed the exponential operators in the neural operator learning mechanism; (ii) by using the Padé approximation, compute the exponential of the operator using a novel recurrent neural architecture that also eliminates the need for matrix inversion; (iii) demonstrate that the proposed recurrent scheme, using the Padé coefficients, has bounded gradients with respect to (w.r.t.) the model parameters across the recurrent horizon; (iv) demonstrate the data-efficiency on the synthetic 1D datasets of the Korteweg-de Vries (KdV) and Kuramoto-Sivashinsky (KS) equations, where state-of-the-art performance is achieved with fewer parameters; and (v) formulate and investigate, for example, epidemic forecasting as a 2D time-varying neural operator problem, and show that for real-world noisy and scarce data, the proposed model can, for example, outperform the best neural operator architectures by at least 53% and the best non-neural operator schemes by at least 52%. For example, a system can include a Padé model implementation. The operator L can be fixed as a single-layered convolution operator for 1D datasets, and a 2-layered convolution for 2D datasets. For obtaining the input/output operator mapping, the multiwavelet transform can be used only for discretizing the spatial domain. The Padé neural model fits into the sockets of the multiwavelet-transform-based neural operator. The multiwavelet filters can be obtained using shifted Legendre OPs with degree K=4.
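The following sketch illustrates the 1D configuration just described, with the operator L realized as a single-layered convolution and the exponential applied through the same Padé-style recurrence, written against the convolution itself so that no matrix is ever formed or inverted. The channel count, kernel size, and number of fixed-point steps are hypothetical, the multiwavelet discretization of the spatial domain is omitted, and the fixed-point solve for the denominator is again an assumed inversion-free option that requires the convolution to have a sufficiently small norm.

```python
import math
import torch
import torch.nn as nn

class PadeExpOperator1D(nn.Module):
    def __init__(self, channels: int = 32, kernel_size: int = 5, m: int = 4):
        super().__init__()
        # Single-layered convolution playing the role of the operator L (1D case).
        self.L = nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2)
        self.m = m
        # Numerator coefficients of the [m/m] Padé approximant of exp(x);
        # the denominator uses the same coefficients with alternating signs.
        self.coeffs = [
            math.factorial(2 * m - j) * math.factorial(m)
            / (math.factorial(2 * m) * math.factorial(j) * math.factorial(m - j))
            for j in range(m + 1)
        ]

    def _poly(self, sign: float, v: torch.Tensor) -> torch.Tensor:
        # Evaluate p_m(sign * L) v using only repeated applications of the convolution.
        w, out = v, self.coeffs[0] * v
        for j in range(1, self.m + 1):
            w = sign * self.L(w)
            out = out + self.coeffs[j] * w
        return out

    def forward(self, v: torch.Tensor, fixed_point_steps: int = 20) -> torch.Tensor:
        # v: (batch, channels, length). Approximate exp(L) v = q_m(L)^{-1} p_m(L) v
        # with an inversion-free fixed-point iteration on the denominator.
        p = self._poly(+1.0, v)
        y = p
        for _ in range(fixed_point_steps):
            y = p + (y - self._poly(-1.0, y))
        return y

# Example usage (hypothetical sizes): y = PadeExpOperator1D()(torch.randn(2, 32, 128))
```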
Implementations of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software embodied on a tangible medium, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer programs, e.g., one or more components of computer program instructions, encoded on computer storage medium for execution by, or to control the operation of, data processing apparatus. The program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can include a source or destination of computer program instructions encoded in an artificially-generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).
The features disclosed herein may be implemented on a smart television module (or connected television module, hybrid television module, etc.), which may include a processing module configured to integrate internet connectivity with more traditional television programming sources (e.g., received via cable, satellite, over-the-air, or other signals). The smart television module may be physically incorporated into a television set or may include a separate device such as a set-top box, Blu-ray or other digital media player, game console, hotel television system, and other companion device. A smart television module may be configured to allow viewers to view videos, movies, photos and other content on the web, on a local cable TV channel, on a satellite TV channel, or stored on a local hard drive. A set-top box (STB) or set-top unit (STU) may include an information appliance device that may contain a tuner and connect to a television set and an external source of signal, turning the signal into content which is then displayed on the television screen or other display device.
The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
The terms “data processing apparatus”, “feature extraction system,” “data processing system”, “client device”, “computing platform”, “computing device”, or “device” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatuses can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The elements of a computer include a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), for example. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, implementations of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube), plasma, or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can include any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
Implementations of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
The computing system such as the feature extraction system 105 can include clients and servers. For example, the feature extraction system 105 can include one or more servers in one or more data centers or server farms. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some implementations, a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving input from a user interacting with the client device). Data generated at the client device (e.g., a result of an interaction, computation, or any other event or computation) can be received from the client device at the server, and vice-versa.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular implementations of the systems and methods described herein. Certain features that are described in this specification in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results.
In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products. For example, the feature extraction system 105 could be a single module, or a logic device having one or more processing modules.
Having now described some illustrative implementations, it is apparent that the foregoing is illustrative and not limiting, having been presented by way of example. In particular, although many of the examples presented herein involve specific combinations of method acts or system elements, those acts and those elements may be combined in other ways to accomplish the same objectives. Acts, elements and features discussed only in connection with one implementation are not intended to be excluded from a similar role in other implementations.
The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including” “comprising” “having” “containing” “involving” “characterized by” “characterized in that” and variations thereof herein, is meant to encompass the items listed thereafter, equivalents thereof, and additional items, as well as alternate implementations consisting of the items listed thereafter exclusively. In one implementation, the systems and methods described herein consist of one, each combination of more than one, or all of the described elements, acts, or components.
Any references to implementations or elements or acts of the systems and methods herein referred to in the singular may also embrace implementations including a plurality of these elements, and any references in plural to any implementation or element or act herein may also embrace implementations including only a single element. References in the singular or plural form are not intended to limit the presently disclosed systems or methods, their components, acts, or elements to single or plural configurations. References to any act or element being based on any information, act or element may include implementations where the act or element is based at least in part on any information, act, or element.
Any implementation disclosed herein may be combined with any other implementation, and references to “an implementation,” “some implementations,” “an alternate implementation,” “various implementations,” “one implementation” or the like are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the implementation may be included in at least one implementation. Such terms as used herein are not necessarily all referring to the same implementation. Any implementation may be combined with any other implementation, inclusively or exclusively, in any manner consistent with the aspects and implementations disclosed herein.
References to “or” may be construed as inclusive so that any terms described using “or” may indicate any of a single, more than one, and all of the described terms.
Where technical features in the drawings, detailed description or any claim are followed by reference signs, the reference signs have been included for the sole purpose of increasing the intelligibility of the drawings, detailed description, and claims. Accordingly, neither the reference signs nor their absence have any limiting effect on the scope of any claim elements.
The systems and methods described herein may be embodied in other specific forms without departing from the characteristics thereof. Although the examples provided may be useful for multiwavelet-based operator learning for differential equations, the systems and methods described herein may be applied to other environments. The foregoing implementations are illustrative rather than limiting of the described systems and methods. The scope of the systems and methods described herein may thus be indicated by the appended claims, rather than the foregoing description, and changes that come within the meaning and range of equivalency of the claims are embraced therein.
This application claims the benefit of priority under 35 U.S.C. § 119 to U.S. Provisional Patent Application Ser. No. 63/280,857, entitled “MULTIWAVELET-BASED OPERATOR LEARNING FOR DIFFERENTIAL EQUATIONS,” filed Nov. 18, 2021, the contents of such application being hereby incorporated by reference in its entirety and for all purposes as if completely and fully set forth herein.
This invention was made with government support under grant nos. 66001-17-1-4044 awarded by the Defense Advanced Research Projects Agency (DARPA), Career CPS/CNS-1453860 awarded by the National Science Foundation (NSF), CCF-1837131 awarded by the NSF, MCB-1936775 awarded by the NSF, CNS1932620 awarded by the NSF, and W911NF-17-1-0076 awarded by the Army Research Office (ARO). The government has certain rights in the invention.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2022/043885 | 9/16/2022 | WO | |
| Number | Date | Country |
|---|---|---|
| 63280857 | Nov 2021 | US |