The present disclosure relates to systems and methods for training physics-informed neural network (PINN) surrogate models to model physical problems.
Workhorse convolution operators used in convolutional neural networks assume a uniform, pixelated grid (e.g., an image). Such a grid provides a poor representation of irregular domains, leading to non-conformal representations at the boundaries. Finite element meshes, however, are conformal to boundaries, but elements have irregular shapes.
If the convolution operator is not rotationally equivariant, then any arbitrary rotation of the input can yield inaccurate results. This is an undesirable consequence that limits potential application of the technology. The rotational invariance of the laws of physics demands that solutions to physical problems satisfy rotational equivariance.
The present disclosure describes systems and methods related to the above.
In one aspect of the present disclosure, a method for training physics-informed neural network (PINN) surrogate models to model physical problems is provided. The method may include coupling, using a processor, one or more convolutional neural networks (CNNs) with the finite element method (FEM). The coupling may include calculating, for a finite element mesh comprising a plurality of finite elements, an internal force vector, P, a force vector, F, and a tangent stiffness matrix, KT, using the FEM. Each finite element may include one or more finite element nodes. The coupling may further include applying a CNN to the finite element mesh to obtain a solution to a physical problem. The CNN may be trained on a loss function, and the loss function may incorporate the internal force vector, P, and the force vector, F.
In another aspect of the present disclosure, the plurality of finite elements may represent a spatiotemporal variation of one or more physical quantities into which the physical problem can be divided.
In another aspect of the present disclosure, the one or more physical quantities may include one or more of the following: one or more quantities pertaining to solid mechanics, fluid mechanics, electromagnetic radiation, and heat transfer.
In another aspect of the present disclosure, the loss function may include:
where d denotes a solution vector of the physical problem.
In another aspect of the present disclosure, when the physical problem includes a linear physical problem, the internal force vector, P, may include: P(d)=Kd, where K is a stiffness matrix.
In another aspect of the present disclosure, when the physical problem includes a non-linear physical problem, the applying the CNN to each finite element node may include iteratively applying a CNN for each of a plurality of solution vectors, d.
In another aspect of the present disclosure, backpropagation may be used to update hyperparameters by computing derivatives of the loss function with respect to a plurality of solution vectors, d.
In another aspect of the present disclosure, the derivatives of the loss function with respect to the plurality of solution vectors may include:
where:
R is a residual vector including: R=P(d)−F; and
KT is a tangent stiffness matrix including:
In another aspect of the present disclosure, the applying the CNN may include applying a convolutional operator, using a stencil tensor, S, to perform one or more convolutions.
In another aspect of the present disclosure, the stencil tensor, S, may be a collection of points defined based on Gaussian quadrature, and one or more convolutions of the CNN may be enabled by placing the stencil tensor over top of a finite element node, of one or more finite element nodes, and then evaluating a field at each of one or more stencil points using an inverse isoparametric map.
In another aspect of the present disclosure, the convolution operator may be substantially rotationally equivariant.
In another aspect of the present disclosure, the convolution operator may include:
where:
and {tilde over (ϕ)}α=ϕ(ξα+1);
In another aspect of the present disclosure, the radial weight function may be expanded in a set of basis functions, the basis functions may include:
where:
In another aspect of the present disclosure, the set of basis functions may include cosines and Chebyshev polynomials, the set of basis functions may include:
where:
In another aspect of the present disclosure, the input field may be specified using a finite element approximation, the input field may include:
where:
In another aspect of the present disclosure, the convolution operator may be a numerical approximation of an analytical expression, the analytical expression may include:
where ra is a radial distance from the node a and includes: ra=∥x−xa∥.
In another aspect of the present disclosure, an isoparametric mapping function may include:
where:
In another aspect of the present disclosure, the calculating, for each finite element, the internal force vector, P, the force vector, F, and the tangent stiffness matrix, KT, using the FEM, may include training a surrogate model to a plurality of finite elements of the one or more finite elements.
In another aspect of the present disclosure, the calculating, for the finite element mesh, the internal force vector, P, the force vector, F, and the tangent stiffness matrix, KT, using the FEM, may include constructing a training case, and each training case may correspond to one or more ingredients necessary to establish a finite element problem.
In another aspect of the present disclosure, the one or more ingredients may include one or more of the following: a finite element mesh of a chosen geometry; one or more constitutive models to describe one or more substance behaviors; a set of boundary conditions; a set of source terms; and a set of body forces.
In another aspect of the present disclosure, a system for training PINN surrogate models to model physical problems is provided. The system may include a computing device, further including a processor and a memory configured to store programming instructions. The programming instructions, when executed by the processor, may be configured to cause the processor to couple one or more CNNs with the FEM. The coupling may include calculating, for a finite element mesh comprising a plurality of finite elements, an internal force vector, P, a force vector, F, and a tangent stiffness matrix, KT, using the FEM. Each finite element may include one or more finite element nodes. The coupling may further comprise applying a CNN to the finite element mesh, to obtain a solution to a physical problem. The CNN may be trained on a loss function, and the loss function may incorporate the internal force vector, P, and the force vector, F.
The following drawings are illustrative of particular embodiments of the present disclosure and therefore do not limit the scope of the present disclosure. Embodiments of the present disclosure will hereinafter be described in conjunction with the appended drawings, wherein like numerals denote like elements.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. These terms are merely intended to distinguish one component from another component, and the terms do not limit the nature, sequence or order of the constituent components. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Throughout the specification, unless explicitly described to the contrary, the word “comprise” and variations such as “comprises” or “comprising” will be understood to imply the inclusion of stated elements but not the exclusion of any other elements. In addition, the terms “unit”, “-er”, “-or”, and “module” described in the specification mean units for processing at least one function and operation and can be implemented by hardware components or software components and combinations thereof.
In this document, when terms such as “first” and “second” are used to modify a noun, such use is simply intended to distinguish one item from another and is not intended to require a sequential order unless specifically stated. In addition, terms of relative position such as “vertical” and “horizontal”, or “front” and “rear”, when used, are intended to be relative to each other and need not be absolute and only refer to one possible position of the device associated with those terms depending on the device's orientation.
An “electronic device” or a “computing device” refers to a device that includes a processor and memory. Each device may have its own processor and/or memory, or the processor and/or memory may be shared with other devices as in a virtual machine or container arrangement. The memory may contain or receive programming instructions that, when executed by the processor, cause the electronic device to perform one or more operations according to the programming instructions.
The terms “memory,” “memory device,” “computer-readable storage medium,” “data store,” “data storage facility” and the like each refer to a non-transitory device on which computer-readable data, programming instructions or both are stored. Except where specifically stated otherwise, the terms “memory,” “memory device,” “computer-readable storage medium,” “data store,” “data storage facility” and the like are intended to include single device embodiments, embodiments in which multiple memory devices together or collectively store a set of data or instructions, as well as individual sectors within such devices.
The terms “processor” and “processing device” refer to a hardware component of an electronic device that is configured to execute programming instructions. Except where specifically stated otherwise, the singular term “processor” or “processing device” is intended to include both single-processing device embodiments and embodiments in which multiple processing devices together or collectively perform a process.
The term “module” refers to a set of computer-readable programming instructions, as executed by a processor, which cause the processor to perform a specified function.
Although an exemplary embodiment is described as using a plurality of units to perform the exemplary process, it is understood that the exemplary processes may also be performed by one or a plurality of modules. Additionally, it is understood that the term controller/control unit refers to a hardware device that includes a memory and a processor and is specifically programmed to execute the processes described herein. The memory is configured to store the modules, and the processor is specifically configured to execute said modules to perform one or more processes which are described further below.
Further, the control logic of the present disclosure may be embodied as non-transitory computer readable media on a computer readable medium containing executable programming instructions executed by a processor, controller, or the like. Examples of computer readable media include, but are not limited to, ROM, RAM, compact disc (CD)-ROMs, magnetic tapes, floppy disks, flash drives, smart cards and optical data storage devices. The computer readable medium can also be distributed in network-coupled computer systems so that the computer readable media may be stored and executed in a distributed fashion, such as, e.g., by a telematics server or a Controller Area Network (CAN).
Unless specifically stated or obvious from context, as used herein, the term “about” is understood as within a range of normal tolerance in the art, for example, within 2 standard deviations of the mean. About can be understood as within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the stated value.
With respect to the above description then, it is to be realized that the optimum dimensional relationships for the parts of the invention, to include variations in size, materials, shape, form, function and manner of operation, assembly and use, are deemed readily apparent and obvious to one skilled in the art, and all equivalent relationships to those illustrated in the drawings and described in the specification are intended to be encompassed by the present invention.
The inventive concepts are described with reference to the attached figures, wherein like reference numerals represent like parts and assemblies throughout the several views. Several aspects of the inventive concepts are described below with reference to example applications for illustration. It should be understood that numerous specific details, relationships, and methods are set forth to provide a full understanding of the inventive concepts. One having ordinary skill in the relevant art, however, will readily recognize that the inventive concepts can be practiced without one or more of the specific details or with other methods. In other instances, well-known structures or operations are not shown in detail to avoid obscuring the inventive concepts.
Hereinafter, systems and methods for training physics-informed neural network (PINN) surrogate models to model physical problems, according to embodiments of the present disclosure, will be described with reference to the accompanying drawings.
Referring to
At 105, one or more convolutional neural networks (CNNs) may be coupled with a finite element method (FEM). According to various embodiments, the coupling may be performed using a computing device (e.g., computing device 400 as shown in
According to various embodiments, the plurality of finite elements may represent a spatiotemporal variation of one or more physical quantities into which the physical problem can be divided. According to various embodiments, the one or more physical quantities may comprise one or more of the following: one or more quantities pertaining to solid mechanics, fluid mechanics, electromagnetic radiation, and heat transfer. According to various embodiments, d may denote a nodal solution vector. According to various embodiments, d may be a nodal solution vector for whatever field is of interest (e.g., temperature, displacement, etc.). This leads to a “physics-informed” neural network since the network is configured to “learn” one or more governing equations rather than “learning” one or more chosen solutions to the equations. This approach may be implemented for industries including, but not limited to, automotive, aviation, aerospace, defense, energy, oil & gas, and structural engineering industries.
The coupling may further comprise, at 115, applying a CNN to a finite element mesh to obtain a solution to a physical problem. According to various embodiments, the CNN may be trained on a loss function, and the loss function may incorporate the internal force vector, P, and the force vector, F. According to various embodiments, the loss function takes the form of Equation 1, where d denotes a solution vector of the physical problem.
According to various embodiments, the physical problem may comprise a linear physical problem, a non-linear physical problem, and/or a combination of one or more linear physical problems and/or non-linear physical problems. For example, the physical problem may comprise a linear physical problem. According to various embodiments, when the physical problem comprises a linear physical problem, the internal force vector, P, may be computed according to Equation 2.
where K is a stiffness matrix.
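As an illustrative, non-limiting sketch of how a loss of this general character might be evaluated for a linear problem, the following Python snippet assumes the loss is the squared norm of the residual R = P(d) − F with P(d) = Kd; the exact form of Equation 1 is not reproduced here, and all numerical values are hypothetical.

```python
import numpy as np

def internal_force_linear(K, d):
    """Internal force for a linear problem: P(d) = K d (cf. Equation 2)."""
    return K @ d

def fe_pinn_loss(d, K, F):
    """One plausible form of the physics-informed loss: the squared norm of
    the FEM residual R = P(d) - F. Equation 1 itself is not reproduced in
    this text; this form is an illustrative assumption."""
    R = internal_force_linear(K, d) - F
    return float(R @ R)

# Tiny illustration on a 3-DOF system (hypothetical numbers).
K = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
F = np.array([0.0, 0.0, 1.0])
d_exact = np.linalg.solve(K, F)
print(fe_pinn_loss(d_exact, K, F))      # ~0: the exact solution zeroes the residual
print(fe_pinn_loss(np.zeros(3), K, F))  # > 0 for a non-solution
```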
According to various embodiments, when the physical problem comprises a non-linear physical problem, the applying the CNN to a finite element node may comprise iteratively applying a CNN for each of a plurality of solution vectors, d.
According to various embodiments, the applying the CNN may comprise, at 117, applying a convolutional operator, using a stencil tensor, S, (e.g., a non-uniform grid that is yet to be defined) to perform one or more convolutions. The stencil tensor, S, may be a regular N×N grid with spacing, Δ, as shown, e.g., in
One or more convolutions of the CNN may be enabled by placing the stencil tensor, S, over top of a finite element node, of one or more finite element nodes, and then evaluating a field at each of the one or more stencil points using an inverse isoparametric map. According to various embodiments, the convolution operator may take the form of Equation 3.
where Oi is an output of a convolution for node i, xi is a position vector of node i, Wm,n is a weight tensor for the convolution for node i, Pm,n is an additional parameter tensor used in convolution for node i, I(x) is an evaluation of the input field at position x, and α=(N−1)/2.
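The following Python sketch illustrates one plausible reading of such a nodal stencil convolution: an N×N stencil with spacing Δ is centered on a node, the input field is sampled at each stencil point (here through a generic callable standing in for the inverse-isoparametric-map evaluation), and the samples are contracted with the weight tensor Wm,n. The additional parameter tensor Pm,n is omitted because Equation 3 is not reproduced here, and all names and values are hypothetical.

```python
import numpy as np

def stencil_convolution(node_xy, field_at, W, delta):
    """Sketch of a nodal stencil convolution (cf. Equation 3).

    node_xy : (2,) position of node i
    field_at: callable x -> scalar, evaluating the input field I(x); in an
              FE-PINN this lookup would use the inverse isoparametric map
    W       : (N, N) weight tensor for the convolution
    delta   : stencil spacing
    """
    N = W.shape[0]
    a = (N - 1) // 2          # half-width, alpha = (N - 1)/2
    out = 0.0
    for m in range(-a, a + 1):
        for n in range(-a, a + 1):
            x = node_xy + delta * np.array([m, n])   # stencil point location
            out += W[m + a, n + a] * field_at(x)
    return out

# Hypothetical usage: a 3x3 averaging stencil applied to a smooth field.
W = np.full((3, 3), 1.0 / 9.0)
print(stencil_convolution(np.array([0.5, 0.5]),
                          lambda x: x[0] ** 2 + x[1] ** 2, W, delta=0.1))
```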
In alternate embodiments, one or more convolutions of the CNN may be intrinsically equivariant with respect to rotations in three-dimensional space. For example, an input field to a particular convolutional neuron may be represented by Icm(l)(x), where l is a rotation order, c is an input channel index, and m is a representation index. The representation index can span an inclusive range of values from −l to l. Values for the rotation order and the representation index can be assigned depending on the field being characterized by the input field. For example, the rotation order and the representation index are both zero if the input field is a scalar field. If, however, the input field is a vector field, then the rotation order is set to unity and the representation index is assigned values of m=−1, m=0, and m=1 to denote three components of a vector. The input field can be specified using a finite element mesh and expressed using a finite element approximation and may therefore take the form of Equation 4.
where dcmn(l) is a nodal value for the field Icm(l) at a node n, and Ñne(⋅) is a shape function evaluated in the physical domain for the node n contained in an element e. Summations are carried out over all of the elements and all of the nodes contained in each element.
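As a sketch of such a finite element approximation, the snippet below interpolates a nodal field inside a single 4-node bilinear quadrilateral using its parent-domain shape functions; the element type and nodal values are assumptions chosen only for illustration.

```python
import numpy as np

def shape_functions_quad4(xi, eta):
    """Bilinear shape functions for a 4-node quadrilateral in the parent
    domain, ordered counter-clockwise from (-1, -1)."""
    return 0.25 * np.array([(1 - xi) * (1 - eta),
                            (1 + xi) * (1 - eta),
                            (1 + xi) * (1 + eta),
                            (1 - xi) * (1 + eta)])

def interpolate_field(nodal_values, xi, eta):
    """FE approximation of a field inside one element (cf. Equation 4):
    I(xi, eta) = sum_n d_n N_n(xi, eta)."""
    return shape_functions_quad4(xi, eta) @ nodal_values

# Hypothetical nodal temperatures on one element; value at the centroid.
d_e = np.array([0.0, 1.0, 2.0, 1.0])
print(interpolate_field(d_e, 0.0, 0.0))  # 1.0, the average of the corner values
```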
A convolution of the CNN can be expressed as an integral and may take the form of Equation 5.
where Oa is an output at node a, xa is a position of node a, Na is a neighborhood of node a used for the convolution, and W(⋅) is a convolutional weight function. To support rotational equivariance, Equation 5 can be limited to a spherical neighborhood of node a with a cut-off radius of rc. Equation 5 can further be expressed in spherical coordinates, which may take the form of Equation 6.
where a radial distance from node a can be expressed as ra=∥x−xa∥.
The convolutional weight function W(⋅) of Equation 6 can be constructed to be rotationally equivariant. For example, a tensor field networks approach can be used to decompose the convolutional weight function W(⋅). Further, the convolutional weight function W(⋅) can be decomposed into a product of a radial weight function and a spherical harmonics component. In some embodiments, the radial weight function may be expressed as Rc
where gp(r) is a p-th radial basis function, r is a radius position, and αc
where Tp is the p-th Chebyshev polynomial of the first kind and A is a parameter. In alternative embodiments, the radial weight function, Rc
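A minimal sketch of a Chebyshev-based radial expansion of the kind described above follows; the rescaling of the radius onto [−1, 1] via the parameter A and the specific coefficients are assumptions, since Equations 7 and 8 are not reproduced here.

```python
import numpy as np

def chebyshev_radial_basis(r, p, A):
    """Sketch of a p-th radial basis function built from a Chebyshev
    polynomial of the first kind, T_p. Mapping the radius r onto [-1, 1]
    via the parameter A is an assumption for illustration."""
    t = np.clip(2.0 * r / A - 1.0, -1.0, 1.0)   # rescale [0, A] -> [-1, 1]
    return np.cos(p * np.arccos(t))             # T_p(t) = cos(p * arccos(t))

def radial_weight(r, alphas, A):
    """Radial weight R(r) = sum_p alpha_p g_p(r) (cf. Equation 7), with
    learnable expansion coefficients alpha_p."""
    return sum(a_p * chebyshev_radial_basis(r, p, A)
               for p, a_p in enumerate(alphas))

# Hypothetical coefficients evaluated at one radius.
print(radial_weight(0.3, alphas=[0.5, -0.2, 0.1], A=1.0))
```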
To further promote rotational equivariance, the output of the convolution may be a representation in a three-dimensional rotation group. The three-dimensional rotation group (e.g., SO(3)) can be the group of all rotations about the origin of three-dimensional Euclidean space under the operation of composition. Further, Clebsch-Gordan coefficients may be used to represent the output from the convolution in SO(3). For example, the Clebsch-Gordan coefficients may be expressed by C(l
A rotationally equivariant form of Equation 6 can incorporate the expanded radial weight function of Equation 7, the spherical harmonics component, the decomposition of Equation 8, and the Clebsch-Gordan coefficients. For example, the rotationally equivariant form of Equation 6 may take the form of Equation 9.
where Oac
Numerical solution techniques can be used to evaluate the analytical expression of Equation 9. For example, Gaussian quadrature can be used to express Equation 9 as an approximate summation. The approximate summation can represent a substantially rotationally equivariant form of the convolution of the CNN, which may take the form of Equation 10.
where ngp,γ, ngp,β, and ngp,α are the numbers of Gauss points associated with the spherical coordinates {tilde over (r)}a,γ, {tilde over (θ)}β, and {tilde over (ϕ)}α, respectively; the spherical coordinates can be represented by:
and {tilde over (ϕ)}α=ϕ(ξα+1); wα, wβ, and wγ are Gauss weights; and Ic
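As an illustration of the quadrature idea behind such an approximate summation, the sketch below integrates a function over a spherical neighborhood of radius rc using tensor-product Gauss-Legendre points in (r, θ, φ); the coordinate mappings and point counts are assumptions rather than the disclosure's exact Equation 10.

```python
import numpy as np

def spherical_gauss_quadrature(f, r_c, n_r=4, n_theta=4, n_phi=8):
    """Sketch of approximating the integral of f(r, theta, phi) over a
    sphere of radius r_c (with the r^2 sin(theta) volume element) using
    tensor-product Gauss-Legendre quadrature (cf. Equation 10)."""
    xr, wr = np.polynomial.legendre.leggauss(n_r)
    xt, wt = np.polynomial.legendre.leggauss(n_theta)
    xp, wp = np.polynomial.legendre.leggauss(n_phi)
    total = 0.0
    for xg, wg in zip(xr, wr):
        r = 0.5 * r_c * (xg + 1.0)                 # map [-1, 1] -> [0, r_c]
        for xb, wb in zip(xt, wt):
            theta = 0.5 * np.pi * (xb + 1.0)       # map [-1, 1] -> [0, pi]
            for xa, wa in zip(xp, wp):
                phi = np.pi * (xa + 1.0)           # map [-1, 1] -> [0, 2*pi]
                jac = (0.5 * r_c) * (0.5 * np.pi) * np.pi
                total += wg * wb * wa * jac * f(r, theta, phi) * r**2 * np.sin(theta)
    return total

# Check: integrating 1 over the ball recovers its volume, 4/3 * pi * r_c^3.
print(spherical_gauss_quadrature(lambda r, t, p: 1.0, r_c=1.0),
      4.0 / 3.0 * np.pi)
```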
The finite element method can use elemental shape functions. The elemental shape functions can be defined in a parent domain of each element, where the element can have a regular geometry. Alternatively, the elemental shape functions can be defined in the physical domain of a problem to be solved, where each element may have an irregular geometry. A mapping function can translate between the parent domain and the physical domain. For example, an isoparametric mapping function can have the form of Equation 11.
where Xin is the i-th coordinate of node n in the physical domain and Nne(ξi) is an elemental shape function for node n and element e evaluated at the parent domain coordinate ξi. The relationship between the parent domain and the physical domain can therefore be represented by Ñne(xi)=Nne(ξi(xi)). An inverse isoparametric function can be used to translate from the physical domain to the parent domain. For example, ξi(xi) can represent the inverse isoparametric mapping function (i.e., an inverse isoparametric map).
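A minimal sketch of such an inverse isoparametric map for a 4-node quadrilateral follows, using Newton iteration to recover the parent-domain coordinate of a physical point; the element type, starting guess, and tolerances are assumptions made for illustration.

```python
import numpy as np

def quad4_shape(xi):
    """Bilinear shape functions and their parent-domain gradients for a
    4-node quadrilateral."""
    s, t = xi
    N = 0.25 * np.array([(1 - s) * (1 - t), (1 + s) * (1 - t),
                         (1 + s) * (1 + t), (1 - s) * (1 + t)])
    dN = 0.25 * np.array([[-(1 - t), -(1 - s)], [ (1 - t), -(1 + s)],
                          [ (1 + t),  (1 + s)], [-(1 + t),  (1 - s)]])
    return N, dN

def inverse_isoparametric_map(x, X_e, tol=1e-10, max_iter=25):
    """Sketch of inverting the isoparametric map x(xi) = sum_n X_n N_n(xi)
    (cf. Equation 11) with Newton iteration, returning the parent-domain
    coordinate xi of a physical point x inside the element with nodes X_e."""
    xi = np.zeros(2)                      # start at the element centroid
    for _ in range(max_iter):
        N, dN = quad4_shape(xi)
        residual = X_e.T @ N - x          # x(xi) - x_target
        if np.linalg.norm(residual) < tol:
            break
        J = X_e.T @ dN                    # Jacobian d x / d xi
        xi -= np.linalg.solve(J, residual)
    return xi

# Hypothetical distorted quadrilateral; recover the parent coordinate of a point.
X_e = np.array([[0.0, 0.0], [2.0, 0.2], [2.2, 1.8], [0.1, 1.5]])
print(inverse_isoparametric_map(np.array([1.0, 0.9]), X_e))
```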
The expression Ic
According to various embodiments, training the surrogate model may comprise, at 113, applying and/or training a surrogate model to a plurality of finite elements of the one or more finite elements.
According to various embodiments, training the surrogate model may comprise, at 114, constructing a training case. According to various embodiments, each training case may correspond to one or more ingredients necessary to establish a finite element problem. According to various embodiments, the one or more ingredients may comprise a finite element mesh of a chosen geometry; one or more constitutive models to describe one or more substance behaviors; a set of boundary conditions; a set of source terms; and/or a set of body forces, among other suitable ingredients.
A backward computation process can be used during training. Further, training the surrogate model can occur at certain intervals. Each interval can be an epoch. For example, backward propagation of errors, also known as backpropagation, can be used to update variables that control learning during neural network training. The variables that control learning can be hyperparameters. During backpropagation, neural network training may further utilize the tangent stiffness matrix, KT. For linear problems, the tangent stiffness matrix, KT, may be computed once during a training run. For non-linear problems, the tangent stiffness matrix, KT, may be computed at each of the training epochs. In alternative embodiments, the tangent stiffness matrix, KT, may be recomputed for a suitable number of the epochs.
During backpropagation, derivatives of the loss function (e.g., Equation 1) may be computed with respect to the plurality of solution vectors, d, for the neural network. For the finite element PINN (FE-PINN), the derivatives may have the form of Equation 12:
where R is a residual vector, which may have the form: R=P(d)−F; and KT is the tangent stiffness matrix, which may have the form:
For linear problems, the stiffness matrix, K, may be the tangent stiffness matrix, KT (i.e., K=KT).
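As a hedged sketch of this backward computation for a linear problem, the snippet below assumes the loss is the squared residual norm, so that the derivative with respect to d takes the form 2·KTᵀR with KT = K; the exact Equations 1 and 12 are not reproduced here, and the finite-difference check is included only to verify the hand-coded gradient.

```python
import numpy as np

def loss_and_gradient(d, K, F):
    """Sketch of the backward computation for a linear problem, assuming
    L = ||R||^2 with R = K d - F. Then dL/dd = 2 K_T^T R with K_T = K.
    (The exact forms of Equations 1 and 12 are assumptions here.)"""
    R = K @ d - F                 # residual
    K_T = K                       # for linear problems, K_T = K
    loss = float(R @ R)
    grad = 2.0 * K_T.T @ R        # derivative of the loss w.r.t. the solution vector d
    return loss, grad

# Finite-difference check of the hand-coded gradient (hypothetical system).
K = np.array([[2.0, -1.0], [-1.0, 2.0]])
F = np.array([1.0, 0.0])
d = np.array([0.3, -0.2])
loss, grad = loss_and_gradient(d, K, F)
eps = 1e-6
fd = [(loss_and_gradient(d + eps * e, K, F)[0] - loss) / eps for e in np.eye(2)]
print(grad, fd)   # the two should agree to roughly 1e-5
```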
According to various embodiments, predictions for new cases which the finite element PINN (FE-PINN) was not trained on may be evaluated to determine a predictive accuracy of the FE-PINN. According to various embodiments, when performance is not satisfactory (e.g., when prediction errors are too large), then the training set may be augmented and training repeated. According to various embodiments, the FE-PINN code may be configured to load training cases into memory and then may be configured to train the neural network until a training error is satisfactory.
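The following sketch reduces that workflow to its simplest runnable form: gradient descent on the physics loss is repeated until the training error falls below a threshold. Here the "network" is collapsed to the solution vector d itself, which is an assumption made for brevity; an actual FE-PINN would instead update network weights by backpropagation as described above.

```python
import numpy as np

def train_until_satisfactory(K, F, lr=0.1, tol=1e-8, max_epochs=10_000):
    """Sketch of the training workflow described above: repeat gradient
    steps on the physics loss ||K d - F||^2 until the training error is
    satisfactory. The 'network output' is just the vector d here."""
    d = np.zeros_like(F)
    loss = float("inf")
    for epoch in range(max_epochs):
        R = K @ d - F
        loss = float(R @ R)
        if loss < tol:                   # training error satisfactory
            return d, epoch, loss
        d -= lr * 2.0 * K.T @ R          # gradient step, dL/dd = 2 K^T R
    return d, max_epochs, loss

K = np.array([[2.0, -1.0], [-1.0, 2.0]])
F = np.array([1.0, 0.0])
d, epochs, loss = train_until_satisfactory(K, F)
print(d, np.linalg.solve(K, F))          # should closely match the FEM solution
```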
This approach does not require generation of training data, decreasing the computing power and memory storage required to solve physical problems and thus improving upon existing technologies. This approach is further configured to efficiently leverage existing computational and modeling infrastructure when training neural-network-based surrogate models. According to various embodiments, a major advantage of using FEM to train a PINN is that boundary condition enforcement is established automatically using basic FEM approaches. For essential boundary conditions, the associated nodal degrees of freedom may be removed from the system of equations and replaced by a set of conjugate forces. For natural boundary conditions, additional forces may be included in the force vector. According to various embodiments, no additional approaches need to be developed, and no additional loss terms are necessary, as would be the case with weak boundary condition enforcement.
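A brief sketch of this standard FEM treatment of essential boundary conditions follows: prescribed degrees of freedom are removed from the system and their contribution is moved to the right-hand side, while natural boundary conditions simply add entries to the force vector F. The stiffness matrix and loads are hypothetical.

```python
import numpy as np

def apply_essential_bcs(K, F, fixed_dofs, fixed_values):
    """Sketch of standard FEM essential boundary condition enforcement:
    prescribed degrees of freedom are removed from the equations and their
    contribution is moved to the right-hand side, so no extra loss terms
    are needed in the FE-PINN."""
    all_dofs = np.arange(K.shape[0])
    free = np.setdiff1d(all_dofs, fixed_dofs)
    d_fixed = np.asarray(fixed_values, dtype=float)
    # Reduced system: K_ff d_f = F_f - K_fc d_c
    K_ff = K[np.ix_(free, free)]
    F_f = F[free] - K[np.ix_(free, fixed_dofs)] @ d_fixed
    return K_ff, F_f, free

# Hypothetical 3-DOF bar with the first DOF fixed to zero and a unit load
# (a natural boundary condition) added to the last entry of F.
K = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
F = np.array([0.0, 0.0, 1.0])
K_ff, F_f, free = apply_essential_bcs(K, F, fixed_dofs=np.array([0]), fixed_values=[0.0])
d_free = np.linalg.solve(K_ff, F_f)
print(dict(zip(free.tolist(), d_free)))   # displacements of the free DOFs
```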
According to various embodiments, this approach may be configured to enable training a PINN that allows for variable geometry and boundary conditions, providing a generalized PINN which, after proper training, can provide a solution for a problem of interest. Problem geometry may be specified by providing all nodal coordinates as input to a network. Similarly, boundary conditions may be specified by providing nodal values as an input.
For example, using this approach, a wedged block geometry with a varying wedge angle, loaded by a vertical displacement, may be simulated.
As shown in
Referring now to
The hardware architecture of
Some or all components of the computing device 400 can be implemented as hardware, software and/or a combination of hardware and software. The hardware includes, but is not limited to, one or more electronic circuits. The electronic circuits can comprise, but are not limited to, passive components (e.g., resistors and capacitors) and/or active components (e.g., amplifiers and/or microprocessors). The passive and/or active components may be adapted to, arranged to and/or programmed to perform one or more of the methodologies, procedures, or functions described herein.
As shown in
At least some of the hardware entities 414 may be configured to perform actions involving access to and use of memory 412, which can be a Random Access Memory (RAM), a disk drive and/or a Compact Disc Read Only Memory (CD-ROM), among other suitable memory types. Hardware entities 414 can include a disk drive unit 416 comprising a computer-readable storage medium 418 on which is stored one or more sets of instructions 420 (e.g., programming instructions such as, but not limited to, software code) configured to implement one or more of the methodologies, procedures, or functions described herein. The instructions 420 can also reside, completely or at least partially, within the memory 412 and/or within the CPU 406 during execution thereof by the computing device 400.
The memory 412 and the CPU 406 also can constitute machine-readable media. The term “machine-readable media”, as used here, refers to a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions 420. The term “machine-readable media”, as used here, also refers to any medium that is capable of storing, encoding or carrying a set of instructions 420 for execution by the computing device 400 and that causes the computing device 400 to perform any one or more of the methodologies of the present disclosure. According to various embodiments, one or more computer applications 424 may be stored on the memory 412.
The features and functions described above, as well as alternatives, may be combined into many other different systems or applications. Various alternatives, modifications, variations or improvements may be made by those skilled in the art, each of which is also intended to be encompassed by the disclosed embodiments.
Therefore, the foregoing is considered as illustrative only of the principles of the invention. Further, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation shown and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.
This application is a continuation-in-part of and claims priority under 35 U.S.C. § 120 to International Application No. PCT/US2024/027843, filed May 3, 2024, and published in the English language, which claims the benefit of U.S. Provisional Application No. 63/500,438, filed May 5, 2023. The entire contents of the foregoing applications are incorporated by reference herein.
This invention was made with government support under grant number 2237039 awarded by the National Science Foundation. The government has certain rights in the invention.
| Number | Date | Country |
| --- | --- | --- |
| 63/500,438 (provisional) | May 2023 | US |

| Number | Date | Country |
| --- | --- | --- |
| PCT/US2024/027843 (parent) | May 2024 | WO |
| 19/072,424 (child) | | US |