The present disclosure relates to systems and methods for training physics-informed neural network (PINN) surrogate models to model physical problems.
Workhorse convolution operators used in convolutional neural networks assume a uniform, pixelated grid (e.g., an image). Such a grid provides a poor representation of irregular domains, leading to non-conformal representations at the boundaries. Finite element meshes, however, are conformal to boundaries, but elements have irregular shapes.
If the convolution operator is not rotationally equivariant, then any arbitrary rotation of the input can yield inaccurate results. This is an undesirable consequence that limits potential application of the technology. The rotational invariance of the laws of physics demands that solutions to physical problems satisfy rotational equivariance.
The present disclosure describes systems and methods related to the above.
In one aspect of the present disclosure, a method for training physics-informed neural network (PINN) surrogate models to model physical problems is provided. The method may include coupling, using a processor, one or more convolutional neural networks (CNNs) with the finite element method (FEM). The coupling may include calculating, for a finite element mesh comprising a plurality of finite elements, an internal force vector, P, a force vector, F, and a tangent stiffness matrix, KT, using the FEM. Each finite element may include one or more finite element nodes. The coupling may further include applying a CNN to the finite element mesh to obtain a solution to a physical problem. The CNN may be trained on a loss function, and the loss function may incorporate the internal force vector, P, and the force vector, F.
In another aspect of the present disclosure, the plurality of finite elements may represent a spatiotemporal variation of one or more physical quantities into which the physical problem can be divided.
In another aspect of the present disclosure, the one or more physical quantities may include one or more of the following: one or more quantities pertaining to solid mechanics, fluid mechanics, electromagnetic radiation, and heat transfer.
In another aspect of the present disclosure, the loss function may include:
where d denotes a solution vector of the physical problem.
In another aspect of the present disclosure, when the physical problem includes a linear physical problem, the internal force vector, P, may include: P(d)=Kd, where K is a stiffness matrix.
In another aspect of the present disclosure, when the physical problem includes a non-linear physical problem, the applying the CNN to each finite element node may include iteratively applying a CNN for each of a plurality of solution vectors, d.
In another aspect of the present disclosure, backpropagation may be used to update hyperparameters by computing derivatives of the loss function with respect to a plurality of solution vectors, d.
In another aspect of the present disclosure, the derivatives of the loss function with respect to the plurality of solution vectors may include:
where:
R is a residual vector including: R=P(d)−F; and
KT is a tangent stiffness matrix including:
In another aspect of the present disclosure, the applying the CNN may include applying a convolutional operator, using a stencil tensor, S, to perform one or more convolutions.
In another aspect of the present disclosure, the stencil tensor, S, may be a collection of points defined based on Gaussian quadrature, and one or more convolutions of the CNN may be enabled by placing the stencil tensor over top of a finite element node, of one or more finite element nodes, and then evaluating a field at each of one or more stencil points using an inverse isoparametric map.
In another aspect of the present disclosure, the convolution operator may be substantially rotationally equivariant.
In another aspect of the present disclosure, the convolution operator may include:
where:
and {tilde over (ϕ)}α=ϕ(ξα+1);
In another aspect of the present disclosure, the radial weight function may be expanded in a set of basis functions, the basis functions may include:
where:
In another aspect of the present disclosure, the set of basis functions may include cosines and Chebyshev polynomials, the set of basis functions may include:
where:
In another aspect of the present disclosure, the input field may be specified using a finite element approximation, the input field may include:
where:
In another aspect of the present disclosure, the convolution operator may be a numerical approximation of an analytical expression, the analytical expression may include:
where ra is a radial distance from the node a and includes: ra=∥x−xa∥.
In another aspect of the present disclosure, an isoparametric mapping function may include:
where:
In another aspect of the present disclosure, the calculating, for each finite element, the internal force vector, P, the force vector, F, and the tangent stiffness matrix, KT, using the FEM, may include training a surrogate model to a plurality of finite elements of the one or more finite elements.
In another aspect of the present disclosure, the calculating, for the finite element mesh, the internal force vector, P, the force vector, F, and the tangent stiffness matrix, KT, using the FEM, may include constructing a training case, and each training case may correspond to one or more ingredients necessary to establish a finite element problem.
In another aspect of the present disclosure, the one or more ingredients may include one or more of the following: a finite element mesh of a chosen geometry; one or more constitutive models to describe one or more substance behaviors; a set of boundary conditions; a set of source terms; and a set of body forces.
In another aspect of the present disclosure, a system for training PINN surrogate models to model physical problems is provided. The system may include a computing device, further including a processor and a memory configured to store programming instructions. The programming instructions, when executed by the processor, may be configured to cause the processor to couple one or more CNNs with the FEM. The coupling may include calculating, for a finite element mesh comprising a plurality of finite elements, an internal force vector, P, a force vector, F, and a tangent stiffness matrix, KT, using the FEM. Each finite element may include one or more finite element nodes. The coupling may further comprise applying a CNN to the finite element mesh, to obtain a solution to a physical problem. The CNN may be trained on a loss function, and the loss function may incorporate the internal force vector, P, and the force vector, F.
The following drawings are illustrative of particular embodiments of the present disclosure and therefore do not limit the scope of the present disclosure. Embodiments of the present disclosure will hereinafter be described in conjunction with the appended drawings, wherein like numerals denote like elements.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. These terms are merely intended to distinguish one component from another component, and the terms do not limit the nature, sequence or order of the constituent components. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Throughout the specification, unless explicitly described to the contrary, the word “comprise” and variations such as “comprises” or “comprising” will be understood to imply the inclusion of stated elements but not the exclusion of any other elements. In addition, the terms “unit”, “-er”, “-or”, and “module” described in the specification mean units for processing at least one function and operation and can be implemented by hardware components or software components and combinations thereof.
In this document, when terms such as “first” and “second” are used to modify a noun, such use is simply intended to distinguish one item from another and is not intended to require a sequential order unless specifically stated. In addition, terms of relative position such as “vertical” and “horizontal”, or “front” and “rear”, when used, are intended to be relative to each other and need not be absolute and only refer to one possible position of the device associated with those terms depending on the device's orientation.
An “electronic device” or a “computing device” refers to a device that includes a processor and memory. Each device may have its own processor and/or memory, or the processor and/or memory may be shared with other devices as in a virtual machine or container arrangement. The memory may contain or receive programming instructions that, when executed by the processor, cause the electronic device to perform one or more operations according to the programming instructions.
The terms “memory,” “memory device,” “computer-readable storage medium,” “data store,” “data storage facility” and the like each refer to a non-transitory device on which computer-readable data, programming instructions or both are stored. Except where specifically stated otherwise, the terms “memory,” “memory device,” “computer-readable storage medium,” “data store,” “data storage facility” and the like are intended to include single device embodiments, embodiments in which multiple memory devices together or collectively store a set of data or instructions, as well as individual sectors within such devices.
The terms “processor” and “processing device” refer to a hardware component of an electronic device that is configured to execute programming instructions. Except where specifically stated otherwise, the singular term “processor” or “processing device” is intended to include both single-processing device embodiments and embodiments in which multiple processing devices together or collectively perform a process.
The term “module” refers to a set of computer-readable programming instructions, as executed by a processor, which cause the processor to perform a specified function.
Although an exemplary embodiment is described as using a plurality of units to perform the exemplary process, it is understood that the exemplary processes may also be performed by one or a plurality of modules. Additionally, it is understood that the term controller/control unit refers to a hardware device that includes a memory and a processor and is specifically programmed to execute the processes described herein. The memory is configured to store the modules, and the processor is specifically configured to execute said modules to perform one or more processes which are described further below.
Further, the control logic of the present disclosure may be embodied as non-transitory computer readable media on a computer readable medium containing executable programming instructions executed by a processor, controller, or the like. Examples of computer readable media include, but are not limited to, ROM, RAM, compact disc (CD)-ROMs, magnetic tapes, floppy disks, flash drives, smart cards and optical data storage devices. The computer readable medium can also be distributed in network-coupled computer systems so that the computer readable media may be stored and executed in a distributed fashion, such as, e.g., by a telematics server or a Controller Area Network (CAN).
Unless specifically stated or obvious from context, as used herein, the term “about” is understood as within a range of normal tolerance in the art, for example, within 2 standard deviations of the mean. About can be understood as within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the stated value.
With respect to the above description then, it is to be realized that the optimum dimensional relationships for the parts of the invention, to include variations in size, materials, shape, form, function and manner of operation, assembly and use, are deemed readily apparent and obvious to one skilled in the art, and all equivalent relationships to those illustrated in the drawings and described in the specification are intended to be encompassed by the present invention.
The inventive concepts are described with reference to the attached figures, wherein like reference numerals represent like parts and assemblies throughout the several views. Several aspects of the inventive concepts are described below with reference to example applications for illustration. It should be understood that numerous specific details, relationships, and methods are set forth to provide a full understanding of the inventive concepts. One having ordinary skill in the relevant art, however, will readily recognize that the inventive concepts can be practiced without one or more of the specific details or with other methods. In other instances, well-known structures or operations are not shown in detail to avoid obscuring the inventive concepts.
Hereinafter, systems and methods for training physics-informed neural network (PINN) surrogate models to model physical problems, according to embodiments of the present disclosure, will be described with reference to the accompanying drawings.
Referring to
At 105, one or more convolutional neural networks (CNNs) may be coupled with a finite element method (FEM). According to various embodiments, the coupling may be performed using a computing device (e.g., computing device 400 as shown in
According to various embodiments, the plurality of finite elements may represent a spatiotemporal variation of one or more physical quantities into which the physical problem can be divided. According to various embodiments, the one or more physical quantities may comprise one or more of the following: one or more quantities pertaining to solid mechanics, fluid mechanics, electromagnetic radiation, and heat transfer. According to various embodiments, d may denote a nodal solution vector. According to various embodiments, d may be a nodal solution vector for whatever field is of interest (e.g., temperature, displacement, etc.). This leads to a “physics-informed” neural network since the network is configured to “learn” one or more governing equations rather than “learning” one or more chosen solutions to the equations. This approach may be implemented for industries including, but not limited to, automotive, aviation, aerospace, defense, energy, oil & gas, and structural engineering industries.
The coupling may further comprise, at 115, applying a CNN to a finite element mesh to obtain a solution to a physical problem. According to various embodiments, the CNN may be trained on a loss function, and the loss function may incorporate the internal force vector, P, and the force vector, F. According to various embodiments, the loss function takes the form of Equation 1, where d denotes a solution vector of the physical problem.
According to various embodiments, the physical problem may comprise a linear physical problem, a non-linear physical problem, and/or a combination of one or more linear physical problems and/or non-linear physical problems. For example, the physical problem may comprise a linear physical problem. According to various embodiments, when the physical problem comprises a linear physical problem, the internal force vector, P, may be computed according to Equation 2.
where K is a stiffness matrix.
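As an illustrative, non-limiting sketch of how a loss of this general character might be evaluated for a linear problem, the following Python snippet assumes the loss is the squared norm of the residual R = P(d) − F with P(d) = Kd; the exact form of Equation 1 is not reproduced here, and all numerical values are hypothetical.

```python
import numpy as np

def internal_force_linear(K, d):
    """Internal force for a linear problem: P(d) = K d (cf. Equation 2)."""
    return K @ d

def fe_pinn_loss(d, K, F):
    """One plausible form of the physics-informed loss: the squared norm of
    the FEM residual R = P(d) - F. Equation 1 itself is not reproduced in
    this text; this form is an illustrative assumption."""
    R = internal_force_linear(K, d) - F
    return float(R @ R)

# Tiny illustration on a 3-DOF system (hypothetical numbers).
K = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
F = np.array([0.0, 0.0, 1.0])
d_exact = np.linalg.solve(K, F)
print(fe_pinn_loss(d_exact, K, F))      # ~0: the exact solution zeroes the residual
print(fe_pinn_loss(np.zeros(3), K, F))  # > 0 for a non-solution
```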
According to various embodiments, when the physical problem comprises a non-linear physical problem, the applying the CNN to a finite element node may comprise iteratively applying a CNN for each of a plurality of solution vectors, d.
According to various embodiments, the applying the CNN may comprise, at 117, applying a convolutional operator, using a stencil tensor, S, (e.g., a non-uniform grid that is yet to be defined) to perform one or more convolutions. The stencil tensor, S, may be a regular N×N grid with spacing, Δ, as shown, e.g., in
One or more convolutions of the CNN may be enabled by placing the stencil tensor, S, over top of a finite element node, of one or more finite element nodes, and then evaluating a field at each of the one or more stencil points using an inverse isoparametric map. According to various embodiments, the convolution operator may take the form of Equation 3.
where Oi is an output of a convolution for node i, xi is a position vector of node i, Wm,n is a weight tensor for the convolution for node i, Pm,n is an additional parameter tensor used in convolution for node i, I(x) is an evaluation of the input field at position x, and α=(N−1)/2.
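The following Python sketch illustrates one plausible reading of such a nodal stencil convolution: an N×N stencil with spacing Δ is centered on a node, the input field is sampled at each stencil point (here through a generic callable standing in for the inverse-isoparametric-map evaluation), and the samples are contracted with the weight tensor Wm,n. The additional parameter tensor Pm,n is omitted because Equation 3 is not reproduced here, and all names and values are hypothetical.

```python
import numpy as np

def stencil_convolution(node_xy, field_at, W, delta):
    """Sketch of a nodal stencil convolution (cf. Equation 3).

    node_xy : (2,) position of node i
    field_at: callable x -> scalar, evaluating the input field I(x); in an
              FE-PINN this lookup would use the inverse isoparametric map
    W       : (N, N) weight tensor for the convolution
    delta   : stencil spacing
    """
    N = W.shape[0]
    a = (N - 1) // 2          # half-width, alpha = (N - 1)/2
    out = 0.0
    for m in range(-a, a + 1):
        for n in range(-a, a + 1):
            x = node_xy + delta * np.array([m, n])   # stencil point location
            out += W[m + a, n + a] * field_at(x)
    return out

# Hypothetical usage: a 3x3 averaging stencil applied to a smooth field.
W = np.full((3, 3), 1.0 / 9.0)
print(stencil_convolution(np.array([0.5, 0.5]),
                          lambda x: x[0] ** 2 + x[1] ** 2, W, delta=0.1))
```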
In alternate embodiments, one or more convolutions of the CNN may be intrinsically equivariant with respect to rotations in three-dimensional space. For example, an input field to a particular convolutional neuron may be represented by Icm(l)(x), where l is a rotation order, c is an input channel index, and m is a representation index. The representation index can span an inclusive range of values from −l to l. Values for the rotation order and the representation index can be assigned depending on the field being characterized by the input field. For example, the rotation order and the representation index are both zero if the input field is a scalar field. If, however, the input field is a vector field, then the rotation order is set to unity and the representation index is assigned values of m=−1, m=0, and m=1 to denote three components of a vector. The input field can be specified using a finite element mesh and expressed using a finite element approximation and may therefore take the form of Equation 4.
where dcmn(l) is a nodal value for the field Icm(l) at a node n, and Ñne(⋅) is a shape function evaluated in the physical domain for the node n contained in an element e. Summations are carried out over all of the elements and all of the nodes contained in each element.
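As a sketch of such a finite element approximation, the snippet below interpolates a nodal field inside a single 4-node bilinear quadrilateral using its parent-domain shape functions; the element type and nodal values are assumptions chosen only for illustration.

```python
import numpy as np

def shape_functions_quad4(xi, eta):
    """Bilinear shape functions for a 4-node quadrilateral in the parent
    domain, ordered counter-clockwise from (-1, -1)."""
    return 0.25 * np.array([(1 - xi) * (1 - eta),
                            (1 + xi) * (1 - eta),
                            (1 + xi) * (1 + eta),
                            (1 - xi) * (1 + eta)])

def interpolate_field(nodal_values, xi, eta):
    """FE approximation of a field inside one element (cf. Equation 4):
    I(xi, eta) = sum_n d_n N_n(xi, eta)."""
    return shape_functions_quad4(xi, eta) @ nodal_values

# Hypothetical nodal temperatures on one element; value at the centroid.
d_e = np.array([0.0, 1.0, 2.0, 1.0])
print(interpolate_field(d_e, 0.0, 0.0))  # 1.0, the average of the corner values
```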
A convolution of the CNN can be expressed as an integral and may take the form of Equation 5.
where Oa is an output at node a, xa is a position of node a, Na is a neighborhood of node a used for the convolution, and W(⋅) is a convolutional weight function. To support rotational equivariance, Equation 5 can be limited to a spherical neighborhood of node a with a cut-off radius of rc. Equation 5 can further be expressed in spherical coordinates, which may take the form of Equation 6.
where a radial distance from node a can be expressed as ra=∥x−xa∥.
The convolutional weight function W(⋅) of Equation 6 can be constructed to be rotationally equivariant. For example, a tensor field networks approach can be used to decompose the convolutional weight function W(⋅). Further, the convolutional weight function W(⋅) can be decomposed into a product of a radial weight function and a spherical harmonics component. In some embodiments, the radial weight function may be expressed as Rc
where gp(r) is a p-th radial basis function, r is a radius position, and αc
where Tp is the p-th Chebyshev polynomial of the first kind and A is a parameter. In alternative embodiments, the radial weight function, Rc
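A minimal sketch of a Chebyshev-based radial expansion of the kind described above follows; the rescaling of the radius onto [−1, 1] via the parameter A and the specific coefficients are assumptions, since Equations 7 and 8 are not reproduced here.

```python
import numpy as np

def chebyshev_radial_basis(r, p, A):
    """Sketch of a p-th radial basis function built from a Chebyshev
    polynomial of the first kind, T_p. Mapping the radius r onto [-1, 1]
    via the parameter A is an assumption for illustration."""
    t = np.clip(2.0 * r / A - 1.0, -1.0, 1.0)   # rescale [0, A] -> [-1, 1]
    return np.cos(p * np.arccos(t))             # T_p(t) = cos(p * arccos(t))

def radial_weight(r, alphas, A):
    """Radial weight R(r) = sum_p alpha_p g_p(r) (cf. Equation 7), with
    learnable expansion coefficients alpha_p."""
    return sum(a_p * chebyshev_radial_basis(r, p, A)
               for p, a_p in enumerate(alphas))

# Hypothetical coefficients evaluated at one radius.
print(radial_weight(0.3, alphas=[0.5, -0.2, 0.1], A=1.0))
```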
To further promote rotational equivariance, the output of the convolution may be a representation in a three-dimensional rotation group. The three-dimensional rotation group (e.g., SO(3)) can be the group of all rotations about the origin of three-dimensional Euclidean space under the operation of composition. Further, Clebsch-Gordan coefficients may be used to represent the output from the convolution in SO(3). For example, the Clebsch-Gordan coefficients may be expressed by C(l
A rotationally equivariant form of Equation 6 can incorporate the expanded radial weight function of Equation 7, the spherical harmonics component, the decomposition of Equation 8, and the Clebsch-Gordan coefficients. For example, the rotationally equivariant form of Equation 6 may take the form of Equation 9.
where Oac
Numerical solution techniques can be used to evaluate the analytical expression of Equation 9. For example, Gaussian quadrature can be used to express Equation 9 as an approximate summation. The approximate summation can represent a substantially rotationally equivariant form of the convolution of the CNN, which may take the form of Equation 10.
where ngp,γ, ngp,β, and ngp,α are the numbers of Gauss points associated with the spherical coordinates {tilde over (r)}a,γ, {tilde over (θ)}β, and {tilde over (ϕ)}α, respectively; the spherical coordinates can be represented by:
and {tilde over (ϕ)}α=ϕ(ξα+1); wα, wβ, and wγ are Gauss weights; and Ic
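As an illustration of the quadrature idea behind such an approximate summation, the sketch below integrates a function over a spherical neighborhood of radius rc using tensor-product Gauss-Legendre points in (r, θ, φ); the coordinate mappings and point counts are assumptions rather than the disclosure's exact Equation 10.

```python
import numpy as np

def spherical_gauss_quadrature(f, r_c, n_r=4, n_theta=4, n_phi=8):
    """Sketch of approximating the integral of f(r, theta, phi) over a
    sphere of radius r_c (with the r^2 sin(theta) volume element) using
    tensor-product Gauss-Legendre quadrature (cf. Equation 10)."""
    xr, wr = np.polynomial.legendre.leggauss(n_r)
    xt, wt = np.polynomial.legendre.leggauss(n_theta)
    xp, wp = np.polynomial.legendre.leggauss(n_phi)
    total = 0.0
    for xg, wg in zip(xr, wr):
        r = 0.5 * r_c * (xg + 1.0)                 # map [-1, 1] -> [0, r_c]
        for xb, wb in zip(xt, wt):
            theta = 0.5 * np.pi * (xb + 1.0)       # map [-1, 1] -> [0, pi]
            for xa, wa in zip(xp, wp):
                phi = np.pi * (xa + 1.0)           # map [-1, 1] -> [0, 2*pi]
                jac = (0.5 * r_c) * (0.5 * np.pi) * np.pi
                total += wg * wb * wa * jac * f(r, theta, phi) * r**2 * np.sin(theta)
    return total

# Check: integrating 1 over the ball recovers its volume, 4/3 * pi * r_c^3.
print(spherical_gauss_quadrature(lambda r, t, p: 1.0, r_c=1.0),
      4.0 / 3.0 * np.pi)
```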
The finite element method can use elemental shape functions. The elemental shape functions can be defined in a parent domain of each element, where the element can have a regular geometry. Alternatively, the elemental shape functions can be defined in the physical domain of a problem to be solved, where each element may have an irregular geometry. A mapping function can translate between the parent domain and the physical domain. For example, an isoparametric mapping function can have the form of Equation 11.
where Xin is the i-th coordinate of node n in the physical domain and Nne(ξi) is an elemental shape function for node n and element e evaluated at the parent domain coordinate ξi. The relationship between the parent domain and the physical domain can therefore be represented by Ñne(xi)=Nne(ξi(xi)). An inverse isoparametric function can be used to translate from the physical domain to the parent domain. For example, ξi(xi) can represent the inverse isoparametric mapping function (i.e., an inverse isoparametric map).
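A minimal sketch of such an inverse isoparametric map for a 4-node quadrilateral follows, using Newton iteration to recover the parent-domain coordinate of a physical point; the element type, starting guess, and tolerances are assumptions made for illustration.

```python
import numpy as np

def quad4_shape(xi):
    """Bilinear shape functions and their parent-domain gradients for a
    4-node quadrilateral."""
    s, t = xi
    N = 0.25 * np.array([(1 - s) * (1 - t), (1 + s) * (1 - t),
                         (1 + s) * (1 + t), (1 - s) * (1 + t)])
    dN = 0.25 * np.array([[-(1 - t), -(1 - s)], [ (1 - t), -(1 + s)],
                          [ (1 + t),  (1 + s)], [-(1 + t),  (1 - s)]])
    return N, dN

def inverse_isoparametric_map(x, X_e, tol=1e-10, max_iter=25):
    """Sketch of inverting the isoparametric map x(xi) = sum_n X_n N_n(xi)
    (cf. Equation 11) with Newton iteration, returning the parent-domain
    coordinate xi of a physical point x inside the element with nodes X_e."""
    xi = np.zeros(2)                      # start at the element centroid
    for _ in range(max_iter):
        N, dN = quad4_shape(xi)
        residual = X_e.T @ N - x          # x(xi) - x_target
        if np.linalg.norm(residual) < tol:
            break
        J = X_e.T @ dN                    # Jacobian d x / d xi
        xi -= np.linalg.solve(J, residual)
    return xi

# Hypothetical distorted quadrilateral; recover the parent coordinate of a point.
X_e = np.array([[0.0, 0.0], [2.0, 0.2], [2.2, 1.8], [0.1, 1.5]])
print(inverse_isoparametric_map(np.array([1.0, 0.9]), X_e))
```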
The expression Ic
According to various embodiments, training the surrogate model may comprise, at 113, applying and/or training a surrogate model to a plurality of finite elements of the one or more finite elements.
According to various embodiments, training the surrogate model may comprise, at 114, constructing a training case. According to various embodiments, each training case may correspond to one or more ingredients necessary to establish a finite element problem. According to various embodiments, the one or more ingredients may comprise a finite element mesh of a chosen geometry; one or more constitutive models to describe one or more substance behaviors; a set of boundary conditions; a set of source terms; and/or a set of body forces, among other suitable ingredients.
A backward computation process can be used during training. Further, training the surrogate model can occur at certain intervals. Each interval can be an epoch. For example, backward propagation of errors, also known as backpropagation, can be used to update variables that control learning during neural network training. The variables that control learning can be hyperparameters. During backpropagation, neural network training may further utilize the tangent stiffness matrix, KT. For linear problems, the tangent stiffness matrix, KT, may be computed once during a training run. For non-linear problems, the tangent stiffness matrix, KT, may be computed at each of the training epochs. In alternative embodiments, the tangent stiffness matrix, KT, may be recomputed for a suitable number of the epochs.
During backpropagation, derivatives of the loss function (e.g., Equation 1) may be computed with respect to the plurality of solution vectors, d, for the neural network. For the finite element PINN (FE-PINN), the derivatives may have the form of Equation 12:
where R is a residual vector, which may have the form: R=P(d)−F; and KT is the tangent stiffness matrix, which may have the form:
For linear problems, the stiffness matrix, K, may be the tangent stiffness matrix, KT (i.e., K=KT).
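As a hedged sketch of this backward computation for a linear problem, the snippet below assumes the loss is the squared residual norm, so that the derivative with respect to d takes the form 2·KTᵀR with KT = K; the exact Equations 1 and 12 are not reproduced here, and the finite-difference check is included only to verify the hand-coded gradient.

```python
import numpy as np

def loss_and_gradient(d, K, F):
    """Sketch of the backward computation for a linear problem, assuming
    L = ||R||^2 with R = K d - F. Then dL/dd = 2 K_T^T R with K_T = K.
    (The exact forms of Equations 1 and 12 are assumptions here.)"""
    R = K @ d - F                 # residual
    K_T = K                       # for linear problems, K_T = K
    loss = float(R @ R)
    grad = 2.0 * K_T.T @ R        # derivative of the loss w.r.t. the solution vector d
    return loss, grad

# Finite-difference check of the hand-coded gradient (hypothetical system).
K = np.array([[2.0, -1.0], [-1.0, 2.0]])
F = np.array([1.0, 0.0])
d = np.array([0.3, -0.2])
loss, grad = loss_and_gradient(d, K, F)
eps = 1e-6
fd = [(loss_and_gradient(d + eps * e, K, F)[0] - loss) / eps for e in np.eye(2)]
print(grad, fd)   # the two should agree to roughly 1e-5
```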
According to various embodiments, predictions for new cases which the finite element PINN (FE-PINN) was not trained on may be evaluated to determine a predictive accuracy of the FE-PINN. According to various embodiments, when performance is not satisfactory (e.g., when prediction errors are too large), then the training set may be augmented and training repeated. According to various embodiments, the FE-PINN code may be configured to load training cases into memory and then may be configured to train the neural network until a training error is satisfactory.
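The following sketch reduces that workflow to its simplest runnable form: gradient descent on the physics loss is repeated until the training error falls below a threshold. Here the "network" is collapsed to the solution vector d itself, which is an assumption made for brevity; an actual FE-PINN would instead update network weights by backpropagation as described above.

```python
import numpy as np

def train_until_satisfactory(K, F, lr=0.1, tol=1e-8, max_epochs=10_000):
    """Sketch of the training workflow described above: repeat gradient
    steps on the physics loss ||K d - F||^2 until the training error is
    satisfactory. The 'network output' is just the vector d here."""
    d = np.zeros_like(F)
    loss = float("inf")
    for epoch in range(max_epochs):
        R = K @ d - F
        loss = float(R @ R)
        if loss < tol:                   # training error satisfactory
            return d, epoch, loss
        d -= lr * 2.0 * K.T @ R          # gradient step, dL/dd = 2 K^T R
    return d, max_epochs, loss

K = np.array([[2.0, -1.0], [-1.0, 2.0]])
F = np.array([1.0, 0.0])
d, epochs, loss = train_until_satisfactory(K, F)
print(d, np.linalg.solve(K, F))          # should closely match the FEM solution
```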
This approach does not require generation of training data, decreasing the computing power and memory storage required to solve physical problems and thus improving upon existing technologies. This approach is further configured to efficiently leverage existing computational and modeling infrastructure when training neural-network-based surrogate models. According to various embodiments, a major advantage of using FEM to train a PINN is that boundary condition enforcement is established automatically using basic FEM approaches. For essential boundary conditions, the associated nodal degrees of freedom may be removed from the system of equations and replaced by a set of conjugate forces. For natural boundary conditions, additional forces may be included in the force vector. According to various embodiments, no additional approaches need to be developed, and no additional loss terms are necessary, as would be the case with weak boundary condition enforcement.
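A brief sketch of this standard FEM treatment of essential boundary conditions follows: prescribed degrees of freedom are removed from the system and their contribution is moved to the right-hand side, while natural boundary conditions simply add entries to the force vector F. The stiffness matrix and loads are hypothetical.

```python
import numpy as np

def apply_essential_bcs(K, F, fixed_dofs, fixed_values):
    """Sketch of standard FEM essential boundary condition enforcement:
    prescribed degrees of freedom are removed from the equations and their
    contribution is moved to the right-hand side, so no extra loss terms
    are needed in the FE-PINN."""
    all_dofs = np.arange(K.shape[0])
    free = np.setdiff1d(all_dofs, fixed_dofs)
    d_fixed = np.asarray(fixed_values, dtype=float)
    # Reduced system: K_ff d_f = F_f - K_fc d_c
    K_ff = K[np.ix_(free, free)]
    F_f = F[free] - K[np.ix_(free, fixed_dofs)] @ d_fixed
    return K_ff, F_f, free

# Hypothetical 3-DOF bar with the first DOF fixed to zero and a unit load
# (a natural boundary condition) added to the last entry of F.
K = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
F = np.array([0.0, 0.0, 1.0])
K_ff, F_f, free = apply_essential_bcs(K, F, fixed_dofs=np.array([0]), fixed_values=[0.0])
d_free = np.linalg.solve(K_ff, F_f)
print(dict(zip(free.tolist(), d_free)))   # displacements of the free DOFs
```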
According to various embodiments, this approach may be configured to enable training a PINN that allows for variable geometry and boundary conditions, providing a generalized PINN which, after proper training, can provide a solution for a problem of interest. Problem geometry may be specified by providing all nodal coordinates as input to a network. Similarly, boundary conditions may be specified by providing nodal values as an input.
For example, using this approach, a wedged block geometry with a varying wedge angle, loaded by a vertical displacement, may be simulated.
As shown in
Referring now to
The hardware architecture of
Some or all components of the computing device 400 can be implemented as hardware, software and/or a combination of hardware and software. The hardware includes, but is not limited to, one or more electronic circuits. The electronic circuits can comprise, but are not limited to, passive components (e.g., resistors and capacitors) and/or active components (e.g., amplifiers and/or microprocessors). The passive and/or active components may be adapted to, arranged to and/or programmed to perform one or more of the methodologies, procedures, or functions described herein.
As shown in
At least some of the hardware entities 414 may be configured to perform actions involving access to and use of memory 412, which can be a Random Access Memory (RAM), a disk drive and/or a Compact Disc Read Only Memory (CD-ROM), among other suitable memory types. Hardware entities 414 can include a disk drive unit 416 comprising a computer-readable storage medium 418 on which is stored one or more sets of instructions 420 (e.g., programming instructions such as, but not limited to, software code) configured to implement one or more of the methodologies, procedures, or functions described herein. The instructions 420 can also reside, completely or at least partially, within the memory 412 and/or within the CPU 406 during execution thereof by the computing device 400.
The memory 412 and the CPU 406 also can constitute machine-readable media. The term “machine-readable media”, as used here, refers to a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions 420. The term “machine-readable media”, as used here, also refers to any medium that is capable of storing, encoding or carrying a set of instructions 420 for execution by the computing device 400 and that causes the computing device 400 to perform any one or more of the methodologies of the present disclosure. According to various embodiments, one or more computer applications 424 may be stored on the memory 412.
The features and functions described above, as well as alternatives, may be combined into many other different systems or applications. Various alternatives, modifications, variations or improvements may be made by those skilled in the art, each of which is also intended to be encompassed by the disclosed embodiments.
Therefore, the foregoing is considered as illustrative only of the principles of the invention. Further, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation shown and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.
This application is a continuation-in-part of and claims priority under 35 U.S.C. § 120 to International Application No. PCT/US2024/027843, filed May 3, 2024, and published in the English language, which claims the benefit of U.S. Provisional Application No. 63/500,438, filed May 5, 2023. The entire contents of the foregoing applications are incorporated by reference herein.
This invention was made with government support under grant number 2237039 awarded by the National Science Foundation. The government has certain rights in the invention.
| Number | Date | Country |
| --- | --- | --- |
| 63/500,438 (provisional) | May 2023 | US |

| Number | Date | Country |
| --- | --- | --- |
| PCT/US2024/027843 (parent) | May 2024 | WO |
| 19/072,424 (child) | | US |