This invention relates generally to the field of geophysical prospecting and, more particularly, to seismic data processing. Specifically, the invention is a method for faster estimation of a quantity known as Hessian times vector which arises in certain methods for numerical solving of partial differential equations, for example iterative inversion of seismic data to infer elastic properties of a medium, involving forward modeling of synthetic data by solving the wave equation in a model medium.
Full wavefield inversion (FWI) in exploration seismic processing relies on the calculation of the gradient of an objective function with respect to the subsurface model parameters [9]. The gradient of the objective function is used to calculate an update to the model. An objective function E is usually given as an L2 norm as
where p and pb are the measured pressure, i.e. seismic amplitude, and the modeled pressure in the background subsurface model at the receiver location rg for a shot located at rs. In iterative inversion processes, the background medium is typically the medium resulting from the previous inversion cycle. In non-iterative inversion processes or migrations, the background medium is typically derived using conventional seismic processing techniques such as migration velocity analysis. The objective function is integrated over all time t, and the surfaces Sg and Ss that are defined by the spread of the receivers and the shots. We define Kd(r)=K(r)−Kb(r) and ρd(r)=ρ(r)−ρb(r), where K(r) and ρ(r) are the true bulk modulus and density, and Kb(r) and ρb(r) are the bulk modulus and the density of the background model at the subsurface location r. (Bulk modulus is used as an example here, but any of the 21 elastic constants might be used instead.) We also define the difference between the measured and the modeled pressure to be pd(rg,rs;t)=p(rg,rs;t)−pb(rg,rs;t).
The measured pressure p, satisfies the wave equation
where q(t) is the source signature. It can be shown that the gradient of the objective function E with respect to the bulk modulus Kb(r), for example, is given as
where gb is the Green's function in the background medium, and dV is an infinitesimal volume around r [9, 3]. The equations for the gradients of other subsurface medium parameters in general elastic cases can be found in Refs. [10, 6, 1]. One can then perform full wavefield inversion by minimizing the value of the objective function E in an iterative manner, using gradient equations for the medium parameters such as that in Eq. 3.
The convergence rate of full wavefield inversion can be improved when information on the Hessian of the objective function E is employed in the inversion process [7, 5]. The Hessian is a matrix of second partial derivatives of a function. The Hessian (with respect to the physical property bulk modulus) of the objective function E in Eq. 1 is given as
The second term in the right-hand side is the term responsible for multiple scattering, and is often ignored due to difficulty in evaluation [11]. By dropping this second term, one obtains the equation for the Gauss-Newton Hessian,
Once the Hessian matrix is computed, the medium parameter update required for the minimization of E can be obtained by multiplying the inverse of the Hessian matrix and the gradient using the Newton's method [5]:
Direct computation of the inverse of the Hessian matrix, however, often requires prohibitively large memory space in full wavefield inversion, and so the inverse of the Hessian is computed iteratively using the conjugate gradient (CG) method. This iterative scheme is often referred to as Newton-CG method, which may be used on either the full Hessian from equation 4 or the Gauss-Newton Hessian from equation 5. An example of this Newton-CG method can be found in Algorithm 7.1 of Ref [5], which is reproduced below. For notational convenience, we use H for the Hessian matrix, and ∇E for the gradient vector.
The algorithm above requires repeated evaluation of the Hessian times vector H{tilde over (K)}j in a loop, where {tilde over (K)}j for j=0 is set as the negative of the gradient. The value of H{tilde over (K)}j is then used to update zj, which eventually becomes the inverse of the Hessian times gradient Kd in 3.a.ii.1 or 3.e.i.
The Newton-CG method requires evaluation of the Hessian matrix times (i.e. multiplied by) a medium perturbation vector {tilde over (K)}, which may be referred to hereafter in this document as the (Gauss-Newton) “Hessian times vector,”
where the Hessian has been approximated as the Gauss-Newton Hessian. For example, in the estimation of parameters of a physical property model by iterative inversion of geophysical data, the medium perturbation vector will typically initially be the gradient (in model parameter space) of the objective function. As the iterations progress, this vector will gradually diverge from being the gradient, and lose physical meaning as such. Equation (6) may be evaluated by performing two forward-wave-propagation and two reverse-wave-propagation computations. The present invention is a method that allows this equation to be evaluated by performing only one forward-wave-propagation (one forward solve of a partial differential equation, such as the wave equation) and one reverse-wave-propagation computation (i.e., one gradient computation of the objective function), resulting in valuable saving of computer time.
In one embodiment, the invention is a method for determining a discrete physical properties model of a subsurface region, which may be referred to as the “model,” by iteratively inverting measured geophysical data acquired from the subsurface region, comprising using a Hessian matrix of an objective function, then times a vector, called “Hessian times vector,” to determine an update for the model, wherein the Hessian times vector is approximated, using a computer, with a single forward-wave simulated propagation and a single computation of gradient of the objective function, in a modified subsurface model.
The present invention and its advantages will be better understood by referring to the following detailed description and the attached drawings in which:
The invention will be described in connection with example embodiments. However, to the extent that the following detailed description is specific to a particular embodiment or a particular use of the invention, this is intended to be illustrative only, and is not to be construed as limiting the scope of the invention. On the contrary, it is intended to cover all alternatives, modifications and equivalents that may be included within the scope of the invention, as defined by the appended claims. Persons skilled in the technical field will readily recognize that all practical applications of the present inventive method are performed using a computer programmed according to the disclosure herein.
Let
It may be noted that Eq. 7 is the equation for the Born scattered field {tilde over (p)}d, where {tilde over (K)} behaves as a scatterer distribution in the background medium with bulk modulus Kb. The Gauss-Newton Hessian times vector can then be seen as
Therefore, computation of the Gauss-Newton Hessian times vector is equivalent to gradient computation using artificial residual {tilde over (p)}d. As mentioned above, the artificial residual {tilde over (p)}d is computed using the Born approximation in the background model with scatterers {tilde over (K)}. If the wave equation is linear so that the Born approximation is exact, and if {tilde over (K)}=K−Kb=Kd, then the artificial residual {tilde over (p)}d=pd. Therefore, the negative of Hessian times vector above should be equal to the gradient in this case.
It may be noted that the Born scattered field {tilde over (p)}d in Eq. 7 can be approximated as
where {tilde over (p)}F is the solution to the wave equation
If ε in Eq. 10 is sufficiently small, {tilde over (p)}F should be approximately equal to the summation of pb and the Born scattered field due to ò{tilde over (K)}. Since the amplitude of Born scattering is linear with respect to the medium property perturbation of the scatterers, {tilde over (p)}d,F in Eq. 9 should be approximate equal to {tilde over (p)}d in Eq. 7.
The Hessian times vector operation can be approximated as
The equation above shows that one can compute the Hessian times vector by (1) creating a new subsurface model Kb+ε{tilde over (K)}, (2) computing the received field {tilde over (p)}F in the new subsurface model, (3) computing the gradient in the background model Kb by treating {tilde over (p)}F to be a field measurement, and (4) scaling the gradient by 1/ε. Since these operations uses only forward wave propagation and gradient computation, this method eliminates the need to implement the Hessian times vector operators, i.e. eliminates the need to implement a computer program that computes the Hessian times vector. Instead, one can reuse already existing forward propagation and gradient computation routines to obtain the Hessian times vector. Of course, this advantage disappears if one has already implemented the Hessian times vector operation. Furthermore, the operation disclosed herein requires roughly 3.5 wave propagations compared to 4 wave propagations in Eq. 6, and so is computationally more efficient.
One can further improve the convergence to minima by increasing the value of ε in Eq. 10. When the value of ε is increased, the scattered field {tilde over (p)}d,F in Eq. 9 departs from the Born scattered field, and it includes nonlinear effects such as multiple scattering, travel time change, and nonlinear amplitude scaling with respect to {tilde over (K)}. One special case of a large ε value is when ε=1, and {tilde over (K)}=Kd. In this case, the wave equation 10 is the wave equation for the true subsurface medium, and so {tilde over (p)}F=p. The Hessian times vector in Eq. 11 is then exactly equal to the gradient, which is never achieved in Eq. 6 even when {tilde over (K)}=Kd due to the neglected higher order terms in Eq. 6.
For practical purposes, we may define c to be
where ∥Kb(r)∥max and ∥{tilde over (K)}(r)∥max are the maximum absolute values of Kb and {tilde over (K)} in space, respectively. The parameter α then represents approximate fractional change of ò{tilde over (K)}(r) in Eq. 10 with respect to Kb(r). Thus, α represents the ratio of the magnitude of the vector to be added to the model to the magnitude of the model vector, where the model's medium parameters are the components of the model vector in model space. One can then choose the value of α to control the behavior of the Hessian times vector in Eq. 11. When the value of α is fairly small, on the order of 0.01, the Hessian operator in Eq. 11 mimics the behavior of Gauss-Newton Hessian in Eq. 5. When the value of α is relatively large and reaches on the order of 0.1, on the other hand, the Hessain operator in Eq. 11 starts to include the effect of nonlinearity and multiple scattering, and so behaves similar to that in Eq. 4.
While the equation above was derived specifically for the bulk modulus K, this method can be applied for the computation of the Hessian times vector of any general elastic parameters such as density ρ or any of the 21 elastic stiffness constants Cij. Let mb(r) and {tilde over (m)}(r) be the background medium properties and the multiplication vector of any of these elastic parameters. Then the Hessian times vector of any of these properties can be computed as
where {tilde over (p)}d now can be computed using the same method as that in Eqs. 9 and 10. Note this derivation is generally applicable to any type of objective functions such as L2 objective function given in Eq. 1 or cross-correlation objective function in Ref 8. Note also that the method presented here is a special case of PDE (Partial Differential Equation) constrained optimization problems, and so the method is in general applicable to any PDE constrained optimization problems where Hessian times vector needs to be computed. For example, this method can also be applied to problems relating to electro-magnetic wave propagation.
Below is the procedure of the present invention as applied to, for example, determining the model update in a method for physical property parameter estimation by data inversion using the gradient of an objective function, with the steps as shown in the flowchart of
Step 21 Form a multi-component vector whose components are related to the values of a selected physical property at discrete cells in a model of the subsurface according to a current subsurface model 20 of that property (for the present example, the vector would typically be the gradient of the objective function, but in a general application of the inventive method, this would be whatever vector is later to be multiplied by the Hessian);
Step 22 Scale the vector by a small scalar constant (ε), i.e. by multiplying by ε;
Step 23 Add the scaled vector to the current subsurface model;
Step 24 Forward propagate the wavefields in the new model to compute a synthetic seismic dataset;
Step 25 Compute the gradient of the objective function measuring misfit (typically a selected norm of the data residual) between forward modeled data using the current subsurface model and “measured” data, where the synthetic seismic dataset from step 24 is treated as the measured data for purposes of this step;
Step 26 Divide the gradient from step 25 by the scalar constant ε from step 22,
Step 27 Use this scaled gradient as an estimate of the Hessian times vector in optimization routines that require evaluation of that Hessian times vector.
Then, steps 21 to 27 may be repeated iteratively, using a Newton-CG algorithm for example, to obtain the inverse of the Hessian times the gradient of the objective function, which gives the model (medium parameter) update. Here, the objective function measures misfit between the actual measured field data and the data modeled from the current model.
The sum in step 23 is performed, cell-by-cell, in the discrete subsurface property model. For example, for application to model parameter estimation by geophysical data inversion, the “vector” will typically start out being the multi-dimensional gradient of the objective function with respect to each model parameter. Thus, the gradient of the objective function is a vector with as many components as there are cells in the model, times the number of subsurface parameters, such as bulk modulus or density. For each cell, the corresponding component of the vector, after being scaled, is added to the model value, sometimes called parameter, for that cell.
The method of steps 21-27 may be used in the iterative solving of any partial differential equation involving the Hessian times vector operation.
For an illustrative example, we use the Marmousi II model [4] shown in
One can see from
The foregoing application is directed to particular embodiments of the present invention for the purpose of illustrating it. It will be apparent, however, to one skilled in the art, that many modifications and variations to the embodiments described herein are possible. All such modifications and variations are intended to be within the scope of the present invention, as defined in the appended claims.
This application claims the benefit of U.S. Provisional Patent Application 61/564,669, filed Nov. 29, 2011 entitled METHODS FOR APPROXIMATING HESSIAN TIMES VECTOR OPERATION IN FULL WAVEFIELD INVERSION, the entirety of which is incorporated by reference herein.
Number | Date | Country | |
---|---|---|---|
61564669 | Nov 2011 | US |