Not applicable.
The disclosed embodiments relate generally to seismic imaging using techniques for determining subsurface velocities from seismic data and, in particular, to a method of determining subsurface velocities via full waveform inversion using a tree-based Bayesian approach which leads to a reduced number of parameters and basis functions with which to describe subsurface velocity (or other seismic properties), thereby reducing the computational cost.
Seismic exploration involves surveying subterranean geological media for hydrocarbon deposits. A survey typically involves deploying seismic sources and seismic sensors at predetermined locations. The sources generate seismic waves, which propagate into the geological medium creating pressure changes and vibrations. Variations in physical properties of the geological medium give rise to changes in certain properties of the seismic waves, such as their direction of propagation.
Portions of the seismic waves reach the seismic sensors. Some seismic sensors are sensitive to pressure changes (e.g., hydrophones), others to particle motion (e.g., geophones), and industrial surveys may deploy one type of sensor or both. In response to the detected seismic waves, the sensors generate corresponding electrical signals, known as traces, and record them in storage media as seismic data. Seismic data will include a plurality of “shots” (individual instances of the seismic source being activated), each of which is associated with a plurality of traces recorded at the plurality of sensors.
Seismic data is processed to create seismic images that can be interpreted to identify subsurface geologic features including hydrocarbon deposits. This process may include determining the velocities of the subsurface formations in order to perform the imaging. Determining the velocities may be done by a number of methods, such as semblance analysis, tomography, or full waveform inversion. Full waveform inversion (FWI) is a computationally expensive process that requires extensive model parameterization. Some conventional FWI methods assume an optimal parameterization and do not attempt to sample over a variable number of parameters. None uses a tree-based probabilistic approach. A similar idea has been used by Hawkins et al. (2017) for airborne electromagnetic inversion, by Dettmer et al. (2016) to quantify uncertainty for tsunami sea surface displacement, and by Hawkins & Sambridge (2015) for 2D ambient noise and 3D teleseismic tomography. However, these works are based on assumptions that are not valid for seismic data.
Improved seismic images from improved subsurface velocities allow better interpretation of the locations of rock and fluid property changes. The ability to define the location of rock and fluid property changes in the subsurface is crucial to our ability to make the most appropriate choices for purchasing materials, operating safely, and successfully completing projects. Project cost is dependent upon accurate prediction of the position of physical boundaries within the Earth. Decisions include, but are not limited to, budgetary planning, obtaining mineral and lease rights, signing well commitments, permitting rig locations, designing well paths and drilling strategy, preventing subsurface integrity issues by planning proper casing and cementation strategies, and selecting and purchasing appropriate completion and production equipment.
There exists a need for more accurate, cost-efficient FWI methods to allow better seismic imaging that will in turn allow better seismic interpretation of potential hydrocarbon reservoirs for hydrocarbon exploration and production.
In accordance with some embodiments, a method of transdimensional seismic full waveform inversion (FWI) using a tree-based Bayesian approach is disclosed. In this method, the observed seismic data inform the model likelihood. A mildly informative prior about subsurface structure also needs to be specified as input. The resulting posterior model distribution of seismic velocity (or other seismic properties) is then sampled using a trans-dimensional or Reversible Jump Markov chain Monte Carlo (RJ-McMC) method. Sampling is carried out in the wavelet transform domain of the seismic properties of interest, using a tree-based structure to represent seismic velocity models. Convergence to a stationary distribution of posterior models is rapidly attained, while requiring a limited number of wavelet coefficients to define a sampled model. Better convergence from distant starting models as well as the ability to quantify model uncertainty are thus provided by this method. The subsurface velocities determined via the method of FWI may be used for seismic imaging.
In another aspect of the present invention, to address the aforementioned problems, some embodiments provide a non-transitory computer readable storage medium storing one or more programs. The one or more programs comprise instructions, which when executed by a computer system with one or more processors and memory, cause the computer system to perform any of the methods provided herein.
In yet another aspect of the present invention, to address the aforementioned problems, some embodiments provide a computer system. The computer system includes one or more processors, memory, and one or more programs. The one or more programs are stored in memory and configured to be executed by the one or more processors. The one or more programs include an operating system and instructions that when executed by the one or more processors cause the computer system to perform any of the methods provided herein.
Like reference numerals refer to corresponding parts throughout the drawings.
Described below are methods, systems, and computer readable storage media that provide a manner of seismic imaging using full waveform inversion (FWI). These embodiments are designed to be of particular use for seismic imaging of subsurface volumes in geologically complex areas.
Reference will now be made in detail to various embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure and the embodiments described herein. However, embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures, components, and mechanical apparatus have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
Seismic imaging of the subsurface is used to identify potential hydrocarbon reservoirs. Seismic data is acquired at a surface (e.g. the earth's surface, ocean's surface, or at the ocean bottom) as seismic traces which collectively make up the seismic dataset. The seismic data can be used in a full waveform inversion (FWI) method to determine subsurface velocities so that the seismic data can be properly imaged.
Advantageously, those of ordinary skill in the art will appreciate, for example, that the embodiments provided herein may be utilized to generate a more accurate digital seismic image (i.e., the corrected digital seismic image). The more accurate digital seismic image may improve hydrocarbon exploration and improve hydrocarbon production. The more accurate digital seismic image may provide details of the subsurface that were illustrated poorly or not at all in traditional seismic images. Moreover, the more accurate digital seismic image may better delineate where different features begin, end, or any combination thereof. As one example, the more accurate digital seismic image may illustrate faults more accurately. As another example, assume that the more accurate digital seismic image indicates the presence of a hydrocarbon deposit. The more accurate digital seismic image may delineate more accurately the bounds of the hydrocarbon deposit so that the hydrocarbon deposit may be produced.
Those of ordinary skill in the art will appreciate, for example, that the more accurate digital seismic image may be utilized in hydrocarbon exploration and hydrocarbon production for decision making. For example, the more accurate digital seismic image may be utilized to pick a location for a wellbore. Those of ordinary skill in the art will appreciate that decisions about (a) where to drill one or more wellbores to produce the hydrocarbon deposit, (b) how many wellbores to drill to produce the hydrocarbon deposit, etc. may be made based on the more accurate digital seismic image. The more accurate digital seismic image may even be utilized to select the trajectory of each wellbore to be drilled. Moreover, if the delineation indicates a large hydrocarbon deposit, then a higher number of wellbore locations may be selected and that higher number of wellbores may be drilled, as compared to delineation indicating a smaller hydrocarbon deposit.
Those of ordinary skill in the art will appreciate, for example, that the more accurate digital seismic image may be utilized in hydrocarbon exploration and hydrocarbon production for control. For example, the more accurate digital seismic image may be utilized to steer a tool (e.g., drilling tool) to drill a wellbore. A drilling tool may be steered to drill one or more wellbores to produce the hydrocarbon deposit. Steering the tool may include drilling around or avoiding certain subsurface features (e.g., faults, salt diapirs, shale diapirs, shale ridges, pockmarks, buried channels, gas chimneys, shallow gas pockets, and slumps), drilling through certain subsurface features (e.g., hydrocarbon deposit), or any combination thereof depending on the desired outcome. As another example, the more accurate digital seismic image may be utilized for controlling flow of fluids injected into or received from the subsurface, the wellbore, or any combination thereof. As another example, the more accurate digital seismic image may be utilized for controlling flow of fluids injected into or received from at least one hydrocarbon producing zone of the subsurface. Chokes or well control devices, positioned on the surface or downhole, may be used to control the flow of fluid into and out of the wellbore. For example, certain subsurface features in the more accurate digital seismic image may prompt activation, deactivation, modification, or any combination thereof of the chokes or well control devices so as to control the flow of fluid. Thus, the more accurate digital seismic image may be utilized to control injection rates, production rates, or any combination thereof.
Those of ordinary skill in the art will appreciate, for example, that the more accurate digital seismic image may be utilized to select completions, components, fluids, etc. for a wellbore. A variety of casing, tubing, packers, heaters, sand screens, gravel packs, items for fines migration, etc. may be selected for each wellbore to be drilled based on the more accurate digital seismic image. Furthermore, one or more recovery techniques to produce the hydrocarbon deposit may be selected based on the more accurate digital seismic image.
In short, those of ordinary skill in the art will appreciate that there are many decisions to make in the hydrocarbon industry (e.g., (a) steering decisions, (b) landing decisions, (c) completion decisions, and (d) decisions regarding engineering control systems and reservoir monitoring in contexts including, but not limited to, Tow Streamer, Ocean Bottom Sensor, VSP, DASVSP, and imaging with both primaries and free surface multiples), and that making proper decisions based on more accurate digital seismic images should improve the likelihood of safe and reliable operations. For simplicity, the many possibilities, including wellbore location, component selection for the wellbore, recovery technique selection, controlling flow of fluid, etc., may be collectively referred to as managing a subsurface reservoir.
The present invention includes embodiments of a method and system for FWI using a tree-based Bayesian approach which automatically selects the model complexity, allowing appropriate parameterization. Limited illumination, insufficient offset, noisy data and poor starting models can pose challenges for seismic full waveform inversion. The present invention includes a tree-based Bayesian inversion scheme which attempts to mitigate these problems by accounting for data uncertainty while using a mildly informative prior about subsurface structure. The method samples the resulting posterior model distribution of compressional velocity using a trans-dimensional (trans-D) or Reversible Jump Markov chain Monte Carlo method in the wavelet transform domain of velocity. This allows rapid convergence to a stationary distribution of posterior models while requiring a limited number of wavelet coefficients to define a sampled model. The trans-D tree-based approach together with parallel tempering for navigating rugged likelihood (i.e. misfit) topography provides a promising, easily generalized method for solving large-scale geophysical inverse problems which are difficult to optimize, but where the true model contains a hierarchy of features at multiple scales. In addition to the improvements to digital seismic imaging, this computer-implemented approach is significantly more computationally efficient than conventional methods.
The active source seismic full waveform inversion (FWI) method is, in principle, a simple idea. With minimal processing or manual intervention, it aims to provide not just an image of the subsurface, but a velocity model which, when put through a forward operator, ‘closely’ matches the observed seismic field. This entails the solution of an inverse problem, with the forward physics governed by the seismic wave equation. However, such inverse problems with limited receiver coverage as well as frequency bandwidth are extremely nonlinear and thus very challenging to solve. Further, the presence of noise at inopportune frequencies confounds many optimization methods, and complicated earth models make for a very high dimensional model space that is difficult to work with in a computationally efficient manner. The nonlinearity alluded to manifests as local misfit minima, leading to models that are not optimally converged or are ‘cycle skipped’ in FWI parlance. Various promising methods to improve convergence exist, such as the estimation of time shifts to minimize the kinematic differences between initially modelled and observed data, and the use of extended model domains and/or non-local wave physics. Another approach is to solve a sequence of constrained, locally convex subproblems. Yet other methods seek to improve the convexity of the misfit function through the use of an optimal transport distance, via the addition of artificial low frequencies to data, the iterative use of Wiener filters, or the use of quadratic penalty methods. One commonality of all these methods is an effort to make the misfit topography easier for optimization algorithms to navigate. To varying degrees, all of these methods work well under different circumstances, but cannot guarantee convergence. Further, given the various steps involved, these methods are not easily amenable to solution appraisal or uncertainty estimation. The present invention quantifies the credibility (in the Bayesian sense) with which we provide solutions to the FWI problem, when such solutions themselves are not easy to find. Further, the algorithm automatically selects and operates with a limited set of discrete wavelet transform coefficients of the velocity model. This leads to fewer unknowns than cells in the forward modelling finite difference grid, thus allowing for tractable uncertainty estimation in 2-D and potentially 3-D FWI with minimal assumptions being made a priori.
In most conventional schemes for geophysical inversion, the model grid geometry is fixed, that is, the size of the cells and their number is not allowed to vary during inversion. Traditionally, solutions have focused on minimizing the following objective function:
arg min_m φ(m) = ∥W(d − f(m))∥₂² + λ²∥Rm∥ₚᵖ, (1)
where m is a model vector, d is the observed data and f(m) provides the forward modelled prediction due to m. λ² is the regularization parameter, and R is any operator which, once applied to m, produces a measure of length in the p-norm that is deemed sensible to keep small. The first term in (1) is the data misfit (weighted by the data precision W), and the second is the regularization term designed to keep the model (or deviations of the model from a preferred model) small. The trade-off between the two is controlled by the so-called Tikhonov regularization parameter λ². This is akin to the statistical technique of ridge regression, that is, depending on the value of λ², for a linear problem and the p=2 norm, the solution to (1) lies on the ‘ridge’ between the minimum of the data misfit term and the minimum of the model length term in order to simultaneously minimize both. Clearly, choices need to be made regarding the operator R, the weight given to the model length, and the selection of model norm. Nonlinear least squares FWI solutions involving gradient descent or the use of the Jacobian matrix (or its inverse) in conjunction with Tikhonov regularization are easy enough to conceptualize and well understood, but notoriously slow to converge if R is poorly chosen. Choosing a smaller number of parameters, or using a p=1 norm in conjunction with a sparse model ‘frame’, does away with some of this hyper-parameter selection.
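For concreteness, a minimal numerical sketch of evaluating objective (1) might look as follows (Python/NumPy; the forward operator f, weighting W and regularization operator R here are generic placeholders rather than the disclosed method):

    import numpy as np

    def tikhonov_objective(m, d, f, W, R, lam, p=2):
        """Evaluate the regularized FWI objective of equation (1).

        m   : model vector
        d   : observed data vector
        f   : callable forward operator returning predicted data f(m)
        W   : data-precision weighting matrix
        R   : regularization operator (e.g. identity or finite difference)
        lam : Tikhonov trade-off parameter (lambda)
        p   : norm used for the model-length term
        """
        residual = W @ (d - f(m))                        # weighted data misfit
        misfit = np.sum(residual ** 2)                   # squared 2-norm
        penalty = lam ** 2 * np.sum(np.abs(R @ m) ** p)  # model-length term
        return misfit + penalty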
Of course, the use of sparse model representations with small measures of length does not only aid FWI convergence to a suitable solution of (1); there is another observation which can be made regarding parsimonious model parametrizations: simpler theories (models in our case) offer clearer insight. This is the approach used by Occam's inversion, which aims to produce the smoothest model or the sparsest model compatible with the data noise. However, these models are extremal models, and should not be looked at as being truly representative of the earth. To wit, we should consider models which are suitably simple, but also fit the data appropriately. Statistically, this is known as the model selection problem. The goal is to avoid producing simple models which have low variance but high bias, or complicated models with high variance but low bias. Ideally for geophysical inversion, we should be sampling over not one, but a range of models compatible with our data as well as our prior notions of the earth and its complexity.
In the methods outlined so far, the goal has been to find a minimum of (1), with the hope that it is a global minimum. As mentioned previously, no such convergence guarantee exists. Further, even if a global minimum were to be found, it would not preclude the existence of other models with similar misfits which fit within the data noise. These models will likely exhibit very different velocity structure, typical of a nonlinear problem. Continuing with the geophysical ideal mentioned previously, it is desirable to sample, with a range of hyper-parameters (such as regularization parameters, number of cells, etc.), a range of models such that the models themselves are of an appropriate complexity, with seismic velocities that conform to log data, outcrops, and laboratory experiments while being compatible with the noisy seismic observations. Attempting to do this manually by trial and error would prove impossible due to the huge number of possibilities. However, even a systematic approach would be complex, since it would still need to quantitatively weight the outcomes due to each combination of hyper-parameters and inverted models.
The present invention accomplishes this task by re-examining (1) in a Bayesian sense. For every sampled model m, loosely speaking, the misfit term provides a measure of the likelihood of the model, while the length of the model vector encapsulates our prior knowledge about the model, including its complexity. More rigorously speaking, a Bayesian formulation is
p(m|d)∝p(d|m)p(m). (2)
which for our purposes is better read from right to left as follows: p(m) is the prior probability of m, which we know independent of the observations d. We re-assess our prior notion of m by carrying out a seismic experiment which shows us how likely it is that m fits the observations. This weight is given by the likelihood function p(d|m). The result of re-weighting or updating our prior notion by the likelihood provides the posterior probability of observing the model m. The posterior probability is represented by the term p(m|d). We then repeat this process for various models m admissible by our prior notions until we obtain an ensemble of models represented by the probability density function or PDF p(m|d). We can thus turn the optimization problem (1) with many possible solutions into a sampling problem (2). Those of skill in the art will note that (2) is missing a normalization constant which ensures it integrates to unity, and thus is not truly a probability density function. Indeed, (2) is more representative of a multidimensional histogram until we normalize it by integrating over all models on the right-hand side:
p(d) = ∫_m p(d|m) p(m) dm, (3)
where p(d) is known as the evidence. However, for model appraisal we are only interested in the relative probabilities of various models. We can thus sample up to a constant of proportionality using (2) for our purposes. It is important to note that our prior in (2) includes a specification over various levels of complexity (including parametrizations with different numbers of variables) and p(d) is therefore the ‘total’ evidence.
For the optimization problem (1), as applicable to any geophysical problem, model regularization is necessary from a number of different viewpoints, be it for improving the stability of a matrix inverse, for keeping model fluctuations (or the update) small, or for keeping the model (or the update) close to a preferred model. However, the number of parameters with which to describe the model, a measure of the complexity of the model, can also be treated as an unknown to sample, without explicitly requiring regularization. With this approach, we consider not only the simplest or smoothest model with which to describe the data, but a collection of models with a different number of parameters which are compatible with the observed data. The trans-dimensional inversion method based on birth/death Monte Carlo and the more general Reversible Jump Markov chain Monte Carlo (RJ-McMC) method accomplishes this task. For a 1-D model, this would mean sampling over a variable number of layers. For 2-D models, Voronoi cells with different numbers of cells have been widely used. In effect, the trans-D algorithm via Bayes' theorem performs the task of model selection, with regard to the complexity of the model. The fact that models are neither overfit nor underfit is based on the idea of Bayesian parsimony. An ‘Occam factor’, which penalizes overly complicated models, is built into the framework of Bayes' theorem when formulated appropriately. To examine this argument, we note that a trans-D model vector is defined as m=[mk, k], where mk is a model with k parameters that describe compressional velocity (for the FWI application of the present invention). It is possible to derive from the joint probability of the data and models that
p(k|d) = p(d|mk,k) [p(mk,k)/p(mk|k,d)] / p(d). (4)

Treating the total evidence p(d) as a constant, we get

p(k|d) ∝ p(d|mk,k) [p(mk,k)/p(mk|k,d)]. (5)
The term on the left-hand side of (5) is the posterior probability (after performing the experiment) of inferring the number of parameters k. The first term on the right is the likelihood of k parameters fitting the data adequately. To examine the bracketed second term on the right, we first note from the definition of joint and conditional probability that p(mk,k)=p(mk|k)p(k). Therefore, the bracketed term on the right-hand side is the ratio of prior model probability to posterior probability for a k-parameter model. The more parameters k there are, the more thinly spread (i.e. lower) the prior probability is, since the prior PDF needs to integrate to 1 over a larger volume. Since acceptable k-parameter models occupy a posteriori a tiny amount of the prior space, the k-parameter posterior probability is generally higher (i.e. peakier) than the prior. The more parameters k are used, therefore, the smaller the bracketed fraction. However, the likelihood of the k-parameter fit increases as more parameters k are used. In a trans-D formulation, the bracketed factor and the data likelihood trade off, automatically providing a solution akin to regularization, depending largely on the data. With a uniform probability for p(k), and some simplifying assumptions discussed in Ray et al. (2016), the bracketed fraction can be interpreted as the ratio of the posterior accessible volume to the prior accessible volume, sometimes known as the ‘Occam Factor.’ This formulation allows an inversion to self-parameterize to good effect, providing higher model resolution in areas with better data coverage and low noise.
Now that we have interpreted the trans-D formulation, lest it appear that the right-hand side of (5) depends on mk while the left does not, we can simply use the definition of conditional probabilities again to verify that the right-hand side of (5) equals p(k,d). This is entirely consistent with (4), since by definition, p(k|d)=p(k,d)/p(d). Trans-D sampling has been shown to outperform inversion based on subspace transformations using B-splines in a seismic surface wave tomography application. Alternatives to a trans-D formulation, based on evaluating the evidence for different parameterizations via the marginal likelihood p(d|k), the evidence for a given hypothesis (in our case a k-parameter model), are known. However, this involves the computationally prohibitive task of finding the evidence for each k-parameterization, and is only feasible for certain kinds of geophysical inversion.
For the exploration seismic FWI problem, solutions to characterize the full nonlinear uncertainty have only recently been put forward, owing to the huge computational cost of a forward evaluation. Some methods use a Bayesian solution based on randomized source subsampling but make use of a fixed parameterization while assuming a Gaussian distribution about the maximum a posteriori (MAP) model. Others use a genetic algorithm (GA) in conjunction with model resampling using the neighbourhood algorithm (NA), followed again by Gibbs sampling (GS). These methods use a two-grid approach: coarse for the inverse model and fine for the forward model. However, the data do not determine the coarseness of the inverse model grid, and the quality of the estimated uncertainty also depends on the input ensemble from the GA to the NA+GS algorithm. Other methods present a two-grid approach which involves operator upscaling, although the inverse model grid is fixed. All of these methods are promising efforts to quantify seismic FWI uncertainty but do not address the model selection problem. The only efforts we are aware of which have attempted this with trans-D inversions are for the vertical seismic profile (VSP) inversion problem and for the elastic FWI problem, but both assume a laterally invariant earth model which is, of course, not representative of the true earth model that must be obtained for the purpose of hydrocarbon exploration and production. In theory, the Bayesian model selection principles demonstrated for 1-D and 2-D earth models are equally applicable for 3-D inversion. However, as pointed out by Hawkins & Sambridge (2015), computationally efficient parameterizations for trans-D problems in 3-D are not easy to construct, and the inclusion of prior knowledge about geometric structure is difficult.
The recent work of Hawkins & Sambridge (2015) has demonstrated that any basis function set which can be represented by a tree-based structure can be used as a valid model representation for trans-D inversion. A major advantage of using this formulation is that from both a theoretical and practical efficiency point of view, it is agnostic to the spatial dimensionality of the earth model, be it 1-D, 2-D or 3-D. In an embodiment, we specifically use wavelet basis functions and the discrete wavelet transform (DWT), which is readily amenable to a hierarchical tree-based representation. Wavelet transforms with a suitable basis set (e.g. CDF 9/7) are routinely used to compress image information (e.g. JPEG 2000). This makes the transform domain attractive for parsimonious geophysical model representations, as will be demonstrated with synthetic examples. As mentioned previously, curvelet or wavelet basis sets have been used for exploration seismic FWI, but in an optimization setup. As discussed by Hawkins & Sambridge (2015), a valid wavelet tree which is incompletely filled can represent a hierarchy of features from low to high spatial wavenumbers. In conjunction with the trans-D algorithm, this provides a multiresolution approach which adaptively parameterizes according to the observed data. Adaptive inversion grid meshing has been carried out, but these efforts used fixed criteria for the adaptation rather than sampling over a range of parameterizations where model complexity is dictated by the data. Successful recent applications of such a trans-D tree-based approach can be found in Hawkins et al. (2017) for airborne electromagnetic inversion, Dettmer et al. (2016) to quantify uncertainty for tsunami sea surface displacement, and Hawkins & Sambridge (2015) for 2-D and 3-D seismic tomography; the present invention is the first use of the approach for seismic FWI.
For 1-D, 2-D and 3-D models, the tree representation requires the use of modified binary tree, quaternary tree and octree structures, respectively. For all these representations in the wavelet transform domain, the first node coefficient (which is at the top level of the tree) represents the average value of velocities in the model (to be presented to the finite difference operator). This node branches into 1, 3 and 7 nodes (again, for 1-D, 2-D and 3-D models respectively) at the second level, with coefficients at this level representing the strength of basis functions with wavelengths of roughly half the length scale of the total model. From this level downwards, each node branches into a pure binary tree, quadtree or octree where each node has exactly 2, 4 or 8 children, respectively. The tree depth is restricted by the size of the forward model grid. Each successive depth level (in the inverse wavelet transform domain) represents finer scaled features in the velocity model. In all the work presented here, we use modified restricted quaternary trees as we are working in 2-D, but those of skill in the art will recognize that the methods are equally applicable in 1-D or 3-D by using the appropriate tree structure.
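One hypothetical way to encode the parent-child structure just described is sketched below (Python); the (level, band, i, j) node labels and the depth cap are illustrative bookkeeping choices, not part of the claimed method:

    def children(node, grid_size):
        """Children of a node in the modified quaternary wavelet tree.

        A node is (level, band, i, j): level counts down from the root,
        band is 'a' (the root average) or one of the detail orientations
        'h', 'v', 'd', and (i, j) indexes the coefficient in its subband.
        """
        level, band, i, j = node
        if band == 'a':
            # The root (average velocity) branches into the three coarsest
            # detail coefficients, forming the second level of the tree.
            return [(level + 1, b, i, j) for b in ('h', 'v', 'd')]
        if 2 ** (level + 1) > grid_size:
            return []   # tree depth capped by the forward model grid size
        # Below the second level each node heads a pure quadtree: four
        # children at the next finer scale of the same orientation.
        return [(level + 1, band, 2 * i + di, 2 * j + dj)
                for di in (0, 1) for dj in (0, 1)]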
Another advantage of working with the tree based wavelet transform representation is that different wavelet bases can be used, depending on the problem at hand. For transmission dominated problems, smooth basis functions such as CDF 9/7 may be appropriate. For reflection dominated problems, sharp basis functions such as the Haar wavelets could be used.
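As an illustrative way to compare candidate bases (a sketch assuming the PyWavelets package, in which the 'bior4.4' filters correspond to the CDF 9/7 pair), one can measure how well each basis reconstructs a model from a small fraction of retained coefficients:

    import numpy as np
    import pywt

    def truncated_rms_error(model, wavelet, keep_fraction=0.05):
        """RMS reconstruction error keeping only the largest coefficients."""
        coeffs = pywt.wavedec2(model, wavelet)
        arr, slices = pywt.coeffs_to_array(coeffs)
        # Zero all but the largest-magnitude coefficients.
        threshold = np.quantile(np.abs(arr), 1.0 - keep_fraction)
        arr_small = np.where(np.abs(arr) >= threshold, arr, 0.0)
        recon = pywt.waverec2(
            pywt.array_to_coeffs(arr_small, slices, output_format='wavedec2'),
            wavelet)
        return np.sqrt(np.mean((model - recon) ** 2))

    # Smooth (transmission-style) models often favour 'bior4.4' (CDF 9/7);
    # blocky (reflection-style) models often favour 'haar'.
    model = np.tile(np.linspace(1500.0, 4500.0, 128)[:, None], (1, 128))
    for w in ('haar', 'bior4.4'):
        print(w, truncated_rms_error(model, w))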
Sampling the posterior model PDF (2) is done via the trans-D McMC algorithm, details of which are provided below. In particular, we sample different wavelet trees, adding nodes, deleting nodes or modifying node coefficients according to a prior specification and the likelihood function.
We start the algorithm with a very simple model, typically a tree with only one root node. We then allow the algorithm to iteratively add active nodes to the tree (‘birth’), prune them (‘death’), or simply modify the coefficient value at an existing active node (‘update’). This is all done as the data may demand via the acceptance probability α. This process is repeated until the McMC chain converges to a stationary chain of samples. Details of convergence monitoring for the trans-D inversion and the parallel tempering algorithm used to escape local likelihood maxima (misfit minima) are detailed in Ray et al. (2016).
Following the notation of Hawkins & Sambridge (2015), we need to keep track of the set of active nodes Sv, the set of nodes from which to give birth Sb, and the set of active nodes which have no children (‘leaves’ of the tree) for death Sd. An example tree model with k=2 active nodes and the active, birth and death sets illustrated is shown in
At every step of the McMC, one of three possible moves is randomly chosen with equal probability: update a node coefficient, birth, or death. For a node update, a node is selected at random from the set of active nodes Sv, and the coefficient value is perturbed using a Gaussian proposal. Typically, we set the standard deviation of the update to be 5 percent of the width of the uniform bounds at the particular node's depth. This move does not change the model dimension.
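A minimal sketch of the move scheduling and the update proposal follows (Python; the TreeModel container, (depth, index) node tuples and coeff_bounds mapping are hypothetical interfaces, not the disclosed implementation):

    import random
    from dataclasses import dataclass, field

    @dataclass
    class TreeModel:
        """Hypothetical container: maps active node -> wavelet coefficient."""
        coeff: dict = field(default_factory=dict)   # keys form the set Sv

    def propose_update(tree, coeff_bounds, rng):
        """Perturb one active node's coefficient with a Gaussian proposal.

        Nodes are (depth, index) tuples; coeff_bounds maps a depth to the
        (lo, hi) uniform prior bounds at that depth.
        """
        node = rng.choice(sorted(tree.coeff))       # draw a node from Sv
        lo, hi = coeff_bounds[node[0]]
        sigma = 0.05 * (hi - lo)                    # 5 percent of prior width
        return node, rng.gauss(tree.coeff[node], sigma)  # dimension unchanged

    rng = random.Random(0)
    move = rng.choice(('update', 'birth', 'death'))  # each with probability 1/3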
A birth move involves the following steps: (1) select a node at random from the birth set Sb; (2) draw a coefficient value for the newly activated node from the prior at that node's depth level; and (3) update the sets Sv, Sb and Sd to reflect the change.
A death move involves the following steps, and is the reverse of the birth move: (1) select a node at random from the death set Sd; (2) deactivate that node and discard its coefficient; and (3) update the sets Sv, Sb and Sd accordingly, as sketched below.
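The set bookkeeping behind these moves might be sketched as follows (Python; the active set is taken to be the keys of the coefficient dictionary, and children_of is a hypothetical callable as in the tree-indexing sketch above):

    def birth_set(active, children_of, root):
        """Sb: inactive children of active nodes (or the root if tree empty)."""
        if not active:
            return {root}
        return {c for n in active for c in children_of(n)} - active

    def death_set(active, children_of):
        """Sd: active nodes with no active children (the 'leaves')."""
        return {n for n in active
                if not any(c in active for c in children_of(n))}

    def birth(coeff, node, value):
        """Activate 'node' (drawn from Sb) with its proposed coefficient."""
        coeff[node] = value

    def death(coeff, node):
        """Deactivate a childless active node (drawn from Sd)."""
        del coeff[node]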
The probability that the McMC chain moves from a model m to m′ is given by the acceptance probability α. For tree-based trans-D McMC, it takes different forms for each of the three different move types, and the expressions given below are derived in detail by Hawkins & Sambridge (2015).
For the update move, there is no change in dimension, and when proposing from a uniform prior coefficient range as we have done, it is simply the likelihood ratio:

α = min[1, p(d|m′)/p(d|m)].
For the birth move, the acceptance probability is
where |Sx| is the number of elements in set Sx and h is the maximum depth level restriction. For the death move, the acceptance probability is
If the prior probability on the number of nodes is uniform, then the ratio p(k+1)/p(k) equals one and the prior on k drops out of these expressions.
However, if a Jeffreys prior has been used, as done in an embodiment, then p(k) ∝ 1/k and the ratio becomes p(k+1)/p(k) = k/(k+1).
If a proposed model is accepted with probability α, it is stored as the next sample. If the proposal is rejected, then the previous model in the McMC chain is retained as the next sample.
The most difficult part, conceptually, of this algorithm is counting the number of possible arrangements of a tree given the number of active nodes k, required to calculate α for birth and death proposals. For a binary tree, if there are n nodes, then for node i, say, we can have C_{i−1} arrangements of the nodes preceding it. This leaves C_{n−i} arrangements possible for the remaining nodes. Since the arrangements are independent, the total number of arrangements for node i is C_{i−1}·C_{n−i}. But since there are n nodes we have to sum over all i, and so the total number of arrangements for n nodes is

C_n = Σ_{i=1}^{n} C_{i−1} C_{n−i}.
For n=1, we set C0=1 as there is exactly one way to make a tree with only 1 node. This defines the Catalan number sequence via a recurrence relation, with a base case defining C0=1. One can use this logic to construct the number of arrangements of higher order and more general trees as well (Hawkins & Sambridge 2015). Cn can easily be solved via recursion, but on closer examination we see that to obtain C3 we need to compute C2 and C1. But if we have already computed C2, we can store this value and re-use it without another recursive call. This is known as memoization, a technique extensively used in dynamic programming. This becomes very useful when there are many recursive calls made, as in the case of a pure quaternary tree, where the number of arrangements Yn can be written thus:

Y_n = Σ_{i=1}^{n} Y_{i−1} Σ_{j=1}^{n−i+1} Y_{j−1} Σ_{k=1}^{n−i−j+2} Y_{k−1} Y_{n−i−j−k+2}, with Y_0 = 1.
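To make the memoization idea concrete, a minimal sketch of the Catalan recurrence with cached results (Python; functools.lru_cache simply plays the role of the memo table described above):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def catalan(n):
        """Number of binary-tree arrangements with n nodes (Catalan numbers)."""
        if n == 0:
            return 1   # base case: exactly one empty tree
        # Sum over where the split falls, re-using previously cached values.
        return sum(catalan(i - 1) * catalan(n - i) for i in range(1, n + 1))

    # catalan(0) .. catalan(5) -> 1, 1, 2, 5, 14, 42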
In addition to memoizing Yn, we can memoize each of the partial sums over j and k, as the partial sums are functions of the sum's upper limit. The modified quaternary tree required for the Cartesian DWT has one root node and three children, and each of these three children follows a pure quaternary tree structure. We can write the number of arrangements thus:
T_n = Σ_{i=1}^{n} Y_{i−1} Σ_{j=1}^{n−i+1} Y_{j−1} Y_{n−i−j+1},
taking advantage of the fact that we can again memoize partial sums. Finally, we can treat restricted tree depths with another index representing the depth level restriction. For the case of binary trees, a restriction to depth h is given by

C_{n,h} = Σ_{i=1}^{n} C_{i−1,h−1} C_{n−i,h−1},

with base cases C_{0,h} = 1 for all h and C_{n,0} = 0 for n ≥ 1.
We can apply exactly the same restricted binary tree arrangement logic to the modified restricted quaternary tree arrangement count. All we need to do is modify the numbers of arrangements at any level h by simply making the calculation depend on the previous level h−1.
For additive noise, which by the central limit theorem is asymptotically Gaussian (especially in the frequency domain, as shown in Ray et al. 2016), we define the model likelihood function L(m)=p(d|m) as

L(m) = (2π)^(−N/2) |Cd|^(−1/2) exp[−½ (d−f(m))^T Cd^(−1) (d−f(m))], (6)
where Cd is the covariance matrix of data errors. Since the DWT is a linear transformation, we can write
f(m)=F(Hm), (7)
where F is the seismic forward operator, H is the inverse DWT operator and m is a model vector represented by coefficient values on a wavelet tree. In other words, Hm is the 2-D velocity model fed to a variable density, acoustic and isotropic finite difference engine. The source signature is assumed known, or it can be derived as a maximum likelihood estimate as a function of the model, as shown in Ray et al. (2016).
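Equations (6) and (7) might be sketched in code as follows (Python, assuming the PyWavelets package for the inverse DWT H; the finite-difference engine fd_engine and the diagonal noise model are placeholders rather than the disclosed implementation):

    import numpy as np
    import pywt

    def forward(m_arr, slices, wavelet, fd_engine):
        """f(m) = F(Hm): the inverse DWT maps sparse tree coefficients to a
        2-D velocity model, which the forward engine turns into data."""
        velocity = pywt.waverec2(
            pywt.array_to_coeffs(m_arr, slices, output_format='wavedec2'),
            wavelet)                 # Hm on the finite-difference grid
        return fd_engine(velocity)   # F(Hm): synthetic seismic data

    def log_likelihood(d_obs, d_pred, sigma):
        """Log of equation (6) for independent Gaussian errors, i.e. a
        diagonal Cd with per-datum standard deviations sigma."""
        r = d_obs - d_pred
        return (-0.5 * np.sum((r / sigma) ** 2)
                - np.sum(np.log(sigma))
                - 0.5 * r.size * np.log(2.0 * np.pi))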
In this embodiment, we only concern ourselves with changes in velocity in the earth, assuming that density changes are known or that there are no changes in density. This is not a limitation of the method, which easily generalizes to more variables, as will be recognized by those skilled in the art. The prior models need to specify the probabilities of nodes on a tree. Hence we can write
p(m)=p(v, T, k), (8)
where v is a vector of velocities (in this embodiment) in the wavelet transform domain, a point worth noting since it makes the tree-based formulation different from layer-based or cell-based trans-D. T is a particular type of wavelet tree (modified restricted quaternary trees for our 2-D application) and k is the number of active nodes representing a valid tree structure. Using the chain rule of probabilities, we can write:
p(v,T,k) = p(v|T,k) p(T|k) p(k),
p(v,T,k) = p(T|k) p(k) Π_{i=1}^{k} p(vi|T,k). (9)
The last term under the product assumes that the wavelet coefficients at each node, given k active nodes for the specific tree type T, are independent of each other. Hawkins & Sambridge (2015) and Dettmer et al. (2016) simply use wide uniform bounds at each node position. However, as can be seen in
Conventional methods have no way of counting the number of arrangements of a modified, restricted tree. For general restricted trees, there is an efficient recursive method to calculate Nk, presented in Hawkins & Sambridge (2015). In the present invention, we provide a less general, but arguably easier to implement, efficient recursive pseudo-code for the 2-D wavelet tree structure, sketched below. It can be modified easily for the 1-D or 3-D wavelet trees of the DWT.
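A sketch of such a recursive count is given below (Python). It follows the composition logic of the arrangement formulas above, memoizing whole subproblems rather than the partial sums, so it is illustrative rather than maximally efficient:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def quad(n, h):
        """Arrangements of n nodes in a pure quaternary subtree of depth <= h."""
        if n == 0:
            return 1
        if h <= 0:
            return 0
        total = 0
        # The subtree root takes one node; split the remaining n-1 nodes
        # among its four children in all possible ways.
        for a in range(n):
            for b in range(n - a):
                for c in range(n - a - b):
                    d = n - 1 - a - b - c
                    total += (quad(a, h - 1) * quad(b, h - 1)
                              * quad(c, h - 1) * quad(d, h - 1))
        return total

    @lru_cache(maxsize=None)
    def modified_quad(n, h):
        """Arrangements of the modified restricted quaternary tree (2-D DWT):
        one root whose three children head pure quaternary subtrees."""
        if h <= 0 or n == 0:
            return 0     # a valid model always contains at least the root
        if n == 1:
            return 1     # root only
        total = 0
        for a in range(n):           # nodes in the first child subtree
            for b in range(n - a):   # nodes in the second child subtree
                c = n - 1 - a - b    # remainder goes to the third subtree
                total += quad(a, h - 1) * quad(b, h - 1) * quad(c, h - 1)
        return total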
In another embodiment, obtaining the posterior model PDF requires sampling (2) using the Metropolis-Hastings-Green algorithm. The criterion to accept or reject a model proposal is given by the probability

α(m′|m) = min[1, (p(m′)/p(m)) (p(d|m′)/p(d|m)) (q(m|m′)/q(m′|m)) |J|],
where q (m′|m) is the proposal probability of stepping from model m to m′ and |J| is the determinant of the Jacobian of transformation of variables while changing dimension. It computes to unity for the Birth-Death algorithm used in this case. To escape local misfit minima (likelihood maxima), various interacting McMC chains are run in parallel at different ‘temperatures’ τ using the Parallel Tempering algorithm. Posterior inference is carried out using the unbiased τ=1 chain. Details of the sampling methodology and model proposals were previously provided.
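The accept/reject step and a parallel tempering exchange might be sketched as follows (Python; the chain state is simplified to a (temperature, log-likelihood) pair, a hypothetical reduction of the full sampler state):

    import math
    import random

    def accept(log_alpha, rng):
        """Metropolis test: accept with probability min(1, exp(log_alpha))."""
        return rng.random() < math.exp(min(0.0, log_alpha))

    def try_swap(chains, rng):
        """Parallel tempering: propose exchanging two adjacent-temperature
        chains. Inference uses only the tau = 1 chain; hotter chains help
        escape local misfit minima (likelihood maxima)."""
        i = rng.randrange(len(chains) - 1)
        (t1, ll1), (t2, ll2) = chains[i], chains[i + 1]
        # Standard tempering exchange ratio between inverse temperatures.
        log_alpha = (1.0 / t1 - 1.0 / t2) * (ll2 - ll1)
        if accept(log_alpha, rng):
            chains[i], chains[i + 1] = (t1, ll2), (t2, ll1)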
In an embodiment for a transmission-dominated use case, the model and noisy synthetic data are shown in
A CDF 9/7 basis was chosen for the inversion as it provided a lower χ² misfit at level 5 truncation than the Haar basis (see
The algorithm very quickly reduces misfit until it reaches RMS (root mean square) misfits close to 2, within just 400 iterations (
When we allowed the wavelet tree models to occupy level 5, for a total of 256 possible active nodes, we sample for far longer and arrive at the situation described in
In another embodiment, the method may be applied to a surface reflection problem. This example is based on a scaled version of the Marmousi model. It is 128×128 pixels, with a grid spacing of 20 m. The source wavelet, assumed known, is a Ricker with peak frequency at 3.75 Hz. Two shots were modelled, with uncorrelated Gaussian noise at 0.2 percent of the maximum amplitude added to all traces. The model and noisy data (minus the direct wave) are shown in
Similar to the previous embodiment, prior bounds were obtained by finding the minimum and maximum DWT coefficients at each level, and going above and below these bounds by 2 percent of the value.
For posterior inference, we used only the last 700,000 iterations to obtain samples from a stationary Markov chain unbiased by poorly fitting models. Only the target chain was used for posterior inference. Similar to the previous example, we can create probability cubes with marginal PDFs of velocity at every subsurface location, and the results are shown in
An important check after any inversion is an examination of the data fit and residuals. With real data, correlated residuals are indicative of theory error, an incorrect likelihood function, coherent noise, or some combination of the above. These cannot always be avoided, but residuals can tell us how much an inversion can be trusted. For example, in Ray et al. (2016) it was expected that the residuals would be correlated (due to processing/acquisition artefacts) but Gaussian, and indeed they were. For the synthetic examples herein, we added uncorrelated Gaussian random noise and expect that our residuals should therefore also be uncorrelated and Gaussian. For our reflection experiment, we selected 100 random models from the posterior PDF and forward calculated the shot gathers. We have plotted all 100 modelled responses at select traces as shown in
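A simple residual diagnostic along these lines might be sketched as follows (Python; the lag-1 autocorrelation and one-standard-deviation fraction are illustrative stand-ins for a full whiteness and normality test):

    import numpy as np

    def residual_checks(residual):
        """Residuals should be roughly Gaussian and uncorrelated when the
        added noise was uncorrelated Gaussian."""
        r = (residual - residual.mean()) / residual.std()
        # Lag-1 autocorrelation: near zero for uncorrelated residuals.
        lag1 = np.mean(r[:-1] * r[1:])
        # Fraction within one standard deviation: ~0.68 for a Gaussian.
        frac_1sd = np.mean(np.abs(r) < 1.0)
        return lag1, frac_1sd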
We can examine the data fit for 100 random posterior models for both shots, as shown in
We have demonstrated, with two synthetic examples, the feasibility of carrying out a fully nonlinear, 2-D Bayesian inversion with adaptive model complexity in a tree-based framework. There are numerous advantages to doing this, chief among them being an easy-to-use parametrization which works equally well across 1-D, 2-D and 3-D earth models. Using the tree-based parametrization, we easily obtain acceptance rates for birth and death as high as 25 percent, ensuring good mixing of the McMC, which is very difficult with a Voronoi cell parameterization (Hawkins & Sambridge 2015). Specifying prior coefficient bounds as we have done here restricts prior models to being within only a certain range of feasible models, while not being an overly restrictive constraint. The use of Parallel Tempering enables us to escape local misfit minima, a major hindrance for reflection based FWI. Finally, the DWT provides an easy means of switching to the model basis most appropriate for solving the current problem. Of course, there is an inherent subjectivity in the use of Bayesian priors and different basis functions (Hawkins & Sambridge 2015). However, for practical purposes, almost all geophysical inversion via optimization takes advantage of sensible constraints. Bayesian inversion methods as demonstrated here are naturally able to incorporate multiple structural constraints as prior information. While it is undoubtedly true that a Bayesian appraisal is more time-consuming than optimization, fast methods to speed up sampling by an order of magnitude are being researched actively in both the geophysics and particularly the statistics communities, coupled with the increasingly easy availability of parallel computing from commercial vendors. In this context, our analysis can be extended to higher frequencies and more shots. The fact that a Bayesian inversion of geophysical data provides an uncertainty analysis is invaluable, as it can be a risk mitigation factor for many decisions informed by geophysical data.
To that end, the seismic imaging system 500 includes one or more processing units (CPUs) 502, one or more network interfaces 508 and/or other communications interfaces 503, memory 506, and one or more communication buses 504 for interconnecting these and various other components. The seismic imaging system 500 also includes a user interface 505 (e.g., a display 505-1 and an input device 505-2). The communication buses 504 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. Memory 506 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 506 may optionally include one or more storage devices remotely located from the CPUs 502. Memory 506, including the non-volatile and volatile memory devices within memory 506, comprises a non-transitory computer readable storage medium and may store seismic data, velocity models, seismic images, and/or geologic structure information.
In some embodiments, memory 506 or the non-transitory computer readable storage medium of memory 506 stores the following programs, modules and data structures, or a subset thereof including an operating system 516, a network communication module 518, and a seismic imaging module 520.
The operating system 516 includes procedures for handling various basic system services and for performing hardware dependent tasks.
The network communication module 518 facilitates communication with other devices via the communication network interfaces 508 (wired or wireless) and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on.
In some embodiments, the seismic imaging module 520 executes the operations of the seismic imaging method including the FWI using the tree-based Bayesian approach. Seismic imaging module 520 may include data sub-module 525, which handles the seismic dataset including seismic gathers 525-1 through 525-N. This seismic data is supplied by data sub-module 525 to other sub-modules.
FWI sub-module 522 contains a set of instructions 522-1 and accepts metadata and parameters 522-2 that will enable it to execute operations for full waveform inversion. The Bayesian sub-module 523 contains a set of instructions 523-1 and accepts metadata and parameters 523-2 that will enable it to perform the tree-based Bayesian approach for the FWI method. The imaging sub-module 524 contains a set of instructions 524-1 and accepts metadata and parameters 524-2 that will enable it to execute seismic imaging using the velocities determined by FWI sub-module 522 and Bayesian sub-module 523. Although specific operations have been identified for the sub-modules discussed herein, this is not meant to be limiting. Each sub-module may be configured to execute operations identified as being a part of other sub-modules, and may contain other instructions, metadata, and parameters that allow it to execute other operations of use in processing seismic data and generate the seismic image. For example, any of the sub-modules may optionally be able to generate a display that would be sent to and shown on the user interface display 505-1. In addition, any of the seismic data or processed seismic data products may be transmitted via the communication interface(s) 503 or the network interface 508 and may be stored in memory 506.
Method 100 is, optionally, governed by instructions that are stored in computer memory or a non-transitory computer readable storage medium (e.g., memory 506 in
While particular embodiments are described above, it will be understood it is not intended to limit the invention to these particular embodiments. On the contrary, the invention includes alternatives, modifications and equivalents that are within the spirit and scope of the appended claims. Numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. But it will be apparent to one of ordinary skill in the art that the subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, operations, elements, components, and/or groups thereof.
As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting,” that a stated condition precedent is true, depending on the context. Similarly, the phrase “if it is determined [that a stated condition precedent is true]” or “if [a stated condition precedent is true]” or “when [a stated condition precedent is true]” may be construed to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true, depending on the context.
Although some of the various drawings illustrate a number of logical stages in a particular order, stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art and so do not present an exhaustive list of alternatives. Moreover, it should be recognized that the stages could be implemented in hardware, firmware, software or any combination thereof.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
This application claims the benefit of U.S. Provisional Application No. 62/529,297, filed Jul. 6, 2017.