In the context of oil and gas exploration and production, a variety of tools and methods are employed to model subsurface regions. An accurate seismic velocity model is critical for geophysical exploration and oil and gas field planning. Generally, layers of rock in the subsurface of the Earth are formed through deposits of sediment over time and under a variety of environmental conditions. As such, layers of rock may be composed of different constituents and may have different physical and/or chemical properties. A velocity model maps the speed at which seismic waves travel through the subsurface. Consequently, a velocity model may be used, among other things, to identify the structure of the subsurface (e.g. the depth of subsurface formations).
This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.
One or more embodiments disclosed herein generally relate to a method. The method includes obtaining an initial velocity model and perturbing the initial velocity model to form a first plurality of velocity models. The method further includes using a forward model to simulate a first plurality of seismic data sets from the first plurality of velocity models and transforming the first plurality of seismic data sets to the wavenumber-time domain to form a first plurality of transformed seismic data sets. The method further includes training a machine-learned model using the first plurality of velocity models and the first plurality of transformed seismic data sets, wherein the machine-learned model is configured to accept transformed seismic data. The method further includes obtaining a second seismic data set for a subsurface region of interest, wherein the second seismic data set is acquired according to a second survey configuration. The method further includes transforming the second seismic data set to the wavenumber-time domain to form a second transformed seismic data set and processing the second transformed seismic data set with the trained machine-learned model to predict a second velocity model for the subsurface region of interest.
One or more embodiments disclosed herein generally relate to a non-transitory computer readable medium storing instructions executable by a computer processor. The instructions include functionality for obtaining an initial velocity model, perturbing the initial velocity model to form a first plurality of velocity models, and using a forward model to simulate a first plurality of seismic data sets from the first plurality of velocity models. The instructions further include functionality for transforming the first plurality of seismic data sets to the wavenumber-time domain to form a first plurality of transformed seismic data sets. The instructions further include functionality for training a machine-learned model using the first plurality of velocity models and the first plurality of transformed seismic data sets, wherein the machine-learned model is configured to accept transformed seismic data. The instructions further include functionality for obtaining a second seismic data set for a subsurface region of interest, wherein the second seismic data set is acquired according to a second survey configuration, and transforming the second seismic data set to the wavenumber-time domain to form a second transformed seismic data set. The instructions further include functionality for processing the second transformed seismic data set with the trained machine-learned model to predict a second velocity model for the subsurface region of interest.
One or more embodiments disclosed herein generally relate to a system which includes an initial velocity model, a forward modelling procedure, a machine-learned model, a second seismic data set for a subsurface region of interest, wherein the second seismic data set is acquired according to a second survey configuration, and a computer. The computer includes one or more computer processors, and a non-transitory computer readable medium storing instructions executable by a computer processor. The instructions include functionality for perturbing the initial velocity model to form a first plurality of velocity models and using the forward modelling procedure to simulate a first plurality of seismic data sets from the first plurality of velocity models. The instructions further include functionality for transforming the first plurality of seismic data sets to the wavenumber-time domain to form a first plurality of transformed seismic data sets and training the machine-learned model using the first plurality of velocity models and the first plurality of transformed seismic data sets, wherein the machine-learned model is configured to accept transformed seismic data. The instructions further include functionality for transforming the second seismic data set to the wavenumber-time domain to form a second transformed seismic data set and processing the second transformed seismic data set with the trained machine-learned model to predict a second velocity model for the subsurface region of interest.
Other aspects and advantages of the claimed subject matter will be apparent from the following description and the appended claims.
Generally, layers of rock in the subsurface of the Earth are formed through deposits of sediment over time and under a variety of environmental conditions. As such, layers of rock may be composed of different constituents and may have different physical and/or chemical properties. Subsurface rock properties may be anisotropic. In order to describe and model a subsurface region of the Earth, a variety of data collection methods may be employed. These methods may include, but are not limited to: collecting data from one or more wells disposed throughout the subsurface, which may include subsurface logs and/or petrophysical logs; conducting a seismic survey; collecting data from previously drilled, nearby wells, sometimes called “offset” wells; and collecting so-called “soft” data, such as outcrop information and data describing analogous modern geological or depositional environments. The collected data may be used to construct, or otherwise inform, a subsurface model. Once constructed, subsurface models may include information about the spatial distribution of subsurface formation properties such as, but not limited to: porosity; mineral content; chemical makeup; and density. Additionally, the modeled subsurface region may include information about the subsurface formation geological unit thicknesses.
An accurate subsurface model is critical for geophysical exploration, such as the identification of reservoirs, and for oil and gas field planning and lifecycle management. One such subsurface model is a seismic velocity model (“velocity model”). A velocity model maps the speed at which seismic waves travel through the subsurface. Consequently, a velocity model may be used, among other things, to identify the structure of the subsurface (e.g. the depth of subsurface formations), to aid in imaging seismic pre-stack data, and to monitor carbon dioxide (CO2) distributions and retention. Further, a velocity model may be integrated with, or inform, other subsurface models. Typically, the velocity at which seismic waves travel through the subsurface cannot be directly measured. As such, a velocity model is generally constructed by processing recorded seismic data. Seismic data may be obtained through a seismic survey, which will be described in greater detail below. Processing seismic data to obtain a velocity model may be considered an inverse problem, where the applied process must determine the subsurface velocity model that resulted in the recorded seismic data.
The various processes and techniques used to process seismic data to form a velocity model may generally be categorized as either a “data-domain approach”, such as full-waveform inversion (FWI), or an “image-domain approach”, such as migration velocity analysis. Among these processes and techniques to construct a subsurface velocity model from seismic data, FWI is considered the state-of-the-art industry practice. However, FWI suffers because many different velocity models can be formed for a seismic data set. That is, FWI solutions are non-unique. Consequently, an FWI solution is sensitive to aspects of the recorded data, such as the lack of low frequencies, the initial starting model, or the acquisition method and configuration of the seismic survey.
However, a seismic survey (100) may include recordings of seismic waves generated by a seismic source (106) sequentially activated at a plurality of seismic source locations denoted (xs,ys). In some cases, a single seismic source (106) may be activated sequentially at each source location. In other cases, a plurality of seismic sources (106) each positioned at a different location may be activated sequentially. In accordance with one or more embodiments, a plurality of seismic sources (106) may be activated during the same time period, or during overlapping time periods.
A seismic survey (100) may further be specified by its configuration. For example, the configuration of a seismic survey (100) may dictate the spacing between adjacent seismic receivers (120), the number of seismic receivers (120), the locations (xr, yr) of the seismic receivers (120), and the signature (i.e., the characteristics) of the initial radiated seismic wave (108) emitted from the seismic source (106).
Δxj,i=xRj−xSi,

Δyj,i=yRj−ySi,

where Δxj,i and Δyj,i denote the offsets between the location of the j-th seismic receiver (120), (xRj, yRj), and the location of the i-th seismic source (106), (xSi, ySi).
In one aspect, embodiments disclosed herein generally relate to a deep learning (DL)-based framework to construct a velocity model from seismic data. The DL-based framework is highly generalizable and robust to various configurations under which seismic data may be acquired. For the present discussion, deep learning (DL) may be considered a subset of machine learning (ML). Machine learning (ML), broadly defined, is the extraction of patterns and insights from data. The phrases “artificial intelligence”, “machine learning”, “deep learning”, and “pattern recognition” are often conflated, interchanged, and used synonymously throughout the literature. This ambiguity arises because the field of “extracting patterns and insights from data” was developed simultaneously and disjointedly among a number of classical arts like mathematics, statistics, and computer science. For consistency, the term machine learning, or machine-learned, will be adopted herein, and deep learning (DL) will refer to a subset of machine learning (ML) which deals with so-called “deep” models. For example, a deep model may be a neural network with one or more hidden layers. However, one skilled in the art will recognize that the concepts and methods detailed hereafter are not limited by this choice of nomenclature.
Machine-learned model types may include, but are not limited to, generalized linear models, Bayesian regression, random forests, and deep models such as neural networks, convolutional neural networks, and recurrent neural networks. Machine-learned model types, whether they are considered deep or not, are usually associated with additional “hyperparameters” which further describe the model. For example, hyperparameters providing further detail about a neural network may include, but are not limited to, the number of layers in the neural network, choice of activation functions, inclusion of batch normalization layers, and regularization strength. Commonly, in the literature, the selection of hyperparameters surrounding a model is referred to as selecting the model “architecture”. As such, a DL-based framework consists of methods and systems to transform data, or otherwise determine a quantity, that leverage at least one machine-learned model which may be considered deep. A DL-based framework may include methods and processes for selecting a machine-learned model type and associated architecture, evaluating said machine-learned model, and using the machine-learned model in a production setting (also known as deployment of the machine-learned model).
In accordance with one or more embodiments,
Based on the machine-learned model being developed, seismic data are selected from the seismic data database (302) according to a data type (such as streamer data (304)), as shown in block 310. The selected seismic data are referenced as Data A (311). Continuing with the DL-based framework (300) depicted in
m=m(x), EQ 2
where x represents a spatial coordinate, such as a location in a subterranean region of interest (102) defined by an x-axis coordinate, a y-axis coordinate, and a depth, d, (e.g. (x, y, d)), and m is a vector indicating the directional velocities at the spatial coordinate x. In some implementations, the subterranean region of interest (102) may be isotropic such that the velocity m at a spatial coordinate may be represented as a scalar.
To build the synthetic velocity models, as depicted in block 317, the initial velocity model B (315) is perturbed according to perturbation parameters (316). The perturbation parameters may indicate the number of synthetic velocity models to produce and a set of parameters governing the likelihood and magnitude of variation to be applied to the initial velocity model B (315). The resulting perturbed synthetic velocity models are known as velocity models A (318). To be concrete, if K synthetic velocity models are generated by perturbing the initial velocity model B (315), then velocity models A (318) is composed of K velocity models, which may be represented as m1, m2, . . . , mK-1, mK.
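For illustration only, a simplified perturbation procedure may be sketched as follows; the function name, the uniform noise model, and the smoothing kernel are hypothetical choices and are not prescribed by the perturbation parameters (316) of this disclosure:

```python
import numpy as np

def perturb_velocity_model(m0, n_models, max_fraction=0.1, seed=0):
    """Generate perturbed copies of an initial 1D velocity model m0.

    m0           : initial velocity model, shape (n_depths,)
    n_models     : number of perturbed models (K) to generate
    max_fraction : bound on the relative perturbation magnitude
    """
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        # Draw bounded random noise and smooth it so each perturbed
        # model remains a slowly varying, plausible velocity profile.
        noise = rng.uniform(-max_fraction, max_fraction, size=m0.shape)
        smooth = np.convolve(noise, np.ones(5) / 5.0, mode="same")
        models.append(m0 * (1.0 + smooth))
    return np.stack(models)  # velocity models A: shape (K, n_depths)
```

In this sketch, the K velocity models m1, m2, . . . , mK would be produced by a single call with n_models=K.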
Continuing with the DL-based framework (300), each velocity model in velocity models A (318) may be used to simulate seismic data by numerically solving the acoustic wave equation:

(1/m2)∂2p/∂t2−∇2p=ƒ(t)δ(x−xs). EQ 3

In EQ. 3, m is the velocity at a spatial coordinate x as given by a supplied velocity model, ∇2 is the Laplacian operator, p represents the seismic wavefield, xs is the spatial coordinate for a seismic source (106), and ƒ(t) is the signature of the seismic source (106) (e.g., a Ricker wavelet (204)). Thus, seismic data can be simulated at an arbitrary seismic receiver (120) location xr. The recorded simulated seismic data may be obtained through the expression:
d(xr,t;xs)=p(x,t;xs)δ(x−xr). EQ 4
The entire forward modelling process may be represented as F, such that
D=F(m;survey configuration). EQ 5
The forward modelling process, F, accepts a velocity model of the subterranean region of interest (102) (e.g., one of the velocity models from velocity models A (318)), and a survey configuration (321). The survey configuration (321) includes, at a minimum, information about the seismic source (106) location, the emitted source signature, and the location of the seismic receivers (120). D represents the recorded data, or the simulated recorded data at each seismic receiver (120). In other words, D is a collection of traces, herein referred to as a seismic data set. Because a seismic data set D is a collection of traces, where each trace is a record in time of the amplitude of ground motion at a location of a seismic receiver (120), the seismic data set D can be said to be in the space-time domain (“X-t domain”). The forward modelling process is depicted in block 320 of
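As a greatly simplified, purely illustrative stand-in for the forward modelling process F of EQ 5, a one-dimensional finite-difference solution of the acoustic wave equation may be sketched as follows; the discretization, function name, and parameters are assumptions for demonstration and do not represent the disclosed forward model:

```python
import numpy as np

def simulate_seismic_1d(m, src_pos, rec_pos, wavelet, nt, dt, dx):
    """Record a single trace d(xr, t; xs) from a 1D velocity model m.

    A second-order finite-difference time-stepping scheme injects the
    source signature at src_pos and samples the wavefield at rec_pos.
    """
    nx = len(m)
    p_prev, p = np.zeros(nx), np.zeros(nx)
    trace = np.zeros(nt)
    for it in range(nt):
        lap = np.zeros(nx)
        lap[1:-1] = (p[2:] - 2.0 * p[1:-1] + p[:-2]) / dx**2  # Laplacian
        src = np.zeros(nx)
        src[src_pos] = wavelet[it] if it < len(wavelet) else 0.0
        # Time step of (1/m^2) p_tt - lap(p) = f(t) delta(x - xs)
        p_next = 2.0 * p - p_prev + dt**2 * m**2 * (lap + src)
        p_prev, p = p, p_next
        trace[it] = p[rec_pos]  # sample the wavefield at the receiver
    return trace
```

For the scheme to remain stable, the time step must satisfy the usual 1D stability condition (m·dt/dx ≤ 1); a production forward model would use a higher-order scheme with absorbing boundaries.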
As depicted in block 324, each seismic data set in the simulated seismic data (322) is transformed from the space-time domain (“X-t domain”) to the wavenumber-time domain (“K-t domain”). In accordance with one or more embodiments, the transformation is applied by first sorting a seismic data set D to the common midpoint (CMP) domain, D′(h, t; xm), where xm is the surface midpoint defined as xm=(xr+xs)/2, and h is the offset, or h=xr−xs. Once a seismic data set D is in the CMP domain, D′, a Fourier transform (ℱ) is applied along the offset axis to obtain the seismic data set in the K-t domain, D̂:

D̂(k,t;xm)=ℱ(D′(h,t;xm)). EQ 6
Seismic data that has been transformed from the X-t domain to the K-t domain is referenced as transformed seismic data (326).
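As an illustrative sketch, assuming a gather already sorted to the CMP domain and numpy as the numerical library (the function name is hypothetical), the X-t to K-t transformation amounts to a fast Fourier transform along the offset axis:

```python
import numpy as np

def to_kt_domain(d_cmp):
    """Transform a CMP-sorted gather D'(h, t; xm) to the K-t domain.

    d_cmp : array of shape (n_offsets, n_times) for a single midpoint xm.
    Returns the complex-valued gather in the wavenumber-time domain.
    """
    # The FFT over axis 0 maps offset h to wavenumber k; the time axis
    # is left untouched.
    return np.fft.fft(d_cmp, axis=0)
```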
Generally, training a machine-learned model requires that pairs of inputs and one or more outputs are passed to the machine-learned model. More details surrounding the training process will be provided below, however, suffice to say that during training the machine-learned model “learns” a representative model which maps the received inputs to the associated outputs. In the DL-based framework (300), each transformed seismic data set in the transformed seismic data (326) is associated with a velocity model from the velocity models A (318). A transformed seismic data set may be considered an input to the machine-learned model and the associated velocity model (from velocity models A (318)) may be considered the output. As shown in block 332, a machine-learned model (such as a deep model) is trained using pairs of inputs (shown with the directed line labelled 328) and outputs (shown with the directed line labelled 330). In summary, a machine-learned model is trained (block 332) using the transformed seismic data (326) and the velocity models A (318). The resulting machine-learned model is referred to as a trained machine-learned model (334) and is the ultimate product of the DL-based framework (300).
In accordance with one or more embodiments, the machine-learned model of the DL-based framework (300) is a long short-term memory (LSTM) network, which is a deep model. To best understand an LSTM network, it is helpful to describe the more general recurrent neural network, for which an LSTM may be considered a specific implementation.
Output=RNN Block(Input,State). EQ 7
The RNN Block (510) generally comprises one or more matrices and one or more bias vectors. The elements of the matrices and bias vectors are commonly referred to as “weights” or “parameters” in the literature such that the matrices may be referenced as weight matrices or parameter matrices without ambiguity. However, it is noted that for problems with higher dimensional inputs (e.g. inputs with a tensor rank greater than or equal to 2), the weights of an RNN Block (510) may be contained in higher order tensors, rather than in matrices or vectors. For clarity, the present example will consider Inputs (520) as vectors such that the RNN Block (510) comprises one or more weight matrices and bias vectors, however, one with ordinary skill in the art will appreciate that this choice does not impose a limitation on the present disclosure. Typically, an RNN Block (510) has two weight matrices and a single bias vector which are distinguished with an arbitrary naming nomenclature. A commonly employed naming convention is to call one weight matrix W and the other U and to reference the bias vector as b⃗.
An important aspect of an RNN is that it is intended to process sequential, or ordered, data; for example, a time-series. Consequently, an Input (520) may be considered a single sequential part. As an illustration, consider a sequence composed of S parts. Each part may be considered an input, indexed by t, such that the sequence may be written as sequence=[input1, input2, . . . , inputt, . . . , inputS-1, inputS]. Each Input (520) (e.g., input1 of a sequence) may be a scalar, vector, matrix, or higher-order tensor. For the present example, as previously discussed, each Input (520) is considered a vector with m elements. To process a sequence, an RNN receives the first ordered Input (520) of the sequence, input1, along with a State (530), and processes them with the RNN Block (510) according to EQ. 7 to produce an Output (540). The Output (540) may be a scalar, vector, matrix, or tensor of any rank. For the present example, the Output (540) is considered a vector with n elements. The State (530) is of the same type and size as the Output (540) (e.g., a vector with n elements). For the first ordered input, the State (530) is usually initialized with all of its elements set to the value zero. For the second ordered Input (520), input2, of the sequence, the Input (520) is processed similarly according to EQ. 7, however, the State (530) received by the RNN Block (510) is set to the value of the Output (540) determined when processing the first ordered Input (520). This process of assigning the State (530) the value of the last produced Output (540) is depicted with the recurrent connection (550) in
In greater detail, the process of the RNN Block (510), or EQ. 7, may be generally written as
Output=RNN Block(input,state)=ƒ(U·state+W·input+b⃗), EQ 8

where W, U, and b⃗ are the weight matrices and bias vector of the RNN Block (510), respectively, and ƒ is an “activation function”. Some functions for ƒ may include the sigmoid function ƒ(x)=1/(1+exp(−x)) and the rectified linear unit (ReLU) function ƒ(x)=max(0, x), however, many additional functions are commonly employed.
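A minimal sketch of the RNN Block of EQ 8, assuming numpy and a hyperbolic tangent activation (the function names are illustrative only), is:

```python
import numpy as np

def rnn_block(inp, state, W, U, b):
    """One application of EQ 8: output = f(U.state + W.input + b)."""
    return np.tanh(U @ state + W @ inp + b)

def run_rnn(sequence, W, U, b):
    """Process an ordered sequence of inputs.

    The state starts at zero and, via the recurrent connection, is
    assigned the previous output before each subsequent input.
    """
    state = np.zeros(U.shape[0])
    outputs = []
    for inp in sequence:
        state = rnn_block(inp, state, W, U, b)
        outputs.append(state)
    return outputs
```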
As previously stated, generally, training a machine-learned model requires that pairs of inputs and one or more outputs are passed to the machine-learned model. During this process the machine-learned model “learns” a representative model which maps the received inputs to the associated outputs. In the context of an RNN, the RNN receives a sequence, wherein the sequence can be partitioned into one or more sequential parts (Inputs (520) above), and maps the sequence to an overall output, which may also be a sequence. To remove ambiguity and distinguish the overall output of an RNN from any intermediate Outputs (540) produced by the RNN Block (510), the overall output will be referred to herein as an RNN result. In other words, an RNN receives a sequence and returns an RNN result. The training procedure for an RNN comprises assigning values to the weight matrices and bias vector of the RNN Block (510). For brevity, the elements of the weight matrices and bias vector will be collectively referred to as the RNN weights. To begin training, the RNN weights are assigned initial values. These values may be assigned randomly, assigned according to a prescribed distribution, assigned manually, or by some other assignment mechanism. Once the RNN weights have been initialized, the RNN may act as a function, such that it may receive a sequence and produce an RNN result. As such, at least one sequence may be propagated through the RNN to produce an RNN result. For training, a given data set will be composed of one or more sequences and desired RNN results, where the desired RNN results represent the “ground truth”, or the true RNN results that should be returned for given sequences. For clarity, the desired or true RNN results will be referred to as “targets”. When processing sequences, the RNN result produced by the RNN is compared to the associated target.
The comparison of an RNN result to the target(s) is typically performed by a so-called “loss function”; although other names for this comparison function, such as “error function” and “cost function”, are commonly employed. Many types of loss functions are available, such as the mean squared error function, however, the general characteristic of a loss function is that the loss function provides a numerical evaluation of the similarity between the RNN result and the associated target(s). The loss function may also be constructed to impose additional constraints on the values assumed by the RNN weights, for example, by adding a penalty term, which may be physics-based, or a regularization term. Generally, the goal of a training procedure is to alter the RNN weights to promote similarity between the RNN results and associated targets over a provided data set, known as the “training data set” or “training set”. Thus, the loss function is used to guide changes made to the RNN weights, typically through a process called “backpropagation” or “backpropagation through time”.
While a full review of the backpropagation process exceeds the scope of this disclosure, a brief summary is provided. Backpropagation consists of computing the gradient of the loss function with respect to the RNN weights. The gradient indicates the direction of change in the RNN weights that results in the greatest change to the loss function. Because the gradient is local to the current RNN weights, the RNN weights are typically updated by a “step” in the direction opposite the gradient, so as to reduce the loss. The step size is often referred to as the “learning rate” and need not remain fixed during the training process. Additionally, the step size and direction may be informed by previously seen RNN weights or previously computed gradients. Such methods for determining the step direction are usually referred to as “momentum” based methods.
Once the RNN weights have been updated, or altered from their initial values, through a backpropagation step, the RNN will likely produce different RNN results. Thus, the procedure of propagating at least one sequence through the RNN, comparing the RNN result with the associated target(s) with a loss function, computing the gradient of the loss function with respect to the RNN weights, and updating the RNN weights with a step guided by the gradient, is repeated until a termination criterion is reached. Common termination criteria are: reaching a fixed number of RNN weight updates, otherwise known as an iteration counter; a diminishing learning rate; noting no appreciable change in the loss function between iterations; reaching a specified performance metric as evaluated on the data or a separate hold-out data set (e.g., a “validation” or “test” data set composed of sequence and target pairs not used during training). Once the termination criterion is satisfied, and the RNN weights are no longer intended to be altered, the RNN is said to be “trained”.
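The iterative procedure above may be sketched generically as follows; the function signature, learning rate, and termination thresholds are illustrative assumptions (in practice the gradient is obtained by backpropagation rather than supplied as a separate function):

```python
def train(weights, loss_fn, grad_fn, lr=0.01, max_iters=1000, tol=1e-8):
    """Gradient-descent training loop with two termination criteria:
    an iteration counter and no appreciable change in the loss."""
    prev_loss = loss_fn(weights)
    for _ in range(max_iters):
        # Step opposite the gradient to reduce the loss; lr is the
        # learning rate (the step size).
        weights = weights - lr * grad_fn(weights)
        loss = loss_fn(weights)
        if abs(prev_loss - loss) < tol:  # no appreciable change
            break
        prev_loss = loss
    return weights
```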
A long short-term memory (LSTM) network may be considered a specific, and more complex, instance of a recurrent neural network (RNN).
(outputt,carryt)=LSTM Block(inputt,carryt-1,statet)=LSTM Block(inputt,carryt-1,outputt-1), EQ 9
where the LSTM Block, like the RNN Block, comprises one or more weight matrices and bias vectors and the processing steps necessary to transform an input, state, and carry to an output and new carry.
LSTMs may be configured in a variety of ways, however, the processes depicted in
ƒ⃗=a1(Uƒ·statet+Wƒ·inputt+b⃗ƒ),

where a1 is an activation function applied elementwise to the result of the parenthetical expression and the resulting vector is ƒ⃗. Block 565 implements the following operation

i⃗=a2(Ui·statet+Wi·inputt+b⃗i),

where a2 is an activation function which may be the same as or different from a1 and is applied elementwise to the result of the parenthetical expression. The resulting vector is i⃗. Block 570 implements the following operation

c⃗=a3(Uc·statet+Wc·inputt+b⃗c),

where a3 is an activation function which may be the same as or different from either a1 or a2 and is applied elementwise to the result of the parenthetical expression. The resulting vector is c⃗. In block 575, vectors i⃗ and c⃗ are multiplied according to

z⃗3=i⃗⊙c⃗,

where ⊙ indicates the Hadamard product (i.e., elementwise multiplication). Likewise, in block 585 the carry vector from the previous sequential input (carryt-1) and the vector ƒ⃗ are multiplied according to

z⃗4=carryt-1⊙ƒ⃗.

The results of the operations of blocks 575 and 585 (z⃗3 and z⃗4, respectively) are added together in block 580 to form the new carry (carryt);

carryt=z⃗3+z⃗4.

In block 590, the current input and state vectors are processed according to

o⃗=a4(Uo·statet+Wo·inputt+b⃗o),

where a4 is an activation function which may be unique or identical to any other used activation function and is applied elementwise to the result of the parenthetical expression. The result is the vector o⃗. In block 595, the new carry (carryt) is passed through an activation function a5. The activation a5 is usually the hyperbolic tangent function but may be any known activation function. The operations of block 595 may be represented as

z⃗5=a5(carryt).

Finally, the output of the LSTM Block (outputt) is determined in block 598 by taking the Hadamard product of z⃗5 and o⃗, shown as

outputt=z⃗5⊙o⃗.
The output of the LSTM Block is used as the state vector for the subsequent input. Again, as in the case of the RNN, the outputs of the LSTM Block applied to a sequence of inputs may be stored and further processed or, in some implementations, only the final output is retained. While the processes of the LSTM Block described above used vector inputs and outputs, it is emphasized that an LSTM network may be applied to sequences of any dimensionality. In these circumstances the rank and size of the weight tensors will change accordingly. One with ordinary skill in the art will recognize that there are many alterations and variations that can be made to the general LSTM structure described herein, such that the description provided does not impose a limitation on the present disclosure.
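For illustration, the operations of blocks 560 through 598 may be sketched as follows; numpy is assumed, the parameter packing is hypothetical, and the conventional sigmoid and hyperbolic tangent gate activations stand in for a1 through a5:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_block(inp, carry_prev, state, params):
    """One LSTM step: forget, input, candidate, and output gates."""
    Uf, Wf, bf, Ui, Wi, bi, Uc, Wc, bc, Uo, Wo, bo = params
    f = sigmoid(Uf @ state + Wf @ inp + bf)   # block 560
    i = sigmoid(Ui @ state + Wi @ inp + bi)   # block 565
    c = np.tanh(Uc @ state + Wc @ inp + bc)   # block 570
    carry = carry_prev * f + i * c            # blocks 585, 575, 580
    o = sigmoid(Uo @ state + Wo @ inp + bo)   # block 590
    output = np.tanh(carry) * o               # blocks 595, 598
    return output, carry
```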
In accordance with one or more embodiments, the RNN result, or the final result of an LSTM, may be further processed with a neural network. A diagram of a neural network is shown in
Nodes (602) and edges (604) carry additional associations. Namely, every edge is associated with a numerical value. The edge numerical values, or even the edges (604) themselves, are often referred to as “weights” or “parameters” and are analogous to the weights of a RNN. While training a neural network (600), numerical values are assigned to each edge (604). Additionally, every node (602) is associated with a numerical variable and an activation function. Activation functions are not limited to any functional class, but traditionally follow the form
A=ƒ(Σi(ai·wi)),

where i is an index that spans the set of “incoming” nodes (602) and edges (604), ai and wi denote the value of the i-th incoming node and its associated edge weight, respectively, and ƒ is a user-defined function. Incoming nodes (602) are those that, when viewed as a graph, have directed connections pointing toward the node whose value is being computed. Some functions for ƒ may include the sigmoid function ƒ(x)=1/(1+exp(−x)) and the rectified linear unit (ReLU) function ƒ(x)=max(0, x), however, many additional functions are commonly employed. Every node (602) in a neural network (600) may have a different associated activation function. Often, as a shorthand, activation functions are described by the function ƒ by which it is composed. That is, an activation function composed of a linear function ƒ may simply be referred to as a linear activation function without undue ambiguity.
When the neural network (600) receives a network input (e.g., the final output of an LSTM), the network input is propagated through the network according to the activation functions and incoming node (602) values and edge (604) values to compute a value for each node (602). That is, the numerical value for each node (602) may change for each received input. Occasionally, nodes (602) are assigned fixed numerical values, such as the value of 1, that are not affected by the input or altered according to edge (604) values and activation functions. Fixed nodes (602) are often referred to as “biases” or “bias nodes” (606), displayed in
In some implementations, the neural network (600) may contain specialized layers (605), such as a normalization layer, or additional connection procedures, like concatenation. One skilled in the art will appreciate that these alterations do not exceed the scope of this disclosure.
As noted, the training procedure for the neural network (600) comprises assigning values to the edges (604). The training procedure for the neural network (600) is substantially similar to the training process for an RNN (or LSTM), where initial values are assigned to the edges (604) and these values are updated via backpropagation according to a loss function. When a neural network (600) receives as a network input the RNN result (or final output of an LSTM), the neural network (600) is often considered part of the RNN (or LSTM). In other words, an RNN (or LSTM) may include a neural network (600). It is noted that when an RNN (or LSTM) includes a neural network (600), the RNN weights and edge (604) values are learned together through a joint training process. A machine-learned model may be composed of both an RNN (e.g., an LSTM) and a neural network (600) and this machine-learned model may be referenced simply as an RNN (or LSTM) with implicit inclusion of the neural network (600).
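As a purely illustrative sketch (the function names are hypothetical), propagating a network input through a stack of fully connected layers, where each node value is an activation of a weighted sum of incoming node values, may be written as:

```python
import numpy as np

def dense_layer(values, weights, bias, activation):
    """Compute one layer of node values: each node applies its
    activation function to a weighted sum of incoming node values."""
    return activation(weights @ values + bias)

def forward(network_input, layers):
    """Propagate a network input (e.g., an RNN result) through the
    layers of a neural network to produce the network output."""
    values = network_input
    for weights, bias, activation in layers:
        values = dense_layer(values, weights, bias, activation)
    return values
```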
In accordance with one or more embodiments,
A machine-learned model is trained using the first plurality of velocity models and the first plurality of transformed seismic data sets, as shown in block 710. Training the machine-learned model may encompass splitting the seismic data set and velocity model pairs into training, validation, and test sets. In accordance with one or more embodiments, the machine-learned model is trained using the training set and the hyperparameters of the machine-learned model are tuned by evaluating the machine-learned model on the validation set. Further, the generalization performance of the machine-learned model may be estimated by evaluating the model on the test set. In some implementations, the validation set and test set are the same. Further, one with ordinary skill in the art will appreciate that other common training procedures and techniques, such as cross-validation, may be employed without exceeding the scope of the present disclosure. In accordance with one or more embodiments, the seismic data set and velocity model pairs are split into training, validation, and test sets such that there is a balanced representation of velocity models between each respective set. Sets of velocity models may be compared for similarity through statistical descriptors such as the distribution (mean, standard deviation) of the velocity models contained within a set.
Keeping with
While the various blocks in
v=m(d) or v=m(t),  (EQ 10)
where v is the scalar velocity (isotropic) which may be related to either a depth d in the subsurface region of interest or a time t (e.g., converted from d via depth-to-time conversion). Upon receiving an initial velocity model, a first plurality of velocity models is generated through perturbations.
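The perturbation step may be sketched, under assumptions not stated in the disclosure, as multiplying the initial velocity profile by smoothed random noise; the smoothing keeps each perturbed model geologically plausible. The function name and noise parameters here are hypothetical.

```python
import numpy as np

def perturb_velocity_model(v0, n_models=10, smooth=5, scale=0.05, seed=0):
    """Generate a plurality of velocity models by perturbing an initial model.

    v0: 1-D velocity-versus-depth (or time) profile, e.g., in m/s.
    Each perturbation scales the profile by smoothed random noise.
    """
    rng = np.random.default_rng(seed)
    kernel = np.ones(smooth) / smooth            # simple moving-average smoother
    perturbed = []
    for _ in range(n_models):
        noise = np.convolve(rng.standard_normal(v0.size), kernel, mode="same")
        perturbed.append(v0 * (1.0 + scale * noise))
    return np.stack(perturbed)

v0 = np.linspace(1500.0, 4500.0, 100)  # initial model: velocity increasing with depth
models = perturb_velocity_model(v0)
```

In the characterization test described below, 2000 such perturbed models form the first plurality of velocity models.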
Using the forward modelling process described above, the first plurality of velocity models (
Given that four survey configurations are used in the present characterization test, four pluralities of seismic data sets are created through the forward modelling process; namely, a first, second, third, and fourth plurality of seismic data sets each corresponding to the first plurality of velocity models (2000 perturbed models) and the four survey configurations, respectively. It is emphasized that in practice only a single survey configuration (321) is required and that the use of four configurations herein is to demonstrate the robustness of the DL-based framework (300).
Turning to
For the present example, the first, second, third, and fourth pluralities of seismic data sets are used to train four machine-learned models; one per plurality of seismic data sets. For example, the first plurality of seismic data sets is used with the first plurality of velocity models to train a first machine-learned model. Likewise, the second plurality of seismic data sets is used with the first plurality of velocity models to train a second machine-learned model. Because these models accept and act on untransformed seismic data sets, and therefore do not follow the DL-based framework (300) described herein, they may be considered benchmark models (312). Further, for the purposes of the characterization test, their performance may be used as a useful baseline for models developed under the DL-based framework (300).
In a similar fashion, four additional machine-learned models are trained using the first, second, third, and fourth pluralities of transformed seismic data sets. For example, the first plurality of transformed seismic data sets is used with the first plurality of velocity models to train a fifth machine-learned model. In total, eight machine-learned models are trained, one for each plurality of seismic data sets and each plurality of transformed seismic data sets. It is emphasized that in practice eight machine-learned models would not be trained and that these models are developed in the present example only to enable comparison and subsequently characterize the performance and robustness of models developed under the DL-based framework (300).
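The disclosure does not specify the exact implementation of the wavenumber-time transform, but a plausible sketch is a spatial FFT applied over the receiver axis of each shot gather, leaving the time axis untouched. Taking the magnitude spectrum, as done here, is an assumption for illustration; an embodiment may equally retain complex values.

```python
import numpy as np

def to_wavenumber_time(gather, dx=12.5):
    """Transform a shot gather from the space-time (x, t) domain to the
    wavenumber-time (k, t) domain via an FFT over the receiver axis.

    gather: 2-D array of shape (n_receivers, n_time_samples).
    Returns the magnitude spectrum and the wavenumber axis (cycles/m).
    """
    spectrum = np.fft.fft(gather, axis=0)       # spatial FFT; time axis untouched
    k = np.fft.fftfreq(gather.shape[0], d=dx)   # wavenumber for each spectrum row
    return np.abs(spectrum), k

gather = np.random.default_rng(0).standard_normal((64, 256))
transformed, k = to_wavenumber_time(gather)
```

Working in the wavenumber-time domain is what allows a single trained model to generalize across survey configurations, since the transform reduces sensitivity to the specific receiver layout.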
For training the eight machine-learned models, the loss function employed is
L=∥y−ŷ∥p+α∥∇ŷ∥p+β∥ŷ∥p,  (EQ 11)
where y is the true (or target) velocity model, ŷ is the predicted velocity model determined by a machine-learned model, and the ∥·∥p operator indicates a mathematical norm of order p, where p is a hyperparameter. The term ∥y−ŷ∥p quantifies the difference, or error, between the predicted velocity model and the true velocity model. The expression ∥∇ŷ∥p quantifies the gradient of the predicted velocity model. Predicted velocity models with abrupt changes in velocity through the depth of the subsurface region of interest will result in a relatively large value for ∥∇ŷ∥p. Likewise, ∥ŷ∥p quantifies the overall magnitude of the velocities throughout the depth of the subsurface region of interest as predicted by the machine-learned model. Because, conventionally, loss functions are sought to be minimized, the latter two terms of EQ. 11 act as regularization terms where predicted velocity models with large gradients or large velocity values are penalized. α and β are hyperparameters and their values indicate the regularization strength of their associated terms. For the present example, the following values were used for the hyperparameters: p=1, α=1e−4 and β=1e−4.
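EQ 11 may be sketched directly in code; the sketch below assumes 1-D velocity profiles and uses a finite-difference gradient, which the disclosure does not specify.

```python
import numpy as np

def velocity_loss(y, y_hat, p=1, alpha=1e-4, beta=1e-4):
    """EQ 11: L = ||y - y_hat||_p + alpha*||grad(y_hat)||_p + beta*||y_hat||_p."""
    def p_norm(x):
        return np.sum(np.abs(x) ** p) ** (1.0 / p)
    misfit = p_norm(y - y_hat)                 # error against the true model
    smoothness = p_norm(np.gradient(y_hat))    # penalizes abrupt velocity changes
    magnitude = p_norm(y_hat)                  # penalizes large overall velocities
    return misfit + alpha * smoothness + beta * magnitude

y = np.full(100, 2000.0)            # constant true model, 2000 m/s
loss = velocity_loss(y, y.copy())   # perfect prediction: only the beta term remains
```

With a perfect prediction of a constant model, the misfit and gradient terms vanish and the loss reduces to β∥ŷ∥p, illustrating how the regularization terms remain active even at zero error.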
As shown in
The plots
Additional MAE plots may be constructed for other combinations of training-test configurations. For example, training a machine-learned model on seismic data sets simulated under configuration (b) and testing the model's accuracy on seismic data sets generated with the remaining configurations. For brevity, these plots are not shown; however, the results are similar to those of
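The MAE profiles underlying such plots can be computed, for example, as the mean absolute error at each depth (or time) sample averaged over all velocity models in a test set. This is a sketch; the averaging convention is assumed rather than stated in the disclosure.

```python
import numpy as np

def mae_profile(true_models, predicted_models):
    """Mean absolute error per depth (or time) sample, averaged over a test set.

    Both inputs: arrays of shape (n_models, n_samples).
    """
    true_models = np.asarray(true_models)
    predicted_models = np.asarray(predicted_models)
    return np.mean(np.abs(true_models - predicted_models), axis=0)

true_models = np.full((5, 50), 2500.0)       # five identical test models
predicted_models = true_models + 10.0        # uniform 10 m/s prediction error
errors = mae_profile(true_models, predicted_models)
```

Plotting the resulting profile against depth shows where in the subsurface a trained model is most accurate, which is how the training-test configuration combinations are compared.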
As stated, the examples of
Embodiments of the present disclosure may provide at least one of the following advantages. In accordance with one or more embodiments, the DL-based framework (300) described herein produces a machine-learned model that may determine a velocity model from seismic data. The machine-learned model is robust and may generalize to seismic data acquired using survey configurations that differ from the survey configurations (321) used to simulate the training data. As such, a single machine-learned model may generalize to many seismic data sets (of the same data type (streamer data, etc.)) without needing to re-train, or otherwise tailor, the machine-learned model to a specific survey configuration (321). This represents a significant reduction in cost, in terms of both time and computational resources, to produce one or more accurate machine-learned models.
The computer (1402) can serve in a role as a client, network component, a server, a database or other persistency, or any other component (or a combination of roles) of a computer system for performing the subject matter described in the instant disclosure. In some implementations, one or more components of the computer (1402) may be configured to operate within environments, including cloud-computing-based, local, global, or other environment (or a combination of environments).
At a high level, the computer (1402) is an electronic computing device operable to receive, transmit, process, store, or manage data and information associated with the described subject matter. According to some implementations, the computer (1402) may also include or be communicably coupled with an application server, e-mail server, web server, caching server, streaming data server, business intelligence (BI) server, or other server (or a combination of servers).
The computer (1402) can receive requests over network (1430) from a client application (for example, an application executing on another computer (1402)) and respond to the received requests by processing said requests in an appropriate software application. In addition, requests may also be sent to the computer (1402) from internal users (for example, from a command console or by other appropriate access method), external or third parties, other automated applications, as well as any other appropriate entities, individuals, systems, or computers.
Each of the components of the computer (1402) can communicate using a system bus (1403). In some implementations, any or all of the components of the computer (1402), both hardware and software (or a combination of hardware and software), may interface with each other or the interface (1404) (or a combination of both) over the system bus (1403) using an application programming interface (API) (1412) or a service layer (1413) (or a combination of the API (1412) and service layer (1413)). The API (1412) may include specifications for routines, data structures, and object classes. The API (1412) may be either computer-language independent or dependent and refer to a complete interface, a single function, or even a set of APIs. The service layer (1413) provides software services to the computer (1402) or other components (whether or not illustrated) that are communicably coupled to the computer (1402). The functionality of the computer (1402) may be accessible to all service consumers using this service layer. Software services, such as those provided by the service layer (1413), provide reusable, defined business functionalities through a defined interface. For example, the interface may be software written in JAVA, C++, or other suitable language providing data in extensible markup language (XML) format or another suitable format. While illustrated as an integrated component of the computer (1402), alternative implementations may illustrate the API (1412) or the service layer (1413) as stand-alone components in relation to other components of the computer (1402) or other components (whether or not illustrated) that are communicably coupled to the computer (1402). Moreover, any or all parts of the API (1412) or the service layer (1413) may be implemented as child or sub-modules of another software module, enterprise application, or hardware module without departing from the scope of this disclosure.
The computer (1402) includes an interface (1404). Although illustrated as a single interface (1404) in
The computer (1402) includes at least one computer processor (1405). Although illustrated as a single computer processor (1405) in
The computer (1402) also includes a memory (1406) that holds data for the computer (1402) or other components (or a combination of both) that can be connected to the network (1430). The memory may be a non-transitory computer readable medium. For example, memory (1406) can be a database storing data consistent with this disclosure. Although illustrated as a single memory (1406) in
The application (1407) is an algorithmic software engine providing functionality according to particular needs, desires, or particular implementations of the computer (1402), particularly with respect to functionality described in this disclosure. For example, application (1407) can serve as one or more components, modules, applications, etc. Further, although illustrated as a single application (1407), the application (1407) may be implemented as multiple applications (1407) on the computer (1402). In addition, although illustrated as integral to the computer (1402), in alternative implementations, the application (1407) can be external to the computer (1402).
There may be any number of computers (1402) associated with, or external to, a computer system containing computer (1402), wherein each computer (1402) communicates over network (1430). Further, the terms “client,” “user,” and other appropriate terminology may be used interchangeably as appropriate without departing from the scope of this disclosure. Moreover, this disclosure contemplates that many users may use one computer (1402), or that one user may use multiple computers (1402).
Although only a few example embodiments have been described in detail above, those skilled in the art will readily appreciate that many modifications are possible in the example embodiments without materially departing from this invention. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the following claims. In the claims, means-plus-function clauses are intended to cover the structures described herein as performing the recited function and not only structural equivalents, but also equivalent structures. Thus, although a nail and a screw may not be structural equivalents in that a nail employs a cylindrical surface to secure wooden parts together, whereas a screw employs a helical surface, in the environment of fastening wooden parts, a nail and a screw may be equivalent structures. It is the express intention of the applicant not to invoke 35 U.S.C. § 112(f) for any limitations of any of the claims herein, except for those in which the claim expressly uses the words ‘means for’ together with an associated function.