Embodiments of the subject matter disclosed herein generally relate to a system and method for processing seismic data to extract information related to a formation in the subsurface, and more particularly, to identifying and/or characterizing subsurface resources for exploitation based on Full Waveform Inversion (FWI) performed in parallel on plural graphics processing units of a computing system.
FWI is a powerful computational technique widely employed in various fields, for example, seismic data analysis for determining the location of subsurface resources, of a subsurface formation that is appropriate for the storage of carbon, or of a solid subsurface formation that is appropriate for supporting a wind turbine. In seismic exploration, FWI plays a pivotal role in characterizing subsurface geological structures with unparalleled precision. By comparing observed seismic waveforms with simulated waveforms generated from theoretical models, FWI enables geophysicists to infer detailed information about subsurface properties such as velocity, density, and impedance. This information is invaluable in the oil and gas industry for optimizing reservoir management, reducing exploration risks, and improving hydrocarbon recovery. Beyond geophysics, FWI finds application in diverse scientific domains, including medical imaging, non-destructive testing, and materials science. Its ability to reconstruct high-resolution images or models by iteratively minimizing the misfit between observed and predicted data makes it a versatile tool for enhancing our understanding of complex systems and advancing various scientific endeavors.
Hydrocarbon exploration and development uses waves (e.g., seismic waves or electromagnetic waves) to explore the structure of underground formations on land and/or at sea (i.e., formations under the seafloor). As schematically illustrated in
In order to understand the structure of the explored underground formation (layers 121, 123, 125, and 127 and interfaces 112, 122, 124, and 126 in the specific example of
The misfit is commonly defined by the cost or objective function J, which measures, for example, the least-squares difference between the observed data d and the synthetic data u (i.e., data calculated, for example, as discussed in [1] or [2]). The synthetic data u may be obtained from the reflectivity model r. FWI is an iterative approach requiring an a priori initial model, which is then repeatedly updated via an inversion algorithm. In an ideal case, the updated velocity model V converges to the true model representing the observed data d. An exemplary flow of a conventional FWI process is presented in
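For concreteness, a least-squares misfit of the kind referenced above may be written (one common, non-limiting form) as

$$J(V) = \frac{1}{2}\sum_{s,r,t}\big(u_{s,r}(t;V) - d_{s,r}(t)\big)^2,$$

where the sum runs over the sources s, the receivers r, and the recorded time samples t.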
The inversion process is computationally very demanding on the corresponding compute device and thus, it is of great interest to identify the most effective compute device for FWI. Traditionally, powerful computers (or supercomputers) have relied on powerful Central Processing Units (CPUs) with multiple cores and high processing frequencies. Recently, Graphics Processing Unit (GPU) accelerators have proved to be much more effective than CPUs for wave propagation, which is the most compute-intensive part of FWI. For this reason, GPUs are becoming a preferred compute device for FWI algorithms.
Acoustic FWI has proved effective for automatically building a velocity model and improving the sub-surface imaging for different data types and geologic settings [3]. However, the elasticity missing from acoustic FWI may produce non-negligible errors in current FWI results. The authors in [4] use synthetic examples to show that, even in marine settings, accounting for elastic effects is important. Therefore, elastic FWI is gradually becoming the method of choice for automatic velocity model building.
However, the compute cost and memory requirements of the elastic seismic wave simulation needed for elastic FWI are much more demanding than those of acoustic FWI. Therefore, there is a need for an elastic FWI algorithm that is GPU based and can overcome the memory limitation of a single GPU card when simulating the seismic wave propagation.
According to an embodiment, there is a method for imaging a formation of a subsurface, and the method includes receiving input data d related to the subsurface, generating synthetic data u related to the subsurface, by applying an implicit finite-difference approach to a reflectivity model r, updating a velocity model V based on the input data d and the synthetic data u, and generating an image of the formation in the subsurface based on the updated velocity model, wherein the formation is used to locate natural resources.
According to another embodiment, there is a computing system for imaging a formation of a subsurface, and the computing system includes plural compute nodes, each compute node including an interface configured to receive input data d related to the subsurface and plural graphics processing units (GPUs) connected to the interface. The GPUs are configured to generate synthetic data u related to the subsurface, by applying an implicit finite-difference approach to a reflectivity model r, update a velocity model V based on the input data d and the synthetic data u, and generate an image of the formation in the subsurface based on the updated velocity model, wherein the formation is used to locate natural resources.
According to yet another embodiment, there is a non-transitory computer readable medium including computer executable instructions, where the instructions, when executed by one or more graphics processing units, implement the methods discussed herein.
For a more complete understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
The following description of the embodiments refers to the accompanying drawings. The same reference numbers in different drawings identify the same or similar elements. The following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims. The following embodiments are discussed with regard to an elastic FWI process run on plural GPUs via domain decomposition. However, the embodiments to be discussed next are not limited to seismic data, but may be applied to other types of data, for example, electromagnetic wave data or acoustic data. Also, the embodiments discussed herein are not limited to applying the elastic FWI processing to the plural GPUs for detecting a formation in the subsurface, but may be applied to an elastic FWI processing in other fields, for example, medical imaging.
Reference throughout the specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with an embodiment is included in at least one embodiment of the subject matter disclosed. Thus, the appearance of the phrases “in one embodiment” or “in an embodiment” in various places throughout the specification is not necessarily referring to the same embodiment. Further, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
The FWI method uses forward modeling for calculating/estimating the seismic waves at a given point of interest in the subsurface, based on the known seismic waves generated by one or more sources, which are typically located at the earth's surface. Forward modeling is the method that solves the partial differential equations governing seismic wave simulation. It is used to generate the synthetic dataset u, and is the main engine of the FWI application. FWI typically runs on a computing system 300 that includes multiple compute nodes 310, as illustrated in
Because a seismic source generates seismic waves periodically, for example, every couple of seconds or any other time interval, sometimes for days if not weeks during a seismic survey campaign, a large number of waves is generated and all these waves need to be propagated (mathematically, i.e., through calculations) into the subsurface and reflected from many interfaces. Thus, wave simulation for each shot gather is performed through a numerical implementation such as the finite-difference (FD) method, which discretizes the wavefield in the subsurface offset domain. An example of the discretization of the subsurface offset domain 400 is shown in
The compute node 310 memory requirement for such wave simulation is proportional to the modeling size, i.e., Nx×Ny×Nz. The maximum frequency in FWI is therefore determined by the GPU memory capacity in one compute node. As higher-frequency FWI with more accurate physics, such as elastic FWI, gains popularity, the memory requirement of seismic wave modeling can exceed the GPU memory capacity of a single compute node. In such cases, it is not possible to perform the wave simulation using a single compute node 310. For high-frequency FWI, the modeling size is generally in the range of 1,000³ ≤ Nx×Ny×Nz ≤ 2,000³. Note that the term “high frequency” refers to the focus on capturing and analyzing high-frequency components of the seismic data. Seismic waves consist of various frequencies, and high-frequency waves can provide finer details about the subsurface than low-frequency waves. High-frequency FWI emphasizes the use of these higher-frequency components to improve the resolution and accuracy of subsurface imaging.
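By way of a non-limiting illustration, the following short sketch estimates why such grids exceed a single GPU; the field count of 12 resident 3D arrays and the 4-byte (single-precision) samples are assumptions for the example, not values taken from the embodiments.

```python
def wavefield_memory_gb(nx, ny, nz, n_fields=12, bytes_per_sample=4):
    """Rough single-precision memory estimate for one modeling grid;
    n_fields is an assumed count of resident 3D arrays (particle
    velocities, stresses, model parameters, etc.), which depends on
    the chosen elastic formulation."""
    return nx * ny * nz * n_fields * bytes_per_sample / 1e9

# A 1,600^3 grid, inside the quoted high-frequency range, far exceeds
# the memory of a single current GPU (typically tens of GB):
print(wavefield_memory_gb(1600, 1600, 1600))  # ~196.6 GB
```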
To overcome the memory limitation faced by a single compute node, multiple compute nodes can be used for the FWI wavefield simulation via domain decomposition. The whole subsurface offset domain (corresponding to a volume of the surveyed subsurface) is decomposed into multiple subdomains, such that the memory required for each subdomain fits into a single compute node. For example, the decomposition process is illustrated in
The FWI calculations can then be performed in parallel for the subdomains 510 and 520, each compute node calculating the wavefields in a corresponding subdomain. In this way, no matter how large the whole domain is, it can be decomposed into as many subdomains as necessary so that each subdomain can be handled by one compute node. To ensure the accuracy of the finite-difference method, sufficient overlapping areas 512 and 522 are needed between neighboring subdomains 510 and 520, respectively. The width of an overlapping area 512/522 is defined by the stencil size, which is discussed in detail later. The overlapping area is transferred across the GPU nodes via the network switch 320. This action can add significant overhead to the simulation process because the network bandwidth is often multiple orders of magnitude smaller than the GPU bandwidth.
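For illustration only, the following simplified sketch shows how a modeling volume could be decomposed into subdomains whose overlap (halo) width equals the stencil size; the function name and the array sizes are hypothetical, and a production implementation would allocate each subdomain on a different compute node and exchange only the halos.

```python
import numpy as np

def split_with_halo(field, n_sub, halo, axis=0):
    """Split `field` into n_sub subdomains along `axis`, each padded
    with `halo` overlap cells (the stencil size) at interior
    boundaries; edges of the whole domain receive no padding."""
    n = field.shape[axis]
    step = n // n_sub
    parts = []
    for k in range(n_sub):
        lo = max(k * step - halo, 0)
        hi = min((k + 1) * step + halo, n)
        idx = [slice(None)] * field.ndim
        idx[axis] = slice(lo, hi)
        parts.append(field[tuple(idx)].copy())
    return parts

vol = np.zeros((1600, 400, 400), dtype=np.float32)
parts = split_with_halo(vol, n_sub=2, halo=7, axis=0)  # 7-point implicit stencil
print([p.shape for p in parts])  # [(807, 400, 400), (807, 400, 400)]
```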
Domain decomposition is an approach that overcomes the memory limitation of a single compute node by utilizing multiple compute nodes for one wavefield simulation. Although domain decomposition is already used in the field, it still has challenges that prevent the method from being feasible and affordable, see, for example, the discussion in [5]. The most challenging issue for the existing approach is the extreme imbalance between the GPU bandwidth and the network bandwidth. In this regard, the network bandwidth is currently in the range of 10 Gbit/sec (1.25 GByte/sec) to 200 Gbit/sec (25 GByte/sec), which is much slower than the GPU bandwidth, which is in the range of 1,000-8,000 GByte/sec. This large difference in bandwidth makes the data exchange prohibitively expensive compared to the GPU computation cost and thus, makes the domain decomposition approach less attractive.
The inventors have found that reducing the amount of data exchanged between the compute nodes balances the data exchange across the network against the data exchange within each compute node. According to an embodiment, such a method uses implicit finite-differences for solving the differential equations for the domain 400 in
More specifically, the finite-difference method solves differential equations by approximating the derivatives with finite differences. Mathematically, the first-order derivative of a quantity P, for example, the pressure, with respect to the spatial variable x, can be approximated using the explicit finite-difference scheme as follows:

$$\frac{\partial P}{\partial x}\bigg|_{x} \approx \frac{1}{\Delta x}\sum_{j=-N}^{N} a_j\, P(x + j\,\Delta x), \qquad (1)$$

where $a_j$, with −N≤j≤N, is an explicit finite-difference coefficient, Δx is the grid spacing, and N, known as the stencil size, represents the order of accuracy. The explicit finite-difference scheme is widely used for wave simulation because of its straightforward implementation, see, for example, [6]. However, this approach requires a large stencil size to obtain a high order of accuracy and therefore, it is not a good candidate for domain decomposition.
The implicit finite-difference scheme is another approach that produces high accuracy with a smaller stencil size, see, for example, [7]. Its mathematical formulation can be expressed as follows:

$$\sum_{j=-M}^{M} b_j\, \frac{\partial P}{\partial x}\bigg|_{x + j\,\Delta x} = \frac{1}{\Delta x}\sum_{j=-N}^{N} a_j\, P(x + j\,\Delta x), \qquad (2)$$

where $b_j$, with −M≤j≤M, is an implicit finite-difference coefficient and M+N represents the total stencil size. Note that the explicit finite difference of equation (1) is a reduced form of the implicit finite difference of equation (2) when M=0 and $b_0$=1.
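As a minimal, non-limiting sketch of how an implicit (compact) derivative of the form of equation (2) is evaluated, the routine below implements the classical fourth-order Padé scheme with M=N=1 (coefficients $b_{\pm 1}$=1/4, $a_{\pm 1}$=±3/4); the optimized coefficients of the embodiments are not reproduced here. Each derivative evaluation solves a tridiagonal system, which is how a small stencil attains high accuracy.

```python
import numpy as np
from scipy.linalg import solve_banded

def compact_first_derivative(f, dx):
    """Fourth-order compact (Pade) first derivative:
       (1/4) f'_{i-1} + f'_i + (1/4) f'_{i+1} = (3/4)(f_{i+1} - f_{i-1}) / dx
    in the interior, with simple first-order one-sided closures at the
    two boundary points to keep the sketch short."""
    n = len(f)
    rhs = np.empty(n)
    rhs[1:-1] = 0.75 * (f[2:] - f[:-2]) / dx
    rhs[0] = (f[1] - f[0]) / dx        # boundary closure
    rhs[-1] = (f[-1] - f[-2]) / dx     # boundary closure
    ab = np.zeros((3, n))              # banded matrix for solve_banded
    ab[1, :] = 1.0                     # main diagonal
    ab[0, 2:] = 0.25                   # superdiagonal (interior rows)
    ab[2, :-2] = 0.25                  # subdiagonal (interior rows)
    return solve_banded((1, 1), ab, rhs)

x = np.linspace(0.0, 2.0 * np.pi, 201)
df = compact_first_derivative(np.sin(x), x[1] - x[0])
print(np.max(np.abs(df - np.cos(x))))  # small; dominated by boundary closures
```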
In addition, the direction with the greatest number of grid points is chosen as the domain splitting direction to minimize the overall data exchange. In one embodiment, it is possible to parallelize network communication and GPU computation as they can be performed independently. Therefore, the total running time is that of the slower process, as opposed to the sum of both processes.
The method of domain decomposition using the implicit finite-difference scheme is now discussed in more detail with regard to
The first novel scheme is related to using an optimized implicit finite-difference scheme to reduce the overlapping area. The explicit finite-difference scheme is widely used for wave equation simulation because of its straightforward implementation (see [6]). However, this approach requires a large stencil size to obtain high accuracy. Therefore, the data exchange for the overlapping area (which is proportional to the stencil size) is substantially large.
In this regard, note that the connection between the stencil size and the overlapping area of two subdomains in domain decomposition relates to the communication requirements and the numerical accuracy of a parallel simulation. The stencil size determines how many neighboring grid points or nodes (in
In domain decomposition, when a computational domain is divided into smaller subdomains for parallel processing, an overlapping area 512/522 is introduced at the boundaries of adjacent subdomains 510 and 520. This overlapping area 512/522 is used to exchange information between neighboring subdomains to ensure that boundary conditions and interactions between subdomains are properly handled.
Thus, the choice of stencil size can affect the communication requirements between subdomains in a domain decomposition approach. Specifically, a larger stencil size can require more neighboring grid points to be accessed for computation, potentially leading to more data exchange between subdomains. The stencil size may influence the number of cells needed in the overlapping area. A larger stencil often requires a wider overlapping region to accommodate the data needed for computations. There is a trade-off between stencil size and communication overhead. While a larger stencil may improve numerical accuracy, it can also increase the amount of data that needs to be exchanged between subdomains during parallel computations. This can lead to increased communication costs and reduced parallel efficiency.
The specific choice of stencil size and overlapping area depends on the nature of the problem, the available computational resources, and the goals of the parallel simulation. Achieving the right balance is often a part of the optimization process when parallelizing numerical simulations to ensure both accuracy and efficiency in a parallel computing environment.
In this embodiment, to reduce the data exchange cost, according to this first novel scheme, an optimized implicit finite-difference approach is used. The stencil size for an implicit finite-difference scheme can generally be 3-5 times smaller than that of an explicit finite-difference scheme of the same accuracy. For this implementation, the modeling grid spacing requires that the spatial derivatives be calculated accurately up to 92% of the maximum frequency. At this accuracy level, the stencil size of the explicit finite-difference scheme requires 31 grid points for data exchange whereas the proposed implicit finite-difference scheme requires only 7 grid points, as previously shown in
To demonstrate the importance of the data reduction of the overlapping area 512/522, consider the domain decomposition of the whole modeling medium 500 of size Nx×Ny×Nz into two subdomains 510 and 520, where the data is split along the x direction (see
For this discussion, it is assumed that the number of grid points for the overlapping area 512/522 is equal to $A_x$. $BW_{GPU}$ and $BW_{network}$ are the GPU and network bandwidth, which represent the performance of GPU computation and data transfer through the network, respectively.
The GPU computation cost, $C_{GPU}$, for each subdomain can be approximated as:

$$C_{GPU} \approx \frac{(N_x/2)\, N_y\, N_z}{BW_{GPU}},$$
while the data transfer cost, $C_{network}$, can be approximated as:

$$C_{network} \approx \frac{A_x\, N_y\, N_z}{BW_{network}}.$$
To minimize overhead from data transfer, the data transfer cost must be less than or equal to the GPU computation cost. Thus, the ideal stencil size for the overlapping area can be computed as:

$$A_x \le \frac{N_x}{2}\cdot\frac{BW_{network}}{BW_{GPU}}. \qquad (3)$$
For high-frequency elastic FWI, $N_x$ is approximately 1,600 grid points. The average network bandwidth, $BW_{network}$, in the current industry is 12.5 GByte/sec and the GPU bandwidth, $BW_{GPU}$, is approximately 4,000 GByte/sec. From equation (3), the ideal overlapping area can be computed as follows:

$$A_x \le \frac{1{,}600}{2}\cdot\frac{12.5}{4{,}000} = 2.5 \text{ grid points}.$$
Thus, it is not possible to completely hide the data transfer cost, since the implicit finite-difference scheme requires a stencil size of 7, which is larger than the calculated $A_x$. In fact, the data transfer cost is approximately 3 to 4 times more expensive than the GPU computation cost. This high data transfer cost may be addressed with the second novel scheme, which is designed to mitigate the imbalance between GPU computation and network communication.
To further minimize the amount of data exchange, according to an embodiment, the direction with the greatest number of grid points is chosen as the domain splitting direction. For example, assuming the number of grid points in the x direction is the largest, i.e., Nx≥Ny and Nx≥Nz, splitting in the x direction gives the least data exchange (which is proportional to Ny×Nz) since Ny×Nz≤Ny×Nx and Ny×Nz≤Nx×Nz.
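By way of a non-limiting illustration, this choice amounts to splitting along the axis of greatest extent; the grid dimensions below are hypothetical.

```python
import numpy as np

shape = (1600, 1200, 800)           # illustrative (Nx, Ny, Nz)
split_axis = int(np.argmax(shape))  # 0, i.e., split along x
# exchanged face area = product of the two non-split dimensions
face = np.prod(shape) // shape[split_axis]
print(split_axis, face)             # 0, 960000 (= Ny * Nz) grid points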
Thus, the second novel scheme is related to parallelizing the computation and the network communication. Note that each of the first and second novel schemes discussed herein may be applied independently of each other to a given system. However, in one application, both schemes are applied to the same system.
The GPU accelerator 314 and network interface card 318 (see
In step 702, the full portion of the y derivative calculation is performed and, in step 704, the full portion of the z derivative calculation is performed, independently of the y derivative calculation, because the spatial derivative calculations in the y and z directions do not need data from the neighboring subdomain. For the spatial derivative in the x direction, most of the grid points can be computed, in step 706, independently of the neighboring subdomain. Only a portion of grid points equal to the stencil size next to the subdomain boundary 512/522, as shown
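For illustration only, the following simplified sketch shows the concurrency pattern described above; the `exchange_fn` callback stands in for the actual network transfer (e.g., MPI or NCCL, neither of which is shown), np.gradient is used as a placeholder derivative, and all names are hypothetical rather than part of the embodiments.

```python
import threading
import numpy as np

def derivative_step(sub, halo, exchange_fn):
    """Overlap the halo exchange with the independent derivative work.

    The y and z derivatives, and the interior portion of the x
    derivative, need no neighbor data, so they are computed while the
    halo transfer runs in a background thread; only the x-derivative
    points within one stencil of the boundary wait for the exchange."""
    received = {}
    t = threading.Thread(target=lambda: received.update(exchange_fn(sub, halo)))
    t.start()                            # launch the transfer first...
    dy = np.gradient(sub, axis=1)        # ...then do the independent work
    dz = np.gradient(sub, axis=2)
    dx_int = np.gradient(sub[halo:-halo], axis=0)
    t.join()                             # wait only for the halo data
    # the boundary x-derivative would be completed here using `received`
    return dy, dz, dx_int, received

sub = np.zeros((128, 64, 64), dtype=np.float32)
fake_exchange = lambda s, h: {"left": s[:h].copy(), "right": s[-h:].copy()}
out = derivative_step(sub, halo=7, exchange_fn=fake_exchange)
```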
As shown for the first proposed scheme, the data exchange cost is 3 to 4 times more expensive than the spatial derivative cost; even the total cost of the spatial derivative calculations in the x, y and z directions is still less than the data exchange cost. With the proposed concurrent scheme illustrated in
A method that integrates both schemes is now discussed with regard to
In step 804, the spatial domain and the time domain are discretized. This means that the spatial domain (e.g., offset domain 400 in
i.e., a partial derivative of the estimated data u with respect to time, becomes

$$\frac{\partial u}{\partial t} \approx \frac{u_i^{n+1} - u_i^{n}}{\Delta t},$$

where $u_i^{n+1}$ represents the value of the solution at spatial point i and time step n+1, and Δt is the time step. If the estimated data u varies in space, i.e., the term $\partial u/\partial x$ is present, this term becomes

$$\frac{\partial u}{\partial x} \approx \frac{u_{i+1}^{n+1} - u_i^{n+1}}{\Delta x},$$
where Δx is the spatial grid spacing along the x axis. Note that the time and space are now discrete, i.e., they do not vary in a continuous way. Also note that for the last term, the time n+1 is the same, but the points where the term u is calculated are different, i.e., i+1 and i. For the FWI approach, a cost function J is used to evaluate a difference between the input data d and the estimated data u, and thus, the example PDEs discussed above may also be applied to the cost function, not only to the reflectivity model.
In step 808, the linear system of equations (for the reflectivity model, or the cost function, or the velocity model, or any combination of these elements) is solved using the implicit finite-difference approach. The implicit finite-difference method introduces such a system of equations at each time step. Note that the values at time step n+1 appear on both sides of the equations of the linear system. This leads to a system of equations, which can be written as a matrix equation $A u^{n+1} = u^n + \Delta t\, b$, where $u^{n+1}$ is a vector containing the values of the solution at all spatial points at time step n+1, $u^n$ contains the values at time step n, and b is a vector determined by the boundary conditions. A is a matrix that depends on the spatial discretization and the PDEs. Solving the system of linear equations is the most computationally intensive part of the implicit method. Improvements for the calculations at this step may be added, for example, the method discussed in
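As a minimal, non-limiting sketch of such an implicit update, the snippet below builds a tridiagonal matrix A for a 1D model problem and advances the solution by repeatedly solving $A u^{n+1} = u^n + \Delta t\, b$; the model equation, sizes, and coefficients are illustrative assumptions, and the elastic-wave system of the embodiments is far larger.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

# Backward-Euler step for a 1D diffusion model problem u_t = c * u_xx:
# (I - dt*c*D2) u^{n+1} = u^n + dt*b, a tridiagonal system per step.
n, dx, dt, c = 200, 0.01, 1e-4, 1.0
r = c * dt / dx ** 2
A = diags([-r, 1 + 2 * r, -r], [-1, 0, 1], shape=(n, n), format="csc")
u = np.exp(-((np.arange(n) * dx - 1.0) ** 2) / 0.01)  # initial state
b = np.zeros(n)  # boundary-condition vector (zero here)
for _ in range(100):
    u = spsolve(A, u + dt * b)  # one implicit time step
```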
After solving the linear system in step 808, the values of the solution at the next time step, $u^{n+1}$, are obtained, and they are used in step 810 for updating the velocity model. A cost function that describes the misfit between the input data d and the synthetic data u may be used in this step. This process is repeated for subsequent time steps, by updating the vectors and the matrix according to the new time step, until a certain condition is met in step 812. The condition may be related to a certain number of iterations, or to a given time period, etc.
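By way of a further non-limiting illustration, the overall iterate-until-converged structure of steps 806-812 may be sketched as follows; forward_model and gradient are hypothetical user-supplied routines (the embodiments compute these quantities with the implicit scheme above), and the fixed step length alpha is an assumption.

```python
import numpy as np

def fwi_loop(v, d, forward_model, gradient, alpha=1e-3, max_iter=20, tol=1e-6):
    """Schematic FWI iteration: simulate, measure misfit, update model."""
    for _ in range(max_iter):
        u = forward_model(v)               # synthetic data (steps 806-808)
        J = 0.5 * np.sum((u - d) ** 2)     # least-squares misfit
        if J < tol:                        # stopping condition (step 812)
            break
        v = v - alpha * gradient(v, u, d)  # velocity-model update (step 810)
    return v
```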
Note that the velocity model is a representation of the subsurface that describes how the speed of seismic waves changes with depth. It provides information about the variation in seismic wave velocities within the Earth's layers. Velocity models are needed for seismic data processing and imaging. They help seismic data processing software to accurately calculate the travel times of seismic waves as they propagate through the subsurface. This information is required for creating seismic images, as the velocity model indicates the locations in the subsurface where the seismic waves are reflected or refracted, and those locations form the image of the subsurface. Velocity models are constructed by analyzing seismic data and using various techniques, such as seismic tomography, to estimate the velocities of subsurface materials at different depths.
Once the final solution is obtained, the results are transformed, in step 814, into an image of the subsurface, which shows one or more features indicative of an oil or gas reservoir, an appropriate CO2 storage reservoir, ore resources, or a strong layer that can support wind turbines.
The implicit finite-difference method is particularly useful for problems with stiff differential equations, where explicit methods require extremely small time steps for stability. However, it does require solving a system of equations at each time step, which can be computationally expensive for large grids. Nonetheless, it is a powerful technique for accurately simulating a wide range of physical phenomena.
The results achieved with the method of
When compared to the single-node simulation, the 2-node simulation through domain decomposition can use twice the memory, and therefore, increase the modeling frequency by 25%. To evaluate the performance of the 2-node scheme, the inventors first selected a modeling frequency such that the memory is almost fully utilized for a single node, and then increased the frequency by 25% for the 2-node simulation. The inventors observed that, after considering the difference in the node number and frequency, the normalized cost of the 2-node simulation is approximately 30% higher than the single-node simulation. This cost overhead can be reduced by utilizing network cards of a higher bandwidth. It is noted that more GPU nodes can be used to further increase the modeling frequency with a potentially higher overhead, which is due to the larger data exchange.
With the domain decomposition approach proposed above, high frequency elastic FWI could be performed using ocean bottom node (OBN) data. As shown in
High-frequency elastic FWI requires substantial memory resources that could exceed the capacity of a single GPU node. This limitation was successfully overcome in the above embodiments by using multiple nodes, for one wavefield simulation, through domain decomposition. One or more of the above discussed embodiments reduced the overhead from the cross-node data exchange by optimizing the finite-difference scheme, for example, choosing a preferred splitting direction, and parallelizing communication and computation. This approach makes high frequency elastic FWI possible and affordable with average GPU cards, which is desirable for achieving the full potential of the recorded seismic data.
The method described above with regard to
The above-discussed procedures and methods may be implemented in a compute device 310 having the structure illustrated in
Server 1101 may also include one or more data storage devices, including hard drives 1112, CD-ROM drives 1114 and other hardware capable of reading and/or storing information, such as DVD, etc. In one embodiment, software for carrying out the above-discussed steps may be stored and distributed on a CD-ROM or DVD 1116, a USB storage device 1118 or other form of media capable of portably storing information. These storage media may be inserted into, and read by, devices such as CD-ROM drive 1114, disk drive 1112, etc. Server 1101 may be coupled to a display 1120, which may be any type of known display or presentation screen, such as LCD, plasma display, cathode ray tube (CRT), etc. A user input interface 1122 is provided, including one or more user interface mechanisms such as a mouse, keyboard, microphone, touchpad, touch screen, voice-recognition system, etc.
Server 1101 may be coupled to other devices, such as sources, sensors, or any other seismic data system. The server may be part of a larger network configuration as in a global area network (GAN) such as the Internet 1128, which allows ultimate connection to various landline and/or mobile computing devices.
As described above, the apparatus 1100 may be embodied by a computing device. However, in some embodiments, the apparatus may be embodied as a chip or chip set. In other words, the apparatus may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus may therefore, in some cases, be configured to implement an embodiment of the present invention on a single chip or as a single “system on a chip.” As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
In an example embodiment, the processor 1102 may be configured to execute instructions stored in the memory device 1104 or otherwise accessible to the processor. Alternatively, or additionally, the processor may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processor may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present invention while configured accordingly. Thus, for example, when the processor is embodied as an ASIC, FPGA or the like, the processor may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, when the processor is embodied as an executor of software instructions, the instructions may specifically configure the processor to perform the algorithms and/or operations described herein when the instructions are executed. However, in some cases, the processor may be a processor of a specific device (e.g., a pass-through display or a mobile terminal) configured to employ an embodiment of the present invention by further configuration of the processor by instructions for performing the algorithms and/or operations described herein. The processor may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor.
The term “about” is used in this application to mean a variation of up to 20% of the parameter characterized by this term.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first object or step could be termed a second object or step, and, similarly, a second object or step could be termed a first object or step, without departing from the scope of the present disclosure. The first object or step, and the second object or step, are both, objects or steps, respectively, but they are not to be considered the same object or step.
The terminology used in the description herein is for the purpose of describing particular embodiments and is not intended to be limiting. As used in this description and the appended claims, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Further, as used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context.
The disclosed embodiments provide a methodology for performing domain decomposition for high-frequency elastic FWI. It should be understood that this description is not intended to limit the invention. On the contrary, the embodiments are intended to cover alternatives, modifications and equivalents, which are included in the spirit and scope of the invention as defined by the appended claims. Further, in the detailed description of the embodiments, numerous specific details are set forth in order to provide a comprehensive understanding of the claimed invention. However, one skilled in the art would understand that various embodiments may be practiced without such specific details.
Although the features and elements of the present embodiments are described in the embodiments in particular combinations, each feature or element can be used alone without the other features and elements of the embodiments or in various combinations with or without other features and elements disclosed herein.
This written description uses examples of the subject matter disclosed to enable any person skilled in the art to practice the same, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the subject matter is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims.
The entire content of all the publications listed herein is incorporated by reference in this patent application.
Number | Date | Country
--- | --- | ---
63490269 | Mar 2023 | US