Geospatial maps and models may be utilized for the discovery and exploitation of desirable subterranean fluids (e.g., hydrocarbons). In particular, geological and petrophysical data related to said maps and models may aid in optimizing the development of hydrocarbon-bearing subterranean formations, estimating the total volume of recoverable hydrocarbons, forecasting production volumes, and identifying future targets for hydrocarbon exploration and development. The geological and petrophysical data derived from said maps and models may be utilized for independent assessments or may function as an input to other models including reservoir flow simulations, hydraulic fracturing models, pre-drill production estimates, subsidence models, data augmentation algorithms, and machine learning (ML). Developing such models may involve a variety of data, including the collection and utilization of core data. For data-driven or ML-based petrophysical interpretation models, the amount of core data needed for training may be directly related to the complexity of the model. Some ML models, such as deep learning models, have many hyper-parameters. Having access to a large repository of core data when training an associated ML model may be beneficial with respect to avoiding model overfitting.
Core data provides a high level of detail regarding the geological and petrophysical properties of the target formation; however, core samples and the associated data may be expensive to procure. Therefore, core data may only be gathered on a few select wells which have been identified by technical specialists (e.g., geologists, geophysicists, petrophysicists, and petroleum engineers) as being located in a particular area of interest. Additionally, core data is often treated as a confidential or proprietary asset, and such data may not be frequently shared between companies. Given the limited number of core samples collected, it may be challenging to extrapolate and generalize the core dataset across a large geospatial area. Thus, a low or insufficient quantity of core samples may be prohibitive to generating useful or functional geospatial maps and models due to the lack of data across a geospatial area of interest.
These drawings illustrate certain aspects of some examples of the present disclosure and should not be used to limit or define the disclosure.
This disclosure details a method and system for augmenting a data set comprised of measured data collected from rock samples known as cores. The quantity and geographic distribution of the collected data may be sparse in comparison to the geographic area over which the data needs to be applied. Generally, the systems and methods discussed below relate to a system and method for utilizing Radial Basis Mapping Function (RBF) to augment a core sample dataset. RBF iterates through the obtained core sample dataset, estimates a kernel function, and estimates a corresponding synthetic target value. In another example, Principal Component Analysis may be utilized to generate synthetic data of the obtained core sample dataset. In both examples, the synthetic target value or synthetic data is joined into the originally obtained dataset resulting in an augmented dataset.
As illustrated in
Borehole 104 may extend through subterranean formations 100. As illustrated in
As illustrated, a drilling platform 110 may support a derrick 112 having a traveling block 114 for raising and lowering drill string 116. Drill string 116 may include, but is not limited to, drill pipe and coiled tubing, as generally known to those skilled in the art. A kelly 118 may support drill string 116 as it may be lowered through a rotary table 120. A drill bit 122 may be attached to the distal end of drill string 116 and may be driven either by a downhole motor and/or via rotation of drill string 116 from surface 108. Without limitation, drill bit 122 may include, roller cone bits, PDC bits, natural diamond bits, any hole openers, reamers, coring bits, and the like. As drill bit 122 rotates, it may create and extend borehole 104 that penetrates various subterranean formations 100. Proximally disposed to the drill bit may be a bottom hole assembly (BHA) 117 which without limitation may comprise stabilizers, reamers, mud motors, logging while drilling (LWD) tools, measurement while drilling (MWD) or directional drilling tools, heavy-weight drill pipe, drilling collars, jars, coring tools, and underreaming tools. A pump 124 may circulate drilling fluid through a feed pipe 126 through kelly 118, downhole through interior of drill string 116, through orifices in drill bit 122 back to surface 108 via annulus 128 surrounding drill string 116, and into a retention pit (not shown).
With continued reference to
Drill string 116, drill bit 122 and drilling BHA 117 may be removed from the well, through a process called “tripping out of hole,” or a similar process. A coring bit 122 and coring BHA 117 are installed on drill string 116 which is then run back into borehole 104 through a process which may be called “tripping in hole,” or a similar process. The face of coring bit 122 may consist of a toroidal cutting edge with a hollow center that extends full-bore through the body of coring bit 122. With coring bit 122 being the endmost piece of equipment in BHA 117, disposed proximally thereto is a rock sample containment vessel which may be known as a core barrel 130. Once coring bit 122 is in contact with the bottom of the borehole 107 it is rotationally engaged with target subterranean formation 102 to cut and disengage a portion of target subterranean formation 102 in the form of a core. As coring bit 122 progresses further into target subterranean formation 102, the portion of the rock that is disengaged from target subterranean formation 102 is progressively encased in a core barrel 130 until the entirety of the sample is disengaged from target subterranean formation 102 and encased within core barrel 130. In some embodiments the core sample is relayed from core barrel 130 to the rig floor 115 by removing drill string 116 from borehole 104. In non-limiting alternate embodiments, a wireline truck 150 and a wireline, electric line, braided cable, or slick line 152 may be used to relay core barrel 130 through the center of drill string 116 to rig floor 115.
As illustrated, communication link 140 (which may be wired or wireless, for example) may be provided that may transmit data during the coring operation from BHA 117 to an information handling system 138 at surface 108. Information handling system 138 may include a personal computer 141, a video display 142, a keyboard 144 (or other input devices), and/or non-transitory computer-readable media 146 (e.g., optical disks, magnetic disks) that may store code representative of the methods described herein. In addition to, or in place of, processing at surface 108, processing may also occur downhole as information handling system 138 may be disposed on BHA 117. As discussed above, the software, algorithms, and modeling are performed by information handling system 138. Information handling system 138 may perform steps, run software, perform calculations, and/or the like automatically, through automation (such as through artificial intelligence (“AI”)), dynamically, in real-time, and/or substantially in real-time.
Once retrieved from borehole 104, the at least one core may be packaged and transported to a core laboratory 160 where a multitude of tests may be performed to create a core sample data set, which may be populated with geological and petrophysical features. Some non-limiting examples include formation sedimentology, mineralogy, formation wettability, fluid saturations and distributions, formation factor, pore structure and pore volume, capillary pressure behavior, sediment grain density, horizontal and vertical permeability and relative permeabilities, porosity, and presence of diagenesis. Communication link 170 may be configured to transmit data during core analysis operations in core laboratory 160 to an information handling system 138. The data obtained during the petrophysical analysis in core laboratory 160 may be stored in a structured database or in an unstructured form on an information handling system 138 which may include a personal computer 141, a video display 142, a keyboard 144 (or other input devices), and/or non-transitory computer-readable media 146 (e.g., optical disks, magnetic disks) that may store code representative of the methods described herein. In addition to, or in place of, processing at core laboratory 160, processing related to the collection of the core data set may also take place offsite from core laboratory 160. As discussed above, the software, algorithms, and modeling are performed by information handling system 138. Information handling system 138 may perform steps, run software, perform calculations, and/or the like automatically, through automation (such as through artificial intelligence (“AI”)), dynamically, in real-time, and/or substantially in real-time.
Each individual component discussed above may be coupled to system bus 204, which may connect each and every individual component to each other. System bus 204 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. A basic input/output system (BIOS) stored in ROM 208 or the like may provide the basic routine that helps to transfer information between elements within information handling system 138, such as during start-up. Information handling system 138 further includes storage devices 214 or computer-readable storage media such as a hard disk drive, a magnetic disk drive, an optical disk drive, tape drive, solid-state drive, RAM drive, removable storage devices, a redundant array of inexpensive disks (RAID), hybrid storage device, or the like. Storage device 214 may include software modules 216, 218, and 220 for controlling processor 202. Information handling system 138 may include other hardware or software modules. Storage device 214 is connected to the system bus 204 by a drive interface. The drives and the associated computer-readable storage devices provide nonvolatile storage of computer-readable instructions, data structures, program modules, and other data for information handling system 138. In one aspect, a hardware module that performs a particular function includes the software component stored in a tangible computer-readable storage device in connection with the necessary hardware components, such as processor 202, system bus 204, and so forth, to carry out a particular function. In another aspect, the system may use a processor and computer-readable storage device to store instructions which, when executed by the processor, cause the processor to perform operations, a method, or other specific actions.
The basic components and appropriate variations may be modified depending on the type of device, such as whether information handling system 138 is a small, handheld computing device, a desktop computer, or a computer server. When processor 202 executes instructions to perform “operations”, processor 202 may perform the operations directly and/or facilitate, direct, or cooperate with another device or component to perform the operations.
As illustrated, information handling system 138 employs storage device 214, which may be a hard disk. Other types of computer-readable storage devices which may store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, digital versatile disks (DVDs), cartridges, random access memories (RAMs) 210, read only memory (ROM) 208, a cable containing a bit stream, and the like, may also be used in the exemplary operating environment. Tangible computer-readable storage media, computer-readable storage devices, or computer-readable memory devices expressly exclude media such as transitory waves, energy, carrier signals, electromagnetic waves, and signals per se.
To enable user interaction with information handling system 138, an input device 222 represents any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, keyboard, mouse, motion input, speech and so forth. Additionally, input device 222 may receive core samples or data derived from core samples obtained in core laboratory 160, discussed above. An output device 224 may also be one or more of a number of output mechanisms known to those of skill in the art. In some instances, multimodal systems enable a user to provide multiple types of input to communicate with information handling system 138. Communications interface 226 generally governs and manages the user input and system output. There is no restriction on operating on any particular hardware arrangement and therefore the basic hardware depicted may easily be substituted for improved hardware or firmware arrangements as they are developed.
As illustrated, each individual component described above is depicted and disclosed as individual functional blocks. The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software and hardware, such as a processor 202, that is purpose-built to operate as an equivalent to software executing on a general purpose processor. For example, the functions of one or more processors presented in
Chipset 300 may also interface with one or more communication interfaces 226 that may have different physical interfaces. Such communication interfaces may include interfaces for wired and wireless local area networks, for broadband wireless networks, as well as personal area networks. Some applications of the methods for generating, displaying, and using the GUI disclosed herein may include receiving ordered datasets over the physical interface, or the datasets may be generated by the machine itself by processor 202 analyzing data stored in storage device 214 or RAM 210. Further, information handling system 138 may receive inputs from a user via user interface components 304 and execute appropriate functions, such as browsing functions, by interpreting these inputs using processor 202.
In examples, information handling system 138 may also include tangible and/or non-transitory computer-readable storage devices for carrying or having computer-executable instructions or data structures stored thereon. Such tangible computer-readable storage devices may be any available device that may be accessed by a general purpose or special purpose computer, including the functional design of any special purpose processor as described above. By way of example, and not limitation, such tangible computer-readable devices may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other device which may be used to carry or store desired program code in the form of computer-executable instructions, data structures, or processor chip design. When information or instructions are provided via a network, or another communications connection (either hardwired, wireless, or combination thereof), to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable storage devices.
Computer-executable instructions include, for example, instructions and data which cause a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, components, data structures, objects, and the functions inherent in the design of special-purpose processors, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
In additional examples, methods may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Examples may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
During drilling operations, information handling system 138 may process different types of real-time data originating from varied sampling rates and various sources, such as diagnostics data, sensor measurements, operations data, and/or the like through core laboratory 160 (e.g., referring to
A data agent 402 may be a desktop application, website application, or any software-based application that is run on information handling system 138. As illustrated, information handling system 138 may be disposed at any rig site (e.g., referring to
Secondary storage computing device 404 may operate and function to create secondary copies of primary data objects (or some components thereof) in various cloud storage sites 406A-N. Additionally, secondary storage computing device 404 may run determinative algorithms on data uploaded from one or more information handling systems 138, discussed further below. Communications between the secondary storage computing devices 404 and cloud storage sites 406A-N may utilize REST protocols (Representational state transfer interfaces) that satisfy basic C/R/U/D semantics (Create/Read/Update/Delete semantics), or other hypertext transfer protocol (“HTTP”)-based or file-transfer protocol (“FTP”)-based protocols (e.g., Simple Object Access Protocol).
In conjunction with creating secondary copies in cloud storage sites 406A-N, the secondary storage computing device 404 may also perform local content indexing and/or local object-level, sub-object-level, or block-level deduplication when performing storage operations involving various cloud storage sites 406A-N. Cloud storage sites 406A-N may further record and maintain DTC code logs for each downhole operation or run, map DTC codes, store repair and maintenance data, store operational data, and/or provide outputs from determinative algorithms that are located in cloud storage sites 406A-N. In a non-limiting example, this type of network may be utilized as a platform to store, back up, analyze, import, perform extract, transform, and load (“ETL”) processes, mathematically process, apply machine learning algorithms, and augment a core sample data set.
For the methods and systems discussed above, let $\{\vec{x}_i, \vec{y}_i\}_{i=1}^{N}$ be the input and target data in the training data set, where each $\vec{x}_i$ and $\vec{y}_i$ represent the petrophysical properties, which may be identified as a parameter, of each core sample obtained by core laboratory 160. For example, $\vec{x}_i$ and $\vec{y}_i$ may be vectors with one or more parameters or numerical values with a single parameter. Additionally, $\vec{x}_i$ may be a single core sample and a vector of $(V_p, V_s, \phi, T_{2,gm})$, where $V_p$ is acoustic P-wave velocity, $V_s$ is acoustic S-wave velocity, $\phi$ is total porosity, and $T_{2,gm}$ is NMR $T_2$ log mean. It should be noted, in regard to $T_1$ and $T_2$, that the decay of RF-induced NMR spin polarization is characterized in terms of two separate processes, each with its own time constant. One process, called $T_1$, is responsible for the loss of resonance intensity following pulse excitation. The other process, called $T_2$, characterizes the width or broadness of resonances. Stated more formally, $T_1$ is the time constant for the physical processes responsible for the relaxation of the components of the nuclear spin magnetization vector $M$ parallel to the external magnetic field $B_0$ (which is conventionally designated as the z-axis). $T_2$ relaxation affects the coherent components of $M$ perpendicular to $B_0$. In conventional NMR spectroscopy, $T_1$ limits the pulse repetition rate and affects the overall time in which an NMR spectrum can be acquired. Values of $T_1$ range from milliseconds to several seconds, depending on the size of the molecule, the viscosity of the solution, the temperature of the sample, and the possible presence of paramagnetic species (e.g., O2 or metal ions).
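As an illustration of the NMR T2 log mean input named above, the following is a minimal sketch that computes the log (geometric) mean of a binned T2 distribution as the exponential of the amplitude-weighted average of ln(T2). This is the commonly used definition and is assumed here, since the disclosure does not spell out the formula. The function name, bin values, and amplitudes are hypothetical.

```python
import numpy as np

def t2_log_mean(t2_bins, amplitudes):
    """Amplitude-weighted geometric mean of a binned T2 distribution.

    t2_bins    : T2 relaxation times of the bins (any consistent unit, > 0)
    amplitudes : non-negative amplitude (e.g., incremental porosity) per bin
    """
    t2_bins = np.asarray(t2_bins, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)
    # Weighted average in log space, then back-transform.
    weighted_log = np.sum(amplitudes * np.log(t2_bins)) / np.sum(amplitudes)
    return float(np.exp(weighted_log))

# Hypothetical two-bin distribution with equal amplitudes at 1 and 100.
example_t2gm = t2_log_mean([1.0, 100.0], [0.5, 0.5])
```

With equal amplitudes at 1 and 100, the log mean is the geometric mean of the two bin times.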
Furthermore, formation factor $\vec{y}_i$ is the ratio of the resistivity of the core sample filled with water, $R_o$, to the resistivity of the water, $R_w$, wherein the core is the rock sample procured during the coring process previously described in
where $K_h$ is a kernel and may be a symmetric function that integrates to one, and $h$ is the kernel size or bandwidth, which may be predefined and/or adjustable. $K_h$ may be any type of kernel including, but not limited to, a Gaussian kernel, linear kernel, or cosine kernel. In a non-limiting example, a kernel function may be defined as $K_h(\vec{x}) = \mathcal{N}(0, h^2)$, a normal distribution with mean 0 and standard deviation $h$.
In block 506 the kernel density estimation $\hat{f}_h(\vec{x})$ calculated in block 504 is compared to a threshold $\delta$, where $\delta$ is a predefined parameter that ensures the synthetic input data $\vec{x}$ lies within the applicable ranges defined by the input data in the training dataset $\{\vec{x}_i\}_{i=1}^{N}$. If $\hat{f}_h(\vec{x}) > \delta$, then RBF continues to block 508; otherwise, primary data augmentation technique 500 moves back to block 502 and iterates a new $\vec{x}_i$. In examples, $\delta$ may be altered to allow different applications of RBF.
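The density check of blocks 504 and 506 may be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the standard kernel density estimator (the average of the kernel evaluated at the offsets between the candidate and each training input) with an isotropic Gaussian kernel of bandwidth h. The function names, the random placeholder training inputs, and the values of h and the threshold are hypothetical.

```python
import numpy as np

def gaussian_kde_value(x, samples, h):
    """Kernel density estimate at x from training inputs `samples`.

    Assumes an isotropic Gaussian kernel with standard deviation h in
    each of the d input dimensions (an illustrative choice of kernel).
    """
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)      # shape (N, d)
    d = samples.shape[1]
    sq_dist = np.sum((samples - x) ** 2, axis=1)    # squared offsets to each sample
    norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)      # Gaussian normalizing constant
    return float(np.mean(np.exp(-sq_dist / (2.0 * h ** 2)) / norm))

def accept_candidate(x, samples, h, delta):
    """Block 506: keep a synthetic input only if its density exceeds delta."""
    return gaussian_kde_value(x, samples, h) > delta

# Hypothetical training inputs: 50 samples of 4 parameters each.
rng = np.random.default_rng(0)
train_x = rng.normal(size=(50, 4))

near_candidate = train_x[0] + 0.01      # close to an observed sample
far_candidate = np.full(4, 100.0)       # far outside the observed range

near_ok = accept_candidate(near_candidate, train_x, h=0.5, delta=1e-3)
far_ok = accept_candidate(far_candidate, train_x, h=0.5, delta=1e-3)
```

A candidate near the training inputs passes the threshold, while one far outside their range is rejected and a new candidate would be drawn.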
As previously stated, if $\hat{f}_h(\vec{x}) > \delta$, then in block 508 a corresponding synthetic target value is created with the RBF mapping function, which is defined in the following form:
$\vec{F}(\vec{x}) = \sum_{i=1}^{N} \vec{c}_i \, \phi(\|\vec{x} - \vec{x}_i\|)$ (2)
where {{right arrow over (c)}i
where $\{s_i\}_{i=1}^{N}$ are the widths of the Gaussian functions and represent the nearest-neighbor distances of the inputs of the samples. However, other examples may apply different variations of $\phi$.
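The RBF mapping of Equation (2) may be sketched as follows. This is a minimal illustration under stated assumptions: the Gaussian basis is taken as exp(-r^2 / (2 s_i^2)) with s_i set to each training input's nearest-neighbor distance, and the coefficients are obtained by requiring the mapping to reproduce the training targets at the training inputs. The exact basis form and constraint equations of the disclosure are not reproduced here, and all function names and the gridded example data are hypothetical.

```python
import numpy as np

def nearest_neighbor_widths(x_train):
    """Width s_i of each basis function: distance to the nearest other input."""
    diff = x_train[:, None, :] - x_train[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2))
    np.fill_diagonal(dist, np.inf)          # ignore each point's distance to itself
    return dist.min(axis=1)

def design_matrix(x_query, x_train, widths):
    """Entry (j, i) is phi(||x_query[j] - x_train[i]||) with width s_i."""
    diff = x_query[:, None, :] - x_train[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * widths[None, :] ** 2))

def fit_rbf(x_train, y_train):
    """Solve for coefficients c_i so that F(x_i) matches y_i (assumed constraint)."""
    widths = nearest_neighbor_widths(x_train)
    phi = design_matrix(x_train, x_train, widths)
    coeffs, *_ = np.linalg.lstsq(phi, y_train, rcond=None)
    return coeffs, widths

def rbf_map(x_query, x_train, coeffs, widths):
    """Evaluate Equation (2) at one or more synthetic inputs."""
    x_query = np.atleast_2d(np.asarray(x_query, dtype=float))
    return design_matrix(x_query, x_train, widths) @ coeffs

# Hypothetical training data on a 4 x 4 grid of two input parameters.
grid = np.linspace(0.0, 1.0, 4)
x_train = np.array([[a, b] for a in grid for b in grid])
y_train = np.sin(x_train[:, 0]) + x_train[:, 1] ** 2

coeffs, widths = fit_rbf(x_train, y_train)
fitted = rbf_map(x_train, x_train, coeffs, widths)
synthetic_target = rbf_map([0.4, 0.6], x_train, coeffs, widths)
```

Here `synthetic_target` is the value that would be paired with the accepted synthetic input and appended to the data set. A least-squares solve is used rather than a direct inverse so that a poorly conditioned system does not cause a hard failure.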
Based on RBF convergence theory, if the synthetic input $\vec{x}$ is close to the input data in the training dataset $\{\vec{x}_i\}_{i=1}^{N}$, the output from Eq. 2, $\vec{F}(\vec{x})$, may be a satisfactory approximation to the true target value corresponding to the input data $\vec{x}_i$. Subsequently, after being calculated in block 508, $\vec{F}(\vec{x})$ is augmented into the core sample data set obtained by core laboratory 160 (e.g., referring to
In different examples, a principal component analysis (PCA) may be performed as a data augmentation technique. PCA may augment the core sample data set obtained by core laboratory 160 (e.g., referring to
$T_2\ \text{distribution} = \sum_i \mathrm{PCA}_i \cdot \mathrm{PC}_i$ (5)
where PCi is a vector as shown in
A synthetic $T_2$ distribution is created with the following:
$\text{Synthetic } T_2\ \text{distribution} = \sum_{i=1}^{N} \mathrm{PC}_i \cdot c_i$ (6)
where $c_i$, $i = 1, \ldots, N$, are random positive values, and $N$ is the number of principal components used to represent the $T_2$ distributions. The synthetic $T_2$ distribution created in Equation (6), a linear combination of the $\mathrm{PC}_i$, may be added to the core sample data set obtained by core laboratory 160. The augmented core sample data set may be applied to petrophysical interpretation machine learning models.
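The PCA-based augmentation of Equations (5) and (6) may be sketched as follows. This is a minimal illustration, not the disclosed implementation. It assumes the principal components are taken from a singular value decomposition of the measured distributions without mean-centering, so that each measured distribution is a plain weighted sum of components as in Equation (5), and that the random positive weights are drawn uniformly from a hypothetical range. The placeholder measurements and all function names are hypothetical.

```python
import numpy as np

def principal_components(t2_distributions, n_components):
    """Leading principal components (rows) of the measured T2 distributions.

    Rows of `t2_distributions` are core samples; columns are T2 bins.
    No mean-centering is applied (an assumption made to match Eq. 5).
    """
    _, _, vt = np.linalg.svd(t2_distributions, full_matrices=False)
    return vt[:n_components]

def synthetic_t2(components, n_synthetic, scale, rng):
    """Eq. 6: combine the components with random non-negative weights c_i."""
    coeffs = rng.uniform(0.0, scale, size=(n_synthetic, components.shape[0]))
    return coeffs @ components, coeffs

# Placeholder "measured" data: 12 core samples, 30 T2 bins.
rng = np.random.default_rng(0)
measured = np.abs(rng.normal(size=(12, 30)))

pcs = principal_components(measured, n_components=3)
synthetic, coeffs = synthetic_t2(pcs, n_synthetic=5, scale=1.0, rng=rng)

# Join the synthetic distributions into the original data set.
augmented = np.vstack([measured, synthetic])
```

Because the components can contain negative entries, a weighted sum with arbitrary positive weights is not guaranteed to be a physically plausible (non-negative) distribution; how the weights are constrained in practice is outside this sketch.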
Utilizing these systems and methods may be beneficial for building machine learning petrophysical models. Additionally, the disclosed systems and methods are improvements over the current art. For example, the synthetic data maintain the underlying relationship between input and target data embedded in the original training data set as previously described in
Statement 1. A method may comprise forming a data set from one or more measurements of core samples, selecting one or more parameters from the data set, inputting the one or more parameters into a kernel estimation function, determining a kernel density estimation from the kernel estimation function based at least in part on the one or more parameters, and selecting an input value based at least in part on the kernel density estimation. The method may further comprise creating a corresponding synthetic target value based at least in part on the input value, augmenting the data set with the corresponding synthetic target value and input value to form a synthetic data set, and training a petrophysical interpretation machine learning model from the data set and the synthetic data set.
Statement 2. The method of statement 1, wherein the corresponding synthetic target value is created using a Radial Basis Function.
Statement 3. The method of statement 2, wherein the Radial Basis Function utilizes a vector formed from one or more constraints on a training data set.
Statement 4. The method of any preceding statements 1 or 2, further comprising comparing the kernel density estimation to a threshold.
Statement 5. The method of statement 4, further comprising discarding the kernel density estimation if it is less than the threshold.
Statement 6. The method of statement 5, wherein the threshold is predefined and adjustable.
Statement 7. The method of any preceding statements 1, 2, or 4, wherein the kernel density estimation comprises a kernel.
Statement 8. The method of statement 7, wherein the kernel is a Gaussian kernel, a linear kernel, or a cosine kernel.
Statement 9. A non-transitory computer-readable tangible medium comprising executable instructions that cause a computer device to form a data set from one or more measurements of core samples, select one or more parameters from the data set, input the one or more parameters into a kernel estimation function, determine a kernel density estimation from the kernel estimation function based at least in part on the one or more parameters, and select an input value based at least in part on the kernel density estimation. The executable instructions further cause the computer device to create a corresponding synthetic target value based on the input value, augment the data set with the corresponding synthetic target value and input value to form a synthetic data set, and train a petrophysical interpretation machine learning model from the data set and the synthetic data set.
Statement 10. The non-transitory computer-readable tangible medium of statement 9, wherein the corresponding synthetic target value is created using a Radial Basis Function.
Statement 11. The non-transitory computer-readable tangible medium of statement 10, wherein the Radial Basis Function utilizes a vector formed from one or more constraints on a training data set.
Statement 12. The non-transitory computer-readable tangible medium of any preceding statements 9 or 10, wherein the executable instructions further cause the computer device to compare the kernel density estimation to a threshold.
Statement 13. The non-transitory computer-readable tangible medium of statement 12, wherein the executable instructions further cause the computer device to discard the kernel density estimation if it is less than the threshold.
Statement 14. The non-transitory computer-readable tangible medium of statement 13, wherein the threshold is predefined and adjustable.
Statement 15. The non-transitory computer-readable tangible medium of any preceding statements 9, 10, or 12, wherein the kernel density estimation comprises a kernel.
Statement 16. The non-transitory computer-readable tangible medium of statement 15, wherein the kernel is a Gaussian kernel, a linear kernel, or a cosine kernel.
Statement 17. A method may comprise performing a principal component analysis (PCA) on one or more measurements of core samples to produce a set of vectors, combining each of the set of vectors to form a synthetic data, and augmenting the one or more measurements of core samples with the synthetic data.
Statement 18. The method of statement 17, further comprising eliminating multiple dominant peaks in a latent space with the PCA.
Statement 19. The method of any preceding statements 17 or 18, wherein the set of vectors are principal components (PC).
Statement 20. The method of any preceding statements 17-19, further comprising performing a linear combination of principal components.
It should be understood that, although individual examples may be discussed herein, the present disclosure covers all combinations of the disclosed examples, including, without limitation, the different component combinations, method step combinations, and properties of the system. It should be understood that, although the compositions and methods are described in terms of “comprising,” “containing,” or “including” various components or steps, the compositions and methods may also “consist essentially of” or “consist of” the various components and steps. Moreover, the indefinite articles “a” or “an,” as used in the claims, are defined herein to mean one or more than one of the element that it introduces.
For the sake of brevity, only certain ranges are explicitly disclosed herein. However, ranges from any lower limit may be combined with any upper limit to recite a range not explicitly recited, as well as, ranges from any lower limit may be combined with any other lower limit to recite a range not explicitly recited, in the same way, ranges from any upper limit may be combined with any other upper limit to recite a range not explicitly recited. Additionally, whenever a numerical range with a lower limit and an upper limit is disclosed, any number and any included range falling within the range are specifically disclosed. In particular, every range of values (of the form, “from about a to about b,” or, equivalently, “from approximately a to b,” or, equivalently, “from approximately a-b”) disclosed herein is to be understood to set forth every number and range encompassed within the broader range of values even if not explicitly recited. Thus, every point or individual value may serve as its own lower or upper limit combined with any other point or individual value or any other lower or upper limit, to recite a range not explicitly recited.
Therefore, the present examples are well adapted to attain the ends and advantages mentioned as well as those that are inherent therein. The particular examples disclosed above are illustrative only and may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Although individual examples are discussed, the disclosure covers all combinations of all of the examples. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. Also, the terms in the claims have their plain, ordinary meaning unless otherwise explicitly and clearly defined by the patentee. It is therefore evident that the particular illustrative examples disclosed above may be altered or modified and all such variations are considered within the scope and spirit of those examples. If there is any conflict in the usages of a word or term in this specification and one or more patent(s) or other documents that may be incorporated herein by reference, the definitions that are consistent with this specification should be adopted.