A geological formation is a rock with geological properties that characterize the formation. To obtain the geological properties of the formation, a well is drilled that penetrates the formation. A detailed record of the well is obtained by well logging.
Geologic data from well logging (well log data) are used to explore wells and oil production in the petroleum industry. Well log data are obtained by running various logging devices in wells to detect the geological properties of the formation and/or properties of a fluid within the formation. The properties of a formation may be naturally occurring radioactivity, e.g. gamma ray, or other natural and induced formation signals such as spontaneous potential, bulk density, neutron porosity, acoustic, and resistivity. The well log data is used for interpretations for quantitative formation evaluation and commonly annotations on logs such as stratigraphic tops, which are widely used as the standard graphic base for subsurface cross sections of properties.
Heterogeneity is the variation of rock properties as function of the location in a reservoir or formation. The qualitative and quantitative analysis of the properties in the well logs may effectively reflect the gross geologic variations and thus the heterogeneous characters of the formation. Due to the horizontal and vertical geologic variations in a formation, there is a need to group the wells with those variations based on the nature of the properties in the well logs. Different groups of wells may reflect different underground heterogeneous zones or formations.
Accordingly, there exists a need for a method for obtaining geological heterogeneity trends of a geological formation.
This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.
In one aspect, embodiments disclosed herein relate to a method for obtaining geological heterogeneity trends of a geological formation, comprising the steps: drilling wells that penetrate the formation, acquiring well logs for each well as function of depth intervals of the respective well, determining a third degree tensor, where a z-dimension denotes the depths, a x-dimension denotes the well logs, and a y-dimension denotes the wells, extracting matrices from the tensor, clustering the matrices based on the characteristics of the corresponding well logs to a clustering result matrix, aggregating the clustering result matrix to a cluster ensemble, and spatial partitioning the cluster ensemble to a map that shows the geological heterogeneity trends associated with cluster types of the wells.
Other aspects and advantages of the claimed subject matter will be apparent from the following description and the appended claims.
Embodiments disclosed herein relate to a clustering ensemble method for quick well clustering based on well log data. Clustering wells based on their similar log curve features can reveal hidden heterogeneous characteristics. The results can facilitate further reservoir studies, such as petrophysical and geological modeling and reservoir simulation. Multiple logs or log curves are usually measured along the well path. Extracting patterns from these log curves for a certain formation is a challenge using the traditional clustering methods.
In embodiments disclosed herein, wells are clustered based on multiple well logs data along certain well path intervals defined by stratigraphic tops. The method for doing this involves three major steps: base clustering, clustering ensemble and spatial partitioning at the reservoir scale. Base clustering is to cluster all the wells based on their individual well log curve. Then, the clustering ensemble clusters the wells by considering multiple well log curves together. The spatial partitioning is to perform a 3D spatial compartmentalization of the underlying domains according to well cluster types. Finally, clustering of all the wells in a target area may be obtained.
In one or more embodiments, the method for obtaining geological heterogeneity trends of a geological formation is performed by computer software programs for fast and robust well clustering. Such computer software programs may be executed on any suitable device, such as shown in
In step 202, N wells w1, w2, . . . , wN are drilled that penetrate the formation.
In step 204, well log curves log 1, log 2, . . . , log M are acquired for each well w1, w2, . . . , wN as function of a depth D1, D2, . . . , DK of the respective well w1, w2, . . . , wN. The well log curves are measured along the depth of each well, even if a well is embodied vertical, deviated, or horizontal. The value of a property of the formation is recorded as a function of the depth of the logging device in the well. The well log data are then plotted in well log curves that show the value of the properties versus the depth.
In step 206, a third-degree Tensor Tk,m,n is determined, where k=1, 2, . . . , K is an index of the depths with D1, D2, . . . , DK, m=1, 2, . . . , M is an index for the well logs L1, L2, . . . , LM, and n=1, 2, . . . , N is an index of the wells w1, w2, . . . , wN. The well logs L1, L2, . . . , LM are values of the well log curve log 1, log 2, . . . , log M at the depth D1, D2, . . . , DK of the respective well wn.
In step 208, matrices L1k,n, L2k,n, . . . , LMk,n are extracted from the tensor Tk,m,n. Each matrix Lmk,n is extracted from the tensor Tk,m,n. For example, Lin is equal to Tk,1,n, L2k,n is equal to Tk,2,n, . . . , and LMk,n is equal to Tk,M,n.
In step 210, the matrices L1k,n, L2k,n, . . . , LMk,n are clustered based on the characteristics of the corresponding well logs L1, L2, . . . , LM (base clustering) to a clustering result matrix. The base clustering of the matrices L1k,n, L2k,n, . . . , LMk,n, is a matrix feature extraction and clustering task, which is a challenge to the clustering. In one or more embodiments, the base clustering comprises k-means algorithm. The base clustering of the matrices L1k,n, L2k,n, . . . , LMk,n is based on the similarity of the well logs and reveals hidden heterogeneous characteristics. The results facilitate further reservoir studies, such as petrophysical and geological modeling and reservoir simulation.
Extracting patterns from the well logs while drilling the well is a challenge for the base clustering. The wells are clustered based on multiple well logs along certain depth intervals defined by historic analysis of layers of sedimentary rock called strata (stratigraphic analysis) of markers of geologic layers (tops).
The base clustering of the wells is conducted based on the analysis of the type of the well log along the depth. The challenge is that the number N of wells is much smaller than the depth intervals Dk. For example, a value of a well log is measured at depths of every 0.125 meter (0.5 feet) of the well. In one or more embodiments, the depth intervals Dk is around 1000s, while the well number is around 100s. An unsupervised multivariate data reduction, such as principal component analysis (PCA), is used to reduce the dimension of the depth intervals Dk. Furthermore, the matrices are huge mathematical matrices, even for vertical wells. Furthermore, the number N of the wells is much smaller than the numbers of the well logs M.
Since the base clustering is done according to the different types of well logs Lm, the N×K matrix LmK,n is to be to be base clustered for each type of well log Lm. Each of the N wells comprises K depths.
In one or more embodiments, the base clustering is a machine learning (ML) technique. ML clusters the matrices Lmk,n according to the well logs Lm as unsupervised learning for statistical data analysis. Data points in the same group have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features.
Table 1 shows an example of a N×K matrix Lmk,n of a well log Lm, where m is the type of the well log, e.g., gamma radiation (GR), N=15 is the number of wells to be clustered, and K=89 is the depth of the wells.
The N×K matrix Lmk,n, according to table 1 needs to be clustered as a base for final clustering (for base clustering at a later stage). In one or more embodiments, the base clustering comprises K-means algorithm. Table 2 shows a matrix resulting from base clustering of the N×K matrix Lmk,n, according to table 1 (with M=1) for M=4 different well logs L1, L2, L3, L4 using K-means algorithm.
In step 212, the clustering result matrix is aggregated to a cluster ensemble π1, π2, . . . , πM. The cluster ensemble combines multiple clustering result matrices of the well logs to yield a single overall clustering. The cluster ensemble π1, π2, . . . , πM clusters the well logs by considering multiple well logs together.
In step 214, the cluster ensemble π1, π2, . . . , πM is spatial partitioned resulting in a map that shows the geological heterogeneity trends associated with cluster types of the wells. The domain of the well log data, acquired from the cluster ensemble, is extended to a 2D spatial domain. The 2D spatial domain reveals an insight in how similar the wells w1, w2, . . . , wN are clustered. The spatial partitioning comprises a 3D spatial compartmentalization of the underlying domains according to the types of the well logs. Finally, target areas are obtained for integrated reservoir studies by clustering the wells.
In step 502, the tensor Tk,m,n is obtained (see description of
In step 504, matrices L1k,n, L2k,n, . . . , LMk,n are extracted from the Tensor Tk,m,n. The matrices L1k,n, L2k,n, . . . , LMk,n represent the heterogeneous physical properties of the formation.
In step 506, the matrices L1k,n, L2k,n, . . . , LMk,n are base clustered to a base clustering result. The base clustering results are shown in
In step 508, the base clustering results are obtained from the base clustering.
In step 510, the cluster ensemble π1, π2, . . . , πM is combined to a single overall cluster.
In step 512, a 2D spatial map from the overall cluster is obtained. The 2D spatial map shows the geological heterogeneity trends associated with cluster types of all the wells w1, w2, . . . , wN. The 2D spatial map reveals the heterogeneity not only within the same well, but also in the spatial domain between the wells. Thereby, it provides a pseudo-2D heterogeneity reservoir model for enhanced geological and engineering studies. Each measured type of well log reveals its heterogeneity.
In step 602, the well logs L1, L2, . . . , LM are transformed to vectors Dm,n.
In step 604, the vectors Dm,n are folded to folding matrices. The description of
In step 606, the matrices are 2D EM (2D expectation maximization) clustered.
One of the base clustering techniques is 2D EM clustering. The 2D EM clustering characterizes the features of the well logs. The 2D EM clustering is especially suited for small well numbers N and the high numbers of depth K>>N.
In one or more embodiments, the 2D EM clustering comprises spectral clustering because the number N of the wells is much smaller than the number of types (including the total number of logs, excluding the total number of log types) of well logs M. Spectral clustering uses the spectrum (eigenvalues) of a similarity matrix to perform dimensionality reduction before clustering in fewer dimensions. The 2D EM clustering is performed for all the wells N based on the characteristics of each type of well log L1, L2, . . . , LM.
The 2D EM clustering is unsupervised and determines factors of the probability distribution by a maximum likelihood estimation. Specifically, random values are selected by the maximum likelihood estimation to estimate the best fit for the petrophysical well logs and wells. The maximum likelihood estimation is then obtained using the 2D EM clustering.
Assuming a vector Y={y1, y2, . . . , yk} represents unlabeled wells with a number k of the unlabeled wells. Let the class label of the n-th cluster be denoted as Yn for (n=1, . . . , C) being αn.
The probability of the mixture component is:
In the maximization step, the probabilities are used to perform re-estimation of the parameter. The likelihood clusters are evaluated using Eq. (4) to (6).
After a vector is folded into a matrix, the above introduced maximum likelihood estimation of EM algorithm is implemented for matrix clustering. The core of the method for obtaining geological heterogeneity trends of a geological formation penetrated by the wells Y={y1, y2, . . . , yk} is to cluster a set of 2D matrices through using this 2D EM clustering.
During clustering, the distance between each matrix pairs y; and y; is calculated using a Hausdorff distance. The Hausdorff distance measures how large a metric space is from each other.
More formally, the Hausdorff distance from set A to set B is a maximum function, defined as:
The 2D EM clustering is performed for all types of well logs m that are different from each other. Therefore, the matrices used for 2D EM clustering are N×K matrices for each type of well log clustering. In other words, each well w1, w2, . . . , wN has K measuring points along the depth of the well wn.
The results from the 2D EM clustering characterize the vertical shape information of the well logs L1, L2, . . . , LM.
Entering different data of well logs in the 2D EM clustering results in a matrix of different wells with different types of well logs.
In step 608, the 2D EM clustering outputs the clustering results. The space of the clustering results is a latent space which has no spatial meaning and no relationship with the well locations. The clustering results are just for easy understanding. Actually, there is no need to do such a 2D projection from a hyper-dimensional latent space.
Using different well log data as the input for clustering of the matrices L1k,n, L2k,n, . . . , LMk,n, the result is a table of different wells with different types of well logs.
In the second step of the procedure for folding a vector, the vector {tilde over (D)}k is transformed from Cartesian coordinate system to polar coordinate system. The rescaled vector {tilde over (D)}x follows polar coordinates by encoding the values Dk as the angular cosine and depth step as the radius with the equation below:
The transformation from Cartesian coordinate to polar coordinate through Eq. (9) has two important properties. The first property is that Eq. (9) is bi-ejective, because cos (ϕ) is monotonic when ϕ∈[0, 1]. The character encoding map produces only one result in the polar coordinate system with a unique inverse function. The second property is that as opposed to Cartesian coordinates, polar coordinates preserve absolute temporal relations of the vector. Thus, the corresponding area from depth step ti to depth step tj is not only dependent on the depth interval |ti−tj|, but also determined by the absolute value of ti and tj.
After transforming the rescaled vector into the polar coordinate system, the angular cosine is easily exploited by considering the trigonometric sum between each point to identify the temporal correlation of the well log values within different measured depth intervals.
In case the radius r is known, the fold-matrix is defined as follows:
The fold-matrix in Eq. (10) has several advantages. First advantage is that the matrix G provides a way to preserve the temporal dependency. The third-degree matrix G contains temporal correlations because G (i,j∥i−j|=k) represents the relative correlation by superposition of directions with respect to depth interval k. The main diagonal Gi,i of the matrix G is the special case when k=0, which contains the original value/angular information.
For example, assuming the vector Dm,n describes k=10 observed real values along the well. In a first step the vector is normalized to an interval [−1,1] (see Eq. (8)). In a second step the angular coordinates are calculated (see Eq. (9)). In a third step, the matrix G is calculated by Eq. (10). The vector is transformed into a data 10×10 matrix. In a fourth step a 2D image is determined.
Generating a cluster ensemble is described in the description of
In step 1002, the matrices L1k,n, L2k,n, . . . , LMk,n are extracted from the tensor Tk,m,n.
In step 1004, the matrices L1k,n, L2k,n, . . . , LMk,n are aggregated into a cluster ensemble π1, π2, . . . , πM.
An algorithm of the cluster ensemble combines different datasets with various clustering algorithms to achieve better accuracy than the individual clustering algorithms. The matrices Lmk,n={L1k,n, L2k,n, . . . , LMk,n} are a set of M data points Lmk,n and Π={π1, π2, . . . , πM} is the cluster ensemble with M ensemble members πm. Each ensemble member πm of the cluster ensemble Π={π1, π2, . . . , πM} returns a set of cluster ensembles πi={C1i, C2i, . . . , Ck
In step 1006, the cluster ensemble π1, π2, . . . , πM is entered into a consensus function that performs spatial partitioning of the cluster ensemble π1, π2, . . . , πM resulting in a map that shows the geological heterogeneity trends associated with cluster types of the wells.
The ensemble members π1, π2, . . . , πm are inputted in the consensus function. The ensemble members π1, π2, . . . , πm are aggregated to form a final data partition. There are two main stages: (i) generating the similarity matrix through cluster ensemble, and (ii) producing the final partition by a consensus function.
Consensus clustering is a method of ensemble clustering the clustering results from multiple clustering algorithms. Also called aggregation of partitions, the consensus clustering refers to the situation in which a number of different clusterings are obtained for a particular dataset and it is desired to find a single (consensus) clustering which is a better fit in some sense than the existing clusterings. Consensus clustering is the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. When cast as an optimization problem, consensus clustering is known as median partition. Consensus clustering for unsupervised learning is analogous to ensemble learning in supervised learning.
There are many clustering techniques, either supervised or unsupervised learning, which are widely used in the case of unlabeled data, i.e., data without defined categories or groups.
After the Ensemble operation, the Base clustering result matrix in table 2 is spatial partitioned into one final clustering, as shown in table 3.
The base clustering of well logs only accounts for the well log data and is, therefore, one dimensional. However, the spatial partitioning of the well logs is two-dimensional. In one or more embodiments, the spatial partitioning is based on a simple inverse distance algorithm or an advanced spatial interpolation method, such as indicator kriging using an indicator function, is implemented.
Multivariate interpolation is interpolation on functions when the variates are spatial coordinates (spatial interpolation). Multivariate interpolation creates a digital elevation model from a set of points on the Earth's surface.
The indicator function of a subset A of a set X is a function defined from X to the two-element set {0,1}, denoted as 1A:X→{0,1}. The indicator function indicates whether an element in X belongs to A or not. The indicator function of a subset A of a set X is a function 1A:X→{0,1}, defined as
The clustering results in ensemble result column of table 3 using the 15 wells, are used as an example to illustrate the 2D spatial clustering algorithm. The first step is to prepare data files for 15 wells, comprising well logs, spatial x and y coordinates, and lowest depth D1 and highest depth DK of the respective well. Table 4 lists well number, spatial x and y coordinates of the well, lowest depth D1, and highest depth DK of the respective well.
The second step for spatial partitioning is to perform an indicator transformation to the clustering results for the wells in table 4. Assuming the locations are uα, α=1, . . . , N, the cluster results for those locations are denoted as z(uα)=k, α=1, . . . , N, z=1, . . . , K. The indicator transformation is:
From information theory, it is derivable that the indicator transformation results as the probability of each cluster at the well sites. If the indicator value is one, it means the probability to find the specific cluster at this site is 100 percent. Otherwise, the probability is zero.
The third step is to do an indicator interpolation.
Higher distances result in lower weight. So, for each cluster, the inverse distance weighted interpolation is calculated as:
After performing spatial indicator interpolation for all the clusters, a maximum probability threshold algorithm is implemented as given in the following equation:
It assigns a specific cluster to each cell based on the probability of each cluster.
The ML engine 1500 also comprises an interface 1504. The interface 1504 comprises software supporting one or more communication protocols. The interface 1504 further comprises hardware that receives the well log curves.
In one or more embodiments, the interface 1504 is wirelessly connected to ML engine 1500. In other embodiments, the interface 1504 comprises a wired connection to the ML engine 1500.
Furthermore, the ML engine 1500 comprises one or more ML algorithms 1508 for performing the method steps for determining properties of a formation. The ML algorithm 1508 is a software component of the ML engine 1500. Although illustrated as an internal part of the ML engine 1500, in alternative embodiments, the ML algorithm 1508 is an external component of the ML engine 1500.
The ML engine 1500 comprises a processor 1506. The processor 1506 executes instructions according to the ML algorithm 1508 and manipulates the well logs to perform the method for obtaining geological heterogeneity trends of a geological formation, according to the ML algorithm 1508.
The ML engine 1500 further comprises a database 1520. The existing well log curves are stored in the database 1520. While the database 1520 is illustrated as an integral component of the ML engine 1500, in alternative embodiments, the database 1520 is external to the ML engine 1500. The database 1520 may be any repository capable of storing data, including but not limited to data structures such as tables, lists, arrays, etc.
The interface 1504, the processor 1506, the ML algorithm 1508, and the database 1520 communicate via a system bus 1514. In one or more embodiments, any or all of the interface 1504, the processor 1506, the ML algorithm 1508, and the database 1520, communicate with each other over the system bus 1514 using an application programming interface (API) 1510 or a service layer 1512 or a combination of the API 1510 and service layer 1512.
In one or more embodiments, the ML algorithm 1508 creates a ML model with an artificial neural network (ANN). The ANN comprises neurons, wherein each neuron is connected to every other neuron in the ANN. A neuron receives data then processes it and sends the data to all the other neurons. The neurons are aggregated and organized into layers. The neurons of a layer are connected to all the neurons of the neighboring layers. A first layer is the input layer that receives the existing data logs. The last layer is the output layer that outputs the estimated pore pressure log. The mineral composition of the formation is predicted from the digital images of the drill cuttings using ANN.
Although only a few example embodiments have been described in detail above, those skilled in the art will readily appreciate that many modifications are possible in the example embodiments without materially departing from this invention. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the following claims. In the claims, means-plus-function clauses are intended to cover the structures described herein as performing the recited function and not only structural equivalents, but also equivalent structures. Thus, although a nail and a screw may not be structural equivalents in that a nail employs a cylindrical surface to secure wooden parts together, whereas a screw employs a helical surface, in the environment of fastening wooden parts, a nail and a screw may be equivalent structures. It is the express intention of the applicant not to invoke 35 U.S.C. § 112 (f) for any limitations of any of the claims herein, except for those in which the claim expressly uses the words ‘means for’ together with an associated function.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2022/082500 | 3/23/2022 | WO |