This description relates generally to hydrocarbon wells, for example, to determining a leak in a wellbore of a hydrocarbon well using machine learning.
Hydrocarbon recovery from oil wells poses a challenge in the presence of leaks. A leak can develop at a location on a wellbore of a hydrocarbon well. For example, a leak can develop in a tubing or a casing. The leak causes fluids to flow into areas of lower pressure. Such a leak affects the integrity of the hydrocarbon well, poses challenges to hydrocarbon recovery, and presents a potential danger to the environment.
A computer system receives data obtained from multiple hydrocarbon wells. The data includes a first set of downhole temperature logs recorded before detection of one or more wellbore leaks in the hydrocarbon wells. A second set of downhole temperature logs is recorded after detection of the one or more wellbore leaks. The computer system extracts multiple features from the data to generate an N-dimensional feature space. The computer system performs dimensionality reduction on the N-dimensional feature space to generate an M-dimensional feature space, wherein M is less than N. The computer system generates one or more machine learning models trained to determine the one or more wellbore leaks in the hydrocarbon wells based on the M-dimensional feature space.
In some implementations, the features include, for each downhole temperature log of the first set of downhole temperature logs and the second set of downhole temperature logs, an absolute energy determined using the downhole temperature log.
In some implementations, the features include, for each downhole temperature log of the first set of downhole temperature logs and the second set of downhole temperature logs, an absolute sum of temperature changes determined using the downhole temperature log.
In some implementations, the features include, for each downhole temperature log of the first set of downhole temperature logs and the second set of downhole temperature logs, an aggregation of an autocorrelation function determined using the downhole temperature log.
In some implementations, the features include, for each downhole temperature log of the first set of downhole temperature logs and the second set of downhole temperature logs, a complexity metric of the downhole temperature log.
In some implementations, the features include, for each downhole temperature log of the first set of downhole temperature logs and the second set of downhole temperature logs, a Fourier transform performed on the downhole temperature log.
In some implementations, the computer system extracts one or more features from a third set of downhole temperature logs obtained from a hydrocarbon well. The one or more features indicate a location of a wellbore leak in the hydrocarbon well. The computer system determines the location of the wellbore leak using the one or more machine learning models based on the one or more features.
The implementations disclosed provide methods, apparatus, and systems for wellbore leak determination using machine learning. The implementations perform automatic wellbore leak determination using downhole temperature logs in a methodology based on machine learning. A dedicated machine learning model is constructed that automatically pinpoints a wellbore leak in a tubing or a casing using the temperature logs. The machine learning model is trained on multiple surveys to uncover the patterns that indicate a wellbore leak. The machine learning model further enables an automated advisory system to pinpoint wellbore leak locations.
Among other benefits and advantages, the methods provide a flexible and integrated framework for wellbore leak determination. The implementations analyze temperature logs using machine learning and novel feature extraction techniques. Moreover, the feature extraction techniques disclosed enable multiple other use cases within the petroleum industry. For example, the implementations can be used to create an assessment phase that explores the potential of other models in the domain of temperature log analysis. The implementations can further serve as add-on methods in temperature log acquisition systems. Moreover, oil and gas companies can use the implementations to automate the analysis of historical temperature surveys.
In the Data Collection step illustrated in
In the Data Preprocessing step illustrated in
In the Feature Extraction step illustrated in
In some implementations, the features include an absolute energy E determined using each downhole temperature log. The feature extraction is performed for the first set of downhole temperature logs and the second set of downhole temperature logs. The absolute energy E of a downhole temperature log can be represented as in the following equation (1).
Here, x represents a temperature reading, i represents an index of a temperature reading, and n represents a total number of temperature readings in a particular temperature log.
In some implementations, the features include an absolute sum of temperature changes determined using the downhole temperature log. The feature extraction is performed for the first set of downhole temperature logs and the second set of downhole temperature logs. The absolute sum of temperature changes determined using the downhole temperature log can be represented as in the following expression (2).
Here, x represents a temperature reading, i represents an index of a temperature reading, and n represents a total number of temperature readings in a particular temperature log.
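The two features above can be illustrated with a minimal Python sketch. It assumes the conventional definitions: the absolute energy is the sum of the squared temperature readings, and the absolute sum of changes is the sum of the absolute differences between consecutive readings. The function names and the sample readings are hypothetical and are not taken from the described implementations.

```python
import numpy as np

def absolute_energy(log):
    """Sum of squared readings (assumed definition of absolute energy)."""
    x = np.asarray(log, dtype=float)
    return float(np.sum(x ** 2))

def absolute_sum_of_changes(log):
    """Sum of absolute differences between consecutive readings."""
    x = np.asarray(log, dtype=float)
    return float(np.sum(np.abs(np.diff(x))))

# Hypothetical temperature readings, ordered by depth.
readings = [150.0, 152.0, 151.0, 155.0]
print(absolute_energy(readings))          # 92430.0
print(absolute_sum_of_changes(readings))  # 7.0
```

Both values grow with the length of the log, so a comparison across logs of different lengths may call for normalization. That choice is left open here.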
In some implementations, the features include an aggregation of an autocorrelation function R(l) determined using the downhole temperature log. The autocorrelation function refers to a correlation of the measured temperatures with a delayed copy of the measured temperatures as a function of the time delay. The feature extraction is performed for the first set of downhole temperature logs and the second set of downhole temperature logs. The autocorrelation function R(l) can be represented as in the following equation (3).
Here, x represents a temperature reading, i represents an index of a temperature reading, n represents a total number of temperature readings in a particular temperature log, μ represents a mean temperature reading of a temperature log, σ² represents a variance determined from the temperature log, and l represents a time delay lag of the temperature log. The aggregation of the autocorrelation function over the lags is represented as in expression (4) as follows.
ƒagg(R(1), . . . , R(m)) for m=max(n,maxlag). (4)
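A minimal Python sketch of the autocorrelation at a lag l and of its aggregation over lags follows. It assumes the common estimator that normalizes the lagged products by (n − l) and by the variance σ². It also assumes that only lags shorter than the log are evaluated, so that every lag has at least one pair of readings. The function names and the default aggregation (the mean) are hypothetical.

```python
import numpy as np

def autocorrelation(log, lag):
    """Autocorrelation of the readings at a given lag (assumed estimator)."""
    x = np.asarray(log, dtype=float)
    n = x.size
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < n")
    mu = x.mean()
    var = x.var()
    if var == 0.0:
        return float("nan")  # undefined for a constant log
    lagged_products = (x[:n - lag] - mu) * (x[lag:] - mu)
    return float(np.sum(lagged_products) / ((n - lag) * var))

def aggregated_autocorrelation(log, maxlag, agg=np.mean):
    """Apply an aggregation function to R(1), ..., R(m)."""
    n = len(log)
    # Assumption: cap the lags at n - 1 so each lag has data.
    lags = range(1, min(maxlag, n - 1) + 1)
    return float(agg([autocorrelation(log, l) for l in lags]))
```

Passing `np.median` or `np.var` as `agg` gives other aggregations of the same autocorrelation values.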
In some implementations, the features include a linear least-squares regression determined from temperature values in a downhole temperature log. For example, the computer system can determine a least-squares approximation of a function represented by the downhole temperature values, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals.
In some implementations, the feature extraction includes applying a vectorized approximate entropy algorithm to the downhole temperature values measured in the temperature logs. For example, the approximate entropy algorithm can be used to quantify an amount of regularity and unpredictability of fluctuations in the downhole temperature over time-series data.
In some implementations, the feature extraction includes fitting an unconditional maximum likelihood of an autoregressive AR(k) process. Here, the k parameter represents a maximum time delay lag of the process. The autoregressive AR(k) process is used to describe the time-varying temperature values, and the autoregressive model generated specifies that the output variable depends linearly on its own previous values and on a stochastic term.
In some implementations, the feature extraction includes applying an augmented Dickey-Fuller hypothesis test to check whether a unit root is present in each downhole temperature log. A Dickey-Fuller test examines a null hypothesis that a unit root is present in an autoregressive model.
In some implementations, the feature extraction includes determining a binned entropy of the downhole temperature logs. For example, the binned entropy determination can be used to estimate the differential entropy of the process based on histogram-based estimation.
In some implementations, the feature extraction includes determining a corridor by multiple levels of quantiles dependent upon the distribution of temperature values in a log. The average and the absolute values of consecutive temperature changes of the temperature log inside the corridor are determined.
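Two of the features above, the binned entropy and the quantile corridor, can be sketched in Python as follows. The sketch makes three assumptions. The binned entropy is the Shannon entropy of equal-width histogram bin frequencies. The corridor is bounded by a lower and an upper quantile. A change counts as inside the corridor only when both of its readings lie inside. The function names, the bin count, and the default quantile levels are hypothetical.

```python
import numpy as np

def binned_entropy(log, bins=10):
    """Shannon entropy of the histogram bin frequencies of the readings."""
    x = np.asarray(log, dtype=float)
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / x.size  # skip empty bins to avoid log(0)
    return float(-np.sum(p * np.log(p)))

def mean_abs_change_in_corridor(log, q_low=0.25, q_high=0.75):
    """Average absolute consecutive change between readings in the corridor."""
    x = np.asarray(log, dtype=float)
    lo, hi = np.quantile(x, [q_low, q_high])
    inside = (x >= lo) & (x <= hi)
    # Assumption: a change counts only if both of its readings are inside.
    both_inside = inside[:-1] & inside[1:]
    if not both_inside.any():
        return 0.0
    return float(np.mean(np.abs(np.diff(x))[both_inside]))
```

The corridor restricts the statistic to a band of typical temperatures, so that isolated extreme readings do not dominate the average change.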
In some implementations, the features include a complexity metric of a downhole temperature log. The feature extraction is performed for the first set of downhole temperature logs and the second set of downhole temperature logs. The complexity metric of a downhole temperature log can be represented as in the following equation (5).
Here, x represents a temperature reading, i represents an index of a temperature reading, n represents a total number of temperature readings in a particular temperature log, and lag represents a time lag. In other implementations, the feature extraction includes determining a number of temperature values in a temperature log above a mean value or a number of temperature values in the log below the mean value.
In some implementations, the features include a Fourier transform performed on each downhole temperature log. The feature extraction is performed for the first set of downhole temperature logs and the second set of downhole temperature logs. For example, the computer system can determine a mean, a variance, a skew, or a kurtosis of an absolute Fourier transform. The kurtosis refers to a sharpness of a peak of a frequency-distribution curve. The computer system can determine Fourier coefficients of a one-dimensional discrete Fourier transform using a fast Fourier transformation algorithm as in the following equation (6).
Here, Ak represents a value of the Fourier transform at a frequency k, n represents a total number of temperature readings in a particular temperature log, m represents a variable used to iterate over all temperature readings in a particular temperature log, and π represents the mathematical constant pi (approximately 3.14159).
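The Fourier features can be sketched in Python with NumPy's fast Fourier transform, which follows the convention A_k = Σ a_m·exp(−2πi·m·k/n), summed over m from 0 to n − 1. The sketch assumes one possible reading of the spectral statistics: the normalized absolute spectrum is treated as a distribution over the frequency index k, and its mean and variance are computed. The function name and the returned tuple layout are hypothetical.

```python
import numpy as np

def fft_features(log):
    """Return (coefficients, spectral mean, spectral variance) for a log."""
    x = np.asarray(log, dtype=float)
    coeffs = np.fft.rfft(x)  # one-sided DFT of a real-valued series
    magnitude = np.abs(coeffs)
    k = np.arange(magnitude.size)
    total = magnitude.sum()
    if total == 0.0:
        return coeffs, float("nan"), float("nan")
    # Assumption: treat the normalized magnitudes as weights over k.
    weights = magnitude / total
    mean_k = np.sum(k * weights)
    var_k = np.sum((k - mean_k) ** 2 * weights)
    return coeffs, float(mean_k), float(var_k)
```

The real part, imaginary part, magnitude, and phase angle of each coefficient in `coeffs` can also serve as individual features.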
In some implementations, the feature extraction includes determining whether a value in a temperature log occurs more than once, whether a maximum value in the temperature log is observed more than once, or whether a minimum value in the temperature log is observed more than once. In some implementations, the feature extraction includes determining an index where a percentage of a mass of the temperature log lies to the left of the index, determining a kurtosis of the temperature log, or determining whether a standard deviation of the temperature log is higher than a percentage of the difference between the maximum and minimum values, expressed as in the following inequality (7).
std(X)>r*(max(X)−min(X)) (7)
Here, r represents a desired percentage value and X represents the set of temperature readings in the temperature log.
In some implementations, the feature extraction includes determining a length of a temperature log, determining a linear least-squares regression of the temperature log, determining a length of a longest consecutive subsequence in the temperature log that is larger than a mean value of the temperature log, determining a length of a longest consecutive subsequence in the temperature log that is smaller than the mean value of the temperature log, determining a maximum temperature value in the temperature log, or determining a mean temperature value of the temperature log. In some implementations, the feature extraction includes determining a mean over absolute differences between subsequent time series values as represented in the following expression (8).
Here, x represents a temperature reading, i represents an index of a temperature reading, and n represents a total number of temperature readings in a particular temperature log.
In some implementations, the feature extraction includes determining a mean value over the differences between subsequent time series values from the temperature logs. For example, the mean value can be determined using the following expression (9).
Here, x represents a temperature reading, i represents an index of a temperature reading, and n represents a total number of temperature readings in a particular temperature log.
In other implementations, the feature extraction includes determining a mean value of a central approximation of a second derivative determined from the temperature logs as in the following expression (10).
Here, x represents a temperature reading, i represents an index of a temperature reading, and n represents a total number of temperature readings in a particular temperature log.
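The three change-based features above can be sketched in Python as follows. The sketch assumes unit spacing between readings and uses the standard central difference x(i+1) − 2·x(i) + x(i−1) for the second derivative. Some libraries scale this term by an additional factor of 1/2, so the scaling used here is an assumption. The function names are hypothetical.

```python
import numpy as np

def mean_abs_change(log):
    """Mean of absolute differences between subsequent readings."""
    x = np.asarray(log, dtype=float)
    return float(np.mean(np.abs(np.diff(x))))

def mean_change(log):
    """Mean of signed differences; equals (x_n - x_1) / (n - 1)."""
    x = np.asarray(log, dtype=float)
    return float(np.mean(np.diff(x)))

def mean_second_derivative_central(log):
    """Mean central-difference second derivative (needs n >= 3)."""
    x = np.asarray(log, dtype=float)
    return float(np.mean(x[2:] - 2.0 * x[1:-1] + x[:-2]))
```

Because the signed differences telescope, the mean change depends only on the first and last readings. The mean absolute change is sensitive to every fluctuation along the log.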
In some implementations, the feature extraction includes determining a median value of a temperature log, a number of crossings of the temperature log for a particular temperature value, or a number of peaks having a particular support value. In some implementations, the feature extraction includes determining a value of a partial autocorrelation function at a particular time delay lag using the following equation (11).
Here, α represents a value of the partial autocorrelation for a particular time delay lag, k, between the values in the temperature log, x represents a temperature reading, i represents an index of a temperature reading, t represents a value in the temperature log at a particular depth, Cov represents a statistical covariance, and Var represents a statistical variance.
In some implementations, the feature extraction includes determining a percentage of unique values that are present in a temperature log more than once or a ratio of unique values that are present in the temperature log more than once. In some implementations, the feature extraction includes determining quantiles of a temperature log, observed temperature values within a particular interval, or a ratio of temperature values that are larger than r×std(x), that is, determining temperature values that are away from the mean value, where r represents an integer (such as 3 or 5) and x represents a temperature reading. In some implementations, the feature extraction includes determining a ratio of a number of unique temperature values to a number of temperature values, an entropy of a temperature log, or a sample skewness of a temperature log. In some implementations, the feature extraction includes determining a power spectrum of a temperature log at different frequencies, a sum of all temperature values in a time series that are present more than once, or a sum of temperature values across the temperature log. In some implementations, the feature extraction includes determining a Boolean variable denoting whether the distribution of a temperature log is symmetric using the following expression (12).
|mean(X)−median(X)|<r*(max(X)−min(X)) (12)
Here, X represents all the values in the temperature log (X={x1, x2, . . . , xn}), where n represents a total number of temperature readings in a particular temperature log, xi represents a temperature at the ith depth, and r represents a real number.
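The symmetry check of expression (12) and the ratio of temperature values lying more than r×std(x) from the mean can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the population standard deviation (NumPy's default).

```python
import numpy as np

def looks_symmetric(log, r):
    """Boolean check |mean(X) - median(X)| < r * (max(X) - min(X))."""
    x = np.asarray(log, dtype=float)
    spread = np.max(x) - np.min(x)
    return bool(abs(np.mean(x) - np.median(x)) < r * spread)

def ratio_beyond_r_sigma(log, r):
    """Fraction of readings farther than r standard deviations from the mean."""
    x = np.asarray(log, dtype=float)
    return float(np.mean(np.abs(x - np.mean(x)) > r * np.std(x)))

# Hypothetical log with one outlying reading.
skewed = [1.0, 1.0, 1.0, 1.0, 11.0]
print(looks_symmetric(skewed, r=0.1))  # False
print(looks_symmetric(skewed, r=0.5))  # True
```

A smaller r makes the symmetry check stricter, because the mean and the median must then agree more closely relative to the range of the log.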
In some implementations, the features include a metric based on a comparative feature-based time-series classification represented by the following expression (13).
Here, x represents a temperature reading, i represents an index of a temperature reading, n represents a total number of temperature readings in a particular temperature log, and lag represents a time lag. In other implementations, the feature extraction includes counting occurrences of a particular temperature value in a temperature log.
In the Dimensionality Reduction step illustrated in
In the Modeling step illustrated in
In some implementations, once the machine learning models have been trained, the computer system extracts one or more features from a third set of downhole temperature logs obtained from a hydrocarbon well. The one or more features indicate a location of a wellbore leak in the hydrocarbon well. For example, the one or more features can include a location of a maximum temperature value in a temperature log or a location of a minimum temperature value in the temperature log. The one or more features can include the last location of the maximum temperature values in the temperature log or the last location of the minimum temperature values in the temperature log. The computer system determines a location of a wellbore leak in the hydrocarbon well using the one or more machine learning models based on the one or more features.
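The location features described above can be sketched in Python as follows. The sketch assumes that each temperature reading is paired with a measured depth, which is a hypothetical input format, and it reports the first and last depths at which the maximum and minimum temperatures occur. The function name and the dictionary keys are hypothetical.

```python
import numpy as np

def extreme_locations(depths, temperatures):
    """Depths of the first and last occurrences of the max and min readings."""
    d = np.asarray(depths, dtype=float)
    t = np.asarray(temperatures, dtype=float)
    if d.shape != t.shape or t.size == 0:
        raise ValueError("depths and temperatures must be non-empty and aligned")
    last = t.size - 1
    return {
        "first_max": float(d[np.argmax(t)]),
        # Searching the reversed log finds the last occurrence.
        "last_max": float(d[last - np.argmax(t[::-1])]),
        "first_min": float(d[np.argmin(t)]),
        "last_min": float(d[last - np.argmin(t[::-1])]),
    }
```

For example, with depths of 100, 200, 300, 400, and 500 and temperatures of 80, 95, 70, 95, and 70, the sketch reports the first maximum at 200, the last maximum at 400, the first minimum at 300, and the last minimum at 500.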
In the Evaluation step illustrated in
The computer system receives (304) data obtained from multiple hydrocarbon wells. The data includes a first set of downhole temperature logs recorded before detection of one or more wellbore leaks in the multiple hydrocarbon wells and a second set of downhole temperature logs recorded after detection of the one or more wellbore leaks. In some implementations, a software module interfaces with a database to acquire the needed data. Custom-built data preprocessing techniques can be used to prepare the data for modeling.
The computer system extracts (308) multiple features from the data to generate an N-dimensional feature space. In some implementations, the computer system reduces redundancy in the training data (the received data obtained from the hydrocarbon reservoir) by transforming the training data into a reduced set of features (a feature vector). For example, in the Feature Extraction step, the computer system applies mathematical operations to the temperature logs to extract attributes (features). The feature vector contains the relevant information from the training data, such that features of interest are identified by machine learning using the reduced representation instead of the complete training data.
The computer system performs (312) dimensionality reduction on the N-dimensional feature space to generate an M-dimensional feature space, where M is less than N. In some implementations, unique dimensionality reduction techniques are configured to reduce the computational power and time required.
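The description does not name a particular dimensionality reduction technique. As an illustration only, the following Python sketch uses principal component analysis computed with a singular value decomposition to project an N-dimensional feature matrix onto M dimensions. The choice of principal component analysis, the function name, and the matrix layout (one row per temperature log, one column per feature) are assumptions.

```python
import numpy as np

def reduce_dimensions(features, m):
    """Project an (n_logs, N) feature matrix onto its first m principal axes."""
    X = np.asarray(features, dtype=float)
    n_logs, n_features = X.shape
    if not 0 < m < n_features or m > n_logs:
        raise ValueError("m must satisfy 0 < m < N and m <= number of logs")
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:m].T

# Hypothetical feature matrix: 4 logs, N = 3 features.
features = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 0.0]]
reduced = reduce_dimensions(features, m=1)
print(reduced.shape)  # (4, 1)
```

In this example the three features vary along a single direction, so one dimension retains all of the variance. Features extracted from real logs would typically need scaling before such a projection.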
The computer system generates (316) one or more machine learning models trained to determine the one or more wellbore leaks in the multiple hydrocarbon wells based on the M-dimensional feature space. In some implementations, novel mathematical transformations are used for the inception of the models. The computer system takes data points from the plane (illustrated in
The methods described can be performed in any sequence and in any combination, and the components of respective embodiments can be combined in any manner. The machine-implemented operations described above can be implemented by a computer system that includes programmable circuitry configured by software or firmware, or a special-purpose circuit, or a combination of such forms. Such a special-purpose circuit can be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), or system-on-a-chip systems (SOCs).
Software or firmware to implement the techniques introduced here can be stored on a non-transitory machine-readable storage medium and executed by one or more general-purpose or special-purpose programmable microprocessors. A machine-readable medium, as the term is used, includes any mechanism that can store information in a form accessible by a machine (a machine can be, for example, a computer, network device, cellular phone, personal digital assistant (PDA), manufacturing tool, or any device with one or more processors). For example, a machine-accessible medium includes recordable or non-recordable media (RAM or ROM, magnetic disk storage media, optical storage media, or flash memory devices).
The term “logic,” as used herein, means: i) special-purpose hardwired circuitry, such as one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), or other similar device(s); ii) programmable circuitry programmed with software and/or firmware, such as one or more programmed general-purpose microprocessors, digital signal processors (DSPs) or microcontrollers, system-on-a-chip systems (SOCs), or other similar device(s); or iii) a combination of the forms mentioned in i) and ii).