Many oil wells are artificially lifted via electric submersible pumps (ESPs). ESP failures often result in significant production deferment until a workover is completed to replace the failed ESP. The lifespan of ESPs may be negatively affected by many factors such as high pressure, high temperature, sour oil environments, etc. Accordingly, predicting the expected remaining lifespan of an ESP is nontrivial.
This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.
In general, in one aspect, embodiments relate to a method for predicting a lifespan of an electric submersible pump (ESP), the method comprising: obtaining data associated with the ESP, the data originating from a plurality of different categories; predicting, using a machine learning model, based on the data, a remaining expected life of the ESP; and reporting the remaining expected life.
In general, in one aspect, embodiments relate to a system for predicting a lifespan of an electric submersible pump (ESP), the system comprising: a plurality of sensors configured to measure first parameters associated with the ESP; a database configured to store second parameters associated with the ESP; and a prediction engine configured to: obtain data associated with the ESP, the data originating from a plurality of different categories and the data comprising the first parameters and the second parameters; predict, using a machine learning model, based on the data, a remaining expected life of the ESP; and report the remaining expected life.
In general, in one aspect, embodiments relate to a non-transitory machine-readable medium comprising a plurality of machine-readable instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: obtaining data associated with an ESP, the data originating from a plurality of different categories; predicting, using a machine learning model, based on the data, a remaining expected life of the ESP; and reporting the remaining expected life.
In light of the structure and functions described above, embodiments of the invention may include respective means adapted to carry out various steps and functions defined above in accordance with one or more aspects and any one of the embodiments of the one or more aspects described herein.
Other aspects and advantages of the claimed subject matter will be apparent from the following description and the appended claims.
Specific embodiments of the disclosed technology will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency.
In the following detailed description of embodiments of the disclosure, numerous specific details are set forth in order to provide a more thorough understanding of the disclosure. However, it will be apparent to one of ordinary skill in the art that the disclosure may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
Throughout the application, ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application). The use of ordinal numbers is not to imply or create any particular ordering of the elements nor to limit any element to being only a single element unless expressly disclosed, such as using the terms “before”, “after”, “single”, and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements. By way of an example, a first element is distinct from a second element, and the first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.
Many oil wells are artificially lifted via electric submersible pumps (ESPs). ESP failures often result in significant production deferment until a workover is completed to replace the failed ESP. As failures and breakdowns are seemingly inevitable with continuous ESP operation, being able to predict failures, and thereby to reduce substantial downtime and the associated maintenance costs, would be highly beneficial. However, with the lifespan of ESPs potentially being negatively affected by many factors such as high pressure, high temperature, sour oil environments, etc., predicting the expected remaining lifespan of an ESP is nontrivial.
Embodiments of the disclosure provide a prediction of the expected remaining lifespan of ESPs. A machine learning model is used for the prediction. In one embodiment of the disclosure, the machine learning model is a random forest model used to predict the lifespan and detect premature ESP failures. The disclosed embodiments are able to alert the user to potential failures ahead of time, thus providing the user with the benefit of being able to plan in advance to avoid production losses by formulating mitigation strategies to prolong ESP lifespan. While the presence of H2S and high pressure and/or temperature results in particularly harsh environments for ESPs and adversely affects their integrity and reliability, the machine learning model may also be used for other applications, e.g., in non-sour environments.
Embodiments of the disclosure may be used to accurately predict failure, optimize ESP operation, and predict ESP health, thus making it straightforward to schedule ESP replacement when needed. Embodiments of the disclosure thereby help reduce the loss of production that would occur due to sudden, unexpected ESP failures. Unlike other predictive methods that can be computationally demanding, embodiments of the disclosure are computationally efficient, while providing a high degree of robustness and accuracy. They further generalize well, with a relatively low overall variance and a low bias. Additional details are subsequently provided, after an introductory discussion of well environments.
In some embodiments, the well system (106) includes a wellbore (120), a well sub-surface system (122), a well surface system (124), and a well monitoring and control system (126). The well monitoring and control system (126) may monitor and/or control various operations of the well system (106), such as well production operations, well completion operations, well maintenance operations, and reservoir monitoring, assessment and development operations. In one or more embodiments, the well monitoring and control system (126) is configured to operate and/or monitor the electric submersible pump (ESP) (180), as further discussed below. In some embodiments, the well monitoring and control system (126) includes a computer system that is the same as or similar to the computer system (502) described below in FIG. 5 and the accompanying description.
The wellbore (120) may include a bored hole that extends from the surface (108) into a target zone of the hydrocarbon-bearing formation (104), such as the reservoir (102). An upper end of the wellbore (120), terminating at or near the surface (108), may be referred to as the “up-hole” end of the wellbore (120), and a lower end of the wellbore, terminating in the hydrocarbon-bearing formation (104), may be referred to as the “downhole” end of the wellbore (120). The wellbore (120) may facilitate the circulation of drilling fluids during drilling operations, the flow of hydrocarbon production (“production”) (121) (e.g., oil and gas) from the reservoir (102) to the surface (108) during production operations, the injection of substances (e.g., water) into the hydrocarbon-bearing formation (104) or the reservoir (102) during injection operations, or the communication of monitoring devices (e.g., logging tools) into the hydrocarbon-bearing formation (104) or the reservoir (102) during monitoring operations (e.g., during in situ logging operations).
In one or more embodiments, the well system (106) is an artificially lifted well system with an ESP (180) supporting production (121). The ESP (180) may be any type of submersible pump, e.g., a multistage centrifugal pump. Stages may be stacked based on the operating requirements of the well system (106). Many different factors, including the environmental conditions in the wellbore (120), may result in mechanical and/or electrical failures within several ESP parts, thereby affecting run life.
In one or more embodiments, during operation of the well system (106), the well monitoring and control system (126) monitors and controls the ESP (180). In one or more embodiments, the monitoring and control system (126) performs operations of the methods described in reference to the flowcharts of FIGS. 3 and 4.
In some embodiments, the well surface system (124) includes a wellhead (130). The wellhead (130) may include a rigid structure installed at the “up-hole” end of the wellbore (120), at or near where the wellbore (120) terminates at the Earth's surface (108). The wellhead (130) may include structures for supporting (or “hanging”) casing and production tubing extending into the wellbore (120). Production (121) may flow through the wellhead (130), after exiting the wellbore (120) and the well sub-surface system (122), including, for example, the casing and the production tubing.
In some embodiments, the well surface system (124) includes a surface sensing system (134). The surface sensing system (134) may include sensor devices for sensing characteristics of substances, including production (121), passing through or otherwise located in the well surface system (124). The characteristics may include, for example, pressure, temperature and flow rate of production (121) flowing through the wellhead (130), or other conduits of the well surface system (124), after exiting the wellbore (120).
While FIG. 1 shows a particular configuration of components, other configurations may be used without departing from the scope of the disclosure.
A random forest model is an ensemble machine learning algorithm that uses multiple decision trees to make predictions. The architecture of random forest models is unique in that it combines multiple decision trees to reduce the risk of overfitting and improve the overall generalization of the model and the accuracy of predictions, in comparison to individual trees. This is based on the idea that multiple “weak learners” can combine to create a “strong learner.” Each individual classifier is considered a “weak learner,” while the group of classifiers functioning together is regarded as a “strong learner.” This approach allows random forests to effectively capture complex relationships and interactions between features, resulting in better predictive performance.
Each of the multiple decision trees operates on a different subset of the same dataset, and the results are then aggregated to improve the overall accuracy of the predictions. In other words, instead of relying on a single decision tree, the random forest gathers predictions from each tree and makes a final prediction based on the average (for regression) or the majority vote (for classification) of these predictions.
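By way of a non-limiting illustration, the following minimal sketch (using the scikit-learn library with synthetic placeholder data; the variable names are assumptions for illustration only) shows that, for regression, the random forest's final prediction is simply the average of its individual trees' predictions:

```python
# Non-limiting sketch (scikit-learn, synthetic data): for regression, the
# random forest's output is the average of its individual trees' outputs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 5))            # placeholder predictor matrix
y = rng.uniform(50, 1500, size=300)      # placeholder days-until-failure target

forest = RandomForestRegressor(n_estimators=100, random_state=6).fit(X, y)

x_new = rng.normal(size=(1, 5))
# Each fitted tree (a "weak learner") makes its own prediction ...
per_tree = [tree.predict(x_new)[0] for tree in forest.estimators_]
# ... and the ensemble (the "strong learner") aggregates them by averaging.
print("Mean of tree predictions:", np.mean(per_tree))
print("Forest prediction:       ", forest.predict(x_new)[0])  # matches, up to rounding
```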
The architecture of a random forest model is suitable for predicting the failure of an ESP because it is capable of capturing complex and non-linear relationships between predictors and a target variable. The predictors may originate from different categories. Data (210) associated with the ESP may be collected for these predictors and may serve as inputs to the prediction engine (220). Examples of the different categories and predictors in these categories include, but are not limited to:
Data (210) associated with the ESP may vary based on the specific ESP system and the data available. These and other data may be acquired in real-time or near-real-time, e.g., by sensors (280) as shown in FIG. 2.
Those skilled in the art will appreciate that the data (210) mentioned above are some examples of the input variables that may be used to build the random forest machine learning model to predict ESP lifespans in sour, high-pressure, high-temperature environments. These inputs are fed into each node of a decision tree to build the random forest model. Feeding more inputs into a random forest predictive model can increase the complexity of the model and potentially lead to better predictions. On one hand, more inputs can provide the model with more information fundamental to the performance of the ESP, and potentially improve its accuracy. On the other hand, enlarging the feature space raises the risk of overfitting, of the model relying too heavily on any one input, and of additional irrelevant inputs introducing noise; overcoming these challenges requires additional processing. Specifically, feature selection and cross-validation may be performed on the data. Cross-validation involves splitting the data into multiple training and validation sets and testing the random forest (ML) model on each of these. For example, cross-validation may be performed using the cross_val_score function from the scikit-learn Python library, e.g., as distributed with Anaconda.
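As a non-limiting sketch of such a cross-validation, with synthetic placeholders standing in for the actual ESP predictors and the time-until-failure target:

```python
# Non-limiting sketch: 5-fold cross-validation of a random forest regressor
# using scikit-learn's cross_val_score. X and y are synthetic placeholders
# for the ESP predictors and the time-until-failure target.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))            # e.g., pressures, temperatures, ...
y = rng.uniform(50, 1500, size=500)      # e.g., days until failure

model = RandomForestRegressor(n_estimators=200, random_state=42)

# scikit-learn reports negated MSE for "neg_mean_squared_error" scoring,
# so the sign is flipped before taking the square root.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
print("Mean cross-validated RMSE:", np.sqrt(-scores.mean()))
```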
The prediction (230) of remaining expected life of the ESP is the output of the prediction engine (220) when operating on the data (210) associated with the ESP. The prediction (230) of the remaining expected life of the ESP may be a number, such as the number of days remaining until failure, rather than a specific date or time of the anticipated failure or a more general measure of pump deterioration.
The training may be performed on training data in which the target variable is the time until failure and the input features relate to ESPs and their environments.
In Step 302, the training data associated with the operation of ESPs are obtained. The training data may include data associated with any of the predictors in any of the categories as previously described. For example, the training data may be historical data recorded from ESPs as they were operating over time. The historical data may also include documentation of failures of these ESPs, thereby allowing the training data to be used for supervised training of the machine learning algorithm. To ensure good generalization, the training data may be comprehensive and may include data from different well environments, for different ESPs, etc. In other words, training data that accurately and completely cover the lifetime of the ESPs are obtained. The training data may include features that are based on any combination of the parameters previously discussed.
In Step 304, the training data is pre-processed. The training data may be corrupted, noisy, or incomplete, making it difficult to build a robust model. Accordingly, the training data may be pre-processed to remove errors, outliers, or missing values. The pre-processing may further involve feature engineering. The feature engineering may identify the most influential features that contribute to ESP failure, to improve the accuracy of the model. Less relevant or irrelevant features may be removed from the training data. Different tools may be used to identify the relevance and characteristics of features.
For example, a heatmap may be used to visually represent and analyze the relationship between two variables using a color-coded grid. The heatmap may provide insights into the strength, direction, and shape of the relationship between the two variables. Also, histograms may be used to visually explore the distribution of the data and to help identify patterns and relationships between variables.
The pre-processing of the training data may also involve a data transformation that involves converting the training data into a suitable format for the training of the machine learning model. The data transformation may involve, for example, scaling or normalizing of features.
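The following is a minimal, non-limiting sketch of such pre-processing steps. The column names, units, and synthetic values are illustrative assumptions only and do not represent actual ESP data:

```python
# Non-limiting sketch of pre-processing: missing-value removal, heatmap and
# histogram inspection, and feature scaling. Column names and values are
# synthetic, illustrative assumptions.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "intake_pressure":    rng.normal(1500, 200, 500),   # assumed units: psi
    "motor_temperature":  rng.normal(120, 15, 500),     # assumed units: degC
    "vibration":          rng.normal(0.3, 0.1, 500),
    "days_until_failure": rng.uniform(50, 1500, 500),   # target variable
})

df = df.dropna()  # remove rows with missing values

# Color-coded grid of pairwise correlations between variables.
sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
plt.show()

# Histograms to explore the distribution of each variable.
df.hist(bins=30)
plt.show()

# Scale the features to zero mean and unit variance.
features = df.drop(columns=["days_until_failure"])
X_scaled = StandardScaler().fit_transform(features)
```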
In Step 306, the random forest model is trained. Bagging (bootstrap aggregating) may be used for the training. The training involves a random split of the training data into a training data set, a validation data set, and a test data set. The ratio for the random split may be, for example, 80:10:10.
The training data set may be used to train the random forest model. The algorithm uses the data in the training data set to learn patterns and relationships between the features and the target variable.
The validation data set may be used to fine-tune the hyperparameters of the model. The hyperparameters are parameters that are not learned by the model during training, but rather set by the user. These control the behavior of the algorithm and can have a significant impact on the performance of the model. The validation data set is used to test different combinations of hyperparameters and select the ones that result in the best performance.
The test data set may be used to evaluate the final performance of the random forest model. Once the hyperparameters have been selected using the validation data set, the model is trained again using both the training and validation data sets, and the test data set is used to evaluate its performance. The test data set is a completely independent set of data that the model has never seen before and is used to estimate how the model will perform on new, unseen data. By splitting the training data into a training data set, a validation data set, and a test data set, a robust and reliable random forest model that can generalize well to new, unseen data may be obtained.
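A minimal, non-limiting sketch of the 80:10:10 split, validation-based hyperparameter selection, retraining, and final test evaluation might proceed as follows; the synthetic data and the candidate tree counts are illustrative assumptions:

```python
# Non-limiting sketch: 80:10:10 random split, hyperparameter selection on the
# validation set, retraining, and final evaluation on the held-out test set.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8))
y = rng.uniform(50, 1500, size=1000)

# Split off 10% as the test set, then 1/9 of the remainder (~10% overall)
# as the validation set, leaving ~80% for training.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.10, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=1/9, random_state=1)

# Tune one hyperparameter (the number of trees) on the validation set.
best_rmse, best_n = float("inf"), None
for n_trees in (100, 200, 400):
    candidate = RandomForestRegressor(n_estimators=n_trees, random_state=1)
    candidate.fit(X_train, y_train)
    rmse = mean_squared_error(y_val, candidate.predict(X_val)) ** 0.5
    if rmse < best_rmse:
        best_rmse, best_n = rmse, n_trees

# Retrain on training + validation data with the selected hyperparameter and
# estimate generalization performance on the never-before-seen test set.
final = RandomForestRegressor(n_estimators=best_n, random_state=1)
final.fit(np.vstack([X_train, X_val]), np.concatenate([y_train, y_val]))
print("Test RMSE:", mean_squared_error(y_test, final.predict(X_test)) ** 0.5)
```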
Within the training data set, the samples used to build each tree may be drawn using sampling with replacement (bootstrap sampling). Training the random forest model entails building multiple decision trees, where each tree is constructed using a random subset of features and a random subset of the training data set. During the training process, the decision trees are iteratively built by splitting the data at each node based on the best feature that separates the data points. The splitting may be performed such that the impurity of the data points at each node is minimized, e.g., based on a mean-squared error. The random forest model may be trained until the desired number of decision trees is built and each tree is grown to its maximum depth. Once trained, each tree makes a prediction based on its own set of decision rules. The final prediction is based on the average or majority vote of the individual tree predictions.
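For illustration, the following hand-rolled, non-limiting sketch makes the bootstrap sampling and the averaging of tree predictions explicit; in practice, scikit-learn's RandomForestRegressor performs these steps internally, and the data shown are synthetic placeholders:

```python
# Non-limiting, hand-rolled sketch of bagging with synthetic data; in practice
# RandomForestRegressor performs these steps internally.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 6))
y = rng.uniform(50, 1500, size=500)

trees = []
for _ in range(50):
    # Draw a bootstrap sample: sample the training data with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" restricts each split to a random subset of features;
    # by default, trees grow to maximum depth with an MSE splitting criterion.
    tree = DecisionTreeRegressor(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# For regression, the final prediction is the mean of the tree predictions.
x_new = rng.normal(size=(1, 6))
print("Predicted remaining life (days):",
      np.mean([t.predict(x_new)[0] for t in trees]))
```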
The described approach helps to avoid unstable models that cannot adapt to the addition of new data, as well as overfit models that do not generalize well. The use of class weights in this method gives importance to minority classes when handling imbalanced data. There is no need for pruning, as there is with individual decision trees. During prediction, every tree makes its own distinct prediction, and the individual predictions are then combined, e.g., by voting or by averaging the outcomes.
In Step 308, the performance of the random forest model is evaluated. Evaluation of the performance may involve a model selection performed to ensure that the model accurately captures the relationship between the input features and the time until failure. Further, the trained random forest model is validated using the test data set to ensure that it generalizes well to new cases.
Steps 304-308 form a training iteration. After completion of a training iteration, the relevance of the features in the training data may be evaluated. To determine the relevance of a feature, a measure called “feature importance” is used. It is calculated by considering the reduction in impurity at a node, weighted by the probability of reaching that node. This probability is calculated by dividing the number of samples that reach that node by the total number of samples. If the value of the feature importance is higher, it indicates that the feature is more important. Less relevant or irrelevant features may be eliminated in a subsequently performed training iteration. In other words, Steps 304-308 may be repeated, e.g., with irrelevant features removed.
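A minimal, non-limiting sketch of ranking features by impurity-based feature importance follows; the feature names and synthetic data are illustrative assumptions:

```python
# Non-limiting sketch: impurity-based feature importance from a trained forest.
# Feature names and data are synthetic, illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
feature_names = ["intake_pressure", "motor_temperature", "vibration",
                 "h2s_concentration", "flow_rate", "run_hours"]
X = rng.normal(size=(500, len(feature_names)))
y = rng.uniform(50, 1500, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=3).fit(X, y)

# Importances sum to 1.0; low-ranked features are candidates for removal
# in a subsequent training iteration (repeating Steps 304-308).
for i in np.argsort(model.feature_importances_)[::-1]:
    print(f"{feature_names[i]:>18}: {model.feature_importances_[i]:.3f}")
```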
The execution of the method, including training of the random forest with different sets of hyperparameters, may end when a random forest model achieves the desired performance, or when no further performance improvements can be achieved. Model selection may be used to select the best-performing model (based on specified performance metrics) among the models that have been generated by repeatedly performing Steps 304-308. Metrics used for evaluation may include, for example, the mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), and R-squared.
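These metrics may be computed, for example, with scikit-learn; in the following non-limiting sketch, the held-out targets and model predictions are synthetic placeholders:

```python
# Non-limiting sketch: computing the evaluation metrics named above.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(4)
y_test = rng.uniform(50, 1500, size=100)        # synthetic held-out targets
y_pred = y_test + rng.normal(0, 60, size=100)   # synthetic model predictions

mse = mean_squared_error(y_test, y_pred)
print("MSE: ", mse)
print("MAE: ", mean_absolute_error(y_test, y_pred))
print("RMSE:", np.sqrt(mse))
print("R^2: ", r2_score(y_test, y_pred))
```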
After completion of the training, a random forest model is available to perform prediction of the timeline of a failure event for an ESP, as discussed in reference to FIG. 4.
In Step 402, data associated with the ESP are obtained. The data may include parameters such as those used for the training of the random forest model. Some of the parameters may be obtained from sensors, e.g., in real-time or near-real-time. Other parameters may be obtained from databases.
In Step 404, the data associated with the ESP are pre-processed. The pre-processing may be performed analogous to the pre-processing in Step 304.
In Step 406, the expected remaining life of the ESP is predicted. The prediction is performed using the random forest model operating on the data associated with the ESP. Given the data, the model applies the set of decision trees in the random forest to the data, and the prediction is generated by aggregating the predictions of the individual trees by calculating the mean. The timeline of failure events in this context refers to the predicted output values for each time point in the future, based on the input features provided to the model.
In Step 408, the remaining expected life is reported. The value may be reported to a user. A warning or notification may be provided when the remaining expected life drops below a specified threshold value.
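The following non-limiting sketch combines Steps 402-408: a trained model predicts the remaining expected life from (pre-processed) live data, and a warning is issued when the prediction falls below a threshold. The model, feature count, sensor values, and the 30-day threshold are illustrative assumptions:

```python
# Non-limiting sketch of Steps 402-408: predict remaining life for a live ESP
# and warn when it falls below a threshold. The trained model, feature count,
# placeholder sensor values, and 30-day threshold are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
X_hist = rng.normal(size=(500, 4))          # stand-in for historical predictors
y_hist = rng.uniform(50, 1500, size=500)    # stand-in for days-until-failure labels
model = RandomForestRegressor(n_estimators=200, random_state=5).fit(X_hist, y_hist)

THRESHOLD_DAYS = 30                          # assumed alert threshold

# Steps 402/404: obtain and pre-process live data (placeholder values here).
live_features = rng.normal(size=(1, 4))

# Step 406: the forest aggregates the individual tree predictions by averaging.
remaining_days = model.predict(live_features)[0]

# Step 408: report, with a warning below the threshold.
print(f"Remaining expected life: {remaining_days:.0f} days")
if remaining_days < THRESHOLD_DAYS:
    print("WARNING: remaining life below threshold; plan mitigation/workover.")
```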
In Step 410, actions may be taken, based on various actionable insights resulting from the execution of the method. These actions are subsequently discussed.
The method (400) may be executed in a loop, e.g., whenever new data become available, or at a fixed rate, e.g., once per hour, once per day, etc.
Embodiments may be implemented on a computer system.
The computer (502) can serve in a role as a client, network component, a server, a database or other persistency, or any other component (or a combination of roles) of a computer system for performing the subject matter described in the instant disclosure. The illustrated computer (502) is communicably coupled with a network (530). In some implementations, one or more components of the computer (502) may be configured to operate within environments, including cloud-computing-based, local, global, or other environment (or a combination of environments).
At a high level, the computer (502) is an electronic computing device operable to receive, transmit, process, store, or manage data and information associated with the described subject matter. According to some implementations, the computer (502) may also include or be communicably coupled with an application server, e-mail server, web server, caching server, streaming data server, business intelligence (BI) server, or other server (or a combination of servers).
The computer (502) can receive requests over the network (530) from a client application (for example, executing on another computer (502)) and respond to the received requests by processing them in an appropriate software application. In addition, requests may also be sent to the computer (502) from internal users (for example, from a command console or by another appropriate access method), external or third parties, other automated applications, as well as any other appropriate entities, individuals, systems, or computers.
Each of the components of the computer (502) can communicate using a system bus (503). In some implementations, any or all of the components of the computer (502), whether hardware or software (or a combination of hardware and software), may interface with each other or the interface (504) (or a combination of both) over the system bus (503) using an application programming interface (API) (512) or a service layer (513) (or a combination of the API (512) and the service layer (513)). The API (512) may include specifications for routines, data structures, and object classes. The API (512) may be either computer-language independent or dependent and refer to a complete interface, a single function, or even a set of APIs. The service layer (513) provides software services to the computer (502) or other components (whether or not illustrated) that are communicably coupled to the computer (502). The functionality of the computer (502) may be accessible to all service consumers using this service layer. Software services, such as those provided by the service layer (513), provide reusable, defined business functionalities through a defined interface. For example, the interface may be software written in JAVA, C++, or another suitable language providing data in extensible markup language (XML) format or another suitable format. While illustrated as an integrated component of the computer (502), alternative implementations may illustrate the API (512) or the service layer (513) as stand-alone components in relation to other components of the computer (502) or other components (whether or not illustrated) that are communicably coupled to the computer (502). Moreover, any or all parts of the API (512) or the service layer (513) may be implemented as child or sub-modules of another software module, enterprise application, or hardware module without departing from the scope of this disclosure.
The computer (502) includes an interface (504). Although illustrated as a single interface (504) in FIG. 5, two or more interfaces (504) may be used according to particular needs, desires, or particular implementations of the computer (502).
The computer (502) includes at least one computer processor (505). Although illustrated as a single computer processor (505) in FIG. 5, two or more processors may be used according to particular needs, desires, or particular implementations of the computer (502).
The computer (502) also includes a memory (506) that holds data for the computer (502) or other components (or a combination of both) that can be connected to the network (530). For example, memory (506) can be a database storing data consistent with this disclosure. Although illustrated as a single memory (506) in FIG. 5, two or more memories may be used according to particular needs, desires, or particular implementations of the computer (502).
The application (507) is an algorithmic software engine providing functionality according to particular needs, desires, or particular implementations of the computer (502), particularly with respect to functionality described in this disclosure. For example, application (507) can serve as one or more components, modules, applications, etc. Further, although illustrated as a single application (507), the application (507) may be implemented as multiple applications (507) on the computer (502). In addition, although illustrated as integral to the computer (502), in alternative implementations, the application (507) can be external to the computer (502).
There may be any number of computers (502) associated with, or external to, a computer system containing computer (502), each computer (502) communicating over the network (530). Further, the terms "client," "user," and other appropriate terminology may be used interchangeably, as appropriate, without departing from the scope of this disclosure. Moreover, this disclosure contemplates that many users may use one computer (502), or that one user may use multiple computers (502).
In some embodiments, the computer (502) is implemented as part of a cloud computing system. For example, a cloud computing system may include one or more remote servers along with various other cloud components, such as cloud storage units and edge servers. In particular, a cloud computing system may perform one or more computing operations without direct active management by a user device or local computer system. As such, a cloud computing system may have different functions distributed over multiple locations from a central server, which may be performed using one or more Internet connections. More specifically, a cloud computing system may operate according to one or more service models, such as infrastructure as a service (IaaS), platform as a service (PaaS), software as a service (SaaS), mobile “backend” as a service (MBaaS), serverless computing, artificial intelligence (AI) as a service (AIaaS), and/or function as a service (FaaS).
Although only a few example embodiments have been described in detail above, those skilled in the art will readily appreciate that many modifications are possible in the example embodiments without materially departing from this invention. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the following claims. In the claims, any means-plus-function clauses are intended to cover the structures described herein as performing the recited function(s) and equivalents of those structures. Similarly, any step-plus-function clauses in the claims are intended to cover the acts described here as performing the recited function(s) and equivalents of those acts. It is the express intention of the applicant not to invoke 35 U.S.C. § 112(f) for any limitations of any of the claims herein, except for those in which the claim expressly uses the words “means for” or “step for” together with an associated function.