PREDICTING GAS LIFT EQUIPMENT FAILURE WITH DEEP LEARNING TECHNIQUES

Information

  • Patent Application
  • 20250075602
  • Publication Number
    20250075602
  • Date Filed
    August 30, 2023
  • Date Published
    March 06, 2025
  • CPC
  • International Classifications
    • E21B43/12
    • E21B47/008
Abstract
A method and a system for predicting gas lift equipment failure are disclosed. The method includes obtaining data from a plurality of sources, the plurality of sources including sensor readings, maintenance records, operational parameters, and production targets and determining a plurality of initial equipment failure probabilities using a plurality of machine learning models. Further, the method includes determining an ensemble prediction using an ensemble model trained on the plurality of initial equipment failure probabilities and performing, in response to the ensemble prediction, a maintenance operation of the gas lift equipment.
Description
BACKGROUND

In the petroleum industry, the gas lift equipment is used to sustain or increase the flow of fluids, such as crude oil, from a production well. Initially, hydrocarbons flow to the surface unaided when the reservoir energy is sufficient. As the water cut in the produced fluid increases over a period of time, the reservoir energy drops and may not be sufficient to overcome the hydrostatic pressure of the fluid column. The fluid flow to the surface ceases at this point.


The injection of gas from the surface into the production tubing reduces hydrostatic pressure and restores the upward movement of the fluids to the surface. As a result, the fluid flow to the surface is restored. Conventionally, this process is known as “gas lift.” Gas lift equipment includes various tools such as gas lift valves, mandrels, gas injection equipment, etc.


SUMMARY

This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.


In general, in one aspect, embodiments disclosed herein relate to a method for predicting gas lift equipment failure including obtaining data from a plurality of sources, the plurality of sources including sensor readings, maintenance records, operational parameters, and production targets and determining a plurality of initial equipment failure probabilities using a plurality of machine learning models. Further, the method includes determining an ensemble prediction using an ensemble model trained on the plurality of initial equipment failure probabilities and performing, in response to the ensemble prediction, a maintenance operation of the gas lift equipment.


In general, in one aspect, embodiments disclosed herein relate to a non-transitory computer readable medium storing a set of instructions executable by a computer processor for predicting gas lift equipment failure. The set of instructions includes functionality for obtaining data from a plurality of sources, the plurality of sources including sensor readings, maintenance records, operational parameters, and production targets, and determining a plurality of initial equipment failure probabilities using a plurality of machine learning models. Further, an ensemble prediction is determined using an ensemble model trained on the plurality of initial equipment failure probabilities, and a maintenance operation of the gas lift equipment is performed in response to the ensemble prediction.


In general, in one aspect, embodiments disclosed herein relate to a system including a well logging system and an equipment failure simulator comprising a computer processor, wherein the equipment failure simulator is coupled to the well logging system, the equipment failure simulator comprising functionality for obtaining data from a plurality of sources, the plurality of sources including sensor readings, maintenance records, operational parameters, and production targets, and determining a plurality of initial equipment failure probabilities using a plurality of machine learning models. Further, an ensemble prediction is determined using an ensemble model trained on the plurality of initial equipment failure probabilities, and a maintenance operation of the gas lift equipment is performed in response to the ensemble prediction.


Other aspects and advantages will be apparent from the following description and the appended claims.





BRIEF DESCRIPTION OF DRAWINGS

Specific embodiments of the disclosed technology will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency. The sizes and relative positions of elements in the drawings are not necessarily drawn to scale. For example, the shapes of various elements and angles are not necessarily drawn to scale, and some of these elements may be arbitrarily enlarged and positioned to improve drawing legibility. Further, the particular shapes of the elements as drawn are not necessarily intended to convey any information regarding the actual shape of the particular elements and have been solely selected for ease of recognition in the drawing.



FIG. 1 shows a system according to embodiments of the present disclosure.



FIG. 2 shows a flowchart in accordance with one or more embodiments.



FIG. 3 shows a system according to embodiments of the present disclosure.



FIG. 4 shows a neural network in accordance with one or more embodiments.



FIG. 5 shows a flowchart in accordance with one or more embodiments.



FIG. 6 shows a model architecture in accordance with one or more embodiments.



FIG. 7 shows a machine learning models diagram in accordance with one or more embodiments.



FIG. 8 shows a computer system in accordance with one or more embodiments.





DETAILED DESCRIPTION

In the following detailed description of embodiments disclosed herein, numerous specific details are set forth in order to provide a more thorough understanding of the embodiments disclosed herein. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.


Throughout the application, ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application). The use of ordinal numbers does not imply or create a particular ordering of the elements or limit any element to being only a single element unless expressly disclosed, such as by the use of the terms “before,” “after,” “single,” and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements. By way of an example, a first element is distinct from a second element, and the first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.


In the following description of FIGS. 1-8, any component described with regard to a figure, in various embodiments disclosed herein, may be equivalent to one or more like-named components described with regard to any other figure. For brevity, descriptions of these components will not be repeated with regard to each figure. Thus, each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components. Additionally, in accordance with various embodiments disclosed herein, any description of the components of a figure is to be interpreted as an optional embodiment which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.


It is to be understood that the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a horizontal beam” includes reference to one or more of such beams.


Terms such as “approximately,” “substantially,” etc., mean that the recited characteristic, parameter, or value need not be achieved exactly, but that deviations or variations, including for example, tolerances, measurement error, measurement accuracy limitations and other factors known to those of skill in the art, may occur in amounts that do not preclude the effect the characteristic was intended to provide.


It is to be understood that one or more of the steps shown in the flowcharts may be omitted, repeated, and/or performed in a different order than the order shown. Accordingly, the scope disclosed herein should not be considered limited to the specific arrangement of steps shown in the flowcharts.


Although multiple dependent claims are not introduced, it would be apparent to one of ordinary skill that the subject matter of the dependent claims of one or more embodiments may be combined with other dependent claims.


Embodiments disclosed herein provide a method and system for predicting gas lift equipment failure using deep learning models. Specifically, gas lift equipment, including, at least, valves, orifices, mandrels, supply lines, compressors, and pressure gauges, is used in the oil and gas industry to optimize the production of oil wells. However, this equipment may fail over time, leading to decreased production and higher maintenance costs. The deep learning models may be used to predict when gas lift equipment is likely to fail, allowing for proactive maintenance and minimizing downtime. To predict equipment failure, the models may be trained using support vector machines, artificial neural networks, and random forest classifiers on data from the equipment, including sensor readings, maintenance records, and operational parameters.


In one or more embodiments, this method may be implemented on already existing hardware packages as a plug-in module. The plug-in module may be deployed on the existing analytics setup. Initially, the hardware package may be deployed as a standalone setup, with an interface provided to the driller to run a Graphical User Interface (GUI). This ensures that the development and roll-out phases of the project are easier to implement, without requiring a permanent rig fixture. Alternatively, the system may be integrated directly into the rig to perform a maintenance operation.


Further, embodiments disclosed herein enable a user to operate the gas lift equipment from a distance, without an effect on normal operations. The method uses techniques including modelling and software based on artificial intelligence (AI) models, including machine learning and deep learning. Specifically, the data-driven approach of using historical data and AI-based frameworks, rather than traditional methods, may accurately predict when gas lift equipment is likely to fail, allowing for proactive maintenance and minimizing downtime. This approach is a significant technological advancement in the oil and gas industry, as equipment failure may result in decreased production and higher maintenance costs. The method provides a solution to this problem by addressing the issue proactively and using data to guide maintenance efforts. With this proactive approach, the method helps maintenance teams schedule repairs and replacements more effectively, reducing the risk of unexpected downtime and increasing the overall efficiency of oil production.


Additionally, embodiments disclosed herein enable prediction of equipment failure in the oil and gas industry using a data-driven approach that leverages predictive analytics techniques to analyze historical data from the gas lift equipment. The approach involves training supervised models on labeled datasets that include sensor readings, maintenance records, and operational parameters. The models learn to identify patterns and relationships between input features and the occurrence of equipment failure, enabling them to predict when equipment failure is likely to occur. Real-time sensor data from the equipment is then analyzed using the trained models to make predictions about when failure is likely to occur. This approach allows maintenance teams to schedule repairs and replacements more effectively, minimizing unexpected downtime and optimizing oil production efficiency. Compared to traditional maintenance approaches, this data-driven approach is more efficient and cost-effective.
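As a minimal sketch of this supervised, data-driven workflow, the following trains a simple logistic-regression classifier on a small, hypothetical labeled dataset (the feature values and labels are illustrative placeholders, not data from the disclosure) and then scores new sensor readings:

```python
import numpy as np

# Hypothetical labeled dataset: each row is [pressure, temperature, vibration];
# label 1 = equipment failed within the observation window, 0 = healthy.
X = np.array([[2100.0, 85.0, 0.2],
              [2300.0, 92.0, 0.9],
              [2050.0, 80.0, 0.1],
              [2400.0, 95.0, 1.1],
              [2000.0, 78.0, 0.2],
              [2350.0, 97.0, 1.0]])
y = np.array([0, 1, 0, 1, 0, 1])

# Standardize features so gradient descent behaves well.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sigma

# Train logistic regression by gradient descent on the labeled data.
w = np.zeros(Xs.shape[1])
b = 0.0
lr = 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))   # predicted failure probability
    w -= lr * (Xs.T @ (p - y)) / len(y)
    b -= lr * (p - y).mean()

def failure_probability(reading):
    """Score a new sensor reading with the trained model."""
    z = ((np.asarray(reading) - mu) / sigma) @ w + b
    return 1.0 / (1.0 + np.exp(-z))

print(failure_probability([2380.0, 96.0, 1.05]))  # high-risk reading
print(failure_probability([2020.0, 79.0, 0.15]))  # healthy reading
```

In practice, the labeled dataset would combine sensor readings, maintenance records, and operational parameters as described above, and the classifier would be one of the model types named in this disclosure.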


Further, the predictive maintenance may help reduce the cost of maintenance by allowing for maintenance to be performed only when needed, rather than on a fixed schedule. Additionally, the predictive maintenance may help reduce unnecessary downtime and maintenance costs. By predicting equipment failure before it occurs, artificial intelligence models can help prevent safety incidents by allowing for timely maintenance or replacement of faulty equipment. The embodiments described in this disclosure may enable real-time monitoring of equipment performance, allowing for early detection of potential issues and faster response times.


Additionally, embodiments described in this disclosure may provide data-driven insights into equipment performance, allowing for more informed decision-making and continuous improvement. By reducing unexpected downtime and optimizing maintenance schedules, the equipment failure prediction may lead to cost savings for the operator and the use of the invention for predictive maintenance of gas lift equipment has the potential to improve equipment reliability, reduce maintenance costs, and increase overall efficiency compared to commercially available alternatives.



FIG. 1 shows a schematic diagram in accordance with one or more embodiments. As shown in FIG. 1, a well environment (100) includes a hydrocarbon reservoir (“reservoir”) (102) located in a subsurface hydrocarbon-bearing formation (“formation”) (104) and a well system (106). The hydrocarbon-bearing formation (104) may include a porous or fractured rock formation that resides underground, beneath a geological surface (“surface”) (108). In the case of the well system (106) being a hydrocarbon well, the reservoir (102) may include a portion of the hydrocarbon-bearing formation (104). The hydrocarbon-bearing formation (104) and the reservoir (102) may include different layers of rock having varying characteristics, such as varying degrees of permeability, porosity, capillary pressure, and resistivity. In the case of the well system (106) being operated as a production well, the well system (106) may facilitate the extraction of hydrocarbons (or “production”) from the reservoir (102).


In some embodiments, the well system (106) includes a rig (101), a drilling system (110), a logging system (111), an equipment failure simulator (112), a wellbore (120), a well sub-surface system (122), a well surface system (124), and a well control system (“control system”) (126). The drilling system (110) may include a drill string, a drill bit, and a mud circulation system for use in drilling the wellbore (120) into the formation (104). The logging system (111) may include one or more logging tools for use in generating well logs of the formation (104), based on the sensing system (134). The well control system (126) may control various operations of the well system (106), such as well production operations, well drilling operations, well completion operations, well maintenance operations, and reservoir monitoring, assessment, and development operations. In some embodiments, the well control system (126) includes a computer system that is the same as or similar to that of a computer system (800) described below in FIG. 8 and the accompanying description.


The rig (101) is a combination of equipment used to drill a borehole to form the wellbore (120). Major components of the rig (101) include the drilling fluid tanks, the drilling fluid pumps (e.g., rig mixing pumps), the derrick or mast, the draw works, the rotary table or top drive, the drill string, the power generation equipment and auxiliary equipment.


The wellbore (120) includes a bored hole (i.e., borehole) that extends from the surface (108) into a target zone of the hydrocarbon-bearing formation (104), such as the reservoir (102). An upper end of the wellbore (120), terminating at or near the surface (108), may be referred to as the “up-hole” end of the wellbore (120), and a lower end of the wellbore, terminating in the hydrocarbon-bearing formation (104), may be referred to as the “downhole” end of the wellbore (120). The wellbore (120) may facilitate the circulation of drilling fluids during drilling operations, flow of hydrocarbon production (“production”) (121) (e.g., oil and gas) from the reservoir (102) to the surface (108) during production operations, the injection of substances (e.g., water) into the hydrocarbon-bearing formation (104) or the reservoir (102) during injection operations, or the communication of monitoring devices (e.g., logging tools) lowered into the hydrocarbon-bearing formation (104) or the reservoir (102) during monitoring operations (e.g., during in situ logging operations).


In some embodiments, during operation of the well system (106), the well control system (126) collects and records well data (140) for the well system (106). During drilling operation of the well (106), the well data (140) may include mud properties, flow rates measured by a flow rate sensor (139), drill volume and penetration rates, formation characteristics, etc. To drill a subterranean well or wellbore (120), a drill string (110), including a drill bit and drill collars to weight the drill bit, may be inserted into a pre-drilled hole and rotated to cut into the rock at the bottom of the hole, producing rock cuttings. Commonly, the drilling fluid, or drilling mud, may be utilized during the drilling process. To remove the rock cuttings from the bottom of the wellbore (120), drilling fluid is pumped down through the drill string (110) to the drill bit. The drilling fluid may cool and lubricate the drill bit and provide hydrostatic pressure in the wellbore (120) to provide support to the sidewalls of the wellbore (120). The drilling fluid may also prevent the sidewalls from collapsing and caving in on the drill string (110) and prevent fluids in the downhole formations from flowing into the wellbore (120) during drilling operations. Additionally, the drilling fluid may lift the rock cuttings away from the drill bit and upwards as the drilling fluid is recirculated back to the surface. The drilling fluid may transport rock cuttings from the drill bit to the surface, which can be referred to as “cleaning” the wellbore (120), or hole cleaning.


In some embodiments, the well data (140) are recorded in real-time, and are available for review or use within seconds, minutes or hours of the condition being sensed (e.g., the measurements are available within 1 hour of the condition being sensed). In such an embodiment, the well data (140) may be referred to as “real-time” well data (140). Real-time well data (140) may enable an operator of the well (106) to assess a relatively current state of the well system (106), and make real-time decisions regarding a development of the well system (106) and the reservoir (102), such as on-demand adjustments in drilling fluid and regulation of production flow from the well.


In some embodiments, the well surface system (124) includes a wellhead (130). The wellhead (130) may include a rigid structure installed at the “up-hole” end of the wellbore (120), at or near where the wellbore (120) terminates at the geological surface (108). The wellhead (130) may include structures for supporting (or “hanging”) casing and production tubing extending into the wellbore (120). Production (121) may flow through the wellhead (130), after exiting the wellbore (120) and the well sub-surface system (122), including, for example, the casing and the production tubing. In some embodiments, the well surface system (124) includes flow regulating devices that are operable to control the flow of substances into and out of the wellbore (120). For example, the well surface system (124) may include one or more production valves (132) that are operable to control the flow of production (121). For example, a production valve (132) may be fully opened to enable the unrestricted flow of production (121) from the wellbore (120), the production valve (132) may be partially opened to partially restrict (or “throttle”) the flow of production (121) from the wellbore (120), and production valve (132) may be fully closed to fully restrict (or “block”) the flow of production (121) from the wellbore (120), and through the well surface system (124).


In some embodiments, the wellhead (130) includes a choke assembly. For example, the choke assembly may include hardware with functionality for opening and closing the fluid flow through pipes in the well system (106). Likewise, the choke assembly may include a pipe manifold that may lower the pressure of fluid traversing the wellhead. As such, the choke assembly may include a set of high-pressure valves and at least two chokes. These chokes may be fixed or adjustable or a mix of both. Redundancy may be provided so that if one choke has to be taken out of service, the flow can be directed through another choke. In some embodiments, pressure valves and chokes are communicatively coupled to the well control system (126). Accordingly, a well control system (126) may obtain wellhead data regarding the choke assembly as well as transmit one or more commands to components within the choke assembly in order to adjust one or more choke assembly parameters.


Keeping with FIG. 1, in some embodiments, the well surface system (124) includes a surface sensing system (134). The surface sensing system (134) may include sensors for sensing characteristics of substances, including production (121), passing through or otherwise located in the well surface system (124). The characteristics may include, for example, pressure, temperature and flow rate of production (121) flowing through the wellhead (130), or other conduits of the well surface system (124), after exiting the wellbore (120). The surface sensing system (134) may also include sensors for sensing characteristics of the rig (101), such as bit depth, hole depth, drilling fluid flow, hook load, rotary speed, etc.


In some embodiments, the well system (106) includes the equipment failure simulator (112). For example, the equipment failure simulator (112) may include hardware and/or software with functionality for generating an equipment failure prediction score, initiating and performing maintenance operations, and/or performing one or more reservoir simulations. For example, the equipment failure simulator (112) may store the historic data, maintenance records, operational parameters, production targets, etc. For this purpose, the simulator may include memory with one or more data structures, such as a buffer, a table, an array, or any other suitable storage medium. The equipment failure simulator (112) may further, at least, analyze the historic data, sensor readings, maintenance records, operational parameters, and production targets, and determine an equipment failure score. While the equipment failure simulator (112) is shown at a well site, in some embodiments, the equipment failure simulator (112) may be located remotely from the well site. In some embodiments, the equipment failure simulator (112) may include a computer system that is similar to the computer system (800) described below with regard to FIG. 8 and the accompanying description.



FIG. 2 shows a flowchart in accordance with one or more embodiments for predicting gas lift equipment failure using deep learning models. Specifically, in Block 201, data is obtained from a plurality of sources. As shown in FIG. 3, the data may include inputs such as sensor readings (311), maintenance records (312), operational parameters (313), and production targets (314). In one or more embodiments, the sensor readings (311), the maintenance records (312), the operational parameters (313), and the production targets (314) may be obtained in real-time. In other embodiments, the sensor readings (311), the maintenance records (312), the operational parameters (313), and the production targets (314) may be obtained sequentially or immediately after drilling operations are performed.


The sensor readings (311) may include, at least, data about pressure, temperature, flow rate, and vibration. The sensor readings may be obtained using specialized tools such as, at least, thermometers, pressure gauges, and flowmeters (e.g., venturi meters, turbine meters, ultrasonic meters, electromagnetic meters, etc.). Further, the maintenance records (312) include data about the date of the last maintenance, the types of maintenance performed, and the reason for the last maintenance. The operational parameters (313) include data about the time of the equipment's operation, the load of the equipment, and the speed of the equipment. Further, the production targets (314) include data on the expected performance of the equipment.
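One way to bundle the four input sources into a single record is a plain data structure; the field names and units below are hypothetical, chosen only to mirror the categories listed above:

```python
from dataclasses import dataclass

# Hypothetical record type combining the four input sources (311)-(314);
# field names and units are illustrative, not taken from the disclosure.
@dataclass
class GasLiftRecord:
    # Sensor readings (311)
    pressure_psi: float
    temperature_c: float
    flow_rate_bpd: float
    vibration_g: float
    # Maintenance records (312)
    days_since_maintenance: int
    last_maintenance_type: str
    # Operational parameters (313)
    operating_hours: float
    load_fraction: float
    # Production targets (314)
    target_rate_bpd: float

record = GasLiftRecord(2200.0, 88.5, 1500.0, 0.4, 120, "valve replacement",
                       8760.0, 0.85, 1600.0)
print(record.days_since_maintenance)
```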


In Block 202, the sensor readings, the maintenance records, and the operational parameters are inputted to a trained machine learning (ML) model to obtain an initial equipment failure probability. Further, the sensor readings, the maintenance records, and the operational parameters are split into training, validation, and test sets. In some embodiments, the validation and test sets may be the same, such that the data is effectively only split into two distinct sets. In some instances, splitting the data may be performed before preprocessing. In this case, it is common to determine the preprocessing parameters, if any, using the training set and then to apply these parameters to the validation and test sets. Additionally, the model may be retrained regularly; for example, the model may be retrained when new data regarding equipment failure is collected, when new equipment is added to the system, or when the accuracy threshold is modified.
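The split-then-preprocess order described above can be sketched as follows, assuming standardization as the preprocessing step; the feature matrix is synthetic, and the statistics are computed on the training set only and then applied to the validation and test sets:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical feature matrix: rows are observations of one gas lift valve,
# columns are sensor/maintenance/operational features (values illustrative).
X = rng.normal(loc=[2200.0, 90.0, 0.5], scale=[150.0, 6.0, 0.3], size=(100, 3))

# Shuffle, then split 70/15/15 into training, validation, and test sets.
idx = rng.permutation(len(X))
X_train, X_val, X_test = X[idx[:70]], X[idx[70:85]], X[idx[85:]]

# Determine preprocessing parameters (here: standardization statistics)
# from the training set only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# Apply the training-set statistics to all three sets.
X_train_s = (X_train - mu) / sigma
X_val_s = (X_val - mu) / sigma    # validation uses training statistics
X_test_s = (X_test - mu) / sigma  # test uses training statistics
```

Fitting the statistics on the training set alone avoids leaking information from the validation and test sets into the model.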


As shown in FIG. 3, Support Vector Machines (320), Random Forest (330), and Neural Networks (340) may be used together to form the ensemble model (350) to predict gas lift equipment failure. The ensemble model (350) may combine the strengths of individual models to achieve better predictive performance. As discussed further below in FIGS. 3-5, the weights assigned to each model can be determined by their respective performances on the validation or test set.
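A weighted ensemble of this kind can be sketched as follows; the per-model validation accuracies and initial failure probabilities are hypothetical placeholders, with each model's weight proportional to its validation performance as described above:

```python
# Hypothetical validation accuracies for the three base models; in practice
# these would come from evaluating each trained model on the validation set.
val_accuracy = {"svm": 0.82, "random_forest": 0.88, "neural_network": 0.85}

# Normalize the accuracies into ensemble weights that sum to 1.
total = sum(val_accuracy.values())
weights = {name: acc / total for name, acc in val_accuracy.items()}

def ensemble_prediction(probabilities):
    """Combine per-model failure probabilities into one weighted score."""
    return sum(weights[name] * p for name, p in probabilities.items())

# Initial failure probabilities from each base model for one piece of equipment.
initial = {"svm": 0.70, "random_forest": 0.90, "neural_network": 0.80}
score = ensemble_prediction(initial)
print(round(score, 3))
```

A maintenance operation could then be triggered when the combined score exceeds a chosen threshold.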


The equipment failure simulator (112) may include hardware and/or software with functionality for generating and/or updating one or more machine-learned models. Machine learning (ML), broadly defined, is the extraction of patterns and insights from data. The phrases “artificial intelligence,” “machine learning,” “deep learning,” and “pattern recognition” are often conflated, interchanged, and used synonymously throughout the literature. This ambiguity arises because the field of “extracting patterns and insights from data” was developed simultaneously and disjointedly among a number of classical arts like mathematics, statistics, and computer science. For consistency, the term machine learning, or machine-learned, will be adopted herein. However, one skilled in the art will recognize that the concepts and methods detailed hereafter are not limited by this choice of nomenclature.


Machine-learned model types may include, but are not limited to, generalized linear models, Bayesian regression, random forests, and deep models such as neural networks, convolutional neural networks, and recurrent neural networks. Machine-learned model types, whether they are considered deep or not, are usually associated with additional “hyperparameters” which further describe the model. For example, hyperparameters providing further detail about a neural network may include, but are not limited to, the number of layers in the neural network, choice of activation functions, inclusion of batch normalization layers, and regularization strength. Commonly, in the literature, the selection of hyperparameters surrounding a machine-learned model is referred to as selecting the model “architecture.” Once a machine-learned model type and hyperparameters have been selected, the machine-learned model is trained to perform a task.
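As an illustration of such an architecture selection, the following hypothetical specification records the kinds of hyperparameters listed above and derives the number of trainable parameters implied by the layer sizes; all values are illustrative:

```python
# Hypothetical architecture specification for a failure-prediction network;
# the hyperparameter names mirror those discussed above (layers, activations,
# batch normalization, regularization strength).
architecture = {
    "layer_sizes": [3, 16, 8, 1],   # input, two hidden layers, output
    "activation": "relu",
    "output_activation": "sigmoid",
    "batch_normalization": True,
    "l2_regularization": 1e-4,
}

def count_parameters(layer_sizes):
    """Number of weights and biases implied by fully connected layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(count_parameters(architecture["layer_sizes"]))  # prints 209
```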


Herein, a cursory introduction to various machine-learned models such as a neural network (NN) and convolutional neural network (CNN) are provided as these models are often used as components—or may be adapted and/or built upon—to form more complex models such as autoencoders and diffusion models. However, it is noted that many variations of neural networks, convolutional neural networks, autoencoders, transformers, and diffusion models exist. Therefore, one with ordinary skill in the art will recognize that any variations to the machine-learned models that differ from the introductory models discussed herein may be employed without departing from the scope of this disclosure. Further, it is emphasized that the following discussions of machine-learned models are basic summaries and should not be considered limiting.


A diagram of a neural network is shown in FIG. 4. At a high level, a neural network (400) may be graphically depicted as being composed of nodes (402), where each circle represents a node, and edges (404), shown here as directed lines. The nodes (402) may be grouped to form layers (405). FIG. 4 displays four layers (408, 410, 412, 414) of nodes (402) where the nodes (402) are grouped into columns; however, the grouping need not be as shown in FIG. 4. The edges (404) connect the nodes (402). Edges (404) may connect, or not connect, to any node(s) (402) regardless of which layer (405) the node(s) (402) is in. That is, the nodes (402) may be sparsely and residually connected. A neural network (400) will have at least two layers (405), where the first layer (408) is considered the “input layer” and the last layer (414) is the “output layer.” Any intermediate layer (410, 412) is usually described as a “hidden layer.” A neural network (400) may have zero or more hidden layers (410, 412), and a neural network (400) with at least one hidden layer (410, 412) may be described as a “deep” neural network or as a “deep learning method.” In general, a neural network (400) may have more than one node (402) in the output layer (414). In this case, the neural network (400) may be referred to as a “multi-target” or “multi-output” network.


Nodes (402) and edges (404) carry additional associations. Namely, every edge is associated with a numerical value. The edge numerical values, or even the edges (404) themselves, are often referred to as “weights” or “parameters.” While training a neural network (400), numerical values are assigned to each edge (404). Additionally, every node (402) is associated with a numerical variable and an activation function. Activation functions are not limited to any functional class, but traditionally follow the form

A = ƒ( Σ_{i (incoming)} [ (node value)_i (edge value)_i ] ),   (Equation 1)

where i is an index that spans the set of “incoming” nodes (402) and edges (404) and ƒ is a user-defined function. Incoming nodes (402) are those that, when viewed as a graph (as in FIG. 4), have directed arrows that point to the node (402) where the numerical value is being computed. Some functions for ƒ may include the linear function ƒ(x)=x, the sigmoid function

ƒ(x) = 1 / (1 + e^(−x)),

and the rectified linear unit function ƒ(x)=max(0, x); however, many additional functions are commonly employed. Every node (402) in a neural network (400) may have a different associated activation function. Often, as a shorthand, activation functions are described by the function ƒ by which it is composed. That is, an activation function composed of a linear function ƒ may simply be referred to as a linear activation function without undue ambiguity.
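The node computation of Equation 1 and the three example functions ƒ can be written directly in code; this is a generic sketch, not tied to any particular implementation:

```python
import math

# The three example activation functions named above.
def linear(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

# Equation 1 for a single node: apply f to the sum over incoming nodes of
# (node value)_i * (edge value)_i.
def node_value(incoming_values, edge_weights, f):
    return f(sum(v * w for v, w in zip(incoming_values, edge_weights)))

print(node_value([1.0, 2.0], [0.5, -0.25], relu))  # relu(0.5 - 0.5) = 0.0
```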


When the neural network (400) receives an input, the input is propagated through the network according to the activation functions and incoming node (402) values and edge (404) values to compute a value for each node (402). That is, the numerical value for each node (402) may change for each received input. Occasionally, nodes (402) are assigned fixed numerical values, such as the value of 1, that are not affected by the input or altered according to edge (404) values and activation functions. Fixed nodes (402) are often referred to as “biases” or “bias nodes” (406), displayed in FIG. 4 with a dashed circle.


In some implementations, the neural network (400) may contain specialized layers (405), such as a normalization layer, or additional connection procedures, like concatenation. One skilled in the art will appreciate that these alterations do not exceed the scope of this disclosure.


As noted, the training procedure for the neural network (400) comprises assigning values to the edges (404). To begin training, the edges (404) are assigned initial values. These values may be assigned randomly, assigned according to a prescribed distribution, assigned manually, or assigned by some other mechanism. Once edge (404) values have been initialized, the neural network (400) may act as a function, such that it may receive inputs and produce an output. As such, at least one input is propagated through the neural network (400) to produce an output. Training data is provided to the neural network (400). Generally, training data consists of pairs of inputs and associated targets. The targets represent the "ground truth," or the otherwise desired output, upon processing the inputs. During training, the neural network (400) processes at least one input from the training data and produces at least one output. Each neural network (400) output is compared to its associated input data target. The comparison of the neural network (400) output to the target is typically performed by a so-called "loss function," although other names for this comparison function such as "error function," "misfit function," and "cost function" are commonly employed. Many types of loss functions are available, such as the mean-squared-error function; however, the general characteristic of a loss function is that the loss function provides a numerical evaluation of the similarity between the neural network (400) output and the associated target. The loss function may also be constructed to impose additional constraints on the values assumed by the edges (404), for example, by adding a penalty term, which may be physics-based, or a regularization term. Generally, the goal of a training procedure is to alter the edge (404) values to promote similarity between the neural network (400) output and associated target over the training data.
Thus, the loss function is used to guide changes made to the edge (404) values, typically through a process called “backpropagation.”


While a full review of the backpropagation process exceeds the scope of this disclosure, a brief summary is provided. Backpropagation consists of computing the gradient of the loss function over the edge (404) values. The gradient indicates the direction of change in the edge (404) values that results in the greatest change to the loss function. Because the gradient is local to the current edge (404) values, the edge (404) values are typically updated by a “step” in the direction indicated by the gradient. The step size is often referred to as the “learning rate” and need not remain fixed during the training process. Additionally, the step size and direction may be informed by previously seen edge (404) values or previously computed gradients. Such methods for determining the step direction are usually referred to as “momentum” based methods.


Once the edge (404) values have been updated, or altered from their initial values, through a backpropagation step, the neural network (400) will likely produce different outputs. Thus, the procedure of propagating at least one input through the neural network (400), comparing the neural network (400) output with the associated target with a loss function, computing the gradient of the loss function with respect to the edge (404) values, and updating the edge (404) values with a step guided by the gradient, is repeated until a termination criterion is reached. Common termination criteria are: reaching a fixed number of edge (404) updates, otherwise known as an iteration counter; a diminishing learning rate; noting no appreciable change in the loss function between iterations; reaching a specified performance metric as evaluated on the data or a separate hold-out data set. Once the termination criterion is satisfied, and the edge (404) values are no longer intended to be altered, the neural network (400) is said to be “trained.”
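The propagate, compare, and update cycle described above can be illustrated with a minimal sketch: a single linear node trained by gradient descent on a mean-squared-error loss. The data, learning rate, and iteration-count termination criterion below are all illustrative and not part of the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative training data: inputs with targets from the "ground truth" y = 2x.
X = rng.uniform(-1.0, 1.0, size=(64, 1))
targets = 2.0 * X[:, 0]

w = np.zeros(1)       # initial edge value
learning_rate = 0.5   # step size

for iteration in range(200):                        # termination: fixed iteration count
    outputs = X @ w                                 # propagate inputs to produce outputs
    loss = np.mean((outputs - targets) ** 2)        # compare outputs to targets (MSE loss)
    grad = 2.0 * X.T @ (outputs - targets) / len(X) # gradient of the loss w.r.t. the edge value
    w -= learning_rate * grad                       # step against the gradient

# After training, the learned edge value w approaches the true weight 2.0.
```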


One or more embodiments disclosed herein employ a convolutional neural network (CNN). A CNN is similar to a neural network (400) in that it can technically be graphically represented by a series of edges (404) and nodes (402) grouped to form layers. However, it is more informative to view a CNN as structural groupings of weights; where here the term structural indicates that the weights within a group have a relationship. CNNs are widely applied when the data inputs also have a structural relationship, for example, a spatial relationship where one input is always considered “to the left” of another input. Grid data, which may be three-dimensional, has such a structural relationship because each data element, or grid point, in the grid data has a spatial location (and sometimes also a temporal location when grid data is allowed to change with time). Consequently, a CNN is an intuitive choice for processing grid data.


A structural grouping, or group, of weights is herein referred to as a "filter". The number of weights in a filter is typically much less than the number of inputs, where here the number of inputs refers to the number of data elements or grid points in a set of grid data. In a CNN, the filters can be thought of as "sliding" over, or convolving with, the inputs to form an intermediate output or intermediate representation of the inputs which still possesses a structural relationship. As with the neural network (400), the intermediate outputs are often further processed with an activation function. Many filters may be applied to the inputs to form many intermediate representations. Additional filters may be formed to operate on the intermediate representations creating more intermediate representations. This process may be repeated as prescribed by a user. There is a "final" group of intermediate representations, wherein no more filters act on these intermediate representations. In some instances, the structural relationship of the final intermediate representations is ablated; a process known as "flattening." The flattened representation may be passed to a neural network (400) to produce a final output. Note that, in this context, the neural network (400) is still considered part of the CNN. As with a neural network (400), a CNN is trained, after initialization of the filter weights and of the edge (404) values of the internal neural network (400) (if present), with the backpropagation process in accordance with a loss function.


A common architecture for CNNs is the so-called "U-net." The term U-net derives from the fact that a CNN with this architecture is composed of an encoder branch and a decoder branch that, when depicted graphically, often form the shape of the letter "U." Generally, in a U-net type CNN the encoder branch is composed of N encoder blocks and the decoder branch is composed of N decoder blocks, where N≥1. The value of N may be considered a hyperparameter that can be prescribed by a user or learned (or tuned) during a training and validation procedure. Typically, each encoder block and each decoder block consists of a convolutional operation, followed by an activation function and the application of a pooling (i.e., downsampling) or upsampling operation. Further, in a U-net type CNN each of the N encoder and decoder blocks may be said to form a pair. Intermediate data representations output by an encoder block may be passed to an associated (i.e., paired) decoder block through a "skip" connection or "residual" connection, where they are often concatenated with other data.


Turning to random forests, a random forest model may be an algorithmic model that combines the output of multiple decision trees to reach a single predicted result. For example, a random forest model may be composed of a collection of decision trees, where training the random forest model may be based on three main hyperparameters that include node size, a number of decision trees, and a number of input features being sampled. During training, a random forest model may allow different decision trees to randomly sample from a dataset with replacement (e.g., from a bootstrap sample) to produce multiple final decision trees in the trained model. For example, when multiple decision trees form an ensemble in the random forest model, this ensemble may determine more accurate predicted data, particularly when the individual trees are uncorrelated with each other. In some embodiments, a random forest model implements a software algorithm that is an extension of a bagging method. As such, a random forest model may use both bagging and feature randomness to create an uncorrelated forest of decision trees. Feature randomness (also referred to as "feature bagging") may generate a random subset of input features. This random subset may thereby result in low correlation among decision trees in the random forest model. In a training operation for a random forest model, the training operation may search for decision trees that provide the best split to subset particular data, such as through a Classification and Regression Tree (CART) algorithm. Different metrics, such as information gain or mean square error (MSE), may be used to determine the quality of a data split for various decision trees.


Keeping with random forests, a random forest model may be a classifier that uses data having discrete labels or classes. Likewise, a random forest model may also be used as a random forest regressor to solve regression problems. Depending on the type of problem being addressed by the random forest model, how predicted data is determined may vary accordingly. For a regression task, the individual decision trees may be averaged in a predicted result. For a classification task, a majority vote (e.g., predicting an output based on the most frequent categorical variable) may determine a predicted class. In a random forest regressor, the model may work with data having a numeric or continuous output, which cannot be defined by distinct classes.


Turning to reinforcement learning, a simulator may perform one or more reinforcement learning algorithms using a reinforcement learning system to train a machine-learning model. In particular, a reinforcement learning algorithm may be a type of method that autonomously learns agent policies through multiple iterations of trials and evaluations based on observation data. The objective of a reinforcement learning algorithm may be to learn an agent policy π that maps one or more states of an environment to an action so as to maximize an expected reward J(π). A reward value may describe one or more qualities of a particular state, agent action, and/or trajectory at a particular time within an operation, such as an electric power generation operation. As such, a reinforcement learning system may include hardware and/or software with functionality for implementing one or more reinforcement learning algorithms. For example, a reinforcement learning algorithm may train a policy to make a sequence of decisions based on the observed states of the environment to maximize the cumulative reward determined by a reward function. For example, a reinforcement learning algorithm may employ a trial-and-error procedure to determine one or more agent policies based on various agent interactions with a complex environment, such as a geological subsurface with various geological interfaces and different formations. As such, a reinforcement learning algorithm may include a reward function that teaches a particular action selection engine to follow certain rules, while still allowing the reinforcement learning model to retain information learned from previous simulations.


In some embodiments, one or more components in a reinforcement learning system are trained using a training system. For example, an agent policy and/or a reward function may be updated through a training process that is performed by a machine-learning algorithm. In some embodiments, historical data, augmented data, and/or synthetic data may provide a supervised signal for training an action selector engine, an agent policy, and/or a reward function, such as through an imitation learning algorithm. In another embodiment, an interactive expert may provide data for adjusting agent policies and/or reward functions. However, in the case of an inadequate amount of historical data to train the predictive model, the model may not have the capability to predict equipment failure precisely, causing false alarms or undetected equipment failures and leading to ineffective maintenance planning and increased operational outages.


Turning to deep reinforcement learning, deep reinforcement learning may combine various machine-learning models (e.g., artificial neural networks) with a framework of reinforcement learning that helps agents learn how to reach their goals. That is, deep reinforcement learning may use both function approximation and target optimization in order to map various states and actions to specific rewards. For example, artificial neural networks as used in computer vision, natural language processing, and time series predictions may be combined with reinforcement learning algorithms.
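The trial-and-error reward maximization described above can be illustrated with a minimal tabular Q-learning sketch. The tiny chain environment, hyperparameter values, and variable names below are illustrative only and not part of the disclosure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny illustrative environment: states 0..3 on a chain; action 1 moves right,
# action 0 moves left; reaching state 3 yields reward 1 and ends the episode.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))        # action-value estimates
alpha, gamma, epsilon = 0.5, 0.9, 0.1      # learning rate, discount, exploration rate

for episode in range(500):
    s = 0
    while s != 3:
        # Epsilon-greedy action selection: occasionally explore at random.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == 3 else 0.0
        # Q-learning update: move the estimate toward reward + discounted future value.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

# The learned policy should prefer moving right in every non-terminal state.
```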


Another type of machine-learned model is a transformer. A detailed description of a transformer exceeds the scope of this disclosure. However, in summary, a transformer may be said to be a deep neural network capable of learning context among data features. Generally, transformers act on sequential data (such as a sentence where the words form an ordered sequence). Transformers often determine or track the relative importance of features in input and output (or target) data through a mechanism known as "attention." In some instances, the attention mechanism may further be specified as "self-attention" and "cross-attention," where self-attention determines the importance of features of a data set (e.g., input data, intermediate data) relative to other features of the data set. For example, if the data set is formatted as a vector with M elements, then self-attention quantifies a relationship between the M elements. In contrast, cross-attention determines the relative importance of features to each other between two data sets (e.g., an input vector and an output vector). Although transformers generally operate on sequential data composed of ordered elements, transformers do not process the elements of the data sequentially (such as in a recurrent neural network) and require an additional mechanism to capture the order, or relative positions, of data elements in a given sequence. Thus, transformers often use a positional encoder to describe the position of each data element in a sequence, where the positional encoder assigns a unique identifier to each position. A positional encoder may be used to describe a temporal relationship between data elements (i.e., time series) or between iterations of a data set when a data set is processed iteratively (i.e., representations of a data set at different iterations).
While concepts such as attention and positional encoding were generally developed in the context of a transformer, they may be readily inserted into—and used with—other types of machine-learned models (e.g., diffusion models).
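One common concrete choice for the positional encoder mentioned above is the sinusoidal scheme; the sketch below, with illustrative sequence length and feature dimensions, assigns each position in a sequence a unique vector identifier:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Assign each sequence position a unique sinusoidal identifier (one row per position)."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1) column of positions
    dims = np.arange(0, d_model, 2)[None, :]       # even feature indices
    angles = positions / (10000.0 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions: cosine
    return pe

pe = positional_encoding(seq_len=16, d_model=8)    # illustrative sizes
```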



FIG. 5 depicts a general framework for training and evaluating a machine-learned model. Herein, when training a machine-learned model, the more general term “modeling data” will be adopted as opposed to training data to refer to data used for training, evaluating, and testing a machine-learned model. Further, use of the term modeling data prevents ambiguity when discussing various partitions of modeling data such as a training set, validation set, and test set, described below. In the context of FIG. 5, modeling data will be said to consist of pairs of inputs and associated targets. When a machine-learned model is trained using pairs of inputs and associated targets, that machine-learned model is typically categorized as a “supervised” machine-learned model or a supervised method. In the literature, autoencoders are often categorized as “unsupervised” or “semi-supervised” machine learning models because modeling data used to train these models does not include distinct targets. For example, in the case of autoencoders, the output, and thus the desired target, of an autoencoder is the input. That said, while autoencoders may not be considered supervised models, the training procedure depicted in FIG. 5 may still be applied to train autoencoders where it is understood that an input-target pair is formed by setting the target equal to the input.


Keeping with FIG. 5, in Block 504, modeling data is obtained. As stated, the modeling data may be acquired from historical datasets, be synthetically generated, or may be a combination of real and synthetic data. In Block 506, the modeling data is split into a training set, validation set, and test set. In one or more embodiments, the validation and the test set are the same such that the modeling data is effectively split into a training set and a validation/testing set. In Block 508, given the machine-learned model type (e.g., autoencoder), an architecture (e.g., number of layers, compression ratio, etc.) is selected. In accordance with one or more embodiments, architecture selection is performed by cycling through a set of user-defined architectures for a given model type. In other embodiments, the architecture is selected based on the performance of previously evaluated models with their associated architectures, for example, using a Bayesian-based search. In Block 510, with an architecture selected, the machine-learned model is trained using the training set.


During training, the machine-learned model is adjusted such that the output of the machine-learned model, upon receiving an input, is similar to the associated target (or, in the case of an autoencoder, the input). Once the machine-learned model is trained, in Block 512, the validation set is processed by the trained machine-learned model and its outputs are compared to the associated targets. Thus, the performance of the trained machine-learned model can be evaluated. Block 514 represents a decision. If the trained machine-learned model is found to have suitable performance as evaluated on the validation set, where the criterion for suitable performance is defined by a user, then the trained machine-learned model is accepted for use in a production (or deployed) setting. As such, in Block 518, the trained machine-learned model is used in production. However, before the machine-learned model is used in production a final indication of its performance can be acquired by estimating the generalization error of the trained machine-learned model, as shown in Block 516. The generalization error is estimated by evaluating the performance of the trained machine-learned model, after a suitable model has been found, on the test set. One with ordinary skill in the art will recognize that the training procedure depicted in FIG. 5 is general and that many adaptations can be made without departing from the scope of the present disclosure. For example, common training techniques, such as early stopping, adaptive or scheduled learning rates, and cross-validation may be used during training without departing from the scope of this disclosure.
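The split-train-validate-test workflow of FIG. 5 can be sketched as follows; the 70/15/15 split ratio, synthetic modeling data, and all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative modeling data: 100 input-target pairs.
inputs = rng.normal(size=(100, 3))
targets = rng.integers(0, 2, size=100)

# Shuffle once, then split into training, validation, and test sets (70/15/15).
order = rng.permutation(len(inputs))
train_idx, val_idx, test_idx = np.split(order, [70, 85])

X_train, y_train = inputs[train_idx], targets[train_idx]
X_val, y_val = inputs[val_idx], targets[val_idx]
X_test, y_test = inputs[test_idx], targets[test_idx]

# Train on the training set, select a model on the validation set, and only
# then estimate the generalization error once on the held-out test set.
```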


Turning back to FIG. 3, in one or more embodiments, data mining computations use the historical data to identify patterns and relationships between various factors and the occurrence of equipment failure. This may be done using supervised learning, where the system is trained on a labeled dataset of historical data. Specifically, the dataset includes features or inputs that are relevant to the prediction task, such as sensor readings, maintenance records, and operational parameters. The features may be represented as a vector x=[x1, x2, . . . xn], where n is the number of features.


Further, the dataset also may include labels or outputs that indicate whether the equipment failure occurred or not. The labels may be represented as a binary variable y={0, 1}, where 0 represents no failure and 1 represents failure. The goal of the algorithm is to learn a function f(x) that maps the input features to the output label, such that f(x)=y. This function is learned by minimizing a loss function L (y, f(x)), which measures the difference between the predicted label and the true label. The loss function may be represented as:










L(y, f(x)) = −[y*log(f(x)) + (1 − y)*log(1 − f(x))]     (Equation 2)







where log is the natural logarithm, and f(x) is the predicted probability of failure given the input features x.
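Equation 2 can be evaluated directly. The sketch below adds a small clipping constant, an implementation detail that is not part of the disclosure, to guard against taking the logarithm of zero:

```python
import numpy as np

def binary_cross_entropy(y, f_x, eps=1e-12):
    """Equation 2: L(y, f(x)) = -[y*log(f(x)) + (1 - y)*log(1 - f(x))]."""
    f_x = np.clip(f_x, eps, 1.0 - eps)   # clipping guards against log(0)
    return -(y * np.log(f_x) + (1.0 - y) * np.log(1.0 - f_x))
```

For a correct, confident prediction (e.g., y = 1 with f(x) near 1) the loss approaches zero, while a confident wrong prediction is penalized heavily.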


In one or more embodiments, the predicted probability of failure may be represented as:










f(x) = sigmoid(w0 + w1x1 + w2x2 + . . . + wnxn)     (Equation 3)







where w0, w1, w2, and wn are the weights of the model that determine the contribution of each input feature to the predicted probability of failure. The sigmoid function maps the predicted probability to a value between 0 and 1, which may be interpreted as the probability of failure.


Further, the weights of the model may be learned using an optimization step such as gradient descent, which minimizes the loss function by adjusting the weights in the direction of the negative gradient of the loss function. Once the model is trained, it may be used to predict the probability of failure for new input features by inputting the features into the model and computing the sigmoid function. If the predicted probability exceeds a certain threshold, the equipment is predicted to fail, and maintenance can be scheduled accordingly.
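Putting Equations 2 and 3 together, a minimal gradient-descent sketch of this logistic model is shown below; the synthetic features, labels, learning rate, and iteration count are all illustrative and not part of the disclosure:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)

# Illustrative features (e.g., normalized sensor readings) with labels generated
# from a known rule so the fit can be checked: "failure" whenever x1 + x2 > 0.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend a column of ones for w0
w = np.zeros(3)                             # weights w0, w1, w2

for _ in range(2000):
    p = sigmoid(Xb @ w)              # Equation 3: predicted probability of failure
    grad = Xb.T @ (p - y) / len(y)   # gradient of the mean of Equation 2's loss
    w -= 0.5 * grad                  # step in the direction of the negative gradient

predicted = (sigmoid(Xb @ w) >= 0.5).astype(float)
accuracy = (predicted == y).mean()
```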


Support Vector Machines

In one or more embodiments, the Support Vector Machines (SVM) (320) may be a type of supervised learning method used for classification, regression, and outlier detection. In the case of predicting gas lift equipment failure, the SVM (320) may be used to predict whether an equipment will fail given a set of features such as gas flow rates, pressure, temperature, and information on equipment failures and maintenance activities. Specifically, the SVM (320) may find the hyperplane that best separates classes in feature space. The hyperplane is a decision boundary that separates distinct categories in which data points are classified. The distinct categories are separated in the n-dimensional space where input data points are represented. The points closest to the hyperplane are called support vectors, and the margin between the support vectors is maximized.


In some embodiments, an input to the SVM (320) may be gas flow rate, pressure, temperature, and maintenance records including a number of days since the last maintenance and a record indicating whether the equipment is healthy or faulty. The SVM model (320) may be trained on this dataset to predict the health status of new equipment based on its features. The SVM (320) may learn a decision boundary that separates the healthy and faulty equipment in the feature space.


Further, the SVM (320) regressor may be split into two main components. Firstly, it may transform input data to the feature space, the feature space being of higher dimension than the original input space. The transformation may be done using a kernel function chosen from a family of functions, with many existing kernels and the option to create new ones for specific use-cases. The choice of kernel function may be a hyperparameter of the support vector machine model. Kernel functions have specific mathematical properties, including the "kernel trick" which allows for computing distances between pairs of data points in the feature space without actually transforming them from the original input space. The second component may involve parameterizing a hyperplane in the feature space. The set of weights {w0, w1, . . . , wn} defines the hyperplane, the hyperplane being the predicted output of the support vector machine regressor for a given input.


The hyperplane may be expressed as:









y = w0 + Σ(i=1 to n) wixi     (Equation 4)







where y denotes the value of the hyperplane and xi denotes a value on the i-th axis of the feature space with n dimensions.


Additionally, some embodiments may include the weight w0 in the summation. The weight vector w denotes the set of weights, while a data point in the feature space is denoted as a vector x. By incorporating w0 into the weight vector and using vector notation, the prediction for a data point j can be represented as










yj = wTxj     (Equation 5)







To train a support vector machine model and find the appropriate weights, an optimization problem is solved using the following approach:











min (1/2)||w||^2 subject to: |yj − wTxj| ≤ ϵ, ∀ j in training data,     (Equation 6)







where ϵ denotes an error term representing a hyperparameter of the support vector machine model. Further, ϵ may be set by a user.


According to Equation 5, wTxj represents the predicted gas flow rate for a given training data point xj. Further, the constraint |yj−wTxj|≤ϵ in Equation 6 specifies that the difference between the actual value yj and the predicted value wTxj may be less than or equal to a pre-defined error ϵ. However, this approach may be sensitive to outliers in the data, as accommodating the constraint for an outlier data point may require altering the hyperplane, which can have adverse effects. Alternatively, the value of ϵ may need to be increased. To address this issue and improve the predictive power of the support vector machine regressor, Equation 6 may be modified to include slack terms ξj and a regularization term λ as follows









min ((1/2)||w||^2 + λ Σ(j=1 to m) |ξj|)     (Equation 7)

subject to: |yj − wTxj| ≤ ϵ + |ξj|, ∀ j





Equation 7 refers to a scenario where there are m training data points, indexed by j, in the dataset. Each data point is associated with a slack term ξj that may relax the constraint, allowing it to be met for outlier data points without requiring significant alterations to the hyperplane. However, allowing the slack terms to increase without any bounds would nullify the constraint. To prevent this, the slack terms should be minimized, as shown in the second term, Σ(j=1 to m) |ξj|. This approach introduces a tradeoff between adjusting the hyperplane and minimizing the slack terms. The regularization term λ controls this tradeoff and is considered a hyperparameter of the support vector machine model.


To predict the health status of new equipment, the data of the new equipment may be inputted into the SVM to determine which side of the decision boundary it falls on. Alternatively, the dataset may be split into training and test sets and the SVM may be trained on the training set. Further, the hyperparameters such as the kernel parameter and the regularization parameter may be tuned using cross-validation, and then the performance of the SVM may be evaluated on the test set.
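As a simplified stand-in for the formulation above, the sketch below trains a linear soft-margin SVM by sub-gradient descent on the hinge loss rather than solving Equations 6-7 with a dedicated solver; the synthetic healthy/faulty clusters, learning rate, and all names are illustrative and not part of the disclosure:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative equipment features: healthy (-1) and faulty (+1) classes drawn
# from shifted clusters so that a separating hyperplane exists.
healthy = rng.normal(loc=-1.5, size=(100, 2))
faulty = rng.normal(loc=+1.5, size=(100, 2))
X = np.vstack([healthy, faulty])
y = np.hstack([-np.ones(100), np.ones(100)])

w = np.zeros(2)   # hyperplane weights
b = 0.0           # hyperplane offset (w0)
lam = 0.01        # regularization strength (the margin/slack tradeoff)
lr = 0.01         # step size

for epoch in range(200):
    for i in rng.permutation(len(X)):
        margin = y[i] * (X[i] @ w + b)
        if margin < 1:                        # inside the margin: hinge-loss step
            w += lr * (y[i] * X[i] - lam * w)
            b += lr * y[i]
        else:                                 # correctly classified: shrink w only
            w -= lr * lam * w

predicted = np.sign(X @ w + b)
accuracy = (predicted == y).mean()
```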


Random Forest

Random Forest (330) is an ensemble learning technique that uses decision trees to predict outcomes. In the case of gas lift equipment failure prediction, the Random Forest process may be used to predict the likelihood of failure based on various features such as the gas flow rates, the pressure, the temperature, and the data on equipment failures and maintenance activities.


In the case of predicting gas lift equipment failure, the Random Forest (330) may be used to predict if an equipment will fail given a set of features such as gas flow rates, gas injection pressure, tubing pressure, casing pressure, temperature, and information on equipment failures and maintenance activities.


To build the Random Forest model (330), the data may be split into a training set and a testing set. Further, the number of decision trees in the forest is defined, as well as other hyperparameters such as the maximum depth of the trees and the minimum number of samples required to split a node. After defining the hyperparameters, the model may be fit on the training set, which builds the decision trees. During the training process, the non-parametric scheme selects a random subset of features to split each node of the tree, which helps to reduce overfitting. After the model is trained, the performance of the model is evaluated on the testing set. A plurality of metrics may be computed such as accuracy, precision, recall, and F1 score to assess the model's performance.


To make predictions on new data, the data is inputted into the trained Random Forest (330) model. The approach averages the predictions from all the decision trees in the forest to produce the final prediction. The model will output a probability score indicating the likelihood of failure, which may be used to take preventive measures to avoid equipment downtime and reduce operational disruptions.
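The bootstrap-sample-and-average behavior described above can be illustrated with a heavily simplified forest of single-split trees ("stumps"). This is a sketch of the idea, not a production random forest; the synthetic data, the single informative feature, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative data: one informative feature (e.g., days since last maintenance);
# equipment tends to fail once that feature exceeds a threshold of 60.
X = rng.uniform(0, 100, size=(300, 3))
y = (X[:, 0] > 60).astype(int)

def fit_stump(Xs, ys, feature):
    """Find the best single-feature threshold split by classification accuracy."""
    best, best_acc = (0.0, 0), 0.0          # (threshold, class predicted above it)
    for t in np.unique(Xs[:, feature]):
        for above in (0, 1):
            pred = np.where(Xs[:, feature] > t, above, 1 - above)
            acc = (pred == ys).mean()
            if acc > best_acc:
                best, best_acc = (t, above), acc
    return best

# Bagging + feature randomness: each stump sees a bootstrap sample (drawn with
# replacement) and one randomly chosen feature, mimicking a very shallow forest.
stumps = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample
    feature = int(rng.integers(0, X.shape[1]))   # feature randomness
    t, above = fit_stump(X[idx], y[idx], feature)
    stumps.append((feature, t, above))

def forest_probability(x):
    """Average the stump votes into a probability-of-failure score."""
    votes = [above if x[f] > t else 1 - above for f, t, above in stumps]
    return float(np.mean(votes))

score_high = forest_probability(np.array([90.0, 50.0, 50.0]))   # overdue maintenance
score_low = forest_probability(np.array([10.0, 50.0, 50.0]))    # recent maintenance
```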


Neural Networks

The neural networks (340) are a type of algorithmic technique that may be used to predict gas lift equipment failure. The neural network (340) consists of interconnected nodes or “neurons” that are organized into layers and uses a set of mathematical operations to process and transform input data into an output prediction.


In the case of predicting gas lift equipment failure, the neural network (340) may be used to predict if an equipment will fail given a set of parameters such as gas flow rates, pressure, temperature, and information on equipment failures and maintenance activities. The data may be divided into training and testing data sets, and the training data may be used to train the neural network (340).


In one or more embodiments, the training process may involve adjusting the weights and biases of the neurons in the network so that the network can learn the relationships between the input features and the target output (e.g., equipment failure or no failure). This may be done using an enhancement computation such as gradient descent that iteratively adjusts the weights and biases to minimize the error between the predicted output and the actual output.


After the neural network (340) has been trained, the neural network (340) may be used to make predictions on new data. Given a set of input parameters, such as gas flow rates, pressure, and temperature, the neural network (340) will generate a prediction of whether the equipment is likely to fail or not. A simplified formula for the output of a neural network with one hidden layer may be:









y = σ(w2*σ(w1*x + b1) + b2)     (Equation 8)

where x denotes a vector of input parameters, w1 and b1 denote weights and biases for the hidden layer, w2 and b2 denote weights and biases for the output layer, σ denotes the sigmoid activation function, and y denotes the predicted output.





In one or more embodiments, during training, the network may adjust the weights and biases to minimize the error between the predicted output and the actual output. After the network has been trained, the network may be used to make predictions on new data by feeding the input features into the network and generating a prediction of whether the equipment is likely to fail or not.
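The one-hidden-layer formula of Equation 8 may be sketched in code for a scalar input; the weight and bias values below are hypothetical:

```python
import math

# Sketch of Equation 8: y = sigmoid(w2 * sigmoid(w1 * x + b1) + b2),
# shown here for scalar weights. The numeric values are illustrative
# assumptions, not trained parameters.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w1, b1, w2, b2):
    hidden = sigmoid(w1 * x + b1)   # hidden-layer activation
    return sigmoid(w2 * hidden + b2)  # output-layer activation

# Hypothetical normalized input (e.g., a scaled gas flow rate).
y = predict(x=0.5, w1=1.2, b1=-0.3, w2=2.0, b2=-1.0)
print(round(y, 3))
```

The output y is a value between 0 and 1 that may be read as the probability of equipment failure.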


Ensemble Model

In Block 203, the ensemble model (350) may be trained by combining the predictions of the individual models using a weighted average. The weights assigned to each model can be determined by their respective performances on the validation or test set. The formula for the weighted average is as follows:


Ensemble prediction = w1*p_SVM + w2*p_RF + w3*p_NN   (Equation 9)


where w1, w2, and w3 denote weights assigned to SVM, RF, and NN, respectively. Further, the values of the weights w1, w2, and w3 sum to 1.


In one or more embodiments, the user determines a threshold, and the ensemble prediction is continuously compared to the threshold. The ensemble predictions that are equal to or greater than the threshold raise a flag that the equipment is likely to fail. Alternatively, the ensemble predictions that are lower than the threshold do not raise a flag, and the process continues to operate without interruptions. Further, the ensemble predictions may include, at least, probability of failure (351), time to failure (352), and maintenance recommendations (353).
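The weighted average of Equation 9 and the threshold comparison described above may be sketched as follows; the weights, model probabilities, and threshold are hypothetical:

```python
# Sketch of Equation 9 plus the threshold check. The weights and
# per-model failure probabilities below are illustrative assumptions.

def ensemble_prediction(weights, probabilities):
    """Weighted average of the individual model failure probabilities."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * p for w, p in zip(weights, probabilities))

def raises_flag(prediction, threshold):
    """Flag the equipment as likely to fail when prediction >= threshold."""
    return prediction >= threshold

# Hypothetical probabilities from the SVM, RF, and NN models.
p = ensemble_prediction([0.3, 0.4, 0.3], [0.6, 0.8, 0.7])
print(round(p, 2), raises_flag(p, threshold=0.5))
```

When the flag is raised, the process moves to the maintenance operation of Block 204; otherwise operation continues uninterrupted.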


Further, the gas lift equipment may fail in intricate ways that may not be readily captured by the equipment failure simulator (112). For instance, multiple failure modes may interact with each other, or hidden factors that contribute to failure may not be captured by the available data. Additionally, the automated machine learning models may not be able to forecast failure modes that were not encountered in the past data. This can result in unforeseen equipment failures that were not considered in the maintenance planning. As such, it is critical to regularly supervise and improve the predictive model to guarantee its precision and durability. The improvement may entail gathering more data, enhancing the feature selection method, and adjusting the failure prediction threshold based on input from maintenance staff. Furthermore, it is important to integrate specialized knowledge into the model development and interpretation to ensure that the outcomes are significant and practical.


In Block 204, a maintenance operation is carried out, when the ensemble prediction raises a flag. Specifically, the maintenance operation may include, at least, refurbishing gas lift equipment components and replacing damaged or worn-out motor components. In another example, the maintenance operation may include an electronic signal sent to an automated maintenance system for procuring and delivering gas lift equipment components to a system site for performing a maintenance operation of replacing or refurbishing the gas lift equipment components.


Specifically, a variety of maintenance procedures that could follow the ensemble prediction involve specific, tangible actions carried out on the gas lift equipment. A preventive maintenance may involve performing routine checks and inspections of the gas lift equipment to identify any potential issues before they lead to equipment failure. This may include checking the integrity of the gas lift valves, inspecting the condition of the mandrels, and assessing the performance of the compressors.


Additionally, the ensemble prediction may also trigger predictive maintenance procedures, such as using the ensemble prediction to forecast when specific components of the gas lift equipment are likely to fail and scheduling maintenance activities accordingly. This can help to minimize downtime and optimize the overall efficiency of the oil production process. Further, the ensemble prediction may lead to corrective maintenance operation. If the ensemble prediction indicates a high likelihood of equipment failure, corrective actions such as repairing or replacing faulty components could be taken immediately to prevent the predicted failure.


The ensemble prediction may also trigger a condition-based maintenance, where maintenance tasks are only performed when certain indicators show signs of decreasing performance or upcoming failure. This involves monitoring the real-time condition of the gas lift equipment and performing maintenance activities based on the current state of the equipment. The condition of the equipment could be assessed using various sensor readings such as gas flow rate, pressure, and temperature. Further, the ensemble prediction may be used to implement a reliability-centered maintenance program involving identifying the components of the gas lift equipment that are most critical to the overall performance of the system and focusing the maintenance efforts on these components. The ensemble prediction could be used to identify the components that are most likely to fail and prioritize them for maintenance.


In some embodiments, the ensemble prediction could be used to implement a maintenance operation. If the ensemble prediction indicates a potential equipment failure, a remote-control system could be used to adjust the operating parameters of the gas lift equipment to prevent the failure. This could include adjusting the gas flow rate, pressure, or temperature to maintain the optimal operating conditions for the equipment.


Illustrative Example

Using the information detailed above in FIGS. 1-8, the following is an example of using an ensemble model to predict gas lift equipment failure using hypothetical data. Suppose the dataset for the gas lift equipment has the following features: gas flow rate (m3/day), gas pressure (bar), and gas temperature (° C.). The target variable is binary, where 0 represents no equipment failure and 1 represents equipment failure. SVM, RF, and NN models are trained on this dataset to obtain the following performance metrics on the test set:

    • SVM: accuracy=0.85, precision=0.82, recall=0.75
    • RF: accuracy=0.87, precision=0.85, recall=0.77
    • NN: accuracy=0.89, precision=0.88, recall=0.81


Weights are assigned to each model based on their respective performances. For example, weights of 0.3, 0.4, and 0.3 are assigned to SVM, RF, and NN, respectively, since NN has the best performance, followed by RF and SVM. In one or more embodiments, the weights for different machine learning models (e.g., SVM and NN) may be the same even though one model may perform better, because ensemble models often benefit from the diversity of the individual models. Even if one model performs better on its own, the model may be capturing similar aspects of the data as another model. In such cases, giving more weight to the better-performing model may not significantly improve the ensemble's performance. Instead, it may be more beneficial to give equal weight to different models that capture different aspects of the data, even if some of them do not perform as well individually.


Further, the weights assigned to each model in an ensemble may not necessarily be proportional to their individual performances. The weights may be determined based on a variety of factors, including the diversity of the models, the correlation of their errors, and their ability to capture different aspects of the data. In one or more embodiments, even though the Neural Network (NN) model may have the best individual performance, the Support Vector Machine (SVM) model may be capturing some aspects of the data that the NN model is missing. Therefore, all models may be given equal weights to ensure that these aspects are not overlooked in the ensemble prediction.


In some embodiments, the weight for the Random Forest (RF) model of 0.4 may be higher than the others because it may be providing a good balance between bias and variance, which is a key aspect of model performance. The weights may be determined based on a variety of factors, including the diversity of the models, the correlation of their errors, and their ability to capture different aspects of the data. In some embodiments, the RF model may be capturing some unique aspects of the data that are not captured by the other models, so it is given a higher weight.


For example, even though the NN model may have the highest accuracy, precision, and recall, it might be overfitting the training data, which could lead to poor generalization to new data. The RF model, on the other hand, may be better at generalizing to new data due to its inherent ability to control overfitting. Therefore, it is given a higher weight in the ensemble. In one or more embodiments, the weights may be assigned based on the performance of the models. In the given example, the weights of 0.3, 0.4, and 0.3 are assigned to SVM, RF, and NN, respectively.


Then, predictions can be made for a new set of features (gas flow rate=9450 bbls/day, gas pressure=725 psi, gas temperature=30° C.) using the ensemble model as follows:







Ensemble prediction = 0.3*0 + 0.4*1 + 0.3*1 = 0.7
The weights are multiplied by the predictions of the respective models (represented by p_SVM, p_RF, and p_NN in Equation 9) to obtain the ensemble prediction. The ensemble model predicts a probability of 0.7 that the equipment will fail. A threshold may then be used to convert the probability into a binary prediction. For example, with a threshold of 0.5, the final prediction will be 1 (equipment failure) since the probability is greater than the threshold.
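The arithmetic of this worked example may be reproduced directly:

```python
# Reproducing the worked example above: weights 0.3/0.4/0.3 for
# SVM/RF/NN, model predictions p_SVM=0, p_RF=1, p_NN=1, threshold 0.5.
weights = [0.3, 0.4, 0.3]
predictions = [0, 1, 1]  # p_SVM, p_RF, p_NN
ensemble = sum(w * p for w, p in zip(weights, predictions))
binary = 1 if ensemble >= 0.5 else 0  # convert probability to 0/1
print(round(ensemble, 1), binary)  # 0.7 1
```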


The data-driven approach of using historical data and the frameworks disclosed herein, rather than traditional methods, can accurately predict when gas lift equipment is likely to fail, allowing for proactive maintenance and minimizing downtime. This approach is a significant technological advancement in the oil and gas industry, as equipment failure can result in decreased production and higher maintenance costs. The method provides a solution to this problem by addressing the issue proactively and using data to guide maintenance efforts. With this proactive approach, the method helps maintenance teams schedule repairs and replacements more effectively, reducing the risk of unexpected downtime and increasing the overall efficiency of oil production. The approach involves training supervised models on labeled datasets that include sensor readings, maintenance records, and operational parameters. The models learn to identify patterns and relationships between input features and the occurrence of equipment failure, enabling them to predict when equipment failure is likely to occur. Real-time sensor data from the equipment is then analyzed using the trained models to make predictions about when failure is likely to occur. Compared to traditional maintenance approaches, this data-driven approach is more efficient and cost-effective.


In one or more embodiments, to evaluate the performance of the predictive analytic models, a confusion matrix could be used. This matrix compares the predicted failure outcomes to the actual failure outcomes, allowing for the calculation of metrics such as precision, recall, and F1-score. The use of historical sensor data, maintenance records, failure data, and evaluation metrics such as confusion matrices can help explain the idea of using classification techniques to predict gas lift equipment failure and provide insights into the accuracy and reliability of the method.
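The metrics derived from a confusion matrix may be sketched as follows; the true-positive, false-positive, and false-negative counts are hypothetical:

```python
# Sketch of precision, recall, and F1-score computed from
# confusion-matrix counts. The counts below are illustrative.

def metrics(tp, fp, fn):
    """tp: correctly predicted failures, fp: false alarms,
    fn: missed failures."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision, recall, f1 = metrics(tp=75, fp=15, fn=25)
print(round(precision, 3), recall, round(f1, 3))  # 0.833 0.75 0.789
```

High precision means few false alarms (unnecessary maintenance), while high recall means few missed failures (unexpected downtime); F1 balances the two.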


An example of a model architecture is shown in FIG. 6. Specifically, the model architecture represents the process of using three different machine learning models to predict gas lift equipment failure based on various input features. Obtaining the data (600) is the starting point of the process. The data (600) may consist of various features such as Gas Flow Rate (601), Gas Injection Pressure (602), Tubing Pressure (603), Casing Pressure (604), Temperature (605), and Time since Last Maintenance (606). The features are collected from the equipment and are used as input for the machine learning models.


Further, a plurality of different machine learning models (e.g., RF, NN, and SVM) may be used to predict equipment failure. Each model takes the same input data but processes it in a different way due to their unique architectures. For example, the Random Forest model (607) creates multiple decision trees and makes a prediction based on the majority vote of the trees. Further, the Neural Network model (608) consists of interconnected layers of nodes or “neurons” and makes a prediction based on the weighted sum of the inputs and the activation function. Additionally, the Support Vector Machine model (609) finds a hyperplane in a high-dimensional space that distinctly classifies the data points.


The output of the process is a prediction of equipment failure (610). Specifically, each model predicts the time until equipment failure based on the input features. The predictions can then be used to schedule maintenance and prevent equipment failure. The diagram visually represents this process, showing how the data flows through each model to produce a prediction. The diagram is a high-level view of the system, providing an overview of how the models are used to predict equipment failure.


The diagram representing the architecture of three different machine learning models is shown in FIG. 7. The machine learning models are used to predict the time until failure of gas lift equipment based on input data (700). The SVM model (710) may find a hyperplane (712) that best separates the data (700) based on the input features. The Support Vectors (711) are the data points (700) that are closest to the hyperplane (712) and that define the hyperplane (712).


Further, the Neural Network model (720) consists of an Input Layer (721), Hidden Layer (722), and Output Layer (723). The input data (700) is passed through the Input Layer (721), transformed in the Hidden Layer (722), and a prediction is made in the Output Layer (723). The exact number of neurons in each layer and the activation functions used can vary depending on the problem and the data.


Additionally, the Random Forest model (730) includes multiple decision trees (731-733). Each decision tree is trained on a random subset of the input data and makes a prediction based on the input features. The final prediction of the Random Forest model is the average or majority vote of the predictions made by all the decision trees in the forest.
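The majority-vote step of the Random Forest model (730) may be sketched as follows, with hypothetical per-tree votes:

```python
# Sketch of the majority-vote combination across decision trees.
# Each entry is one tree's binary prediction (1 = failure); the
# votes below are illustrative.
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the most trees in the forest."""
    return Counter(tree_predictions).most_common(1)[0][0]

print(majority_vote([1, 0, 1]))  # 1
```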


The diagram also includes a feedback loop from the Output node (740) back to the Input node (700). This represents the continuous nature of the system where after an output (e.g., predicted time until failure) is produced, the system continues to receive new input data and produce new output.


Embodiments disclosed herein may be implemented on any suitable computing device, such as the computer system shown in FIG. 8. Specifically, FIG. 8 is a block diagram of a computer system (800) used to provide computational functionalities associated with described algorithms, methods, functions, processes, flows, and procedures as described in the instant disclosure, according to an implementation. The illustrated computer (800) is intended to encompass any computing device such as a high performance computing (HPC) device, a server, desktop computer, laptop/notebook computer, wireless data port, smart phone, personal data assistant (PDA), tablet computing device, one or more processors within these devices, or any other suitable processing device, including both physical or virtual instances (or both) of the computing device. Additionally, the computer (800) may include a computer that includes an input device, such as a keypad, keyboard, touch screen, or other device that can accept user information, and an output device that conveys information associated with the operation of the computer (800), including digital data, visual, or audio information (or a combination of information), or a GUI.


The computer (800) can serve in a role as a client, network component, a server, a database or other persistency, or any other component (or a combination of roles) of a computer system for performing the subject matter described in the instant disclosure. The illustrated computer (800) is communicably coupled with a network (810). In some implementations, one or more components of the computer (800) may be configured to operate within environments, including cloud-computing-based, local, global, or other environment (or a combination of environments).


At a high level, the computer (800) is an electronic computing device operable to receive, transmit, process, store, or manage data and information associated with the described subject matter. According to some implementations, the computer (800) may also include or be communicably coupled with an application server, e-mail server, web server, caching server, streaming data server, business intelligence (BI) server, or other server (or a combination of servers).


The computer (800) can receive requests over network (810) from a client application (for example, executing on another computer (800)) and respond to the received requests by processing them in an appropriate software application. In addition, requests may also be sent to the computer (800) from internal users (for example, from a command console or by other appropriate access method), external or third parties, other automated applications, as well as any other appropriate entities, individuals, systems, or computers.


Each of the components of the computer (800) can communicate using a system bus (870). In some implementations, any or all of the components of the computer (800), both hardware or software (or a combination of hardware and software), may interface with each other or the interface (820) (or a combination of both) over the system bus (870) using an application programming interface (API) (850) or a service layer (860) (or a combination of the API (850) and service layer (860)). The API (850) may include specifications for routines, data structures, and object classes. The API (850) may be either computer-language independent or dependent and refer to a complete interface, a single function, or even a set of APIs. The service layer (860) provides software services to the computer (800) or other components (whether or not illustrated) that are communicably coupled to the computer (800). The functionality of the computer (800) may be accessible to all service consumers using this service layer. Software services, such as those provided by the service layer (860), provide reusable, defined business functionalities through a defined interface. For example, the interface may be software written in JAVA, C++, or other suitable language providing data in extensible markup language (XML) format or other suitable format. While illustrated as an integrated component of the computer (800), alternative implementations may illustrate the API (850) or the service layer (860) as stand-alone components in relation to other components of the computer (800) or other components (whether or not illustrated) that are communicably coupled to the computer (800). Moreover, any or all parts of the API (850) or the service layer (860) may be implemented as child or sub-modules of another software module, enterprise application, or hardware module without departing from the scope of this disclosure.


The computer (800) includes an interface (820). Although illustrated as a single interface (820) in FIG. 8, two or more interfaces (820) may be used according to particular needs, desires, or particular implementations of the computer (800). The interface (820) is used by the computer (800) for communicating with other systems in a distributed environment that are connected to the network (810). Generally, the interface (820) includes logic encoded in software or hardware (or a combination of software and hardware) and operable to communicate with the network (810). More specifically, the interface (820) may include software supporting one or more communication protocols associated with communications such that the network (810) or interface's hardware is operable to communicate physical signals within and outside of the illustrated computer (800).


The computer (800) includes at least one computer processor (830). Although illustrated as a single computer processor (830) in FIG. 8, two or more processors may be used according to particular needs, desires, or particular implementations of the computer (800). Generally, the computer processor (830) executes instructions and manipulates data to perform the operations of the computer (800) and any algorithms, methods, functions, processes, flows, and procedures as described in the instant disclosure.


The computer (800) also includes a memory (880) that holds data for the computer (800) or other components (or a combination of both) that can be connected to the network (810). For example, memory (880) can be a database storing data consistent with this disclosure. Although illustrated as a single memory (880) in FIG. 8, two or more memories may be used according to particular needs, desires, or particular implementations of the computer (800) and the described functionality. While memory (880) is illustrated as an integral component of the computer (800), in alternative implementations, memory (880) can be external to the computer (800).


The application (840) is an algorithmic software engine providing functionality according to particular needs, desires, or particular implementations of the computer (800), particularly with respect to functionality described in this disclosure. For example, application (840) can serve as one or more components, modules, applications, etc. Further, although illustrated as a single application (840), the application (840) may be implemented as multiple applications (840) on the computer (800). In addition, although illustrated as integral to the computer (800), in alternative implementations, the application (840) can be external to the computer (800).


There may be any number of computers (800) associated with, or external to, a computer system containing computer (800), each computer (800) communicating over network (810). Further, the term “client,” “user,” and other appropriate terminology may be used interchangeably as appropriate without departing from the scope of this disclosure. Moreover, this disclosure contemplates that many users may use one computer (800), or that one user may use multiple computers (800).


In some embodiments, the computer (800) is implemented as part of a cloud computing system. For example, a cloud computing system may include one or more remote servers along with various other cloud components, such as cloud storage units and edge servers. In particular, a cloud computing system may perform one or more computing operations without direct active management by a user device or local computer system. As such, a cloud computing system may have different functions distributed over multiple locations from a central server, which may be performed using one or more Internet connections. More specifically, cloud computing system may operate according to one or more service models, such as infrastructure as a service (IaaS), platform as a service (PaaS), software as a service (SaaS), mobile “backend” as a service (MBaaS), serverless computing, artificial intelligence (AI) as a service (AIaaS), and/or function as a service (FaaS).


Although only a few example embodiments have been described in detail above, those skilled in the art will readily appreciate that many modifications are possible in the example embodiments without materially departing from this invention. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the following claims.

Claims
  • 1. A method for predicting gas lift equipment failure, comprising: obtaining, using a computer processor, data from a plurality of sources, the plurality of sources including sensor readings, maintenance records, operational parameters, and production targets;determining, using the computer processor, a plurality of initial equipment failure probabilities using a plurality of machine learning models;determining, using the computer processor, an ensemble prediction using an ensemble model trained on the plurality of initial equipment failure probabilities; andperforming, in response to the ensemble prediction, a maintenance operation of the gas lift equipment.
  • 2. The method of claim 1, wherein the plurality of machine learning models include Support Vector Machine, Random Forest, and Deep Learning.
  • 3. The method of claim 2, wherein determining an initial equipment failure probability using SVM comprises: transforming, using the computer processor, input data to a feature space, the feature space being of higher dimension than an original input space;parameterizing, using the computer processor, a hyperplane in the feature space, the hyperplane being the initial equipment failure probability of the SVM for a given input.
  • 4. The method of claim 3, wherein a decision boundary separating healthy and damaged lift gas equipment is determined using the SVM.
  • 5. The method of claim 3, wherein the transformation of the input data is based on a kernel function.
  • 6. The method of claim 3, wherein the SVM includes slack terms and regularization terms.
  • 7. The method of claim 1, wherein determining an initial equipment failure probability using Random Forest model comprises: determining hyperparameters of the Random Forest model including a maximum depth of decision trees and a minimum number of samples required to split a node;determining, using the computer processor, the initial equipment failure probability by averaging predictions from all the decision trees.
  • 8. The method of claim 1, wherein determining an initial equipment failure probability using Deep Learning model comprises: generating, using the computer processor, relationships between the data and a target initial equipment failure probability by adjusting weights and biases of neurons in a Deep Learning network;determining, using the computer processor, the initial equipment failure probability based on the generated relationships.
  • 9. The method of claim 1, wherein the maintenance operation comprises replacing gas lift equipment components.
  • 10. The method of claim 1, wherein the maintenance operation comprises adjusting operating parameters of the gas lift equipment to prevent failure.
  • 11. A non-transitory computer readable medium storing instructions executable by a computer processor, the instructions comprising functionality for: obtaining data from a plurality of sources, the plurality of sources including sensor readings, maintenance records, operational parameters, and production targets;determining a plurality of initial equipment failure probabilities using a plurality of machine learning models;determining an ensemble prediction using an ensemble model trained on the plurality of initial equipment failure probabilities; andperforming, in response to the ensemble prediction, a maintenance operation of a gas lift equipment.
  • 12. The non-transitory computer readable medium of claim 11, wherein the plurality of machine learning models include Support Vector Machine, Random Forest, and Deep Learning.
  • 13. The non-transitory computer readable medium of claim 12, wherein determining an initial equipment failure probability using SVM comprises: transforming input data to a feature space, the feature space being of higher dimension than an original input space;parameterizing a hyperplane in the feature space, the hyperplane being the initial equipment failure probability of the SVM for a given input.
  • 14. The non-transitory computer readable medium of claim 12, wherein determining an initial equipment failure probability using Random Forest model comprises: determining hyperparameters of the Random Forest model including a maximum depth of decision trees and a minimum number of samples required to split a node;determining the initial equipment failure probability by averaging predictions from all the decision trees.
  • 15. The non-transitory computer readable medium of claim 12, wherein determining an initial equipment failure probability using Deep Learning model comprises: generating relationships between the data and a target initial equipment failure probability by adjusting weights and biases of neurons in a Deep Learning network;determining the initial equipment failure probability based on the generated relationships.
  • 16. The non-transitory computer readable medium of claim 11, wherein the maintenance operation comprises replacing gas lift equipment components.
  • 17. The non-transitory computer readable medium of claim 11, wherein the maintenance operation comprises refurbishing gas lift equipment components.
  • 18. A system comprising: a well logging system; and an equipment failure simulator comprising a computer processor, wherein the equipment failure simulator is coupled to the well logging system, the equipment failure simulator comprising functionality for: obtaining data from a plurality of sources, the plurality of sources including sensor readings, maintenance records, operational parameters, and production targets;determining a plurality of initial equipment failure probabilities using a plurality of machine learning models;determining an ensemble prediction using an ensemble model trained on the plurality of initial equipment failure probabilities; andperforming, in response to the ensemble prediction, a maintenance operation of a gas lift equipment.
  • 19. The system of claim 18, wherein the maintenance operation comprises replacing gas lift equipment components.
  • 20. The system of claim 18, wherein the maintenance operation comprises refurbishing gas lift equipment components.