The disclosure generally relates to the field of data correction and to predictive modeling for drilling operations.
Different types of predictive models involve both statistical tools such as machine learning and domain-level rules that can be applied to predict outcomes. Machine learning predictive models include clustering, random forests, regression models, support vector machines, neural networks, etc. Rules-based predictive models vary significantly based on the domain to which they are applied and often incorporate expert knowledge for known outcomes in the respective domain.
Embodiments of the disclosure may be better understood by referencing the accompanying drawings.
The description that follows includes example systems, methods, techniques, and program flows that embody embodiments of the disclosure. However, it is understood that this disclosure may be practiced without these specific details. For instance, this disclosure refers to correction of drilling data using a machine learning drilling model and rules-based drilling model in illustrative examples. Embodiments of this disclosure can be instead applied to correcting any type of data using a machine learning model and a rules-based model. In other instances, well-known instruction instances, protocols, structures and techniques have not been shown in detail in order not to obfuscate the description.
Overview
A drilling data quality engine disclosed herein corrects high-importance drilling data using a combination of machine learning drilling models and rules-based drilling models. The machine learning drilling models are trained on impactful features of drilling data across a variety of sources and domains. The rules-based drilling models are determined by domain-level experts using knowledge of drilling operations. A data importance analyzer identifies high-importance drilling data including data segments that are known to have a high impact on drilling operations. The high-importance drilling data is input into the machine learning drilling models and rules-based models. A drilling data prediction analyzer determines which of the model outputs to use based on confidence values in the outputs of the machine learning drilling models and user input. The final predictions are evaluated for quality control and used to improve existing drilling data.
Example Illustrations
The data importance analyzer 101 sends the high-importance drilling data 102 to a drilling model database 140 that retrieves a machine learning drilling model 103 and a rules-based drilling model 105. The high-importance drilling data 102 can instead be a query indicating the high-importance data segments and the drilling model database 140 can be indexed by data segment type/location to retrieve the corresponding models. The machine learning drilling model 103 generates machine learning drilling data predictions 104 and machine learning drilling data confidence values 106 using features of the incoming drilling data 100 as well as auxiliary drilling data as input. The rules-based drilling model 105 generates rules-based drilling data predictions 110 also using the incoming drilling data 100 and auxiliary drilling data. The machine learning drilling data predictions 104 and rules-based drilling data predictions 110 comprise predictions to verify the quality of the high-importance drilling data 102. The machine learning drilling data confidence values 106 comprise confidence values indicating likelihoods that each of the machine learning drilling data predictions 104 is correct. The machine learning drilling model 103 and/or rules-based drilling model 105 can be trained or configured to make predictions specifically for a high-importance data segment identified by the data importance analyzer 101. The high-importance drilling data 102 can have flaws in data entries within one or more high-importance data segments. A drilling data quality analyzer (not shown) can detect the flawed data entries and, in response, send the incoming drilling data 100 to the data importance analyzer 101.
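As a rough illustration of indexing the drilling model database by data segment type, the following sketch keeps one machine learning model and one rules-based model per segment in an in-memory dictionary. The class names, the callable signatures, the segment key "heat_flow_curve", and both placeholder models are assumptions made for this example and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical signatures: the disclosure does not define model interfaces.
MLModel = Callable[[List[float]], Tuple[List[float], List[float]]]
RulesModel = Callable[[List[float]], List[float]]


@dataclass
class ModelPair:
    ml_model: MLModel        # returns (predictions, confidence values)
    rules_model: RulesModel  # returns predictions only


class DrillingModelDatabase:
    """Toy in-memory index from data segment type to its models."""

    def __init__(self) -> None:
        self._by_segment: Dict[str, ModelPair] = {}

    def register(self, segment_type: str, pair: ModelPair) -> None:
        self._by_segment[segment_type] = pair

    def retrieve(self, segment_type: str) -> ModelPair:
        if segment_type not in self._by_segment:
            raise KeyError(f"no models registered for segment {segment_type!r}")
        return self._by_segment[segment_type]


def placeholder_ml_model(values):
    # Stand-in for a trained model: echoes the input with a fixed confidence.
    return list(values), [0.5 for _ in values]


def placeholder_rules_model(values):
    # Stand-in rule: clamp values into an assumed valid range of 0 to 100.
    return [min(max(v, 0.0), 100.0) for v in values]


database = DrillingModelDatabase()
database.register(
    "heat_flow_curve", ModelPair(placeholder_ml_model, placeholder_rules_model)
)

pair = database.retrieve("heat_flow_curve")
ml_predictions, ml_confidences = pair.ml_model([42.0, 250.0])
rules_predictions = pair.rules_model([42.0, 250.0])
```

Both models receive the same input values, so their outputs can be compared entry by entry in a later step.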
The drilling data prediction analyzer 107 receives the machine learning drilling data predictions 104, the machine learning drilling data confidence values 106, and the rules-based drilling data predictions 110. The drilling data prediction analyzer 107 then applies a series of criteria to determine which of the machine learning drilling data predictions 104 to use and which of the rules-based drilling data predictions to use. As depicted in
A user operating the computing device 109 evaluates the low confidence drilling data predictions 112 to determine the user-determined drilling data predictions 116. For instance, a user can be presented, via a user interface running on the computing device 109, with the machine learning drilling data predictions 104 and rules-based drilling data predictions 110 in the low confidence drilling data predictions 112 along with corresponding confidence values in the machine learning drilling data confidence values 106. The user chooses between the machine learning drilling data predictions 104 and rules-based drilling data predictions 110 using the given confidence values and known domain-level knowledge. For instance, the user can know thresholds or shapes of petrophysical property values downhole that are present in a machine learning prediction but not present in a rules-based prediction, and can choose to add the machine learning prediction to the user-determined drilling data predictions 116. In some instances, the user can determine to use none of the low confidence drilling data predictions 112 and instead maintain the high-importance drilling data 102 in memory. The computing device 109 communicates the user-determined drilling data predictions 116 to the master drilling database 120, and the master drilling database 120 replaces the corresponding values of the high-importance drilling data 102 in memory. Subsequently, the improved drilling data in the master drilling database 120 can be used to improve a machine learning drilling model in the drilling model database 140.
The computing device 109, based on the high confidence drilling data predictions 118 and the user-determined drilling data predictions 116, generates a quality control report such as the example quality control report 114. The example quality control report 114 is provided to a user interface by the computing device 109 and comprises the following table of values:
The table describes three drilling data fixes with attributes fix description, issue #1, issue #2, and confidence of fix. The first fix corrected issues 7 and 2 with high confidence of 85%. The second fix corrected issues 12 and 6 with medium confidence of 40%. The third fix corrected issue 9 with confidence of 6%. The confidence value in the example quality control report 114 can be a confidence value in the machine learning drilling data confidence values 106 corresponding to the respective fixes converted into a percentage. When the fix is rules-based (as determined by a user) the user can estimate the confidence percentage value based on the user's confidence in the rules-based prediction.
The drilling feature generator 201 determines drilling feature data 208 from the aggregated drilling data 204 and client drilling data 206. The drilling feature generator 201 can use standard statistical methods for feature selection such as information-theory based or correlation-based feature selection to determine features used to generate the drilling feature data 208. For instance, the drilling feature generator 201 can use Pearson correlation coefficients between candidate features and corresponding classifications for drilling data (e.g., correct or incorrect) to identify features that are correlated with correct or incorrect drilling data. In some embodiments, a domain-level expert can identify features such as curve units that are known to be of high importance or to have an effect on whether drilling data is correct. Once sufficiently many features are determined, for instance based on a user-specified input for a number of features or corresponding to complexity of a desired machine-learning model, the drilling feature generator 201 generates the drilling feature data 208 by applying the features to the aggregated drilling data 204 and client drilling data 206. For instance, the drilling feature generator 201 can extract curve units that are known to have high importance for drilling data being correct from petrophysical property data downhole in the aggregated drilling data 204 and client drilling data 206.
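The correlation-based selection described above can be sketched as follows. The example ranks candidate features by the absolute value of their Pearson correlation coefficient with a correct/incorrect label and keeps the top k. The feature names, the sample values, and the choice of k are invented for illustration.

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(
    candidates: Dict[str, List[float]], labels: List[float], k: int
) -> List[str]:
    """Return the k feature names most correlated (in magnitude) with labels."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], labels)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical data: label 1 marks a correct entry, 0 a flawed entry.
labels = [1, 1, 0, 0, 1, 0]
candidates = {
    "unit_is_known": [1, 1, 0, 0, 1, 0],        # tracks the label exactly
    "depth_m": [100, 200, 300, 400, 500, 600],  # weakly related
    "constant_flag": [1, 1, 1, 1, 1, 1],        # carries no information
}

selected = select_features(candidates, labels, k=2)
```

Ranking by magnitude keeps features that are strongly negatively correlated as well, since those are equally informative about whether an entry is flawed.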
The drilling feature generator 201 sends the drilling feature data 208, after segregating into training and testing data, to an initialized machine learning drilling model 203 that was initialized by a machine learning drilling model trainer 205. The initialized machine learning drilling model 203 can have an architecture specified by a user or hard coded based on factors such as the complexity of the drilling data to be corrected. For instance, when the initialized machine learning drilling model 203 is a neural network, a user can specify the number of internal layers, type of internal layers, number of nodes in each layer, etc. for a neural network. The initialized machine learning drilling model 203 uses training data in the drilling feature data 208 to generate drilling data predictions 210. The machine learning drilling model trainer 205 compares the drilling data predictions 210 to correct drilling data in the aggregated drilling data 204 and the client drilling data 206 and, based on the difference, communicates updated model parameters 212 to the initialized machine learning drilling model 203. The machine learning drilling model trainer 205 continues to update parameters of the initialized machine learning drilling model 203 until the drilling data predictions 210 are sufficiently close to the corresponding correct drilling data or other training criteria are satisfied. For instance, the machine learning drilling model trainer 205 can input testing data in the drilling feature data 208 into the initialized machine learning drilling model 203 to determine generalization error, which can be required to be sufficiently low. Once the training criteria are satisfied, the machine learning drilling model trainer 205 stores a trained machine learning drilling model 207 in a machine learning drilling model repository 214.
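A minimal sketch of this train-then-test loop is shown below. A single-layer perceptron stands in for whatever architecture is actually chosen, the stopping criterion is simply that every training row is predicted correctly, and the feature data is synthetic. The function names, the hold-out scheme, and the data are all assumptions for illustration.

```python
from typing import List, Tuple

Row = Tuple[List[float], int]  # (feature vector, label: 1 = correct, 0 = flawed)


def predict(weights: List[float], bias: float, features: List[float]) -> int:
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if z > 0 else 0


def train(train_rows: List[Row], learning_rate: float = 1.0, max_epochs: int = 1000):
    """Update parameters from prediction differences until no training mistakes."""
    weights = [0.0] * len(train_rows[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in train_rows:
            difference = label - predict(weights, bias, features)
            if difference != 0:
                mistakes += 1
                weights = [
                    w + learning_rate * difference * x
                    for w, x in zip(weights, features)
                ]
                bias += learning_rate * difference
        if mistakes == 0:  # training criterion satisfied
            break
    return weights, bias


def error_rate(weights: List[float], bias: float, rows: List[Row]) -> float:
    wrong = sum(1 for f, label in rows if predict(weights, bias, f) != label)
    return wrong / len(rows)


# Synthetic, linearly separable feature data (made up for illustration).
values = [i / 20 for i in range(8)] + [0.65 + i / 20 for i in range(8)]
rows: List[Row] = [([v], 0 if v < 0.5 else 1) for v in values]

# Segregate into testing and training data: hold out every fourth row.
test_rows = rows[1::4]
train_rows = [row for i, row in enumerate(rows) if i % 4 != 1]

weights, bias = train(train_rows)
training_error = error_rate(weights, bias, train_rows)
generalization_error = error_rate(weights, bias, test_rows)
```

In practice the model would be stored only when the generalization error on the held-out rows falls below a limit chosen for the application.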
The drilling data predictions 210 can correspond to a high-importance drilling data segment corresponding to a drilling attribute (e.g., a curve unit for a petrophysical property). Thus, the trained machine learning drilling model 207 can be trained to make predictions specifically for that data segment. The machine learning drilling model repository 214 can contain machine learning drilling models for all known high-importance data segments corresponding to a drilling operation or set of drilling operations. The trained machine learning drilling model 207 can be trained in response to the detection of a new high-importance data segment.
The example operations in
At block 303, the drilling data correction system inputs drilling data into a machine learning drilling model. The machine learning drilling model can be trained to predict a high-importance data segment in the drilling data. Any predictive machine learning model such as k-means clustering, neural networks, support vector machines, etc. can be used. The drilling data correction system can preprocess the drilling data to extract meaningful features that have a high correlation with predicting a data segment (e.g., using Pearson correlation coefficients).
At block 305, the drilling data correction system inputs drilling data into a rules-based drilling model. The rules-based drilling model can comprise threshold values for petrophysical properties downhole or at the surface, inventory values, task-based descriptions, etc. For instance, the rules-based drilling model can threshold a heat flow value known to be too high for a specific drilling operation into a range of reasonable heat flow values as determined by an expert. The rules-based drilling model can additionally correct task-based drilling data by, for instance, replacing a task description with a task description in a list of known drilling task descriptions for drilling operations.
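The two example rules above could be sketched as follows: one clamps a heat flow value into an expert-supplied range, and the other replaces a task description with its closest match in a list of known descriptions. The range limits, the task list, the field names, and the use of fuzzy string matching from the Python standard library are assumptions for illustration only.

```python
import difflib
from typing import Dict, List, Tuple

# Hypothetical expert-supplied values; not taken from the disclosure.
HEAT_FLOW_RANGE: Tuple[float, float] = (40.0, 120.0)
KNOWN_TASKS: List[str] = ["trip in hole", "circulate bottoms up", "run casing"]


def clamp_to_range(value: float, low: float, high: float) -> float:
    """Threshold a value into the closed range [low, high]."""
    return min(max(value, low), high)


def normalize_task(description: str, known_tasks: List[str]) -> str:
    """Replace a description with its closest known task, if one is close enough."""
    matches = difflib.get_close_matches(description, known_tasks, n=1, cutoff=0.6)
    return matches[0] if matches else description


def apply_rules(entry: Dict[str, object]) -> Dict[str, object]:
    corrected = dict(entry)
    low, high = HEAT_FLOW_RANGE
    corrected["heat_flow"] = clamp_to_range(float(entry["heat_flow"]), low, high)
    corrected["task"] = normalize_task(str(entry["task"]), KNOWN_TASKS)
    return corrected


corrected = apply_rules({"heat_flow": 950.0, "task": "trip in hol"})
```

A description with no sufficiently close match is returned unchanged, which leaves the decision to a later review step.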
At block 307, the drilling data correction system applies drilling data prediction criteria to outputs of the machine learning and rules-based drilling models to generate corrected high-importance drilling data. The operations at block 307 are described in greater detail in
At block 309, the drilling data correction system replaces high-importance drilling data with corrected high-importance drilling data. The drilling data correction system can use the corrected high-importance drilling data to improve machine learning drilling models and/or rules-based drilling models by, for instance, retraining models with the corrected high-importance drilling data.
At block 311, the drilling data correction system generates a quality control report for the corrected high-importance drilling data. The quality control report can comprise indications of drilling data entries that were corrected, the corrections, whether the machine learning drilling model outputs or rules-based drilling model outputs were used, whether a user decided which drilling model to use, confidence values for the corrections (e.g., in the machine learning drilling model outputs or indicated by a user), etc. The quality control report can be presented to a user and can comprise further analytics such as severity of corrections, frequency of corrections, importance of corresponding data segments, etc.
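One possible shape for such a report is sketched below as a plain-text table with a per-source count appended. The record fields, the column layout, and the sample corrections are hypothetical and only mirror the kinds of information listed above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Correction:
    entry_id: str
    original: str
    corrected: str
    source: str         # "machine learning" or "rules-based"
    user_decided: bool  # True when a user chose between model outputs
    confidence: float   # in [0, 1], from the model or estimated by the user


def render_report(corrections: List[Correction]) -> str:
    lines = ["entry | original -> corrected | source | user decided | confidence"]
    for c in corrections:
        lines.append(
            f"{c.entry_id} | {c.original} -> {c.corrected} | {c.source} | "
            f"{'yes' if c.user_decided else 'no'} | {c.confidence:.0%}"
        )
    # Simple analytic: how many corrections came from each kind of model.
    counts = Counter(c.source for c in corrections)
    summary = ", ".join(f"{source}: {n}" for source, n in sorted(counts.items()))
    lines.append(f"corrections by source: {summary}")
    return "\n".join(lines)


# Made-up corrections for illustration.
corrections = [
    Correction("row-17", "950.0", "120.0", "rules-based", True, 0.40),
    Correction("row-21", "trip in hol", "trip in hole", "machine learning", False, 0.85),
]
report = render_report(corrections)
```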
At block 403, the drilling data correction system uses the machine learning drilling model predictions as high-importance drilling data. The drilling data correction system can automatically replace the high-importance drilling data with the machine learning drilling model predictions in memory without consulting a user because the confidence of the predictions is above the high-confidence threshold.
At block 405, the drilling data correction system determines whether the machine learning drilling data predictions are above a medium confidence threshold. The medium confidence threshold can be a confidence value below the high confidence threshold (e.g., 0.6) and can also be tuned by a user depending on many factors such as importance of a corresponding data segment, desired confidence level in drilling data corrections, importance of a corresponding drilling operation, etc. If the machine learning drilling data predictions are above the medium confidence threshold, operations proceed to block 407. Otherwise, operations skip to block 409.
At block 407, the drilling data correction system presents a user with machine learning drilling data predictions, confidence values, and rules-based drilling data predictions to determine corrected high-importance data. The user can determine which predictions to use as the corrected high-importance data based on a variety of factors including the confidence of the machine learning drilling data predictions, expert domain-level knowledge of the rules-based predictions, etc. For instance, a user can determine that a task description from a machine learning drilling data prediction is incorrect and can instead choose a rules-based task description. Conversely, a user can determine that a curve unit for heat flow in the machine learning drilling data predictions has a more accurate shape than curve units in the rules-based drilling data predictions.
At block 409, the drilling data correction system presents a user with the rules-based drilling data predictions to determine corrected high-importance drilling data. The drilling data correction system can omit machine learning drilling data predictions whose confidence values are too low (e.g., below a threshold determined by the user). Alternatively, the drilling data correction system can present the user with low confidence machine learning drilling data predictions along with indications warning the user that the predictions are low confidence. The user can make a determination of which predictions to use as the high-importance drilling data using any of the aforementioned factors.
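The threshold checks leading to blocks 403, 407, and 409 can be summarized in a small routing function, sketched below. The numeric defaults are placeholders because both thresholds are described as tunable, and the enum and function names are assumptions for illustration.

```python
from enum import Enum


class Route(Enum):
    AUTO_APPLY_ML = "block 403: use the machine learning prediction directly"
    USER_REVIEW_ML_AND_RULES = "block 407: user chooses between ML and rules"
    USER_REVIEW_RULES = "block 409: user reviews the rules-based prediction"


def route_prediction(
    confidence: float,
    high_threshold: float = 0.9,    # placeholder value, tunable by a user
    medium_threshold: float = 0.6,  # placeholder value, tunable by a user
) -> Route:
    """Pick the handling path for one machine learning prediction."""
    if not 0.0 <= medium_threshold < high_threshold <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= medium < high <= 1")
    if confidence > high_threshold:
        return Route.AUTO_APPLY_ML
    if confidence > medium_threshold:
        return Route.USER_REVIEW_ML_AND_RULES
    return Route.USER_REVIEW_RULES


routes = [route_prediction(c) for c in (0.95, 0.75, 0.30)]
```

A confidence exactly equal to a threshold falls into the lower tier here because the comparison is strict; an implementation could equally treat equality as passing.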
The flowcharts are provided to aid in understanding the illustrations and are not to be used to limit scope of the claims. The flowcharts depict example operations that can vary within the scope of the claims. Additional operations may be performed; fewer operations may be performed; the operations may be performed in parallel; and the operations may be performed in a different order. For example, the operations depicted in blocks 303 and 305 can be performed in parallel or concurrently. With respect to
As will be appreciated, aspects of the disclosure may be embodied as a system, method or program code/instructions stored in one or more machine-readable media. Accordingly, aspects may take the form of hardware, software (including firmware, resident software, micro-code, etc.), or a combination of software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” The functionality presented as individual modules/units in the example illustrations can be organized differently in accordance with any one of platform (operating system and/or hardware), application ecosystem, interfaces, programmer preferences, programming language, administrator preferences, etc.
Any combination of one or more machine-readable medium(s) may be utilized. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable storage medium may be, for example, but not limited to, a system, apparatus, or device, that employs any one of or combination of electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology to store program code. More specific examples (a non-exhaustive list) of the machine-readable storage medium would include the following: a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a machine-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine-readable storage medium is not a machine-readable signal medium.
A machine-readable signal medium may include a propagated data signal with machine-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A machine-readable signal medium may be any machine-readable medium that is not a machine-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a machine-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The program code/instructions may also be stored in a machine-readable medium that can direct a machine to function in a particular manner, such that the instructions stored in the machine-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
During drilling operations, the drill string 608 (perhaps including the Kelly 616, the drill pipe 618, and the bottom hole assembly 620) can be rotated by the rotary table 610. In addition or alternatively, the bottom hole assembly 620 can be rotated by a motor (e.g., a mud motor) that is located downhole. The drill collars 622 can be used to add weight to the drill bit 626. The drill collars 622 may also operate to stiffen the bottom hole assembly 620, allowing the bottom hole assembly 620 to transfer the added weight to the drill bit 626, and in turn, to assist the drill bit 626 in penetrating the surface 604 and subsurface formations 614.
During drilling operations, a mud pump 632 can pump drilling fluid (sometimes known by those of ordinary skill in the art as “drilling mud”) from a mud pit 634 through a hose 636 into the drill pipe 618 and down to the drill bit 626. The drilling fluid can flow out from the drill bit 626 and be returned to the surface 604 through an annular area 640 between the drill pipe 618 and the sides of the borehole 612. The drilling fluid can then be returned to the mud pit 634, where such fluid is filtered. A computing device 600 can monitor the drilling fluid as it flows through the hose 636. The computing device 600 can be in communication with an operator and the operator can log tasks performed by the system 664. A drilling data correction system running on the computing device 600 can identify high-importance data segments in drilling data logged by the computing device 600 and can use a combination of a machine learning drilling model and rules-based drilling model to generate candidate corrections for the high-importance data segments as described variously herein. In some embodiments, the drilling fluid can be used to cool the drill bit 626, as well as to provide lubrication for the drill bit 626 during drilling operations. Additionally, the drilling fluid can be used to remove subsurface formation 614 cuttings created by operating the drill bit 626. It is the images of these cuttings that many embodiments operate to acquire and process.
In certain embodiments, the control unit 734 can be positioned at the surface, in the borehole (e.g., in the conveyance 715 and/or as part of the logging tool 726) or both (e.g., a portion of the processing can occur downhole and a portion can occur at the surface). The control unit 734 can include a control system or a control algorithm. In certain embodiments, a control system, an algorithm, or a set of machine-readable instructions can cause the control unit 734 to generate and provide an input signal to one or more elements of the logging tool 726, such as the sensors along the logging tool 726. The input signal can cause the sensors to be active or to output signals indicative of sensed properties. The logging facility 744 (shown in
The logging tool 726 includes a mandrel and a number of extendible arms coupled to the mandrel. One or more pads are coupled to each of the extendible arms. Each of the pads has a surface facing radially outward from the mandrel. Additionally, at least one sensor is disposed on the surface of each pad. During operation, the extendible arms are extended outwards to a wall of the borehole to extend the surface of the pads outward against the wall of the borehole. The sensors of the pads of each extendible arm can detect image data to create captured images of the formation surrounding the borehole.
At block 803, the drilling data correction system inputs features of the drilling data into a trained machine learning model to generate a first prediction for the data segment of the first drilling data attribute. The features can comprise any number of drilling data or drilling metadata features that can be extracted prior to drilling data correction. These features can be determined empirically by measuring Pearson coefficients for correlations between inputting features into a trained machine learning model and generating corrected drilling data.
At block 805, the drilling data correction system applies one or more drilling data rules to the drilling data to generate a second prediction for the data segment of the first drilling data attribute. For instance, the rule can threshold petrophysical property values to be within a range of known operational petrophysical property values. Other conditions for different types of drilling data can be used, and the rules can be determined by a domain-level expert at a drilling operation corresponding to the drilling data.
At block 807, the drilling data correction system indicates a set of one or more corrections for the data segment of the first drilling data attribute based, at least in part, on the first prediction, the second prediction, and a confidence value for the first prediction. If the confidence value for the first prediction is sufficiently high, the drilling data correction system can correct the data segment of the first drilling data attribute as the first prediction. Otherwise, the drilling data correction system can present a combination of the first prediction and the second prediction to a user along with the confidence value for the first prediction to determine the correction.
While the aspects of the disclosure are described with reference to various implementations and exploitations, it will be understood that these aspects are illustrative and that the scope of the claims is not limited to them. In general, techniques for correcting high-importance drilling data segments using machine learning drilling models and rules-based drilling models as described herein may be implemented with facilities consistent with any hardware system or hardware systems. Many variations, modifications, additions, and improvements are possible.
Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the disclosure. In general, structures and functionality presented as separate components in the example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure.
Use of the phrase “at least one of” preceding a list with the conjunction “and” should not be treated as an exclusive list and should not be construed as a list of categories with one item from each category, unless specifically stated otherwise. A clause that recites “at least one of A, B, and C” can be infringed with only one of the listed items, multiple of the listed items, and one or more of the items in the list and another item not listed.
Embodiment 1: A method comprising identifying a first subset of drilling data having flawed drilling data entries, wherein the first subset of drilling data corresponds to a data segment of a first drilling data attribute, inputting features of the drilling data into a trained machine learning model to generate a first prediction for the data segment of the first drilling data attribute, applying one or more drilling rules to the drilling data to generate a second prediction for the data segment of the first drilling data attribute, and indicating a set of one or more corrections for the data segment of the first drilling data attribute based, at least in part, on the first prediction, the second prediction and a confidence value for the first prediction.
Embodiment 2: The method of Embodiment 1 further comprising determining that the confidence value for the first prediction satisfies a confidence threshold and correcting flawed drilling data entries in the first subset of drilling data with the first prediction.
Embodiment 3: The method of any of Embodiments 1-2 further comprising determining that the confidence value for the first prediction does not satisfy a confidence threshold, determining that the second prediction satisfies a data quality criterion, and correcting flawed drilling data entries in the first subset of drilling data with the second prediction.
Embodiment 4: The method of any of Embodiments 1-3 further comprising generating drilling feature data based, at least in part, on a first plurality of features of a second subset of drilling data and generating the trained machine learning model to predict the data segment of the first drilling data attribute based, at least in part, on the drilling feature data.
Embodiment 5: The method of any of Embodiments 1-4, further comprising identifying the data segment of the first drilling data attribute based, at least in part, on flaws in the first subset of drilling data.
Embodiment 6: The method of any of Embodiments 1-5, wherein the data segment of the first drilling data attribute comprises a curve of petrophysical property values.
Embodiment 7: The method of any of Embodiments 1-6, further comprising updating the first subset of drilling data with at least a correction of the set of one or more corrections for the data segment of the first drilling data attribute.
Embodiment 8: The method of Embodiment 7, further comprising retraining the trained machine learning model using at least the updated first subset of drilling data.
Embodiment 9: One or more non-transitory machine-readable media comprising program code to identify a first subset of drilling data having flawed drilling data entries, wherein the first subset of drilling data corresponds to a data segment of a first drilling data attribute, input features of the drilling data into a trained machine learning model to generate a first prediction for the data segment of the first drilling data attribute, apply one or more drilling rules to the drilling data to generate a second prediction for the data segment of the first drilling data attribute, and indicate a set of one or more corrections for the data segment of the first drilling data attribute based, at least in part, on the first prediction, the second prediction and a confidence value for the first prediction.
Embodiment 10: The non-transitory machine-readable media of Embodiment 9 further comprising program code to determine that the confidence value for the first prediction satisfies a confidence threshold and correct flawed drilling data entries in the first subset of drilling data with the first prediction.
Embodiment 11: The non-transitory machine-readable media of any of Embodiments 9-10 further comprising program code to determine that the confidence value for the first prediction does not satisfy a confidence threshold, determine that the second prediction satisfies a data quality criterion, and correct flawed drilling data entries in the first subset of drilling data with the second prediction.
Embodiment 12: The non-transitory machine-readable media of any of Embodiments 9-11 further comprising program code to generate drilling feature data based, at least in part, on a first plurality of features of a second subset of drilling data and generate the trained machine learning model to predict the data segment of the first drilling data attribute based, at least in part, on the drilling feature data.
Embodiment 13: The non-transitory machine-readable media of any of Embodiments 9-12, further comprising program code to identify the data segment of the first drilling data attribute based, at least in part, on flaws in the first subset of drilling data.
Embodiment 14: The non-transitory machine-readable media of any of Embodiments 9-13, wherein the data segment of the first drilling data attribute comprises a curve of petrophysical property values.
Embodiment 15: The non-transitory machine-readable media of any of Embodiments 9-14, further comprising program code to update the first subset of drilling data with at least a correction of the set of one or more corrections for the data segment of the first drilling data attribute.
Embodiment 16: The non-transitory machine-readable media of Embodiment 15, further comprising program code to retrain the trained machine learning model using at least the updated first subset of drilling data.
Embodiment 17: An apparatus comprising a processor, and a machine-readable medium having program code executable by the processor to cause the apparatus to identify a first subset of drilling data having flawed drilling data entries, wherein the first subset of drilling data corresponds to a data segment of a first drilling data attribute, input features of the drilling data into a trained machine learning model to generate a first prediction for the data segment of the first drilling data attribute, apply one or more drilling rules to the drilling data to generate a second prediction for the data segment of the first drilling data attribute, and indicate a set of one or more corrections for the data segment of the first drilling data attribute based, at least in part, on the first prediction, the second prediction and a confidence value for the first prediction.
Embodiment 18: The apparatus of Embodiment 17 further comprising program code executable by the processor to cause the apparatus to determine that the confidence value for the first prediction satisfies a confidence threshold and correct flawed drilling data entries in the first subset of drilling data with the first prediction.
Embodiment 19: The apparatus of any of Embodiments 17-18 further comprising program code executable by the processor to cause the apparatus to determine that the confidence value for the first prediction does not satisfy a confidence threshold, determine that the second prediction satisfies a data quality criterion, and correct flawed drilling data entries in the first subset of drilling data with the second prediction.
Embodiment 20: The apparatus of any of Embodiments 17-19 further comprising program code executable by the processor to cause the apparatus to generate drilling feature data based, at least in part, on a first plurality of features of a second subset of drilling data and generate the trained machine learning model to predict the data segment of the first drilling data attribute based, at least in part, on the drilling feature data.