ESTIMATING PRODUCTIVITY AND ESTIMATED ULTIMATE RECOVERY (EUR) OF UNCONVENTIONAL WELLS THROUGH SPATIAL-PERFORMANCE RELATIONSHIP USING MACHINE LEARNING

Information

  • Patent Application
  • Publication Number: 20240102371
  • Date Filed: September 23, 2022
  • Date Published: March 28, 2024
Abstract
Systems and methods determine the importance that each of the attributes and features of well data has on machine learning models. Well data is collected for each well in an unconventional field, including attributes and features of basin data, completion data, and production data. Spatial features are generated for each well in different regions. A combined well features dataset is generated. The dataset maps the well data to the spatial features for each well in the different regions. A training dataset and a testing dataset are generated by splitting the combined dataset. A machine learning model is trained using cross-validation and tuning on the training dataset to predict estimated ultimate recovery (EUR). The performance of the machine learning (EUR) model is evaluated with respect to different regression metrics. The importance that each of the attributes and features of the well data has on the machine learning models is then determined.
Description
TECHNICAL FIELD

The present disclosure applies to estimating production in unconventional wells, e.g., oil fracking wells.


BACKGROUND

Unconventional oil and gas exploration and exploitation depend highly on drilling multiple horizontal wells and completing them using multi-stage hydraulic fracturing operations in order to produce economically. Oil and gas are produced from subterranean geologic formations (e.g., reservoirs) by drilling a well that penetrates into hydrocarbon-bearing formations. Unconventional oil and gas shale reservoirs offer great potential as source rocks and have been pivotal in the supply of gas over the last two decades. Significant developments have occurred in the United States and worldwide. While gas shale reservoirs, for example, offer great potential as energy sources, they can only be exploited using multi-fracturing horizontal wells. Stimulation in the form of hydraulic fracturing is of significant importance to the success and performance of shale gas wells.


SUMMARY

The present disclosure describes techniques that can be used for estimating the ultimate productivity of wells in different regions in unconventional fields. In some implementations, a computer-implemented method includes the following. Well data is collected for each well in an unconventional field, including attributes and features of basin data, completion data, and production data. Spatial features are generated for each well in different regions in the unconventional field. A combined well features dataset is generated using the well data and the spatial features. The dataset maps the well data to the spatial features for each well in the different regions of the unconventional field. A training dataset and a testing dataset are generated by splitting the combined well features dataset. A machine learning model is trained using cross-validation and tuning on the training dataset to predict estimated ultimate recovery (EUR). The performance of the machine learning (EUR) model is evaluated with respect to different regression metrics. The importance that each of the attributes and features of the well data has on the machine learning models is determined based on the spatial features and by evaluating the EUR model. The importance identifies an impact on production by each attribute and feature.


The previously described implementation is implementable using a computer-implemented method; a non-transitory, computer-readable medium storing computer-readable instructions to perform the computer-implemented method; and a computer-implemented system including a computer memory interoperably coupled with a hardware processor configured to perform the computer-implemented method, the instructions stored on the non-transitory, computer-readable medium.


The subject matter described in this specification can be implemented in particular implementations so as to realize one or more of the following advantages. Techniques of the present disclosure can help in estimating well productivity and in making decisions on field development. The decisions can be based on production data and inherent subsurface relationships using non-reservoir data that, in conventional systems, typically requires significant cost and time to obtain. The techniques of the present disclosure can enable field evaluations that are not feasible with conventional techniques because of the high order of complexity in the underlying calculation, including several implicit parameters of significance. In conventional systems, it is costly and time-consuming to collect reservoir-related data for future decisions. The techniques of the present disclosure can speed up and improve the evaluation of fields/basins while facilitating the identification of sweet spots. This can be done with minimal or no requirements for subsurface and/or reservoir data. This can lead to effective field development planning for maximizing field deliverability and can minimize unnecessary drilling for coring and reservoir data acquisition.


The details of one or more implementations of the subject matter of this specification are set forth in the Detailed Description, the accompanying drawings, and the claims. Other features, aspects, and advantages of the subject matter will become apparent from the Detailed Description, the claims, and the accompanying drawings.





DESCRIPTION OF DRAWINGS


FIG. 1 is a flow diagram showing an example workflow using machine learning to determine productivity per area, according to some implementations of the present disclosure.



FIGS. 2A and 2B are side and perspective views, respectively, showing an example of a single well and pad with the well, according to some implementations of the present disclosure.



FIGS. 3A and 3B are side and perspective views, respectively, showing an example of a field divided by X and Y with a split block with well locations, according to some implementations of the present disclosure.



FIGS. 4A and 4B are side and perspective views, respectively, showing an example of a multi-well pad and wells drilled to different targets, according to some implementations of the present disclosure.



FIGS. 5A and 5B are top and perspective views, respectively, showing an example of a field split based on X and Y locations, according to some implementations of the present disclosure.



FIG. 6 is a scatter plot showing example locations of gas, oil, and gas+oil wells, according to some implementations of the present disclosure.



FIG. 7 is a graph showing examples of bar charts and descriptive statistics per area based on 90-day cumulative production per location, according to some implementations of the present disclosure.



FIGS. 8A and 8B are graphs showing examples of XGBOOST training and test data results, according to some implementations of the present disclosure.



FIG. 9 is a plot showing examples of feature importance of each input parameter for the studied area, according to some implementations of the present disclosure.



FIG. 10 is a flowchart of an example of a method for determining the importance that each of the attributes and features of the well data has on machine learning models, according to some implementations of the present disclosure.



FIG. 11 is a block diagram illustrating an example computer system used to provide computational functionalities associated with described algorithms, methods, functions, processes, flows, and procedures as described in the present disclosure, according to some implementations of the present disclosure.





Like reference numbers and designations in the various drawings indicate like elements.


DETAILED DESCRIPTION

The following detailed description describes techniques for estimating the ultimate productivity of wells in different regions in unconventional fields. Unconventional fields include unconventional wells, which are crude oil or gas wells that employ hydraulic fracturing to enhance crude oil or gas production volumes. Various modifications, alterations, and permutations of the disclosed implementations can be made and will be readily apparent to those of ordinary skill in the art, and the general principles defined may be applied to other implementations and applications, without departing from the scope of the disclosure. In some instances, details unnecessary to obtain an understanding of the described subject matter may be omitted so as to not obscure one or more described implementations with unnecessary detail and inasmuch as such details are within the skill of one of ordinary skill in the art. The present disclosure is not intended to be limited to the described or illustrated implementations, but to be accorded the widest scope consistent with the described principles and features.


Techniques of the present disclosure include the use of machine learning-based techniques for predicting production performance of a location and estimated ultimate recovery (EUR) of unconventional (shale) wells through utilization of spatial relationships among available wells. The information can be based on extended relationships (e.g., between wells in a field) determined through models based on machine learning and using production data that is readily available without the need for acquiring and using expensive additional reservoir data.


An objective of the techniques of the present disclosure is to estimate productivity and the EUR of the wells in different regions in unconventional fields by using machine learning models. This can be done using existing data associated with well information, completion, and production (oil, gas, and water) for the wells, without the need for subsurface reservoir characteristics information. Additionally, the techniques can help to identify sweet spot locations for unconventional fields integrated with completion parameters, surface location, vertical depth, and production data. This can be done with limited or no subsurface reservoir data by using machine learning algorithms and based on inherent relationships between production performance and overall completion and reservoir characteristics.



FIG. 1 is a flow diagram showing an example workflow 100 using machine learning to determine productivity per area, according to some implementations of the present disclosure. At 102, data is collected. Field/basin information collected for the wells can include surface geographical positions of wells, the wells' subsurface geographical positions at landing and toe sections, the wells' measured depths (MD), the wells' lateral lengths, and the wells' true vertical depths (TVD). Well completion parameters can include proppant volume per stage or per well, clean fluid volume per stage or per well, stage number/count, fracturing type (e.g., slickwater or hybrid crosslink), and wireline plug-and-perforation or multi-stage sleeve. Production parameters (e.g., for 90 days, 180 days, or longer) that are collected can include daily production rates and volumes for gas, oil, and water, and production period data.


At 104, the data is cleaned and normalized. Wells that have missing data are eliminated. The data can be normalized based on machine learning algorithm requirements.


At 106, locations of the wells are split. Well groups are split based on surface/subsurface geographical positions, by area, or by field. Additionally, splits can occur by X and Y coordinates or by N×M-mile (e.g., 1 mile by 1 mile) areas. Well groups are identified based on the split geographical locations.
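The location-splitting step above can be sketched as follows. This is an illustrative assumption, not code from the disclosure: wells are binned into N×M-mile grid blocks labeled like the cells (e.g., "D2") discussed later in connection with FIG. 5A.

```python
# Sketch (hypothetical helper, not from the disclosure): assign each well to a
# grid block such as "D2" by binning its surface X/Y position into N x M-mile cells.
import string

def grid_label(x, y, x_min, y_min, cell_miles=1.0):
    """Map a well's surface (x, y) position (in miles) to a block label like 'D2'."""
    col = int((x - x_min) // cell_miles)   # letter index along X
    row = int((y - y_min) // cell_miles)   # number index along Y
    return f"{string.ascii_uppercase[col]}{row + 1}"

# Example: three wells in a field whose origin is at (0, 0)
wells = [(3.2, 1.4), (3.9, 1.1), (0.5, 6.8)]
labels = [grid_label(x, y, x_min=0.0, y_min=0.0) for x, y in wells]
```

The first two wells fall in the same block, so they would be grouped together when productivity per area is later calculated.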


At 108, the dataset is split, e.g., into 80% training and 20% testing. At 110, a model is trained using cross-validation and tuning. At 112, the model is evaluated using training data with respect to regression error metrics. At 114, feature importance is calculated to understand which parameters significantly impact production. At 116, productivity per area/location is calculated. At 118, the model is used for future predictions.
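Steps 108 through 114 can be sketched end-to-end as follows. This is a minimal illustration on synthetic data; scikit-learn's GradientBoostingRegressor is used as a stand-in for the XGBOOST model discussed later, and the feature meanings are assumptions.

```python
# Minimal sketch of steps 108-114: split, train with cross-validation,
# evaluate, and extract feature importance. Data is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 4))                       # e.g. x, y, TVD, proppant volume
y = X @ np.array([50.0, 30.0, 10.0, 5.0]) + rng.normal(0, 2, 200)  # synthetic 90-day cum.

# Step 108: 80%/20% split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Steps 110-112: cross-validate on training data, then fit
model = GradientBoostingRegressor(random_state=0)
cv_scores = cross_val_score(model, X_tr, y_tr, cv=5, scoring="neg_mean_absolute_error")
model.fit(X_tr, y_tr)

# Step 114: feature importance from the fitted model
importance = model.feature_importances_
```

The `importance` array then feeds step 116, productivity per area, by indicating which inputs drive the predictions.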



FIGS. 2A and 2B are side and perspective views, respectively, showing an example of a single well 200 and a pad 206 with the well, according to some implementations of the present disclosure. Oil and gas can be produced from a subterranean geologic formation (described as a reservoir in some cases), e.g., by drilling the well to penetrate into hydrocarbon-bearing formations. A surface location 202 is shown for the well. A subsurface location 204 shows a position and depth to which the well is drilled before being completed horizontally into the targeted formation. A true vertical depth (TVD) of the well is indicated as Z coordinate 212. In the 3D perspective view of FIG. 2B, the well's X coordinate 208, Y coordinate 210, and Z coordinate 212 are shown, along with the location of the pad 206.



FIGS. 3A and 3B are side and perspective views, respectively, showing an example of a field 300 divided by X and Y into split blocks with well locations, according to some implementations of the present disclosure. Unconventional well production is highly dependent on drilling horizontal wells and completing them with multi-stage fracking stimulation. FIG. 3A shows a horizontal well lateral section 302. Multi-stage fracking stimulation treatment is shown at segment 304 (e.g., with perforations).



FIGS. 4A and 4B are side and perspective views, respectively, showing an example of a multi-well pad 400 and wells 402 drilled to different targets, according to some implementations of the present disclosure. Drilling and completing wells from a multi-well pad 404 is a best practice in shale development for effectively utilizing surface locations. Multiple wells can be drilled from a single pad and completed into different formations/layers, especially when multiple target layers are available. A useful indicator for wells drilled into different layers is their different TVDs.



FIGS. 5A and 5B are top and perspective views, respectively, showing an example of a field 500 split based on X and Y locations 502, according to some implementations of the present disclosure. A given field or basin 504 can be split (or divided) along a surface area by X and Y geographical locations 502. In cases in which multiple subsurface producible layers exist, the wells' 506 TVDs can be different for a given pad, e.g., as shown in FIG. 5A as segment (Z) (or Z1, Z2, etc.). Dipping of the entire area also can be captured, as the depth 508 can be represented for each of the given wells.


Machine Learning Workflow

A primary objective of the techniques of the present disclosure is to estimate the ultimate productivity of the wells in different regions in unconventional fields. This can be done by using machine learning models with data associated with basin, completion operations, and production (oil, gas and water) for the wells, e.g., in the absence of subsurface reservoirs characteristics information. The assumption is that those wells in the same region (unit) shown by pad 404 share the same characteristics. As a result of this assumption, the geological location (spatial coordinates) can be used as a feature representing the underlying reservoir characteristics. In other words, each cell in the grid shown by given field or basin 504 in FIG. 5A is represented by (x,y) coordinates, e.g., D2, and these (x,y) coordinates can be used as input features for the machine learning predictive models.


The workflow of the machine learning model generation is summarized as follows. Data is collected, including basin, completion, and production data, for each well in the unconventional field. Spatial (x,y) coordinate features are collected for each well in each region in the field. The two data sets are combined into one complete set of features. The dataset is split, e.g., 80%/20%, into a training dataset and a testing dataset. The model is trained using cross-validation and tuning. The model is evaluated on training data and testing data with respect to regression error metrics. The importance of the input attributes/features to the machine learning models is calculated. This step helps to identify which parameters have the greatest impact on production based on the features.


The workflow is applicable to machine learning algorithms including, but not limited to, artificial neural network (ANN), support vector machine (SVM), and extreme gradient boosting machine (XGBOOST) models, and to deep learning algorithms including, but not limited to, deep neural networks applicable to tabular data.


Initial Cumulative 90-day Production as a Proxy to EUR

It is common to use initial cumulative production over a certain number of days as a proxy for EUR, as it represents well performance and correlates highly with EUR. This is also a practical approach, because cumulative production for an initial number of days is known early in the life of the well, as a fact rather than an assumption or estimation. The duration of initial production days can be, for example, 30, 60, or 90 days. The present disclosure uses 90 days of initial cumulative production for improved certainty and correlation to EUR.
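The proxy described above can be computed directly from daily production records. The following is an illustrative sketch; the daily volumes are hypothetical.

```python
# Sketch: compute a well's initial 90-day cumulative production from daily
# volumes, used as a proxy target for EUR.
def initial_cumulative(daily_volumes, days=90):
    """Sum the first `days` daily production volumes (e.g., BOE per day)."""
    if len(daily_volumes) < days:
        raise ValueError("well has not yet produced for the full proxy window")
    return sum(daily_volumes[:days])

# Hypothetical decline: 30 days at 100 BOE/d, then 60 days at 80 BOE/d, ...
daily = [100.0] * 30 + [80.0] * 60 + [60.0] * 200
proxy_eur_target = initial_cumulative(daily, days=90)
```

Because the 90-day window closes early in the well's life, this target is available long before a decline-curve EUR could be estimated with confidence.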


Example Case Used for a Study

The development of case studies and subsequent testing associated with the techniques of the present disclosure were based on data from more than 7,000 horizontal wells in a particular shale basin. Data was collected for wells drilled and completed during a 10-year time period spanning 2010 to 2020. A machine learning predictive model workflow was successfully applied for the study. A dataset was generated that covers completion and production data for the 10-year time period.


Collected Input Data

Input data includes completion parameters and geographical locations. Completion parameters can include true vertical depth (TVD), measured depth (MD), lateral length, fluid volume, proppant volume, proppant per lateral (e.g., measured in pounds per foot), fluid per lateral (e.g., measured in gallons per foot), and proppant per fluid (e.g., measured in pounds per gallon). Geographical locations can include locations split based on spatial X and Y coordinates or by geographical location, from which new groups are classified. Example split locations are shown in FIG. 6.
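The derived completion features listed above are simple ratios, which can be sketched as follows; the input values and units are hypothetical.

```python
# Sketch of the derived completion features named above; inputs are hypothetical.
def completion_features(proppant_lbs, fluid_gals, lateral_ft):
    return {
        "proppant_per_lateral_lb_per_ft": proppant_lbs / lateral_ft,
        "fluid_per_lateral_gal_per_ft": fluid_gals / lateral_ft,
        "proppant_per_fluid_lb_per_gal": proppant_lbs / fluid_gals,
    }

feats = completion_features(proppant_lbs=10_000_000,
                            fluid_gals=8_000_000,
                            lateral_ft=10_000)
```

Normalizing by lateral length and fluid volume in this way makes wells of different sizes comparable as model inputs.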



FIG. 6 is a scatter plot 600 showing example locations 602 of gas, oil, and gas+oil wells 604, according to some implementations of the present disclosure. The wells' locations can be divided into sections. The wells' locations are plotted relative to an x-axis 606 and a y-axis 608.


Output Parameter-Target

A target of cumulative total barrels of oil equivalent (e.g., measured in thousands of barrels (k-bbls)) can be set for a given time period (e.g., 90 days), as was done during development and testing. The field, in this case, included shale fluid, which is a mixture of gas, condensate, and oil. The units of barrels of oil equivalent (BOE) were used as a representative measurement for the combined fluids.
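Combining the fluid streams into BOE can be sketched as follows. The 6 Mcf-per-BOE gas conversion is the common industry convention, assumed here rather than taken from the disclosure; the input volumes are hypothetical.

```python
# Sketch: combine oil, condensate, and gas into barrels of oil equivalent (BOE).
# The 6 Mcf/BOE factor is the common industry convention (an assumption here).
MCF_PER_BOE = 6.0

def total_boe(oil_bbl, condensate_bbl, gas_mcf):
    return oil_bbl + condensate_bbl + gas_mcf / MCF_PER_BOE

# Hypothetical 90-day volumes, expressed in thousands of barrels (k-bbls)
target_kbbls = total_boe(oil_bbl=10_000, condensate_bbl=0, gas_mcf=30_000) / 1000.0
```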



FIG. 7 is a graph 700 showing examples of bar charts 702 and descriptive statistics 704 per area based on 90-day cumulative production 706 per location 708, according to some implementations of the present disclosure. The descriptive statistics 704 include a count, a median, and a number of outliers. Non-productive locations 708 (e.g., E5) are not included in the bar charts 702, as indicated by the corresponding gap in the bar charts 702 above the location-column-oriented descriptive statistics 704. The box plot of the graph 700 shows a distribution of the data, where the bottom of a rectangle 710 represents the 1st quartile (Q1), and the top of the rectangle 710 represents the 3rd quartile (Q3). The height/length of each rectangle 710 shows the inter-quartile range (IQR). The top and bottom of the solid lines in the bar charts 702 mark the values [Q3+1.5*IQR] and [Q1−1.5*IQR], respectively. White dashed lines 711 indicate Q2 (median) values. Individual dots on the lines/bars of the bar charts 702 show outliers in the corresponding data.
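The box-plot statistics described above can be reproduced numerically. The following sketch computes the quartiles, IQR, whisker limits, and outliers for a hypothetical set of per-location production values.

```python
# Sketch reproducing the box-plot statistics of FIG. 7: quartiles, IQR, whisker
# limits at Q1 - 1.5*IQR and Q3 + 1.5*IQR, and outliers beyond the whiskers.
import numpy as np

def box_stats(values):
    q1, q2, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lo or v > hi]
    return {"Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
            "whiskers": (lo, hi), "outliers": outliers}

stats = box_stats([10, 12, 13, 14, 15, 16, 18, 60])  # 60 lies beyond the whisker
```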


Data Cleanout and Pre-Processing

In some implementations, data cleaning can occur, based on an analysis of the dataset, and can include eliminating one or both of missing rows and non-representative data. In some implementations, pre-processing can occur, including performing a conversion of datasets to match a format used by the machine learning models and to normalize the data as needed.
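The cleanout and pre-processing steps above can be sketched with pandas as follows; the column names and values are hypothetical, and min-max scaling is used as one example of the normalization the models may require.

```python
# Sketch of cleanout/pre-processing: drop wells with missing data, then
# min-max normalize the numeric columns. Column names are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "tvd_ft": [9000.0, 9500.0, None, 10000.0],
    "lateral_ft": [7500.0, 8000.0, 9000.0, None],
    "cum90_boe": [55000.0, 62000.0, 48000.0, 70000.0],
})

clean = df.dropna()                                        # eliminate wells with missing data
normalized = (clean - clean.min()) / (clean.max() - clean.min())
```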


Machine Learning Model Implementation

In experimentation and in developing and testing the techniques of the present disclosure, general machine learning models were created for predicting productivity of the wells by using completion and location parameters as input. Implementations of the machine learning models included the exploration of neural networks, support vector machines, or extreme gradient boosting machine models.


Performance Evaluation of the Machine Learning (ML) Algorithms

Results of a study of the performance of the different ML models on test data are shown in Table 1. The table includes a comparison of the different models. The XGBOOST model showed the best results on the training and testing datasets based on mean absolute error (MAE). Training and test data results for XGBOOST are also shown in FIGS. 8A and 8B.









TABLE 1

Machine Learning Model Results - Test Data

  Algorithm    RMSE    MAE    R2
  XGBOOST        27     18    0.6
  ANN            19     19    0.5
  SVM            29     20    0.5

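The regression metrics reported in Table 1 (RMSE, MAE, R2) can be computed with scikit-learn as follows; the actual and predicted values here are hypothetical.

```python
# Sketch: compute the Table 1 regression metrics for a hypothetical set of
# predictions (e.g., 90-day cumulative BOE in Kbbls).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

actual = np.array([55.0, 62.0, 48.0, 70.0])
predicted = np.array([50.0, 65.0, 50.0, 66.0])

rmse = np.sqrt(mean_squared_error(actual, predicted))
mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
```

MAE weights all errors equally, while RMSE penalizes large misses more heavily, which is why the disclosure evaluates both alongside R2.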
FIGS. 8A and 8B are graphs 800 and 850, respectively, showing examples of XGBOOST training and test data results, according to some implementations of the present disclosure. The graphs include data points for individual wells plotted relative to a predicted cumulative production 802 and an actual cumulative production 804.



FIG. 9 is a plot showing examples of feature importance 902 of each input parameter 904 for the studied area, according to some implementations of the present disclosure. The feature importance and parameters apply to the XGBOOST model that was studied. The input parameters 904 are ranked by overall feature importance 902. Each well's parameter and corresponding color/shading as plotted represents a feature value 908 on a scale from highest to lowest. Feature values 908 range from a low value 910 (indicating a negative impact on model output) to a high value 912 (indicating a positive impact on the model output). Each data point is plotted relative to a SHapley Additive exPlanations (SHAP) value 906, indicating an impact on the model output.
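FIG. 9 uses SHAP values, which require the `shap` package. As a lighter-weight sketch of the same idea — ranking input parameters by their impact on model output — scikit-learn's permutation importance can be used as a stand-in, shown here on synthetic data.

```python
# Sketch of per-feature impact ranking. This uses permutation importance as a
# stand-in for the SHAP analysis of FIG. 9; data and features are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.random((300, 3))                      # e.g. lateral length, proppant, TVD
y = 100.0 * X[:, 0] + 10.0 * X[:, 1] + rng.normal(0, 1, 300)  # feature 2 is noise

model = GradientBoostingRegressor(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most impactful first
```

Unlike SHAP, permutation importance gives a single global score per feature rather than per-well attributions, but it yields the same kind of ranking used to identify which parameters drive production.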


Table 2 presents cumulative production for selected locations. Table 2 lists well counts and cumulative production totals for the locations, including cumulative oil measured in barrels (bbl), cumulative gas measured in thousands of cubic feet (Mcf), cumulative totals measured in thousands of barrels (Kbbls), and cumulative total 90-day oil-equivalent production measured in barrels of oil equivalent (bbls e).









TABLE 2

Cumulative 90-day production by location distribution

  Location    Well Count    Cum. Oil 90 (bbl)    Cum. Gas 90 (Mcf)    Cum. Total 90 (Kbbls)    Cum. Total 90 (bbls e)
  B7          1169          26,213               183,470              59                       58,504
  G5          583           68,775               158,787              97                       96,721
  H2          125           33,744               11,194               36                       35,714


This example outlines the overall machine learning/artificial intelligence methodologies of the proposed innovative solution.



FIG. 10 is a flowchart of an example of a method 1000 for determining an importance that each of the attributes and features of the well data has on machine learning models, according to some implementations of the present disclosure. For clarity of presentation, the description that follows generally describes method 1000 in the context of the other figures in this description. However, it will be understood that method 1000 can be performed, for example, by any suitable system, environment, software, and hardware, or a combination of systems, environments, software, and hardware, as appropriate. In some implementations, various steps of method 1000 can be run in parallel, in combination, in loops, or in any order.


At 1002, well data is collected for each well in an unconventional field, including attributes and features of basin data, completion data, and production data. In some implementations, the well data can be cleaned and normalized to remove well data for wells with missing data and to normalize the well data per requirements for machine learning. From 1002, method 1000 proceeds to 1004.


At 1004, spatial features are generated for each well in different regions in the unconventional field. For example, the spatial features can include (x,y) coordinates indicating a horizontal position and a z-coordinate indicating depth. From 1004, method 1000 proceeds to 1006.


At 1006, a combined well features dataset is generated using the well data and the spatial features. The dataset maps the well data to the spatial features for each well in the different regions of the unconventional field. Well data for each parameter can be grouped based on surface locations. From 1006, method 1000 proceeds to 1008.


At 1008, a training dataset and a testing dataset are generated by splitting the combined well features dataset. As an example, the splitting (e.g., an 80%/20% split) can be performed as part of the machine learning modeling and includes applying stratification by region so that each region has representative wells in both datasets. From 1008, method 1000 proceeds to 1010.
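The stratified split at 1008 can be sketched as follows; the features and region labels are hypothetical grid-block labels.

```python
# Sketch of step 1008: an 80%/20% split stratified by region label so that
# every region is represented in both sets. Region labels are hypothetical.
from sklearn.model_selection import train_test_split

features = [[i, i * 2.0] for i in range(20)]    # placeholder well feature rows
regions = ["D2", "B7", "G5", "H2"] * 5          # grid-block label per well

tr_X, te_X, tr_r, te_r = train_test_split(
    features, regions, test_size=0.2, stratify=regions, random_state=0)
```

With five wells per region and a 20% test fraction, stratification places exactly one well from each region in the testing dataset.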


At 1010, a machine learning model is trained using cross-validation and tuning on the training dataset to predict estimated ultimate recovery (EUR). Cumulative 90-day production is used as a proxy for EUR. During development, EUR was used as a target parameter. From 1010, method 1000 proceeds to 1012.
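The cross-validation and tuning at 1010 can be sketched with a grid search; the parameter grid and synthetic data are illustrative assumptions, and GradientBoostingRegressor stands in for the XGBOOST model of the study.

```python
# Sketch of step 1010: cross-validated hyperparameter tuning with GridSearchCV.
# The small parameter grid is an illustrative assumption, not from the disclosure.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((150, 3))
y = 40.0 * X[:, 0] + 20.0 * X[:, 1] + rng.normal(0, 1, 150)   # synthetic proxy-EUR target

grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(GradientBoostingRegressor(random_state=0), grid,
                      cv=5, scoring="neg_mean_absolute_error")
search.fit(X, y)
best = search.best_params_      # tuned settings chosen by cross-validation
```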


At 1012, the performance of a machine learning (EUR) model is evaluated with respect to different regression metrics, including (but not limited to) root-mean-squared error (RMSE), mean squared error (MSE), and mean absolute error (MAE), on the training dataset and the testing dataset to quantify the accuracy of the model. From 1012, method 1000 proceeds to 1014.


At 1014, an importance that each of the attributes and features of the well data has on machine learning models is determined based on the spatial features and by evaluating the EUR model. The importance identifies an impact on production by each attribute and feature. For example, the importances can be determined as described with reference to FIG. 9. After 1014, method 1000 can stop.


In some implementations, method 1000 further includes using the importance of each of the attributes and features of the well data in designing wells and stimulating wells. In some implementations, method 1000 further includes determining, using the importance of each of the attributes and features of the well data, horizontal and spatial well placement.


During development and testing, in an exploration phase, a limited number of wells were initially drilled to evaluate and obtain subsurface reservoir data by performing pilot well drilling, well logging, core sampling, formation fluid sampling, and stimulation and testing. Then, horizontal wells were drilled and completed using multi-stage fracturing operations and well testing to confirm productivity. During an appraisal phase of the exploitation of the field regarding unconventional resources, additional wells were drilled and completed, including determining and confirming well spacing and effective landing points. Diagnostic tests were performed, such as extended flowback, optimization of fracturing proppant and fluid volume, and well spacing. Optimization can refer, for example, to achieving fracturing proppant and fluid volume mixtures (e.g., percentages) that result in production (or an improvement in production) greater than a predefined threshold. The diagnostic tests were integrated with micro-seismic monitoring performed during appraisal/pilot phases. During the development phase, a number of wells were drilled and completed and put into production to maximize recovery.


In some implementations, in addition to (or in combination with) any previously-described features, techniques of the present disclosure can include the following. Outputs of the techniques of the present disclosure can be performed before, during, or in combination with wellbore operations, such as to provide inputs to change the settings or parameters of equipment used for drilling. Examples of wellbore operations include forming/drilling a wellbore, hydraulic fracturing, and producing through the wellbore, to name a few. The wellbore operations can be triggered or controlled, for example, by outputs of the methods of the present disclosure. In some implementations, customized user interfaces can present intermediate or final results of the above described processes to a user. Information can be presented in one or more textual, tabular, or graphical formats, such as through a dashboard. The information can be presented at one or more on-site locations (such as at an oil well or other facility), on the Internet (such as on a webpage), on a mobile application (or “app”), or at a central processing facility. The presented information can include suggestions, such as suggested changes in parameters or processing inputs, that the user can select to implement improvements in a production environment, such as in the exploration, production, and/or testing of petrochemical processes or facilities. For example, the suggestions can include parameters that, when selected by the user, can cause a change to, or an improvement in, drilling parameters (including drill bit speed and direction) or overall production of a gas or oil well. The suggestions, when implemented by the user, can improve the speed and accuracy of calculations, streamline processes, improve models, and solve problems related to efficiency, performance, safety, reliability, costs, downtime, and the need for human interaction. 
In some implementations, the suggestions can be implemented in real-time, such as to provide an immediate or near-immediate change in operations or in a model. The term real-time can correspond, for example, to events that occur within a specified period of time, such as within one minute or within one second. Events can include readings or measurements captured by downhole equipment such as sensors, pumps, bottom hole assemblies, or other equipment. The readings or measurements can be analyzed at the surface, such as by using applications that can include modeling applications and machine learning. The analysis can be used to generate changes to settings of downhole equipment, such as drilling equipment. In some implementations, values of parameters or other variables that are determined can be used automatically (such as through using rules) to implement changes in oil or gas well exploration, production/drilling, or testing. For example, outputs of the present disclosure can be used as inputs to other equipment and/or systems at a facility. This can be especially useful for systems or various pieces of equipment that are located several meters or several miles apart, or are located in different countries or other jurisdictions.



FIG. 11 is a block diagram of an example computer system 1100 used to provide computational functionalities associated with the algorithms, methods, functions, processes, flows, and procedures described in the present disclosure, according to some implementations of the present disclosure. The illustrated computer 1102 is intended to encompass any computing device such as a server, a desktop computer, a laptop/notebook computer, a wireless data port, a smart phone, a personal data assistant (PDA), a tablet computing device, or one or more processors within these devices, including physical instances, virtual instances, or both. The computer 1102 can include input devices such as keypads, keyboards, and touch screens that can accept user information. Also, the computer 1102 can include output devices that can convey information associated with the operation of the computer 1102. The information can include digital data, visual data, audio information, or a combination of information. The information can be presented in a graphical user interface (GUI).


The computer 1102 can serve in a role as a client, a network component, a server, a database, a persistency, or components of a computer system for performing the subject matter described in the present disclosure. The illustrated computer 1102 is communicably coupled with a network 1130. In some implementations, one or more components of the computer 1102 can be configured to operate within different environments, including cloud-computing-based environments, local environments, global environments, and combinations of environments.


At a top level, the computer 1102 is an electronic computing device operable to receive, transmit, process, store, and manage data and information associated with the described subject matter. According to some implementations, the computer 1102 can also include, or be communicably coupled with, an application server, an email server, a web server, a caching server, a streaming data server, or a combination of servers.


The computer 1102 can receive requests over network 1130 from a client application (for example, executing on another computer 1102). The computer 1102 can respond to the received requests by processing the received requests using software applications. Requests can also be sent to the computer 1102 from internal users (for example, from a command console), external (or third) parties, automated applications, entities, individuals, systems, and computers.


Each of the components of the computer 1102 can communicate using a system bus 1103. In some implementations, any or all of the components of the computer 1102, including hardware or software components, can interface with each other or the interface 1104 (or a combination of both) over the system bus 1103. Interfaces can use an application programming interface (API) 1112, a service layer 1113, or a combination of the API 1112 and service layer 1113. The API 1112 can include specifications for routines, data structures, and object classes. The API 1112 can be either computer-language independent or dependent. The API 1112 can refer to a complete interface, a single function, or a set of APIs.


The service layer 1113 can provide software services to the computer 1102 and other components (whether illustrated or not) that are communicably coupled to the computer 1102. The functionality of the computer 1102 can be accessible for all service consumers using this service layer. Software services, such as those provided by the service layer 1113, can provide reusable, defined functionalities through a defined interface. For example, the interface can be software written in JAVA, C++, or a language providing data in extensible markup language (XML) format. While illustrated as an integrated component of the computer 1102, in alternative implementations, the API 1112 or the service layer 1113 can be stand-alone components in relation to other components of the computer 1102 and other components communicably coupled to the computer 1102. Moreover, any or all parts of the API 1112 or the service layer 1113 can be implemented as child or sub-modules of another software module, enterprise application, or hardware module without departing from the scope of the present disclosure.


The computer 1102 includes an interface 1104. Although illustrated as a single interface 1104 in FIG. 11, two or more interfaces 1104 can be used according to particular needs, desires, or particular implementations of the computer 1102 and the described functionality. The interface 1104 can be used by the computer 1102 for communicating with other systems that are connected to the network 1130 (whether illustrated or not) in a distributed environment. Generally, the interface 1104 can include, or be implemented using, logic encoded in software or hardware (or a combination of software and hardware) operable to communicate with the network 1130. More specifically, the interface 1104 can include software supporting one or more communication protocols associated with communications. As such, the network 1130 or the interface's hardware can be operable to communicate physical signals within and outside of the illustrated computer 1102.


The computer 1102 includes a processor 1105. Although illustrated as a single processor 1105 in FIG. 11, two or more processors 1105 can be used according to particular needs, desires, or particular implementations of the computer 1102 and the described functionality. Generally, the processor 1105 can execute instructions and can manipulate data to perform the operations of the computer 1102, including operations using algorithms, methods, functions, processes, flows, and procedures as described in the present disclosure.


The computer 1102 also includes a database 1106 that can hold data for the computer 1102 and other components connected to the network 1130 (whether illustrated or not). For example, database 1106 can be an in-memory database, a conventional database, or another type of database storing data consistent with the present disclosure. In some implementations, database 1106 can be a combination of two or more different database types (for example, hybrid in-memory and conventional databases) according to particular needs, desires, or particular implementations of the computer 1102 and the described functionality. Although illustrated as a single database 1106 in FIG. 11, two or more databases (of the same, different, or combination of types) can be used according to particular needs, desires, or particular implementations of the computer 1102 and the described functionality. While database 1106 is illustrated as an internal component of the computer 1102, in alternative implementations, database 1106 can be external to the computer 1102.


The computer 1102 also includes a memory 1107 that can hold data for the computer 1102 or a combination of components connected to the network 1130 (whether illustrated or not). Memory 1107 can store any data consistent with the present disclosure. In some implementations, memory 1107 can be a combination of two or more different types of memory (for example, a combination of semiconductor and magnetic storage) according to particular needs, desires, or particular implementations of the computer 1102 and the described functionality. Although illustrated as a single memory 1107 in FIG. 11, two or more memories 1107 (of the same, different, or combination of types) can be used according to particular needs, desires, or particular implementations of the computer 1102 and the described functionality. While memory 1107 is illustrated as an internal component of the computer 1102, in alternative implementations, memory 1107 can be external to the computer 1102.


The application 1108 can be an algorithmic software engine providing functionality according to particular needs, desires, or particular implementations of the computer 1102 and the described functionality. For example, application 1108 can serve as one or more components, modules, or applications. Further, although illustrated as a single application 1108, the application 1108 can be implemented as multiple applications 1108 on the computer 1102. In addition, although illustrated as internal to the computer 1102, in alternative implementations, the application 1108 can be external to the computer 1102.


The computer 1102 can also include a power supply 1114. The power supply 1114 can include a rechargeable or non-rechargeable battery that can be configured to be either user- or non-user-replaceable. In some implementations, the power supply 1114 can include power-conversion and management circuits, including recharging, standby, and power management functionalities. In some implementations, the power supply 1114 can include a power plug to allow the computer 1102 to be plugged into a wall socket or a power source to, for example, power the computer 1102 or recharge a rechargeable battery.


There can be any number of computers 1102 associated with, or external to, a computer system containing computer 1102, with each computer 1102 communicating over network 1130. Further, the terms “client,” “user,” and other appropriate terminology can be used interchangeably, as appropriate, without departing from the scope of the present disclosure. Moreover, the present disclosure contemplates that many users can use one computer 1102 and one user can use multiple computers 1102.


Described implementations of the subject matter can include one or more features, alone or in combination.


For example, in a first implementation, a computer-implemented method includes the following. Well data is collected for each well in an unconventional field, including attributes and features of basin data, completion data, and production data. Spatial features are generated for each well in different regions in the unconventional field. A combined well features dataset is generated using the well data and the spatial features. The dataset maps the well data to the spatial features for each well in the different regions of the unconventional field. A training dataset and a testing dataset are generated by splitting the combined well features dataset. A machine learning model is trained using cross-validation and tuning on the training dataset to predict estimated ultimate recovery (EUR). The performance of the machine learning (EUR) model is evaluated with respect to different regression metrics. An importance of each of the attributes and features of the well data in machine learning models is determined based on the spatial features and by evaluating the EUR model. The importance identifies an impact on production by each attribute and feature.
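As a hedged illustration, the workflow of this first implementation could be sketched with scikit-learn as follows. The library choice, the model type (a random forest regressor), and all column names (`x`, `y`, `z`, `lateral_length`, `proppant_per_ft`, `region`, `eur`) are assumptions made for illustration only, not the actual field schema or the specific model used by the disclosed method.

```python
# Hedged sketch of the described workflow using scikit-learn.
# All column names below are hypothetical placeholders, not the field schema.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

def train_eur_model(wells: pd.DataFrame):
    # Combined well features dataset: spatial (x, y, z) plus completion attributes.
    features = ["x", "y", "z", "lateral_length", "proppant_per_ft"]
    X, y = wells[features], wells["eur"]

    # Split with stratification by region so each region is represented
    # in both the training and testing datasets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=wells["region"], random_state=0
    )

    # Train with cross-validation and hyperparameter tuning on the training set.
    search = GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
        cv=5,
        scoring="neg_mean_absolute_error",
    )
    search.fit(X_train, y_train)
    model = search.best_estimator_

    # Evaluate the EUR model with respect to different regression metrics.
    pred = model.predict(X_test)
    metrics = {"mae": mean_absolute_error(y_test, pred), "r2": r2_score(y_test, pred)}

    # Importance of each attribute and feature to the model's predictions.
    importance = dict(zip(features, model.feature_importances_))
    return model, metrics, importance
```

A tree-ensemble regressor is used here mainly because it exposes per-feature importances directly, which aligns with the method's final step of identifying each attribute's impact on production.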


The foregoing and other described implementations can each, optionally, include one or more of the following features:


A first feature, combinable with any of the following features, where the method further includes cleaning and normalizing the well data to remove well data for wells with missing data and to normalize the well data per requirements for machine learning.
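The cleaning and normalization described in this feature could be sketched as follows, assuming pandas and scikit-learn; the notion of "requirements for machine learning" is interpreted here as zero-mean, unit-variance scaling, and the feature-column argument is hypothetical.

```python
# Hedged sketch of the cleaning and normalization step.
import pandas as pd
from sklearn.preprocessing import StandardScaler

def clean_and_normalize(wells: pd.DataFrame, feature_cols: list[str]) -> pd.DataFrame:
    # Remove well data for wells with missing attribute values.
    cleaned = wells.dropna(subset=feature_cols).copy()

    # Normalize the remaining features (zero mean, unit variance),
    # per typical machine learning requirements.
    cleaned[feature_cols] = StandardScaler().fit_transform(cleaned[feature_cols])
    return cleaned
```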


A second feature, combinable with any of the previous or following features, where the spatial features include (x,y) coordinates indicating a horizontal position and a z-coordinate indicating depth.


A third feature, combinable with any of the previous or following features, where the splitting is performed using machine learning modeling and includes applying stratification based on region, with representative wells for each region.


A fourth feature, combinable with any of the previous or following features, where the method further includes using the importance of each of the attributes and features of the well data in designing wells and stimulating wells.


A fifth feature, combinable with any of the previous or following features, where the method further includes determining, using the importance of each of the attributes and features of the well data, horizontal and spatial well placement.
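One way to obtain the per-attribute importance used for well design, stimulation, and placement decisions is permutation importance, sketched below under the assumption of a fitted scikit-learn regressor; this is an illustrative technique choice, not necessarily the one used by the disclosed method.

```python
# Hedged sketch: ranking attribute importance via permutation importance.
import pandas as pd
from sklearn.inspection import permutation_importance

def rank_attributes(model, X_test: pd.DataFrame, y_test):
    # Shuffle each feature and measure the drop in model score; a larger
    # drop means the attribute has a larger impact on predicted production.
    result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    return sorted(
        zip(X_test.columns, result.importances_mean),
        key=lambda item: item[1],
        reverse=True,
    )
```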


A sixth feature, combinable with any of the previous or following features, where the spatial features are generated for a time period of 30, 60, or 90 days.


In a second implementation, a non-transitory, computer-readable medium stores one or more instructions executable by a computer system to perform operations including the following. Well data is collected for each well in an unconventional field, including attributes and features of basin data, completion data, and production data. Spatial features are generated for each well in different regions in the unconventional field. A combined well features dataset is generated using the well data and the spatial features. The dataset maps the well data to the spatial features for each well in the different regions of the unconventional field. A training dataset and a testing dataset are generated by splitting the combined well features dataset. A machine learning model is trained using cross-validation and tuning on the training dataset to predict estimated ultimate recovery (EUR). The performance of the machine learning (EUR) model is evaluated with respect to different regression metrics. An importance of each of the attributes and features of the well data in machine learning models is determined based on the spatial features and by evaluating the EUR model. The importance identifies an impact on production by each attribute and feature.


The foregoing and other described implementations can each, optionally, include one or more of the following features:


A first feature, combinable with any of the following features, where the operations further include cleaning and normalizing the well data to remove well data for wells with missing data and to normalize the well data per requirements for machine learning.


A second feature, combinable with any of the previous or following features, where the spatial features include (x,y) coordinates indicating a horizontal position and a z-coordinate indicating depth.


A third feature, combinable with any of the previous or following features, where the splitting is performed using machine learning modeling and includes applying stratification based on region, with representative wells for each region.


A fourth feature, combinable with any of the previous or following features, where the operations further include using the importance of each of the attributes and features of the well data in designing wells and stimulating wells.


A fifth feature, combinable with any of the previous or following features, where the operations further include determining, using the importance of each of the attributes and features of the well data, horizontal and spatial well placement.


A sixth feature, combinable with any of the previous or following features, where the spatial features are generated for a time period of 30, 60, or 90 days.


In a third implementation, a computer-implemented system includes one or more processors and a non-transitory computer-readable storage medium coupled to the one or more processors and storing programming instructions for execution by the one or more processors. The programming instructions instruct the one or more processors to perform operations including the following. Well data is collected for each well in an unconventional field, including attributes and features of basin data, completion data, and production data. Spatial features are generated for each well in different regions in the unconventional field. A combined well features dataset is generated using the well data and the spatial features. The dataset maps the well data to the spatial features for each well in the different regions of the unconventional field. A training dataset and a testing dataset are generated by splitting the combined well features dataset. A machine learning model is trained using cross-validation and tuning on the training dataset to predict estimated ultimate recovery (EUR). The performance of the machine learning (EUR) model is evaluated with respect to different regression metrics. An importance of each of the attributes and features of the well data in machine learning models is determined based on the spatial features and by evaluating the EUR model. The importance identifies an impact on production by each attribute and feature.


The foregoing and other described implementations can each, optionally, include one or more of the following features:


A first feature, combinable with any of the following features, where the operations further include cleaning and normalizing the well data to remove well data for wells with missing data and to normalize the well data per requirements for machine learning.


A second feature, combinable with any of the previous or following features, where the spatial features include (x,y) coordinates indicating a horizontal position and a z-coordinate indicating depth.


A third feature, combinable with any of the previous or following features, where the splitting is performed using machine learning modeling and includes applying stratification based on region, with representative wells for each region.


A fourth feature, combinable with any of the previous or following features, where the operations further include using the importance of each of the attributes and features of the well data in designing wells and stimulating wells.


A fifth feature, combinable with any of the previous or following features, where the operations further include determining, using the importance of each of the attributes and features of the well data, horizontal and spatial well placement.


Implementations of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Software implementations of the described subject matter can be implemented as one or more computer programs. Each computer program can include one or more modules of computer program instructions encoded on a tangible, non-transitory, computer-readable computer-storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively, or additionally, the program instructions can be encoded in/on an artificially generated propagated signal. For example, the signal can be a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. The computer-storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of computer-storage mediums.


The terms “data processing apparatus,” “computer,” and “electronic computer device” (or equivalent as understood by one of ordinary skill in the art) refer to data processing hardware. For example, a data processing apparatus can encompass all kinds of apparatuses, devices, and machines for processing data, including by way of example, a programmable processor, a computer, or multiple processors or computers. The apparatus can also include special purpose logic circuitry including, for example, a central processing unit (CPU), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). In some implementations, the data processing apparatus or special purpose logic circuitry (or a combination of the data processing apparatus or special purpose logic circuitry) can be hardware- or software-based (or a combination of both hardware- and software-based). The apparatus can optionally include code that creates an execution environment for computer programs, for example, code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of execution environments. The present disclosure contemplates the use of data processing apparatuses with or without conventional operating systems, such as LINUX, UNIX, WINDOWS, MAC OS, ANDROID, or IOS.


A computer program, which can also be referred to or described as a program, software, a software application, a module, a software module, a script, or code, can be written in any form of programming language. Programming languages can include, for example, compiled languages, interpreted languages, declarative languages, or procedural languages. Programs can be deployed in any form, including as stand-alone programs, modules, components, subroutines, or units for use in a computing environment. A computer program can, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, for example, one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files storing one or more modules, sub-programs, or portions of code. A computer program can be deployed for execution on one computer or on multiple computers that are located, for example, at one site or distributed across multiple sites that are interconnected by a communication network. While portions of the programs illustrated in the various figures may be shown as individual modules that implement the various features and functionality through various objects, methods, or processes, the programs can instead include a number of sub-modules, third-party services, components, and libraries. Conversely, the features and functionality of various components can be combined into single components as appropriate. Thresholds used to make computational determinations can be statically, dynamically, or both statically and dynamically determined.


The methods, processes, or logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The methods, processes, or logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, for example, a CPU, an FPGA, or an ASIC.


Computers suitable for the execution of a computer program can be based on one or more of general-purpose and special-purpose microprocessors and other kinds of CPUs. The elements of a computer are a CPU for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a CPU can receive instructions and data from (and write data to) a memory.


Graphics processing units (GPUs) can also be used in combination with CPUs. The GPUs can provide specialized processing that occurs in parallel to processing performed by CPUs. The specialized processing can include artificial intelligence (AI) applications and processing, for example. GPUs can be used in GPU clusters or in multi-GPU computing.


A computer can include, or be operatively coupled to, one or more mass storage devices for storing data. In some implementations, a computer can receive data from, and transfer data to, the mass storage devices including, for example, magnetic, magneto-optical disks, or optical disks. Moreover, a computer can be embedded in another device, for example, a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive.


Computer-readable media (transitory or non-transitory, as appropriate) suitable for storing computer program instructions and data can include all forms of permanent/non-permanent and volatile/non-volatile memory, media, and memory devices. Computer-readable media can include, for example, semiconductor memory devices such as random access memory (RAM), read-only memory (ROM), phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and flash memory devices. Computer-readable media can also include, for example, magnetic devices such as tape, cartridges, cassettes, and internal/removable disks. Computer-readable media can also include magneto-optical disks and optical memory devices and technologies including, for example, digital video disc (DVD), CD-ROM, DVD+/-R, DVD-RAM, DVD-ROM, HD-DVD, and BLU-RAY. The memory can store various objects or data, including caches, classes, frameworks, applications, modules, backup data, jobs, web pages, web page templates, data structures, database tables, repositories, and dynamic information. Types of objects and data stored in memory can include parameters, variables, algorithms, instructions, rules, constraints, and references. Additionally, the memory can include logs, policies, security or access data, and reporting files. The processor and the memory can be supplemented by, or incorporated into, special purpose logic circuitry.


Implementations of the subject matter described in the present disclosure can be implemented on a computer having a display device for providing interaction with a user, including displaying information to (and receiving input from) the user. Types of display devices can include, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), a light-emitting diode (LED), and a plasma monitor. The computer can also include input devices such as a keyboard and pointing devices including, for example, a mouse, a trackball, or a trackpad. User input can also be provided to the computer through the use of a touchscreen, such as a tablet computer surface with pressure sensitivity or a multi-touch screen using capacitive or electric sensing. Other kinds of devices can be used to provide for interaction with a user, including to receive user feedback including, for example, sensory feedback including visual feedback, auditory feedback, or tactile feedback. Input from the user can be received in the form of acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to, and receiving documents from, a device that the user uses. For example, the computer can send web pages to a web browser on a user's client device in response to requests received from the web browser.


The term “graphical user interface,” or “GUI,” can be used in the singular or the plural to describe one or more graphical user interfaces and each of the displays of a particular graphical user interface. Therefore, a GUI can represent any graphical user interface, including, but not limited to, a web browser, a touch-screen, or a command line interface (CLI) that processes information and efficiently presents the information results to the user. In general, a GUI can include a plurality of user interface (UI) elements, some or all associated with a web browser, such as interactive fields, pull-down lists, and buttons. These and other UI elements can be related to or represent the functions of the web browser.


Implementations of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, for example, as a data server, or that includes a middleware component, for example, an application server. Moreover, the computing system can include a front-end component, for example, a client computer having one or both of a graphical user interface or a Web browser through which a user can interact with the computer. The components of the system can be interconnected by any form or medium of wireline or wireless digital data communication (or a combination of data communication) in a communication network. Examples of communication networks include a local area network (LAN), a radio access network (RAN), a metropolitan area network (MAN), a wide area network (WAN), Worldwide Interoperability for Microwave Access (WIMAX), a wireless local area network (WLAN) (for example, using 802.11 a/b/g/n or 802.20 or a combination of protocols), all or a portion of the Internet, or any other communication system or systems at one or more locations (or a combination of communication networks). The network can communicate with, for example, Internet Protocol (IP) packets, frame relay frames, asynchronous transfer mode (ATM) cells, voice, video, data, or a combination of communication types between network addresses.


The computing system can include clients and servers. A client and server can generally be remote from each other and can typically interact through a communication network. The relationship of client and server can arise by virtue of computer programs running on the respective computers and having a client-server relationship.


Cluster file systems can be any file system type accessible from multiple servers for read and update. Locking or consistency tracking may not be necessary since locking of the exchange file system can be done at the application layer. Furthermore, Unicode data files can be different from non-Unicode data files.


While this specification contains many specific implementation details, these should not be construed as limitations on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular implementations. Certain features that are described in this specification in the context of separate implementations can also be implemented, in combination, in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations, separately, or in any suitable sub-combination. Moreover, although previously described features may be described as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can, in some cases, be excised from the combination, and the claimed combination may be directed to a sub-combination or variation of a sub-combination.


Particular implementations of the subject matter have been described. Other implementations, alterations, and permutations of the described implementations are within the scope of the following claims as will be apparent to those skilled in the art. While operations are depicted in the drawings or claims in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed (some operations may be considered optional), to achieve desirable results. In certain circumstances, multitasking or parallel processing (or a combination of multitasking and parallel processing) may be advantageous and performed as deemed appropriate.


Moreover, the separation or integration of various system modules and components in the previously described implementations should not be understood as requiring such separation or integration in all implementations. It should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.


Accordingly, the previously described example implementations do not define or constrain the present disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of the present disclosure.


Furthermore, any claimed implementation is considered to be applicable to at least a computer-implemented method; a non-transitory, computer-readable medium storing computer-readable instructions to perform the computer-implemented method; and a computer system including a computer memory interoperably coupled with a hardware processor configured to perform the computer-implemented method or the instructions stored on the non-transitory, computer-readable medium.

Claims
  • 1. A computer-implemented method, comprising: collecting well data, including attributes and features of basin data, completion data, and production data, for each well in an unconventional field; generating spatial features for each well in different regions in the unconventional field; generating, using the well data and the spatial features, a combined well features dataset mapping the well data to the spatial features for each well in the different regions of the unconventional field; generating, by splitting the combined well features dataset, a training dataset and a testing dataset; training a machine learning model using cross-validation and tuning on the training dataset to predict estimated ultimate recovery (EUR); evaluating the performance of the machine learning (EUR) model with respect to different regression metrics; and determining, based on the spatial features and by evaluating the EUR model, an importance of each of the attributes and features of the well data in machine learning models, wherein the importance identifies an impact on production by each attribute and feature.
  • 2. The computer-implemented method of claim 1, further comprising: cleaning and normalizing the well data to remove well data for wells with missing data and to normalize the well data per requirements for machine learning.
  • 3. The computer-implemented method of claim 1, wherein the spatial features include (x,y) coordinates indicating a horizontal position and a z-coordinate indicating depth.
  • 4. The computer-implemented method of claim 1, wherein the splitting is performed using machine learning modeling and includes applying stratification based on region, such that each region has representative wells in the training dataset and the testing dataset.
  • 5. The computer-implemented method of claim 1, further comprising using the importance of each of the attributes and features of the well data in designing wells and stimulating wells.
  • 6. The computer-implemented method of claim 1, further comprising determining, using the importance of each of the attributes and features of the well data, horizontal and spatial well placement.
  • 7. The computer-implemented method of claim 1, wherein the spatial features are generated for a time period of 30, 60, or 90 days.
  • 8. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform operations comprising: collecting well data, including attributes and features of basin data, completion data, and production data, for each well in an unconventional field; generating spatial features for each well in different regions in the unconventional field; generating, using the well data and the spatial features, a combined well features dataset mapping the well data to the spatial features for each well in the different regions of the unconventional field; generating, by splitting the combined well features dataset, a training dataset and a testing dataset; training a machine learning model using cross-validation and tuning on the training dataset to predict estimated ultimate recovery (EUR); evaluating the performance of the machine learning (EUR) model with respect to different regression metrics; and determining, based on the spatial features and by evaluating the EUR model, an importance of each of the attributes and features of the well data in machine learning models, wherein the importance identifies an impact on production by each attribute and feature.
  • 9. The non-transitory, computer-readable medium of claim 8, the operations further comprising: cleaning and normalizing the well data to remove well data for wells with missing data and to normalize the well data per requirements for machine learning.
  • 10. The non-transitory, computer-readable medium of claim 8, wherein the spatial features include (x,y) coordinates indicating a horizontal position and a z-coordinate indicating depth.
  • 11. The non-transitory, computer-readable medium of claim 8, wherein the splitting is performed using machine learning modeling and includes applying stratification based on region, such that each region has representative wells in the training dataset and the testing dataset.
  • 12. The non-transitory, computer-readable medium of claim 8, the operations further comprising using the importance of each of the attributes and features of the well data in designing wells and stimulating wells.
  • 13. The non-transitory, computer-readable medium of claim 8, the operations further comprising determining, using the importance of each of the attributes and features of the well data, horizontal and spatial well placement.
  • 14. The non-transitory, computer-readable medium of claim 8, wherein the spatial features are generated for a time period of 30, 60, or 90 days.
  • 15. A computer-implemented system, comprising: one or more processors; and a non-transitory computer-readable storage medium coupled to the one or more processors and storing programming instructions for execution by the one or more processors, the programming instructions instructing the one or more processors to perform operations comprising: collecting well data, including attributes and features of basin data, completion data, and production data, for each well in an unconventional field; generating spatial features for each well in different regions in the unconventional field; generating, using the well data and the spatial features, a combined well features dataset mapping the well data to the spatial features for each well in the different regions of the unconventional field; generating, by splitting the combined well features dataset, a training dataset and a testing dataset; training a machine learning model using cross-validation and tuning on the training dataset to predict estimated ultimate recovery (EUR); evaluating the performance of the machine learning (EUR) model with respect to different regression metrics; and determining, based on the spatial features and by evaluating the EUR model, an importance of each of the attributes and features of the well data in machine learning models, wherein the importance identifies an impact on production by each attribute and feature.
  • 16. The computer-implemented system of claim 15, the operations further comprising: cleaning and normalizing the well data to remove well data for wells with missing data and to normalize the well data per requirements for machine learning.
  • 17. The computer-implemented system of claim 15, wherein the spatial features include (x,y) coordinates indicating a horizontal position and a z-coordinate indicating depth.
  • 18. The computer-implemented system of claim 15, wherein the splitting is performed using machine learning modeling and includes applying stratification based on region, such that each region has representative wells in the training dataset and the testing dataset.
  • 19. The computer-implemented system of claim 15, the operations further comprising using the importance of each of the attributes and features of the well data in designing wells and stimulating wells.
  • 20. The computer-implemented system of claim 15, the operations further comprising determining, using the importance of each of the attributes and features of the well data, horizontal and spatial well placement.
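The claimed workflow — combining well attributes with spatial features, splitting with region-based stratification, training an EUR model with cross-validation and tuning, evaluating regression metrics, and extracting feature importance — can be illustrated with the following minimal sketch. This is not the claimed implementation itself: it uses scikit-learn as an assumed toolkit, a random-forest regressor as one possible model, and synthetic placeholder columns (for example, `lateral_length`, `proppant_per_ft`, `cum_90day_prod`) whose names are illustrative only and do not appear in the claims.

```python
# Illustrative sketch of the claimed EUR-prediction workflow.
# Assumptions: scikit-learn available; all feature names and the
# synthetic data below are placeholders, not the patented dataset.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(0)
n = 400

# Step 1: "collect" well data (basin, completion, production attributes)
# together with spatial features: (x, y) horizontal position and z depth.
wells = pd.DataFrame({
    "x": rng.uniform(0, 50, n),
    "y": rng.uniform(0, 50, n),
    "z_depth": rng.uniform(2000, 4000, n),
    "lateral_length": rng.uniform(1000, 3000, n),    # completion attribute
    "proppant_per_ft": rng.uniform(500, 2500, n),    # completion attribute
    "cum_90day_prod": rng.uniform(5, 50, n),         # 90-day production
})
wells["region"] = pd.cut(wells["x"], bins=4, labels=False)  # region label

# Synthetic EUR target with a known dependence on the features,
# so the sketch has something learnable to demonstrate the steps.
wells["eur"] = (0.02 * wells["lateral_length"]
                + 0.01 * wells["proppant_per_ft"]
                + 2.0 * wells["cum_90day_prod"]
                + rng.normal(0, 5, n))

features = ["x", "y", "z_depth", "lateral_length",
            "proppant_per_ft", "cum_90day_prod"]

# Step 2: split into training/testing sets, stratified by region so
# each region has representative wells in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    wells[features], wells["eur"],
    test_size=0.2, stratify=wells["region"], random_state=0)

# Step 3: train with cross-validation and hyperparameter tuning.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, None]},
    cv=5, scoring="r2")
search.fit(X_train, y_train)
model = search.best_estimator_

# Step 4: evaluate the EUR model on held-out wells with regression metrics.
pred = model.predict(X_test)
r2 = r2_score(y_test, pred)
mae = mean_absolute_error(y_test, pred)

# Step 5: feature importance, i.e. the impact each attribute and
# feature has on predicted EUR.
importance = dict(zip(features, model.feature_importances_))
print(f"R2={r2:.2f}  MAE={mae:.2f}")
print(sorted(importance, key=importance.get, reverse=True))
```

In this sketch the stratified split plays the role of claims 4, 11, and 18, and `feature_importances_` plays the role of the importance determination in the independent claims; a real deployment would substitute field data for the synthetic frame and could use any regressor exposing comparable importance scores.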