The disclosure generally relates to oil and gas operations and, more specifically, to improving oil and gas operations, such as the retrieval of hydrocarbons, via the analysis of textual data.
Accessing hydrocarbon reserves, such as gas or oil reserves, is an example of an oil and gas operation that involves creating a wellbore by drilling into the earth using a drill bit. There are several aspects of the entire process of accessing the reserves that involve operations and actions before and after the drilling, including planning and analyzing. Often the actions or activities involve the exchange of data that can differ in variety, volume, velocity, veracity, and value. A large amount of the data is represented and communicated in the form of numbers. To name a few examples, depth, pressure, weight, tension, torque, revolutions per minute (RPM), and flow rate are types of data that can be represented by numbers. Apart from the numerical data, there is also a large amount of information associated with operations that is represented and shared in the form of text.
Reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
As mentioned above, apart from the numerical data, there is also a large amount of information associated with operations that is represented and shared in the form of text. The textual data includes, for example, standard operating procedures, drilling operation guidelines, daily operational reports (DOR), health and safety executive (HSE) reports, end of well (EOW) reports, transcribed phone calls, chats, emails, etc. Analyzing and obtaining usable information from the unstructured text data is a challenge that requires manual review over days, weeks, or months. While some rule-based solutions, such as fuzzy-string matching and regex matching, exist for automated text-based analysis and information extraction, the rules might need to be conditioned according to the dataset to which they are applied, which only a professional programmer/software engineer is able to do. Additionally, if any calibrations or modifications are needed for existing solutions, such as natural language processing (NLP), fuzzy logic, regex modes, n-grams, etc., changes to the corresponding code of the program/software are typically required.
As such, the disclosure provides a system and method for automated analysis of textual data associated with oil and gas operations using Large Language Models (LLMs) and carefully designed prompts or prompt chains using domain knowledge to simulate human thinking and analysis. The oil and gas operation textual data can be structured data, unstructured data, or a combination of both. The LLMs can either be online models, such as GPT 3.5/4 from OpenAI/AzureOpenAI, etc., or offline models, such as Llama-2, Dolly, etc. The offline LLMs may be fine-tuned models and can be located on edge devices using smaller and more efficient quantized versions of the models. In contrast to existing textual analysis systems noted above, adjustments or calibrations to the datasets in the disclosed system are based on pre-configured domain-based prompts, referred to herein as curated domain prompts, which are all in natural language using domain expertise, keywords, and explanations that do not require a programmer/software developer. Instead, a domain expert can develop and save the curated domain prompts to direct the analysis of the oil and gas textual data. As such, prototyping of ideas and scenarios can be quickly enabled compared to primitive NLP approaches presently being used.
The disclosed solution provides several advantages for analyzing textual data associated with oil and gas operations, such as the reduction of time compared to manual analysis of the textual data, which could take several days, weeks, or even months to complete. The disclosed solution automates the analysis process to at least minimize the manual effort needed for reviewing the textual data. Advantageously, the disclosed solution also does not necessarily require any fine-tuning or training on any historical data with large foundational LLMs like GPT 3.5, 4, etc. As such, the step of training a model or building very specific regular expressions (regex), fuzzy logics, etc. in primitive rule-based approaches for a particular dataset is removed. Additionally, users do not have to generate their own prompts since curated domain prompts are used. Accordingly, the disclosure provides improvements in the computer technological area of textual analysis.
The disclosure provides a textual analysis system that includes an automated analyzer for reviewing oil and gas textual data and extracting event data therefrom using an LLM (or LLMs) and the curated domain prompt or prompts. A curated domain prompt is a statement having context within a particular sphere of knowledge or activity of an oil or gas operation. Accordingly, the curated domain prompts are directed to and are unique for a particular sphere of knowledge or activity of an oil or gas operation. The curated domain prompts are pre-established such that users do not have to generate prompts when reviewing the oil and gas textual data. Experts in various oil or gas operations can generate the curated domain prompts, which can then be placed in data storage, such as a data reservoir, database, or memory. Examples of domains in oil or gas operations include drilling, fracking, health and safety, cementing, completions, work-over operations, coil tubing unit operations, slick line, production, exploration, reservoirs, etc. A domain or a status indicated in a curated domain prompt can be used to identify the context of the textual data to use for analysis. For instance, “pressure reading not observed” can be part of several of the above-mentioned operations; however, context might be needed to identify which type of operation is being discussed. This context can be brought in using the curated domain or status prompt.
A curated domain prompt as used herein collectively refers to a single curated domain prompt or a chain of domain prompts. In addition to being directed to a particular domain, a curated domain prompt can include an oil or gas operation event and at least one parameter associated with the oil and gas operation event. The curated domain prompt can also include formatting information for generating an event summary. The curated domain prompt can also be configured in a particular format wherein one or more portions of the prompt can be edited by a user and other portions are fixed and uneditable. A fixed portion can be designated as such based on expected use in a subsequent analysis step of the oil and gas textual data. For example, a chain of curated domain prompts may be used and a first one of the curated domain prompts can have fixed fields of information that is needed for a subsequent one of the curated domain prompts.
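As one possible illustration of a curated domain prompt having fixed and user-editable portions, the following minimal Python sketch can be considered. The class name CuratedDomainPrompt, its field names, the template wording, and the "region" field are hypothetical and are chosen only for illustration; the disclosure does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass(frozen=True)
class CuratedDomainPrompt:
    """Hypothetical container for one curated domain prompt."""
    domain: str          # fixed: sphere of knowledge, e.g. drilling
    event: str           # fixed: oil and gas operation event
    parameter: str       # fixed: parameter associated with the event
    output_format: str   # fixed: formatting information for the event summary
    template: str        # natural-language text with $placeholders
    editable: dict = field(default_factory=dict)  # user-editable defaults

    def render(self, **overrides):
        """Fill the template; only fields listed in `editable` may be overridden."""
        unknown = set(overrides) - set(self.editable)
        if unknown:
            raise ValueError(f"fixed or unknown fields cannot be edited: {sorted(unknown)}")
        values = {
            "domain": self.domain,
            "event": self.event,
            "parameter": self.parameter,
            "output_format": self.output_format,
            **self.editable,
            **overrides,
        }
        return Template(self.template).substitute(values)


# Illustrative prompt; the wording is an assumption, not taken from the disclosure.
TIGHT_PULL_PROMPT = CuratedDomainPrompt(
    domain="drilling",
    event="tight pull",
    parameter="depth",
    output_format="a table with columns depth, pull force, event",
    template=(
        "You are reviewing $domain reports. Find every $event, report the "
        "$parameter where it occurred, and limit the review to $region. "
        "Return $output_format."
    ),
    editable={"region": "all regions"},
)
```

In this sketch, a user can change the editable region field, for example TIGHT_PULL_PROMPT.render(region="North Sea"), while an attempt to change a fixed field, such as the event, is rejected. Keeping the event and parameter fixed reflects the case where a subsequent prompt in a chain relies on those fields.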
Event data is textual data that corresponds to a parameter associated with an oil and gas operation event. As indicated above, the parameter and the oil and gas operation event can be identified in a curated domain prompt. The event data can be operational parameters associated with the parameter. For example, the oil and gas operation event can be a downhole or sub-surface problem in a wellbore, the parameter can be a depth value within the wellbore, and the event data can be the pull force at the depth values. After extracting the depth values where a downhole problem occurred via the LLM, the LLM can also generate an event summary corresponding to the curated domain prompt that correlates downhole problems in the wellbore to the depth values where the downhole problems occurred and the pull force at each of the depth values.
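The correlation described in the example above can be pictured with the following minimal Python sketch. The record layout, the field names, the units (feet and kilopounds-force), and the sample values are assumptions made only for illustration and are not taken from any actual report.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EventRecord:
    """One correlated row of a hypothetical event summary."""
    event: str               # oil and gas operation event, e.g. a downhole problem
    depth_ft: float          # parameter identified in the curated domain prompt
    pull_force_klbf: float   # event data extracted for that parameter


def build_event_summary(records):
    """Sort the records by depth and serialize them as a JSON event summary."""
    rows = sorted(records, key=lambda record: record.depth_ft)
    return json.dumps([asdict(row) for row in rows], indent=2)


# Hypothetical values standing in for data an LLM might extract.
EXAMPLE_RECORDS = [
    EventRecord("tight pull", 9120.0, 45.0),
    EventRecord("tight pull", 8450.0, 30.0),
]
```

Calling build_event_summary(EXAMPLE_RECORDS) yields a list ordered by depth in which each entry ties the downhole problem to a depth value and the pull force at that depth.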
Oil and gas textual data is textual data associated with one or more oil or gas operations. The oil and gas textual data includes textual data generally associated with one or more of pre-operation, active operation, and post operation of a wellbore. The pre-operation data includes, for example, operating procedures, operation objectives, operation physics, operational guidelines, sequence guidelines having safety and personnel information, and associated published text, such as articles directed to oil and gas operations. The published text can be proprietary or public and can be directed to oil and gas operations in general or directed to a specific topic or geographical location. The published text can be weighted differently. For example, either the public or proprietary published text can be given more weight during analysis. The operating procedures can include standard operating procedures associated with a particular company and the operational guidelines can include a drilling program that provides details of specific drilling operations along with the safety protocols and contingency activities.
The active operation data includes textual data generated during an active well operation, such as drilling and fracking. The active operation data includes, for example, daily operating reports. The daily operating reports contain a summary of the daily activities for a well operation that provide valuable information about the issues observed and the remedial measures taken. Several daily reports may be studied in order to create a lessons learned report. Examples of daily operating reports (DOR) include daily drilling reports (DDR), daily mud reports, daily mud logging reports, daily geologist reports, and workover/CTU/production related operational reports. The active operation data can also include HSE reports that relate to safety issues, accidents, and near-misses.
The HSE reports contain valuable information for future wells and can also be part of the post operation data. The post operation data can also include end of well reports and lessons learned reports. These are created at the end of a well's drilling/workover cycle and provide a summary of the operations performed, issues observed, and lessons learned during the lifecycle of a well.
Each of the pre-operation, active operation, and post operation data can include communication text. A large amount of conversation can occur during the three operational phases, especially during the active operation. The conversations can be, for example, via texts, emails, or telephone calls. Each of the conversations can include essential operational details. The audio data from the telephone calls can be transcribed in order to convert the data into text format for analysis.
The information extracted from the above example data sources in text format can be automatically converted into meaningful information by the LLM incorporated with a curated domain prompt. The meaningful information can be saved in a structured format that provides an event summary, which can be easily retrieved or recalled (feedback loop) for offset analysis, historical investigation, real time tracking of standard operating procedures, etc.
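One way such a structured event summary could be saved and later recalled, for example for offset analysis, is sketched below using an in-memory SQLite database from the Python standard library. The table name, the column names, and the sample rows used with it are hypothetical; any structured store could serve the same purpose.

```python
import sqlite3


def open_summary_store(path=":memory:"):
    """Open (or create) a hypothetical store for event summary rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS event_summary ("
        "well TEXT, event TEXT, depth_ft REAL, detail TEXT)"
    )
    return conn


def save_rows(conn, rows):
    """Persist (well, event, depth_ft, detail) tuples produced by the analysis."""
    conn.executemany("INSERT INTO event_summary VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def recall(conn, event, min_depth_ft, max_depth_ft):
    """Retrieve earlier occurrences of an event within a depth interval."""
    cursor = conn.execute(
        "SELECT well, depth_ft, detail FROM event_summary "
        "WHERE event = ? AND depth_ft BETWEEN ? AND ? ORDER BY depth_ft",
        (event, min_depth_ft, max_depth_ft),
    )
    return cursor.fetchall()
```

For instance, after saving rows from several offset wells, recall(conn, "tight pull", 8000, 9000) returns only the tight pull entries recorded between 8,000 and 9,000 feet, which is the kind of retrieval a feedback loop or historical investigation could use.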
In one example the textual analysis system includes a data reservoir that stores the textual data from the multiple data sources. The data reservoir can also store the curated domain prompts. The data reservoir can be implemented at a single storage location or can be distributed across multiple data storage locations. The data reservoir can be proprietary wherein access can be restricted to a single company, approved companies, or approved individuals. The data reservoir can include public and proprietary textual data.
The example textual analysis system also includes an automated analyzer for reviewing oil and gas textual data. The automated analyzer includes an interface configured to provide curated domain prompts to an LLM(s) for analyzing oil and gas textual data. The curated domain prompts (or chain of prompts) can be stored in a database or memory associated with the automated analyzer. The automated analyzer will automatically use a curated domain prompt (or a chain of prompts) from the stored curated domain prompts according to the type of oil and gas data being processed. The interface can also receive a custom request from a user that allows a user to input a specific type of information that is not included in a curated domain prompt. For example, the user can provide a custom request that limits an output of the textual analysis system to a particular region, company, well type, time period, etc. Accordingly, a user can customize an output via one or more custom requests. The custom request can be used with some optional input parameters which go to the prompt in an abstracted way from the user-editable portions of curated domain prompts. A user, however, does not need to provide any input. Advantageously, the automated analyzer can automatically use the appropriate curated domain prompt or chain of prompts according to the type of oil and gas textual data being processed. A user can provide the custom request, for example, when looking for a specific piece of information included in a curated domain prompt. The automated analyzer also includes one or more processors to perform operations that include providing event data from the oil and gas textual data using the LLM and the curated domain prompt.
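The automatic selection of a curated domain prompt and the optional custom request can be sketched as follows. The document type keys, the prompt wording, and the function name build_prompt are hypothetical; the allowed filter names mirror the examples given above (region, company, well type, and time period).

```python
from typing import Optional

# Hypothetical mapping from a type of textual data to a stored curated domain prompt.
CURATED_PROMPTS = {
    "daily_drilling_report": "List each downhole problem with its depth.",
    "hse_report": "List each safety incident or near-miss with its date.",
}

# Only these optional inputs may be supplied through a custom request.
ALLOWED_FILTERS = ("region", "company", "well_type", "time_period")


def build_prompt(document_type: str, custom_request: Optional[dict] = None) -> str:
    """Pick the stored prompt for the data type and append any custom limits."""
    prompt = CURATED_PROMPTS[document_type]
    filters = custom_request or {}
    unknown = [key for key in filters if key not in ALLOWED_FILTERS]
    if unknown:
        raise ValueError(f"unsupported custom request fields: {unknown}")
    clauses = [
        f"{key.replace('_', ' ')}: {filters[key]}"
        for key in ALLOWED_FILTERS
        if key in filters
    ]
    if clauses:
        prompt += " Limit the output to " + "; ".join(clauses) + "."
    return prompt
```

With no custom request, build_prompt("hse_report") returns the stored prompt unchanged, which corresponds to the case where a user provides no input. A custom request such as {"region": "Permian Basin"} only appends a limiting clause and does not alter the curated text itself.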
In the drawings and descriptions that follow, like parts are typically marked throughout the specification and drawings with the same reference numerals, respectively. The drawn figures are not necessarily to scale. Certain features of the disclosure may be shown exaggerated in scale or in somewhat schematic form and some details of certain elements may not be shown in the interest of clarity and conciseness. The present disclosure may be implemented in embodiments of different forms.
The drilling system 100 includes a bottom hole assembly (BHA) 110 coupled to a drill string 120. The BHA 110 includes a drill bit 112, which can be moved axially within the wellbore 101. The system 100 is configured to drive the BHA 110 positioned or otherwise arranged at the bottom of the drill string 120 that is extended into the earth 102 from a derrick 130 arranged at the surface 104. The system 100 also includes a top drive 134 that is used to rotate the drill string 120 at the surface 104, which then rotates the drill bit 112 in the wellbore 101. Operation of the top drive 134 is controlled by a top drive controller (not shown). The system 100 also includes a kelly 136 and can include a traveling block (not shown) that is used to lower and raise the kelly 136 and drill string 120.
Fluid or “drilling mud” from a mud tank 140 may be pumped downhole using a mud pump 142 powered by an adjacent power source, such as a prime mover or motor 144. The drilling mud may be pumped from mud tank 140, through a stand pipe 146, which feeds the drilling mud into drill string 120 and conveys the same to the drill bit 112. The drilling mud exits one or more nozzles arranged in the drill bit 112 and in the process cools the drill bit 112. After exiting the drill bit 112, the mud circulates back to the surface 104 via the annulus defined between the wellbore 101 and the drill string 120, and in the process, returns drill cuttings and debris to the surface. The cuttings and mud mixture are passed through a flow line 148 and are processed such that a cleaned mud is returned downhole through the stand pipe 146 once again.
The system 100 also includes a well site controller 160 and a computing system 164, which can be communicatively coupled to the well site controller 160. The well site controller 160 includes one or more processors and one or more memories and is configured to direct operation of the system 100 using the processors and memories. The well site controller 160 can direct the operation based on the analysis of oil and gas textual data as disclosed herein. The oil and gas textual data used by the well site controller 160 can be pre-operational, active operational, or a combination thereof. The oil and gas textual data can also include post operational data from one or more previous oil and gas operations. The computing system 164 can be a laptop, smartphone, personal digital assistant (PDA), server, desktop computer, cloud computing system, other computing system, or a combination thereof, that is operable to perform the processes and methods described herein for operating the system 100. Well site operators, engineers, and other personnel can send and receive data, instructions, measurements, and other information by various conventional means with the computing system 164 or the well site controller 160. The well site controller 160 or the computing system 164 can be utilized to communicate with downhole tools of the BHA 110, such as sending and receiving telemetry, data, drilling sensor data, instructions, and other information, including but not limited to collected or measured parameters, location within the wellbore 101, and cuttings information. A communication channel may be established by using, for example, electrical signals or mud pulse telemetry for most of the length from the drill bit 112 to the controller 160.
As stated above, the drill bit 112 penetrates the earth 102 and thereby creates the wellbore 101. The BHA 110 provides directional control of the drill bit 112 as it advances into the earth 102. A tool string 114 of the BHA 110 can include the downhole tools of the BHA 110. Accordingly, the tool string 114 can be semi-permanently mounted with various measurement tools (not shown) such as, but not limited to, measurement-while-drilling (MWD) and logging-while-drilling (LWD) tools, that may be configured to take downhole measurements of drilling conditions and the geological formation of the earth 102. The measurement tools can include sensors, such as magnetometers, accelerometers, gyroscopes, etc. As noted herein, the sensors can be used to detect the presence of harmful vibrations, such as high-frequency torsional oscillations (HFTO).
In addition to the sensor measurements that are obtained during drilling, textual data is also generated during the drilling process, which is referred to herein as active operational data. The active operational data can be generated by users located at the wellbore 101, such as engineers and operators, and stored in a digital format on the well site controller 160, computing system 164, or both. The active operational data can relate to the various components of the drilling, such as, operation of the BHA 110, composition of the drilling mud, circulation of the drilling mud, complications with the drill string 120, etc. Pre and post operational data is also associated with the drilling. The well site controller 160, the computing system 164, or a combination of both can be part of a textual analysis system. For example, the well site controller 160 or the computing system 164 can be part of a data reservoir that stores oil and gas textual data. The data reservoir can also store curated domain prompts. The data reservoir, or at least a portion thereof, can be located remotely from the wellbore 101, such as in a data center, and receive the active operational data via a communications network. One or more of the well site controller 160 or the computing system 164 can also include an automated analyzer such as disclosed herein and/or an offline LLM, such as a fine-tuned model. Instead of an offline LLM, the well site controller 160 or the computing system 164 can be communicatively coupled to an online LLM. Drilling of the wellbore 101 is used as an example of a well operation that occurs downhole.
In addition to the active operational data, the data reservoir can also store the pre-operational and post-operational data. Continuing to use drilling as an example of an oil and gas operation, the pre-operational data is generated before the drilling of the wellbore 101 begins and the post-operational data is generated after the drilling of the wellbore 101. Other users besides those located at the wellbore 101 can be involved with the generation of the pre-operational and post-operational data.
The data reservoir 210 is data storage that is configured to store data, such as oil and gas textual data as represented by block 213. Additionally, the data reservoir 210 can store the curated domain prompts as represented by block 217. The oil and gas textual data 213 can be from multiple data sources and can cover various aspects of a well operation, such as pre, active, and post operation textual data as represented in
The automated analyzer 220 receives a curated domain prompt from the curated domain prompts 217 and corresponding oil and gas textual data from the oil and gas textual data 213, and cooperates with an LLM (or LLMs) to provide event data extracted from the oil and gas textual data. The textual analysis system 200 can send one or more curated domain prompts to both online and offline LLMs working together. For example, a chain of curated domain prompts can be used and the different curated domain prompts can be sent to one or more different LLMs, such as online, offline, and fine-tuned LLMs.
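The routing of a chain of curated domain prompts to different LLMs can be sketched as follows. The two functions offline_stub and online_stub are deterministic stand-ins written only for this illustration and are not calls to any real model or service; the "TEXT:" delimiter and the function run_chain are likewise assumptions.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # any callable that maps a prompt string to a reply


def offline_stub(prompt: str) -> str:
    """Stand-in for an offline LLM: keeps only lines mentioning 'tight pull'."""
    _, _, text = prompt.partition("TEXT:\n")
    kept = [line for line in text.splitlines() if "tight pull" in line.lower()]
    return "\n".join(kept)


def online_stub(prompt: str) -> str:
    """Stand-in for an online LLM: counts the non-empty lines it was given."""
    _, _, text = prompt.partition("TEXT:\n")
    count = len([line for line in text.splitlines() if line.strip()])
    return f"{count} tight pull event(s) found"


def run_chain(steps: List[Tuple[str, LLM]], text: str) -> str:
    """Send each prompt to its assigned LLM and feed the reply to the next step."""
    for instruction, llm in steps:
        text = llm(f"{instruction}\nTEXT:\n{text}")
    return text


# Hypothetical two-step chain: a first prompt filters, a second prompt summarizes.
CHAIN = [
    ("Keep only sentences describing a tight pull.", offline_stub),
    ("Summarize how many events remain.", online_stub),
]
```

In this sketch each step of the chain names the LLM that should handle it, so an offline model can perform a first pass on the raw text and an online model can operate on the reduced output. Replacing a stub with a real client would only require a callable with the same signature.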
The event summary correlates the event data to an oil and gas operation event and a parameter identified in the curated domain prompt. The automated analyzer 220 includes one or more interfaces represented by interface 222, one or more memories represented by memory 224, and one or more processors represented by processor 228. The interface 222 is a communication interface that receives the curated domain prompt and the oil and gas textual data and provides the curated domain prompt and the oil and gas textual data to the LLM. As such, the interface 222 communicates with the data reservoir 210 to obtain the curated domain prompt and the oil and gas textual data for analysis and with an LLM for performing the analysis. The interface 222 includes the necessary circuitry, software, or combination thereof to send and receive data. The interface 222 can be a conventional interface.
The memory 224 can store data and operating instructions that direct the operation of the processor 228. The memory 224 can include the necessary circuitry for storing data and the processor 228 can include the necessary computing circuitry for processing data. The operating instructions can be or can represent one or more algorithms directed to analyzing oil and gas textual data using one or more LLMs and one or more curated domain prompts. As illustrated in
The scheduler 410 controls processing the oil and gas textual data for analysis by the LLM 320. The scheduler 410 can, for example, extract, transform and load (ETL) the oil and gas textual data for processing jobs by the LLM 320. The scheduler 410 can automatically ETL the oil and gas textual data into the vector store 420 in real time. The scheduler 410 can ETL according to a predetermined schedule. In addition to the oil and gas textual data, the scheduler 410 can also receive the feedback data via the generated reports 430 and provide the feedback data to the LLM 320.
The vector store 420 encodes the oil and gas textual data, divides it into chunks of smaller text, and saves it in a database associated with the LLM 320. Whenever the LLM 320 is prompted by a curated domain prompt, the vector store 420 matches the prompt to the most relevant chunk or chunks of text from the oil and gas textual data that has been encoded and stored, i.e., query-based data retrieval. For example, the curated domain prompt can be directed to the drilling domain and the vector store 420 may select a chunk of oil and gas textual data that includes DDRs. In another example, the curated domain prompt can be directed to health or safety and the vector store 420 can select a chunk of oil and gas textual data that includes HSE reports. Oil and gas textual data from multiple different sources can be selected. The curated domain prompt and the selected chunk of oil and gas textual data are sent to the LLM 320 for augmented generation of the event data 330.
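The query-based data retrieval described above can be illustrated with the following toy Python sketch. It substitutes simple word counts and cosine similarity for the learned embeddings that a production vector store would typically use, and the class name ToyVectorStore, the chunk size, and the sample texts used with it are assumptions made only for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Divide text into chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def encode(text):
    """Encode text as a bag-of-words count vector (stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store of encoded text chunks."""

    def __init__(self):
        self._chunks = []  # list of (source, chunk text, vector)

    def add(self, source, text, max_words=40):
        """Chunk and encode a document, remembering which source it came from."""
        for chunk in chunk_text(text, max_words):
            self._chunks.append((source, chunk, encode(chunk)))

    def query(self, prompt, top_k=1):
        """Return the `top_k` chunks most similar to the prompt."""
        vector = encode(prompt)
        ranked = sorted(self._chunks, key=lambda c: cosine(vector, c[2]), reverse=True)
        return [(source, chunk) for source, chunk, _ in ranked[:top_k]]
```

When a drilling-oriented prompt is queried against a store holding both a daily drilling report and an HSE report, the chunk that shares the most terms with the prompt is ranked first, and that chunk together with the prompt is what would be passed to the LLM.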
As noted above, the curated domain prompts used for the query-based data retrieval are predefined prompts that are single prompts or prompt chains and can include reflexion. The curated domain prompts can be used to create agents for the LLM 320. The agents can be generated for the particular domains based on the curated domain prompts. For example, LLM agents can be generated for particular domains of drilling, fracking, and health and safety based on curated domain prompts for the domains of drilling, fracking and health and safety, respectively. A chain of curated domain prompts can be used to generate the agents.
The operations represented in
In step 510, a curated domain prompt is obtained that identifies an oil and gas operation event and a parameter associated with the oil and gas operation event. The curated domain prompt can be received by a computer or computing system, such as an automated analyzer, from a data reservoir, such as data reservoir 210 or 310, via a communications interface.
Oil and gas textual data is obtained in step 520. The oil and gas textual data can be retrieved from a data reservoir, which can be the same data reservoir storing at least some of the curated domain prompts. An automated analyzer can obtain the oil and gas textual data from the data reservoir via a communications interface, such as interface 222. The oil and gas textual data, or at least a portion of the oil and gas textual data, can be retrieved from other locations besides the data reservoir. For example, the oil and gas textual data or a portion thereof can be retrieved directly from the source of the oil and gas textual data. The oil and gas textual data can also be retrieved from a vector store. An appropriate chunk of oil and gas textual data as identified by the curated domain prompt can be retrieved from a vector store, such as vector store 420, via query-based data retrieval.
In step 530, the curated domain prompt and the oil and gas textual data are provided to an LLM. The LLM can be an online or offline LLM. An interface, such as interface 222, can be used to communicate with the LLM. The curated domain prompt and the oil and gas textual data can be provided to the LLM via query-based data retrieval as noted in
The oil and gas textual data is filtered in step 540 based on the oil and gas operation event. The LLM performs the filtering using the oil and gas operation event identified in the curated domain prompt.
In step 550, the event data is automatically extracted from the filtered oil and gas textual data based on the oil and gas operation event and the parameter. The LLM can automatically extract the event data using augmented generation as noted in
An event summary is automatically generated in step 560 by the LLM that correlates the oil and gas operation event and the event data. Various examples of an event summary that can be generated are provided in the event summary 330 of
In step 570, an oil or gas operation is performed using the event summary. The event summary can be provided to an automated analyzer that provides the event summary to an oil and gas system for enacting or altering an oil or gas operation using the event summary. For example, the event summary can indicate a downhole problem, such as a tight pull, at certain depths of a wellbore and this information can be used when drilling a subsequent wellbore. The event summary can also be provided to an operator, such as via a display screen, for altering a fracking operation, which can be ongoing. The method 500 continues to step 580 and ends.
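The control flow of steps 510 through 560 can be sketched as follows, with the LLM replaced by an injectable callable. The function fake_llm is a deterministic, regex-based stand-in used only so that the sketch runs without a model; it is not how an LLM performs the filtering or extraction. The prompt keys and the sample report used with it are hypothetical, and in this sketch the summary rows are assembled in Python rather than generated by the LLM.

```python
import re
from typing import Callable, Dict, List


def fake_llm(task: str, prompt: Dict[str, str], text: str) -> str:
    """Deterministic stand-in for an LLM call, for illustration only."""
    if task == "filter":
        # Keep only lines that mention the event named in the prompt.
        lines = text.splitlines()
        return "\n".join(line for line in lines if prompt["event"] in line.lower())
    if task == "extract":
        # Pull out integers that are immediately followed by " ft".
        return "\n".join(re.findall(r"\d+(?= ft)", text))
    raise ValueError(f"unknown task: {task}")


def run_method(prompt: Dict[str, str], text: str,
               llm: Callable[[str, Dict[str, str], str], str] = fake_llm) -> List[dict]:
    """Steps 530 to 560: provide prompt and text, filter, extract, summarize."""
    filtered = llm("filter", prompt, text)               # step 540
    values = llm("extract", prompt, filtered).split()    # step 550
    return [                                             # step 560
        {"event": prompt["event"], prompt["parameter"]: int(value)}
        for value in values
    ]


# Step 510: a curated domain prompt identifying the event and the parameter.
PROMPT = {"event": "tight pull", "parameter": "depth_ft"}
# Step 520: oil and gas textual data (hypothetical report lines).
REPORT = "Tight pull at 8450 ft.\nCirculated at 9000 ft.\nTight pull at 9120 ft."
```

Calling run_method(PROMPT, REPORT) produces one summary row per tight pull with its depth, while the circulation line is dropped at the filtering step. The resulting rows are what step 570 would hand to an operator or an oil and gas system.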
In
A portion of the above-described apparatus, systems or methods may be embodied in or performed by various analog or digital data processors, wherein the processors are programmed or store executable programs of sequences of software instructions to perform one or more of the steps of the methods. A processor may be, for example, a programmable logic device such as a programmable array logic (PAL), a generic array logic (GAL), a field programmable gate array (FPGA), or another type of computer processing device (CPD). The software instructions of such programs may represent algorithms and be encoded in machine-executable form on non-transitory digital data storage media, e.g., magnetic or optical disks, random-access memory (RAM), magnetic hard disks, flash memories, or read-only memory (ROM), to enable various types of digital data processors or computers to perform one, multiple or all of the steps of one or more of the above-described methods, or functions, systems or apparatuses described herein.
Portions of disclosed examples or aspects may relate to computer storage products with a non-transitory computer-readable medium that has program code thereon for performing various computer-implemented operations that embody a part of an apparatus or device, or carry out the steps of a method set forth herein. Non-transitory as used herein refers to all computer-readable media except for transitory, propagating signals. Examples of non-transitory computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and execute program code, such as ROM and RAM devices. Configured or configured to means, for example, designed, constructed, or programmed, with the necessary logic or features for performing a task or tasks. Examples of program code include both machine code, such as produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.
In interpreting the disclosure, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced.
Those skilled in the art to which this application relates will appreciate that other and further additions, deletions, substitutions, and modifications may be made to the described aspects. It is also to be understood that the terminology used herein is for the purpose of describing particular aspects only, and is not intended to be limiting, since the scope of the present disclosure will be limited only by the claims. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure, a limited number of the exemplary methods and materials are described herein.
Unless otherwise specified, use of the terms “connect,” “engage,” “couple,” “attach,” or any other like term describing an interaction between elements is not meant to limit the interaction to direct interaction between the elements and may also include indirect interaction between the elements described. Unless otherwise specified, use of the terms “up,” “upper,” “upward,” “uphole,” “upstream,” or other like terms shall be construed as generally away from the bottom, terminal end of a well, regardless of the wellbore orientation; likewise, use of the terms “down,” “lower,” “downward,” “downhole,” “downstream,” or other like terms shall be construed as generally toward the bottom, terminal end of a well, regardless of the wellbore orientation. Use of any one or more of the foregoing terms shall not be construed as denoting positions along a perfectly vertical axis. In some instances, a part near the end of the well can be horizontal or even slightly directed upwards. Unless otherwise specified, use of the term “subterranean formation” shall be construed as encompassing both areas below exposed earth and areas below earth covered by water such as ocean or fresh water.
The disclosure provides the following aspects:
Each of the aspects A, B, and C can have one or more of the following additional elements in combination: Element 1: wherein the extracting includes filtering the oil and gas textual data based on the oil and gas operation event and automatically extracting the event data from the filtered oil and gas textual data. Element 2: wherein the event summary is in a structured format. Element 3: wherein the oil and gas textual data includes unstructured text. Element 4: wherein the LLM is an online LLM. Element 5: wherein the LLM is an offline LLM. Element 6: wherein the offline LLM is a fine-tuned model. Element 7: wherein the directing includes sending the curated domain prompt and at least a portion of the oil and gas textual data to the LLM. Element 8: further comprising filtering the oil and gas textual data based on the oil and gas operation event, wherein the automatically extracting the event data is from the filtered oil and gas textual data. Element 9: wherein the event summary includes the parameters, the event data associated with the parameters, and the oil and gas operation event for each of the parameters. Element 10: wherein the oil and gas textual data includes pre-operation, active operation, and post operation textual data for the oil and gas operation event. Element 11: wherein the LLM is an offline LLM. Element 12: wherein the oil and gas textual data includes unstructured text. Element 13: wherein the automatically extracting includes augmented generation by the LLM. Element 14: wherein the automatically generating includes generating the event summary into a structured format identified in the domain prompt. Element 15: further comprising obtaining the oil and gas textual data and the curated domain prompt from a data reservoir. Element 16: wherein the instructing includes sending the curated domain prompt and the oil and gas textual data to the LLM.
Element 17: wherein the operations further include sending the event summary to a well operation system for enacting or altering a well operation.