SYSTEM AND METHOD USING LARGE LANGUAGE MODELS FOR THE ANALYSIS OF TEXTUAL DATA ASSOCIATED WITH OIL AND GAS OPERATIONS

Information

  • Patent Application
  • Publication Number
    20250188823
  • Date Filed
    December 07, 2023
  • Date Published
    June 12, 2025
Abstract
The disclosure provides automated analysis of oil and gas textual data that uses one or more LLMs and designed prompts or prompt chains. The prompts, referred to as curated domain prompts, use oil and gas domain knowledge to simulate human thinking and analysis. The curated domain prompts are pre-configured such that users do not need to create prompts for analyzing the textual data. In one example, a method of automatically analyzing oil and gas textual data includes: (1) obtaining a curated domain prompt that identifies an oil and gas operation event and a parameter associated with the oil and gas operation event, (2) automatically extracting, using a large language model (LLM), event data from oil and gas textual data based on the oil and gas operation event and the parameter, and (3) automatically generating an event summary that correlates the oil and gas operation event and the event data.
Description
TECHNICAL FIELD

The disclosure generally relates to oil and gas operations and, more specifically, to improving oil and gas operations, such as the retrieval of hydrocarbons, via the analysis of textual data.


BACKGROUND

Accessing hydrocarbon reserves, such as gas or oil reserves, is an example of an oil and gas operation that involves creating a wellbore by drilling into the earth using a drill bit. There are several aspects of the entire process of accessing the reserves that involve operations and actions before and after the drilling, including planning and analyzing. Often the actions or activities involve the exchange of data that can differ in variety, volume, velocity, veracity, and value. A large amount of the data is represented and communicated in the form of numbers. To name a few examples, depth, pressure, weight, tension, torque, revolutions per minute (RPM), and flow rate are the type of data that can be represented by numbers. Apart from the numerical data, there is also a large amount of information associated with operations that is represented and shared in the form of text.





BRIEF DESCRIPTION

Reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:



FIG. 1 illustrates an example of an oil and gas system that performs an operation that benefits from the analysis of oil and gas textual data carried out according to the principles of the disclosure;



FIG. 2 is a block diagram of an example of a textual analysis system constructed according to the principles of the disclosure;



FIG. 3 is a diagram that demonstrates an example of the interaction of a data reservoir, LLM (or LLMs), and event summary that is extracted from the oil and gas textual data of the data reservoir via the LLM;



FIG. 4 is a more detailed diagram demonstrating the interaction of the data reservoir with the LLM and the generation of the event summary;



FIG. 5 illustrates a flow diagram of an example method of automatically analyzing oil and gas textual data carried out according to the principles of the disclosure;



FIG. 6 illustrates an example workflow of filtering of oil and gas textual data using DDR as the oil and gas textual data, which is carried out according to the principles of the disclosure;



FIG. 7 illustrates an example of a workflow of zero-shot classification by an LLM carried out according to the principles of the disclosure;



FIG. 8 illustrates an example of a workflow using a filtered DDR in the analysis process according to the principles of the disclosure;



FIG. 9 illustrates an example workflow using both a filtered DDR and an event summary in the analysis process according to the principles of the disclosure;



FIG. 10 illustrates an example workflow of automated generation of an event summary carried out according to the principles of the disclosure; and



FIG. 11 illustrates an example of a curated domain prompt that can be used to obtain the event summary of FIG. 9.





DETAILED DESCRIPTION

As mentioned above, apart from the numerical data, there is also a large amount of information associated with operations that is represented and shared in the form of text. The textual data includes, for example, standard operating procedures, drilling operation guidelines, daily operational reports (DOR), health and safety executive (HSE) reports, end of well (EOW) reports, transcribed phone calls, chats, emails, etc. Analyzing and obtaining usable information from the unstructured text data is a challenge that requires manual review over days, weeks, or months. While some rule-based solutions, such as fuzzy-string matching and regex matching, exist for automated text-based analysis and information extraction, the rules might need to be conditioned according to the dataset to which they are applied, which only a professional programmer/software engineer is able to do. Additionally, if any calibrations or modifications are needed for existing solutions, such as natural language processing (NLP), fuzzy logic, regex modes, n-grams, etc., changes to the corresponding code of the program/software are typically required.


As such, the disclosure provides a system and method for automated analysis of textual data associated with oil and gas operations using Large Language Models (LLMs) and carefully designed prompts or prompt chains using domain knowledge to simulate human thinking and analysis. The oil and gas operation textual data can be structured data, unstructured data, or a combination of both. The LLMs can either be online models, such as GPT 3.5/4 from OpenAI/AzureOpenAI, etc., or offline models, such as Llama-2, Dolly, etc. The offline LLMs may be fine-tuned models and can be located on edge devices using smaller and more efficient quantized versions of the models. In contrast to the existing textual analysis systems noted above, adjustments or calibrations to the datasets in the disclosed system are based on pre-configured domain-based prompts, referred to herein as curated domain prompts, which are written entirely in natural language using domain expertise, keywords, and explanations, and which do not require a programmer/software developer. Instead, a domain expert can develop and save the curated domain prompts to direct the analysis of the oil and gas textual data. As such, ideas and scenarios can be prototyped more quickly than with the primitive NLP approaches presently being used.


The disclosed solution provides several advantages for analyzing textual data associated with oil and gas operations, such as the reduction of time compared to manual analysis of the textual data, which could take several days, weeks, or even months to complete. The disclosed solution automates the analysis process to at least minimize the manual effort needed for reviewing the textual data. Advantageously, the disclosed solution also does not necessarily require any fine-tuning or training on any historical data with large foundational LLMs like GPT 3.5, 4, etc. As such, the step of training a model or building very specific regular expressions (regex), fuzzy logics, etc. in primitive rule-based approaches for a particular dataset is removed. Additionally, users do not have to generate their own prompts since curated domain prompts are used. Accordingly, the disclosure provides improvements in the computer technological area of textual analysis.


The disclosure provides a textual analysis system that includes an automated analyzer for reviewing oil and gas textual data and extracting event data therefrom using an LLM (or LLMs) and the curated domain prompt or prompts. A curated domain prompt is a statement having context within a particular sphere of knowledge or activity of an oil or gas operation. Accordingly, the curated domain prompts are directed to and are unique for a particular sphere of knowledge or activity of an oil or gas operation. The curated domain prompts are pre-established such that users do not have to generate prompts when reviewing the oil and gas textual data. Experts in various oil or gas operations can generate the curated domain prompts, which can then be placed in data storage, such as a data reservoir, database, or memory. Examples of domains in oil or gas operations include drilling, fracking, health and safety, cementing, completions, work-over operations, coil tubing unit operations, slick line, production, exploration, reservoirs, etc. A domain or a status indicated in a curated domain prompt can be used to identify the context of the textual data to use for analysis. For instance, “pressure reading not observed” can be part of several of the above-mentioned operations; however, a context might be needed to identify which type of operation is being discussed. This context can be brought in using the curated domain or status prompt.


A curated domain prompt as used herein collectively refers to a single curated domain prompt or a chain of domain prompts. In addition to being directed to a particular domain, a curated domain prompt can include an oil or gas operation event and at least one parameter associated with the oil and gas operation event. The curated domain prompt can also include formatting information for generating an event summary. The curated domain prompt can also be configured in a particular format wherein one or more portions of the prompt can be edited by a user and other portions are fixed and uneditable. A fixed portion can be designated as such based on expected use in a subsequent analysis step of the oil and gas textual data. For example, a chain of curated domain prompts may be used and a first one of the curated domain prompts can have fixed fields of information that are needed for a subsequent one of the curated domain prompts.
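As a non-limiting illustration, the split between fixed and user-editable portions of a curated domain prompt can be sketched in Python as shown below. The class name, field names, and prompt wording are hypothetical and are chosen only for this sketch; the disclosure does not prescribe a particular data structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CuratedDomainPrompt:
    """Hypothetical container for a curated domain prompt (illustration only)."""
    domain: str
    template: str                                  # prompt text with {placeholders}
    fixed: dict = field(default_factory=dict)      # portions a user cannot edit
    editable: dict = field(default_factory=dict)   # defaults a user may override

    def render(self, **overrides: str) -> str:
        # Reject any attempt to change a fixed portion of the prompt.
        locked = set(overrides) & set(self.fixed)
        if locked:
            raise ValueError(f"fixed fields cannot be edited: {sorted(locked)}")
        # Reject fields that the prompt does not define at all.
        unknown = set(overrides) - set(self.editable)
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        # Fixed values are applied last so they always win.
        values = {**self.editable, **overrides, **self.fixed}
        return self.template.format(**values)


# Example prompt for the drilling domain; the wording is an assumption.
prompt = CuratedDomainPrompt(
    domain="drilling",
    template=(
        "From the daily drilling reports, find every {event}. "
        "For each one, report the {parameter} and the pull force. "
        "Limit the results to {region}. Return the results as {output_format}."
    ),
    fixed={"event": "downhole problem", "parameter": "depth",
           "output_format": "a table"},
    editable={"region": "all regions"},
)

default_text = prompt.render()                    # no user input is needed
custom_text = prompt.render(region="a single field")
```

In this sketch, the event, parameter, and output format are fixed because a later prompt in a chain may depend on them, while the region can be narrowed by a user.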


Event data is textual data that corresponds to a parameter associated with an oil and gas operation event. As indicated above, the parameter and the oil and gas operation event can be identified in a curated domain prompt. The event data can be operational parameters associated with the parameter. For example, the oil and gas operation event can be a downhole or sub-surface problem in a wellbore, the parameter can be a depth value within the wellbore, and the event data can be the pull force at those depths. After the LLM extracts the depth values where a downhole problem occurred, the LLM can also generate an event summary corresponding to the curated domain prompt that correlates downhole problems in the wellbore to the depth values where the downhole problems occurred and the pull force at each of those depth values. FIGS. 6 and 10 illustrate examples of event summaries 630 and 1030, respectively.
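As a non-limiting illustration, the correlation of an event, a parameter, and event data can be sketched in Python as shown below. The sketch assumes that the LLM has been asked to reply in JSON; the reply text, the key names, the units, and all numeric values are made up for illustration and are not taken from any report.

```python
import json
from collections import defaultdict

# Hypothetical LLM reply: one JSON object per extracted occurrence.
# All values below are invented for this sketch.
llm_reply = """[
  {"event": "tight hole", "depth_ft": 8450, "pull_force_klbf": 35},
  {"event": "tight hole", "depth_ft": 8720, "pull_force_klbf": 42},
  {"event": "stuck pipe", "depth_ft": 9105, "pull_force_klbf": 60}
]"""


def build_event_summary(reply: str) -> dict:
    """Group extracted event data by event, pairing each depth with its pull force."""
    summary = defaultdict(list)
    for record in json.loads(reply):
        summary[record["event"]].append(
            (record["depth_ft"], record["pull_force_klbf"])
        )
    # Sort each event's occurrences by depth so the summary reads top-down.
    return {event: sorted(rows) for event, rows in summary.items()}


event_summary = build_event_summary(llm_reply)
```

Each key of the resulting dictionary is an oil and gas operation event, and each value lists the parameter (depth) together with the event data (pull force) observed at that depth.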


Oil and gas textual data is textual data associated with one or more oil or gas operations. The oil and gas textual data includes textual data generally associated with one or more of pre-operation, active operation, and post operation of a wellbore. The pre-operation data includes, for example, operating procedures, operation objectives, operation physics, operational guidelines, sequence guidelines having safety and personnel information, and associated published text, such as articles directed to oil and gas operations. The published text can be proprietary or public and can be directed to oil and gas operations in general or directed to a specific topic or geographical location. The published text can be weighted differently. For example, either the public or proprietary published text can be given more weight during analysis. The operating procedures can include standard operating procedures associated with a particular company and the operational guidelines can include a drilling program that provides details of specific drilling operations along with the safety protocols and contingency activities.


The active operation data includes textual data generated during an active well operation, such as drilling and fracking. The active operation data includes, for example, daily operating reports. The daily operating reports contain a summary of the daily activities for a well operation and provide valuable information about the issues observed and the remedial measures taken. Several daily reports may be studied in order to create a lessons learned report. Examples of daily operating reports (DOR) include daily drilling reports (DDR), daily mud reports, daily mud logging reports, daily geologist reports, and workover/CTU/production related operational reports. The active operation data can also include HSE reports that relate to safety issues, accidents, and near-misses.


The HSE reports contain valuable information for future wells and can also be part of the post operation data. The post operation data can also include end of well reports and lessons learned reports. These are created at the end of a well's drilling/workover cycle and provide a summary of the operations performed, issues observed, and lessons learned during the lifecycle of a well.


Each of the pre-operation, active operation, and post operation data can include communication text. A large amount of conversation can occur during the three operational phases, especially during the active operation. The conversations can be, for example, via texts, emails, or telephone calls. Each of the conversations can include essential operational details. The audio data from the telephone calls can be transcribed in order to convert the data into text format for analysis.


The information extracted from the above example data sources in text format can be automatically converted into meaningful information by the LLM incorporated with a curated domain prompt. The meaningful information can be saved in a structured format that provides an event summary, which can be easily retrieved or recalled (feedback loop) for offset analysis, historical investigation, real time tracking of standard operating procedures, etc.


In one example the textual analysis system includes a data reservoir that stores the textual data from the multiple data sources. The data reservoir can also store the curated domain prompts. The data reservoir can be implemented at a single storage location or can be distributed across multiple data storage locations. The data reservoir can be proprietary wherein access can be restricted to a single company, approved companies, or approved individuals. The data reservoir can include public and proprietary textual data.


The example textual analysis system also includes an automated analyzer for reviewing oil and gas textual data. The automated analyzer includes an interface configured to provide curated domain prompts to an LLM(s) for analyzing oil and gas textual data. The curated domain prompts (or chain of prompts) can be stored in a database or memory associated with the automated analyzer. The automated analyzer will automatically use a curated domain prompt (or a chain of prompts) from the stored curated domain prompts according to the type of oil and gas data being processed. The interface can also receive a custom request from a user that allows a user to input a specific type of information that is not included in a curated domain prompt. For example, the user can provide a custom request that limits an output of the textual analysis system to a particular region, company, well type, time period, etc. Accordingly, a user can customize an output via one or more custom requests. The custom request can be used with some optional input parameters which go to the prompt in an abstracted way from the user-editable portions of curated domain prompts. A user, however, does not need to provide any input. Advantageously, the automated analyzer can automatically use the appropriate curated domain prompt or chain of prompts according to the type of oil and gas textual data being processed. A user can provide the custom request, for example, when looking for a specific piece of information included in a curated domain prompt. The automated analyzer also includes one or more processors to perform operations that include providing event data from the oil and gas textual data using the LLM and the curated domain prompt.
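As a non-limiting illustration, the automatic selection of a stored curated domain prompt (or chain of prompts) according to the type of data being processed, together with an optional custom request, can be sketched in Python as shown below. The mapping, the prompt identifiers, and the function name are hypothetical.

```python
# Hypothetical mapping from a report type to the stored prompt chain for it.
PROMPTS_BY_DATA_TYPE = {
    "DDR": ["drilling/filter_events", "drilling/summarize_events"],
    "HSE": ["safety/summarize_incidents"],
}


def select_prompts(data_type, custom_request=None):
    """Pick the stored prompt chain for a data type; user input is optional."""
    try:
        chain = list(PROMPTS_BY_DATA_TYPE[data_type])
    except KeyError:
        raise ValueError(
            f"no curated domain prompt for data type {data_type!r}"
        ) from None
    # An optional custom request only narrows the output (for example by
    # region or time period); it never replaces the curated prompts.
    constraints = dict(custom_request or {})
    return {"prompts": chain, "constraints": constraints}


automatic = select_prompts("DDR")                       # no user input
customized = select_prompts("HSE", {"time_period": "last 30 days"})
```

The first call shows the case where a user provides no input, and the second shows a custom request that limits the output to a time period.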


In the drawings and descriptions that follow, like parts are typically marked throughout the specification and drawings with the same reference numerals, respectively. The drawn figures are not necessarily to scale. Certain features of the disclosure may be shown exaggerated in scale or in somewhat schematic form and some details of certain elements may not be shown in the interest of clarity and conciseness. The present disclosure may be implemented in embodiments of different forms.



FIG. 1 illustrates an example of an oil and gas system that performs an operation that benefits from the analysis of oil and gas textual data carried out according to the principles of the disclosure. The oil and gas system of FIG. 1 is a drilling system 100 configured to perform formation drilling to create a wellbore 101. The drilling is an example of an oil and gas operation occurring at a wellbore. Other examples occurring at a wellbore include fracking, cementing, completions, work-over, slick line, etc. The system 100 can be, for example, a logging-while-drilling (LWD) system or a measurement-while-drilling (MWD) system. FIG. 1 depicts an onshore operation. Those skilled in the art will understand that the disclosure is equally well suited for use in offshore operations or onshore operations over a body of water.


The drilling system 100 includes a bottom hole assembly (BHA) 110 coupled to a drill string 120. The BHA 110 includes a drill bit 112, which can be moved axially within the wellbore 101. The system 100 is configured to drive the BHA 110 positioned or otherwise arranged at the bottom of the drill string 120 that is extended into the earth 102 from a derrick 130 arranged at the surface 104. The system 100 also includes a top drive 134 that is used to rotate the drill string 120 at the surface 104, which then rotates the drill bit 112 in the wellbore 101. Operation of the top drive 134 is controlled by a top drive controller (not shown). The system 100 also includes a kelly 136 and can include a traveling block (not shown) that is used to lower and raise the kelly 136 and drill string 120.


Fluid or “drilling mud” from a mud tank 140 may be pumped downhole using a mud pump 142 powered by an adjacent power source, such as a prime mover or motor 144. The drilling mud may be pumped from mud tank 140, through a stand pipe 146, which feeds the drilling mud into drill string 120 and conveys the same to the drill bit 112. The drilling mud exits one or more nozzles arranged in the drill bit 112 and in the process cools the drill bit 112. After exiting the drill bit 112, the mud circulates back to the surface 104 via the annulus defined between the wellbore 101 and the drill string 120, and in the process, returns drill cuttings and debris to the surface. The cuttings and mud mixture are passed through a flow line 148 and are processed such that a cleaned mud is returned downhole through the stand pipe 146 once again.


The system 100 also includes a well site controller 160 and a computing system 164, which can be communicatively coupled to well site controller 160. Well site controller 160 includes one or more processors and one or more memories and is configured to direct operation of the system 100 using the processors and memories. The well site controller 160 can direct the operation based on the analysis of oil and gas textual data as disclosed herein. The oil and gas textual data used by the well site controller 160 can be pre-operational, active operational, or a combination thereof. The oil and gas textual data can also include post operational data from one or more previous oil and gas operations. Computing system 164 can be a laptop, smartphone, personal digital assistant (PDA), server, desktop computer, cloud computing system, other computing system, or a combination thereof, that is operable to perform the processes and methods described herein for operating the system 100. Well site operators, engineers, and other personnel can send and receive data, instructions, measurements, and other information by various conventional means with computing system 164 or well site controller 160. Well site controller 160 or computing system 164 can be utilized to communicate with downhole tools of the BHA 110, such as sending and receiving telemetry, data, drilling sensor data, instructions, and other information, including but not limited to collected or measured parameters, location within the wellbore 101, and cuttings information. A communication channel may be established by using, for example, electrical signals or mud pulse telemetry for most of the length from the drill bit 112 to the controller 160.


As stated above, the drill bit 112 penetrates the earth 102 and thereby creates the wellbore 101. BHA 110 provides directional control of the drill bit 112 as it advances into the earth 102. A tool string 114 of the BHA 110 can include the downhole tools of the BHA 110. Accordingly, the tool string 114 can be semi-permanently mounted with various measurement tools (not shown) such as, but not limited to, MWD and LWD tools, that may be configured to take downhole measurements of drilling conditions and geological formation of the earth 102. The measurement tools can include sensors, such as magnetometers, accelerometers, gyroscopes, etc. The sensors can be used to detect the presence of harmful vibrations, such as high-frequency torsional oscillations (HFTO).


In addition to the sensor measurements that are obtained during drilling, textual data is also generated during the drilling process, which is referred to herein as active operational data. The active operational data can be generated by users located at the wellbore 101, such as engineers and operators, and stored in a digital format on the well site controller 160, computing system 164, or both. The active operational data can relate to the various components of the drilling, such as, operation of the BHA 110, composition of the drilling mud, circulation of the drilling mud, complications with the drill string 120, etc. Pre and post operational data is also associated with the drilling. The well site controller 160, the computing system 164, or a combination of both can be part of a textual analysis system. For example, the well site controller 160 or the computing system 164 can be part of a data reservoir that stores oil and gas textual data. The data reservoir can also store curated domain prompts. The data reservoir, or at least a portion thereof, can be located remotely from the wellbore 101, such as in a data center, and receive the active operational data via a communications network. One or more of the well site controller 160 or the computing system 164 can also include an automated analyzer such as disclosed herein and/or an offline LLM, such as a fine-tuned model. Instead of an offline LLM, the well site controller 160 or the computing system 164 can be communicatively coupled to an online LLM. Drilling of the wellbore 101 is used as an example of a well operation that occurs downhole.


In addition to the active operational data, the data reservoir can also store the pre-operational and post-operational data. Continuing to use drilling as an example of an oil and gas operation, the pre-operational data is generated before the drilling of wellbore 101 begins and the post-operational data is generated after drilling of the wellbore 101. Other users besides those located at the wellbore 101 can be involved with the generation of the pre-operational and post-operational data.



FIG. 2 is a block diagram of an example of a textual analysis system 200 constructed according to the principles of the disclosure. The textual analysis system 200 is a computing system that includes a data reservoir 210 and an automated analyzer 220. The data reservoir 210 and the automated analyzer 220 can be located proximate each other or remotely located from each other. Regardless, the data reservoir 210 and the automated analyzer 220 can be communicatively coupled together via a conventional connection.


The data reservoir 210 is data storage that is configured to store data, such as oil and gas textual data as represented by block 213. Additionally, the data reservoir 210 can store the curated domain prompts as represented by block 217. The oil and gas textual data 213 can be from multiple data sources and can cover various aspects of a well operation, such as pre, active, and post operation textual data as represented in FIG. 2. The curated domain prompts 217 are pre-configured domain prompts that are stored and used for the analysis of the oil and gas textual data 213. The data reservoir 210 can be one or more memories or data storage devices. The data reservoir 210 can be implemented in a data center or on a computing system such as computing system 164 of FIG. 1. The oil and gas textual data 213 and the curated domain prompts 217 can be uploaded to the data reservoir 210 or stored on the data reservoir 210 via another conventional means.


The automated analyzer 220 receives a curated domain prompt from the curated domain prompts 217 and corresponding oil and gas textual data from the oil and gas textual data 213, and cooperates with an LLM (or LLMs) to provide event data extracted from the oil and gas textual data. The textual analysis system 200 can send one or more curated domain prompts to both online and offline LLMs working together. For example, a chain of curated domain prompts can be used and the different curated domain prompts can be sent to one or more different LLMs, such as online, offline, and fine-tuned LLMs.
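As a non-limiting illustration, a chain of curated domain prompts in which different prompts are sent to different LLMs can be sketched in Python as shown below. The two model functions are local stand-ins so that the sketch runs without any network access or model weights; they are not real LLM clients, and the prompt wording and report lines are invented.

```python
from typing import Callable, Sequence

# Any callable that maps a prompt string to a reply string can act as an LLM here.
LLM = Callable[[str], str]


def run_chain(steps: Sequence[tuple[str, LLM]], text: str) -> str:
    """Run each (prompt_template, llm) step on the previous step's output."""
    current = text
    for template, llm in steps:
        current = llm(template.format(text=current))
    return current


def offline_llm(prompt: str) -> str:
    # Stand-in for an offline model: keeps the report lines that mention "tight".
    lines = prompt.splitlines()[1:]          # skip the instruction line
    return "\n".join(line for line in lines if "tight" in line.lower())


def online_llm(prompt: str) -> str:
    # Stand-in for an online model: reports how many event lines it received.
    lines = prompt.splitlines()[1:]          # skip the instruction line
    return f"{len(lines)} event(s) found"


report = (
    "06:00 drilled ahead without issues\n"
    "07:30 tight spot observed while pulling out\n"
    "09:00 tight hole, worked pipe free"
)
result = run_chain(
    [
        ("Keep only lines about tight spots:\n{text}", offline_llm),
        ("Summarize these events:\n{text}", online_llm),
    ],
    report,
)
```

The first prompt filters the report with one model and the second prompt summarizes the filtered text with another, so the output of each step becomes the input of the next.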


The event summary correlates the event data to an oil and gas operation event and a parameter identified in the curated domain prompt. The automated analyzer 220 includes one or more interfaces represented by interface 222, one or more memories represented by memory 224, and one or more processors represented by processor 228. The interface 222 is a communication interface that receives the curated domain prompt and the oil and gas textual data and provides the curated domain prompt and the oil and gas textual data to the LLM. As such, the interface 222 communicates with the data reservoir 210 to obtain the curated domain prompt and the oil and gas textual data for analysis and with an LLM for performing the analysis. The interface 222 includes the necessary circuitry, software, or combination thereof to send and receive data. The interface 222 can be a conventional interface.


The memory 224 can store data and operating instructions that direct the operation of the processor 228. The memory 224 can include the necessary circuitry for storing data and the processor 228 can include the necessary computing circuitry for processing data. The operating instructions can be or can represent one or more algorithms directed to analyzing oil and gas textual data using one or more LLMs and one or more curated domain prompts. As illustrated in FIG. 2, one or more of the curated domain prompts can be stored on the memory 224, represented by curated domain prompts 226, instead of or in addition to the curated domain prompts 217 on the data reservoir. The processor 228 cooperates with the memory 224 to direct the analysis using the LLM. The processor 228 can direct the LLM analysis by obtaining and sending the curated domain prompt and the oil and gas textual data to the LLM for processing. The processor 228 can alter or modify editable fields of the curated domain prompt based on one or more custom requests received from a user via the interface 222. As such, the interface 222 can include a user interface, such as a keyboard, keypad, touchscreen, speaker, etc. The processor 228 is also configured to receive the event summary from the LLM and provide the event summary for use in an oil or gas operation, such as a drilling operation performed by system 100. The processor 228 can provide additional processing of the received event summary from the LLM before sending the event summary as an output for use in an oil and gas operation. For example, the processor 228 can aggregate data of the received event summary, add additional information related to the data of the received event summary, remove some of the data of the received event summary, rearrange the data into another format, etc. The processor 228 can also modify the event summary based on one or more custom requests received from the user. The event summary, as received from the LLM or modified by the automated analyzer 220, can be generated in a structured format and provided as an output for further analysis. The structured format can be, for example, CSV or Excel. The further analysis can be manual or automated and can be used for an existing operation or for future operations, such as for existing or future well operations. For example, the event summary can be provided to a well operator via a display screen and the well operator can perform or alter a well operation, such as fracking or drilling, based on the event summary. The event summary can also be in the domain of safety and be used to change operating procedures at a well site based on HSE reports that were analyzed. From the event summary, a recommendation can be provided for the oil or gas operation. Another example of the event summary can be a recommendation on the drilling fluid due to an observed tight pull. This may include details of the chemical treatment required for the drilling fluid to ease the tight spots. The interaction of an LLM with an automated analyzer, such as automated analyzer 220, and examples of the different types of event summaries that can be generated and provided as outputs to an automated analyzer are illustrated in FIG. 3. The LLM of FIGS. 2 and 3 can be an online or offline LLM.
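As a non-limiting illustration, the additional processing of a received event summary (removing some data, rearranging columns, and writing a structured CSV output) can be sketched in Python using the standard csv module, as shown below. The row contents, column names, and well names are invented for this sketch.

```python
import csv
import io


def to_csv(rows, columns, keep=None):
    """Serialize event summary rows as CSV, optionally dropping rows first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,        # column order of the structured output
        extrasaction="ignore",     # silently drop keys that are not requested
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        # `keep` models a custom request that limits the output.
        if keep is None or keep(row):
            writer.writerow(row)
    return buffer.getvalue()


# Hypothetical event summary rows as received from the LLM.
rows = [
    {"event": "tight pull", "depth_ft": 8450, "well": "A-1", "note": "reamed"},
    {"event": "tight pull", "depth_ft": 8720, "well": "B-2", "note": "reamed"},
]

# Keep only one well and rearrange the columns; the "note" field is removed.
csv_text = to_csv(rows, ["well", "event", "depth_ft"],
                  keep=lambda row: row["well"] == "A-1")
```

The same rows could be written to an Excel file instead; CSV is used here only because it needs no third-party library.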



FIG. 3 is a diagram 300 that demonstrates an example of the interaction of a data reservoir 310, LLM (or LLMs) 320, and event summary 330 that is extracted from the oil and gas textual data of the data reservoir 310 via the LLM 320. An automated analyzer, such as automated analyzer 220, can coordinate the interaction thereof. The data reservoir 310, for example, can be the data reservoir 210 of FIG. 2. The data reservoir 310 includes non-limiting examples of the various types of oil and gas textual data and the event summary 330 illustrates non-limiting examples of event summaries that can be generated as outputs and provided to an automated analyzer. In addition to the example event summaries that can be provided as the event summary 330 to an automated analyzer, as illustrated in FIG. 3 the event summary 330 can be provided to the LLM 320 for feedback. Accordingly, the analysis provided by the LLM 320 can improve using information from the present event summary 330 that is generated by the LLM 320. An automated analyzer that receives the event summary 330 can modify the received event summary 330 before it is provided as an output for use in an oil and gas operation. The example event summaries include identified operation-related descriptions, identified personnel and equipment related events, identified and categorized non-productive time (NPT), identified lost time, identified HSE issues, identified correct activity codes, and an automated offset well analysis.



FIG. 4 is a more detailed diagram 400 demonstrating the interaction of the data reservoir 310 with the LLM 320 and the generation of the event summary 330. Diagram 400 includes a scheduler 410 and a vector store 420 that are used to provide the oil and gas textual data to the LLM 320 for analysis. Diagram 400 also includes generated reports 430 that are generated based on the event summary 330. The generated reports 430 can be generated by an automated analyzer, such as automated analyzer 220, and provided to the LLM 320 for feedback. The generated reports 430 represent aggregated information from past analysis that can be combined with the event summary 330 of the present analysis as feedback. The generated reports 430 can be saved in various formats, such as tables, databases, and text files, and provided to the scheduler 410 for processing by the LLM 320.


The scheduler 410 controls processing the oil and gas textual data for analysis by the LLM 320. The scheduler 410 can, for example, extract, transform and load (ETL) the oil and gas textual data for processing jobs by the LLM 320. The scheduler 410 can automatically ETL the oil and gas textual data into the vector store 420 in real time. The scheduler 410 can ETL according to a predetermined schedule. In addition to the oil and gas textual data, the scheduler 410 can also receive the feedback data via the generated reports 430 and provide the feedback data to the LLM 320.
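
As a non-limiting illustration, one ETL pass of the scheduler 410 can be sketched as follows. The record layout, the function names, and the plain list standing in for the vector store 420 are hypothetical and chosen only for illustration; the disclosure does not prescribe a particular implementation.

```python
from dataclasses import dataclass

@dataclass
class TextRecord:
    source: str   # e.g. "DDR" or "HSE"; labels are illustrative
    text: str

def extract(raw_rows):
    """Extract: keep only rows that actually carry text."""
    return [r for r in raw_rows if r.get("text")]

def transform(rows):
    """Transform: normalize whitespace and tag each row with its source."""
    return [TextRecord(r.get("source", "unknown"), " ".join(r["text"].split()))
            for r in rows]

def load(records, store):
    """Load: append records to the store, skipping exact duplicates."""
    seen = {(rec.source, rec.text) for rec in store}
    for rec in records:
        key = (rec.source, rec.text)
        if key not in seen:
            store.append(rec)
            seen.add(key)
    return store

def etl_job(raw_rows, store):
    """One pass; a scheduler would call this on a timer or when new data arrives."""
    return load(transform(extract(raw_rows)), store)

store = []  # stand-in for the vector store (assumption)
etl_job([{"source": "DDR", "text": "  Tight pull   at 8,500 ft "},
         {"source": "DDR", "text": ""},
         {"source": "HSE", "text": "Near miss on rig floor"}], store)
etl_job([{"source": "DDR", "text": "Tight pull at 8,500 ft"}], store)  # duplicate, skipped
```

Skipping duplicates at load time is one possible design choice that allows the same job to run both on a predetermined schedule and in real time without storing a record twice.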


The vector store 420 encodes the oil and gas textual data, divides it into smaller chunks of text, and saves the chunks in a database associated with the LLM 320. Whenever the LLM 320 is prompted by a curated domain prompt, the vector store 420 matches the prompt to the most relevant chunk or chunks of text from the oil and gas textual data that has been encoded and stored, i.e., query-based data retrieval. For example, the curated domain prompt can be directed to the drilling domain and the vector store 420 may select a chunk of oil and gas textual data that includes DDRs. In another example, the curated domain prompt can be directed to health or safety and the vector store 420 can select a chunk of oil and gas textual data that includes HSE reports. Oil and gas textual data from multiple different sources can be selected. The curated domain prompt and the selected chunk of oil and gas textual data are sent to the LLM 320 for augmented generation of the event summary 330.
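
The query-based data retrieval described above can be illustrated with a minimal sketch. The `VectorStore` class, the word-count encoding, and the sample texts are assumptions made for illustration; an actual vector store would typically use a learned text encoder rather than word counts.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=12):
    """Divide text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy encoding: lowercase word counts (stand-in for a learned encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorStore:
    def __init__(self):
        self.items = []  # list of (chunk_text, vector)

    def add(self, text, max_words=12):
        for c in chunk(text, max_words):
            self.items.append((c, embed(c)))

    def query(self, prompt, k=1):
        """Return the k stored chunks most similar to the prompt."""
        q = embed(prompt)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = VectorStore()
store.add("DDR: stuck pipe while drilling at 9200 ft, worked string free")
store.add("HSE report: dropped object near miss on the rig floor, no injury")
top = store.query("health and safety incident, injury or near miss on rig")
```

In this sketch the health and safety prompt shares no words with the drilling chunk, so the HSE chunk is ranked first and would be the text sent to the LLM together with the prompt.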


As noted above, the curated domain prompts used for the query-based data retrieval are predefined prompts that are single prompts or prompt chains and can include reflexion. The curated domain prompts can be used to create agents for the LLM 320. The agents can be generated for the particular domains based on the curated domain prompts. For example, LLM agents can be generated for particular domains of drilling, fracking, and health and safety based on curated domain prompts for the domains of drilling, fracking and health and safety, respectively. A chain of curated domain prompts can be used to generate the agents.


The operations represented in FIG. 4 can occur automatically. As such, a user can obtain one or more event summaries 330 without the user generating a prompt for the LLM 320. Instead, the curated domain prompts can be obtained and the event summary 330 can be automatically generated. The operations of FIG. 4 can be automated according to a schedule. An automated analyzer, such as via processor 228, can provide the schedule according to a desired frequency. The schedule can correspond to the ETL of the scheduler 410. A user can provide a custom request with the curated domain prompt, such as via the automated analyzer 220, for a customized event summary 330.



FIG. 5 illustrates a flow diagram of an example method 500 of automatically analyzing oil and gas textual data carried out according to the principles of the disclosure. At least a portion of the method 500 can be carried out by a computing system, such as a textual analysis system having an automated analyzer as disclosed herein. At least a portion of method 500 can be performed by an automated analyzer, such as automated analyzer 220. The various steps of method 500 can correspond to one or more algorithms used to direct the operation of the textual analysis system or a part of the textual analysis system, such as the automated analyzer or one or more processors thereof. The method 500 begins in step 505.


In step 510, a curated domain prompt is obtained that identifies an oil and gas operation event and a parameter associated with the oil and gas operation event. The curated domain prompt can be received by a computer or computing system, such as an automated analyzer, from a data reservoir, such as data reservoir 210 or 310, via a communications interface.


Oil and gas textual data is obtained in step 520. The oil and gas textual data can be retrieved from a data reservoir, which can be the same data reservoir storing at least some of the curated domain prompts. An automated analyzer can obtain the oil and gas textual data from the data reservoir via a communications interface, such as interface 222. The oil and gas textual data, or at least a portion of the oil and gas textual data, can be retrieved from other locations besides the data reservoir. For example, the oil and gas textual data or a portion thereof can be retrieved directly from the source of the oil and gas textual data. The oil and gas textual data can also be retrieved from a vector store. An appropriate chunk of oil and gas textual data as identified by the curated domain prompt can be retrieved from a vector store, such as vector store 420, via query-based data retrieval.


In step 530, the curated domain prompt and the oil and gas textual data are provided to an LLM. The LLM can be an online or offline LLM. An interface, such as interface 222, can be used to communicate with the LLM. The curated domain prompt and the oil and gas textual data can be provided to the LLM via query-based data retrieval as noted in FIG. 4. The curated domain prompt and oil and gas textual data can be provided to more than one LLM. The curated domain prompt and oil and gas textual data can be provided to an online LLM and an offline LLM, wherein the results from one LLM can be provided as feedback to the other LLM. Additionally, different curated domain prompts can be used with the different LLMs, such as when using a chain of prompts. For example, a first curated domain prompt can be sent to a first LLM that is offline and a subsequent, different curated domain prompt (or prompts) can be sent to an online LLM when using a chain of curated domain prompts. FIGS. 7, 8, and 9 represent using a chain of different curated domain prompts wherein the analysis is performed in steps. For the first step of FIG. 7, a curated domain prompt can be sent to an offline LLM. For the second step of FIG. 8, the curated domain prompt can be sent to the same offline LLM, a different offline LLM, or an online LLM. Similarly, the curated domain prompt for the third step of FIG. 9 can be sent to one of the previous offline LLMs, a previous online LLM, or a different offline LLM or online LLM. One or more of the LLMs can be a fine-tuned LLM. An automated analyzer can direct the delivery of the one or more curated domain prompts to the different LLMs.
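
As a non-limiting illustration of directing a chain of curated domain prompts to different LLMs, the following sketch routes each step to a named model and passes the previous result forward as feedback. The `run_chain` function, the prompt wording, and the two stub functions standing in for an offline and an online LLM are hypothetical; no real LLM client is called.

```python
from typing import Callable, Dict, List, Tuple

LLM = Callable[[str], str]

def run_chain(steps: List[Tuple[str, str]], llms: Dict[str, LLM], text: str) -> List[str]:
    """Run each (llm_name, prompt_template) step; each step sees the previous output."""
    outputs: List[str] = []
    previous = ""
    for llm_name, template in steps:
        prompt = template.format(text=text, previous=previous)
        previous = llms[llm_name](prompt)
        outputs.append(previous)
    return outputs

# Stand-in models: plain functions instead of real offline/online LLM clients.
def offline_stub(prompt: str) -> str:
    return "Yes" if "stuck pipe" in prompt.lower() else "No"

def online_stub(prompt: str) -> str:
    return "summary based on: " + prompt.splitlines()[-1]

llms = {"offline": offline_stub, "online": online_stub}
steps = [
    ("offline", "Does this report mention a downhole problem?\n{text}"),
    ("online", "Earlier answer: {previous}\nSummarize the problem.\n{text}"),
]
results = run_chain(steps, llms, "Stuck pipe at 9200 ft")
```

Keeping the model name alongside each prompt is one way an automated analyzer could decide, per step, whether an offline, online, or fine-tuned LLM receives the prompt.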


The oil and gas textual data is filtered in step 540 based on the oil and gas operation event. The LLM performs the filtering using the oil and gas operation event identified in the curated domain prompt. FIG. 6 provides an example of filtering using DDR as the oil and gas textual data. Steps 510, 520, 530, and 540 can each be automatically performed as indicated with respect to FIG. 4.


In step 550, the event data is automatically extracted from the filtered oil and gas textual data based on the oil and gas operation event and the parameter. The LLM can automatically extract the event data using augmented generation as noted in FIG. 4. Alternatively, the LLM can extract the event data directly from the oil and gas textual data without the filtering of step 540. FIG. 6 provides an example of extracting using depth as the parameter.


An event summary is automatically generated in step 560 by the LLM that correlates the oil and gas operation event and the event data. Various examples of an event summary that can be generated are provided in the event summary 330 of FIGS. 3 and 4. FIG. 6 provides an example of extracting from filtered data using a downhole problem as the oil and gas operation event.


In step 570, an oil or gas operation is performed using the event summary. The event summary can be provided to an automated analyzer that provides the event summary to an oil and gas system for enacting or altering an oil or gas operation using the event summary. For example, the event summary can indicate a downhole problem, such as a tight pull, at certain depths of a wellbore and this information can be used when drilling a subsequent wellbore. The event summary can also be provided to an operator, such as via a display screen, for altering a fracking operation, which can be ongoing. The method 500 continues to step 580 and ends.



FIGS. 6 to 10 illustrate examples of automatically processing oil and gas textual data using DDRs for the oil and gas textual data, depth as a parameter, and downhole problems as the oil and gas operation event. FIGS. 6 to 10 illustrate workflows for processing the DDRs after receiving a curated domain prompt and using an LLM carried out according to the principles of the disclosure. Receiving the curated domain prompt and directing the LLM can be via an automated analyzer such as disclosed herein. The workflows in FIGS. 6 to 10, or at least a portion thereof, represent one or more algorithms for automatically processing oil and gas textual data using one or more curated domain prompts. FIG. 6 illustrates a workflow that begins with step 610. In step 610, the DDRs are filtered by the LLM based on a required category of interest, which is an oil and gas operation event identified by a certain curated domain prompt generated, for example, for the specific analysis of DDRs. In step 620, depths corresponding to downhole problems are extracted by the LLM from the filtered oil and gas textual data. In addition to the depths, operational parameters associated with the downhole problem at the depths can be extracted. In step 630, an event summary is generated using the extracted depths. The event summary includes the depth, the downhole problem, and the associated operational parameter. FIGS. 7, 8, and 9 provide more detailed examples of processing DDRs according to ordered steps, wherein the workflows of each of FIGS. 7, 8, and 9 represent a step. Different curated domain prompts would be used for the different steps, which are referred to as a chain of curated domain prompts.
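
The filter, extract, and summarize sequence of FIG. 6 can be pictured with the following sketch. In the disclosure each step is performed by the LLM; here keyword matching and a regular expression are used as stand-ins so that the example runs without a model. The keyword list, the function names, and the sample DDR text are hypothetical.

```python
import re

PROBLEM_KEYWORDS = ("tight pull", "stuck pipe", "lost circulation")  # illustrative only

def filter_records(records, keywords=PROBLEM_KEYWORDS):
    """Step 610 stand-in: keep records that mention an event of interest."""
    return [r for r in records if any(k in r.lower() for k in keywords)]

def extract_depths(record):
    """Step 620 stand-in: pull depth values written like '8,500 ft' or '9200 ft'."""
    matches = re.findall(r"(\d[\d,]*(?:\.\d+)?)\s*ft\b", record)
    return [float(m.replace(",", "")) for m in matches]

def summarize(records, keywords=PROBLEM_KEYWORDS):
    """Step 630 stand-in: one row per (depth, problem) pair."""
    rows = []
    for r in filter_records(records, keywords):
        problems = [k for k in keywords if k in r.lower()]
        for depth in extract_depths(r):
            for p in problems:
                rows.append({"depth_ft": depth, "problem": p, "source": r})
    return rows

ddrs = [
    "Drilled ahead from 8,000 ft to 8,450 ft with no issues.",
    "Observed tight pull at 8,500 ft while tripping out; reamed section.",
    "Lost circulation at 9200 ft, pumped LCM pill.",
]
summary = summarize(ddrs)
```

The first record is dropped by the filter, so its depths never reach the summary; this mirrors extracting only from the filtered oil and gas textual data.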



FIG. 7 illustrates an example of a workflow 700 of zero-shot classification by the LLM. In workflow 700, a single DDR record is presented in step 710. In step 720, the text of the DDR 710 is analyzed by the LLM. As indicated in FIG. 7, the LLM can process the DDR one row at a time. In step 730, an event summary is generated that classifies oil and gas operation events using extracted data obtained from the analysis of step 720. In this example, the oil and gas operation event is downhole problems during drilling. A domain prompt for workflow 700 would be a prompt (including relevant domain knowledge, keywords, etc.) that asks the LLM to return a Yes/No response based on whether the textual data (DDR) reports any downhole problems.
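
A minimal sketch of the Yes/No zero-shot classification of workflow 700 follows. The prompt wording, the function names, and the `fake_llm` stub are assumptions made for illustration and are not the curated domain prompt of the disclosure; a real LLM would replace the stub.

```python
def build_zero_shot_prompt(ddr_row: str) -> str:
    """Assemble a zero-shot prompt with domain keywords and a Yes/No instruction."""
    return (
        "You are a drilling engineer reviewing daily drilling reports (DDRs).\n"
        "Downhole problems include, for example: stuck pipe, tight pull, lost circulation.\n"
        "Answer with exactly one word, Yes or No.\n"
        "Does the following DDR entry report a downhole problem?\n"
        f"DDR entry: {ddr_row}"
    )

def parse_yes_no(response: str) -> bool:
    """Map a reply to a boolean; raise if it is neither Yes nor No."""
    token = response.strip().strip(".!\"'").lower()
    if token in ("yes", "no"):
        return token == "yes"
    raise ValueError(f"unexpected reply: {response!r}")

def classify_rows(rows, llm):
    """Process the DDR one row at a time."""
    return [(row, parse_yes_no(llm(build_zero_shot_prompt(row)))) for row in rows]

def fake_llm(prompt: str) -> str:
    # Stub: inspects only the DDR entry, not the keyword list in the instructions.
    entry = prompt.rsplit("DDR entry: ", 1)[1].lower()
    return "Yes." if "stuck pipe" in entry else "No"

labels = classify_rows(["Stuck pipe at 9200 ft", "Ran casing to 5000 ft"], fake_llm)
filtered = [row for row, has_problem in labels if has_problem]
```

Rows labeled Yes form the filtered set that a subsequent step can analyze in more detail; rejecting any reply other than Yes or No keeps an ambiguous response from silently passing the filter.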


In FIG. 7, the DDR record 710 is not filtered. FIG. 8 illustrates a workflow 800 using a filtered DDR 810. The filtered DDR 810 is a result of the workflow 700, and workflow 800 represents a second step in the example analysis process. A single DDR record of the filtered DDR is used. The text from the filtered record of the DDR 810 is analyzed in step 820 by an LLM. In step 830, an event summary is generated that classifies oil and gas operation events (downhole problems in this example) using extracted data obtained from the analysis of step 820. In step 830, the event summary is generated in a structured format of CSV. One skilled in the art will understand that other formats can be used. The curated domain prompt used for workflow 800 differs from the curated domain prompt used with workflow 700 and workflow 900 of FIG. 9. FIG. 11 provides an example of a curated domain prompt that can be used with FIG. 9.
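
As a non-limiting illustration of handling an event summary returned in CSV format, the following sketch parses and validates the LLM output before it is used downstream. The column names and the sample output are hypothetical and are not taken from FIG. 8.

```python
import csv
import io

EXPECTED_COLUMNS = ["depth_ft", "problem"]  # hypothetical schema

def parse_event_summary(csv_text: str):
    """Parse CSV text into typed rows, rejecting an unexpected header."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        rows.append({"depth_ft": float(row["depth_ft"]),
                     "problem": row["problem"].strip()})
    return rows

# Sample text standing in for an LLM reply (assumption).
llm_output = """
depth_ft,problem
8500,tight pull
9200,"lost circulation, partial"
"""
events = parse_event_summary(llm_output)
```

Checking the header before reading rows is one way to detect a reply that does not follow the format requested in the curated domain prompt.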



FIG. 9 illustrates a workflow 900 using both the filtered DDR 810 and the event summary 830 in the analysis process. As such, workflow 900 builds on both workflow 700 and workflow 800. In the workflow 900, the filtered DDR 810 and the event summary 830 are both provided to the LLM for analysis in step 920. In step 930, the LLM generates an event summary 930 from the analysis of step 920. A portion of an example of the event summary 930 is shown in FIG. 9. As with FIGS. 7-8, the analysis performed by the LLM in step 920 is row by row of the input data.



FIG. 10 illustrates an example workflow 1000 of automated generation of an event summary carried out according to the principles of the disclosure. In workflow 1000, an event summary 1030 is generated from an unstructured DDR 1010. Advantageously, the event summary 1030 can be automatically generated into a structured format by an LLM from the unstructured DDR 1010. The event summary 1030 can be generated according to a method or workflow disclosed herein. An automated analyzer can be used to obtain the raw DDRs and direct the LLM by providing the raw DDRs and a curated domain prompt to the LLM. A format for the event summary 1030 can be included within the curated domain prompt. Each different shaded row indicates a single DDR record, such as processed in workflows 700 and 800. Instead of single records, multiple records can also be processed together.



FIG. 11 illustrates an example of a curated domain prompt 1100 that can be used to obtain the event summary 930 of FIG. 9. Several components of the example curated domain prompt 1100 are identified. Curated domain prompt 1100 can be the third prompt of a chain of curated domain prompts used for the workflows of 700, 800, and 900. “Comma_separated_depths” provides an example of information provided from a previous one of the workflows. The oil and gas operation event is identified as a major downhole or sub-surface problem or certain events. The parameter associated with the oil and gas operation event is depth. The format for the event summary is also identified. Prompt 1100 also identifies the oil and gas textual data used in this example, which is the DDR textual data. An example of a fixed portion and an editable portion of the curated domain prompt 1100 are also identified. A custom request can be used to modify the keywords of the editable oil and gas operation events.
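
The split between a fixed portion and an editable portion of a curated domain prompt can be sketched as a template. The template wording, the default keyword list, and the `build_prompt` function are hypothetical and do not reproduce the prompt 1100 of FIG. 11; only the `comma_separated_depths` placeholder name is borrowed from the description above.

```python
from string import Template

# Fixed portion: wording that users are not expected to change (illustrative text only).
FIXED_TEMPLATE = Template(
    "You are analyzing daily drilling report (DDR) text.\n"
    "Previously identified depths: $comma_separated_depths\n"
    "For each depth, identify any of these events: $events\n"
    "Return the result as $output_format with columns depth, event, description.\n"
    "DDR text:\n$ddr_text"
)

# Editable portion: event keywords that a custom request may replace (assumption).
DEFAULT_EVENTS = ["stuck pipe", "tight pull", "lost circulation"]

def build_prompt(depths, ddr_text, events=None, output_format="CSV"):
    """Fill the fixed template; a custom request may override the event keywords."""
    return FIXED_TEMPLATE.substitute(
        comma_separated_depths=", ".join(str(d) for d in depths),
        events=", ".join(events or DEFAULT_EVENTS),
        output_format=output_format,
        ddr_text=ddr_text,
    )

prompt = build_prompt([8500, 9200], "Tight pull at 8500 ft. Losses at 9200 ft.",
                      events=["tight pull", "well control"])
```

Here the depths come from an earlier step of the chain, while the `events` argument plays the role of the editable keywords that a custom request can modify.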


A portion of the above-described apparatus, systems or methods may be embodied in or performed by various analog or digital data processors, wherein the processors are programmed or store executable programs of sequences of software instructions to perform one or more of the steps of the methods. A processor may be, for example, a programmable logic device such as a programmable array logic (PAL), a generic array logic (GAL), a field programmable gate array (FPGA), or another type of computer processing device (CPD). The software instructions of such programs may represent algorithms and be encoded in machine-executable form on non-transitory digital data storage media, e.g., magnetic or optical disks, random-access memory (RAM), magnetic hard disks, flash memories, or read-only memory (ROM), to enable various types of digital data processors or computers to perform one, multiple or all of the steps of one or more of the above-described methods, or functions, systems or apparatuses described herein.


Portions of disclosed examples or aspects may relate to computer storage products with a non-transitory computer-readable medium that has program code thereon for performing various computer-implemented operations that embody a part of an apparatus or device, or carry out the steps of a method set forth herein. Non-transitory used herein refers to all computer-readable media except for transitory, propagating signals. Examples of non-transitory computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and execute program code, such as ROM and RAM devices. Configured or configured to means, for example, designed, constructed, or programmed, with the necessary logic or features for performing a task or tasks. Examples of program code include both machine code, such as produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.


In interpreting the disclosure, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced.


Those skilled in the art to which this application relates will appreciate that other and further additions, deletions, substitutions, and modifications may be made to the described aspects. It is also to be understood that the terminology used herein is for the purpose of describing particular aspects only, and is not intended to be limiting, since the scope of the present disclosure will be limited only by the claims. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure, a limited number of the exemplary methods and materials are described herein.


Unless otherwise specified, use of the terms “connect,” “engage,” “couple,” “attach,” or any other like term describing an interaction between elements is not meant to limit the interaction to direct interaction between the elements and may also include indirect interaction between the elements described. Unless otherwise specified, use of the terms “up,” “upper,” “upward,” “uphole,” “upstream,” or other like terms shall be construed as generally away from the bottom, terminal end of a well, regardless of the wellbore orientation; likewise, use of the terms “down,” “lower,” “downward,” “downhole,” “downstream,” or other like terms shall be construed as generally toward the bottom, terminal end of a well, regardless of the wellbore orientation. Use of any one or more of the foregoing terms shall not be construed as denoting positions along a perfectly vertical axis. In some instances, a part near the end of the well can be horizontal or even slightly directed upwards. Unless otherwise specified, use of the term “subterranean formation” shall be construed as encompassing both areas below exposed earth and areas below earth covered by water such as ocean or fresh water.


The disclosure provides the following aspects:

    • A. An automated analyzer for reviewing oil & gas textual data, comprising: (1) an interface configured to provide a curated domain prompt that identifies an oil and gas operation event and a parameter associated with the oil and gas operation event and (2) one or more processors to perform operations including: directing a large language model (LLM) in extracting event data from oil and gas textual data based on the oil and gas operation event and the parameter, and providing an event summary that correlates the oil and gas operation event and the event data.
    • B. A method of automatically analyzing oil and gas textual data, including: (1) obtaining a curated domain prompt that identifies an oil and gas operation event and a parameter associated with the oil and gas operation event, (2) automatically extracting, using a large language model (LLM), event data from oil and gas textual data based on the oil and gas operation event and the parameter, and (3) automatically generating an event summary that correlates the oil and gas operation event and the event data.
    • C. A textual analysis system for oil and gas textual data, comprising: (1) a data reservoir configured to store oil and gas textual data, and (2) an automated analyzer for reviewing the oil and gas textual data, including an interface configured to receive a curated domain prompt that identifies an oil and gas operation event and a parameter associated with the oil and gas operation event and one or more processors to perform operations including instructing a large language model (LLM) to extract event data from oil and gas textual data based on the oil and gas operation event and the parameter and receiving an event summary from the LLM that correlates the oil and gas operation event and the event data.


Each of the aspects A, B, and C can have one or more of the following additional elements in combination: Element 1: wherein the extracting includes filtering the oil and gas textual data based on the oil and gas operation event and automatically extracting the event data from the filtered oil and gas textual data. Element 2: wherein the event summary is in a structured format. Element 3: wherein the oil and gas textual data includes unstructured text. Element 4: wherein the LLM is an online LLM. Element 5: wherein the LLM is an offline LLM. Element 6: wherein the offline LLM is a fine-tuned model. Element 7: wherein the directing includes sending the curated domain prompt and at least a portion of the oil and gas textual data to the LLM. Element 8: further comprising filtering the oil and gas textual data based on the oil and gas operation event, wherein the automatically extracting the event data is from the filtered oil and gas textual data. Element 9: wherein the event summary includes the parameters, the event data associated with the parameters, and the oil and gas operation event for each of the parameters. Element 10: wherein the oil and gas textual data includes pre-operation, active operation, and post operation textual data for the oil and gas operation event. Element 11: wherein the LLM is an offline LLM. Element 12: wherein the oil and gas textual data includes unstructured text. Element 13: wherein the automatically extracting includes augmented generation by the LLM. Element 14: wherein the automatically generating includes generating the event summary into a structured format identified in the domain prompt. Element 15: further comprising obtaining the oil and gas textual data and the curated domain prompt from a data reservoir. Element 16: wherein the instructing includes sending the curated domain prompt and the oil and gas textual data to the LLM. Element 17: wherein the operations further include sending the event summary to a well operation system for enacting or altering a well operation.

Claims
  • 1. An automated analyzer for reviewing oil & gas textual data, comprising: an interface configured to provide a curated domain prompt that identifies an oil and gas operation event and a parameter associated with the oil and gas operation event; and one or more processors to perform operations including: directing a large language model (LLM) in extracting event data from oil and gas textual data based on the oil and gas operation event and the parameter, and providing an event summary that correlates the oil and gas operation event and the event data.
  • 2. The analyzer as recited in claim 1, wherein the extracting includes filtering the oil and gas textual data based on the oil and gas operation event and automatically extracting the event data from the filtered oil and gas textual data.
  • 3. The analyzer as recited in claim 1, wherein the event summary is in a structured format.
  • 4. The analyzer as recited in claim 1, wherein the oil and gas textual data includes unstructured text.
  • 5. The analyzer as recited in claim 1, wherein the LLM is an online LLM.
  • 6. The analyzer as recited in claim 1, wherein the LLM is an offline LLM.
  • 7. The analyzer as recited in claim 6, wherein the offline LLM is a fine-tuned model.
  • 8. The analyzer as recited in claim 1, wherein the directing includes sending the curated domain prompt and at least a portion of the oil and gas textual data to the LLM.
  • 9. A method of automatically analyzing oil and gas textual data, comprising: obtaining a curated domain prompt that identifies an oil and gas operation event and a parameter associated with the oil and gas operation event; automatically extracting, using a large language model (LLM), event data from oil and gas textual data based on the oil and gas operation event and the parameter, and automatically generating an event summary that correlates the oil and gas operation event and the event data.
  • 10. The method as recited in claim 9, further comprising filtering the oil and gas textual data based on the oil and gas operation event, wherein the automatically extracting the event data is from the filtered oil and gas textual data.
  • 11. The method as recited in claim 9, wherein the event summary includes the parameters, the event data associated with the parameters, and the oil and gas operation event for each of the parameters.
  • 12. The method as recited in claim 9, wherein the oil and gas textual data includes pre-operation, active operation, and post operation textual data for the oil and gas operation event.
  • 13. The method as recited in claim 9, wherein the LLM is an offline LLM.
  • 14. The method as recited in claim 9, wherein the oil and gas textual data includes unstructured text.
  • 15. The method as recited in claim 9, wherein the automatically extracting includes augmented generation by the LLM.
  • 16. The method as recited in claim 9, wherein the automatically generating includes generating the event summary into a structured format identified in the domain prompt.
  • 17. The method as recited in claim 9, further comprising obtaining the oil and gas textual data and the curated domain prompt from a data reservoir.
  • 18. A textual analysis system for oil and gas textual data, comprising: a data reservoir configured to store oil and gas textual data; and an automated analyzer for reviewing the oil and gas textual data, including: an interface configured to receive a curated domain prompt that identifies an oil and gas operation event and a parameter associated with the oil and gas operation event; and one or more processors to perform operations including: instructing a large language model (LLM) to extract event data from oil and gas textual data based on the oil and gas operation event and the parameter, and receiving an event summary from the LLM that correlates the oil and gas operation event and the event data.
  • 19. The well operation textual analysis system as recited in claim 18, wherein the instructing includes sending the curated domain prompt and the oil and gas textual data to the LLM.
  • 20. The well operation textual analysis system as recited in claim 18, wherein the operations further include sending the event summary to a well operation system for enacting or altering a well operation.