Modern drilling rigs or production platforms typically stream data from dozens or hundreds of sensors and IoT devices. This data can easily be viewed as logs, images, or sounds. Data scientists build artificial intelligence/machine learning models using historical data to be run against the same type of streams for prediction. Engineers need to analyze drilling or production performance using standard tools of the trade for business intelligence (BI).
A typical BI analysis involves comparing performance over time, against plans, or against peers. The streaming data is considered a “fact”, with many such facts streaming in parallel at various frequencies. Engineers need to establish “dimensions” against which comparable facts can be evaluated. Creating such dimensions takes a lot of manual effort and is prone to errors, as most data streams are difficult to identify, differentiate, or group. As such, what is needed is an automated technique that can accurately identify and “dimension” data streams from various oil field service companies and create an environment that engineers may use for BI analysis.
Reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
Current real-time systems in the petroleum industry do not automatically differentiate between the various streams coming from the various sensors and IoT devices of the service providers. But with IoT devices and sensors becoming more prevalent and diverse, it is realized herein that manually differentiating streams from various sensors and IoT devices will become more challenging, not only because of the increasing number of signals, but also because of the increasingly subtle differences between them.
While taxonomies, ontologies, and semantic descriptions are still relatively new and abstract concepts in the petroleum industry, they may be used to differentiate or “dimension” the streams coming from different petroleum service providers and machines. It is also realized herein that once the streams are logically dimensioned, the associated data or “facts” can be accessed through those logical dimensions from a standard BI system supporting oil field data.
The current disclosure introduces a technique that can identify each incoming data stream and automatically associate it with descriptive taxonomies so that the streams can be used to analyze the performance of an asset in a petroleum service environment. In one example, the introduced technique analyzes characteristics of incoming data streams and, based on the analysis, automatically identifies the sources of the data streams. Using the identified sources, the introduced technique maps the incoming data to descriptive taxonomies and stores the values and taxonomies of the data streams as facts and dimensions in a business intelligence (BI) data repository or warehouse.
By automatically analyzing and identifying the source of each incoming data stream, the introduced technique can significantly reduce human intervention and effort, and hence reduce the number of costly human errors that were previously inevitable. Also, by associating the incoming data streams with the taxonomies and storing them as facts and dimensions in a BI data repository, the technique can create an environment for BI analysis with various sets of logically nested dimensions and facts. This allows an end user to oversee various ongoing and past operations and analyze the performance of assets in those operations. Furthermore, by leveraging the similar hierarchical structures and elements of the taxonomies, the introduced technique can expand and combine taxonomies, allowing an end user to analyze the performance of assets not only in a specific rig or platform, but across multiple rigs or platforms and even across multiple projects, which was not previously possible.
It is understood that the term “petroleum service environment” in the current disclosure is used to indicate any environment where petroleum services of various kinds can take place. The term “petroleum service environment” thus is not limited to any specific rig or platform, or even to any specific area or region. It is also understood that in the current disclosure, the terms “data stream” and “signal” are used interchangeably.
In the illustrated example, the system 102 includes a centralized system 106 that is communicatively connected to the service providers 110, 112, 114 via multiple edge devices 107, 108. Each of the edge devices 107, 108 is located proximate to the location, region, or specific rig or platform for which the respective edge device 107, 108 is responsible. For example, the edge device #1 107 may be located near the first location 150 where the service providers 110, 112 are operating, and the edge device #2 108 may be located near the second location 160 where the service provider 114 is operating.
Each of the centralized system 106 and the edge devices 107, 108 may be implemented using a computing system that is capable of performing the introduced signal discovery and taxonomy techniques. As such, the introduced techniques can be performed entirely by any one of the centralized system 106 and the edge devices 107, 108, or performed by them together, in portions. The centralized system 106 may be a remote server, such as a cloud server, that may be accessed using conventional means, such as the internet, while the edge devices 107, 108 may be accessible only through a private network connection.
It is understood that the number of systems 102, and the numbers of centralized systems 106 and edge devices 107, 108 within the system 102, are not limited to those shown and can vary. It is also understood that since the centralized system 106 is capable of performing the introduced technique by itself, the edge devices 107, 108 may be omitted in some instances. The edge devices 107, 108, however, can serve as an additional security measure due to their private nature and also as additional points of accessibility for the end user.
The service providers 110, 112, 114 represent companies that provide oilfield-related services, such as directional drilling companies, production companies, mud logging companies, mud providers, and along-string measurement vendors. In the illustrated example, the service providers 110, 112 are operating at an oil rig in one location 150, while the service provider 114 is operating at a platform in another location 160. It is understood that the numbers of service providers and of oil rig and platform locations are not limited to those shown and can vary.
The service providers 110, 112, 114 receive various data, including measurements and commands, from various machines 120, 122, 124, 126, 128 and stream the data as signals to the system 102 in real time for signal discovery and taxonomy. The service providers 110, 112, 114 are communicatively connected to their respective machines 120, 122, 124, 126, 128 using conventional wired connection means, such as an umbilical, a private network, or a virtual private network, or conventional wireless connection means, such as the internet.
Upon receiving a signal, the system 102 can automatically deduce the source of the received signal, i.e., from which sensor or IoT device in which of the machines 120, 122, 124, 126, 128 the signal was transmitted. Once the source is deduced, the system 102 can automatically associate values of the signal with one or more predefined taxonomies of the petroleum service environment 100 based on the source of the signal. These values and their taxonomies can be used on the fly or stored in a BI data repository, e.g., a data warehouse, for future BI analysis. The system 102 can utilize the BI data repository for analyzing the performance of any asset, e.g., 120, 122, 124, 126, 128, in the petroleum service environment 100.
The processor 210 implements an auto-discovery module (ADM) 212 and a taxonomy module (TXM) 214. The processor 210 may implement each of the modules 212 and 214 using a computer model, such as a mathematical model or a machine learning model. The processor 210 can feed a signal received from the interface 220 to the ADM 212 and relay data processed by the ADM 212 and the TXM 214 to the interface 220. The processor 210 can also allow the ADM 212 and the TXM 214 to communicate with one another. The processor 210 may be any data processing unit, such as a central processing unit, a graphics processing unit, or a hardware accelerator.
The interface 220 allows the system 200 to communicate with various service providers and machines in a petroleum service environment, and also with an end user, e.g., an engineer or analyst. Although illustrated as a single unit, the interface 220 may be separated into multiple units based on the communication methods or protocols the interface 220 needs to support. For example, there may be one interface for interacting with the service providers and machines and another for interacting with the users. The interface 220 may be a conventional network interface controller that is either discrete from or integrated into the processor 210.
The memory 230 allows the system 200 to store and access data that is needed in performing the introduced technique. For example, the memory 230 may store a series of instructions that, when executed, cause the processor 210 to perform methods, such as 300 and 500 in
In the illustrated example, when provided with a signal received by the interface 220, the ADM 212 can automatically deduce the source of the signal based on its characteristics and assign a unique source identifier to the signal. Some of the characteristics of the signal that are considered include a sampling rate, a range of values of the signal, a signature pattern/fingerprint of the signal, a feature/behavior of the signal, and a correlation of the signal with previously received identified or unidentified signals.
In the illustrated example, the TXM 214 provides an environment for business intelligence (BI) analysis by creating and maintaining a BI data warehouse/repository. The TXM 214 defines one or more taxonomies of the petroleum service environment and populates the one or more taxonomies with hierarchies of elements. When the unique source identifier of a received signal is provided by the ADM 212, the TXM 214 updates the BI data repository by associating the signal with one or more taxonomies of the petroleum service environment and storing them in the BI data repository. Data stored in the BI data repository, e.g., the values, sources, and taxonomies of signals, can be sorted and accessed using the unique source identifier as a key to analyze the performance of an asset.
It is understood that the system 200 may be implemented using multiple computing systems, e.g., as a combination of a centralized system and edge devices, and that not all of the functions and components of the system 200 have to be in a single system as shown. For example, the above-described functions of the ADM 212 and the TXM 214 can be implemented using multiple computing systems, and components such as the processor 210 and the memory 230 can be in separate computing systems.
The method 300 is generally divided into four stages. During the first stage 310, which includes steps 305-309, the current signal is classified based on its generic and range-based categories. During the second stage 320, which includes steps 321-324, the current signal is further processed to extract its feature/behavior. During the third stage 330, which includes steps 331-334, possible sensors from which the current signal may have been transmitted are found. During the fourth stage 340, which includes steps 341-344, the current signal is assigned a unique ID that identifies its source. The method 300 starts at step 302.
Going into the first stage 310, at step 305 the ADM determines a generic category of the current signal, i.e., its service provider and format. The service provider and format of the current signal may be determined based on the transmitter of the current signal, a file extension, and/or the communication protocol used to transmit the current signal. The communication protocol may include any conventional file or packet extension/form and data transfer protocol/standard/specification used in the petroleum industry, such as Wellsite Information Transfer Specification (WITS), Wellsite Information Transfer Standard Markup Language (WITSML), OPC Classic, OPC UA and DA, Modbus, Profinet, Profibus, and other proprietary and standard communication protocols. For example, the current signal may be determined to be from a specific drilling contractor at a particular rig/platform based on a stream or interface ID assigned to that drilling contractor, and the format of the signal may be determined based on an extension of the current signal, e.g., .xls for an Excel file.
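The step 305 lookup can be sketched in Python as below. This is a minimal illustration under assumed inputs: the stream IDs, provider names, and the `STREAM_REGISTRY` and `EXTENSION_FORMATS` tables are hypothetical, and a real deployment would populate them from its own interface configuration rather than from any standard protocol library.

```python
from pathlib import PurePath
from typing import NamedTuple, Optional

# Hypothetical registry: stream/interface ID -> (service provider, rig or platform).
STREAM_REGISTRY = {
    "WITS-07": ("Drilling Contractor A", "Rig 12"),
    "OPCUA-03": ("Mud Logging Company B", "Platform 4"),
}

# Hypothetical mapping from file extension to data format.
EXTENSION_FORMATS = {
    ".xls": "Excel workbook",
    ".xlsx": "Excel workbook",
    ".csv": "comma-separated values",
    ".xml": "XML document (possibly WITSML)",
}


class GenericCategory(NamedTuple):
    provider: Optional[str]
    location: Optional[str]
    data_format: str


def generic_category(stream_id: str, protocol: str,
                     filename: Optional[str] = None) -> GenericCategory:
    """Step 305 sketch: derive provider and format from transport metadata."""
    provider, location = STREAM_REGISTRY.get(stream_id, (None, None))
    if filename is not None:
        ext = PurePath(filename).suffix.lower()
        data_format = EXTENSION_FORMATS.get(ext, f"unknown ({ext or 'no extension'})")
    else:
        # A live stream without a file: the transfer protocol names the format.
        data_format = f"{protocol} stream"
    return GenericCategory(provider, location, data_format)


print(generic_category("WITS-07", "WITS", "daily_log.XLS"))
```

Keying on the stream or interface ID first mirrors the drilling-contractor example above; the extension or protocol then only has to resolve the format.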
At step 306, the current signal is observed over a period of time. While or after observing, the ADM determines the range-based category of the current signal over steps 307-309. In the illustrated example, the ADM determines a sampling rate of the current signal, i.e., whether its values are discrete or continuous over time, at step 307, determines whether the minimum value of the current signal is greater than, less than, or equal to zero at step 308, and determines a range of values of the current signal at step 309. It is understood that while they are performed in an ascending order in
Based on these determinations made during the first stage 310, the ADM can determine the generic and range-based categories to which the current signal belongs. These categories are used later in the method 300 to make an educated guess about the general identity of the signal. For example, a continuous nature usually indicates that the current signal is a measurement signal, while a discrete nature usually indicates that it is a command or set point from an operator. If the minimum value of the current signal is always greater than zero, it may indicate a measurement that is always positive, such as a position above ground or sea level. Also, if the values hover around a large number such as 20,000, the signal can indicate a depth measurement, but it cannot indicate the volume or peak level of a tank, as such values hover around 35.
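The range-based checks of steps 307-309, together with the heuristics just described, might look like the following sketch. The change-ratio test used to tell a discrete set point from a continuous measurement, its 0.05 threshold, and the function name `range_based_category` are assumptions made for illustration.

```python
from statistics import median


def range_based_category(timestamps, values, change_ratio_threshold=0.05):
    """Stage-1 heuristics (steps 307-309) on one observed window of samples."""
    if len(values) < 2 or len(timestamps) != len(values):
        raise ValueError("need at least two aligned samples")
    # Step 307: median spacing between samples as the sampling interval.
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    sampling_interval = median(intervals)
    # Assumed heuristic: a command or set point changes rarely, so only a small
    # fraction of consecutive samples differ; a measurement changes almost every sample.
    changes = sum(1 for a, b in zip(values, values[1:]) if a != b)
    change_ratio = changes / (len(values) - 1)
    nature = "discrete" if change_ratio <= change_ratio_threshold else "continuous"
    # Step 308: sign of the minimum value.
    low, high = min(values), max(values)
    if low > 0:
        minimum_sign = "greater than zero"
    elif low < 0:
        minimum_sign = "less than zero"
    else:
        minimum_sign = "equal to zero"
    return {
        "sampling_interval": sampling_interval,
        "nature": nature,
        "likely_kind": "measurement" if nature == "continuous" else "command or set point",
        "minimum_sign": minimum_sign,
        "value_range": (low, high),  # Step 309
    }
```

A value range clustered near 20,000 versus near 35 can then be checked against the returned `value_range` to separate depth-like from tank-level-like signals.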
At the second stage 320, the ADM processes the current signal to obtain its features. First, at step 321, the ADM performs an outlier weight evaluation by clustering the values of the current signal. This allows the ADM to distinguish the main range or body of the current signal. Some of the clustering algorithms that may be used at step 321 include k-means clustering, e.g., with k from 2 to 6, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
At step 322, the ADM detects and removes outlier values from the signal. From the clustered values, the ADM can determine whether the values at the edges of or outside the main body are outliers by determining their consistency throughout the observed time period. For example, values that consistently/repeatedly sit outside or at the edge of the main body across the entire observed time period are not considered outliers, whereas values that sit outside or at the edge of the main body only a few times, e.g., once or twice throughout the entire observed time period, can be considered outliers.
Once the outliers are removed, at step 323 the ADM normalizes the remaining values so that they are on a notionally common scale with the values of other signals to be compared. Any conventional normalization technique can be used.
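One way to approximate steps 321-322 is sketched below, assuming a one-dimensional k-means (k = 3 by default, within the 2 to 6 range mentioned above) and treating the most populated cluster as the main body. The consistency test is simplified here to an occurrence count: an off-body cluster seen more than `max_occurrences` times is kept, otherwise it is dropped. The function names and thresholds are hypothetical, and DBSCAN could replace the clustering step.

```python
from collections import Counter


def kmeans_1d(values, k, iterations=100):
    """Plain 1-D k-means; returns one cluster label per value."""
    ordered = sorted(values)
    # Spread the initial centroids across the sorted values (quantile seeding).
    centroids = [ordered[int(i * (len(ordered) - 1) / max(k - 1, 1))] for i in range(k)]
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            # An empty cluster keeps its previous centroid.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def remove_outliers(values, k=3, max_occurrences=2):
    """Steps 321-322 sketch: drop sparse off-body clusters, keep recurring ones."""
    labels = kmeans_1d(values, k)
    counts = Counter(labels)
    main_body = counts.most_common(1)[0][0]
    return [v for v, label in zip(values, labels)
            if label == main_body or counts[label] > max_occurrences]
```

A closer match to the text would count occurrences per time window rather than per sample, so that an edge level recurring throughout the period is distinguished from a single burst.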
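Since any conventional normalization technique can be used, the sketch below picks z-score standardization as one assumed choice for step 323; min-max scaling would serve equally well. The function name `z_score` is illustrative.

```python
from statistics import fmean, pstdev


def z_score(values):
    """Step 323 sketch: rescale to zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    spread = pstdev(values, mean)
    if spread == 0:
        # A flat signal carries no scale information; map it to all zeros.
        return [0.0] * len(values)
    return [(v - mean) / spread for v in values]
```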
At step 324, the ADM performs a signal transformation on the normalized values to extract a distinguishable feature/behavior of the current signal. Some of the signal transformation techniques that may be used include mathematical algorithms such as the Discrete Fourier Transform (DFT), the Fast Fourier Transform (FFT), and deconvolution, or other feature extraction techniques, such as those in the tsfresh library, which provides standard behavior description algorithms such as Benford correlation.
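As a rough illustration of step 324, the sketch below computes two features using only the Python standard library: the dominant frequency from a naive DFT (an FFT gives the same result faster) and a Benford correlation, taken here as the Pearson correlation between the observed first-digit frequencies and Benford's law, log10(1 + 1/d). This reimplements the idea rather than calling tsfresh, and the function names are hypothetical.

```python
import cmath
import math


def dominant_frequency(values, sample_rate):
    """Naive DFT: frequency (Hz) of the strongest non-DC bin up to Nyquist."""
    n = len(values)
    best_bin, best_magnitude = 0, -1.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin * sample_rate / n


def _pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def benford_correlation(values):
    """Correlation between first-digit frequencies and Benford's law."""
    # Scientific notation puts the first significant digit at index 0.
    digits = [int(f"{abs(v):e}"[0]) for v in values if v != 0 and math.isfinite(v)]
    if not digits:
        return 0.0
    observed = [digits.count(d) / len(digits) for d in range(1, 10)]
    expected = [math.log10(1 + 1 / d) for d in range(1, 10)]
    return _pearson(observed, expected)
```

Either feature, or both, could feed the behavior matching of the third stage as part of a feature vector.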
During the third stage 330, the ADM matches the behavior of the current signal determined at step 324 with physics-based general behaviors of signals. First, at step 331, the ADM compares the extracted behavior/feature of the current signal with the physics-based general behaviors of signals to find one or more of the physics-based general behaviors that closely match the extracted behavior of the current signal.
Some of the general behaviors include: a first behavior 510 illustrated in
It is understood that the physics-based general behaviors of signals are not limited to the illustrated behaviors and may include other expected behaviors, such as a behavior expected from the values of a fluid or air temperature sensor, which hold steady and evolve slowly over time, or a behavior expected from a vibration sensor of a downhole tool, whose values vary widely but generally around a single mean associated with a setting of the tool, with one or more frequencies associated with a formation response.
At step 332, the ADM searches a database of historic sensor signals for those that exhibit the one or more physics-based general behaviors found at step 331. The database may be prepared in advance, e.g., before the method 300, using historic sensor signals of the current petroleum service environment, i.e., the one from which the current signal is streaming, or of other environments located close to the current one.
At step 333, the ADM determines the current signal's correlations with sensors using the historic sensor signals found at step 332. Because the current signal and the historic sensor signals found at step 332 share similar features, the current signal's correlations can be determined by finding the sensors associated with those historic sensor signals.
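Step 331 can be pictured as a nearest-prototype match in a small feature space. The sketch below assumes two illustrative features (roughness, the mean absolute step divided by the standard deviation, and relative spread, the standard deviation divided by the mean magnitude) and two hypothetical prototype vectors for the temperature-like and vibration-like behaviors described above; real prototypes would be derived from the physics of each sensor type.

```python
import math
from statistics import fmean, pstdev

# Hypothetical prototype feature vectors: (roughness, relative spread).
BEHAVIOR_PROTOTYPES = {
    "steady, slowly evolving (e.g., fluid temperature)": (0.05, 0.02),
    "wide variation around one mean (e.g., downhole vibration)": (1.5, 0.3),
}


def behavior_features(values):
    """Roughness = mean |step| / std; relative spread = std / |mean|."""
    spread = pstdev(values)
    if spread == 0:
        return (0.0, 0.0)
    roughness = fmean([abs(b - a) for a, b in zip(values, values[1:])]) / spread
    relative_spread = spread / max(abs(fmean(values)), 1e-9)
    return (roughness, relative_spread)


def closest_behaviors(values, prototypes=BEHAVIOR_PROTOTYPES, top_n=1):
    """Step 331 sketch: rank prototype behaviors by feature-space distance."""
    features = behavior_features(values)
    ranked = sorted(prototypes, key=lambda name: math.dist(features, prototypes[name]))
    return ranked[:top_n]
```

A slow temperature-like ramp lands near the first prototype, while a signal that swings widely around one mean lands near the second.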
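Steps 332-333 might be sketched as a filter-then-rank over a historic signal database. In this assumed version, each historic record carries a sensor type, a behavior label, and a feature vector; the records, labels, and the use of Euclidean feature distance as a stand-in for correlation are all hypothetical.

```python
import math

# Hypothetical historic records: (sensor type, matched behavior, feature vector).
HISTORIC_SIGNALS = [
    ("mud pump discharge pressure", "cyclic", (1.2, 0.25)),
    ("standpipe pressure", "cyclic", (0.9, 0.10)),
    ("mud temperature", "steady", (0.04, 0.02)),
    ("top drive torque", "cyclic", (1.6, 0.40)),
]


def candidate_sensors(features, matched_behaviors, historic=HISTORIC_SIGNALS, top_n=2):
    """Steps 332-333 sketch: filter by behavior, then rank sensors by feature distance."""
    pool = [(sensor, vec) for sensor, behavior, vec in historic
            if behavior in matched_behaviors]
    pool.sort(key=lambda item: math.dist(features, item[1]))
    ranked = []
    for sensor, _ in pool:
        if sensor not in ranked:  # keep only the closest record per sensor type
            ranked.append(sensor)
    return ranked[:top_n]
```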
In one example, instead of going through steps 331-332, the ADM may directly determine the current signal's correlations with sensors using the features extracted at step 324. In that example, the method 300 may use a machine learning model that has been trained to determine the correlated sensors of a signal based on the signal's features. The machine learning model may be trained using historic sensor signals of the current petroleum service environment or of other environments located close to the current environment.
At step 334, the ADM reviews previously identified sensor signals to reinforce the determination made at step 333. Of all the previously identified sensor signals, those identified as coming from one of the sensors determined at step 333 are of particular interest to the ADM. For example, by comparing the number of signal streams already identified as coming from a particular sensor with the number of pieces of equipment at a rig that have that sensor, the ADM can deduce whether the current signal is likely coming from that particular sensor.
By the end of the third stage 330, the ADM can make an educated guess of the identity of the current signal. While the guess may not be 100% accurate at this point, it may be good enough to determine at least a few generic taxonomic types, e.g., certain types or kinds of sensors, for the current signal.
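For the machine learning variant, any classifier trained on historic features could serve. As an assumed, deliberately minimal example, a nearest-centroid model learns one mean feature vector per sensor type and predicts the closest one; the class name and training data are illustrative only.

```python
import math


class NearestCentroidSensorModel:
    """Minimal trainable model: one mean feature vector per sensor type."""

    def fit(self, feature_vectors, sensor_labels):
        sums, counts = {}, {}
        for vec, label in zip(feature_vectors, sensor_labels):
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += x
            counts[label] = counts.get(label, 0) + 1
        # Average the accumulated features for each sensor type.
        self.centroids = {label: tuple(x / counts[label] for x in acc)
                          for label, acc in sums.items()}
        return self

    def predict(self, vec):
        """Return the sensor type whose centroid is closest to `vec`."""
        return min(self.centroids, key=lambda label: math.dist(vec, self.centroids[label]))
```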
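The reinforcement check of step 334 reduces to counting, as in this sketch. It assumes a per-rig inventory of how many physical sensors of each type exist; a candidate whose sensors are all already claimed by identified streams is dropped. `reinforce` and the inventory format are hypothetical.

```python
from collections import Counter


def reinforce(candidates, identified_sources, sensor_inventory):
    """Step 334 sketch: keep candidates that still have unclaimed physical sensors."""
    already_identified = Counter(identified_sources)
    return [c for c in candidates
            if sensor_inventory.get(c, 0) - already_identified[c] > 0]
```

For a rig with three mud pumps whose three pressure streams are already identified, a fourth pressure-like stream is unlikely to be another mud pump pressure sensor.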
The method 300 proceeds to the fourth stage 340, where the ADM assigns a specific ID to the current signal. At step 341, the first step of the fourth stage 340, the ADM selects one or more generic taxonomic types of the sensors determined at step 334. More specifically, the one or more generic taxonomic types are selected based on the results from the first and third stages 310 and 330, such as the generic and range-based categories determined at steps 305 and 307-309 and the sensors identified at steps 333 and 334. For example, based on the results of the first and third stages 310 and 330, the ADM can determine that the source of the current signal is one of the mud pump sensors of three mud pumps operated by a particular drilling service provider at a particular rig.
At step 342, the ADM narrows down the possible sensor choices by observing the current signal during known contextual events that involve one or more of the generic taxonomic sensor types of step 341. The contextual events may include certain operations that involve certain equipment and its sensors. For example, by observing the current signal during a series of mud pump tests that determine what are known as “Slow Circulating Rates” of each pump under the current mud conditions, the ADM can determine to which of the pumps the current signal belongs.
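The slow-circulating-rate example of step 342 can be sketched as a contrast score, assuming the pumps are tested one at a time during known, non-overlapping time windows. Each pump scores the fraction of samples in which the signal is active inside its window minus the fraction outside it; the activity threshold, window format, and function names are assumptions.

```python
def _active_fraction(values, threshold):
    return sum(v > threshold for v in values) / len(values) if values else 0.0


def pump_for_signal(samples, test_windows, active_threshold):
    """Step 342 sketch.

    samples: list of (time, value); test_windows: {pump: (start, end)}, end exclusive.
    Returns the best-matching pump and the score for every pump.
    """
    scores = {}
    for pump, (start, end) in test_windows.items():
        inside = [v for t, v in samples if start <= t < end]
        outside = [v for t, v in samples if not start <= t < end]
        # Active during this pump's test and idle otherwise gives a high contrast.
        scores[pump] = (_active_fraction(inside, active_threshold)
                        - _active_fraction(outside, active_threshold))
    return max(scores, key=scores.get), scores
```

Scoring the contrast rather than only the in-window activity keeps a signal that is active all the time from matching every pump equally.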
At step 343, the ADM determines the source of the current signal based on the results of step 342 and assigns a unique source identifier, which corresponds to one of the sensors or IoT devices in the petroleum service environment. It is understood that the source determination process does not have to be identical to the process discussed above. For example, instead of being used at step 341, the results of the first stage 310 can be used at step 343, or even used with the machine learning model mentioned above.
At step 344, the current signal is continuously monitored for a predefined time period to confirm the source determined at step 343. Step 344 ends when the time period expires or when a confirmation or a fault (mismatch) is detected. A confirmation occurs when another characteristic of the source is found during step 344, and a fault occurs when the current signal behaves out of character, e.g., when it does not produce the expected data, such as when it goes out of the expected range. When a fault occurs, the method 300 may loop back to the beginning of the fourth stage 340 and try another generic taxonomic type or sensor. Once the confirmation is made, the method 300 moves to step 350 and ends.
The illustrated method 300 may be performed automatically, i.e., without any user intervention other than receiving a signal to be identified. Once started, the method 300 may execute each step and move on to the next step automatically until it reaches the last step 350. As the method 300 does not require any human intervention, it can significantly reduce the time and effort normally required for a human to manually analyze and identify signals. Also, because the method 300 utilizes computer models, e.g., mathematical and machine learning models, it can also reduce human errors, intentional or unintentional, in analyzing and identifying signals.
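The three exits of step 344 (confirmation, fault, or expiry of the monitoring period) map onto a simple loop. In this sketch, the expected range is assumed to come from step 343, `has_confirming_characteristic` is a hypothetical caller-supplied check standing in for "another characteristic of the source", and the monitoring period is expressed as a sample count for simplicity.

```python
def confirm_source(samples, expected_range, has_confirming_characteristic, max_samples):
    """Step 344 sketch: returns 'confirmed', 'fault', or 'timeout'."""
    low, high = expected_range
    history = []
    for value in samples[:max_samples]:
        if not low <= value <= high:
            return "fault"  # out of character: the caller loops back to step 341
        history.append(value)
        if has_confirming_characteristic(history):
            return "confirmed"
    return "timeout"  # monitoring period expired without a decision
```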
It is understood that the method 300 does not need to be performed by a single computing system as illustrated. The method 300 may be performed using multiple computing systems, e.g., using processors of a centralized system and edge devices, with each system possibly carrying out different steps of the method 300.
At step 710, generic taxonomies of a petroleum service environment, such as 100 in
The taxonomies include, for example, a first taxonomy related to the physical locations of equipment in the petroleum environment, a second taxonomy related to the types of operations being performed in the petroleum environment, and a third taxonomy related to the types of measurements being used in the petroleum environment. Depending on the taxonomy to which the elements belong, each element represents equipment in the petroleum environment, a unit of measurement used in the petroleum environment, or an operation being performed in the petroleum environment.
For example, under the physical location taxonomy, a voltage sensor might be associated with top drive rotation, the top drive, derrick and mast equipment, and the rig, and a flow sensor might be associated with pump #2 outflow, mud pumps, surface equipment, and the rig. Under the operation type taxonomy, the top drive rotation of the voltage sensor may be associated with drill pipe handling, the lifting system, and rig management, while the pump #2 outflow of the flow sensor may be associated with well inflow, mud circulation, fluid circulation, fluid management, and rig management. Under the measurement type taxonomy, rotation measurements from rotating equipment, such as top drives and mud motors, may be associated with each other, and pressure measurements from flow pumping equipment may be associated with each other.
It is understood that while the specific presence or number of units, their characteristics, and their associated sensors may vary, the general taxonomy of drilling rigs or production units does not vary much. As such, once the higher-level taxonomy is built, only the lower levels may need to be updated over time. It is also understood that the taxonomies may include some elements that are calculated based on a combination of sensor data.
At step 720, the specific sensor or IoT device from which a signal was transmitted is associated with at least one of the lowest elements in each type of taxonomy. The specific sensor or IoT device can be indicated by the unique source identifier of the signal. Depending on the standard protocol used to stream the current signal (WITS, WITSML, MQTT, OPC UA/DA, Modbus, etc.), step 720 can be performed automatically. When the taxonomies contain no element with which the specific sensor/IoT ID can be associated, the method 700 moves back to step 710 to add a corresponding element to the taxonomies.
Since each taxonomy is a hierarchy of elements, when a sensor is associated with one of the lowest elements of a taxonomy, the signal automatically becomes associated with the other elements in the taxonomy that are connected, directly or indirectly, to that lowest element. For example, if a pressure sensor is associated with a first mud pump of a rig in a physical location taxonomy, that pressure sensor automatically becomes associated with other equipment in the taxonomy that is connected to the first mud pump, such as the second and third mud pumps located at the same level of the hierarchy as the first mud pump, along with other equipment located at higher levels of the hierarchy.
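Step 720, with its fallback to step 710, can be illustrated with a small tree, as sketched below. Associating a unique source ID with a lowest element creates any missing elements on the way down, and the association then extends to the element's ancestors and same-level siblings, as in the mud pump example. The class and function names are hypothetical.

```python
class TaxonomyNode:
    """One element of a taxonomy hierarchy (e.g., Rig > Mud pumps > Mud pump #1)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.sensors = set()

    def child(self, name):
        # Step 710 fallback: add the element if the taxonomy lacks it.
        if name not in self.children:
            self.children[name] = TaxonomyNode(name, parent=self)
        return self.children[name]


def associate(root, path, source_id):
    """Step 720 sketch: attach a unique source ID to the lowest element on `path`."""
    node = root
    for name in path:
        node = node.child(name)
    node.sensors.add(source_id)
    return node


def related_elements(node):
    """Ancestors (bottom-up) and same-level siblings the association extends to."""
    ancestors = []
    parent = node.parent
    while parent is not None:
        ancestors.append(parent.name)
        parent = parent.parent
    siblings = sorted(n for n in node.parent.children if n != node.name) if node.parent else []
    return ancestors, siblings
```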
Steps 730 and 740 are optional steps that may be performed when requested by a user. At step 730, signals from the sensors and IoT devices in the petroleum environment are streamed. The signals may show individual log traces, gauges, videos, etc. and variations of values over time (frequency might vary depending on the types and characteristics of sensors).
At step 740, at least one of the streamed signals is displayed in an advanced view using the element hierarchies of the taxonomies. For example, the advanced view may show a physical group taxonomy of an asset, such as a circulation system or a rig, giving the user a more holistic view of operations, e.g., a Digital Twin.
Upon the association at step 720, the method 700 may automatically proceed to step 750, where the values of a signal are stored as facts in fact tables, with the unique source identifier of the signal as a key. Instead of being stored in a conventional system such as a SCADA system, a DCS, or a historian, the values are brought into a BI data repository/data warehouse and stored as facts in fact tables in the method 700. Sensor metadata is stored in parallel, also with the unique source identifier as the key.
As the values of the signal are stored in a fact table at step 750, the elements of the taxonomies of the signal are stored as dimensions in a dimension table at step 755. Dimensions represent the reference information that gives context to the related facts. As with the facts, the unique source identifier serves as a key to the dimension table.
It is understood that using a BI data repository/data warehouse provides the following benefits. First, it allows data from multiple sources to be integrated into a single database and data model; second, it mitigates the problem of database isolation level lock contention in transaction processing systems caused by attempts to run large, long-running analysis queries in transaction processing databases; third, it maintains data history, even if the source transaction systems do not; fourth, it integrates data from multiple source systems, enabling a central view across the enterprise; fifth, it improves data quality by providing consistent codes and descriptions and by flagging or even fixing bad data; sixth, it presents the organization's information consistently and provides a single common data model for all data of interest regardless of the data's source; seventh, it restructures the data so that it makes sense to business users and delivers excellent query performance, even for complex analytic queries, without impacting the operational systems; eighth, it adds value to operational business applications, notably customer relationship management (CRM) systems; and finally, it makes decision-support queries easier to write and organizes and disambiguates repetitive data.
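A minimal star schema for steps 750 and 755 can be built with Python's built-in `sqlite3` module, as sketched below. The table and column names, the slash-separated taxonomy paths, and the sample rows are assumptions; a production BI warehouse would follow its own platform and schema conventions.

```python
import sqlite3


def build_warehouse(dimension_rows, fact_rows):
    """Steps 750/755 sketch: star schema keyed by the unique source identifier."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_source (
            source_id        TEXT PRIMARY KEY,
            location_path    TEXT,
            operation_path   TEXT,
            measurement_type TEXT
        );
        CREATE TABLE fact_signal (
            source_id TEXT REFERENCES dim_source(source_id),
            ts        REAL,
            value     REAL
        );
    """)
    conn.executemany("INSERT INTO dim_source VALUES (?, ?, ?, ?)", dimension_rows)
    conn.executemany("INSERT INTO fact_signal VALUES (?, ?, ?)", fact_rows)
    return conn


def average_by_measurement(conn):
    """Join facts to their dimensions through the shared key and aggregate."""
    query = """
        SELECT d.measurement_type, AVG(f.value)
        FROM fact_signal AS f JOIN dim_source AS d USING (source_id)
        GROUP BY d.measurement_type
    """
    return dict(conn.execute(query).fetchall())
```

Keeping the taxonomy elements in the dimension table and only the timestamped values in the fact table means any taxonomy level can be added to the `GROUP BY` without touching the facts.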
At step 760, an environment for BI analysis is created from the BI data repository/warehouse containing the facts and dimensions. The step 760 may be performed automatically following steps 750 and 755. The warehouse holds everything needed for auto-generation of multi-level analysis, with the associations clearly matched.
At step 770, an asset performance is analyzed. An end user can now use the facts and dimensions to interrogate the data and slice and dice it. For instance, a power consumption analysis can be performed by comparing all sensors from the medium and high amperage groups. Also, as the taxonomies may include or be supplemented with calculated elements based on combinations of sensor data, an analysis can be made with respect to an entire rig or activity, such as analyzing the total consumption, total emissions, or total fluid or solid disposal of an entire rig. The method 700 ends at step 775.
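The rig-level analysis described for step 770 amounts to rolling facts up the taxonomy. This sketch assumes each source ID maps to a location path (e.g., rig, equipment group, unit) and sums its values at every level, so totals for a single pump, a pump group, and the whole rig come out of one pass. The names and figures used are hypothetical.

```python
from collections import defaultdict


def roll_up(facts, location_paths):
    """Sum fact values at every level of each source's location hierarchy."""
    totals = defaultdict(float)
    for source_id, value in facts:
        path = location_paths[source_id]
        # Credit the value to each prefix: rig, rig / group, rig / group / unit, ...
        for depth in range(1, len(path) + 1):
            totals[" / ".join(path[:depth])] += value
    return dict(totals)
```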
It is understood that the method 700 does not need to be performed by a single computing system as illustrated. The method 700 may be performed using multiple computing systems, e.g., using processors of a centralized system and edge devices, with each system possibly carrying out different steps of the method 700.
It is understood that at least a portion of the method 700 may be automated. For example, once started, the method 700 may proceed automatically up to step 760 without any user intervention or interaction. In doing so, the method 700 can significantly reduce the time and effort normally required for a human to manually create a BI data repository and environment. Also, because the method 700 utilizes mathematical and machine learning models that are much more accurate than humans, the method 700 can also reduce human errors, intentional or unintentional.
It is also understood that the created BI data repository is not used only for the asset performance analysis. The BI data repository can be also used for other business purposes such as performing analytics that quantifies processes for a business to arrive at optimal decisions and to perform business knowledge discovery, conducting business reporting that informs business strategy, facilitating collaboration both inside and outside the business by enabling data sharing and electronic data interchange, and performing knowledge management concerned with the creation, distribution, use, and management of business intelligence, and of business knowledge in general.
A portion of the above-described apparatus, systems or methods may be embodied in or performed by various analog or digital data processors, wherein the processors are programmed or store executable programs of sequences of software instructions to perform one or more of the steps of the methods. A processor may be, for example, a programmable logic device such as a programmable array logic (PAL), a generic array logic (GAL), a field programmable gate arrays (FPGA), or another type of computer processing device (CPD). The software instructions of such programs may represent algorithms and be encoded in machine-executable form on non-transitory digital data storage media, e.g., magnetic or optical disks, random-access memory (RAM), magnetic hard disks, flash memories, and/or read-only memory (ROM), to enable various types of digital data processors or computers to perform one, multiple or all of the steps of one or more of the above-described methods, or functions, systems or apparatuses described herein.
Portions of disclosed examples may relate to computer storage products with a non-transitory computer-readable medium that has program code thereon for performing various computer-implemented operations that embody a part of an apparatus or device, or carry out the steps of a method set forth herein. Non-transitory, as used herein, refers to all computer-readable media except for transitory, propagating signals. Examples of non-transitory computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and execute program code, such as ROM and RAM devices. Examples of program code include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
In interpreting the disclosure, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced.
Those skilled in the art to which this application relates will appreciate that other and further additions, deletions, substitutions and modifications may be made to the described examples. It is also to be understood that the terminology used herein is for the purpose of describing particular examples only, and is not intended to be limiting, since the scope of the present disclosure will be limited only by the claims. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure, a limited number of the exemplary methods and materials are described herein.
Aspects disclosed herein include:
A. A method for analyzing a performance of an asset in a petroleum service environment, comprising: 1) upon receiving a signal from a petroleum service environment, deducing a source of the signal based on characteristics of the signal; and 2) upon said deducing, creating a business intelligence (BI) data repository by associating values of the signal with one or more taxonomies of the petroleum service environment based on the source of the signal.
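Aspect A has two steps: deducing the source from the signal's characteristics, then associating the signal's values with taxonomies based on that source. The following Python sketch illustrates those two steps under simplifying assumptions: the `SourceProfile` records, their sampling-rate and value bands, the source identifiers, and the taxonomy element names are all hypothetical, and a simple rule-based band match stands in for the mathematical and machine learning models referred to in this disclosure.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class SourceProfile:
    """Hypothetical expected characteristics of one known sensor or IoT source."""
    source_id: str
    rate_hz: tuple[float, float]       # acceptable sampling-rate band (Hz)
    value_range: tuple[float, float]   # plausible band for the signal values
    taxonomy: dict[str, str]           # taxonomy name -> taxonomy element


def sampling_rate_hz(timestamps: list[float]) -> float:
    """Estimate the sampling rate from the median interval between timestamps (s)."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return 1.0 / median(intervals)


def deduce_source(timestamps, values, profiles):
    """Step 1: return the first profile whose rate and value bands fit the signal."""
    rate = sampling_rate_hz(timestamps)
    lo, hi = min(values), max(values)
    for p in profiles:
        rate_ok = p.rate_hz[0] <= rate <= p.rate_hz[1]
        range_ok = p.value_range[0] <= lo and hi <= p.value_range[1]
        if rate_ok and range_ok:
            return p
    return None  # source could not be deduced from these characteristics


def associate(timestamps, values, profile):
    """Step 2: tag every value with the deduced source's taxonomy elements."""
    return [{"time": t, "value": v, "source": profile.source_id, **profile.taxonomy}
            for t, v in zip(timestamps, values)]


# Hypothetical profiles and a 1 Hz signal, for illustration only.
profiles = [
    SourceProfile("hookload-01", (0.5, 2.0), (0.0, 500.0),
                  {"location": "Rig A/Drill floor", "operation": "Tripping",
                   "measurement": "Hookload"}),
    SourceProfile("spp-01", (5.0, 20.0), (0.0, 7500.0),
                  {"location": "Rig A/Pump room", "operation": "Circulating",
                   "measurement": "Standpipe pressure"}),
]
ts = [0.0, 1.0, 2.0, 3.0]
vals = [210.0, 212.5, 208.0, 215.0]
source = deduce_source(ts, vals, profiles)
rows = associate(ts, vals, source)
```

Because the first fitting profile wins, overlapping bands would need a tie-breaker in practice; a signal that fits no profile returns `None` and can be passed on to the behavior and correlation checks of Elements 6 and 7 rather than being assigned by guesswork.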
B. A system for analyzing a performance of an asset in a petroleum service environment, comprising: 1) an interface that receives a signal from a service provider of a petroleum service environment; and 2) at least one processor that is connected to the interface and: a) when the signal is received, deduces a source of the signal based on characteristics of the signal; and b) when the source is deduced, creates a business intelligence (BI) data repository by associating values of the signal with one or more taxonomies of the petroleum service environment based on the source of the signal.
Each of aspects A and B can have one or more of the following additional elements in combination. Element 1: wherein said creating includes defining the one or more taxonomies; and building a hierarchy of elements for each of the one or more taxonomies. Element 2: wherein each of the elements of the taxonomies represents one of: an asset, a type of measurement, or an operation. Element 3: wherein the one or more taxonomies include a first taxonomy that is related to physical locations of assets within the petroleum service environment, a second taxonomy that is related to types of operations performed by the assets, and a third taxonomy that is related to types of measurements being used by the assets. Element 4: wherein the characteristics include a sampling rate, a range of values of the signal, a feature or behavior of the signal, and a correlation of the signal with previously received signals. Element 5: wherein said deducing includes: determining a generic category of the signal by identifying a service provider and a format of the signal based on a transmitter of the signal and a communication protocol of the signal; and determining a range-based category of the signal by determining a sampling rate and a value range of the signal. Element 6: wherein said deducing includes: clustering values of the signal; detecting and removing outlier values from the signal; normalizing the values of the signal; and extracting a behavior of the signal from the normalized values.
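Element 6 lists clustering, outlier removal, normalization, and behavior extraction without fixing the algorithms. The Python sketch below shows one possible instantiation, assuming a median-absolute-deviation (modified z-score) outlier test, min-max normalization, a plain one-dimensional k-means for clustering, and a least-squares slope as a coarse behavior label; all function names and thresholds are illustrative, not taken from the disclosure. Outliers are removed before clustering here so that a single spike does not pull a cluster center toward it.

```python
from statistics import mean, median


def remove_outliers(values, threshold=3.5):
    """Drop points whose modified z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)  # no spread to judge outliers against
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]


def normalize(values):
    """Min-max scale values to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def kmeans_1d(values, k=2, iterations=20):
    """Plain 1-D k-means; centers start evenly spaced between min and max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center (ties go to the lower index).
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centers[c] = mean(members)
    return labels, centers


def trend_slope(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_bar, y_bar = (n - 1) / 2, mean(values)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def extract_behavior(values, k=2, slope_tol=0.01):
    """Outlier removal -> normalization -> clustering -> coarse behavior label."""
    cleaned = remove_outliers(values)
    scaled = normalize(cleaned)
    _, levels = kmeans_1d(scaled, k)
    slope = trend_slope(scaled)
    if slope > slope_tol:
        trend = "rising"
    elif slope < -slope_tol:
        trend = "falling"
    else:
        trend = "steady"
    return {"removed": len(values) - len(cleaned), "trend": trend,
            "levels": sorted(round(c, 3) for c in levels)}


# A steadily increasing signal with one spike (200) that should be discarded.
behavior = extract_behavior([10, 11, 12, 13, 14, 15, 200, 16, 17, 18])
```

The returned cluster levels describe the operating states the signal settles into, while the trend label gives a simple behavior that can be matched against general signal behaviors as in Element 7.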
Element 7: wherein said deducing includes: matching the behavior of the signal with at least one general signal behavior; identifying historic sensor signals that exhibit the at least one general signal behavior; determining a correlation of the signal with the sensors that produced the historic sensor signals; selecting one or more generic taxonomic types of the sensors based on the generic and range-based categories and the sensors; determining the source of the signal by observing the signal over at least one contextual event involving the one or more generic taxonomic types of the sensors; and assigning a unique source identifier, which corresponds to one of the available sensors or IoTs in the petroleum service environment, to the signal. Element 8: further comprising analyzing a performance of an asset of the petroleum service environment using the BI data repository, wherein the asset includes at least one piece of equipment of the petroleum service environment. Element 9: wherein said creating includes storing the values of the signal and the associated taxonomies in the BI data repository as a fact and a dimension, respectively. Element 10: wherein the at least one processor creates the BI data repository by: defining the one or more taxonomies; and building a hierarchy of elements for each of the one or more taxonomies. Element 11: wherein the at least one processor deduces the source of the signal by: determining a generic category of the signal by identifying a service provider and a format of the signal based on a transmitter of the signal and a communication protocol of the signal; and determining a range-based category of the signal by determining a sampling rate and a value range of the signal. Element 12: wherein the at least one processor deduces the source of the signal by: clustering values of the signal; detecting and removing outlier values from the signal; normalizing the values of the signal; and extracting a behavior of the signal from the normalized values.
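The matching steps of Element 7 can be sketched as follows, assuming that each historic sensor record carries a previously extracted behavior label and a stored sample of its signal. The `HistoricSensor` record, the sensor identifiers, and the 0.9 correlation threshold are hypothetical; Pearson correlation is used as one possible correlation measure, and the confirmation step that observes the signal over a contextual event is omitted from this sketch.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class HistoricSensor:
    """Hypothetical record of a known sensor and a stored sample of its signal."""
    sensor_id: str              # unique identifier of an available sensor/IoT device
    taxonomic_type: str         # generic taxonomic type, e.g. a measurement type
    behavior: str               # behavior label previously extracted for this sensor
    samples: tuple[float, ...]  # historic signal values


def pearson(x, y):
    """Pearson correlation over the overlapping length of two series."""
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / sqrt(sxx * syy)


def identify_source(signal, behavior, historic, min_corr=0.9):
    """Return (sensor_id, taxonomic_type, r) of the best-correlated matching sensor."""
    # 1) keep only historic sensors that exhibit the same general behavior
    candidates = [h for h in historic if h.behavior == behavior]
    # 2) correlate the new signal with each candidate's historic samples
    scored = [(pearson(signal, h.samples), h) for h in candidates]
    scored = [(r, h) for r, h in scored if r >= min_corr]
    if not scored:
        return None  # no confident match; leave the signal unassigned
    # 3) assign the identifier and taxonomic type of the strongest match
    r, best = max(scored, key=lambda item: item[0])
    return best.sensor_id, best.taxonomic_type, r


historic = [
    HistoricSensor("SPP-07", "Standpipe pressure", "rising", (1.0, 2.0, 3.0, 4.0, 5.0)),
    HistoricSensor("HKL-02", "Hookload", "rising", (5.0, 3.0, 4.0, 1.0, 2.0)),
    HistoricSensor("RPM-01", "Rotary speed", "steady", (1.0, 2.0, 3.0, 4.0, 5.0)),
]
match = identify_source([10.0, 20.0, 30.0, 40.0, 50.0], "rising", historic)
```

Filtering on behavior before correlating keeps the comparison set small and avoids matching a signal to a sensor whose values merely happen to correlate over a short window.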
Element 13: wherein the at least one processor deduces the source of the signal by: matching the behavior of the signal with at least one general signal behavior; identifying historic sensor signals that exhibit the at least one general signal behavior; determining a correlation of the signal with the sensors that produced the historic sensor signals; selecting one or more generic taxonomic types of the sensors based on the generic and range-based categories and the sensors; determining the source of the signal by observing the signal over at least one contextual event involving the one or more generic taxonomic types of the sensors; and assigning a unique source identifier, which corresponds to one of the available sensors or IoTs in the petroleum service environment, to the signal. Element 14: wherein the at least one processor further analyzes a performance of an asset of the petroleum service environment using the BI data repository, and wherein the asset includes at least one piece of equipment of the petroleum service environment. Element 15: wherein the BI data repository is created by storing the values of the signal and the associated taxonomies in the BI data repository as a fact and a dimension, respectively.
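Elements 1 to 3, 9, and 15 can be illustrated with a small star schema in SQLite: a single dimension table holds the elements of the location, operation, and measurement taxonomies, with a `parent_id` column encoding each hierarchy, and a fact table stores the signal values keyed to the leaf element of each taxonomy. The table names, column names, and example rigs and values are hypothetical; a parent-child dimension is one common BI modeling choice, not necessarily the one used in the disclosure.

```python
import sqlite3


def build_repository():
    """Create an in-memory star schema: one dimension table, one fact table."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_element (
            id        INTEGER PRIMARY KEY,
            taxonomy  TEXT NOT NULL,  -- 'location', 'operation' or 'measurement'
            name      TEXT NOT NULL,
            parent_id INTEGER REFERENCES dim_element(id)  -- NULL for a root element
        );
        CREATE TABLE fact_signal (
            source_id      TEXT NOT NULL,
            ts             REAL NOT NULL,
            value          REAL NOT NULL,
            location_id    INTEGER REFERENCES dim_element(id),
            operation_id   INTEGER REFERENCES dim_element(id),
            measurement_id INTEGER REFERENCES dim_element(id)
        );
    """)
    return con


def add_path(con, taxonomy, path):
    """Insert a root-to-leaf path (e.g. field > rig > area) once; return the leaf id."""
    parent = None
    for name in path:
        row = con.execute(
            "SELECT id FROM dim_element WHERE taxonomy = ? AND name = ? AND parent_id IS ?",
            (taxonomy, name, parent)).fetchone()
        parent = row[0] if row else con.execute(
            "INSERT INTO dim_element (taxonomy, name, parent_id) VALUES (?, ?, ?)",
            (taxonomy, name, parent)).lastrowid
    return parent


def store_facts(con, source_id, samples, location, operation, measurement):
    """Store (ts, value) samples as facts keyed to the leaf of each taxonomy."""
    keys = (add_path(con, "location", location),
            add_path(con, "operation", operation),
            add_path(con, "measurement", measurement))
    con.executemany("INSERT INTO fact_signal VALUES (?, ?, ?, ?, ?, ?)",
                    [(source_id, ts, value, *keys) for ts, value in samples])


con = build_repository()
store_facts(con, "hookload-01", [(0.0, 200.0), (1.0, 210.0)],
            ["Field X", "Rig A", "Drill floor"], ["Drilling", "Tripping"],
            ["Mechanical", "Hookload"])
store_facts(con, "hookload-02", [(0.0, 180.0), (1.0, 190.0)],
            ["Field X", "Rig B", "Drill floor"], ["Drilling", "Tripping"],
            ["Mechanical", "Hookload"])

# Peer comparison: roll facts up from the location leaf to its parent (the rig).
by_rig = con.execute("""
    SELECT rig.name, AVG(f.value)
    FROM fact_signal AS f
    JOIN dim_element AS area ON f.location_id = area.id
    JOIN dim_element AS rig  ON area.parent_id = rig.id
    GROUP BY rig.name
    ORDER BY rig.name
""").fetchall()
```

Because each hierarchy is stored explicitly, the same facts can be rolled up to rig level for a peer comparison, as in the final query, or to field level by joining one more parent.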