The present disclosure relates to methods and apparatus for pump failure prediction.
An electric submersible pump is a device that pumps oil from an oil well. It is widely used because of its operational applicability and capability in various environments. The pump is an important device for oil production and consists of various mechanical devices, such as motors, pumps, valves, etc. Any of the mechanical devices could fail due to mechanical wear caused by sand influx into a wellbore or corrosion by chemical reactions during the production of hydrocarbon. If some devices constituting the electric submersible pump fail, the pumping function of the pump may not be adequately performed. As a result, the production rate is reduced or operation is halted, and the time, cost, and effort required for inspection and repair result in economic losses.
The present disclosure is intended to propose methods and apparatus for pump failure prediction in which both supervised learning and unsupervised learning are used.
In an embodiment of the present disclosure, there is provided a pump failure prediction apparatus including: at least one processor; a storage, which is communicably connected with the processor and stores a program code which operates in the processor; and a communicator, which is communicably connected with the processor, wherein the program code includes: a data collection module which collects real-time data related to a state of a pump which is installed in an oil well and operates; and an abnormality detection module which detects three abnormalities: a first abnormality appearing before failure occurrence by inputting the real-time data into a first model which has performed supervised learning of history data, a second abnormality outside of a normal operation range of the pump by inputting the real-time data into a second model, which has performed unsupervised learning of normal operation data, and a third abnormality outside of an initial normal operation range of the pump by inputting the real-time data into a third model, which has performed unsupervised learning of initial operation data.
In an embodiment of the present disclosure, the history data may include two types of operation data: operation data labeled as normal and operation data labeled as abnormal. The first model may utilize linear discriminant analysis to classify the operation data comprised in the history data as a normal class or an abnormal class, and the abnormality detection module may determine that the first abnormality occurs when the real-time data are classified as the abnormal class according to linear discriminant criteria generated by the first model.
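As an illustration only, the two-class linear discriminant described above could be sketched as follows. This is a minimal sketch, not the disclosed implementation: the function names `fit_lda` and `is_first_abnormality`, the label encoding (0 for normal, 1 for abnormal), and the small ridge term are assumptions introduced for the example.

```python
import numpy as np

def fit_lda(X, y):
    """Fit a two-class linear discriminant (hypothetical helper).

    X: (n_samples, n_features) standardized history data.
    y: (n_samples,) labels, 0 = normal, 1 = abnormal (assumed encoding).
    Returns (w, b) such that w @ x + b > 0 predicts the abnormal class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance shared by both classes.
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    # Small ridge term for numerical stability (an assumption, not from the text).
    S += 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(S, mu1 - mu0)
    # Offset places the boundary between the class means, shifted by the class priors.
    b = -0.5 * w @ (mu0 + mu1) + np.log(len(X1) / len(X0))
    return w, b

def is_first_abnormality(x, w, b):
    """Return True when a real-time sample falls in the abnormal class."""
    return float(w @ np.asarray(x, dtype=float) + b) > 0.0
```

Here the pair `(w, b)` plays the role of the linear discriminant criteria: a real-time sample is assigned to the abnormal class when it lies on the abnormal side of the resulting hyperplane.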
In an embodiment of the present disclosure, the normal operation data may be operation data obtained while the pump operates in a normal state. The second model may be generated and trained by obtaining an average of a plurality of normal section errors between original operation data (the normal operation data) and reconstructed operation data (the reconstruction data) calculated after application of a principal component analysis. In the second model, a principal component analysis is applied to the original operation data so as to obtain feature vectors. Based on the feature vectors, the second model reconstructs operation data, and a process of obtaining a normal section error between the original operation data and the reconstructed operation data is repeated at every time step of the operation data in order to obtain the plurality of normal section errors. The abnormality detection module may determine that the second abnormality occurs when an error level of the real-time data increases over time. To obtain the error level, real-time feature vectors are obtained by applying the principal component analysis to the real-time data; based on the real-time feature vectors, real-time reconstruction data are obtained; a real-time error between the real-time data and the real-time reconstruction data is obtained; the error level is obtained by dividing the real-time error by the average of the plurality of normal section errors generated by the second model; and the error level is recorded over time.
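The error-level computation described above can be sketched as follows. This is a hedged illustration under stated assumptions: the class name `PcaErrorModel`, the choice of two principal components, the Euclidean norm as the error measure, and the least-squares slope test in `is_rising` are all hypothetical, since the text does not fix how the error is measured or how an increase over time is judged.

```python
import numpy as np

class PcaErrorModel:
    """Hypothetical sketch of the second model (PCA reconstruction error)."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X_normal):
        X = np.asarray(X_normal, dtype=float)
        self.mean_ = X.mean(axis=0)
        # Principal axes from the SVD of the centered normal operation data.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        # One normal section error per time step, then their average.
        # Assumes the average is nonzero (the data are not perfectly low-rank).
        self.mean_normal_error_ = self._errors(X).mean()
        return self

    def _errors(self, X):
        Z = (X - self.mean_) @ self.components_.T      # feature vectors
        X_hat = Z @ self.components_ + self.mean_      # reconstruction data
        return np.linalg.norm(X - X_hat, axis=1)

    def error_level(self, X_rt):
        """Real-time error divided by the average normal section error."""
        X = np.atleast_2d(np.asarray(X_rt, dtype=float))
        return self._errors(X) / self.mean_normal_error_

def is_rising(levels, min_slope=0.0):
    """Assumed trend test: slope of a least-squares line through the levels."""
    t = np.arange(len(levels))
    slope = np.polyfit(t, np.asarray(levels, dtype=float), 1)[0]
    return bool(slope > min_slope)
```

An error level near 1 means the real-time data reconstruct about as well as the normal operation data did; recording the level at each time step and passing the series to a trend test such as `is_rising` corresponds to checking whether the level increases over time.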
In an embodiment of the present disclosure, initial operation data may be operation data obtained for a predetermined period from a time at which the pump is assumed to operate normally in a stable condition without any issues. The third model may be trained by applying principal component analysis to the initial operation data. After applying principal component analysis to the data, feature vectors which consist of at least two components are obtained. Then, a center point of the feature vectors is obtained based on the distribution of the vectors in multi-dimensional space. Based on the distribution of the feature vectors and the center point of the initial operation data, the third model calculates a Mahalanobis distance from the center point to each feature vector of the data. From the range of the Mahalanobis distances, a reference distance is statistically determined. Then, the principal component analysis is applied to the real-time operation data, and the third model calculates a Mahalanobis distance from the center point to each real-time data point and determines that the state of the pump is abnormal if the Mahalanobis distance is greater than the reference distance.
In an embodiment of the present disclosure, the reference distance may be determined as a multiple of an average value of a plurality of Mahalanobis distances obtained from the initial operation data.
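A minimal sketch of the third model, assuming two principal components and a multiple of 3.0 for the distance limit, might look like the following. The class name `MahalanobisModel`, the method names, and both default values are hypothetical; the text only states that the limit is a multiple of the average Mahalanobis distance of the initial operation data.

```python
import numpy as np

class MahalanobisModel:
    """Hypothetical sketch of the third model (PCA + Mahalanobis distance)."""

    def __init__(self, n_components=2, multiple=3.0):
        self.n_components = n_components   # at least two, per the description
        self.multiple = multiple           # assumed value, not from the text

    def fit(self, X_initial):
        X = np.asarray(X_initial, dtype=float)
        self.mean_ = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        Z = self._transform(X)                          # feature vectors
        self.center_ = Z.mean(axis=0)                   # center point
        self.cov_inv_ = np.linalg.inv(np.cov(Z, rowvar=False))
        # Distance limit as a multiple of the average initial distance.
        self.reference_distance_ = self.multiple * self._distance(Z).mean()
        return self

    def _transform(self, X):
        return (X - self.mean_) @ self.components_.T

    def _distance(self, Z):
        d = Z - self.center_
        # sqrt(d^T * inverse covariance * d), evaluated row by row.
        return np.sqrt(np.einsum("ij,jk,ik->i", d, self.cov_inv_, d))

    def is_third_abnormality(self, X_rt):
        X = np.atleast_2d(np.asarray(X_rt, dtype=float))
        return self._distance(self._transform(X)) > self.reference_distance_
```

Because the Mahalanobis distance scales each direction by the spread of the initial feature vectors, a deviation along a direction in which the initial data vary little is penalized more heavily than the same deviation along a direction with large natural variation.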
In an embodiment of the present disclosure, the program code may further include a reporting module which provides a notification that abnormality is detected in the pump, based on an output of the abnormality detection module.
In an embodiment of the present disclosure, the second model may be generated and trained even before a pump normally operates by using normal operation data of another pump, which operates in a similar environment, and may be regenerated by updating the normal operation data with its own operation data in a normal state.
In an embodiment of the present disclosure, the program code may further include a training data generation module which generates training data for training the first model, the second model, and the third model by using preprocessed operation data in such a manner that the real-time data are recorded over time to generate operation data, and the operation data are preprocessed. The training data generation module may generate history data, to generate training data of the first model, in such a manner that time series data are standardized in the operation data. Operation data corresponding to a failure state recorded previously are labeled as abnormal, and data of a normal operation section are labeled as normal. The training data generation module may generate the normal operation data, to generate training data of the second model, in such a manner that a principal component analysis is applied to the operation data at every time step thereof so as to extract feature vectors. Reconstruction data are obtained by reconstructing the feature vectors. An error between the reconstruction data and the operation data is calculated. When the error is smaller than or equal to a reference error, the operation data are comprised in the normal operation data; and when the error is greater than the reference error, the operation data are excluded from the normal operation data. The training data generation module may generate operation data, which are collected for a predetermined period from time at which the pump operates in a normal state after the pump is installed, as the initial operation data for generating training data of the third model.
In an embodiment of the present disclosure, there is provided a pump failure prediction method including: collecting the real-time data related to the state of the pump installed in the oil well; generating training data by preprocessing operation data collected by recording the real-time data over time; training the first model, the second model, and the third model by using the training data; detecting the first abnormality occurring before failure occurrence by inputting the real-time data of the pump to the first model which has performed supervised learning of history data, detecting the second abnormality outside the normal operation range of the pump by inputting the real-time data of the pump to the second model which has performed unsupervised learning of the normal operation data, and detecting the third abnormality outside of the initial normal operation range of the pump by inputting the real-time data of the pump to the third model which has performed unsupervised learning of the initial operation data; and providing the notification that abnormality is detected in the pump, based on an output of the detecting of the abnormalities.
In an embodiment of the present disclosure, in detecting of the abnormalities, it may be determined that the first abnormality occurs when the real-time data are classified as the abnormal class according to linear discriminant criteria generated by the first model, based on the first model which is a linear discriminant analysis model trained to classify the history data comprising operation data labeled as normal and operation data labeled as abnormal into the normal class and the abnormal class.
In an embodiment of the present disclosure, in detecting of the abnormalities, it may be determined that the second abnormality occurs when an error level of the real-time data increases over time. To train the second model, a principal component analysis is applied to the normal operation data, which comprise operation data obtained while the pump operates in a normal state, so as to obtain feature vectors; reconstruction data are obtained by reconstructing the feature vectors; and a process of obtaining a normal section error between the normal operation data and the reconstruction data calculated after application of the principal component analysis is repeated at every time step of the operation data to obtain a plurality of normal section errors. Based on the second model generated through training of obtaining an average of the plurality of normal section errors, the principal component analysis is applied to the real-time data to obtain real-time feature vectors. Based on the real-time feature vectors, real-time reconstruction data are obtained; a real-time error between the real-time data and the real-time reconstruction data is obtained; the real-time error is divided by the average of the plurality of normal section errors generated by the second model to obtain the error level; and the error level is recorded over time.
In an embodiment of the present disclosure, in detecting of the abnormalities, it may be determined that the third abnormality occurs when a Mahalanobis distance is greater than a reference distance. The initial operation data are operation data obtained for a predetermined period from a time at which the pump is assumed to operate normally in a stable condition without any issues. The third model may be trained by applying principal component analysis to the initial operation data. After applying principal component analysis to the data, feature vectors which consist of at least two components are obtained. Then, a center point of the feature vectors is obtained based on the distribution of the vectors in multi-dimensional space. Based on the distribution of the feature vectors and the center point of the initial operation data, the third model calculates a Mahalanobis distance from the center point to each feature vector of the data. From the range of the Mahalanobis distances, a reference distance is statistically determined. Then, the principal component analysis is applied to the real-time operation data, and the third model calculates a Mahalanobis distance from the center point to each real-time data point and determines that the state of the pump is abnormal if the Mahalanobis distance is greater than the reference distance.
In an embodiment of the present disclosure, in generating of the training data, in order to generate training data of the first model, the history data may be generated in such a manner that time series data are standardized in the operation data, operation data corresponding to a failure state recorded previously are labeled as abnormal, and data of a normal operation section are labeled as normal. In order to generate training data of the second model, the normal operation data may be generated in such a manner that a principal component analysis is applied to the operation data at every time step thereof to extract feature vectors; reconstruction data are obtained from the feature vectors; an error between the reconstruction data and the operation data is calculated; when the error is smaller than or equal to a reference error, the operation data are comprised in the normal operation data; and when the error is greater than the reference error, the operation data are excluded from the normal operation data. In order to generate training data of the third model, operation data collected for a predetermined period from a time at which the pump operates in a normal state after the pump is installed may be generated as the initial operation data.
In an embodiment of the present disclosure, in training of a model, the second model may be generated by using normal operation data of another pump, which operates in a similar environment, and may be regenerated by updating the normal operation data with its own operation data in a normal state.
Features and advantages of the present disclosure will become more apparent with the following detailed description based on the accompanying drawings.
Prior to this, the terms or words used in the present specification and claims should not be construed in a conventional and dictionary meaning and should be interpreted as a meaning and concept consistent with the technical idea of the present disclosure, based on the principle that the inventor can appropriately define the concepts of terms in order to explain the present disclosure in the best way.
In some embodiments of the present disclosure, occurrence of a new type of failure can be predicted while a failure with a history that occurred in the past is accurately predicted.
The above and other objectives, features, and other advantages of the present disclosure will be more clearly understood from the following detailed description when taken in conjunction with the accompanying drawings, in which:
The purpose, advantages, and features of the present disclosure will become more apparent from the following detailed description and exemplary embodiment taken in conjunction with the accompanying drawings, but the present disclosure is not necessarily limited thereto. In addition, in explaining the present disclosure, when it is determined that a detailed description of a related known technology may unnecessarily obscure the gist of the present disclosure, a detailed description thereof will be omitted.
In assigning reference numerals to the components of the drawings, it should be noted that the same components are given the same reference numerals as much as possible even though the components are shown in different drawings, and similar reference numerals are given to similar components.
Terms used to describe the embodiment of the present disclosure are not intended to limit the present disclosure. It should be noted that the singular expression includes the plural expression unless the context dictates otherwise.
In this document, expressions such as “has”, “may have”, “includes”, or “may include” indicate the presence of associated features (e.g., a numerical value, function, operation, or a component such as a part) and do not exclude the presence of additional features.
Terms such as “one”, “other”, “another”, “first”, “second”, etc. are used to distinguish one component from other components, and components are not limited to the terms.
The embodiment described in this document and the accompanying drawings are not intended to limit the present disclosure to a specific embodiment. The present disclosure should be understood to cover various modifications, equivalents, and/or alternatives of the embodiment.
Hereinafter, the embodiment of the present disclosure will be described in detail with reference to the accompanying drawings. In this document, “a pump 2” may mean “an electric submersible pump 2”.
According to the method and apparatus 10 for pump failure prediction of the present disclosure, it is possible to give advance notice of failure of the pump 2, which is installed in an oil well 1 and produces oil, by predicting the failure of the pump 2. The oil well 1 is a borehole formed in the stratum to pump up oil buried underground. The pump 2 installed in the oil well 1 may be of various types. The pump 2 may include an electric submersible pump 2. The electric submersible pump 2 may include a motor, a gas separator, a pump, a cable, production tubing, and various other devices. Devices that constitute the electric submersible pump 2 may fail due to various reasons. According to the method and apparatus for predicting failure of the pump 2 according to the present disclosure, it is possible to predict the failure of the pump 2 by detecting abnormalities occurring in the electric submersible pump 2 before the failure of the pump 2. The pump failure prediction apparatus 10 can report a notification to an administrator 4 when an abnormality is detected.
An oil production unit may include the oil well 1, the pump 2, a pipe, an oil tank, a junction box, a pump controller 3, and various other devices. The pump failure prediction apparatus 10 may collect data from various sensors installed in the oil production unit. The pump failure prediction apparatus 10 may collect data from the pump controller 3, which controls the electric submersible pump 2. The pump failure prediction apparatus 10 may use data input by the administrator 4. The pump failure prediction apparatus 10 may be configured as a computer device. The pump failure prediction apparatus 10 may be configured as a PC, a server, a tablet PC, a remote operating system, and a device, which performs other information processing functions, as non-limiting examples.
The pump failure prediction apparatus 10 may include at least one processor 11, a storage 12 that is communicably connected with the processor 11 and stores program codes, which operate in the processor 11, and a communicator 13, which is communicably connected with the processor 11. The pump failure prediction apparatus 10 may further include an input/output interface 14, which is communicably connected with the processor 11, receives commands or data from the administrator 4, displays the state of the pump 2 for the administrator 4, and provides data or an abnormality detection notification. The storage 12 may store a first model M1, a second model M2, and a third model M3, which are trained. The storage 12 may store program codes written to perform the method of predicting the failure of the pump 2, and the program codes may be written in module units.
Based on an embodiment, a module may be a component in which hardware and software are combined. The module may operate in the processor 11. The module may include a data collection module 21, a training data generation module 22, a training module 23, an abnormality detection module 24, and a reporting module 25. The data collection module 21, the training data generation module 22, the training module 23, the abnormality detection module 24, and the reporting module 25 may cooperate to perform the method of predicting the failure of the pump 2. The pump failure prediction apparatus 10 may execute the modules in such a manner that a program code, which is software stored in the storage 12, is executed by the processor 11. The data collection module 21, the training data generation module 22, the training module 23, the abnormality detection module 24, and the reporting module 25 may be stored in the storage 12 by being written in the program code and may be executed by the processor 11.
The program code may include: the data collection module 21, which collects the real-time data related to the state of the pump 2, which is installed in the oil well 1 and operates; the training data generation module 22, which generates operation data by recording collected real-time data over time and generates the training data by preprocessing the operation data; and the training module 23, which trains the first model M1, the second model M2, and the third model M3 by using the training data.
The data collection module 21 may collect the real-time data related to the state of the pump 2, which is installed in the oil well 1 and operates, from the oil production unit. The data collection module 21 may generate the operation data by recording the real-time data over time. The data collection module 21 may store the real-time data and the operation data in the storage 12.
The data collection module 21 may collect well data WD. The well data WD are data related to the oil production unit. The well data WD may include monitoring data MD and static data SD. The static data SD are data that do not change after the pump 2 is installed in the oil well 1. The static data SD may include a wellbore diagram, directional survey, an installation report, a workover report, and various other data. The static data SD may be changed when the data are updated by the administrator 4, or when the specification of the pump 2 is changed. The data collection module 21 may receive the static data SD from the administrator 4.
The monitoring data MD are data measured while the pump 2 operates. The monitoring data MD may include first changing data CD1 and second changing data CD2. The first changing data CD1 may be a parameter measured daily. The first changing data CD1 may include daily production data (e.g., oil, gas, and water) and daily wellhead pressure data (e.g., tubing pressure and casing pressure). The second changing data CD2 may be a parameter that is measured every 15 minutes or at predetermined time intervals. The second changing data CD2 may include alarm data and sensor data. The sensor data may include a bus voltage, a fluid temperature, a motor current, a motor frequency, a motor temperature, and various other elements.
In the embodiment, the real-time data collected by the data collection module 21 may be a portion of the monitoring data MD. For example, the real-time data may include a portion of the second changing data CD2, which is measured every 15 minutes. The elements of the real-time data or the operation data based on an embodiment may include a bus voltage, a fluid temperature, a motor current, motor frequency, and a motor temperature among the second changing data CD2. The elements of the real-time data or the operation data based on an embodiment may include motor frequency, inflow rate into the pump 2, a motor temperature, a fluid temperature, a motor voltage, a bus voltage, a motor current, and temperature difference between a fluid and the pump 2. The real-time data or the operation data may have values of a plurality of elements at each measurement time.
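As a purely illustrative sketch, one measurement of the real-time data with the first set of elements listed above could be represented as a small record type. The class name `RealTimeSample`, the field names, and the field order are assumptions; the text does not prescribe any particular data layout or units.

```python
from dataclasses import dataclass, astuple
from datetime import datetime

@dataclass(frozen=True)
class RealTimeSample:
    """One measurement time of the real-time data (hypothetical layout)."""
    measured_at: datetime
    bus_voltage: float
    fluid_temperature: float
    motor_current: float
    motor_frequency: float
    motor_temperature: float

    def features(self):
        """Return the element values, without the timestamp, as a list."""
        return list(astuple(self)[1:])
```

Appending such samples in time order yields the operation data, and stacking the `features()` lists row by row gives the matrix that the models operate on.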
Reference is made back to
The training data generation module 22 may generate the operation data by recording the real-time data over time and may generate the training data for training the first model M1, the second model M2, and the third model M3 by using preprocessed operation data after preprocessing the operation data.
The training data generation module 22 may generate history data for training the first model M1. The history data may include operation data labeled as normal and operation data labeled as abnormal.
The training data generation module 22 may generate normal operation data for training the second model M2. The normal operation data may include operation data obtained while the pump 2 is operating in a normal state.
The training data generation module 22 may generate initial operation data for training the third model M3. The initial operation data may include operation data obtained for a predetermined period from time at which the pump 2 is assumed to operate normally in a stable condition without issues.
The training module 23 may train the first model M1, the second model M2, and the third model M3 by using the training data generated by the training data generation module 22 and may store the first model M1, the second model M2, and the third model M3, which are trained, in the storage 12. The training module 23 may perform the process of training the first model M1, the second model M2, or the third model M3 independently.
The program code may include the abnormality detection module 24, which detects three abnormalities: a first abnormality appearing before failure occurrence by inputting real-time data into the first model M1, which has performed supervised learning of the history data, a second abnormality outside of a normal operation range of the pump 2 by inputting the real-time data into the second model M2, which has performed unsupervised learning of the normal operation data, and a third abnormality outside of an initial normal operation range of the pump 2 by inputting the real-time data into the third model M3, which has performed unsupervised learning of the initial operation data.
The abnormality detection module 24 may predict whether a failure will occur by using the trained first model M1. When the abnormality detection module 24 detects the first abnormality by using the first model M1, the abnormality detection module 24 may predict that failure with an existing occurrence record will occur in the near future. Since the first model M1 has received the supervised learning of the history data, the first model M1 may determine the state of the pump 2 based on past normal or abnormal operation data and may predict failure when detecting abnormality. However, the first model M1 may have limitation when the first model M1 is applied to a pump 2, which does not have a past operation history.
Even if a pump has no history of failure in the past, the abnormality detection module 24 may detect an abnormal state (that is, abnormality) of the pump by using the second model M2 and the third model M3, which are trained. The second abnormality, which the abnormality detection module 24 detects by using the second model M2 or the third abnormality, which the abnormality detection module 24 detects by using the third model M3, is not determined based on failure occurring in the past. The second abnormality or the third abnormality means that the pump 2 is out of a normal state thereof. The second model M2 and the third model M3 may detect abnormality based on a difference between real-time data and normal operation data and thus may predict even a new type of failure.
The abnormality detection module 24 may predict various failures which may occur in the pump 2 by using the first model M1, the second model M2, and the third model M3 together.
The program code may further include the reporting module 25 providing a notification that an abnormality has been detected in the pump 2, based on the output of the abnormality detection module 24. When abnormality is detected in each of the first model M1, the second model M2, and the third model M3, the reporting module 25 may provide an abnormality detection notification thereof.
The reporting module 25 may determine whether an abnormality occurs by comprehensively judging the outputs of the first model M1, the second model M2, and the third model M3. The reporting module 25 may provide a comprehensive abnormality detection notification when abnormalities are detected in at least two of the first model M1, the second model M2, and the third model M3. While providing the comprehensive abnormality detection notification, the reporting module 25 may also indicate each individual model in which an abnormality is detected.
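The comprehensive judgment described above can be sketched as a simple vote over the three model outputs. The function name `build_notification`, the dictionary layout of the result, and the `min_votes` parameter are hypothetical names introduced for this example.

```python
def build_notification(first, second, third, min_votes=2):
    """Combine the three model outputs into one notification (sketch).

    first, second, third: booleans, True when the corresponding model
    detected an abnormality. min_votes=2 mirrors "at least two models".
    """
    flags = {
        "first model M1": first,
        "second model M2": second,
        "third model M3": third,
    }
    # Models in which an abnormality was detected, reported individually.
    detected = [name for name, hit in flags.items() if hit]
    return {
        "individual": detected,
        "comprehensive": len(detected) >= min_votes,
    }
```

For example, `build_notification(True, False, True)` reports the first and third models individually and also raises the comprehensive flag, whereas a single detection is reported individually only.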
The abnormality detection notification may indicate the first abnormality detected by using the first model M1, the second abnormality detected by using the second model M2, and the third abnormality detected by using the third model M3. The abnormality detection notification may indicate comprehensive abnormalities. The abnormality detection notification may be provided visually or aurally to the administrator 4 through the input/output interface 14. The abnormality detection notification may be provided to the administrator 4 by means of a phone call, SMS, e-mail, etc. The input/output interface 14 may display the first abnormality, the second abnormality, and the third abnormality by using a display or an LED.
The method of predicting the failure of the pump 2 based on an embodiment of the present disclosure may include: collecting real-time data related to the state of the pump 2 assumed to be operating normally in stable condition without any issues at S10; generating training data by preprocessing operation data formed by recording the real-time data over time at S20; training the first model M1, the second model M2, and the third model M3 by using the training data at S30; detecting the first abnormality occurring before failure occurrence by inputting the real-time data of the pump 2 into the first model M1, which has performed the supervised learning of the history data, detecting the second abnormality outside of the normal operation range of the pump 2 by inputting the real-time data of the pump 2 into the second model M2, which has performed the unsupervised learning of the normal operation data, and detecting the third abnormality outside of the initial normal operation range of the pump 2 by inputting the real-time data of the pump 2 into the third model M3, which has performed the unsupervised learning of the initial operation data at S40; and reporting a notification that an abnormality has been detected in the pump 2 at S50 based on the output of the detecting of the abnormalities at S40.
The collecting of data at S10 may be performed in the data collection module 21. In the collecting of data at S10, real-time data are collected from the oil production unit and are stored in the storage 12 to generate operation data. The collecting of data at S10 may be repeated in real time. The collecting of data at S10 may always be performed while the oil production unit is operated. In the collecting of data at S10, the administrator 4 may store the static data SD in the storage 12. In the collecting of data at S10, the data collection module 21 may collect the monitoring data MD at predetermined time intervals from sensors installed in the oil production unit, the pump controller 3, or other electronic devices.
The generation of the training data at S20 may be performed in the training data generation module 22. The generation of the training data at S20 may further include the process of preprocessing the operation data. In generating of the training data at S20, the history data for training the first model M1, the normal operation data for training the second model M2, and the initial operation data for training the third model M3 may be independently generated.
The training data generation module 22 may generate the history data which are the training data for generating the first model M1 by performing the generating of training data at S20. To generate the training data of the first model M1 in generating of training data at S20, time series data are standardized in the operation data, and operation data corresponding to a failure state recorded in the past are labeled as abnormal, and data of a normal operation section are labeled as normal so that the history data can be generated.
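A minimal sketch of this preprocessing step, assuming z-score standardization per element and a boolean mask that marks the time steps of previously recorded failure states, is shown below. The function name `make_history_data` and the mask-based labeling interface are assumptions; the text does not specify how failure sections are marked.

```python
import numpy as np

def make_history_data(operation_data, failure_mask):
    """Standardize operation data and attach normal/abnormal labels (sketch).

    operation_data: (n_steps, n_elements) time series of operation data.
    failure_mask: (n_steps,) booleans, True for time steps belonging to a
        previously recorded failure state (assumed input format).
    Returns (X_std, y) with y = 1 for abnormal and y = 0 for normal.
    """
    X = np.asarray(operation_data, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Avoid division by zero for elements that never change.
    std[std == 0] = 1.0
    X_std = (X - mean) / std
    y = np.asarray(failure_mask, dtype=bool).astype(int)
    return X_std, y
```

Standardizing each element puts quantities with very different scales, such as voltages and temperatures, on a comparable footing before the classifier is trained.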
In generating of training data at S20, in order to generate the training data of the second model M2, a principal component analysis is applied to the operation data at every specific time to extract feature vectors. Reconstruction data is obtained from the feature vectors, and an error between the reconstruction data and the operation data is calculated. When the error is smaller than or equal to a reference error, the operation data are included in the normal operation data, and when the error is greater than the reference error, the operation data are excluded from the normal operation data, so that the normal operation data can be generated.
In generating of training data at S20, in order to generate the training data of the third model M3, operation data collected for a predetermined period from time at which the pump 2 is operated in a normal state after the installation of the pump 2 may be generated as initial operation data.
The generation of training data at S20 is performed when generating a model for the first time or when the model is required to be updated, so that the training data can be updated.
The training of a model at S30 may be performed in the training module 23. The training of a model at S30 is a process of allowing a model to learn the training data. In the training of a model at S30, the first model M1, the second model M2, and the third model M3 may be trained independently of each other. The training of a model at S30 may be performed when a model is generated for the first time or when the model is required to be updated.
The training module 23 may generate the first model M1, which classifies the history data as a normal class or an abnormal class by performing the training of a model at S30. In the training of a model at S30, operation data included in the history data may be classified as a normal class or an abnormal class by using linear discriminant analysis (LDA).
The training module 23 may generate the second model M2, which can learn a normal range in the normal operation data by performing the training of a model at S30. In the training of a model at S30, feature vectors are obtained by applying the principal component analysis to the normal operation data, and reconstruction data are obtained on the basis of the feature vectors, and the process of obtaining normal section errors between the normal operation data and the reconstruction data is repeated at every time step of the operation data to obtain the plurality of normal section errors, and the second model M2 may be generated by learning to obtain the average of the plurality of normal section errors between original operation data (the normal operation data) and reconstructed operation data (the reconstruction data) calculated after application of a principal component analysis.
The training module 23 may generate the third model M3, which can learn a normal range in the initial operation data by performing the training of a model at S30. In training of a model, a process of obtaining at least two feature vectors in order in which the features of the initial operation data are considered the most by applying the principal component analysis to the initial operation data is repeated at every time step of the initial operation data to obtain the distribution and center point of the feature vectors so that the third model M3 can be generated.
The detection of an abnormality at S40 may be performed in the abnormality detection module 24. In detecting of an abnormality at S40, real-time data may be analyzed by using each of the first model M1, the second model M2, and the third model M3. In detecting of an abnormality at S40, detecting whether the first abnormality occurs by using the first model M1, detecting whether the second abnormality occurs by using the second model M2, and detecting whether the third abnormality occurs by using the third model M3 may be performed independently of each other.
In detecting of abnormalities at S40, it is possible to detect abnormality, which may occur before failure occurrence by inputting the real-time data to the first model M1. In detecting of abnormalities at S40, it may be determined that the first abnormality occurs when the real-time data is classified as an abnormal class according to linear discriminant criteria generated by the first model M1 on the basis of the first model M1, which is a linear discriminant analysis model trained to classify history data including operation data labeled as normal and operation data labeled as abnormal into a normal class and an abnormal class.
In detecting of abnormalities at S40, it is possible to detect an abnormality outside the range in which the pump 2 is normally operated by inputting the real-time data to the second model M2. The second model M2 is generated in advance as follows: the principal component analysis is applied to normal operation data, including operation data obtained while the pump 2 operates in a normal state, to obtain feature vectors; reconstruction data are obtained on the basis of the feature vectors; the process of obtaining a normal section error between the original operation data (the normal operation data) and the reconstructed operation data (the reconstruction data) is repeated at every time step of the operation data to obtain a plurality of normal section errors; and the average of the plurality of normal section errors is obtained. Based on the second model M2, the principal component analysis is applied to the real-time data to obtain real-time feature vectors. Real-time reconstruction data are obtained on the basis of the real-time feature vectors, and a real-time error between the real-time data and the real-time reconstruction data is obtained. The real-time error is divided by the average of the plurality of normal section errors generated by the second model M2 to obtain an error level, and the error level is recorded over time. When the error level increases over time, it may be determined that the second abnormality occurs.
In detecting of the third abnormality at S40, it is possible to detect an abnormality outside an initial state in which the pump 2 is assumed to operate normally in a stable condition without any issues by inputting the real-time data to the third model M3. In detecting of the third abnormality at S40, the principal component analysis is applied to the initial operation data, which are operation data obtained for a predetermined period from time at which the pump 2 is assumed to operate normally in a stable condition without any issues.
The third model M3 may be trained by applying principal component analysis to the initial operation data. After applying principal component analysis to the data, feature vectors which consist of at least two components are obtained. Then, a center point of the feature vectors is obtained based on the distribution of the vectors in multi-dimensional space. Based on the distribution of the feature vectors and the center point of the initial operation data, the third model M3 calculates a Mahalanobis distance from the center point to each feature vector of the data. From the range of the Mahalanobis distances, a reference distance is statistically determined. Then, the principal component analysis is applied to the real-time operation data, and the third model M3 calculates the Mahalanobis distance from the center point to each real-time data point and determines that the state of the pump 2 is abnormal if the Mahalanobis distance is greater than the reference distance.
Reporting of a notification at S50 may be performed in the reporting module 25. In reporting of a notification at S50, based on the output of detecting of abnormalities at S40, a notification that abnormality occurs in the pump 2 may be reported to the administrator 4.
In reporting of a notification at S50, the abnormality detection notification that the first model M1 has detected an abnormality may be provided to the administrator 4 when the first abnormality is detected by the output of an abnormal class for real-time data in detecting of abnormalities at S40.
In reporting of a notification at S50, the abnormality detection notification that the second model M2 has detected an abnormality may be provided when the second abnormality is detected by outputting the result that the error level of real-time data analyzed in detecting of abnormalities at S40 is increasing.
In reporting of a notification at S50, the abnormality detection notification that the third model M3 has detected abnormality may be provided when the third abnormality is detected by outputting the result that the Mahalanobis distance of the real-time data analyzed in the detecting of abnormalities at S40 is greater than the reference distance.
In reporting of a notification at S50, the abnormality detection notification that at least two models have detected abnormalities may be provided when at least two abnormalities of the first abnormality, the second abnormality, and the third abnormality are detected in detecting of abnormalities at S40.
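The reporting rules described for S50 can be illustrated with a minimal sketch. The function name `build_notifications` and the message wording are hypothetical and are not part of the disclosure; the sketch only shows one way in which per-model notifications and a combined notification for at least two detected abnormalities could be assembled.

```python
def build_notifications(first, second, third):
    """Return notification messages for the abnormalities detected at S40.

    `first`, `second`, `third` are booleans for the first, second, and
    third abnormality. Message texts are illustrative assumptions.
    """
    candidates = (
        ("first model M1", first),
        ("second model M2", second),
        ("third model M3", third),
    )
    detected = [name for name, flag in candidates if flag]
    messages = [f"Abnormality detected by the {name}" for name in detected]
    # Extra notification when at least two models report an abnormality.
    if len(detected) >= 2:
        messages.append(f"Abnormalities detected by {len(detected)} models")
    return messages


messages = build_notifications(first=True, second=False, third=True)
```

In this sketch each model contributes its own message, so the administrator 4 could see both which models reported an abnormality and whether several models agree.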
By checking the abnormality detection notification, the administrator 4 may recognize that the failure of the pump 2 is predicted. The administrator 4 may check the state of the pump 2 and get a chance to inspect the pump 2 before the failure of the pump 2 is realized. The pump failure prediction apparatus 10 according to the present disclosure can reliably predict the failure of the pump having a failure history, and can reliably recognize a state of the pump out of a normal operation and thus does not generate unnecessary notifications. Accordingly, it is possible to reduce the effort required to check wrong notifications and to inspect the pump 2.
The generation of training data for generating the first model M1, training of the first model M1, and abnormality detection by the first model M1 will be described with reference to
The first model M1 utilizes the linear discriminant analysis model trained to classify the history data as a normal class or an abnormal class. When the trained first model M1 receives the real-time data of the pump 2, the first model M1 classifies the data as either the normal class or the abnormal class. The history data on which the first model M1 is trained may include operation data labeled as normal and operation data labeled as abnormal.
In generating of training data at S20, to generate the training data of the first model M1, the training data generation module 22 may standardize time series data in the operation data. The process of standardizing the time series data in the operation data may increase the versatility of the first model M1.
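As an illustration of the standardization step, the following minimal sketch converts one time series of the operation data to z-scores. The disclosure does not specify the standardization formula; the use of z-scores, the function name `standardize`, and the sample values are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def standardize(series):
    """Return z-scores of one time series (assumed standardization method)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series has no variation; map it to zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


# Made-up motor temperature readings, for illustration only.
motor_temperature = [80.0, 82.0, 84.0, 86.0, 88.0]
z = standardize(motor_temperature)
```

After such a transformation each element has zero mean and unit variance, so elements with different units (for example, a voltage and a temperature) contribute on a comparable scale.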
As for the history data, among operation data formed by storing real-time data collected by the data collection module 21 in the storage 12, operation data measured in a normal state are labeled as normal, and operation data measured in a failure state according to a failure record are labeled as abnormal. Here, the failure state may include a state in which the pump 2 is stopped due to a failure or a state just before the pump 2 is stopped due to a failure. Accordingly, the first model M1 may learn operation data of a state just before failure occurs and may detect the state just before the failure as the first abnormality. Operation data labeled as abnormal are data of a state just before failure occurs, and thus when similar operation data are detected, it can be predicted that a failure will occur in the near future.
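The labeling of the history data may be sketched as follows. The length of the period treated as "just before" a failure is not given in the disclosure; the 24-hour `lead` window, the function name `label_history`, and the timestamps below are hypothetical.

```python
from datetime import datetime, timedelta


def label_history(timestamps, failure_times, lead=timedelta(hours=24)):
    """Label a timestamp 'abnormal' if it lies within `lead` before a
    recorded failure (or at the failure time itself), else 'normal'.

    The 24-hour default for `lead` is an assumption for illustration.
    """
    labels = []
    for t in timestamps:
        near_failure = any(f - lead <= t <= f for f in failure_times)
        labels.append("abnormal" if near_failure else "normal")
    return labels


start = datetime(2024, 1, 1)
# Samples every 6 hours: 0 h, 6 h, ..., 42 h after `start`.
timestamps = [start + timedelta(hours=6 * i) for i in range(8)]
# One recorded failure, 42 hours after `start`.
failure_times = [start + timedelta(hours=42)]
labels = label_history(timestamps, failure_times)
```

With these sample values, the samples at 18 h through 42 h fall inside the pre-failure window and are labeled as abnormal, while the earlier samples are labeled as normal.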
The history data may include motor frequency, inflow rate into the pump 2, a motor temperature, a fluid temperature, a motor voltage, a bus voltage, a motor current, and temperature difference between a fluid and the pump 2 among the elements of the operation data, and values of the elements may be measured at each predetermined time. The operation data may be labeled as normal or abnormal at every time thereof. In an exemplary, history data LD1 illustrated in
In training of a model at S30, the training module 23 may generate the first model M1 which utilizes the linear discriminant analysis model by obtaining a decision boundary which classifies operation data included in the history data as a normal class or an abnormal class by using the linear discriminant analysis (LDA).
The training module 23 may search for the decision boundary, which can classify the data of each time in the history data into two classes, a normal class and an abnormal class. The decision boundary may be obtained to maximize the distance between the normal class and the abnormal class and to minimize the distribution inside each class. Accordingly, the trained first model M1 may have a decision boundary at which the distribution of the operation data within each class is minimized and the separation between the classes is maximized. When the first model M1 classifies the history data into the normal class and the abnormal class, the training of the first model M1 may stop. The training module 23 may store the trained first model M1 in the storage 12.
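The search for a decision boundary may be illustrated with a minimal two-class Fisher linear discriminant written with NumPy. This is a sketch under stated assumptions, not the implementation of the first model M1: the function names, the synthetic two-element data, the small ridge term, and the midpoint threshold (which assumes equal class priors) are all illustrative choices.

```python
import numpy as np


def fit_lda(X_normal, X_abnormal):
    """Two-class Fisher LDA. Returns (w, b); score = x @ w + b,
    and a positive score is read as the abnormal class."""
    mu_n = X_normal.mean(axis=0)
    mu_a = X_abnormal.mean(axis=0)
    # Within-class scatter: the "distribution inside the classes".
    S_w = (X_normal - mu_n).T @ (X_normal - mu_n)
    S_w = S_w + (X_abnormal - mu_a).T @ (X_abnormal - mu_a)
    # Small ridge term (assumption) keeps S_w invertible.
    S_w = S_w + 1e-6 * np.eye(S_w.shape[0])
    # Direction that maximizes between-class separation relative to S_w.
    w = np.linalg.solve(S_w, mu_a - mu_n)
    # Boundary through the midpoint of the class means (equal priors assumed).
    b = -0.5 * (mu_n + mu_a) @ w
    return w, b


def classify(x, w, b):
    return "abnormal" if x @ w + b > 0 else "normal"


# Synthetic, made-up data with two elements per time step.
rng = np.random.default_rng(0)
X_normal = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
X_abnormal = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(40, 2))
w, b = fit_lda(X_normal, X_abnormal)
```

A real-time sample `x` with the same elements as the history data could then be passed to `classify(x, w, b)`, and an "abnormal" result would correspond to the first abnormality.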
In detecting of abnormalities at S40, the abnormality detection module 24 may determine which class real-time data belong to, based on the decision boundary of the first model M1. The abnormality detection module 24 may determine that the first abnormality occurs when the real-time data are classified as an abnormal class according to the linear discriminant criteria generated by the first model M1. The linear discriminant criteria may include the decision boundary. In detecting of abnormalities at S40, the real-time data determined by using the first model M1 may have the same elements as the history data and may have no label.
The abnormality detection module 24 inputs the real-time data to the trained first model M1, and the first model M1 may output the class, either the normal class or the abnormal class, to which the input real-time data belong. The abnormality detection module 24 may determine that the first abnormality occurs when the first model M1 classifies the real-time data as the abnormal class. The abnormality detection module 24 may provide the result that the first abnormality is detected or the result that the real-time data are determined as the abnormal class to the reporting module 25.
In reporting of a notification at S50, the reporting module 25 may provide the abnormality detection notification when the first model M1 outputs the real-time data as the abnormal class. The reporting module 25 does not determine that an abnormality occurs when data which the first model M1 outputs are classified as a normal class. When the reporting module 25 receives an abnormal class from the abnormality detection module 24, the reporting module 25 may determine that the first abnormality occurs and may provide the abnormality detection notification. For example, the reporting module 25 may display the presence of the first abnormality on the display of the input/output interface 14 and may automatically transmit a call or text to the contact of the administrator 4. Through the abnormality detection notification, the administrator 4 may recognize that a failure having an occurrence history will occur.
The update of the first model M1 will be described. When the pump 2 fails, the training data generation module 22 may update the history data, the training module 23 may retrain the first model M1 by using the updated history data, and the abnormality detection module 24 may detect an abnormality by using the retrained first model M1. The update of the first model M1 may be performed in case of failure occurrence, and the decision that a failure has occurred is made by the administrator 4. Accordingly, the update of the first model M1 may be performed irregularly according to the instruction of the administrator 4.
The first model M1 may be trained to have three classes, such as a normal class, an abnormal class, and an unstable class. In a case in which the first model M1 is trained to have the three classes and real-time data are determined as one of the three classes, a user may recognize the state of the pump 2 in more detail.
In generating of training data at S20, the training data generation module 22 may generate history data, which include operation data labeled as normal, operation data labeled as abnormal, and operation data labeled as unstable. Here, the operation data labeled as normal may be operation data obtained in a normal state, the operation data labeled as abnormal may be operation data measured for a predetermined period during failure or before failure, and the operation data labeled as unstable may be operation data in a state that is neither the normal state nor the failure state. That is, the operation data labeled as unstable are operation data measured between a normal state and a failure state.
In training of a model at S30, the training module 23 may generate the first model M1 by utilizing the linear discriminant analysis model to classify operation data included in history data as a normal class, an abnormal class, or an unstable class. The trained first model M1 may classify the history data into zones corresponding to three classes as shown in
In detecting of abnormalities at S40, the abnormality detection module 24 may determine that the first abnormality occurs when real-time data are classified as an abnormal class or an unstable class according to the linear discriminant criteria generated by the first model M1. Alternatively, the administrator 4 may preset that the first abnormality is determined not to be present when the real-time data are classified as the unstable class in detecting of abnormalities at S40.
In reporting of a notification at S50, when real-time data are classified as the unstable class or the abnormal class, the reporting module 25 may provide the abnormality detection notification so as to include the class as which the real-time data are classified. The administrator 4 may receive the abnormality detection notification both when the unstable class is preset to be determined as the first abnormality and when the unstable class is preset to be determined as not being the first abnormality.
The horizontal axis of
The reporting module 25 may display a result output by the abnormality detection module 24 over time to form the graph of
The generation of training data for generating the second model M2, training of the second model M2, and abnormality detection by using the second model M2 will be described with reference to
The second model M2 learns a range in which the pump 2 normally operates by learning the normal operation data, and this is intended to determine whether the real-time data is out of a normal range.
In generating of training data at S20, by preprocessing entire operation data stored in the storage 12, data to be included in the training data and data to be excluded from the training data may be separated. The normal operation data is required to include operation data collected while the oil production unit and the pump 2 normally operate.
For this reason, in order to generate the training data of the second model M2, the training module 23 extracts feature vectors by applying the principal component analysis to the operation data every specific time of the operation data, obtains reconstruction data from the feature vectors, and calculates an error between the reconstruction data and the operation data. When the error is greater than the reference error, the operation data may be excluded from the normal operation data. When the error is smaller than or equal to the reference error, the operation data may be included in the normal operation data. While the pump 2 operates normally, the deviation of an error between the operation data and the reconstruction data is not great. Operation data at time at which the error is greater than the reference error may indicate that the pump 2 is out of a normal operation state.
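The selection rule described above may be sketched as a simple filter. The function name `select_normal`, the placeholder records, and the error values are hypothetical; the reconstruction errors are assumed to have been calculated beforehand by applying the principal component analysis.

```python
def select_normal(records, errors, reference_error):
    """Keep records whose reconstruction error is smaller than or equal
    to the reference error; the remaining records are excluded."""
    return [r for r, e in zip(records, errors) if e <= reference_error]


# Stand-ins for operation data at five time steps (illustrative only).
records = ["t0", "t1", "t2", "t3", "t4"]
# Made-up reconstruction errors; t2 is clearly larger than the others.
errors = [0.8, 1.1, 4.7, 0.9, 1.0]
normal_records = select_normal(records, errors, reference_error=1.1)
```

Here the record at `t2` is excluded from the normal operation data because its error exceeds the reference error, while a record whose error equals the reference error is kept.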
Alternatively, the training module 23 extracts feature vectors by applying the principal component analysis to the operation data every specific time of the operation data, obtains reconstruction data from the feature vectors, and calculates errors between the reconstruction data and the operation data, the errors are listed over time, and operation data in a section in which the errors continuously increase may be excluded from the normal operation data. Operation data in a section in which the errors are maintained within a predetermined range may be included in the normal operation data.
The normal operation data may include motor frequency, inflow rate into the pump 2, a motor temperature, a fluid temperature, a motor voltage, a bus voltage, a motor current, and temperature difference between a fluid and the pump 2 among the elements of the operation data, and values of the elements may be measured at each predetermined time. The normal operation data have no label. In the exemplary normal operation data (LD2) shown in
In training of a model at S30, the second model M2 may be generated by learning the normal operation data with a principal component analysis model. In training of a model at S30, the second model M2 may be generated through the training of obtaining the average of a plurality of normal section errors when the principal component analysis is applied to the normal operation data to obtain feature vectors. Based on the feature vectors, the reconstruction data are obtained; and the process of obtaining a normal section error between the original operation data (the normal operation data) and the reconstructed operation data (the reconstruction data) is repeated at every time step of the operation data to obtain the plurality of normal section errors. The trained second model M2 may be stored in the storage 12.
A principal component analysis model may reduce the dimension of data while maintaining the distribution of multidimensional data as much as possible. By applying the principal component analysis, a plurality of feature vectors that considers the features of data may be obtained. When original data are reconstructed by using the plurality of feature vectors, an error may occur. To compare an error obtained from the normal operation data with an error obtained from the real-time data, first, the principal component analysis is applied to the normal operation data to be reconstructed to obtain an error, which is the process of generating the second model M2.
The process of generating the second model M2 will be described. For example, when the principal component analysis is applied to normal operation data including eight elements, a plurality of feature vectors may be obtained. Among the plurality of feature vectors, two feature vectors may be selected in order in which the features of the normal operation data are considered the most. Three or more feature vectors may be selected. Based on the two selected feature vectors, the eight elements may be reconstructed to obtain reconstruction data. For the eight elements, differences between the operation data and the reconstruction data are obtained, the differences are squared, and the squared differences are added together, so that a normal section error can be obtained. For example, a normal section error may be the square of the difference between the operation data and the reconstruction data of a first element + the square of the difference between the operation data and the reconstruction data of a second element + . . . + the square of the difference between the operation data and the reconstruction data of an eighth element. The normal section error may be obtained at each time, and thus a plurality of normal section errors may be obtained. For example, when normal operation data are present at 15-minute intervals and there are 1000 time steps, one thousand normal section errors may be obtained. Finally, the second model M2 may be generated by obtaining an average of the plurality of normal section errors between the original operation data (the normal operation data) and the reconstructed operation data (the reconstruction data) calculated after application of the principal component analysis.
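The calculation of the normal section errors and their average may be sketched with NumPy as follows. This is a minimal sketch under assumptions: the principal component analysis is fitted once on all normal operation data and each time step is then projected and reconstructed, which is one possible reading of the process; the function names and the synthetic eight-element data are illustrative only.

```python
import numpy as np


def fit_pca(X, n_components=2):
    """Return column means and the top principal directions of X
    (rows are time steps, columns are elements)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def reconstruction_errors(X, mean, components):
    """Sum of squared differences between each row and its reconstruction."""
    scores = (X - mean) @ components.T      # feature vectors per time step
    X_hat = scores @ components + mean      # reconstruction data
    return ((X - X_hat) ** 2).sum(axis=1)


# Synthetic normal operation data: 1000 time steps, 8 elements (made up).
rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 8))
X_normal = latent @ mixing + 0.05 * rng.normal(size=(1000, 8))

mean, components = fit_pca(X_normal, n_components=2)
normal_errors = reconstruction_errors(X_normal, mean, components)
mean_normal_error = normal_errors.mean()    # value retained by the model
```

In this sketch `normal_errors` holds one normal section error per time step, and `mean_normal_error` corresponds to the average that is later used as the denominator of the error level.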
In detecting of abnormalities at S40, it may be determined that the second abnormality occurs when the error level of the real-time data increases over time. To obtain the error level, real-time feature vectors are obtained by applying the principal component analysis to the real-time data. Based on the real-time feature vectors, real-time reconstruction data are obtained. A real-time error between the real-time data and the real-time reconstruction data is obtained; the error level is obtained by dividing the real-time error by the average of the plurality of normal section errors generated by the second model M2; and the error level is recorded over time. The real-time data input to the second model M2 may have the same type as the operation data included in the normal operation data.
In detecting of abnormalities at S40, the abnormality detection module 24 may obtain the real-time feature vectors by applying the principal component analysis to the plurality of elements of the real-time data. The abnormality detection module 24 may obtain real-time reconstruction data by using the real-time feature vectors. Differences between the real-time reconstruction data and the real-time data are obtained and squared and the squared differences are all added to obtain a real-time error. When the real-time error is divided by the average of the normal section errors, the error level may be obtained. The error level is a value indicating how large the real-time error is relative to the average of the normal section errors.
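The error level and a possible check for an increase over time may be sketched as follows. The disclosure does not define how an increase over time is decided; the rule used in `is_increasing` (each of the last `window` steps rising above the previous one), the function names, and the numerical values are assumptions for illustration.

```python
def error_level(real_time_error, mean_normal_error):
    """Real-time error relative to the average of the normal section errors."""
    return real_time_error / mean_normal_error


def is_increasing(levels, window=3):
    """Hypothetical trend rule: the last `window` steps each rose
    above the immediately preceding value."""
    if len(levels) < window + 1:
        return False
    recent = levels[-(window + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


mean_normal_error = 0.02                            # made-up average
real_time_errors = [0.02, 0.018, 0.03, 0.05, 0.09]  # made-up real-time errors
levels = [error_level(e, mean_normal_error) for e in real_time_errors]
second_abnormality = is_increasing(levels)
```

An error level near 1 indicates a real-time error comparable to the normal average, and a level that keeps rising, as in the last samples above, would correspond to the second abnormality under the assumed rule.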
When the error level is recorded over time, a graph shown in
In reporting of a notification at S50, the reporting module 25 may provide the abnormality detection notification when the second model M2 detects the second abnormality.
When receiving an input that the result of evaluating the output of the second model M2 from the abnormality detection module 24 is abnormal, the reporting module 25 may determine that the second abnormality occurs and may provide the abnormality detection notification. For example, the reporting module 25 may display the presence of the second abnormality on the display of the input/output interface 14 and may automatically transmit a call or text to the contact of the administrator 4. Through the abnormality detection notification, the administrator 4 may recognize that a new type of failure may occur instead of failure with an occurrence history.
The process of updating the second model M2 will be described.
In training of a model at S30, before the pump 2 is installed in the oil well 1 and is assumed to operate normally in a stable condition without any issues, the second model M2 may be generated by using the normal operation data of another pump, which has specification similar to pump 2, and after the pump 2 is installed in the oil well 1 and assumed to operate normally in a stable condition without any issues, the normal operation data of the pump 2 may be updated every preset period to regenerate the second model M2.
Even if the same type of pump 2 is used, the operation data of the pump 2 installed in a different oil well 1 have different values. This is because the features of the oil well 1 are different depending on the location of a geological stratum. Each of the pumps 2 may have different operation data for various reasons. Accordingly, the normal operation data of a specific pump 2 cannot be obtained until the specific pump 2 is actually installed and operated in the oil well 1. Accordingly, the second model M2 may not be used immediately after the pump 2 is installed in the oil well 1. To solve this problem, immediately after the pump 2 is installed in the oil well 1, the second model M2 may be generated by using normal operation data obtained from a pump 2 installed in a different oil well 1. This is because even the pump 2 installed in the different oil well 1 may have similar normal operation data when features of the different oil well 1 are similar, the type of the pump 2 is similar, and the operation mode of the pump 2 is similar.
When the normal operation data of a specific pump 2 are sufficiently obtained while detecting an abnormality by using the second model M2 generated by using the normal operation data of the pump 2 installed in the different oil well 1, the second model M2 may be updated. When the specific pump 2 starts operating normally, the second model M2 may be regularly updated every preset period. While the pump 2 operates, the state of each of the oil well 1 and the pump 2 continuously changes, so the second model M2 is also required to be updated regularly.
The generation of training data for generating the third model M3, training of the third model M3, and abnormality detection by the third model M3 will be described with reference to
The third model M3 learns a normal range of an initial section in which the pump 2 is installed and normally operates by learning the initial operation data, and this is intended to determine whether real-time data are outside the normal range of the initial section.
The third model M3 is a model that compares the current state of the pump 2 with the state of the pump 2 immediately after the pump 2 is installed in the oil well 1 and assumed to normally operate in a stable condition without any issues. In an initial stage at which the pump 2 is installed in the oil well 1, it is very unlikely that the pump 2 will fail for other reasons without aging. Accordingly, the third model M3 may learn operation data collected at an initial stage in which the pump 2 is installed and operated.
Operation data that may be included in the initial operation data are data included in a predetermined period after the pump 2 starts to operate normally. For example, the initial operation data may include data for seven days after the pump 2 starts to operate normally. In the embodiment, the initial operation data may include operation data collected for a period of about two to seven days after the pump 2 is installed. Periods for which the initial operation data are collected may vary. However, data of the day on which the pump 2 is installed are preferably excluded, and operation data collected more than a week after the pump 2 is installed are difficult to classify as initial data and thus are preferably excluded.
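The selection of the initial operation data may be sketched as a time-window filter. The function name `select_initial`, the default bounds (from one day to seven days after installation), and the sample records are assumptions chosen to reflect the preference for excluding the day of installation and data collected more than a week after installation.

```python
from datetime import datetime, timedelta


def select_initial(records, installed_at, first_day=1, last_day=7):
    """Keep (timestamp, row) records collected from `first_day` days up to,
    but not including, `last_day` days after installation.

    The default bounds are illustrative assumptions.
    """
    start = installed_at + timedelta(days=first_day)
    end = installed_at + timedelta(days=last_day)
    return [(t, row) for t, row in records if start <= t < end]


installed_at = datetime(2024, 3, 1)
# Made-up samples every 12 hours, covering 0 to 9.5 days after installation.
records = [(installed_at + timedelta(hours=12 * i), {"sample": i}) for i in range(20)]
initial = select_initial(records, installed_at)
```

With these sample values the first two samples (the day of installation) and the samples from the seventh day onward are excluded.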
When oil is produced by using the pump 2 in the oil well 1 while an oil production unit is operating, the state of the oil well 1 changes and the pump 2 ages. Accordingly, initial operation data obtained in a state in which the oil well 1 and the pump 2 are in a normal state without aging are preset as a standard for determining abnormality.
The initial operation data may include motor frequency, inflow rate into the pump 2, a motor temperature, a fluid temperature, a motor voltage, a bus voltage, a motor current, and temperature difference between a fluid and the pump 2 among the elements of the operation data, and values of the elements may be measured at each predetermined time. The initial operation data have no label. In an exemplary initial operation data (LD3) shown in
In training of a model at S30, the third model M3 may be generated by learning the initial operation data with the principal component analysis model. In training of a model at S30, the principal component analysis is applied to the initial operation data, and the process of obtaining at least two feature vectors in order in which the features of the initial operation data are considered the most is repeated at every time step of the initial operation data to obtain the distribution and center point of the feature vectors such that the third model M3 can be generated. The trained third model M3 may be stored in the storage 12.
The process of generating the third model M3 will now be described. For example, when the principal component analysis is applied to initial operation data including eight elements, a plurality of feature vectors may be obtained. Among the plurality of feature vectors, a predetermined number of feature vectors may be selected in the order in which they best represent the features of the initial operation data. In the embodiment, two feature vectors may be selected. The two feature vectors may be obtained by applying the principal component analysis to the initial operation data at every time step of the initial operation data. The feature vectors obtained at every time step may be arranged in a space having the predetermined number of dimensions. In the embodiment, two feature vectors are selected and thus may be arranged in a two-dimensional space. If three feature vectors are selected, the feature vectors may be arranged in a three-dimensional space. A plurality of feature points displayed in the dimensional space may be distributed in a specific form according to the features of the initial operation data. The third model M3 may be generated by obtaining the center point of the feature vectors, based on the distribution of the feature vectors. As shown in
In the detecting of abnormalities at S40, based on the distribution of the feature vectors and the center point CP of the initial operation data, the third model M3 calculates a Mahalanobis distance from the center point CP to each feature vector of the initial operation data. From the range of these Mahalanobis distances, a reference distance is statistically determined. Then, the principal component analysis is applied to the real-time operation data, the third model M3 calculates the Mahalanobis distance from the center point CP to each real-time feature vector, and the state of the pump is determined to be abnormal if that Mahalanobis distance is greater than the reference distance.
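For reference, the standard definition of the Mahalanobis distance can be written as follows. The symbols are notation assumed here and do not appear in the disclosure: z is a feature vector, c is the center point CP, S is the covariance matrix of the feature vectors obtained from the initial operation data, and d_ref is the reference distance.

```latex
d_M(\mathbf{z}) = \sqrt{(\mathbf{z} - \mathbf{c})^{\top} \, S^{-1} \, (\mathbf{z} - \mathbf{c})},
\qquad
\text{abnormal if } d_M(\mathbf{z}_{\text{real-time}}) > d_{\text{ref}}
```

Unlike the Euclidean distance, this measure scales each direction by the spread of the initial feature vectors, so a deviation along a direction in which the initial data varied little counts for more.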
In the detecting of abnormalities at S40, the abnormality detection module 24 may obtain real-time feature vectors by applying the principal component analysis to the plurality of elements of the real-time data. The abnormality detection module 24 may arrange the real-time feature vectors in the dimensional space of the third model M3. The abnormality detection module 24 may obtain a Mahalanobis distance Md2 between the center point CP of the feature vectors generated by the third model M3 and each real-time feature vector RP arranged in the dimensional space of the third model M3. When the Mahalanobis distance Md2 is greater than the reference distance, it may be determined that the third abnormality occurs.
The reference distance may be determined as a multiple of the average value of a plurality of Mahalanobis distances Md1. The plurality of Mahalanobis distances Md1 are obtained by repeating, at every time step, the process of obtaining the Mahalanobis distance Md1 between the center point CP of the feature vectors obtained by the third model M3 from the initial operation data and each of those feature vectors. In the embodiment, the reference distance may be determined as a value which is three times the average value of the plurality of Mahalanobis distances Md1, which the third model M3 obtains from the initial operation data.
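The thresholding described above can be sketched in plain Python for the two-dimensional case. The function names, the toy center point, the toy covariance matrix, and the toy feature points are hypothetical values chosen so that the arithmetic is easy to follow; only the rule itself (three times the average of the distances Md1, compared against the distance Md2) comes from the text.

```python
import math


def mahalanobis_2d(point, center, cov):
    """Mahalanobis distance of a 2-D point from center under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - center[0], point[1] - center[1]
    # Quadratic form (p - c)^T cov^{-1} (p - c) with the 2x2 inverse written out.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


def reference_distance(initial_points, center, cov, multiple=3.0):
    """Multiple of the average distance Md1 over the initial feature points."""
    dists = [mahalanobis_2d(p, center, cov) for p in initial_points]
    return multiple * sum(dists) / len(dists)


def is_third_abnormality(realtime_point, center, cov, ref):
    """True when the real-time distance Md2 exceeds the reference distance."""
    return mahalanobis_2d(realtime_point, center, cov) > ref


# Toy values (assumed): every initial point lies at distance 1 from the center.
center = (0.0, 0.0)
cov = ((1.0, 0.0), (0.0, 4.0))
initial_points = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
ref = reference_distance(initial_points, center, cov)

near = is_third_abnormality((2.0, 2.0), center, cov, ref)
far = is_third_abnormality((4.0, 0.0), center, cov, ref)
```

With these toy values the reference distance is 3, so the point (2, 2) at a distance of about 2.24 is treated as normal, while the point (4, 0) at a distance of 4 would be reported as the third abnormality.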
In the reporting of a notification at S50, the reporting module 25 may provide an abnormality detection notification when the third model M3 detects the third abnormality.
When the reporting module 25 is informed by the abnormality detection module 24 that the result of evaluating the output of the third model M3 is abnormal, the reporting module 25 may determine that the third abnormality occurs and may provide the abnormality detection notification. For example, the reporting module 25 may display the presence of the third abnormality on the display of the input/output interface 14, and may automatically place a call or send a text message to the contact of the administrator 4. Through the abnormality detection notification, the administrator 4 may recognize that a new type of failure, rather than a failure having an occurrence history, may occur.
The third model M3 uses the operation data of the initial stage after the pump 2 is installed, and the initial operation data are selected only once and are not updated. However, when the pump 2 is repaired, the generation of the initial operation data and the training of the third model M3 may be performed again at the command of the administrator 4.
The reporting module 25 may provide an abnormality detection notification to the administrator 4 by including the graph shown in
The administrator 4 may receive the abnormality detection notification and may stop the operation of the pump 2. Alternatively, the reporting module 25 may stop the operation of the pump 2 upon detecting an abnormality. Stopping the operation of the pump 2 may prevent further escalation of any abnormality occurring in the pump 2.
The present disclosure has been described in detail through the specific embodiment. The embodiment is intended to specifically describe the present disclosure, and the present disclosure is not limited thereto. It is clear that the embodiment may be modified or improved within the technical spirit of the present disclosure by those skilled in the art.
All simple modifications or changes of the present disclosure fall within the scope of the present disclosure, and the specific protection scope of the present disclosure will be clarified by the appended claims.