The present application is related to systems, methods, and apparatuses for an additive life consumption model for predicting remaining time-to-failure of machines. The technology described herein may be generally applied, for example, to determine the time-to-failure for gas turbines or other complex machines.
A proper maintenance strategy is crucial for operating an expensive machine. Traditional methods include corrective maintenance and planned maintenance. Corrective maintenance happens only after the machine fails, which can be very costly if the damage is catastrophic. Planned maintenance is scheduled based on manufacturer recommendations and appears to be smarter. However, when the scheduled maintenance takes place, the machine is often still in good shape. Performing maintenance in such cases can cause unnecessary cost and loss of revenue.
In recent years, condition-based maintenance has received more attention. Under this approach, maintenance is performed only when the failure of a machine is predicted to happen soon; otherwise, the machine is allowed to keep running. A successful condition-based approach not only saves cost but can also shed light on how to operate a machine so that it produces the most profit before breakdown.
Many condition-based maintenance approaches make the following assumption: there exists a measurable quantity indicating the aging process of the machine. Over time, this aging indicator monotonically increases or decreases, and when its value hits a certain threshold, the machine is likely to fail. This indicator can have a physical meaning such as wear, corrosion, fracture, or deformation and is often assumed to have an exponential form over time. Various methods have used neural networks or kernel machines combined with particle filters to predict (simulate) the progress of this indicator and when it might hit the alarm threshold. However, in many applications, no such aging indicator is available.
Another major set of approaches is based on survival analysis, particularly the Proportional Hazards Model (PHM). PHM assumes that the total time-to-failure follows a distribution such as the Weibull distribution. However, the actual shape or a parameter of the distribution depends on certain variables of the machine, which makes the PHM adaptive to individual machines. The remaining time-to-failure at the current time is easily predicted from the conditional probability of this distribution. However, the past operation of the machine, a multivariate time series, is often summarized in a single number, which can cause significant information loss.
Accordingly, it is desired to provide new techniques for predicting the remaining time-to-failure of a machine and for supporting other condition-based machine maintenance.
Embodiments of the present invention address and overcome one or more of the above shortcomings and drawbacks, by providing methods, systems, and apparatuses related to an additive life consumption model for predicting remaining time-to-failure of a machine. Briefly, the techniques described herein define a new concept of “life,” which is unobserved. By monitoring multivariate time series data generated by the machine over time, a life consumption rate is derived that is used, in turn, to determine the time-to-failure for the machine.
According to some embodiments of the present invention, a method for predicting time-to-failure of a machine includes receiving or retrieving, by a computing system operably coupled with the machine, multivariate time series data observed at a plurality of times. The computing system infers a plurality of state variables from the multivariate time series data, each state variable describing an operating condition of the machine at a particular time, and computes an average life consumption rate by applying a life consumption rate model to the state variables. The computing system next computes time-to-failure for the machine based on the average life consumption rate. Then, the computing system may report the time-to-failure for the machine to one or more users.
In some embodiments of the aforementioned method, the life consumption rate model is learned by receiving training multivariate time series data observed over a training time period and inferring a plurality of training state variables from the training multivariate time series data. Each training state variable describes a past operating condition over the training time period. These state variables may comprise discrete values that, together with the multivariate time series data, form a hidden Markov model. Alternatively, the state variables may comprise continuous values that, collectively with the multivariate time series data, form a Kalman filtering model. A constrained optimization problem is created that independently models the life consumption of each of the training state variables. In embodiments where the state variables comprise continuous values, the constrained optimization problem may be modeled using a non-linear black box model (e.g., a neural network). Once created, the constrained optimization problem may be solved by the computing system using a suitable technique (e.g., a gradient-based algorithm, interior point algorithm, least squares algorithm, etc.) to yield the life consumption rate model.
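For the discrete-state embodiment, one standard way to infer the most likely state sequence of a hidden Markov model is Viterbi decoding. The Python sketch below assumes Gaussian emissions with diagonal covariance and already-known start, transition, and emission parameters; these modeling choices, the example parameter values, and the function names `gaussian_logpdf` and `viterbi` are illustrative assumptions rather than details of the embodiments.

```python
import math
from typing import List, Sequence

def gaussian_logpdf(y: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at observation y."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
               for yi, m, v in zip(y, mean, var))

def viterbi(observations: Sequence[Sequence[float]],
            log_start: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            means: Sequence[Sequence[float]],
            variances: Sequence[Sequence[float]]) -> List[int]:
    """Most likely discrete state path x_1..x_T of a Gaussian-emission HMM."""
    num_states = len(log_start)
    # delta[k]: best log-probability of any state path that ends in state k at the current step
    delta = [log_start[k] + gaussian_logpdf(observations[0], means[k], variances[k])
             for k in range(num_states)]
    backpointers: List[List[int]] = []
    for y in observations[1:]:
        new_delta, pointers = [], []
        for k in range(num_states):
            # Best predecessor j for state k at this step
            j = max(range(num_states), key=lambda i: delta[i] + log_trans[i][k])
            pointers.append(j)
            new_delta.append(delta[j] + log_trans[j][k]
                             + gaussian_logpdf(y, means[k], variances[k]))
        delta = new_delta
        backpointers.append(pointers)
    # Trace the best path backwards from the most likely final state
    state = max(range(num_states), key=lambda k: delta[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-state example: state 0 = low load, state 1 = high load
obs = [[0.1, -0.2], [0.3, 0.1], [9.8, 5.2], [10.1, 4.9], [0.2, 0.0]]
log_start = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
means = [[0.0, 0.0], [10.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(viterbi(obs, log_start, log_trans, means, variances))  # [0, 0, 1, 1, 0]
```

In practice the HMM parameters would themselves have to be estimated from the training data, for example with the Baum-Welch algorithm, before decoding.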
According to another aspect of the present invention, a system for predicting time-to-failure of a machine includes one or more processors and a non-transitory, computer-readable storage medium in operable communication with the processor(s). The computer-readable storage medium contains one or more programming instructions that, when executed, cause the processor(s) to implement the methods discussed above as being performed by the aforementioned computing system.
According to other embodiments, a machine comprises one or more processors, a non-transitory, computer-readable storage medium in operable communication with the processor(s), and a display. The computer-readable storage medium contains one or more programming instructions that, when executed, cause the processor(s) to collect multivariate time series data at a plurality of times; infer a plurality of state variables from the multivariate time series data, each state variable describing an operating condition of the machine at a particular time; compute an average life consumption rate by applying a life consumption rate model to the state variables; and compute time-to-failure for the machine based on the average life consumption rate. The display presents the time-to-failure for the machine to one or more users. The machine in these embodiments may be, for example, a gas turbine or similar complex machine.
Additional features and advantages of the invention will be made apparent from the following detailed description of illustrative embodiments that proceeds with reference to the accompanying drawings.
The foregoing and other aspects of the present invention are best understood from the following detailed description when read in connection with the accompanying drawings. For the purpose of illustrating the invention, there is shown in the drawings embodiments that are presently preferred, it being understood, however, that the invention is not limited to the specific instrumentalities disclosed. Included in the drawings are the following Figures:
The following disclosure describes the present invention according to several embodiments directed at an additive life consumption model for predicting remaining time-to-failure of a machine. The techniques described herein define a new concept of “life,” which is unobserved. Life starts with a certain value, such as 100, and is consumed gradually as the machine runs. When life is completely consumed, that is, when it reaches zero, the machine breaks down. Additionally, as described in further detail below, the operating conditions of a machine are divided into different states. It is assumed that life is consumed differently under different states; in other words, different states have different life consumption rates. In cases of discrete states, quadratic programming is used to learn the life consumption rate. In cases of continuous states, general constrained optimization methods are used to learn the life consumption rate. The techniques described herein may be applied to gas turbines or other complex machinery.
The resolution of time t can be a minute, hour, day, week, or month, depending on the application. At t=1, the machine has just started a new cycle, either after its first installation or after a major inspection and maintenance; at that time, the machine can be assumed to be problem-free. At t=T, the machine breaks down and maintenance must be carried out. T is the total time-to-failure. The goal is to predict the remaining time-to-failure, T−t. Based on this prediction, the next maintenance event can be predicted.
Two hidden variables are introduced, denoted by white nodes in the model shown in
The second hidden variable $\Delta\text{life}_t$ denotes the life consumed at time t. We assume that every machine has a total life of 100 to consume. In other words,
$$\text{life} = \Delta\text{life}_1 + \Delta\text{life}_2 + \cdots + \Delta\text{life}_T = 100. \tag{1}$$
Note that the life defined here differs from the “life” in terms such as remaining useful life, where life is the same as time-to-failure. The model described herein is called “additive” because of the additive nature shown in (1). It is sensible that, under different operating conditions, a machine consumes life differently. Therefore, a life consumption rate w is defined. The value of w depends on the state x and can be viewed as a function of x. If w and x are known, the life consumed at time t may be determined as follows:
$$\Delta\text{life}_t = w(x_t). \tag{2}$$
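Equations (1) and (2) can be illustrated with a short Python sketch for discrete states. It assumes that states are integer labels and that the rates w are already known; the rate values in the example (0.5 per step in state 0 and 2.0 per step in state 1) and the function names are hypothetical. The function `failure_step` returns the first time step at which cumulative consumption reaches the total life of 100, which in this model is the step at which the machine breaks down.

```python
from itertools import accumulate
from typing import List, Mapping, Optional, Sequence

TOTAL_LIFE = 100.0  # total life available per cycle, Equation (1)

def life_increments(states: Sequence[int], rates: Mapping[int, float]) -> List[float]:
    """Equation (2): the life consumed at step t is w(x_t)."""
    return [rates[x] for x in states]

def failure_step(states: Sequence[int], rates: Mapping[int, float],
                 total_life: float = TOTAL_LIFE) -> Optional[int]:
    """First 1-based step t at which cumulative consumed life reaches total_life, else None."""
    for t, consumed in enumerate(accumulate(life_increments(states, rates)), start=1):
        if consumed >= total_life:
            return t
    return None

# Hypothetical rates: state 0 (light load) and state 1 (heavy load)
rates = {0: 0.5, 1: 2.0}
states = [1] * 40 + [0] * 50  # 40 heavy-load steps, then 50 light-load steps
print(failure_step(states, rates))  # 80
```

Here the 40 heavy-load steps consume 80 units of life and the remaining 20 units are used up after 40 light-load steps, so T = 80.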
To use the model, the parameters of the life consumption rate w must first be learned. Suppose that there are N time series, each representing one cycle of a machine. The n-th time series starts at 1 and ends at $T_n$ (when the machine breaks). For time series n, the states x1:T
The value of w is required to be nonnegative because life can only be consumed and not regained.
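For discrete states, the learning problem can be illustrated with the Python sketch below. It assumes a squared-error objective that drives each training cycle's total consumption toward 100, that is, minimizing $\sum_n \left(\sum_k C_{nk} w_k - 100\right)^2$ subject to $w \ge 0$, where $C_{nk}$ counts the time steps that training series n spends in state k. This is a nonnegative least-squares problem, one instance of quadratic programming; it is solved here by projected gradient descent only to keep the example self-contained, and a dedicated quadratic programming or nonnegative least-squares solver could be used instead. The function names are hypothetical.

```python
from typing import List, Sequence

def state_counts(series_states: Sequence[Sequence[int]], num_states: int) -> List[List[int]]:
    """C[n][k] = number of steps training series n spends in state k, i.e. sum_t δ(x_t^n - k)."""
    counts = []
    for states in series_states:
        row = [0] * num_states
        for x in states:
            row[x] += 1
        counts.append(row)
    return counts

def fit_discrete_rates(series_states: Sequence[Sequence[int]], num_states: int,
                       total_life: float = 100.0, iters: int = 5000) -> List[float]:
    """Projected gradient descent for min_w sum_n (C_n . w - total_life)^2 subject to w >= 0."""
    C = state_counts(series_states, num_states)
    # 2 * ||C||_F^2 upper-bounds the Lipschitz constant of the gradient, so this step is safe
    step = 1.0 / (2.0 * sum(c * c for row in C for c in row))
    w = [0.0] * num_states
    for _ in range(iters):
        residuals = [sum(c * wk for c, wk in zip(row, w)) - total_life for row in C]
        grad = [2.0 * sum(row[k] * r for row, r in zip(C, residuals)) for k in range(num_states)]
        # Gradient step, then projection onto the nonnegative orthant (life is never regained)
        w = [max(0.0, wk - step * gk) for wk, gk in zip(w, grad)]
    return w

# Synthetic training cycles that each total exactly 100 when w = [2, 4]
train = [[0] * 50, [1] * 25, [0] * 30 + [1] * 10]
print([round(wk, 4) for wk in fit_discrete_rates(train, num_states=2)])  # [2.0, 4.0]
```

The projection step is what enforces the nonnegativity requirement; without it, noisy training cycles could yield negative rates that would imply life being regained.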
In the case of discrete states with at most K different possibilities, w can be represented by K numbers $w_1, w_2, \ldots, w_K$, or briefly as a column vector $w = [w_1\ w_2\ \cdots\ w_K]'$. Then, (3) can be rewritten as:
δ(x) is 1 when x=0 or 0 everywhere else. Thus, Σt=1T
In the case of continuous states, $w_\theta(x)$ can be modeled using a nonlinear black box model, such as a neural network with parameters θ. Equation (3) can then be turned into a standard constrained optimization problem:
Equation (5) can be solved, for example, using a gradient-based algorithm or an interior point algorithm.
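A minimal Python sketch of the continuous case follows. As a deliberately small stand-in for a neural network, it assumes a log-linear rate model $w_\theta(x) = \exp(a + b x)$ over a scalar state, and it assumes a squared relative-error objective that drives each training cycle's total consumption toward 100. The exponential parameterization keeps the rate positive by construction, which substitutes for an explicit nonnegativity constraint; the model form, the objective, the learning rate, and the function name are all assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

def fit_log_linear_rate(series_states: Sequence[Sequence[float]], total_life: float = 100.0,
                        lr: float = 0.1, iters: int = 5000) -> Tuple[float, float]:
    """Fit w(x) = exp(a + b*x) by gradient descent on sum_n (S_n / total_life - 1)^2,
    where S_n = sum_t w(x_t^n) is the life consumed by training series n."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for states in series_states:
            rates = [math.exp(a + b * x) for x in states]
            s_n = sum(rates)
            # Chain rule: d/dθ (s_n/L - 1)^2 = 2 (s_n/L - 1) (1/L) ds_n/dθ
            coef = 2.0 * (s_n / total_life - 1.0) / total_life
            grad_a += coef * s_n                                         # ds_n/da = s_n
            grad_b += coef * sum(r * x for r, x in zip(rates, states))  # ds_n/db = sum_t w(x_t) x_t
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Synthetic cycles whose totals are exactly 100 when w(0) = 2 and w(1) = 4
train = [[0.0] * 50, [1.0] * 25]
a, b = fit_log_linear_rate(train)
print(round(math.exp(a), 3), round(math.exp(a + b), 3))  # 2.0 4.0
```

A neural network would replace `exp(a + b*x)` with a more flexible function of a multivariate state, with its gradients typically obtained by automatic differentiation; the structure of the objective stays the same.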
Once the life consumption rate model w has been learned, an observed time series $y_{1:t}$ from a test machine may be used to predict its remaining time-to-failure T−t. To do this, the states $x_{1:t}$ are first estimated from $y_{1:t}$ using Bayesian networks or other standard techniques generally known in the art. Then, the learned model is used to compute the life already consumed at time t as follows:
$$\text{consumed life} = w(x_1) + w(x_2) + \cdots + w(x_t). \tag{6}$$
The remaining life is simply:
$$\text{remaining life} = 100 - \text{consumed life}. \tag{7}$$
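Equations (6) and (7) translate directly into code. In the Python sketch below, the learned rate model is passed as any callable that maps a state to a nonnegative rate, so the same functions cover a discrete lookup table and a continuous model; the example rates and function names are hypothetical.

```python
from typing import Callable, Iterable

TOTAL_LIFE = 100.0

def consumed_life(states: Iterable, rate: Callable[[object], float]) -> float:
    """Equation (6): w(x_1) + w(x_2) + ... + w(x_t)."""
    return sum(rate(x) for x in states)

def remaining_life(states: Iterable, rate: Callable[[object], float],
                   total_life: float = TOTAL_LIFE) -> float:
    """Equation (7): total life minus the life consumed so far."""
    return total_life - consumed_life(states, rate)

# Hypothetical discrete rates used as a lookup table
rates = {0: 0.5, 1: 2.0}
observed_states = [1] * 10 + [0] * 20
print(consumed_life(observed_states, lambda x: rates[x]))   # 30.0
print(remaining_life(observed_states, lambda x: rates[x]))  # 70.0
```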
Next, past behavior is extrapolated to forecast how life will be consumed in the future. The average life consumption rate
The time-to-failure is the time it takes for the remaining life to be consumed at the average life consumption rate:
Note that sometimes the goal is not predicting time-to-failure, but remaining power or remaining equivalent baseload hours that the machine can produce before breakdown. This can be simply incorporated as follows. Suppose that the quantity we want to predict is z. The average production rate z/t may be computed from the past time series. Then, this may be extended to future to get the remaining z:
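A minimal Python sketch of this extrapolation, assuming that the remaining quantity is the past average production rate z/t multiplied by the predicted time-to-failure T−t. The function name and the numbers in the example are hypothetical.

```python
def remaining_quantity(z_so_far: float, t: int, time_to_failure: float) -> float:
    """Remaining z before breakdown, assuming production continues at the past average rate z/t."""
    if t <= 0:
        raise ValueError("t must be a positive number of elapsed time steps")
    average_rate = z_so_far / t
    return average_rate * time_to_failure

# Hypothetical example: 500 units produced over 100 steps, 40 steps predicted until failure
print(remaining_quantity(500.0, 100, 40.0))  # 200.0
```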
Starting at step 205, training multivariate time series data is received or retrieved from a data source associated with the machine. In some instances, the data source may be included in the machine itself, while in other instances it is remotely located (e.g., in a database of information from a particular production plant using the machine). The training multivariate time series includes data observed over a training time period. In general, any time period may be selected, although two considerations apply when selecting a suitable time period. First, the accuracy of the model is correlated with the amount of data used for training; thus, as long a time period as practicable may be used. Second, depending on the type of data and the operations of the machine, certain data may need to be sampled at a higher rate to capture all possible behaviors of the system.
Continuing with reference to
Once training is complete, the life consumption rate model can be used to predict the remaining time-to-failure of the machine. Starting at step 225, new multivariate time series data is received or retrieved, either directly from the machine or from a data source corresponding to the machine. In embodiments where the process 200 is implemented in the machine, the multivariate time series data may be collected directly by the machine itself based on periodic observations of the data being generated by the machine.
At step 230, a plurality of state variables are inferred from the multivariate time series data. As with the training data, each state variable inferred at step 230 describes an operating condition of the machine at a particular time. Next, at step 235, an average life consumption rate is computed by applying a life consumption rate model to the plurality of state variables. This step may apply Equations (6) and (7) to compute the consumed life and the remaining life, respectively. The average life consumption rate may then be computed according to Equation (8). It should be noted that these equations are exemplary and other equations may be used in different embodiments to compute the respective values. At step 240, time-to-failure for the machine is computed based on the average life consumption rate, for example, as set forth in Equation (9).
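Steps 235 and 240 can be sketched in Python as follows. The sketch assumes that Equation (8) takes the average life consumption rate as the consumed life divided by the elapsed time t, and that Equation (9) divides the remaining life by that rate, consistent with the z/t production rate described above. The function names are hypothetical, and clamping the remaining life at zero is an added design choice so that an overdue machine reports a time-to-failure of 0 rather than a negative value.

```python
import math

TOTAL_LIFE = 100.0

def average_consumption_rate(consumed: float, t: int) -> float:
    """Assumed form of Equation (8): life consumed per time step over steps 1..t."""
    if t <= 0:
        raise ValueError("t must be a positive number of elapsed time steps")
    return consumed / t

def time_to_failure(consumed: float, t: int, total_life: float = TOTAL_LIFE) -> float:
    """Assumed form of Equation (9): remaining life divided by the average consumption rate."""
    rate = average_consumption_rate(consumed, t)
    if rate == 0.0:
        return math.inf  # no life consumed yet, so the extrapolation gives no finite failure time
    return max(total_life - consumed, 0.0) / rate

# 30 units of life consumed over the first 30 steps gives a rate of 1.0 per step
print(time_to_failure(30.0, 30))  # 70.0
```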
Once the time-to-failure value is determined, at step 245, it is reported to one or more users. For example, in some embodiments, all calculations are performed inside the machine and a display on the machine provides information detailing the time-to-failure. In other embodiments, a message may be sent to a user or the time-to-failure information can be recorded in a database for later use. In one embodiment, the aforementioned database is used to generate periodic alerts of machines with short time-to-failure. Alternatively (or additionally), the database can be used to generate reports for time-to-failure across an entire enterprise of machines. In this way, users can schedule down-time and budget for new machines according to the time-to-failure information.
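As a minimal sketch of the periodic alert described above, the stored predictions are modeled here as an in-memory mapping from a hypothetical machine identifier to its predicted time-to-failure; a deployment would query the database instead, and the threshold value and function name are assumptions.

```python
from typing import Dict, List, Tuple

def short_ttf_alerts(ttf_by_machine: Dict[str, float], threshold: float) -> List[Tuple[str, float]]:
    """Machines whose predicted time-to-failure is below threshold, most urgent first."""
    flagged = [(machine_id, ttf) for machine_id, ttf in ttf_by_machine.items() if ttf < threshold]
    return sorted(flagged, key=lambda item: item[1])

# Hypothetical fleet snapshot (time-to-failure in days)
fleet = {"GT-01": 120.0, "GT-02": 15.0, "GT-03": 40.0}
print(short_ttf_alerts(fleet, threshold=50.0))  # [('GT-02', 15.0), ('GT-03', 40.0)]
```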
As shown in
The computer system 310 also includes a system memory 330 coupled to the bus 321 for storing information and instructions to be executed by processors 320. The system memory 330 may include computer readable storage media in the form of volatile and/or nonvolatile memory, such as read only memory (ROM) 331 and/or random access memory (RAM) 332. The system memory RAM 332 may include other dynamic storage device(s) (e.g., dynamic RAM, static RAM, and synchronous DRAM). The system memory ROM 331 may include other static storage device(s) (e.g., programmable ROM, erasable PROM, and electrically erasable PROM). In addition, the system memory 330 may be used for storing temporary variables or other intermediate information during the execution of instructions by the processors 320. A basic input/output system 333 (BIOS) containing the basic routines that help to transfer information between elements within computer system 310, such as during start-up, may be stored in ROM 331. RAM 332 may contain data and/or program modules that are immediately accessible to and/or presently being operated on by the processors 320. System memory 330 may additionally include, for example, operating system 334, application programs 335, other program modules 336 and program data 337.
The computer system 310 also includes a disk controller 340 coupled to the bus 321 to control one or more storage devices for storing information and instructions, such as a magnetic hard disk 341 and a removable media drive 342 (e.g., floppy disk drive, compact disc drive, tape drive, and/or solid state drive). The storage devices may be added to the computer system 310 using an appropriate device interface (e.g., a small computer system interface (SCSI), integrated device electronics (IDE), Universal Serial Bus (USB), or FireWire).
The computer system 310 may also include a display controller 365 coupled to the bus 321 to control a monitor or display 366, such as a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information to a computer user. The computer system includes an input interface 360 and one or more input devices, such as a keyboard 362 and a pointing device 361, for interacting with a computer user and providing information to the processor 320. The pointing device 361, for example, may be a mouse, a trackball, or a pointing stick for communicating direction information and command selections to the processor 320 and for controlling cursor movement on the display 366. The display 366 may provide a touch screen interface which allows input to supplement or replace the communication of direction information and command selections by the pointing device 361.
The computer system 310 may perform a portion or all of the processing steps of embodiments of the invention in response to the processors 320 executing one or more sequences of one or more instructions contained in a memory, such as the system memory 330. Such instructions may be read into the system memory 330 from another computer readable medium, such as a hard disk 341 or a removable media drive 342. The hard disk 341 may contain one or more datastores and data files used by embodiments of the present invention. Datastore contents and data files may be encrypted to improve security. The processors 320 may also be employed in a multi-processing arrangement to execute one or more sequences of instructions contained in system memory 330. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, embodiments are not limited to any specific combination of hardware circuitry and software.
As stated above, the computer system 310 may include at least one computer readable medium or memory for holding instructions programmed according to embodiments of the invention and for containing data structures, tables, records, or other data described herein. The term “computer readable medium” as used herein refers to any medium that participates in providing instructions to the processor 320 for execution. A computer readable medium may take many forms including, but not limited to, non-volatile media, volatile media, and transmission media. Non-limiting examples of non-volatile media include optical disks, solid state drives, magnetic disks, and magneto-optical disks, such as hard disk 341 or removable media drive 342. Non-limiting examples of volatile media include dynamic memory, such as system memory 330. Non-limiting examples of transmission media include coaxial cables, copper wire, and fiber optics, including the wires that make up the bus 321. Transmission media may also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.
The computing environment 300 may further include the computer system 310 operating in a networked environment using logical connections to one or more remote computers, such as remote computing device 380. Remote computing device 380 may be a personal computer (laptop or desktop), a mobile device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to computer system 310. When used in a networking environment, computer system 310 may include modem 372 for establishing communications over a network 371, such as the Internet. Modem 372 may be connected to bus 321 via user network interface 370, or via another appropriate mechanism.
Network 371 may be any network or system generally known in the art, including the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a direct connection or series of connections, a cellular telephone network, or any other network or medium capable of facilitating communication between computer system 310 and other computers (e.g., remote computing device 380). The network 371 may be wired, wireless or a combination thereof. Wired connections may be implemented using Ethernet, Universal Serial Bus (USB), RJ-11 or any other wired connection generally known in the art. Wireless connections may be implemented using Wi-Fi, WiMAX, Bluetooth, infrared, cellular networks, satellite, or any other wireless connection methodology generally known in the art. Additionally, several networks may work alone or in communication with each other to facilitate communication in the network 371.
The embodiments of the present disclosure may be implemented with any combination of hardware and software. In addition, the embodiments of the present disclosure may be included in an article of manufacture (e.g., one or more computer program products) having, for example, computer-readable, non-transitory media. The media has embodied therein, for instance, computer readable program code for providing and facilitating the mechanisms of the embodiments of the present disclosure. The article of manufacture can be included as part of a computer system or sold separately.
While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
An executable application, as used herein, comprises code or machine readable instructions for conditioning the processor to implement predetermined functions, such as those of an operating system, a context data acquisition system or other information processing system, for example, in response to user command or input. An executable procedure is a segment of code or machine readable instruction, sub-routine, or other distinct section of code or portion of an executable application for performing one or more particular processes. These processes may include receiving input data and/or parameters, performing operations on received input data and/or performing functions in response to received input parameters, and providing resulting output data and/or parameters.
The functions and process steps herein may be performed automatically, wholly or partially in response to user command. An activity (including a step) performed automatically is performed in response to one or more executable instructions or device operation without direct user initiation of the activity.
The system and processes of the figures are not exclusive. Other systems, processes and menus may be derived in accordance with the principles of the invention to accomplish the same objectives. Although this invention has been described with reference to particular embodiments, it is to be understood that the embodiments and variations shown and described herein are for illustration purposes only. Modifications to the current design may be implemented by those skilled in the art, without departing from the scope of the invention. As described herein, the various systems, subsystems, agents, managers and processes can be implemented using hardware components, software components, and/or combinations thereof. No claim element herein is to be construed under the provisions of 35 U.S.C. 112, sixth paragraph, unless the element is expressly recited using the phrase “means for.”