This description relates to component maintenance in production facilities.
Production activities for physical goods that are manufactured or otherwise produced for sale are typically subject to constraints regarding, for example, timeliness, efficiency, reliability, safety, or volume. For example, a manufacturing facility may be required to produce a certain type of item for sale within a certain time limit of orders being received, while meeting a monthly production quota and minimizing an amount of downtime experienced by the production system. If such goals are met, then related goals of profitability and customer satisfaction are also more likely to be met.
In order to meet these and other goals, it is helpful to maximize efficient use of available production equipment, while minimizing associated costs and downtime. For example, production equipment is typically subject, over time, to malfunction, breakage, and/or degraded performance due to general wear and tear. Consequently, repair, replacement, and/or other maintenance are required for continued fulfillment of production goals.
However, it is often difficult to determine how to implement such maintenance activities. For example, a production facility may include many different types of production equipment, which may degrade at different rates or be subject to varying likelihoods of breakage. If too little maintenance is undertaken, then equipment is more likely to malfunction over time, thereby leading, for example, to increases in total equipment downtime and repair costs, or, in some cases, to increases in accidents that may result in human safety and/or environmental concerns. On the other hand, if too much maintenance is undertaken, the costs of any unnecessary maintenance are wasted.
Accordingly, techniques may be implemented that allow accurate prediction of a need for maintenance activities with respect to associated production equipment. Moreover, such predictions may be made with respect to equipment components that are determined to be critical for maintenance purposes. For example, analysis may determine components which precede dependent components within production operations. Consequently, such critical components, were they to malfunction, would cause a chain reaction of malfunctions or unavailability of the related, dependent components. Similarly, critical components may be defined with respect to safety or environmental concerns that would be present in the event of failure thereof. By predicting maintenance requirements for such critical components, maintenance costs and associated downtime may be reduced, while profitability, along with employee and customer satisfaction, may be increased.
According to one general aspect, a system includes at least one processor, and instructions recorded on a non-transitory computer-readable medium, and executable by the at least one processor. The system includes a maintenance data collector configured to collect maintenance data characterizing maintenance events associated with maintaining operations of a plurality of components, and a critical component identifier configured to identify, from the plurality of components and based on the maintenance data, critical components that contribute disproportionately to production losses caused by the maintenance events. The system also includes a causality analyzer configured to determine causal connections between the maintenance events, based on operational dependencies between pairs of the plurality of components, and a maintenance policy generator configured to generate a maintenance policy governing future maintenance events for the plurality of components, based on the identified critical components and the causal connections.
According to another general aspect, a computer-implemented method for executing instructions stored on a non-transitory computer readable storage medium may include collecting maintenance data characterizing maintenance events associated with maintaining operations of a plurality of components, and generating a criticality score for each of the plurality of components, based on a comparison of each criticality score to a threshold, wherein each criticality score is calculated as an aggregation of factors related to production losses caused by the maintenance events. The method may include identifying, from the criticality scores, critical components that contribute to the production losses, determining causal connections between the maintenance events, based on operational dependencies between pairs of the plurality of components, and generating a maintenance policy governing future maintenance events for the plurality of components, based on the identified critical components and the causal connections.
According to another general aspect, a computer program product may be tangibly embodied on a non-transitory computer-readable storage medium and may include instructions. The instructions, when executed, are configured to cause at least one processor to collect maintenance data characterizing maintenance events associated with maintaining operations of a plurality of components, and identify, from the plurality of components and based on the maintenance data, critical components that contribute disproportionately to production losses caused by the maintenance events. The instructions, when executed, are further configured to cause the at least one processor to determine causal connections between the maintenance events, based on operational dependencies between pairs of the plurality of components, and generate a maintenance policy governing future maintenance events for the plurality of components, based on the identified critical components and the causal connections.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.
More particularly, the maintenance manager 102 may include a maintenance data collector 112 that is configured to collect various types of maintenance data. In the example of
For example, the components 104-110 should be understood generally to represent virtually any physical components that may be involved in operations of a production facility. For example, such production facilities may include manufacturing facilities designed to construct physical goods for sale. In other examples, the components 104-110 may be related to physical sorting or other movement of physical goods that have already been constructed, such as may occur in a warehouse, inventory management, or shipping facility, or in an oil or gas production facility.
Thus, the types of physical components represented by the components 104-110 are far too numerous to list here in detail, but would be apparent to one of skill in the art. By way of non-limiting example, however, it may be appreciated that the components 104-110 may represent, e.g., conveyer belts, assembly equipment, transportation equipment, robotic assistance, computers, safety equipment, tools, and virtually any other type of physical component that may be used in the types of production facilities referenced above, or in other production facilities.
By their nature, all such physical components are prone to eventual performance degradation or failure. Without preventive maintenance, such performance degradations or failures may lead to safety concerns, such as when equipment failure injures an employee of the production facility. Similarly, such performance degradations or failures may lead to environmental hazards, such as when equipment designed to handle hazardous waste malfunctions. Further, such performance degradations and failures may result in production delays within the production facility, resulting in lost profits and decreased customer satisfaction.
Moreover, even when preventative maintenance is undertaken, it may be necessary to take one or more components offline in order to perform a repair or other maintenance activity. The resulting downtime for such components may thus also lead to production delays and other production losses. Moreover, costs are incurred by such maintenance activities, including costs for employees or other persons responsible for executing the maintenance activities, costs for temporary or permanent replacement parts, costs associated with taking components offline and then putting the maintained components back online, and various other associated costs. Therefore, although preventative maintenance may reduce a likelihood of safety and environmental concerns, and may reduce a likelihood of abrupt component failures in critical situations and/or situations in which repair would be difficult or impossible, excessive or unnecessary preventative maintenance may nonetheless result in unnecessary delays, reductions in profitability or customer satisfaction, or other production losses, as compared to scenarios in which optimal maintenance policies are enacted.
Therefore, in the example of
In the example terminology of
For example, in some implementations, various ones of the components 104-110 may include, or be associated with, software that is designed to automatically generate report data in the event of a failure or other malfunction. Similarly, repair equipment used to repair a particular component may be configured to transmit repair activities undertaken. In other examples, repair personnel may be provided with appropriate hardware/software (e.g., by way of a graphical user interface of a repair device or computer associated therewith), so as to thereby provide the maintenance event data in a convenient, consistent manner. Additional details regarding example event data and event data formatting, including an example data schema for the event data repository 114, are provided below, e.g., with respect to
As referenced above, maintenance data collected by the maintenance data collector 112 may also include condition data collected by one or more condition sensors 116 and stored within the condition data repository 118. In this regard, such condition data may be understood to include virtually any data collected by an appropriate, corresponding type of sensor, which characterizes relevant conditions within the production facility, or associated with the production facility, that may potentially affect operations of the components 104-110.
For example, such condition data may be collected from the condition sensor 116 without being limited to a condition of a particular one of the components 104-110, or with respect to a particular maintenance event or activity associated with a particular one of the components 104-110. Instead, in example implementations, the condition sensor 116 may be positioned to collect condition data representing prevailing conditions within a vicinity of, or local to, one or more of the components 104-110.
By way of non-limiting example, such condition data may thus include temperature or pressure readings, weight or volume measurements, characterizations of ambient light, noise, or vibration, a presence or absence of a particular chemical or other substance, or virtually any other measurable or quantifiable condition that may exist within or around the production facility in question. Further, in some cases, the condition data may relate directly to specific ones of the components 104-110, or operations thereof. For example, many or all of the types of condition data just referenced may be collected with respect to operations of a particular component. Additionally, characterizations of such component operations also may be collected, such as a speed of component operation, a number of component operations in a given time period, a reliability of component operations, or virtually any other metric that may potentially be related to characterizing a current or future need for maintenance activity. Further examples of types of condition data are provided below, e.g., with respect to
Thus, maintenance data collected by the maintenance data collector 112 generally includes any and all available information related to past or potential maintenance activities with respect to all of the components 104-110. Moreover, as referenced above, by their nature, all of the components 104-110 are prone, to varying degrees, to eventual performance degradation or failure. However, it also may be true that, even though all of the components 104-110 are subject to eventual performance degradation or failure, some of the components 104-110 may be more critical than others with respect to development of an optimal maintenance policy.
Consequently, the maintenance manager 102 is illustrated as including a critical component identifier 120 that is configured to identify such critical components from among the components 104-110. In this regard, as explained in detail below, such a critical component should be understood to include a component, or type or category of component, that, when experiencing failure, repair, or other maintenance activity, contributes disproportionately to overall production losses associated with maintenance of the components 104-110 as a whole, and/or that has a causal effect on downtime of other, related components.
In this regard, as also described in detail below, production losses should be understood in a broad and general sense. For example, such production losses would obviously include literal reductions in revenue and profitability that result directly from money spent on repair or other maintenance activities, and/or sales lost due to lack of timely availability of products for sale. Production losses should also be understood to include any indirect or even intangible losses that may occur, such as insurance or health costs associated with accidents or injuries experienced by employees, or customer dissatisfaction or general loss of reputation associated with environmentally harmful accidents that occur as a result of a failed maintenance policy. Production losses should also be understood to include reductions in items being produced for sale, including, e.g., a reduced number of individual items being produced (such as toys, clothes, cars, or any consumer good), or a reduced volume of a material being transported or produced (such as oil or gas). Thus, production losses as used herein should be understood to include all such actual costs and opportunity costs associated with failures of maintenance policy, as well as the types of less tangible factors just referenced, to the extent that they may be quantified for use in calculations performed by the maintenance manager 102, as described herein.
Additional example operations of the critical component identifier 120 are provided below, e.g., with respect to
Specifically, as shown, a downtime calculator 124 may be configured to calculate a downtime index characterizing a length of time during which a particular component or type of component is non-operational due to a component failure, or other maintenance activity which requires the component to be taken offline. As referenced above, a degree of criticality of a particular component may be characterized with respect to a relative or proportional contribution of that component (or failure thereof) to overall downtime or other metric related to production losses. In a simplified example, it may occur that, within a given time period, e.g., a month, the components 104-110 may experience a total, accumulated downtime among them of 4 days. If, however, the component 106 experiences a downtime of 3 of those 4 days, then the component 106 may be judged to contribute a high downtime index score (here, 75%) for inclusion within the overall, aggregated criticality score.
Somewhat similarly, a safety calculator 126 may be configured to compute a safety component of the overall criticality score. For example, the safety calculator 126 may access the event data repository 114 and/or the condition data repository 118, in order to determine a number or type of accident that may have occurred in conjunction with a failure of any one of the components 104-110, or some other safety metric. Again, a relative or proportional contribution of any one such component, or type of component, may be calculated.
Also within the score calculator 122, an environment calculator 128 may be configured to utilize available maintenance data to quantify or otherwise characterize environmental impact factors associated with failures of the components 104-110. As described above with respect to the calculators 124, 126, the environment calculator 128 may determine a relative or proportional occurrence or impact of environmental incidents associated with a particular component or type of component, as compared to overall types or quantities of environmental incidents experienced by the production facility as a whole, or by defined subsets thereof, within a given time period.
Upon completion of operation of the calculators 124-128, the score calculator 122 may proceed to compute a weighted, combined score for each included component. For example, as described in detail below, administrators of various production facilities may wish to give greater or lesser weight to the factors of downtime, safety, and/or environmental impact, and the score calculator 122 may be configurable in this regard.
Moreover, it will of course be appreciated that the various factors considered by the example score calculator 122 of
Further in
In the simplified example of
As a result of such dependencies between pairs of components, it may be difficult to determine whether and how to execute maintenance activities. For example, it may occur that the component 108 has a high criticality score, and, for example, may experience significant downtime due to malfunction and associated repair activities. Meanwhile, the component 106 may experience less downtime, and may experience lower repair costs when such downtime occurs. Nevertheless, if failure of the component 106 is a direct cause of failure of the component 108, it would be unwise to construct a maintenance policy focusing on maintenance of the component 108, since production losses would be minimized in a more efficient and cost effective manner by prioritizing maintenance activities (including preventative maintenance) with respect to the component 106.
In some cases, it may be straightforward to determine and characterize causal connections existing in conjunction with operational dependencies between pairs of components. For example, it may occur that the component 108 is a delicate component, which includes a number of interacting parts, which may be difficult, expensive, and time consuming to replace or repair. Meanwhile, the component 106 may be a component that exerts force during operation, such as a conveyer belt or transportation arm. Then, during a malfunction of the component 106, physical damage to the component 108 may occur, thereby necessitating repair and associated downtime for the component 108, which, in the example, may be significantly more costly and time consuming than the associated repair for the component 106.
In many cases, however, causal connections between pairs of components may not be so obvious or easy to identify or quantify. For example, in some cases, even though operations of the component 106 may precede operations of the components 108, 110 within an overall workflow of the production facility, failure of the component 106 may not necessarily result in any failure or associated downtime of one or both of the components 108, 110. For example, the component 108 may include a conveyer belt that is used to convey various types of equipment, and may depend on operation of the component 106 in the sense that items output by the component 106 are conveyed using the conveyer belt. Nonetheless, the conveyer belt may also be used to convey various other items or types of items, and failure of the component 106 to produce items for inclusion in operations of the conveyer belt will cause neither failure of the conveyer belt, nor inability of the conveyer belt to convey items produced by other components.
More generally, as described below with respect to
Thus, in practice, a causality analysis function library 132 may be constructed and utilized to quantify and characterize such causal connections, and to predict an efficacy of a potential maintenance policy. More specifically, and as described in detail below, the causality analysis function library 132 may be utilized to store available algorithms or other functions for characterizing a type or extent of causality that exists between two or more components.
In general, maintenance data from the repositories 114, 118 may be examined by the causality analyzer 130, and one or more functions from the causality analysis function library 132 may be utilized to analyze such historical maintenance data and thereby derive and characterize causal connections between components. For example, in the examples of
Based on outputs of the critical component identifier 120 and the causality analyzer 130, a maintenance policy generator 134 may be configured to provide a maintenance policy governing a type, extent, and timing of future maintenance activities. For example, such a maintenance policy might specify that the component 106 undergo specified types of maintenance activities twice a month, while the component 108 is scheduled for different maintenance activities according to a different schedule, e.g., monthly.
In addition to specifying component level maintenance activities as part of such maintenance policies, the maintenance policy generator 134 is capable of quantifying and otherwise characterizing relative benefits of potential maintenance policies with respect to actual or potential production losses incurred. For example, the maintenance policy generator 134 may provide a number of different potential maintenance policies, along with associated information regarding corresponding production and production losses, so that a user of the system 100 may select an appropriate, desired maintenance policy. Similarly, the maintenance policy generator 134 may provide an appropriate graphical user interface for such a user to explore various “what-if” scenarios with respect to relative effects of potential changes to the existing maintenance policy, as quantified with respect to associated potential production losses.
Put another way, the maintenance policy generator 134 essentially has access to a large solution space of potential maintenance policies provided by the critical component identifier 120 and the causality analyzer 130, and predicated on maintenance data received from the maintenance data collector 112. This solution space for potential maintenance policies may be explored manually, as just referenced, or may be explored using available algorithms. For example, the maintenance policy generator 134 may utilize a greedy algorithm, a genetic algorithm, or some other suitable algorithm, to thereby explore the available solution space until some suitable threshold or other metric is reached.
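By way of illustration only, the following minimal Python sketch shows one way a greedy exploration of such a solution space might look. The `Action` type, its `cost` and `loss_reduction` fields, the `budget` stopping criterion, and all numeric values are hypothetical assumptions made for this sketch; they do not appear in the description above and are not the algorithm of the maintenance policy generator 134.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A hypothetical candidate maintenance action (all fields are assumptions)."""
    component: str
    cost: float            # assumed cost of performing the action; must be > 0
    loss_reduction: float  # assumed expected reduction in production losses


def greedy_policy(actions, budget):
    """Pick actions by best loss reduction per unit cost until the budget is spent."""
    chosen = []
    remaining = budget
    ranked = sorted(actions, key=lambda a: a.loss_reduction / a.cost, reverse=True)
    for action in ranked:
        # Skip any action that no longer fits in the remaining budget.
        if action.cost <= remaining:
            chosen.append(action)
            remaining -= action.cost
    return chosen


# Hypothetical candidates for components 106, 108, and 110.
candidates = [
    Action("component_106", cost=2.0, loss_reduction=10.0),
    Action("component_108", cost=5.0, loss_reduction=12.0),
    Action("component_110", cost=4.0, loss_reduction=4.0),
]
policy = greedy_policy(candidates, budget=7.0)
```

In this sketch the budget plays the role of the "suitable threshold or other metric"; a genetic algorithm or other search could be substituted without changing the inputs.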
In the example of
Thus, for example, the at least one computer 136 may represent two or more computers operating in communication with one another. The at least one processor 138 may represent two or more processors operating in parallel, and the non-transitory computer readable storage medium 140 may represent virtually any storage medium that is capable of storing instructions which, when executed by the at least one processor 138, cause the at least one processor 138 to execute the various functions described herein with respect to the maintenance manager 102.
Of course,
Similarly, the maintenance manager 102 and the at least one computer 136 may be associated with an appropriate monitor or other display, to thereby enable a user of the system 100 to interact with the maintenance manager 102. For example, as referenced, the maintenance policy generator 134 may provide a suitable interface for allowing the user of the system 100 to explore and select from among available maintenance policies. More generally, one or more suitable user interfaces may be utilized to allow the user of the system 100 to configure any of the maintenance data collector 112, the critical component identifier 120, the causality analyzer 130, or any other portion or sub-portion of the maintenance manager 102.
Further, although the maintenance manager 102 is illustrated as including a number of separate modules, it may be appreciated that the maintenance manager 102 of
In the example data processing operation 202, event data 208 and condition data 218 correspond generally to data stored in the event data repository 114 and the condition data repository 118, respectively. In the example of
By way of further example,
Further in
Also in
Finally in the example of
Referring back to
With reference to
Thus, as described above with respect to
For example, as may be appreciated from the above descriptions of
Then, it may be possible for the maintenance data collector 112 to infer, deduce, or otherwise obtain at least an approximate replacement value for any such missing data values within the event data 208. As a simplified example, it may occur that the event data 208 includes, for a specific failure, a known failure type 220 associated with a first valve. However, the corresponding failure location 212 may not be known from reported event data. Then, the maintenance data collector 112 may review the condition data 218 to determine a time of high pressure 222 and/or failed valve status 220, and may use a location of the one or more condition sensors 116 that detected such pressure/status condition data, in order to fill in a corresponding failure location 212 within the event data 208.
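The inference just described may be sketched, purely as an illustration, in Python as follows. The record layout (dictionaries with `time`, `location`, `sensor_location`, and `abnormal` keys), the `max_gap` time window, and the sample values are hypothetical assumptions introduced for this sketch and are not part of the event data 208 or condition data 218 as described.

```python
def fill_missing_locations(events, readings, max_gap=5.0):
    """Return copies of events whose missing location is filled from the
    nearest-in-time abnormal sensor reading (hypothetical record layout)."""
    filled = []
    for event in events:
        event = dict(event)  # copy so the caller's records are not modified
        if event.get("location") is None:
            # Consider only abnormal readings close in time to the failure.
            nearby = [
                r for r in readings
                if r["abnormal"] and abs(r["time"] - event["time"]) <= max_gap
            ]
            if nearby:
                nearest = min(nearby, key=lambda r: abs(r["time"] - event["time"]))
                event["location"] = nearest["sensor_location"]
        filled.append(event)
    return filled


# Hypothetical event records: the valve failure has no reported location.
events = [
    {"failure_type": "valve", "time": 100.0, "location": None},
    {"failure_type": "pump", "time": 200.0, "location": "line_2"},
]
# Hypothetical sensor readings with the location of the reporting sensor.
readings = [
    {"time": 98.0, "sensor_location": "line_1", "abnormal": True},
    {"time": 101.0, "sensor_location": "line_3", "abnormal": False},
    {"time": 150.0, "sensor_location": "line_4", "abnormal": True},
]
cleaned = fill_missing_locations(events, readings)
```

Events with no sufficiently close abnormal reading are left unchanged, so a missing value is never replaced by a guess from unrelated data.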
Then, during the critical component identification stage 204, the maintenance data 226 may be utilized during a critical component scoring operation 228. As described above with respect to the score calculator 122, and included calculators 124, 126, 128, such critical component scoring 228 may include 3 axes, illustrated in
As a result of the critical component identification stage 204, criticality scoring 236 may be provided from the critical component identifier 120 for use in the causality analysis stage 206 performed by the causality analyzer 130 of
In the example of
The ARIMA algorithm 242 is another example of a data mining algorithm that may be included within the library 238. The ARIMA algorithm 242 refers to the use of an Autoregressive Integrated Moving Average model, which is particularly suited for time series analysis of data. That is, by fitting an ARIMA model to time series data, future points in the series may be predicted.
Further details regarding example implementations of the decision tree algorithm 240 or the ARIMA algorithm 242 are not provided herein, for the sake of conciseness. Instead, a Bayesian network 244 is utilized, e.g., with respect to
In the example of
In this way, a likelihood of a particular type and extent of total production losses associated with a specific maintenance policy under consideration may be estimated. Then, such resulting potential maintenance policies may be explored or considered by the maintenance policy generator 134, using manual or automated techniques, as described herein.
In the example of
From the plurality of components and based on the maintenance data, critical components that contribute disproportionately to production losses caused by the maintenance events may be identified (404). For example, the critical component identifier 120 may be configured to identify such critical components by implementing the type of scoring calculations described with respect to the score calculator 122. In this way, for example, it may be determined that, within a time period in which the components 104 and 106 experienced the only failures experienced by the components 104-110 of a given production facility, the component 106 was associated with a large majority of associated production losses, while the component 104 made a relatively smaller contribution to such production losses. In this way, as described herein, subsequent maintenance policy analysis may proceed with a greater focus on, in the example, the component 106.
Causal connections between the maintenance events may be determined, based on operational dependencies between pairs of a plurality of components (406). For example, the causality analyzer 130 may determine that the components 108, 110 exhibit operational dependencies on preceding component 106, and may investigate and characterize a type and extent of a causal connection between a maintenance event experienced by the component 106 and one or more maintenance events experienced by one or both of the components 108, 110.
For example, as described herein, in some scenarios, a failure of the component 106 will directly cause a corresponding failure of one or both of the components 108, 110. In many other scenarios, however, there may be a correlation between such failures or other maintenance events, which may or may not rise to a level of actual or direct causality. For example, in the examples provided below in which a Bayesian network is utilized, conditional probabilities associating a failure of a particular component with one or more preceding conditions, including failure of a preceding component, may be characterized. Thus, it may be appreciated that the term causal connection or causality should be understood to include potential or inferred causation, thereby including correlations and probabilities of relationships between failures or other maintenance events.
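As a simplified illustration of such a conditional probability, the following Python sketch estimates how often a failure of a dependent component follows a failure of a preceding component within a time window. It is a plain frequency estimate rather than a Bayesian network, and the function name, the `window` parameter, and the failure timestamps are hypothetical assumptions made for this sketch.

```python
def conditional_failure_probability(upstream_times, downstream_times, window):
    """Estimate P(downstream failure within `window` after an upstream failure).

    Returns None when there are no upstream failures to condition on.
    """
    if not upstream_times:
        return None
    followed = sum(
        1
        for t_up in upstream_times
        if any(0 <= t_down - t_up <= window for t_down in downstream_times)
    )
    return followed / len(upstream_times)


# Hypothetical failure timestamps (in hours) for components 106 and 108.
failures_106 = [10.0, 40.0, 75.0, 120.0]
failures_108 = [11.5, 76.0, 200.0]
p_108_given_106 = conditional_failure_probability(
    failures_106, failures_108, window=2.0
)
```

With these assumed timestamps, two of the four failures of the component 106 are followed by a failure of the component 108 within the window. Consistent with the discussion above, such an estimate expresses a correlation and does not by itself establish direct causation.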
A maintenance policy governing future maintenance events for the plurality of components may be generated, based on the identified critical components and the causal connections (408). For example, the maintenance policy generator 134 may be utilized to explore, manually or in an automated fashion, a solution space of potential maintenance events and associated scheduling thereof, so as to thereby obtain one or more maintenance policies that will be acceptable to an administrator or other user of the system 100 of
Specifically, for example, a data processing stage 502 is illustrated as including data collecting (508) followed by data cleaning (510), to thereby populate a database 512 of maintenance data. As may be appreciated from the above descriptions of
Within the critical component identification stage 504, normalized, accumulated downtime may be calculated (514). For example, the downtime calculator 124 of the score calculator 122 may calculate a normalized score for equipment downtime of various types of equipment or other components, which may be characterized in proportion to a total downtime of components within a given production facility and within a given period of time.
Similarly, a normalized safety index value may be calculated (516), along with a normalized environment index value (518). As described above, although not specifically illustrated in the example of
Then, during the causality analysis stage 506, causality analysis may be executed (520), e.g., by the causality analyzer 130 of
In the example implementation of
In the example of
Once collected, the critical component identifier 120 may proceed to calculate total downtime for all components 104-110 within a specified period of time, as well as a downtime experienced by each component or type of component within the production facility (or portions thereof) in question (604). For example, assuming that the event data repository 114 maintains maintenance event data in accordance with the event data schema of
Then, component downtime may be normalized between values of 0 and 1, including finding a proportion of component downtime to total downtime, to thereby obtain a downtime index (606). In other words, as referenced above, within a total downtime calculated for components 104-110, a proportion of downtime experienced by, for example, the component 104, relative to the total downtime, may be computed. Of course, similar calculations of proportional downtime may be executed for remaining ones of the components 106-110, or, more specifically, for any such component which experienced downtime within the relevant timeframe. In this regard, it may be appreciated that each of the components 104-110 should be understood to represent, for example, a single component, or in other implementations, may represent a number of components which share a certain type or characteristic.
Then, continuing the example described above with respect to pseudo code 1, the normalized component downtime may be determined by first selecting a particular component or type of component, and then taking a ratio of a summation of all downtime for the component or type of component in question, relative to total downtime calculated using pseudo code 1, above. In this way, downtime by component may be calculated for each component or group of components, and normalized as a proportion to thereby obtain a normalized value between 0 and 1. Example pseudo code for performing such normalized component downtime calculations is provided below with respect to Pseudo code 2:
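By way of non-limiting illustration, the normalization of operation 606 may be sketched in Python as follows. This sketch is not the referenced pseudo code listing; the function name downtime_index, the (component, hours) event format, and the sample values are assumptions introduced only for this example.

```python
from collections import defaultdict


def downtime_index(events):
    """Return each component's share of total downtime (values in [0, 1]).

    `events` is an iterable of (component_id, downtime_hours) pairs.
    This input format is an assumption made for the sketch.
    """
    per_component = defaultdict(float)
    for component, hours in events:
        per_component[component] += hours

    total = sum(per_component.values())
    if total == 0:
        # No downtime recorded at all: every index is zero.
        return {c: 0.0 for c in per_component}
    return {c: hours / total for c, hours in per_component.items()}


# Hypothetical maintenance events for components 104, 106, and 108.
events = [("104", 6.0), ("106", 3.0), ("104", 2.0), ("108", 1.0)]
index = downtime_index(events)
```

Because each value is a proportion of the same total, the resulting indices sum to 1 across all components that experienced downtime.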
Similarly, a safety index may be calculated by finding a proportion of accidents for a given component or type of component, relative to a total number of accidents (608). Further, an environmental index may be calculated by finding a proportion of environmental incidents for a component or type of component, relative to a total number of environmental incidents (610).
More specifically, in continuing the example above as provided with respect to Pseudo code 1 and Pseudo code 2, the failure table Fail_Tab may be accessed to count a number of total accidents, as well as individual accidents in conjunction with corresponding components. Then, for each component or type of component, the count of accidents therefor may be compared to the total number of accidents, and the various components or types of components may be grouped to obtain a relative proportion for each. Similarly, a count for environmental incidents may be obtained from the failure table Fail_Tab, and individual components or groups of components may be identified, so as to again obtain a proportional contribution of each to the total count of environmental incidents. Example pseudo code associated with operations 608, 610 is provided below as Pseudo code 3:
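As a hedged illustration of operations 608 and 610, and not a reproduction of the referenced listing, the proportional counting may be sketched in Python as follows. The row layout of the failure table, including the field names component, accident, and env_incident, is assumed for this example and is not specified by the description.

```python
from collections import Counter


def incident_index(rows, flag):
    """Share of flagged incidents attributable to each component.

    `rows` is a list of dicts standing in for Fail_Tab records; `flag`
    names a boolean field (assumed here: "accident" or "env_incident").
    """
    counts = Counter(row["component"] for row in rows if row[flag])
    total = sum(counts.values())
    components = {row["component"] for row in rows}
    # Components with no flagged incidents receive an index of 0.
    return {c: (counts[c] / total if total else 0.0) for c in components}


# Hypothetical failure records.
fail_tab = [
    {"component": "104", "accident": True, "env_incident": False},
    {"component": "104", "accident": True, "env_incident": True},
    {"component": "106", "accident": False, "env_incident": True},
    {"component": "108", "accident": True, "env_incident": False},
    {"component": "106", "accident": False, "env_incident": False},
]

safety_index = incident_index(fail_tab, "accident")
environment_index = incident_index(fail_tab, "env_incident")
```

The same helper serves both indices because each is a count for one component divided by the corresponding total count.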
Finally in
Score(A) = alpha*ACCU_downtime(A) + beta*safety_index + gamma*environment_index     (Equation 1)
In Equation 1, alpha, beta, and gamma represent weight values, e.g., specified between 0 and 1, so that specific values for alpha, beta, and gamma may be set by users of the system 100, based on their preference or specific domain knowledge. A threshold may be selected, so that components having scores higher than the threshold are determined to be critical components.
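For purposes of illustration only, the weighted score and threshold comparison may be sketched in Python as follows. The function names, the weight values, the threshold of 0.3, and the per-component index values are all hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def criticality_scores(downtime, safety, environment, alpha, beta, gamma):
    """Weighted sum of the three normalized indices for each component."""
    components = set(downtime) | set(safety) | set(environment)
    return {
        c: alpha * downtime.get(c, 0.0)
        + beta * safety.get(c, 0.0)
        + gamma * environment.get(c, 0.0)
        for c in components
    }


def critical_components(scores, threshold):
    """Components whose score is strictly higher than the threshold."""
    return sorted(c for c, s in scores.items() if s > threshold)


# Hypothetical normalized indices for components 104, 106, and 108.
downtime = {"104": 0.6, "106": 0.3, "108": 0.1}
safety = {"104": 0.5, "106": 0.0, "108": 0.5}
environment = {"104": 0.2, "106": 0.8, "108": 0.0}

scores = criticality_scores(downtime, safety, environment,
                            alpha=0.5, beta=0.3, gamma=0.2)
critical = critical_components(scores, threshold=0.3)
```

With these assumed values, component 104 scores 0.5*0.6 + 0.3*0.5 + 0.2*0.2 = 0.49 and component 108 scores 0.20, so that only the former exceeds the threshold of the two.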
Thus, in the example of
In general, by itself, such a network structure may be available as part of a design of a production facility in question, and may be supplemented or otherwise leveraged by a domain expert utilizing system 100 of
Once the network structure has been determined, probability tables may be calculated from available maintenance data (704), as illustrated and described below with respect to
Thus, it may be appreciated that operations 702, 704 may be conducted by the causality analyzer 130, and may be dependent upon receipt of output of the maintenance data from the maintenance data collector 112, along with critical component scores received from the critical component identifier 120. For example, in determining the network structure in operation 702, the causality analyzer 130 may utilize only critical components identified by the critical component identifier 120. In other example implementations, the network structure may include all available and included components, but may perform analysis with respect to identified critical components (e.g., may calculate probability tables and associated potential maintenance policies only with respect to such critical components).
Then, operations 706-712 may be implemented by the maintenance policy generator 134. For example, as shown and described with respect to
In the example of
As already described, and as may be observed from the examples of FIGS. 7, operations 706-712 thus represent an iterative process for exploring a total solution space of possible maintenance policies. Such a solution space may be very large, particularly for production facilities having large numbers of components. In addition to potentially large numbers of components, potential maintenance policies may be relatively open-ended with respect to factors such as scheduling options. For example, in some cases maintenance scheduling may be essentially completely open-ended, in that the user of the system 100 is authorized to schedule maintenance as frequently as possible, given output of the system 100. In other example scenarios, scheduling constraints may exist, such as an intermittent availability of a supplier or repair personnel. In such cases, such maintenance constraints may be utilized to effectively reduce the otherwise available solution space of maintenance policies.
As also referenced, the iterative operations 706-712 may be executed in an automated fashion, so as to explore the solution space of possible maintenance policies in an efficient and thorough manner. For example, a genetic algorithm, a greedy algorithm, or other known technique for exploring large solution spaces may be utilized.
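As one hedged example of such automated exploration, a simple greedy search may be sketched in Python as follows. The sketch is not the described maintenance policy generator 134: the representation of a policy as a number of maintenance visits per component, the toy objective (expected loss plus visit cost), and all numeric values are assumptions, with the objective standing in for whatever evaluation of production loss an implementation actually uses.

```python
def greedy_policy_search(components, objective, max_visits):
    """Greedily add one maintenance visit at a time while the objective improves."""
    policy = {c: 0 for c in components}
    best = objective(policy)
    while True:
        candidate, candidate_value = None, best
        for c in components:
            if policy[c] >= max_visits:
                continue  # scheduling constraint: cap on visits per component
            trial = dict(policy)
            trial[c] += 1
            value = objective(trial)
            if value < candidate_value:
                candidate, candidate_value = trial, value
        if candidate is None:
            return policy, best  # no single added visit lowers the objective
        policy, best = candidate, candidate_value


# Hypothetical inputs: failure probability with no maintenance, loss if a
# component fails, and a flat cost per visit. Each visit is assumed to
# halve the failure probability.
BASE_FAILURE = {"alternator": 0.10, "oil_filter": 0.20}
LOSS_IF_FAILED = {"alternator": 1000.0, "oil_filter": 400.0}
VISIT_COST = 30.0


def toy_objective(policy):
    expected_loss = sum(
        BASE_FAILURE[c] * (0.5 ** policy[c]) * LOSS_IF_FAILED[c] for c in policy
    )
    return expected_loss + VISIT_COST * sum(policy.values())


policy, value = greedy_policy_search(list(BASE_FAILURE), toy_objective, max_visits=3)
```

A greedy search of this kind evaluates only a small part of the solution space and may stop at a local optimum, which is one reason a genetic algorithm or other technique may be preferred for larger facilities.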
In the example of
Specifically, in
Then, through analysis of available maintenance data, a number of times that the child node has a value of true (i.e., experiences a failure) when the parent node also has a value of true (i.e., also experiences a failure) may be counted. In a simplified example, the count of times when a child node has a value of true when the parent node has a value of true may be 3 within a given time period, while a count of a number of times that the child node equals false (i.e., does not fail) when the parent node has a value of true (i.e., experiences a failure) may equal 7. Then, the conditional probability that the child node has a value of true or false, given a value of the parent node as true, may be calculated as P(child=true|parent=true)=3/(7+3)=0.3, and P(child=false|parent=true)=7/(7+3)=0.7.
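The counting described above may be sketched in Python as follows, purely as an illustration. The function name and the representation of the maintenance data as (parent_failed, child_failed) pairs are assumptions made for this example; the counts of 3 and 7 are taken from the simplified example above.

```python
def conditional_probability_table(observations):
    """Estimate P(child | parent) from paired failure observations.

    `observations` is an iterable of (parent_failed, child_failed) booleans.
    Returns {parent_value: {child_value: probability}} for every parent
    value that was actually observed.
    """
    counts = {True: {True: 0, False: 0}, False: {True: 0, False: 0}}
    for parent, child in observations:
        counts[parent][child] += 1

    table = {}
    for parent, row in counts.items():
        n = row[True] + row[False]
        if n:  # skip parent values with no observations
            table[parent] = {child: k / n for child, k in row.items()}
    return table


# 3 periods in which both failed, 7 in which only the parent failed.
observations = [(True, True)] * 3 + [(True, False)] * 7
cpt = conditional_probability_table(observations)
```

Here cpt[True][True] corresponds to P(child=true|parent=true) and cpt[True][False] to P(child=false|parent=true).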
Thus, in the preceding example, it may be observed that a failure of the parent, more often than not, does not result in a failure of the child. In another example, if P(child=false|parent=false)=0.9, then it may be observed that a high correlation exists between parent and child components, because a continuing operation of the parent component is highly correlated with continuing operation of the child component. On the other hand, if P(child=true|parent=true)=0.01, then it may be observed that a failure of the parent node has a very small effect on a failure of the child node, so that the parent failure is not considered causal with respect to the child failure.
Thus, in the example of
Similarly, a table 906 illustrates conditional probabilities for an alternator failure associated with the node 806, depending on whether bad weather has a value of true or false. As shown, the probability of an alternator failure when bad weather=true is 0.1, while a probability of alternator failure not occurring in the presence of bad weather is 0.9. Meanwhile, a probability of alternator failure when bad weather has a value of false is 0.02, while the probability of alternator failure not occurring when bad weather has not occurred is 0.98. In table 912, the probability of an oil filter failure when bad weather=true is 0.1, while a probability of oil filter failure not occurring in the presence of bad weather is 0.9. Meanwhile, a probability of oil filter failure when bad weather has a value of false is 0.02, while the probability of oil filter failure not occurring when bad weather has not occurred is 0.98.
In a table 914, a probability of a monitoring failure is 0.02, while a probability of a monitoring failure not occurring is 0.98. Thus, in general, and as may be appreciated from the above discussion, a probability table of a child node of one or more parent nodes may be represented as being conditional upon one or more of the preceding parent nodes. For example, as shown in a table 916, a probability of a fuel filter failure represented by the node 816 may be represented as having a 0.3 chance of being true when the failure of the oil filter, represented by the node 812, is true, and has a value of 0.7 when the failure of the oil filter is false. The probability of fuel filter failure when the monitoring failure is true and the oil filter failure is false is 0.04, while the probability of the fuel filter failure not occurring when the monitoring failure is true and the oil filter failure is false is 0.96. As may be observed from
Then, probabilities of production loss, associated with the node 810, may be represented by the table 910. As illustrated therein, and as just described, conditional probabilities for such production loss may be calculated as accumulated probabilities of each branch of parent nodes. That is, for example, in table 910, the first row can be understood as follows: given that MF=T, FFF=T and BF=T, the probability of production loss (PL)=True is 0.4 and the probability of PL=False is 0.6. In other words, if a monitoring failure, a fuel filter failure, and a battery failure have all occurred, the probability of production loss is 40%. Similar comments apply to table 916.
Of course,
Moreover, various techniques may be used to calculate the aggregated conditional probabilities, where some such techniques will depend on external factors and on historical data, as described herein. Further, in practice, a binary representation for production loss may be insufficient. For example, continuous intervals may be used to replace true and false in the probability tables. For example, the probability of production loss in an amount between 0 and 100 liters may be 0.1, the probability of production loss in an amount between 100 and 1000 liters may be 0.2, and so on for all relevant intervals. Nonetheless, in such scenarios, associated calculations could be performed as described herein with respect to the binary example of
In practice, conditional probabilities reflecting an effect of the various maintenance activities 1002-1010 may be obtained from the historical maintenance data, and/or may be predicted based on a classifier trained using the Bayesian network algorithm, or other appropriate data mining algorithm. Moreover, as described, parameters for the various maintenance nodes 1002-1010 may be varied, either manually or automatically, so as to attempt to minimize the value of production loss represented by the node 810.
In this way, an impact of each component may be quantitatively characterized with respect to a final result in terms of production loss, and, similarly, a quantitative impact of one or more maintenance activities may also be assessed. For example, from statistical information obtained through an analysis of historical maintenance data, a conclusion such as “maintenance of component A three times this month will result in a component failure with probability of XX %” may be obtained. Then, a corresponding Bayesian network structure with a maintenance input of maintenance nodes 1002-1010 may be trained in accordance with the example of
Thus, the features and functions of the systems and methods described above with respect to
Implementations of the various techniques described herein may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. Implementations may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program, such as the computer program(s) described above, can be written in any form of programming language, including compiled or interpreted languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
Method steps may be performed by one or more programmable processors executing a computer program to perform functions by operating on input data and generating output. Method steps also may be performed by, and an apparatus may be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. Elements of a computer may include at least one processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also may include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in special purpose logic circuitry.
To provide for interaction with a user, implementations may be implemented on a computer having a display device, e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
Implementations may be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation, or any combination of such back-end, middleware, or front-end components. Components may be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
While certain features of the described implementations have been illustrated as described herein, many modifications, substitutions, changes and equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the scope of the embodiments.