Statistical analysis, such as Weibull analysis, may be used to analyze reliability statistics, for example for component parts of data processing equipment such as computer equipment and the like. Such analyses may be accomplished utilizing commercial statistics packages, which can facilitate the analysis process by providing goodness-of-fit statistics and data plots. Even using such tools, iterative refinement of an analysis may involve manual assessment of graphical plots by an expert analyst.
Manufacturers, such as computer equipment manufacturers, may offer a wide range of products which incorporate many tens of thousands or more distinct types of field-replaceable parts. A manufacturer may implement systems for identifying and tracking individual components, which may number in the billions. Hence, analyzing the reliability of field-replaceable components can present a significant data processing challenge.
In some approaches to reliability analysis, predictive models may be generated based on available empirical data. An assessment of the predictive models can provide an analyst with information that suggests the need for iterative refinement of the modeling process, such as by further modeling of different subsets of the empirical data. Assessment of predictive models can likewise enable an analyst to eliminate some predictive models from any further consideration.
It may be desirable to provide for the automatic assessment and classification of predictive reliability models at earlier stages of an analysis, such that certain models may be eliminated from further consideration before human intervention is required. Such automatic classification may increase the efficiency of reliability analysis, and may improve its quality and value by reducing the amount of human effort devoted to eliminating less useful models and focusing inherently limited human attention on the more relevant models.
Often, manufacturers implement multi-source policies such that individual parts may be sourced on average from two, three, or more different suppliers. As used herein, the term “part” refers to any item or component, particularly field-replaceable components of computing and data processing systems and the like, for which reliability over time is of concern. Tracing each part by supplier individually can be a significant burden on reliability engineers. The burden is such that reliability analyses may only be undertaken after a strong suspicion of a serious reliability issue has arisen, such as concerns based on statistics reflecting field replacement spikes or customer escalations.
Reliability analyses may use statistical analyses such as Weibull distribution models to derive failure distribution plots enabling quality teams to predict failure rates and estimate ongoing reliability risks and future warranty costs from replacements. Accurate reliability analyses can assist in addressing reliability issues predictively rather than reactively.
Often, there is variation in reliability between suppliers or between design versions of a particular component, leading to more complicated and labor-intensive analysis of all the potentially different supplier/design version combinations. Even if the statistical analyses for possible combinations are automated, a human expert may still need to validate the prediction for each, such as by visually examining data plots and the corresponding statistics for the data to determine, for example, whether a given plot statistically “fits” the data, whether there is sufficient data to support meaningful analysis, or whether there is an indication of an existing or future reliability problem. As used herein, the expression “statistical fit” and related terms, such as “goodness of fit” or “substantiality of fit,” refer to a degree of correlation between one set of data and another, for example, the degree of correlation between empirical data and a predictive model based on that data. Statistical fit, and the “goodness” or substantiality thereof, may not be susceptible to precise definition, in that assessment of the extent, “goodness,” or substantiality of statistical fit may involve a degree of subjective or relative judgment, even though certain mathematical or statistical characterizations of statistical fitness may provide some guidance in such assessments.
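Purely as an illustration of the statistical-fit concept (not taken from any particular example herein), one simple quantitative indicator is the coefficient of determination between empirical failure probabilities and those predicted by a candidate model. The data values, Weibull parameters, and median-rank approximation below are assumptions chosen only for this sketch.

```python
# Hypothetical sketch: one simple quantitative indicator of "goodness of fit" --
# the coefficient of determination (R^2) between empirical failure probabilities
# and those predicted by a candidate Weibull model. All values are illustrative.
import numpy as np
from scipy import stats

days_to_failure = np.array([120, 200, 260, 310, 400, 455, 530, 610], dtype=float)

# Empirical CDF estimated via median ranks (Benard's approximation).
n = len(days_to_failure)
empirical_cdf = (np.arange(1, n + 1) - 0.3) / (n + 0.4)

# A candidate two-parameter Weibull model: shape 1.8, scale 420 days (illustrative).
model_cdf = stats.weibull_min.cdf(np.sort(days_to_failure), 1.8, scale=420.0)

ss_res = np.sum((empirical_cdf - model_cdf) ** 2)
ss_tot = np.sum((empirical_cdf - empirical_cdf.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(f"R^2 = {r_squared:.3f}")  # closer to 1 suggests a better statistical fit
```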
Particularly for the purposes of identifying and predicting reliability issues, the necessary analysis is performed repeatedly and frequently. For manufacturers with large numbers of field-replaceable parts, the scope of the analytical task may make it tedious, if not impossible, for humans to perform. Examples are provided herein which utilize statistical modeling to generate Weibull distribution models from empirical reliability data sets and to apply a machine-learned algorithm to automatically classify reliability models based on predicted failure rates and the confidence intervals of such predictions. The application of machine learning processes to Weibull analysis advantageously allows for deeper insights to be drawn into the characteristics of numerous component subpopulations, and improves both the overall value of the analyses and the performance of the computational hardware that can be utilized to perform the analyses.
In this description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the examples disclosed herein. It will be apparent, however, to one skilled in the art that the disclosed example implementations may be practiced without these specific details. In other instances, structure and devices are shown in block diagram form in order to avoid obscuring the disclosed examples. Moreover, the language used in this disclosure has been principally selected for readability and instructional purposes and may not have been selected to delineate or circumscribe the inventive subject matter, resorting to the claims being necessary to determine such inventive subject matter. Reference in the specification to “one example” or to “an example” means that a particular feature, structure, or characteristic described in connection with the examples is included in at least one implementation.
The terms “computing system” and “computing resource” are intended broadly to refer to at least one electronic computing device that includes, but is not limited to including, a single computer, virtual machine, virtual container, host, server, laptop, and/or mobile device, or to a plurality of electronic computing devices working together to perform the function(s) described as being performed on or by the computing system or computing resource. The terms also may be used to refer to a number of such electronic computing devices in electronic communication with one another, such as via a computer network.
The term “computer processor” is intended broadly to refer to one or more electronic components typically found in computing systems, such as microprocessors, microcontrollers, application-specific integrated circuits (ASICs), specifically-configured integrated circuits, and the like, which may include and/or cooperate with one or more memory resources, to perform functions through execution of sequences of programming instructions.
The terms “memory” and “memory resources” are intended broadly to refer to devices providing for storage and retrieval of data and programming instructions, including, without limitation: one or more integrated circuit (IC) memory devices, particularly semiconductor memory devices; modules consisting of one or more discrete memory devices; and mass storage devices such as magnetic, optical, and solid-state “hard drives.” Semiconductor memory devices fall into a variety of classes, including, without limitation: read-only-memory (ROM); random access memory (RAM), which includes many sub-classes such as static RAM (SRAM), dynamic RAM (DRAM), non-volatile RAM (NVRAM), and others; electrically-alterable memory; flash memory; electrically-erasable programmable read-only memory (EEPROM), and others.
The term “non-transitory storage medium” is intended broadly to include any and all of the above-described forms of memory resources, and one or more such resources, comprising physical, tangible storage media that store the contents described as being stored thereon.
The term “cloud,” as in “cloud computing” or “cloud resource,” refers to a paradigm that enables ubiquitous access to shared pools of configurable computing resources and higher-level services that can be rapidly provisioned with minimal management effort; often, cloud resources are accessed via the Internet. An advantage of cloud computing and cloud resources is that a group of networked computing resources providing services need not be individually addressed or managed by users; instead, an entire provider-managed combination or suite of hardware and software can be thought of as an amorphous “cloud.”
The terms “application,” “function,” and “module” refer to one or more computing programs, processes, workloads, threads, and/or sets of computing instructions executed by a computing system, and to the computing hardware upon which such instructions may be performed. Example implementations of applications, functions, and modules include software modules, software objects, software instances, and/or other types of executable code. The use of the term “application instance” when used in the context of cloud computing is intended to refer to an instance within the cloud infrastructure for executing applications (e.g., for a resource user in that user's isolated instance).
Any application, function, or module described herein may be implemented in various hardware arrangements and configurations to embody the operational behavior of the application, function, or module described. As a non-limiting example, an application, function, or module may be implemented in hardware including a microprocessor, microcontroller, or the like, incorporating or cooperating with program storage hardware embodying instructions to control the hardware to operate as described. As another non-limiting example, an application, function, or module may be implemented in hardware including application-specific integrated circuitry (ASIC) tangibly embodying the function of such application, function, or module as described.
The term “machine learning” refers to algorithms and statistical models that computers and computing systems use to perform specific tasks without using explicit instructions, instead relying on models, inference, and other techniques. Machine learning is considered a subset of the broader field of artificial intelligence. “Machine-learned algorithms” are algorithms which generally involve accepting and processing input data according to a desired function and/or to generate desired output data. The desired function of a machine-learned algorithm, typically implemented by a computer processor, is established by using one or more sample datasets, known as “training data,” to effectively “program” the processor to perform the desired function. Thus, machine-learned algorithms enable a processor to perform tasks without having explicit programming to perform such tasks.
For example, if a desired task is to recognize the presence of a particular data pattern within a given data object, the training data for a machine learning algorithm may include data objects known to contain the pattern to be recognized and data objects known not to contain it. Once trained, a system including a processor implementing the machine-learned algorithm takes an unknown dataset as input, and generates an output or performs some desired function according to its training. In the foregoing pattern recognition example, application of the machine-learned algorithm to a data object (the input to the algorithm) may cause the data object to be classified according to whether or not the pattern was recognized in the data object. In this example, classification of the input data object according to the training of the algorithm constitutes the desired output of the machine-learned algorithm.
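As a toy illustration of this training-and-classification flow (hypothetical, and not part of any example described herein), the following sketch trains a classifier on data objects known to contain a simple pattern and on data objects known not to, and then classifies an unknown object:

```python
# Illustrative toy example only: training a classifier to recognize whether a simple
# pattern (here, a spike at a fixed position in a short signal) is present in a data object.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Training data: objects known not to contain the pattern (label 0) and known to contain it (label 1).
without_pattern = rng.normal(size=(50, 20))
with_pattern = rng.normal(size=(50, 20))
with_pattern[:, 10] += 5.0
X = np.vstack([without_pattern, with_pattern])
y = np.array([0] * 50 + [1] * 50)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Once trained, an unknown data object is classified according to the training.
unknown = rng.normal(size=(1, 20))
unknown[0, 10] += 5.0
print(clf.predict(unknown))  # expected output: [1], i.e., pattern recognized
```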
Referring to
In one example, partitioned dataset 106 may be created through use of a distributed processing database. The distributed processing database allows for the distributed processing of large data sets across clusters of computers using simple programming models. The distributed processing database can scale up from single servers to thousands of machines, each offering local computation and storage, making the framework suitable for the purposes of this example, where very large amounts of raw data, such as part consumption data 102 and part returns data 104, may be involved.
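As a non-authoritative sketch of how such a partitioned dataset might be produced, the following assumes a Spark-style distributed processing environment and hypothetical file paths and column names; the example does not prescribe a particular framework or schema, and partitioning by part number and supplier combination is an assumption suggested by the earlier discussion of supplier/design-version combinations.

```python
# Hypothetical sketch (framework, paths, and column names assumed): join part consumption
# and part returns data, derive days-to-failure, and write a dataset partitioned by
# part number and supplier for downstream per-partition reliability modeling.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("reliability-partitioning").getOrCreate()

consumption = spark.read.parquet("part_consumption/")   # e.g., serial, part_number, supplier, ship_date
returns = spark.read.parquet("part_returns/")           # e.g., serial, return_date

joined = (consumption
          .join(returns, on="serial", how="left")
          .withColumn("days_to_failure", F.datediff("return_date", "ship_date")))

# One partition per part_number/supplier combination.
joined.write.mode("overwrite").partitionBy("part_number", "supplier").parquet("partitioned_dataset/")
```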
A reliability model generator module 108 in system 100 operates on partitions of data in partitioned dataset 106 to perform a statistical data fit operation on data partitions, in order to generate reliability models from data partitions. In this example, a Weibull distribution analysis is performed on a partition to generate a statistical model applying a Weibull two-parameter distribution approach to estimate the probability density function (PDF) over a desired confidence interval using the time-to-failure data for each part based on unit shipment date. In one example, reliability model generator module 108 utilizes the R programming language and software environment to generate the statistical models. The R language and environment is widely used among statisticians and data analysts for data modeling, particularly when, as in this example, potentially large quantities of data and large numbers of data partitions in partitioned dataset 106 may be involved and scalability is desirable.
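While this example uses the R language and environment, the following Python sketch illustrates the general idea under stated assumptions: a two-parameter Weibull distribution is fitted to time-to-failure data derived from unit shipment dates, and upper and lower confidence bounds are approximated by bootstrap resampling rather than by the analytical confidence-bound methods a statistical environment may provide. All function names and default values are assumptions.

```python
# Hypothetical sketch: fit a two-parameter Weibull distribution to time-to-failure data
# and approximate upper/lower confidence bounds on the CDF by bootstrapping.
# Assumes complete (uncensored) failure times, unlike a full field-data analysis.
import numpy as np
from scipy import stats

def fit_weibull_model(times_to_failure, n_boot=200, ci=0.90, seed=0):
    rng = np.random.default_rng(seed)
    times = np.asarray(times_to_failure, dtype=float)

    # Two-parameter fit: location fixed at zero; estimate shape (beta) and scale (eta).
    shape, _, scale = stats.weibull_min.fit(times, floc=0)

    grid = np.linspace(times.min(), times.max(), 100)          # days since shipment
    median_cdf = stats.weibull_min.cdf(grid, shape, scale=scale)

    # Bootstrap resampling to approximate upper/lower confidence functions.
    boot = []
    for _ in range(n_boot):
        sample = rng.choice(times, size=times.size, replace=True)
        b_shape, _, b_scale = stats.weibull_min.fit(sample, floc=0)
        boot.append(stats.weibull_min.cdf(grid, b_shape, scale=b_scale))
    lower, upper = np.quantile(boot, [(1 - ci) / 2, (1 + ci) / 2], axis=0)

    return grid, median_cdf, lower, upper
```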
In
In
In
In
Each of the reliability models of
A Weibull distribution model which may be considered to have a favorable goodness-of-fit metric may nevertheless provide a poor explanation of the underlying empirical data if it misses important differences among sub-populations or errors in the data, such as mistaken assumptions about when a unit started life. A reliability engineer may be required to visually and subjectively determine whether a given model is sufficient and/or whether there are any unusual features which the model does not explain; such unexplained features may indicate a need for further investigation and may be the first sign of a new failure mode.
With continued reference to
In one example, each function of a model and its underlying empirical data is converted into the desired input matrix format as follows:
Empirical input data may already be in a Cumulative Distribution Probability format, or can be converted to such a format, such that conversion module 110 may convert the empirical data to a set of a predetermined number of data points, for example, 100 data points, with the probabilities for the empirical data spaced evenly between 0.01 and 0.99. Conversion module 110 may change these values to improve accuracy for certain models.
For each median confidence function of a reliability prediction model, which is a function representing a “best fit” to the empirical data attainable by reliability model generator module 108, the specific probability range to be used may be unknown without prior knowledge of the function. For that reason, in one example conversion module 110 evaluates the median confidence function in a two-pass approach, where first the function is evaluated in a first probability range, for example, [0.0001 to 0.1], to generate a larger number of data points, for example, 10,000 points. For reliability analyses, the basic behavior of the median confidence function is to receive a failure probability as input and to output a number of days for a component to reach that failure probability. The period of interest for any given analysis may differ. In one example, a period of interest corresponding to a warranty period for an item may be desired. Thus, on a second pass, conversion module 110 trims the evaluation to yield only the valid probabilities within the period of interest. Based on this valid range, the function is then evaluated again to derive a predetermined number of data points, for example, 100 data points.
For each of the upper and lower confidence functions, conversion module 110 may employ the same approach as used with the median confidence function, namely a two-pass approach resulting in derivation of a predetermined number of data points, for example, 100 data points, for each of the upper and lower confidence functions.
Conversion module 110 thus produces an input matrix consisting of the collections of data points for the empirical data, the median confidence function, and the upper and lower confidence functions. For example, a resulting input matrix derived by conversion module 110 may consist of a total of 400 data points for a given model, with 100 points from the empirical data, 100 points from the median confidence function evaluated from a probability of [0.0001] to a probability PM at or near the endpoint of the interval of interest, 100 points from the upper confidence function evaluated from a probability of [0.0001] to a probability PUCI at or near the endpoint of the interval of interest, and 100 points from the lower confidence function evaluated from a probability of [0.0001] to a probability PLCI at or near the endpoint of the interval of interest.
It is to be noted that PM≠PUCI≠PLCI since the behaviors of the median confidence function and the upper and lower confidence functions are different, and each has its own probability range. To this list an N input parameter may be added, where N represents the number of empirical data values available. The addition of this parameter may increase the overall machine learning algorithm accuracy, since it may facilitate finding patterns in datasets normally deemed to have an insufficient number of data points.
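A minimal sketch of the conversion described above follows, assuming the empirical data are available as (probability, days) pairs and that the median, upper, and lower confidence functions each map a failure probability to a number of days; the trimming rule, probability ranges, and function names are assumptions based on the example values given in the text.

```python
# Hypothetical sketch of the conversion: sample the empirical CDF and the median/upper/lower
# confidence functions into fixed-length point collections, then append N, the number of
# empirical data values. The 100-point and [0.0001, 0.1] choices mirror the example values above.
import numpy as np

def build_input_matrix(empirical_cdf, median_fn, upper_fn, lower_fn, period_of_interest_days):
    """empirical_cdf: array of (probability, days) pairs sorted by probability;
    *_fn: functions mapping a failure probability to days."""
    # Empirical data: 100 evenly spaced probabilities between 0.01 and 0.99.
    probs = np.linspace(0.01, 0.99, 100)
    empirical_days = np.interp(probs, empirical_cdf[:, 0], empirical_cdf[:, 1])

    def two_pass(fn):
        # First pass: evaluate over a wide probability range with many points.
        coarse_p = np.linspace(0.0001, 0.1, 10_000)
        coarse_days = np.array([fn(p) for p in coarse_p])
        # Second pass: keep only probabilities whose predicted days fall within the period
        # of interest, then re-evaluate 100 points over that trimmed range.
        valid = coarse_p[coarse_days <= period_of_interest_days]
        p_end = valid[-1] if len(valid) else coarse_p[-1]
        fine_p = np.linspace(0.0001, p_end, 100)
        return np.array([fn(p) for p in fine_p])

    features = np.concatenate([
        empirical_days,
        two_pass(median_fn),
        two_pass(upper_fn),
        two_pass(lower_fn),
        [len(empirical_cdf)],          # N: number of empirical data values
    ])
    return features                    # 401 values per model in this sketch
```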
Referring to
A “no fit” class, represented by block 114 in
A “fit and high rate” class of an input matrix, represented by block 116 in
A “fit and low rate” class of an input data matrix, represented by block 118 in
Finally, in this example, an “inconclusive” class of an input data matrix, represented by block 120 in
Depending upon the class to which a given input matrix is assigned by machine-learned algorithm module 112, different actions may be taken. In the example of
Different machine learning algorithms may be utilized in the implementation of machine-learned algorithm module 112, including, for example, logistic regression algorithms, functions from the open-source XGBoost software library, k-nearest neighbors (k-NN) algorithms, artificial neural networks, decision trees, and random decision forest (random forest) algorithms. In one example, a random forest algorithm provides desirable results, as reflected particularly in its Receiver Operating Characteristic, Multi-class Area Under the Curve (“ROC-MAUC”) metrics. The following Table 1 lists the values of all the metrics from a random forest algorithm in accordance with one example.
In one example, machine-learned algorithm module 112 is trained using a grid search on multiple combinations of algorithm parameters. In one example, twelve combinations of the following three parameters may be used: (1) minimum sample split, which is the minimum number of samples required to split an internal node; (2) maximum depth, which is used to limit overtraining; and (3) minimum samples leaf, which is the minimum number of samples required to be at a leaf node. A combination yielding a ROC-MAUC score of at least a predetermined minimum value may then be selected for the algorithm. In one example, a minimum sample split of 30, a maximum depth of 5 and a minimum samples leaf of 10 yields an ROC-MAUC score of 0.905240059, as shown in the following Table 2:
In one example, machine-learned algorithm module 112 uses 100 decision trees and creates a bootstrapped dataset (sampled with replacement, so repetition is allowed) for each tree, with a random subset of features considered at each split. After the bootstrapped dataset is created for each tree, the Gini index is used as the split quality criterion. (A Gini index is a measure of statistical dispersion intended to represent the distribution of data within a data set.) For algorithm validation, a new record which is not part of the training data is passed through all of the built trees and an aggregate decision is made from the ensemble results of all the trees (a process commonly called “bagging”).
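The following is a hedged sketch, using scikit-learn, of a classifier configured along the lines described above (100 trees, a bootstrapped dataset per tree, and the Gini split criterion), tuned by a grid search over the three parameters mentioned. The grid shown yields twelve combinations, mirroring the example, but the candidate values other than those noted above, the scoring choice, and the cross-validation setup are assumptions, as are the input matrix X (feature rows such as those produced by the conversion sketched earlier) and the class labels in y.

```python
# Hypothetical sketch, not the disclosed implementation: a random forest classifier with
# 100 trees, bootstrapping, and the Gini criterion, tuned by a grid search over minimum
# samples split, maximum depth, and minimum samples leaf, scored by multi-class ROC AUC.
# y might hold labels such as "no_fit", "fit_high_rate", "fit_low_rate", "inconclusive".
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def train_classifier(X, y):
    base = RandomForestClassifier(
        n_estimators=100,      # 100 decision trees
        criterion="gini",      # Gini index as the split quality criterion
        bootstrap=True,        # bootstrapped (with-replacement) dataset per tree
        random_state=0,
    )
    grid = {
        "min_samples_split": [10, 30],   # minimum samples required to split an internal node
        "max_depth": [5, 10, None],      # limits overtraining
        "min_samples_leaf": [5, 10],     # minimum samples required at a leaf node
    }
    # 2 x 3 x 2 = 12 parameter combinations, scored by multi-class (one-vs-rest) ROC AUC.
    search = GridSearchCV(base, grid, scoring="roc_auc_ovr", cv=5)
    search.fit(X, y)
    return search.best_estimator_, search.best_score_
```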
As shown in
Typically, memory such as memory 406 in memory resource 404 in
Turning to
In block 504, an implementation of reliability model generating module 108 from
In block 506, a matrix of data points is generated, to include a plurality of data points representing each of the functions comprising the reliability model (e.g., upper, median, and lower confidence functions) as well as the empirical data.
In block 508, the matrix generated in block 506 is applied as input to a machine-learned algorithm module to automatically classify the model generated in block 504 into one of a predetermined plurality of classes, as described herein.
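Tying these operations together, a hypothetical end-to-end flow for a single data partition, reusing the illustrative helper functions sketched earlier in this description (which are not disclosed implementations), might look like the following:

```python
# Hypothetical end-to-end flow for one data partition, reusing the illustrative helpers
# sketched above (fit_weibull_model, build_input_matrix) and a trained classifier.
import numpy as np

def classify_partition(times_to_failure, classifier, period_of_interest_days=3 * 365):
    times = np.sort(np.asarray(times_to_failure, dtype=float))

    # Block 504: generate a reliability model (median and confidence functions) from the data.
    grid, median_cdf, lower_cdf, upper_cdf = fit_weibull_model(times)

    # Empirical CDF as (probability, days) pairs via median ranks, as in the earlier sketches.
    n = len(times)
    empirical_cdf = np.column_stack([(np.arange(1, n + 1) - 0.3) / (n + 0.4), times])

    # Express each fitted curve as a probability -> days function by inverse interpolation.
    median_fn = lambda p: np.interp(p, median_cdf, grid)
    upper_fn = lambda p: np.interp(p, upper_cdf, grid)
    lower_fn = lambda p: np.interp(p, lower_cdf, grid)

    # Block 506: convert the model and empirical data into a fixed-length input matrix row.
    features = build_input_matrix(empirical_cdf, median_fn, upper_fn, lower_fn,
                                  period_of_interest_days)

    # Block 508: apply the machine-learned classifier to assign one of the predetermined classes.
    return classifier.predict(features.reshape(1, -1))[0]
```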
Certain terms have been used throughout this description and claims to refer to particular system components. As one skilled in the art will appreciate, different parties may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not function. In this disclosure and claims, the terms “including” and “comprising” are used in an open-ended fashion, and thus should be interpreted to mean “including, but not limited to . . . . ” Also, the term “couple” or “couples” is intended to mean either an indirect or direct wired or wireless connection. Thus, if a first device couples to a second device, that connection may be through a direct connection or through an indirect connection via other devices and connections. The recitation “based on” is intended to mean “based at least in part on.” Therefore, if X is based on Y, X may be a function of Y and any number of other factors.
The above discussion is meant to be illustrative of the principles and various implementations of the present disclosure. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.