The present disclosure relates to machine learning and, more specifically, to rule generation for classifying good quality products from bad quality products based on database variables available in process monitoring data.
There presently is no method that can confirm weld quality in ultrasonic welding of sheet metals. In the past, confirming weld quality has involved tedious feature identification and building black-box classifiers to ascertain quality from process monitoring data. This manufacturing process is so sensitive to environmental variables, such as the welding machine, ambient temperature, humidity, and tool wear, that every minor change in any of these requires the entire exercise, from identifying important features to building a black-box classifier, to be repeated manually. Furthermore, the black-box classifiers do not lend themselves to understanding the physics of the process.
The background description provided here is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
A system includes at least one processor and a memory coupled to the at least one processor. The memory stores a dimensionally aware model generated based on a training set and guided by feature dimensions and instructions for execution by the at least one processor. The instructions include, in response to receiving a set of data from a user device, identifying a set of features from the set of data and applying the dimensionally aware model to the set of features by implementing a boundary representation. The instructions include classifying the set of features as acceptable in response to the implementation of the boundary representation indicating the set of features are outside the boundary representation, classifying the set of features as unacceptable in response to the implementation of the boundary representation indicating the set of features are inside the boundary representation, and generating, for display on the user device, an alert based on the classification.
In a continuous manufacturing process, such as ultrasonic welding, the overall quality of the process depends on the machining quality at every time step and its coordination with past and future steps. Such a manufacturing process needs to be analyzed and monitored at every time step, looking for signature properties of measurable features that denote the quality of the product up to the current time step, to decide whether the manufacturing process should be continued to completion or rejected due to aberrations already observed. Machine learning methods are typically employed on existing data of a manufacturing process to bring out acceptable signatures.
Although machine learning methods can learn the hidden rules associating features of time series data, the derived rules are often meaningless and often do not even conform to a dimensionally correct rule. In this project, a dimensionally aware rule mining approach has been developed, based on genetic programming and recently developed automated rule discovery methods, to decipher rules that have a physical meaning. In addition to finding a suitable classifier for evaluating whether a manufactured product is a ‘pass’, another motivation for this study is to gain better physical and scientific insight into the complex manufacturing process from the derived, dimensionally aware, and meaningful rules.
The present disclosure develops a data classification technology that receives raw manufacturing time series data for a physical process as input and provides the user with dimensionally meaningful rules involving process features that discriminate good (‘acceptable’) cases from bad (‘unacceptable’) cases. Any classification task is preceded by “feature creation” and “feature selection” tasks that are traditionally performed manually by domain experts.
The present classification technology uses features created from the time series of supplied manufacturing data using basic mathematical functions, such as differentiation, integration, and the Fourier transform, and proposes a bi-objective optimization based machine learning approach to automatically deduce meaningful rules. This method is able to find simple-structured rules involving only a few features (two to four), thereby allowing engineers to isolate and comprehend a few critical features and their relationships for distinguishing good manufacturing processes from bad ones. Furthermore, the evolved rules are adapted to be dimensionally correct as much as possible by using problem constants, so that the rules are physically meaningful. The overall procedure is generic and ready to be applied to other similar manufacturing problems.
Further areas of applicability of the present disclosure will become apparent from the detailed description, the claims, and the drawings. The detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the disclosure.
The present disclosure will become more fully understood from the detailed description and the accompanying drawings.
In the drawings, reference numbers may be reused to identify similar and/or identical elements.
To classify whether a production event resulted in an acceptable or unacceptable product, a dimensionally aware rule extraction system generates a machine learning system to classify an individual production event based on an identified set of salient production features. For example, a set of training data for both good (acceptable) and bad (unacceptable) production items, such as a welded item, is used to create a machine learning model. The machine learning model is trained using time series production data, for example, from welding of the weld item. From the time series data, a machine learning model is generated using genetic programming to identify the set of salient features from the training data, which may be the base features or non-linear combinations thereof, and to determine boundaries between the good and bad data using linear regression.
In various implementations, the machine learning model is trained and generates a set of decision boundaries in the form of mathematical expressions composed of base features or non-linear combinations thereof. The method uses a genetic-programming-based, bi-objective, population-based optimizer for learning the structure of the constituent sub-expressions of these decision boundaries, followed by linear regression for learning the coefficients of these constituents. Each boundary or equation of the set of boundaries may have a different rate of error as well as a different complexity. To select one of the boundaries as a threshold equation, the dimensionally aware rule extraction system may identify which boundary includes an acceptable amount of error as well as an acceptable amount of complexity. In various implementations, the dimensionally aware rule extraction system may output the set of boundaries for a user to select from, which the machine learning model then implements to classify incoming data.
The machine learning method generates a set of Pareto optimal or PO classifiers. An additional element of the dimensionally aware rule extraction system is the dimensional awareness. When generating the machine learning model and analyzing the time series data, the machine learning model can be provided additional user preference on acceptable dimensional inconsistency. An example of dimensionally inconsistent expression is one in which a feature having the units of distance (for example) is added to another feature having the dimensions of power. If the user prefers solutions with no dimensional inconsistency, then the machine learning model can be used to either filter out such solutions from the set of trade-off classifiers or use this metric to promote solutions with lower dimensional inconsistency during optimization. This results in the generation of boundaries that make practical sense and can be adjusted or implemented during production of the weld item to increase the likelihood that the weld item is good. Furthermore, such dimensionally consistent rules lend themselves to physical understanding of the system as well.
The user may also decide to use the rule generation in tandem with dimensional consistency check so that the dimensionally consistent rules can be preferred and promoted during the optimization process and not just at the end of it.
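The dimensional bookkeeping described above, namely tracking whether added terms share the same physical dimensions, can be sketched as follows. This is an illustrative sketch under assumed conventions (dimension vectors of SI base exponents); the names and unit choices are not the disclosure's implementation.

```python
# Each feature's dimensions are encoded as exponents of SI base dimensions
# (mass M, length L, time T). Power in Watts is kg*m^2/s^3, i.e. (1, 2, -3).
DIMENSIONS = {
    "distance": (0, 1, 0),
    "time":     (0, 0, 1),
    "power":    (1, 2, -3),
    "force":    (1, 1, -2),
}

def multiply(dim_a, dim_b):
    """Multiplying quantities adds their dimension exponents."""
    return tuple(a + b for a, b in zip(dim_a, dim_b))

def divide(dim_a, dim_b):
    """Dividing quantities subtracts dimension exponents."""
    return tuple(a - b for a, b in zip(dim_a, dim_b))

def addition_is_consistent(dims):
    """Addition (or subtraction) is dimensionally consistent only when
    every added term carries the same dimension vector."""
    return len(set(dims)) == 1

# Distance added to power is dimensionally inconsistent (the example above):
bad = addition_is_consistent([DIMENSIONS["distance"], DIMENSIONS["power"]])

# Force times distance has the dimensions of energy (M L^2 T^-2); adding two
# such terms is consistent:
energy = multiply(DIMENSIONS["force"], DIMENSIONS["distance"])
good = addition_is_consistent([energy, energy])
```

A rule whose expression tree fails such a check can either be filtered out of the final trade-off set or penalized during optimization, as described above.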
The dimensionally aware rule extraction system is designed to provide a computationally efficient machine learning methodology for extracting classification rules from time series data involving a routine manufacturing application. For example, as falling battery costs drive the sales and projections of electric vehicles up, so grows the research interest in understanding the underlying physics of the core processes involved in manufacturing lithium-ion batteries.
This system aims at learning interpretable and meaningful classification rules relating features of time series data of a manufacturing process so that the rules can be used to determine the quality of the product manufactured. The term “interpretable-rules” in the context of this system refers to rules in the form of mathematical expressions/equations involving the process features, process constants, and some simple operations such as addition, subtraction, multiplication, and division. The term “meaningful-rules” in the context of this system refers to the idea of aforementioned expressions being physically meaningful by being dimensionally consistent.
In the machine learning literature, the classifiers that are most accurate are also the least interpretable. Linear classifiers, such as linear support vector machines (SVMs), lie at one end of the spectrum: easy to interpret but with poor performance on realistic, complex data. On the other hand, models such as deep neural networks perform very well on complex data yet are very hard for humans to interpret.
In various implementations, the system interprets and classifies weld quality. For each weld produced, particular time series data is obtained. For example, the following time series sensor data can be available for the weld duration: (i) power consumed by the ultrasonic transducer in Watts, (ii) sonotrode tip movement along the direction of clamping force in mm, and (iii) acoustic data from a fixed ultrasonic microphone in Pascals. Such time series data is shown in
The three aforementioned signals can be recorded at a sampling rate of 100,000 samples per second. In an example system, a constant stream of weld data is forwarded to a classifier that can successfully classify the Go/NoGo (e.g., good/not good) classes with zero false positives (type-II errors). The inputs to the classifier include power data, acoustic data, sonotrode tip movement data, and noise.
Furthermore, once the classifier is performing “reasonably” well, characterized by a suspect rate for the current batch K of welds below a user-defined value a, another machine learning method learns dimensionally consistent rules that exist in the Go welds and not in the NoGo welds, or vice versa. This method is known as Dimensionally Aware Genetic Programming or “DAGP.”
In this system, three tasks are of interest. Task-1 pertains to the generation of features, and task-2 pertains to feature selection and classifier identification. Task-3 pertains to providing the user additional information about the classifier in regard to its adherence to the law of dimensional homogeneity.
In traditional machine learning methods, once the data is cleaned, the first task is to create a set of features. Most of the time, domain knowledge is used to create these features from the cleaned data. However, manually coming up with features is difficult and time consuming. In the present disclosure, Genetic Programming or GP is used to create features from cleaned time series data using some basic mathematical constructs, such as addition, subtraction, differentiation, and integration.
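The basic constructs a GP could use to build features from a cleaned time series can be sketched as below. These are minimal numerical stand-ins (finite differences, the trapezoidal rule, and a single DFT bin); the function names and discretizations are assumptions, not the disclosure's implementation.

```python
import math

def differentiate(signal, dt):
    """Forward finite-difference approximation of d(signal)/dt."""
    return [(signal[i + 1] - signal[i]) / dt for i in range(len(signal) - 1)]

def integrate(signal, dt):
    """Trapezoidal-rule approximation of the integral over the whole series."""
    return sum((signal[i] + signal[i + 1]) * dt / 2
               for i in range(len(signal) - 1))

def dft_magnitude(signal, k):
    """Magnitude of the k-th DFT bin (a crude stand-in for a Fourier feature)."""
    n = len(signal)
    re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
    im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
    return math.hypot(re, im)
```

A GP would compose such primitives (together with addition, subtraction, etc.) into candidate feature expressions automatically, rather than relying on a domain expert to hand-pick them.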
Once a set of features has been generated, the next task in any classifier building process is to identify a small subset of features deemed most fit to yield high classification accuracy. This step is known as feature selection. Subsequently, building a classifier from this small subset of “high performing” features entails optimizing the parameters of some classifier model, given this feature set. Feature selection and optimization of a classification model is inherently a bi-level optimization problem, with feature subset selection being the upper-level decision and classifier building being the lower-level decision. However, to reduce the complexity of this problem, a small feature subset is first selected using manual methods, such as principal component analysis (PCA), univariate selection, a correlation matrix with a heat map, and even genetic algorithms. Then, optimization of the parameters of the classification model is performed using such a set of features. A GP is implemented in the dimensionally aware classification system to achieve automated feature generation, feature engineering, feature selection, selection of the classification model, and then optimization of the parameters of the classification model, all in one algorithm.
Preferring dimensionally consistent information (data) is a task unique to this classifier. It also provides the user with additional information about how well a classification rule adheres to the law of dimensional homogeneity. If two rules have similar classification accuracy, then the rule that is dimensionally consistent can be chosen by the user. Furthermore, a rule that is not only accurate in classification but also dimensionally consistent is a prime candidate for understanding the science of the underlying process producing the data. In this case, that process is the USW process. The motivation for such a strategy is to gain better physical insight into the complex manufacturing process from the derived, dimensionally aware, and meaningful rules.
GPs have been known to be excellent for non-linear symbolic regression, and a number of commercial software packages are based on them. However, knowledge discovery differs from symbolic regression in that the model shall not only fit the data well but also be plausible and human interpretable. The key to inducing such knowledge is to incorporate semantic content and heuristics encapsulating the human interpretability and plausibility aspects into the search process. In this system, dimensional consistency is chosen as a guiding principle in discovering rules that not only have low error of fit on the data but are also dimensionally consistent.
The strategy of the DAGP is to learn the structure and weights of a rule separately, which has been shown to be a good strategy. The DAGP breaks the problem of learning rules into two parts: (i) learning the structure and (ii) learning the weights. It uses a GP for finding the optimal structure of a rule and some classical method, OLS regression in the symbolic regression task and a linear SVM in the binary classification task, for learning the weights in a rule. Furthermore, DAGP solves a bi-objective problem to effectively control bloat, which is a very common problem encountered with single objective GP algorithms. For classification problems with highly biased class data, it is important to produce synthetic data using algorithms such as ADASYN so that classification algorithms can perform satisfactorily.
The classification data, including synthetic minority class data, is used in visualization algorithms such as t-SNE to get some qualitative insights into the data, as described in
Now referring to
LVT data is a time series that captures the movement of the sonotrode tip orthogonal to the direction of sonotrode vibration using a linear variable differential transformer sensor. It is recorded at a sampling rate of 100 kHz.
ASO data is a time series that captures the sound data during a weld using a highly sensitive microphone (mic) with an audio range of 20 Hz to 40 kHz. It is recorded at a sampling rate of 100 kHz.
FQS data is a time series that captures the vibratory movement of the sonotrode tip. The parent sensor of this data is provided by the manufacturer of the weld equipment. Every sonotrode has a slightly different resonance frequency in the ballpark of 20 kHz. Hence, this time series is simply a sinusoid of constant frequency for the entire duration of a weld. This data may be used for detecting a change in the tool. It is recorded at a sampling rate of 100 kHz.
PWS data is a time series that can be obtained from PWL data by taking data corresponding to the duration of the weld and then down sampling it to 100 Hz. An example of this time series is shown in
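A down-sampling step of the kind described for PWS data might be sketched as follows. Block averaging is an assumption here; the disclosure does not specify the decimation method, and a production implementation would normally apply a proper anti-aliasing filter first.

```python
def downsample(signal, factor):
    """Block-average decimation: reducing 100 kHz data to 100 Hz uses
    factor=1000. Averaging each block (rather than simply keeping every
    factor-th sample) gives crude low-pass behavior before decimation."""
    n = len(signal) // factor
    return [sum(signal[i * factor:(i + 1) * factor]) / factor
            for i in range(n)]
```

For example, `downsample(pwl_weld_segment, 1000)` would reduce a 100 kHz weld-duration excerpt of PWL data to the 100 Hz PWS series described above.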
Referring to
The classification module 208 classifies the production data based on a machine learning model generated by a model generation module 216. As described above, the classification module 208 may calculate where the production data is classified based on the boundary described by an equation that includes variables that represent particular features of the production data. In various implementations, a salient features database 220 may instruct the data analysis module 204 as to which features the raw production data should be transformed into. In this way, the data analysis module 204 can extract the salient features of the production data. Additionally or alternatively, the model generation module 216 can directly instruct the data analysis module 204 which features are relevant to the presently implemented machine learning model version.
As shown in the dimensionally aware rule extraction system 200, each machine learning model generated by the model generation module 216 can store which features are salient to that particular model in the salient features database 220. In various implementations, a display module 224 can obtain the set of salient features from the salient features database 220 and present the salient features to a user. The display module 224 may be incorporated into the computer 202, which has a display 226 implemented by a processor with a memory. The display 226 may be used to generate alerts or messages corresponding to whether the data is acceptable or unacceptable, as will be described in more detail below. Then, the user can relate the salient features to the production process. For example, if the time to weld is particularly relevant and a main feature included in a boundary equation, once the user is in possession of this information (including the boundary equation), the user can adjust the production process as needed to increase the likelihood that a particular weld event will result in an acceptable weld.
Once the classification module 208 calculates a location of the production data with respect to the boundary equation, the classification module 208 forwards to an alert module 228 an indication of whether the corresponding production event was “acceptable” or “unacceptable,” which is illustrated by an indicator on the display 226. The alert module 228 may generate an alert (visual, haptic, or audible) indicating when the production data indicates that the corresponding production event is unacceptable. Then, the alert condition may be forwarded to the display module 224 for display to a user, for example, if the alert is visual, such as through the indicator on the display 226. In various implementations, the display module 224 also displays an indication when the production event was acceptable. Additionally, in example implementations, the production data may only be stored in the production time-series database 212 when the production data is classified as acceptable.
Each weld has a unique ID referred to as the Weld ID (WID). For each weld, two kinds of data are obtained: (a) weld inspection quality values and (b) raw time series data. The inspection quality data carries information on whether a weld belongs to the Go class or the NoGo class. The raw data obtained for each weld is shown and described with respect to
Before extracting features from the weld data, first the location of the weld is identified in the time series corresponding to the welding process. For example, as shown in
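Locating the weld within a longer recording could be done, for example, by thresholding the power signal. The following is an assumed heuristic for illustration only; the disclosure does not specify the detection method, and the threshold value would be chosen per sensor.

```python
def locate_weld(power, threshold):
    """Return (start, end) indices of the first contiguous run where the
    power signal exceeds a threshold, or None if no such run exists."""
    start = next((i for i, p in enumerate(power) if p > threshold), None)
    if start is None:
        return None
    end = start
    while end < len(power) and power[end] > threshold:
        end += 1
    return (start, end)
```

Once the start and end indices are known, feature extraction can be restricted to the weld-duration segment of each time series.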
The DAGP then learns rules at 324, which is described in detail in
The user may also decide to use modules 324 and 328 in tandem so that the dimensionally consistent rules can be preferred and promoted during the optimization process and not just at the end of it.
To quantify the dimensional mismatch penalty in a rule found by DAGP, for example, the rule learning part of DAGP may be used for solving a symbolic regression problem relating a regressand (y) and regressors (xk, k ∈{1, 2, . . . , nx}), which yields a set of PO rules. An example PO rule is:
y = w0 + w1·t1 + w2·t2 + . . . + wnt·tnt,
where w0 is a bias term, nt is the total number of terms, wi is the regression coefficient for term ti, and ti is some function of the regressors xk, k ∈{1, 2, . . . , nx}.
Different classification methods generally offer a trade-off between classification accuracy and human interpretability. A practitioner has to choose in the early stages of a classification task what is more important to them. The best classification accuracy is typically achieved by black-box models such as neural networks, random forests, kernel based SVMs, or a complicated ensemble of all of these methods. On the other hand, models whose predictions are easy to interpret and communicate are usually very poor in their predictive capabilities, such as linear SVMs or a single decision tree.
The power of human interpretability of a model or classifier lies in the potential (of such a model) for knowledge discovery. Take the example of face recognition algorithms using deep learning (DL). If a deep learning model of face recognition can be human interpreted to discover that the relative linear proportions of eye-brows, nose, and lips over the face are the most important features based on which a facial recognition decision is made, then that is a great discovery.
In the context of classification of the ultrasonic weld data, any knowledge about: (i) what features are important in deciding the quality of a weld and (ii) how different features of the welds interact with each other to decide the quality of a weld, can be considered vital knowledge.
DAGP learns a rule of the form given by the above equation by letting the GP optimize the structure of rules and letting some efficient classical method optimize the corresponding weights in those rules. For the symbolic regression task, this classical method is the OLS method of estimation. For the binary classification task, a linear SVM is chosen for this job, because the results of a linear SVM are considered very interpretable. The challenge lies in finding the right number of higher dimensions and the right features/derived-features corresponding to those dimensions in which the data is linearly separable. In such a space, a linear SVM will be able to find an appropriate separation plane with relative ease, provided that the decision boundary is not discontinuous. Derived features are features that are composed from the initial set of hand crafted features using basic operations such as addition, subtraction, multiplication, and division.
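The split between structure learning and weight learning can be illustrated for the symbolic regression case: fix a rule structure (here two terms, t1 = x1·x2 and t2 = x2², chosen purely for illustration) and estimate its weights by OLS via the normal equations. The data values below are synthetic, not from the disclosure.

```python
def ols(X, y):
    """Solve (X^T X) w = X^T y by Gaussian elimination with partial
    pivoting (intended for small systems only)."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    for col in range(n):                      # forward elimination
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[col])]
            b[r] -= f * b[col]
    w = [0.0] * n
    for r in reversed(range(n)):              # back substitution
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, n))) / A[r][r]
    return w

# Fit y = 1.98 + 2.02*(x1*x2) - 3.05*(x2^2) from noiseless samples on a grid:
samples = [(0.5 * i, 0.5 * j) for i in range(-3, 4) for j in range(-3, 4)]
X = [[1.0, x1 * x2, x2 ** 2] for x1, x2 in samples]
y = [1.98 + 2.02 * x1 * x2 - 3.05 * x2 ** 2 for x1, x2 in samples]
w0, w1, w2 = ols(X, y)
```

In DAGP, the GP proposes the term structure and a classical solver (OLS here, a linear SVM in the classification task) fills in the weights, so that each candidate rule is evaluated with its best achievable coefficients.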
Referring now to
y = −x1² + 2.02x1·x2 − 3.05x2² + 1.98 = 0
where x1 and x2 are the two features for this data. The data of hypothetical Go class (y<0) is shown in green and the data of hypothetical NoGo class (y≥0) is shown in red. Clearly, the above equation for
Now consider the following three features, namely x1², x2², and x1·x2. These three features are called derived features, as they were not provided with the original features of the problem but are derived from them. Now, if these three features are provided to a linear SVM algorithm, it will perform exceedingly well on the same data, the reason being that, in this modified 3-dimensional feature space, the data is linearly separable. Working with a derived feature space has the advantage of keeping the classifier more interpretable and not obfuscating the derived features by performing complex operations on the original feature space.
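The claim above can be demonstrated in a few lines of pure Python: data labeled by the hypothetical quadratic boundary is not linearly separable in (x1, x2), but becomes linearly separable in the derived space (x1², x1·x2, x2²). A simple perceptron stands in for the linear SVM here; all numbers are illustrative, not from the disclosure.

```python
def perceptron(points, labels, epochs=5000):
    """Train a perceptron; it converges whenever the data is linearly
    separable, which is exactly the property being demonstrated."""
    w = [0.0] * len(points[0])
    for _ in range(epochs):
        mistakes = 0
        for z, lab in zip(points, labels):
            pred = 1 if sum(wi * zi for wi, zi in zip(w, z)) > 0 else -1
            if pred != lab:
                w = [wi + lab * zi for wi, zi in zip(w, z)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

# Label grid points by the hypothetical boundary from the text,
# y = -x1^2 + 2.02*x1*x2 - 3.05*x2^2 + 1.98, keeping a margin around y = 0.
data, labels = [], []
for i in range(-5, 6):
    for j in range(-5, 6):
        x1, x2 = 0.4 * i, 0.4 * j
        yv = -x1**2 + 2.02 * x1 * x2 - 3.05 * x2**2 + 1.98
        if abs(yv) > 0.5:
            data.append([x1**2, x1 * x2, x2**2, 1.0])  # derived features + bias
            labels.append(1 if yv > 0 else -1)

w = perceptron(data, labels)
accuracy = sum(
    (1 if sum(wi * zi for wi, zi in zip(w, z)) > 0 else -1) == lab
    for z, lab in zip(data, labels)
) / len(data)
```

Because the true boundary is itself linear in the derived features, the linear learner recovers a perfect separator, which is the point of working in the derived feature space.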
Referring now to
In a further example, consider a classification problem with n0 observations, nx features (xi, i ∈{1, 2, . . . , nx}), and n0 binary class labels (yi ∈{0, 1}, ∀i ∈{1, 2, . . . , n0}) initially provided with the problem. When solving a classification problem using DAGP, consider a DAGP individual with the same rule structure as shown in the PO rule equation. The terms ti can be considered as derived features obtained by simple operations of {+, −, ×, ÷} on the original features. The weights of this individual are then learned using a linear SVM method, and the misclassification error at the end of weight optimization by the SVM is assigned as the error fitness of the individual. The complexity fitness is calculated the same as in the symbolic regression case, i.e., the total number of tree nodes in the terms of the rule corresponding to the DAGP individual.
Note that for the USW data, the cost of misclassifying a NoGo weld should be much higher than the cost of misclassifying a Go weld. For this reason, the cost matrix used by the linear SVM for arriving at the weights is set so that the cost of making a type-II error on the training set is 25 times higher than the cost of making a type-I error.
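The asymmetric cost described above can be expressed as a simple weighted misclassification measure. This sketch assumes the label convention 1 = Go and 0 = NoGo; the actual cost-matrix mechanics inside the SVM solver are not shown.

```python
def weighted_error(y_true, y_pred, type2_cost=25, type1_cost=1):
    """Total asymmetric misclassification cost. A type-II error (a NoGo
    weld predicted as Go) is charged 25x a type-I error (a Go weld
    predicted as NoGo), mirroring the 25:1 ratio described above."""
    cost = 0
    for t, p in zip(y_true, y_pred):
        if t == 0 and p == 1:      # NoGo passed as Go: the costly mistake
            cost += type2_cost
        elif t == 1 and p == 0:    # Go rejected as NoGo: the cheap mistake
            cost += type1_cost
    return cost
```

Training the weights against this measure biases the decision boundary so that bad welds are very rarely passed, at the price of occasionally rejecting good ones.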
Referring to
Once this parent population is ranked, the parent selection 616 process produces a list of parents that are allowed to reproduce children for the next generation. DAGP uses tournament selection for selecting parents to reproduce. Such a parent selection process promotes the fittest individuals in the population to mate more often. Once these parents are selected, they go through the genetic operations of crossover 620 and mutation 624 to produce a child population of N individuals. DAGP uses two types of crossover, namely a low-level crossover and a high-level crossover. Any two parent individuals chosen to reproduce undergo a crossover with a probability pc. With a (preferably small) probability 1−pc, the individuals do not go through a crossover operation, and the outcome of the crossover operation is two child individuals that are identical copies of their parents.
When crossover does happen, it can either be of the high-level type, with a probability of pch, or of the low-level type, with a probability pcl=1−pch. Consider two individuals from the parent pool, having three and two terms respectively. For a high-level crossover between these two individuals, DAGP randomly chooses one term from each individual and then swaps them between the individuals to create two children. If a low-level crossover is to be carried out, then DAGP first chooses one term from each parent and then carries out a subtree crossover between those two terms.
After the crossover operation, the N child individuals undergo a mutation operation. For an individual, a mutation is carried out with probability pm; otherwise, the child individual is left unchanged. In DAGP, to mutate an individual, one of the terms is first randomly selected for the mutation operation, and then a sub-tree mutation is carried out on the tree of that term.
After undergoing the crossover and mutation operations, DAGP evaluates 628 the fitness of the N child individuals. Now these N children are combined with the N parent individuals of the current generation to obtain a merged population 632 of size 2N. This population of 2N individuals is passed on to the survivor selection 636 procedure, where all the 2N individuals are again ranked and assigned crowding distances before selecting N individuals using the crowded tournament selection operator. This population of N individuals is again assigned rank and crowding distance 640 values.
If termination condition 644 is not met, these N individuals become the parent population for the next generation returning to 616. This process goes on until the termination condition is met and the final PO set of solutions is reported 648.
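The bi-objective ranking step that drives this loop, with error fitness and complexity fitness both minimized, can be sketched as a plain non-dominated sort in the spirit of NSGA-II. The implementation below is illustrative and omits the crowding-distance computation; the disclosure's exact ranking procedure may differ.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (both objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated_sort(points):
    """Return a list of fronts of indices; front 0 is the Pareto-optimal
    (PO) set, front 1 is what remains after removing front 0, and so on."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# (error fitness, complexity fitness) for five hypothetical candidate rules:
objs = [(0.10, 3), (0.05, 7), (0.10, 5), (0.02, 12), (0.20, 4)]
fronts = non_dominated_sort(objs)
```

Rules in front 0 form the reported PO set: no other candidate is both more accurate and simpler, which is exactly the error-versus-complexity trade-off the survivor selection preserves.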
Referring to
Control continues to 716 to input the corresponding features of the received data (for example, at 708 control calculates the salient features of the production data) into the boundary equation to calculate a classification value of the received data or an output. Then, control continues to 720 to determine if the boundary equation output (that is, the classification value) is within the boundary defined by the boundary equation. If yes, control proceeds to 724 to identify the received data as unacceptable. As shown in
Returning to 720, if the boundary equation output is not within the boundary defined by the boundary equation, control continues to 732 to identify the received data as acceptable, which indicates that the item is acceptable. Control then continues to 736 to store the received data in a database for use in development of a further machine learning model. Then, control ends.
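The control flow at 716 through 736 can be summarized in a short sketch. The sign convention (boundary output below zero meaning "inside the boundary," hence unacceptable) and the placeholder weights are assumptions for illustration; the actual boundary equation comes from the trained model.

```python
def classify(features, weights, bias):
    """Evaluate the boundary equation on the salient features (716) and
    classify by which side of the boundary the point falls on (720):
    inside (output < 0 in this sketch) is unacceptable, outside acceptable."""
    value = bias + sum(w * f for w, f in zip(weights, features))
    return "unacceptable" if value < 0 else "acceptable"

def handle_event(features, weights, bias, alert, store):
    """Route the classified event: alert on unacceptable (724/728), store
    acceptable data for future model development (732/736)."""
    label = classify(features, weights, bias)
    if label == "unacceptable":
        alert(features)
    else:
        store(features)
    return label
```

Here `alert` and `store` are hypothetical callbacks standing in for the alert module 228 and the production time-series database 212, respectively.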
The foregoing description of the embodiments has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of an embodiment are generally not limited to that particular embodiment, but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.
The term “module” or the term “controller” may be replaced with the term “circuit.” The term “module” may refer to, be part of, or include: an Application Specific Integrated Circuit (ASIC); a digital, analog, or mixed analog/digital discrete circuit; a digital, analog, or mixed analog/digital integrated circuit; a combinational logic circuit; a field programmable gate array (FPGA); a processor circuit (shared, dedicated, or group) that executes code; a memory circuit (shared, dedicated, or group) that stores code executed by the processor circuit; other suitable hardware components that provide the described functionality; or a combination of some or all of the above, such as in a system-on-chip. While various embodiments have been disclosed, other variations may be employed. All of the components and functions may be interchanged in various combinations. It is intended by the following claims to cover these and any other departures from the disclosed embodiments which fall within the true spirit of this invention.
This application is a non-provisional application of 62/987,142, filed Mar. 9, 2020. The entire disclosures of the above applications are incorporated herein by reference.