This disclosure relates generally to methods and systems for managing industrial processes, and in particular to systems and methods for identifying process-to-product causal networks and generating process insights.
Monitoring and managing complex production processes is an ongoing challenge for engineers and researchers. A fundamental task in performing various production processes is to understand the context of the processes, find underlying causal relations within the context, and use the identification of these causal relationships to derive meaningful insights that can be leveraged for process management and improvement activities.
Systems for autonomous predictive real-time monitoring of faults in process and equipment are known (see, for example, U.S. patent documents US20190384255A1, US20230052691A1, U.S. Pat. No. 10,360,527 B2, and U.S. Pat. No. 10,739,752 B2). In known solutions, a monitoring component can record a dynamic system, and a predictive model can predict a trend of failure. However, without understanding the context of the complex production processes, fault diagnosis or system health monitoring is always a passive response. Isolating factors that are a source of the defects/faults in complex production processes is challenging since the processes require multiple steps, various materials, and distinct physical/chemical/biological conditions, among other things. The process variables relating to defects/faults may not be the underlying causes.
Finding underlying causal relations is known as causal discovery. A traditional way to discover causal relations is to use interventions or randomized experiments. However, experiments during production processes are expensive, time consuming, and constrained by the nature of the production processes. Therefore, in the context of industrial processes, causal discovery is often based on gathering purely observational data, turning those observations into causal knowledge, and applying that causal knowledge in planning and prediction.
Causal knowledge can be modelled as causal networks. One kind of representation of a causal network is the directed graphical causal model (DGCM), which is composed of variables (nodes), directed connections (edges) between pairs of variables, and a joint probability distribution over the possible values of all the variables.
Machine learning based prediction usually relies on two main approaches: either finding a function mapping inputs to class labels or finding the probability distributions over the variables and then using these distributions to answer queries about new data points. When a DGCM uses probability distributions over all the variables, and then marginalizes and reduces over these variables according to new data points to get the probabilities of classes, the DGCM provides inference over the joint distributions. This allows users to explore and exploit causality based on data.
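The marginalize-and-reduce inference described above can be illustrated with a minimal Python sketch. The variable names, values, and probabilities below are purely hypothetical and are not taken from any particular process:

```python
# P(Temp, Defect): a toy joint distribution over one process variable
# (Temp) and one product variable (Defect); values are illustrative.
joint = {
    ("low", "ok"): 0.35, ("low", "defect"): 0.05,
    ("high", "ok"): 0.30, ("high", "defect"): 0.30,
}

def query_defect(joint, temp):
    """P(Defect = 'defect' | Temp = temp): reduce the joint distribution
    to the rows consistent with the evidence, then renormalize."""
    reduced = {d: p for (t, d), p in joint.items() if t == temp}
    return reduced["defect"] / sum(reduced.values())

print(query_defect(joint, "high"))  # 0.5
```

Here the query about a new data point ("Temp is high") is answered by reducing the joint distribution to matching rows and marginalizing out everything else, which is the essence of inference over joint distributions in a DGCM.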
Causal discovery relies on data, and the data are produced by not only the underlying causal process but also the sampling process. In practice, to achieve reliable causal discovery, specific challenges are addressed by estimating the causal generating processes for a time series. According to Granger Causality [Granger, C. (1969). Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica, 37(3), 424-438. https://doi.org/10.2307/1912791], if a time series X Granger-causes another time series Y, then predicting future values of Y using the knowledge of the past values of X is better than using the past values of Y alone. However, Granger causality is very sensitive to temporal aggregation or subsampling. If data are subsampled or temporally aggregated due to the measuring device, sampling procedure, or storage limitations, the true causal relations may not be identifiable. Therefore, combining product data with process data in real-time is necessary for capturing causal relations.
In the context of industrial processes such as manufacturing processes, understanding the context of the process and effectively leveraging this information for real-time monitoring of the process are important for improving product insights through available process data. Known solutions fail to provide robust and consistent causal insights for the prevention of process and product issues, manifesting in a disconnect between unstructured expert knowledge and the generic learning strategies of a learning system. The unstructured and fragmented nature of expert knowledge, coupled with the challenges of expert participation in the knowledge capture process, creates significant barriers to the integration of that knowledge into a real-time system. Moreover, the expense of real-world experimentation and the incongruence of simulated data result in learning systems that are unable to provide robust, interpretable, real-time insights in manufacturing environments.
Accordingly, there is a need for intelligent systems and methods that can be applied to understand the context of industrial processes, to find underlying causal relations between processes and the final products of such processes and make use of this contextual and causal relation data to provide meaningful and actionable insights for the processes.
According to an example aspect, a computer-implemented method and system are described for context prediction via application-specific process-to-product causal networks for generating manufacturing insights.
According to a first example aspect of the disclosure, a method of generating a causal network representation of an industrial process that is configured to perform one or more process operations to generate a product is disclosed. The method includes obtaining a process graph that comprises a set of data nodes for the industrial process. The set of data nodes includes: (i) for at least some of the process operations, a respective set of operation specific process data nodes, wherein the process data nodes for each respective process operation represent variables that are specified, measured or derived for the respective process operation; and (ii) one or more product data nodes that represent variables that are specified, measured, or derived for the product. The method also includes learning a probabilistic graph model (PGM) for the industrial process based on the process graph and historic process and product data collected for the industrial process in respect of the data nodes. Learning the PGM includes: computing a graph structure based on the process graph and the historic process and product data, the graph structure including a subset of the set of data nodes from the process graph and defining a set of edges that connect data nodes within the subset that have been identified as having causal relationships, and computing a set of parameters that includes a respective probability for each of the edges in the set of edges, the respective probability for each edge indicating a causal relationship probability between the data nodes that are connected by the edge. The PGM comprises the computed graph structure and the computed set of parameters. The learned PGM is stored.
In some examples, the method also includes obtaining new values in respect of the industrial process for the variables of the data nodes represented in the process graph; generating predictions or insights in respect of the process based on the new values and the PGM; and performing an action based on the generated predictions or insights.
In some examples, the action comprises causing information about the predictions or insights to be displayed as part of a graphical user interface display.
In some examples, the action comprises causing an operating parameter of the industrial process to be adjusted.
In some examples, generating the predictions or insights comprises performing causality structure predictions or conditional probability inference to generate a causal information prediction that estimates the relevance of relationships between respective pairs of the data nodes included in the graph structure.
In some examples, the causal information prediction is provided for at least one pair of data nodes that are not directly connected to each other by an edge.
In some examples, at least some of the data nodes have associated semantic descriptors, wherein generating the predictions or insights comprises generating insights based on the output of a large language model (LLM) that has received at least some of the associated semantic descriptors as inputs.
In some examples, at least some of the data nodes are associated with a respective semantic descriptor that provides a natural language context for the variable that is represented by the data node.
In some examples, obtaining the process graph comprises prompting an LLM with information about the industrial process and receiving a list of proposed data nodes together with associated semantic descriptors for the process graph in response to the prompting.
In some examples, the respective set of operation specific data nodes for at least one of the process operations includes: specified data (SD) nodes that represent variables that are specified for the respective process operation; measured data (MD) nodes that represent variables that are obtained using respective process operation sensors at the respective process operation; and feature descriptor (FD) nodes that represent variables that are derived from data included in SD nodes or MD nodes.
In some examples, the method includes obtaining a value for an FD node by: (i) computing a representative value for time series of MD node values; or (ii) applying a machine learning based model to map an image captured for an MD node to a node value.
In some examples, computing the graph structure comprises computing an optimized graph structure by applying a structure learning algorithm to identify non-relevant data nodes and non-relevant edges that are represented in the process graph.
In some examples, obtaining the PGM model further comprises obtaining a base PGM model for a different industrial process and applying transfer learning to adapt the base PGM model for the industrial process.
In some examples, the data nodes include a quality related data node representing a variable that indicates a quality of the product, and the PGM model embeds causal relationship information indicative of the relevance of other data nodes within the PGM model to the quality related data node.
In some examples, the set of parameters that includes a respective probability for each of the edges in the set of edges is represented as one or more conditional probability tables.
In some examples, the industrial process is an injection molding process.
According to a further example aspect, a system is disclosed that comprises a processor and a persistent storage storing instructions that, when executed by the processor, configure the system to perform a method of generating a causal network representation of an industrial process that is configured to perform one or more process operations to generate a product. The method can be any of the examples described above.
According to a further example aspect, a system for managing industrial processes is disclosed. The system includes: a data collection module configured to collect specified data (SD) and measured data (MD) from an industrial process; a processing module configured to process the collected data to generate feature descriptors (FD) from the specified data (SD) and measured data (MD); a graphical modeling engine configured to generate a probabilistic graphical model (PGM) from the feature descriptors (FD), specified data (SD), and measured data (MD); a PGM processing module configured to produce context predictions or insights based on the PGM; and a client module configured to present the generated insights to process operators through an interactive graphical user interface (GUI).
In some examples, the data collection module is further configured to collect data from inline process components, including machine-based controllers and sensors, and manual operator inputs through a human-machine interface.
In some examples, the specified data (SD) includes predefined product characteristics and the measured data (MD) includes information collected from quality-based inspection devices such as machine vision sensors.
Reference will now be made, by way of example, to the accompanying drawings which show example embodiments of the present application, and in which:
Similar reference numerals may have been used in different figures to denote similar components.
This disclosure presents systems and methods for identifying process-to-product causal networks and generating process insights for an application-specific process.
As used in this disclosure, “process” can refer to the manufacturing operation or consecutive manufacturing operations (also referred to as “stages”) that convert raw materials or components into finished products. Each stage may be performed by one or more machines; conversely, in some examples, a single machine may perform multiple stages. “Product” can refer to the output of a specific process, which may be a finished product or a component that is then provided to another process.
A high level description of example aspects of the disclosure will be provided with reference to
In an example embodiment,
As illustrated in
Process 104 can take a number of different configurations in different example implementations. For example, process 104 can, in various implementations, include one or more machines that manufacture the same product or similar products (e.g. same base product with different materials, finishes, etc.), or one or more machines that manufacture different products using similar processes.
The methods and systems described herein can be applied to any number of industrial processes; however, for illustrative purposes at least some example implementations will be described in the context of an injection molding process. In the case of an injection molding process, stages 101(1) to 101(N) for a single injection molding machine can, for example, include: heating stage (materials are heated); injection stage (molten material is injected into a mold); holding stage (molten material held at pressure equilibrium until gate freeze); cooling stage (material is cooled in mold); and ejection stage (solidified part is ejected from mold). These stages collectively result in manufacturing of a production part (product 102). In the example where process 104 is an injection molding process, the process data may for example include sensor time series data, setpoint values, room temperature data, shift time data, operator data, etc. Product data can, for example, include: operator labelled defects (e.g., different types of defects such as flash, short shot, splay, warping, etc.), thermal image data (e.g., illustrating cold spots), and color image data (e.g., illustrating surface texture).
Injection molding process scenarios can include, for example: one or more injection molding machines manufacturing the same or similar parts (various shape, dimensions, synthetic polymer or plastic); one or more injection molding machines manufacturing different products that incorporate similar processes (metal injection molding machine, plastic injection molding machine); one or more products from the same injection molding process.
As illustrated in
The specified data SD for each process stage 101(k) can include multiple specified data elements SD(1) to SD(Nsd) (represented as respective data nodes), where Nsd denotes the number of data elements for the process stage (Nsd can have a different value for each process stage) and the index “i” denotes a generic data element SD(i) for the process stage 101(k). Similarly, the measured data MD for each process stage 101(k) can include multiple measured data elements MD(1) to MD(Nmd), where Nmd denotes the number of data elements for the process stage (Nmd can have a different value for each process stage) and the index “j” denotes a generic measured data element MD(j) for the process stage 101(k). Some of the data elements may be tensors that are comprised of further elements. In some examples, data elements may be processed to extract feature descriptors. For example, measured data element MD(j) can be processed to generate a set of one or more associated measured data feature descriptors FD1, . . . , FDn. In some examples, specified data SD elements can also be processed to generate a set of one or more associated specified data feature descriptors. These feature descriptors also are represented as data nodes in the process graph 100.
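The derivation of feature descriptors from a measured data element can be sketched as follows, assuming for illustration that the MD element is a scalar time series; the node names and the choice of summary statistics are hypothetical:

```python
# Hypothetical feature-descriptor (FD) derivation: reduce a measured-data
# (MD) time series to representative scalar values (FD node values).

def time_series_features(series):
    """Map an MD node's time series to a set of FD node values."""
    mean = sum(series) / len(series)
    return {
        "FD_mean": mean,
        "FD_peak": max(series),
        "FD_range": max(series) - min(series),
    }

# e.g. a cavity-pressure trace sampled during an injection stage
pressure = [0.0, 40.0, 85.0, 90.0, 88.0, 60.0, 20.0]
print(time_series_features(pressure)["FD_peak"])  # 90.0
```

An image-valued MD element would instead be mapped to an FD value by a learned model (e.g., a defect classifier), as described above.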
Similarly, the specified data SD for product 102 can include multiple specified data elements SD(1) to SD(Nsd), where Nsd denotes the number of data elements for the product and the index “i” denotes a generic data element SD(i) for the product 102. Similarly, the measured data MD for product 102 can include multiple measured data elements MD(1) to MD(Nmd), where Nmd denotes the number of data elements for the product 102.
In process graph 100, each stage's respective specified data elements SD(1) to SD(Nsd), and each stage's respective measured data elements MD(1) to MD(Nmd), correspond to a respective process node of the process graph 100. Product 102's specified data elements SD(1) to SD(Nsd), and product 102's respective measured data elements MD(1) to MD(Nmd), each correspond to a respective product node of the process graph 100. The feature descriptors FD can correspond to sub-nodes of nodes of process graph 100.
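One possible in-memory layout for such a process graph, with SD and MD nodes per stage and FD sub-nodes attached to their parent MD nodes, is sketched below; the stage and variable names are invented for illustration:

```python
# Hypothetical process-graph layout: stages and product each carry SD and
# MD nodes; FD nodes hang off their parent MD nodes as sub-nodes.
process_graph = {
    "stages": {
        "injection": {
            "SD": ["injection_speed_setpoint", "shot_size"],
            "MD": {"cavity_pressure": ["FD_peak", "FD_mean"]},  # MD -> FDs
        },
        "cooling": {
            "SD": ["cooling_time_setpoint"],
            "MD": {"mold_temperature": ["FD_mean"]},
        },
    },
    "product": {
        "SD": ["target_weight"],
        "MD": {"part_image": ["FD_defect_label"]},
    },
}

def all_data_nodes(graph):
    """Flatten the graph into the full list of SD, MD, and FD node names."""
    nodes = []
    units = list(graph["stages"].values()) + [graph["product"]]
    for unit in units:
        nodes += unit["SD"]
        for md, fds in unit["MD"].items():
            nodes.append(md)
            nodes += fds
    return nodes

print(len(all_data_nodes(process_graph)))  # 11
```

The flattened node list is the variable set over which a PGM can later be learned.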
In the example of
In example embodiments, the components of system 300 include one or more sensors 304 and controllers 302 associated with each of the industrial process stages 101(1) to 101(N), at least one data collection module 306, at least one control module 308, at least one client module 310, at least one processing module 312, and one or more other modules 318. As used here, a "module" can refer to a combination of a hardware processing circuit and machine-readable instructions and data (software and/or firmware) executable on the hardware processing circuit. A hardware processing circuit can include any or some combination of a microprocessor, a core of a multi-core microprocessor, a microcontroller, a programmable integrated circuit, a programmable gate array, a digital signal processor, or another hardware processing circuit.
In example embodiments, controllers 302, sensors 304, data collection module 306, control module 308 and client module 310 may be located at an industrial process location or site and enabled to communicate with an enterprise or local communications network 316 that includes wireless links (e.g. a wireless local area network such as WI-FI™ or personal area network such as Bluetooth™), wired links (e.g. Ethernet, universal serial bus, network switching components, and/or routers), or a combination of wired and wireless communication links. In example embodiments, processing module 312 and other modules 318 may be located at one or more geographic locations remote from the industrial process location and connected to local communications network 316 through a further external network 320 that may include wireless links, wired links, or a combination of wired and wireless communication links. External network 320 may be a cloud network and may include the Internet. In some examples, one or more of data collection module 306, control module 308, and client module 310 may alternatively be distributed among one or more geographic locations remote from the industrial process location and connected to the remaining modules through external network 320. In some examples, processing module 312 and one or more other modules 318 may be located at the industrial process location and directly connected to local communications network 316. In some examples, data collection module 306, control module 308, client module 310, processing module 312 and one or more other modules 318 may be implemented using suitably configured processor enabled computer devices or systems such as personal computers, industrial computers, laptop computers, computer servers, edge devices, smartphones, and programmable logic controllers.
In some examples, individual modules may be implemented using a dedicated processor enabled computer device, in some examples multiple modules may be implemented using a common processor enabled computer device, and in some examples the functions of individual modules may be distributed among multiple processor enabled computer devices. Further information regarding example processor enabled computer device configurations will be described below.
In example embodiments, sensors 304 that are associated with a respective stage 101(k) can for example include standard industrial process sensors such as image sensors (cameras), temperature sensors, pressure sensors, current sensors, vibration sensors, inertial measurement unit sensors, and position sensors, among other things. The controllers 302 that are associated with a respective stage 101(k) can include electronic controllers that cause an automated process action to occur based on one or more input commands and/or setpoints that can be included, for example, in specified data elements SD(1) to SD(Nsd).
In example embodiments, data collection module 306 is configured to receive and pre-process sensor data from sensors 304 to provide measured data elements MD(1) to MD(Nmd) for each process stage 101(1) to 101(N) and for product 102. Data collection module 306 can also receive and pre-process specified data elements SD(1) to SD(Nsd) for each process stage 101(1) to 101(N) and for product 102. The pre-processing performed by data collection module 306 can, for example, put process and product data into a format suitable for downstream processing that can include generating a PGM representation of the process 104 and product 102 as explained in greater detail below. Examples of possible data sources for data collection module 306 can include: data from inline process components (e.g., machine based controllers and sensors) through a PLC or other communication interface; data from inline external sensors (e.g., cameras); data from manual operator inputs through a human-machine-interface (e.g., observational data such as "good part", "defect part" input by an operator provided through client module 310); and other industrial data sources such as upstream machines and other process and user management systems.
Control module 308 is configured to provide control instructions, including for example, specified data SD(1) to SD(Nsd), to the controllers 302 for each process stage 101(1) to 101(N).
In some examples, processing module 312 is configured to receive and further process measured data MD (including measured data elements MD(1) to MD(Nmd), generated in respect of each process stage 101(1) to 101(N)) and the specified data SD (including specified data elements SD(1) to SD(Nsd), for each process stage 101(1)) to 101(N)) of the process. Processing module 312 processes the specified data SD and measured data MD in the context of process graph 100 to identify causal relationships and generate insights in respect of the process that is represented by the process graph 100. The causal relationships can, for example, be represented in a trained PGM.
Client module 310 may be configured to allow users at the industrial process location to interact with the other modules and components of system 300. In one example, client module 310 is configured to generate interactive GUI 110 that enables a user to view causal relationships and generated insights in the context of process graph 100. Client module 310 may for example include a viewer application that allows the user to view and interact with data received from the processing module 312.
As will now be described in greater detail, according to example embodiments, system 300 collectively functions to enable manufacturing process data to be associated with product data using measured data (MD), specified data (SD), and associated feature descriptors (FD) to construct application-specific causal networks from pre-configured process graphs for the purposes of automatically generating descriptive, predictive, and prescriptive insights. In at least some examples, this process can be automated and performed without requiring substantive user input. The descriptive, predictive, and prescriptive insights can be communicated to process operators and process control systems to enable improved process decision making and performance improvements.
An example of a method 350 that can be applied using computer implemented system 300 to enable Probabilistic Graphical Model based learning and context prediction is shown in
As indicated by block 354 in
In this regard,
By way of example, in the case of an injection molding machine, literature 402, expert knowledge 404 and user input 406 can provide the list of stages with a semantic description (e.g., heating, injection, hold, cooling, ejection stages), and the components and sensors for each stage. One or more LLMs 408 can use this data to generate a list of nodes and basic topology for process graph 100, including nodes representing each of the process stages in sequential order and their respective specified data SD and measured data MD elements, as well as nodes representing the product and its respective specified data SD and measured data MD elements. By way of example,
It will be noted that the process graph of
In some examples, operator inputs can be collected through a GUI that can enable a user to provide additional knowledge about the relationships that are omitted from a preliminary process graph (User inputs represented by line 410 in
Accordingly, semantic information can be collected from users, experts, or literary sources for each node or a subset of the nodes and the relationships between nodes. This semantic information can be vectorized and added as ancillary semantic data to process graph 100 to provide a basis for generating causal network data.
In summary, in some example implementations, obtaining the process graph 100 involves building an initial graph structure that represents a complete set of nodes that represent as respective nodes all stages, measured data elements, specified feature data elements and feature descriptors (also referred to as derived nodes), and a set of edges that represent all possible edges between the nodes within the set of nodes. This structure may be manually configured by an expert or through automated configuration tools, or through a combination of both. In some examples, constraints can be imposed when obtaining the process graph 100 such as one or more of: feature descriptors (derived nodes) are only connected to a parent node; node locations configured according to physical locations (e.g., physical process stage nodes); and directed connections are enforced as appropriate considering the process.
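The construction of the initial edge set under constraints of this kind can be sketched as follows; the stages, node names, and the specific constraints enforced are illustrative assumptions, not the only possible configuration:

```python
# Hypothetical initial graph construction: directed edges flow only forward
# through the physical process order, and derived (FD) nodes attach only to
# their parent MD node.
stage_nodes = [                       # stages in physical order
    ("injection", ["SD_speed", "MD_pressure"]),
    ("cooling",   ["SD_cool_time", "MD_mold_temp"]),
    ("product",   ["MD_part_weight"]),
]
fd_parents = {"FD_peak_pressure": "MD_pressure"}  # derived node -> parent

edges = []
for i, (_, srcs) in enumerate(stage_nodes):
    for _, dsts in stage_nodes[i + 1:]:          # forward direction only
        edges += [(s, d) for s in srcs for d in dsts]
for fd, parent in fd_parents.items():            # FDs connect only to parent
    edges.append((parent, fd))

print(len(edges))  # 9
```

Structure learning (described below) then prunes this complete candidate edge set down to the edges supported by the observed data.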
As the process understanding evolves, the initial graph structure may become increasingly complex.
As indicated by block 355 in
With reference to
Graphic modelling engine 606 applies statistical algorithms and machine learning by exploring the interdependencies among the variables (i.e., the specified data SD elements, measured data MD elements, and feature descriptors FD included in process data 602 and product data 604) represented in process graph 100 to learn a PGM 608 that represents joint probability distributions over these variables. In structure learning, the goal is to infer this graph structure from observed data, with incomplete knowledge about the relationships among variables. The trained PGM 608 can then be used to perform inferences over the joint distributions, enabling users to test causality hypotheses based on novel data.
PGM 608 is characterized by a graph structure (nodes and edges) and a set of parameters associated with the graph structure. In this regard, PGM 608 has nodes representing variables (e.g., nodes representing each of the data nodes in the process graph 100, including specified data SD elements, measured data MD elements, and feature descriptors that are present in process data 602 and product data 604) and edges between the nodes that represent the dependency or correlation between the variables represented by the nodes. The parameters associated with each independent path through the various levels of each node are referred to as the Conditional Probability Distributions (CPD). Each CPD is of the form P(node|parents(node)), where parents(node) are the parents of the node in the graph structure. In the case of a structure A→C←B, the parameters of the network would be P(A), P(B) and P(C|A, B). Graphic modelling engine 606 applies Parameter Learning techniques (e.g. Maximum Likelihood Estimator, Bayesian Estimator, and Expectation Maximization Estimator) to a training dataset (i.e., historically acquired process data 602 and product data 604). This learning operation computes parameter values that fit the training data. A graph structure combined with a set of the learned parameters form a trained PGM 608. In one example, PGM 608 is a directed graphical causal model (DGCM), which is composed of variables (e.g., data nodes including measured MD nodes, specified data SD nodes, and feature descriptor FD nodes), directed connections (edges) between pairs of variables, and a joint probability distribution over the possible values of all of the variables. In some examples, PGM 608 can be represented as a data structure that captures both the graph structure (nodes and edges) and the associated probabilistic information (parameters, including distributions). Nodes can be represented as a list or set of variables (e.g., nodes={A, B, C, D}). 
Edges can be represented as pairs of nodes indicating dependencies. For example, in a Bayesian Network (Directed Acyclic Graph, DAG), edges are directed (e.g., edges={(A, B), (B, C), (C, D)}). Parameters (including Distributions) can be represented using Conditional Probability Tables (CPTs) for Bayesian Networks, including for example, dictionaries mapping nodes to their conditional probability distributions (e.g., CPTs={A: P(A), B: P(B|A), C: P(C|B), D: P(D|C)}).
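Under this representation, the joint probability of a full assignment factors as the product of each node's CPT entry given its parents. A minimal sketch for the chain A→B→C→D described above follows; the CPT values are arbitrary illustrative numbers:

```python
# Dictionary CPTs for the chain A -> B -> C -> D, keyed by parent values.
cpts = {
    "A": {(): {0: 0.6, 1: 0.4}},                            # P(A)
    "B": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}},  # P(B|A)
    "C": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.4, 1: 0.6}},  # P(C|B)
    "D": {(0,): {0: 0.5, 1: 0.5}, (1,): {0: 0.1, 1: 0.9}},  # P(D|C)
}
parents = {"A": (), "B": ("A",), "C": ("B",), "D": ("C",)}

def joint(assignment):
    """P(A, B, C, D) = product of P(node | parents(node))."""
    p = 1.0
    for node in "ABCD":
        key = tuple(assignment[q] for q in parents[node])
        p *= cpts[node][key][assignment[node]]
    return p

print(joint({"A": 1, "B": 1, "C": 1, "D": 1}))  # 0.4 * 0.8 * 0.6 * 0.9
```

This factored form is what allows the joint distribution to be stored compactly relative to an explicit table over all variable combinations.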
In some examples, Graphic Modelling Engine 606 applies a two-step process to obtain PGM 608. Referring again to
Score-based structure learning can be interpreted as an optimization task, which requires a scoring function (which maps structures to a numerical score, based on how well the structures fit to a given data set) and a search strategy (which traverses the search space of possible structures and selects a structure with optimal score). Commonly used scoring functions are Bayesian Dirichlet scores (BDeu or K2), and Bayesian Information Criterion (BIC).
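A BIC-style score of the kind referred to above can be sketched for discrete data as the maximum-likelihood log-likelihood penalized by model size. This toy implementation makes simplifying assumptions (binary variables, one free parameter per observed parent configuration) and is not a production scoring function:

```python
import math
from collections import Counter

# Toy observed samples over binary variables A and B (illustrative only).
data = [
    {"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1},
    {"A": 1, "B": 1}, {"A": 0, "B": 0}, {"A": 1, "B": 0},
]

def bic(structure):
    """structure maps each node to its parent tuple, e.g. {'A': (), 'B': ('A',)}."""
    n, ll, n_params = len(data), 0.0, 0
    for node, node_parents in structure.items():
        counts = Counter((tuple(r[p] for p in node_parents), r[node]) for r in data)
        totals = Counter(tuple(r[p] for p in node_parents) for r in data)
        for (cfg, _val), c in counts.items():
            ll += c * math.log(c / totals[cfg])   # MLE log-likelihood
        n_params += len(totals)                   # 1 free param per config (binary)
    return ll - (math.log(n) / 2) * n_params

# A search strategy would compare candidate structures by this score;
# here A -> B outscores the independence model on this data.
print(bic({"A": (), "B": ("A",)}) > bic({"A": (), "B": ()}))  # True
```

A search strategy such as hill climbing would repeatedly apply edge additions, deletions, and reversals, keeping the change that most improves this score.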
The second step applied by Graphic Modelling Engine 606 to obtain PGM 608 is to compute the conditional probability distributions (CPDs) for the optimal graph structure. These CPDs form a set of learned parameters for the PGM 608, and may for example be stored in look-up tables that are indexed by the nodes included in the optimal graph structure. In example implementations, the CPD parameters are learned using parameter learning techniques (e.g., Maximum Likelihood Estimator, Bayesian Estimator, and Expectation Maximization Estimator) that compute parameter values based on training data.
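Maximum likelihood parameter learning for a single node of a fixed structure reduces to normalized co-occurrence counts, as sketched below; the variable names and records are hypothetical:

```python
from collections import Counter

# Toy training records relating a process variable to a product outcome.
records = [
    {"hold_pressure": "low",  "defect": 1},
    {"hold_pressure": "low",  "defect": 1},
    {"hold_pressure": "low",  "defect": 0},
    {"hold_pressure": "high", "defect": 0},
    {"hold_pressure": "high", "defect": 0},
    {"hold_pressure": "high", "defect": 0},
]

def learn_cpt(records, node, parent):
    """MLE estimate of P(node | parent) as {parent_value: {node_value: prob}}."""
    pair_counts = Counter((r[parent], r[node]) for r in records)
    parent_counts = Counter(r[parent] for r in records)
    cpt = {}
    for (pv, nv), c in pair_counts.items():
        cpt.setdefault(pv, {})[nv] = c / parent_counts[pv]
    return cpt

cpt = learn_cpt(records, "defect", "hold_pressure")
print(cpt["low"][1])  # 2/3
```

Bayesian or Expectation Maximization estimators would replace these raw relative frequencies with smoothed or latent-variable-aware estimates, but the lookup-table form of the result is the same.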
In at least some examples, PGM 608 can be updated over time so as to capture the evolution of the complex causal networks that are represented by the PGM 608. The path strength can be rated (scored and ranked) based on the inference process with updated data. The top causality paths from specified data to feature descriptors to defects can be presented to human users via a GUI for action validation (As shown, for example, in
PGM 608 enables a joint probability distribution to be compactly represented, based on a relatively small initial training dataset, and the PGM 608 can be regularly updated. This can be contrasted with conventional machine learning (ML) methods that rely heavily on black box network structures that are trained on large volumes of input-output data. These conventional solutions require large datasets, with simple labeling schemes, for robust training. These large datasets are not easily adapted and updated with changing conditions in the process and production environment.
In contrast to conventional ML based solutions, the contextual causal graph data embodied in PGM 608 can be managed dynamically to adapt to changing contexts, through structured knowledge represented in the nodes and edges of the graph structure. Since each node is directly connected to the elements of the physical process and is not a black box of abstracted connections, semantic information can be collected from users or literary sources for each node or a subset of the nodes. This is analogous to the traditional labeling process in conventional deep learning frameworks.
One objective of developing a causal graph (e.g., PGM 608) is to determine a structure of the specified data SD elements and measured data MD elements that best represents the dynamics of the physical system. This differs from purely black box approaches that fit models to a specific dataset for the purpose of generating a predicted output. Since each data node is mapped to a physical element of the process via process graph 100, a broad set of background contextual data known by human experts can exist to define the relationship between nodes. The semantic knowledge (e.g., node descriptor) collected for each node can capture a history and/or background context relevant to the function of each node. This information creates a dynamic dialogue between the system and the user related to one or multiple nodes, enabling a unique labeling experience which creates high resolution feedback for a lower volume of training examples. This differs from many machine learning approaches that apply single concept labels to a high number of training examples. This type of data enables in-context learning for LLMs to create new causal models by transferring learning from similar processes.
By embedding the semantic knowledge in a vectorized format, the system can leverage this data through vector similarity metrics (e.g., Euclidean distance, dot product similarity) to generate a score for individual nodes and edges of the network. This score is then used to help guide the PGM process (e.g., graphic modelling engine 606) in finding causal structures in the process and product data. (See, for example, "GODEL: Large-Scale Pre-Training for Goal-Directed Dialog", arXiv:2206.11309v1 [cs.CL], 22 Jun. 2022.)
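A minimal sketch of the similarity scoring, assuming small hand-made embedding vectors in place of real LLM-generated ones (the node names and vector values are invented):

```python
import math

def dot_similarity(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))

def euclidean_distance(u, v):
    """Euclidean distance between two embedding vectors (lower = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical semantic embeddings for three process nodes; a real system
# would obtain these from an LLM encoder over the node descriptors.
embeddings = {
    "barrel_pressure": [0.9, 0.1, 0.2],
    "mold_pressure":   [0.8, 0.2, 0.1],
    "ambient_noise":   [0.1, 0.9, 0.3],
}

query = embeddings["barrel_pressure"]
scores = {name: dot_similarity(query, vec) for name, vec in embeddings.items()
          if name != "barrel_pressure"}
best = max(scores, key=scores.get)
print(best)  # the pressure nodes are closest in this toy embedding space
```

Such per-node and per-edge similarity scores can act as soft priors that bias the structure search toward edges connecting semantically related process elements.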
Specific examples of process data 602 and measured data 604 will now be discussed in the context of an injection molding process to illustrate aspects of obtaining the process graph 100 and the PGM 608.
As noted above, measured data MD includes information collected from measurement devices, including for example embedded or external sensors 304 related to the process 104 or the product 102. Time series examples of measured data MD elements corresponding to process data are respectively illustrated in the following set of Figures:
As noted above, measured data MD can also include machine vision data that is based on images/videos captured by cameras of the process or the product 102.
As noted above, specified data SD includes information collected from human inputs or predefined machine or process inputs related to the process 104 or product 102. Predefined time series examples of specified data SD elements corresponding to process data are respectively illustrated in the following set of Figures:
Specified data in respect of product 102 can, for example, include a given CAD model of the part, and geometry and other physical data acquired from the CAD model. The geometric information, combined with image data of the product 102, for example through performing object pose estimation to identify the position and orientation of the product 102 in the image, can be used to improve the understanding of quality issues through inherent defect and geometry relationships. Relative geometric differences between the product 102 geometry and part CAD can also be leveraged to identify further quality issues without explicit need for 3D sensors.
As noted above, in at least some examples, data that is collected by data collection module 306 can be processed to facilitate downstream processing. For example, collected data can be preprocessed using appropriate algorithms to standardize and/or normalize the data to improve compatibility with the processing applied by graphic modelling engine 606. For example, thermal and color images collected in respect of product 102 can be processed to standardize the images and remove undesired variations in pose and background.
As noted above, measured data MD and specified data SD can be processed to extract feature descriptors FD that can be included in the data provided to graphic modelling engine 606. In some examples, one or both of data collection module 306 or processing module 312 can be configured with trained machine learning (ML) models 330 that are applied to collected process and product data to generate associated lower dimensional feature vectors or logit scores that can be used by graphic modelling engine 606 as feature descriptors FD when training the PGM 608.
ML models 330 can include trained deep learning based anomaly detection models that are trained to generate score-based quality feature descriptors FD in respect of measured data MD elements. For example, the thermal temperature or image data of
Further, feature descriptors FD may be generated based on preconfigured product characteristics described in specified data (SD), such as 3D features associated with product geometry (e.g. cold spot at location X associated with product aspect Y in thermal image Z). In this regard,
The conversion of high resolution specified data SD and measured data MD inputs into low dimensional feature descriptor FD representations can, in some examples, be better suited for analysis and decision making by human operators and real-time control algorithms.
By way of further example, in the context of an injection molding process, feature descriptors FD that can be extracted from a measured data MD time series for “barrel pressure” can include: highest absolute value, the absolute energy (the sum over the squared values), the first location of the maximum value, the binned entropy of the power spectral density, etc. These and other feature descriptors can, for example, be generated from measured data and feature descriptors using known solutions such as the Python package tsfresh.
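Several of the named descriptors can be computed directly; the following sketch uses an invented pressure trace rather than real IMM data (the binned entropy of the power spectral density is omitted as it requires a spectral transform):

```python
def feature_descriptors(series):
    """Compute a few of the descriptors named above for a time series."""
    return {
        "highest_absolute_value": max(abs(x) for x in series),
        "absolute_energy": sum(x * x for x in series),   # sum over squared values
        "first_location_of_maximum": series.index(max(series)),
    }

# Invented barrel-pressure trace for one molding cycle.
barrel_pressure = [0.0, 2.0, 5.0, 5.0, 3.0, 1.0]
fd = feature_descriptors(barrel_pressure)
print(fd["absolute_energy"])  # 64.0
```

In practice a package such as tsfresh computes hundreds of such descriptors per time series automatically; this sketch only shows what a few of them represent.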
Extracted feature descriptors can be further filtered and selected according to the importance of the features (e.g., the permutation feature importance) and relevance analysis (e.g., univariate feature significance test). During filtering and selection, the paths from measured data MD and specified data SD to various outcomes can be rated (scored and ranked). For example, in an injection molding process, 50 sensors may result in over 700 features for each sensor, with these 700 features then being filtered to about 20 relevant features in total. The relevant feature descriptors FDs can then be used for defect prediction by ML models 330 and causality discovery by graphic modelling engine 606. Examples of quality-based metrics derived from thermal images can include identification of surface defects (flash, spray, short shot, etc.) and location highlighting of cold spots in thermal images. Extracted feature descriptors can enable the dimensions of the feature space to be reduced to 2D or 3D for easy visualization to facilitate human decision-making (e.g. Principal Component Analysis, t-Distributed Stochastic Neighbor Embedding).
In the illustrated example, the "Feature" node is derived from the parent "Sensor" node. For example, a measured data MD node can be subdivided into one or more abstract components (e.g. feature descriptors FDs, also referred to as features). In some examples, each node in the PGM, whether a discrete or continuous variable, can be discretized into various node levels through binning (via distribution assumptions) to represent a finite number of variable values.
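The binning step can be sketched with fixed bin edges; the temperature thresholds below are invented for illustration, and distribution-based edges (e.g., quantiles) could be substituted:

```python
def discretize(value, bin_edges):
    """Map a continuous reading to a discrete node level using fixed,
    ascending bin edges (a simple stand-in for distribution-based binning)."""
    for level, edge in enumerate(bin_edges):
        if value < edge:
            return level
    return len(bin_edges)

# Hypothetical mold-temperature levels: <180 -> 0 (low),
# 180 to <220 -> 1 (mid), >=220 -> 2 (high).
temperature_edges = [180.0, 220.0]
readings = [150.0, 195.0, 240.0]
print([discretize(r, temperature_edges) for r in readings])  # [0, 1, 2]
```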
Context/Causal information Prediction (Block 360)
Referring again to
By way of example,
In a further example, the trained PGM 608 can be used for context prediction (also referred to herein as causal information) based on Conditional Probability Inference (CPI) (
Given the variable values of enough nodes, without necessarily having all of the data or knowing the outcome, the trained PGM 608 can be processed to predict the possible paths and outcomes. The Conditional Probability Distribution parameters of the PGM 608, organized as look-up tables, can be assessed and the rows that remain valid describe the predicted outcome of the system. That is, the predicted outcome of the system can be estimated by repeatedly querying the PGM 608 with the current state (observed node variable values) of the process.
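The row-filtering query described above can be sketched as follows, using an invented three-variable look-up table in place of the learned PGM 608 parameters (the variable names and probabilities are illustrative):

```python
# Joint look-up table rows: (injection_speed, mold_temp, flash) -> probability.
rows = [
    (("fast", "low",  "yes"), 0.20),
    (("fast", "low",  "no"),  0.10),
    (("fast", "high", "yes"), 0.05),
    (("fast", "high", "no"),  0.15),
    (("slow", "low",  "yes"), 0.05),
    (("slow", "low",  "no"),  0.45),
]
variables = ("injection_speed", "mold_temp", "flash")

def predict(observed):
    """P(flash | observed): keep only the rows that remain valid given the
    observed node values, then renormalize over the outcome variable."""
    valid = [(vals, p) for vals, p in rows
             if all(vals[variables.index(k)] == v for k, v in observed.items())]
    total = sum(p for _, p in valid)
    dist = {}
    for vals, p in valid:
        outcome = vals[variables.index("flash")]
        dist[outcome] = dist.get(outcome, 0.0) + p / total
    return dist

print(predict({"injection_speed": "fast", "mold_temp": "low"}))
# approximately {'yes': 0.667, 'no': 0.333}
```

Re-running the query as new observations arrive is the repeated-querying behavior described above: each additional observed node value invalidates more rows and sharpens the predicted outcome distribution.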
For process applications where visual confirmation of an outcome is not possible until a later stage in the process, this inference query of
By way of further illustrative example of context prediction, information from a derived context graph can be presented to a user as described above in respect of
In one example, PGM processing module 1201 includes a context prediction generator 1206 that is configured to perform Context/Causal Information Predictions (Block 360) to generate causal information 1208 (including for example context predictions in respect of the SD and MD data elements of industrial process 104 and product 102). Context prediction generator 1206 can apply one or more query algorithms to PGM 608, together with newly observed process data 602 and product data 604, to infer or predict causal information 1208.
With reference to
Insight generator 1202 can include one or more ML models 1204 for interpreting process data 602, product data 604, inputs from LLM 408 and PGM 608 to generate insights 201.
In the illustrated example, three types of insights 201 are provided by insight generator 1202: descriptive insights 202 for communicating process/product relationships/knowledge to a user; predictive insights 204 for modeling process/product relationships and outcome prediction and communicating these aspects to a user; and prescriptive insights 206 for recommending process improvements and risk mitigation to a user.
Insights 201 are generated with the objective of: (1) leveraging an application-specific causal network (i.e., PGM 608) to perform product focused analysis, such as product quality root cause investigations, and generate associated descriptive, predictive, and prescriptive manufacturing insights that can be communicated to process operators and process control systems for improved process decision making and performance improvements; and (2) leveraging an application-specific causal network (i.e., PGM 608) to perform process focused analysis, such as process health investigations, and generate associated descriptive, predictive, and prescriptive manufacturing insights that can be communicated to process operators and process control systems for improved process decision making and performance improvements.
In example applications, insight generator 1202 is preconfigured based on application-specific threshold settings and context settings and executes insight generation software continually to process incoming data, compare it against the preconfigured analysis and generation settings, and then curate and deliver insights 201 back to the machine and/or operator. This workflow may be completely automated or manually configured and curated depending on the application. Insights may also be curated based on a user hypothesis, for example "does the injection setpoint in an injection molding process have a direct impact on the production of products with flash defects", and can return a summarized data analysis that helps answer the hypothesis.
In example implementations, insight data that includes insights 201 can be communicated to operators and/or machines associated with process 104. For example, insights 201 can be presented to the user through a client module 310 or other local human machine interface (HMI) device for in-factory actions, such as recommended operator interactions or recommended control action changes. In some cases, insights can be presented to the user through a cloud-connected computing device (e.g., other module 318) accessing a cloud platform for the purposes of offline data exploration. In some examples, insights can be presented to the machine control network (e.g., via control module 308) for automatic adjustment of controller settings. User interactions through the cloud platform can be communicated to a local edge device and/or HMI for in-factory actions, such as recommended operator actions or automated control actions.
In the context of an injection molding process, examples of descriptive insights 202 may, for example, include: (1) Statistics on product data (e.g. defect categories, defect/fault scores, defect clusters) and process data (e.g. critical features); (2) Data on relationships between process data and product data (e.g. showing “good” parts and “defect” parts in a dimension-reduced feature space); and (3) summary of daily/weekly/monthly production statistics. By way of example,
Example of predictive insights 204 for an injection molding process can, for example, include: Outcome prediction (e.g. probability of defects); Sensor failure prediction; and Quality measurement prediction.
Examples of prescriptive insights 206 for an injection molding process can, for example, include: (1) Action recommendation. For example, the plot of
With reference to
Referring again to
With reference to
Conventional black box modeling techniques do not provide a clear connection between the internal states of the model nodes in relation to the states of the real-world problem. This creates a scenario where transferring the learning about specific machines, products, and production environments is challenging, and can limit the amount of a priori knowledge that can be leveraged when creating new models for new applications and processes.
However, the semantic layer that is included in the nodes of the causal networks represented in PGM 608 enables context similarities and differences to be identified when evaluating a new scenario. This is accomplished by using the semantic embeddings generated by LLMs 408 as indexes for previously learned causal structures (e.g., existing PGM 608). These causal structures can represent the entire structure of a specific application and/or subsets of a causal structure where there is high relevance with respect to the physical process.
An example would be to search previous causal structures from an injection molding process to find related physical sub-processes that are relevant to another process (e.g., new process 1404) such as metal die-casting. The overall contexts of the two processes are very different, but certain sub-processes such as melting, injection, and cooling share similar physical characteristics that are measured with similar sensors. The process of transferring knowledge through PGMs is to share the statistical strengths between models for two knowledge domains or two production processes. For example, an additional random variable can be introduced as a connector to link two processes (Xuan, Lu, & Zhang, 2021), and the knowledge from one process can be transferred to the other process through this variable. The design of the variable for transfer learning depends on the problem and users' understanding of this problem. Specifically, methods for transfer learning with PGMs include, but are not limited to: Gaussian distribution priors (mean and/or standard deviation), probabilistic latent semantic analysis or latent Dirichlet allocation for document modeling, Bayesian nonparametric models, tree structures, attributes (e.g., color, texture, shape attributes of sensor images), and factor analysis (Xuan, J., Lu, J., & Zhang, G. (2021). Bayesian Transfer Learning: An Overview of Probabilistic Graphical Models for Transfer Learning. arXiv:2109.13233. https://doi.org/10.48550/arXiv.2109.13233).
By leveraging learned causal structures from other processes, the transfer module 1502 does not need to calculate all the joint probabilities represented in the feature descriptor space, thus allowing it to learn much faster and with fewer training examples. Either expert judgements or knowledge transferred from other processes can be incorporated as node constraints of the PGMs. Parameter learning accuracy can be improved with transferred priors and constraints even when training data are limited or not relevant (Zhou, Y., Fenton, N., Hospedales, T. M., & Neil, M. (2015). Probabilistic Graphical Models Parameter Learning with Transferred Prior and Constraints. Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 972-981).
In terms of process data 602, there is often a large degree of variation in the methods of data capture, even for very similar machines and processes. This creates a unique set of data for each machine and process, and creates a significant challenge in developing an optimal set of feature descriptors for that data that are most relevant to the process context. By leveraging semantic comparison of historical causal structures, the transfer module 1502 can apply the best techniques for identifying critical feature computations and the structured relationships between those features.
Regarding product data 604, a major contribution of a causal graph structure is its ability to unify data across multiple domains in industrial processes. The data captured by quality control systems often exist in data silos and are not easily comparable for the purpose of determining a root cause of a quality issue. Using different feature extraction techniques for different data sources such as images, video, audio, for example, allows the causal graph to connect a large and diverse feature space between measured data and specified data. The features extracted from product quality data can be generated by any number of different modeling or data analysis strategies such as clustering/classification/regression models. It is the relationship between these features under the current context that is critical for the causal structure to determine. Quality metrics can be subjective to user preference or production requirements, so the semantic information related to connections in the graph are used dynamically to adjust the significance of features.
The processing unit 170 may include one or more processing devices 172, such as a processor, a microprocessor, a graphics processing unit (GPU), a hardware accelerator, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a dedicated logic circuitry, or combinations thereof. The processing unit 170 may also include one or more input/output (I/O) interfaces 174, which may enable interfacing with one or more appropriate input devices 184 and/or output devices 186. The processing unit 170 may include one or more network interfaces 176 for wired or wireless communication with a network (e.g., with networks 316 or 320).
The processing unit 170 may also include one or more storage units 178, which may include a mass storage unit such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive. The processing unit 170 may include one or more memories 180, which may include a volatile or non-volatile memory (e.g., a flash memory, a random access memory (RAM), and/or a read-only memory (ROM)). The memory(ies) 180 may store instructions for execution by the processing device(s) 172, such as to carry out examples described in the present disclosure. The memory(ies) 180 may include other software instructions, such as for implementing an operating system and other applications/functions. There may be a bus 182 providing communication among components of the processing unit 170, including the processing device(s) 172, I/O interface(s) 174, network interface(s) 176, storage unit(s) 178 and/or memory(ies) 180. The bus 182 may be any suitable bus architecture including, for example, a memory bus, a peripheral bus or a video bus.
From the above description, it will be appreciated that one or more aspects of the present disclosure can enable data based context to be acquired in respect of a production process and effectively leveraged to enable online visibility and real-time monitoring of the production process. This can support improved quality inspection insights through available process data. This disclosure describes methods of context prediction from inspection data to support process operators with meaningful, interpretable, and actionable quality insights. Among other things, context prediction can, in various applications, enable one or more of: Causality Structure Prediction—identify the causal factors associated with a given outcome (e.g., defect classification) to find answers about why an outcome occurred; and Conditional Probability Inference—predict the likelihood of an outcome (e.g., defects or other risks) occurring based on the inspection data during the production process.
In examples, context prediction derives high-level causal variables (or features) from low-level observations (or settings/sensors) and evaluates their relevance on the outcome. Highly relevant relationships that are computed between variables are then able to be organized into concise user insights and/or machine actions.
Data based context (also referred to as data context) for a manufacturing process can refer to the history of a manufacturing process of a specific type of part in the inspection dataset, consisting of all the observations/data before, during, and after the manufacturing process, including production settings, sensor recordings (features), images, and quality measurements (defects).
Data context also consists of the relevance and variance of the collected data. If the relevance and variance between settings, features, and defects can be identified and quantified, the quality of outcomes can be predicted. The relevance indicates the causal factors associated with the defects, and the variance indicates the likelihood of defects happening.
Context prediction aims to understand the data context through learning-based approaches to effectively identify the causal factors of a process that are statistically relevant to the outcome. This includes evaluating the relevance of different causal factors as well as the likelihood of possible outcomes.
In some example implementations, data context can be considered as falling within three groups: (1) context of defects (e.g., visible or measurable defects between "Good" and "Bad"); (2) context of features (e.g., measurement data that is acquired by sensors, or features extracted from such measurements); and (3) context of settings (e.g., environment defined context, and specified data such as material, machine or production settings).
A context prediction system such as described above can learn from an inspection dataset through the three context groups and their relations in order to perform causal structure predictions and make conditional probability inferences. Constructing the relations between defects and features plays an important role in accurate and efficient prediction. Feature extraction (e.g., from time series or an image) and feature selection (e.g., by factor analysis and graphical model) determine the context of features.
From a process operator's perspective, the context groups of defects and settings are relatively clear and understandable, whereas the context of features, especially unobservable latent features extracted from time series or using methods of dimension reduction (for example, Principal Component Analysis) may lose their original physical meanings.
Generally, the observations/data contained within the data context are of the following types: (1) setpoint data; (2) time-series data; (3) image data; and (4) outcome labels (i.e., classifications). Context prediction can benefit from a refined set of high-level features extracted and selected from a large amount of low-level observations. In at least some example applications, the extracted feature sets effectively summarize the relevant feature contexts of the observations. Feature extraction and feature transformation are techniques that can be applied to observations/data to reduce their dimensionality to produce a minimalistic data representation. This abstraction improves the performance of learning models that leverage feature representations of the data.
Relevant feature extraction techniques are applied to each observation/data contained within the data context (e.g., the extracted features can be used to minimally describe the time series and their dynamics). They can also be used to cluster time series and to train machine learning models that perform classification or regression tasks on time series. This primarily allows the dimensionality of the problem to be reduced and the resulting predictions to be focused on the most relevant features of the data. The extracted features have successful applications in sensor anomaly detection, activity recognition of synchronized sensors, and quality prediction during a continuous manufacturing process.
Having a lower dimensional representation of the relevant observation/data features is beneficial; however, not all of these features are relevant to understand the feature context. That is, multiple features may describe redundant context, and therefore further dimensionality reduction is possible. Factor Analysis is one tool that is used to describe variability among extracted, correlated features in terms of a potentially lower number of unobserved variables called factors. Factor analysis seeks an intuitive explanation about what is common among the features. The extracted features are then expressed as linear combinations of the potential factors (which also belong to the feature context) plus "error" terms.
Further examples of the acquisition and use of causal network data such as a PGM 608 to provide information about causal relations and related insights in the context of a mass production process will now be described.
An example will now be described in the context of a plastic injection molding process using an Injection molding machine (IMM) to produce plastic parts. The IMM data context includes setpoint parameters, sensor time series, and camera captures (optical image, thermal image and its temperature intensity in pixels). In this example, 8 setpoint parameters (e.g., specified data SD elements) (see Table 3) were fed into the IMM control system. The parameters were set by a randomized factorial design. A series number was granted to each part after it was produced (e.g. ‘2023-11-14-16-30-45’).
Defects were manually labeled into four categories: flash, splay, short shot, sink.
For the purpose of this example, flash defects are used for showing examples in the following tables and figures.
Different setpoint parameters, different lengths of time series data, and unknown factors (e.g. environment or material change) made each production process unique. Each production generated up to 47 time series data profiles. Certain profiles were removed due to data quality issues (e.g., missing or excessively noisy data). Optical images were enhanced with histogram equalization to improve defect visibility.
Features (e.g., feature descriptors FD) were automatically calculated from measured data. For example, for time series data the Python package Tsfresh™ can be used to automatically calculate relevant features. Example process data is shown in Table 4, for 3 time series profiles (Mold_Temperature_3, Mold_Pressure_2, and Barrel_Pressure_0) for two different production parts (‘2023-11-14-16-30-45’ and ‘2023-11-14-16-48-03’). The last column is the labeled binary data of defect Flash.
The output features were named with semantic descriptors taking the form of ‘sensor name+feature name+feature parameters’. As shown in Table 5, sensor ‘Barrel_Pressure_0’+feature ‘fourier_entropy’+parameter ‘bins_5’. In this study, those time series having no feature outputs were discarded.
Feature Selection was performed as follows. The tsfresh package limits the number of irrelevant features in an early stage of the machine learning pipeline with respect to their significance for a classification or regression task. The tsfresh package deploys scalable hypothesis tests to evaluate the importance of the different extracted features. For every feature, the influence on the target (defect) is evaluated by univariate statistical tests and the p-value is calculated: the smaller the p-value, the greater the significance, and the more the feature is related to the defect. The method of testing relevance is selected according to the data type of features and targets (defects). In tsfresh, for real data type features and binary targets, the 'mann' (Mann-Whitney U) method is used to calculate the p-value, and the default False Discovery Rate (FDR) is 0.05.
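The FDR control step can be illustrated with a standalone Benjamini-Hochberg sketch that takes precomputed p-values; the feature names and p-values below are invented, and tsfresh computes the p-values internally from the time series:

```python
def benjamini_hochberg_select(p_values, fdr=0.05):
    """Select features whose p-values pass the Benjamini-Hochberg
    false-discovery-rate procedure: find the largest rank i with
    p_(i) <= i * fdr / m, then keep all features up to that rank."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff_rank = 0
    for i, (_, p) in enumerate(ranked, start=1):
        if p <= i * fdr / m:
            cutoff_rank = i
    return {name for name, _ in ranked[:cutoff_rank]}

# Invented per-feature p-values against the binary Flash target.
p = {"Barrel_Pressure_0__fourier_entropy__bins_5": 0.001,
     "Mold_Temperature_3__maximum": 0.004,
     "Mold_Pressure_2__mean": 0.30,
     "Barrel_Pressure_0__variance": 0.70}
print(sorted(benjamini_hochberg_select(p, fdr=0.05)))  # the two low p-value features pass
```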
The results of feature extraction and selection for the four types of defects are summarized in Table 6. The defect Flash was related to 4 sensors, which were associated with 73 features; since some features had more than one set of parameters, the number of feature columns was much greater than 73. Among the 98 IMM product parts, only 5% of the parts had the defect Short Shot. No features were found relevant to the defect Short Shot.
Feature Transformation was performed using the Python package factor analyzer to conduct factor analysis. The factor analysis model uses a minimum residual (MinRes) method and an orthogonal rotation to return a loading matrix. An absolute value of 0.4 or higher can be considered as a high loading. The MinRes method uses the fit function (i.e. the difference between the model-implied variance-covariance matrix and observed variance-covariance matrix) and adjusts the diagonal elements of the correlation matrix to minimize the squared residual when the factor model is the eigenvalue decomposition of the reduced matrix.
Exploratory Factor Analysis reveals how many factors are present and their associated factor loadings.
Table 7 lists the factors and the corresponding features with loadings >0.4. Factor F1 contributed to as many as 102 features, while F12 or F16 only contributed to one feature. In Table 9, F1 was related to 4 sensors, F2 to F4 were related to at least 2 sensors, and F5, F7, F9 and others were related to only one sensor. Therefore, F1 is the most common factor among all the features.
Table 8 lists the factors that are related to the sensors for the defect Flash. After factor analysis, the 187 features extracted from time series were reduced to 19 latent factors. Three variables (features) having low factor loadings (<0.4) were ignored. The remaining 184 features were transformed to the factor coordinates, which were independent of each other.
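The 0.4 loading cut-off can be sketched as a simple filter over a loading matrix; the loading values and feature names below are invented for illustration, not taken from Tables 7 or 8:

```python
def high_loading_features(loadings, threshold=0.4):
    """Map each factor to the features it loads on with |loading| >= threshold
    (the 0.4 cut-off used in the factor analysis above)."""
    mapping = {}
    for feature, per_factor in loadings.items():
        for factor, value in per_factor.items():
            if abs(value) >= threshold:
                mapping.setdefault(factor, []).append(feature)
    return mapping

# Hypothetical loading matrix: rows are extracted features, columns are factors.
loadings = {
    "Barrel_Pressure_0__maximum":       {"F1": 0.82, "F2": 0.10},
    "Mold_Temperature_3__mean":         {"F1": 0.15, "F2": -0.55},
    "Mold_Pressure_2__fourier_entropy": {"F1": 0.05, "F2": 0.12},  # low loadings: dropped
}
print(high_loading_features(loadings))
# {'F1': ['Barrel_Pressure_0__maximum'], 'F2': ['Mold_Temperature_3__mean']}
```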
Based on the factor structure obtained from Exploratory Factor Analysis, Confirmatory Factor Analysis performs a hypothesis test to examine if the factor structure is true, using the method of maximum likelihood. The factor structure identified and tested by factor analysis is shown in
A PGM is then learned through an automated causal inference analysis of dependencies among defects, factors, and settings. Probability distributions can be continuous, such as a normal distribution or log-normal distribution. In this example, since only 98 samples were analyzed, the distribution of each variable (i.e., each node in the structure) may not be a normal distribution (see the distribution of 8 setpoints in
There are two typical tasks with graphical models: inference and learning. The Python library pgmpy was used in this example to implement inference and learning.
This example selected Bayesian Dirichlet (K2) scores for structure learning. Given any node of the 8 settings and 19 factors as a parent of the node "Flash", the local scores were calculated and compared in Table 9. The score columns were sorted in descending order. The comparison showed that the three scores were close, and the orders of parental influence from highest to lowest were also close. Given lists of potential parents, for example [F19, F4, F18] and [F1, F2, F3], the local K2 scores were −68.44 and −71.56 respectively, which indicated that "Flash" was more influenced by the parent nodes [F19, F4, F18] than by [F1, F2, F3]. The comparison implied that the first 3 common factors (from Factor Analysis) may not be the most influential factors in the Probabilistic Graphical Model. Factor Analysis nevertheless served to reduce the dimensionality from 187 features to 19 factors.
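The example computes local K2 scores with pgmpy; the underlying Cooper-Herskovits computation can be illustrated with a minimal self-contained sketch. The toy data and column names below are hypothetical, chosen so that one candidate parent fully determines "Flash" and the other is independent of it:

```python
import math
import pandas as pd

def k2_local_score(df, child, parents):
    """Log K2 score of `child` given `parents` (Cooper-Herskovits form)."""
    r = df[child].nunique()                     # number of child states
    score = 0.0
    groups = df.groupby(parents) if parents else [((), df)]
    for _, g in groups:
        counts = g[child].value_counts()
        n_j = counts.sum()                      # samples in this parent config
        score += math.lgamma(r) - math.lgamma(n_j + r)
        score += sum(math.lgamma(n + 1) for n in counts)
    return score

# Toy data: "Flash" is a copy of A and independent of B.
df = pd.DataFrame({
    "A": [0, 1] * 20,
    "B": [0, 0, 1, 1] * 10,
})
df["Flash"] = df["A"]

# The determining parent scores strictly higher, mirroring the comparison
# of candidate parent lists described above.
print(k2_local_score(df, "Flash", ["A"]), k2_local_score(df, "Flash", ["B"]))
```

A higher local score indicates that the child is more strongly influenced by that parent set, which is exactly how the [F19, F4, F18] vs. [F1, F2, F3] comparison was read.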
The search space of Directed Acyclic Graphs is super-exponential in the number of variables/nodes, and the scoring functions admit local maxima. Exhaustive search is intractable for large networks, and local optimization algorithms such as hill climb search cannot always find the globally optimal structure. Thus, heuristic search strategies often yield good results. The pgmpy library allows users to set up the starting structure for the local search; by default, a completely disconnected network is used. The pgmpy library also allows fixed edges: a list of edges that will always be present in the final learned model. The algorithm adds these edges at the start and never changes them.
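A minimal sketch of score-based local search for a single node's parent set is shown below. It is not pgmpy's hill climb implementation: it uses a hand-written BIC-style score (log-likelihood minus a complexity penalty) and hypothetical data, but it illustrates the greedy add-one-edge-at-a-time idea:

```python
import math
import pandas as pd

def bic_local(df, child, parents):
    """BIC-style local score: log-likelihood minus a complexity penalty."""
    n, r = len(df), df[child].nunique()
    groups = df.groupby(list(parents)) if parents else [((), df)]
    ll, q = 0.0, 0
    for _, g in groups:
        q += 1
        for c in g[child].value_counts():
            ll += c * math.log(c / len(g))
    return ll - 0.5 * math.log(n) * q * (r - 1)

def greedy_parents(df, child, candidates):
    """Hill climb for one node: add the best single parent while the score improves."""
    parents, best = [], bic_local(df, child, [])
    improved = True
    while improved:
        improved = False
        for cand in (c for c in candidates if c not in parents):
            score = bic_local(df, child, parents + [cand])
            if score > best:
                best, best_cand, improved = score, cand, True
        if improved:
            parents.append(best_cand)
    return parents

# Toy data: "Flash" is fully determined by setting x1 and independent of x2.
df = pd.DataFrame({"x1": [0, 1] * 30, "x2": [0, 0, 1, 1] * 15})
df["Flash"] = df["x1"]
print(greedy_parents(df, "Flash", ["x1", "x2"]))
```

The penalty term is what stops the search from adding the uninformative parent x2, which mirrors why local optimizers can stop at good-but-not-global structures.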
Constraint-based structure learning constructs Directed Acyclic Graphs according to identified independencies. Several conditional independence tests are available in the pgmpy library, such as the Pearson R test, Chi-Square test, Log-likelihood test, Freeman-Tukey test, and Cressie-Read test. This method returns a Directed Acyclic Graph structure which complies with the independencies implied by the dataset. Based on the results from constraint-based structure learning, an additional [white_list] or [black_list] can be supplied to the hill climb search. In this case, the search can be restricted to a particular subset of edges or exclude certain edges. To enforce a wider exploration of the search space, the search can be enhanced with a tabu list. The list keeps track of the last n modifications, which are then not allowed to be reversed, regardless of the score (similar to tabu search).
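The chi-square conditional independence test that underlies constraint-based learning can be sketched with SciPy. This is a simplified stand-in for the pgmpy test (stratify on the conditioning variable, sum the chi-square statistics and degrees of freedom), run on hypothetical data:

```python
import pandas as pd
from scipy.stats import chi2, chi2_contingency

def chi_square_ci(df, x, y, z=None, alpha=0.01):
    """Test X independent of Y given Z by summing chi-square over strata of Z."""
    strata = [g for _, g in df.groupby(z)] if z else [df]
    stat, dof = 0.0, 0.0
    for g in strata:
        table = pd.crosstab(g[x], g[y])
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue                      # degenerate stratum carries no information
        s, _, d, _ = chi2_contingency(table)
        stat, dof = stat + s, dof + d
    p_value = chi2.sf(stat, dof) if dof else 1.0
    return p_value >= alpha               # True -> independence is not rejected

df = pd.DataFrame({"x": [0] * 50 + [1] * 50, "z": [0, 1] * 50})
df["y"] = df["x"]                          # y is an exact copy of x
print(chi_square_ci(df, "x", "z"))         # balanced table: independent
print(chi_square_ci(df, "x", "y"))         # identical columns: dependent
```

Edges whose endpoint pair fails such a test would be candidates for a [black_list] in the subsequent hill climb search.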
Given 8 setting points, 19 latent factors from sensor time series, and the defect Flash, the results of structure learning were compared between the score-based (hill climb search with K2 scores) and constraint-based (Chi-Square independence test at a significance level of 0.01) methods. Score-based structure learning (
However, constraint-based structure learning (
Next, heuristic search strategies were applied. Since the previous local K2 scores (Table 9) had identified the most likely parent nodes of "Flash", the Directed Acyclic Graph structure used to start the search could fix the connections F19-Flash, F4-Flash, and F18-Flash (the top 3 parent nodes). Meanwhile, some connections between x and F failed the independence test (
Hybrid structure learning, such as the MMHC (Max-Min Hill-Climbing) algorithm, combines the constraint-based and score-based methods. The idea is to learn an undirected graph skeleton (using constraint-based construction) before orienting the edges (using score-based optimization). The undirected skeleton can be imported as a [white_list] to the Hill-Climbing algorithm.
Parameter Learning. Given a set of data samples and a Directed Acyclic Graph structure that captures the dependencies between the variables, the parameters (Conditional Probability Distributions) of a Discrete Bayesian Network can be learned. The pgmpy library supports three methods: Maximum Likelihood Estimator, Bayesian Estimator, and Expectation Maximization Estimator. In this example, the node "Flash" connected with its parent nodes "F14", "F18", "F19", "F3", "F4", and "x4". Each "F" node and each "x" node had 3 levels (discretized), and the "Flash" node had 2 states: True or False. The CPD table listed the probability of "Flash" at each joint condition: P("Flash" | ["F14", "F18", "F19", "F3", "F4", "x4"]).
When estimating parameters for Bayesian Networks, lack of data is a frequent problem. Even if the total sample size is very large, the fact that state counts are done conditionally for each parent-node configuration causes immense fragmentation. In this example, the variable "Flash" has 6 parent nodes that each take 3 states, so state counts are done separately for 3^6 = 729 parent configurations. This makes the Maximum Likelihood Estimator very fragile and unstable for learning Bayesian Network parameters. A way to mitigate its overfitting is Bayesian Parameter Estimation.
The Bayesian Parameter Estimator starts from prior CPDs that exist before the data are observed. The priors can have specific distributions or, commonly, be uniform. The priors are then updated using the state counts from the observed data. The estimated values in the CPDs can be more conservative than those obtained with the Maximum Likelihood Estimator.
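The contrast between Maximum Likelihood and Bayesian (pseudo-count) estimation under fragmentation can be illustrated with a small hypothetical count table, where one parent configuration was never observed:

```python
import numpy as np

# Observed counts of Flash in {False, True} for three illustrative parent
# configurations; the third configuration was never observed (fragmentation).
counts = np.array([[18., 2.],
                   [ 5., 5.],
                   [ 0., 0.]])

# Maximum likelihood: row-wise relative frequencies (undefined for empty rows).
with np.errstate(invalid="ignore"):
    mle = counts / counts.sum(axis=1, keepdims=True)

# Bayesian estimate with a uniform Dirichlet prior (pseudo-count alpha per
# state): empty configurations fall back to the uniform prior instead of NaN.
alpha = 1.0
bayes = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * counts.shape[1])

print(mle)     # last row is NaN: MLE is fragile under fragmentation
print(bayes)   # last row is [0.5, 0.5]; other rows shrink toward uniform
```

The shrinkage toward the uniform prior is what makes the Bayesian estimates "more conservative" than the maximum likelihood values.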
The Expectation Maximization algorithm can learn the parameters from incomplete data. The idea is to pick a starting point for the parameters and iterate two steps: (1) an expectation step to "complete" the data based on the current parameters, and (2) a maximization step to re-estimate the parameters based on the completed data.
The most common use of the Expectation Maximization algorithm is learning with latent variables. Latent variables are never observed but are important for capturing some structure of the data. Latent variables are useful for model sparsity (fewer parameters), discovering clusters in data, and dealing with missing data. Since latent variables satisfy the missing-at-random assumption, the Expectation Maximization algorithm is applicable when some latent variables in the model do not have values.
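The two EM steps can be illustrated with a classic latent-variable toy problem: two coins with unknown heads probabilities, where each sample is ten flips of a coin whose identity is never observed. All data below are synthetic, and this is a hand-written sketch, not the pgmpy estimator:

```python
import numpy as np

# Two hidden coins; each sample is 10 flips of one of them, and the coin
# identity z is latent (used only to generate data, never by the learner).
rng = np.random.default_rng(1)
n_flips = 10
z = rng.random(500) < 0.5                          # latent coin identity
heads = rng.binomial(n_flips, np.where(z, 0.9, 0.2))

pi, p1, p2 = 0.5, 0.6, 0.4                         # starting point
for _ in range(100):
    # Expectation step: "complete" the data with the posterior
    # responsibility of coin 1 under the current parameters.
    l1 = pi * p1**heads * (1 - p1)**(n_flips - heads)
    l2 = (1 - pi) * p2**heads * (1 - p2)**(n_flips - heads)
    r = l1 / (l1 + l2)
    # Maximization step: re-estimate the parameters from the completed data.
    pi = r.mean()
    p1 = (r * heads).sum() / (r.sum() * n_flips)
    p2 = ((1 - r) * heads).sum() / ((1 - r).sum() * n_flips)

print(round(p1, 2), round(p2, 2))          # recovers roughly 0.9 and 0.2
```

Despite never seeing the coin identities, the alternating E and M steps recover both heads probabilities and the mixing weight.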
Inference algorithms deal with efficiently answering conditional probability queries. Two algorithms are available in the pgmpy package for inference: Variable Elimination and Belief Propagation. Both are exact inference algorithms. The basic concept of Variable Elimination is summing over the Joint Distribution. The elimination order is evaluated through heuristic functions, which assign an elimination cost to each node that has to be removed.
Belief propagation, also known as sum-product message passing, calculates the marginal distribution for each unobserved variable, conditional on any observed variables. The belief is the normalized product of likelihood and priors (i.e. the probabilities of certain events already known in the beginning).
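A conditional probability query of the kind both algorithms answer can be illustrated by brute-force summation over a tiny joint distribution. The chain network and CPD values below are hypothetical, not taken from the patent's learned model:

```python
import numpy as np

# Tiny chain network  x -> F -> Flash  with made-up CPDs, showing a
# conditional probability query by summing over the joint distribution,
# in the spirit of Variable Elimination.
p_x = np.array([0.7, 0.3])                     # P(x)
p_f_given_x = np.array([[0.9, 0.1],            # P(F | x=0)
                        [0.2, 0.8]])           # P(F | x=1)
p_flash_given_f = np.array([[0.95, 0.05],      # P(Flash | F=0)
                            [0.40, 0.60]])     # P(Flash | F=1)

# Joint P(x, F, Flash) via broadcasting.
joint = p_x[:, None, None] * p_f_given_x[:, :, None] * p_flash_given_f[None, :, :]

# Query P(Flash | x=1): restrict to the evidence, eliminate F, normalize.
restricted = joint[1]                  # slice on the evidence x=1
marginal = restricted.sum(axis=0)      # sum out (eliminate) F
posterior = marginal / marginal.sum()
print(posterior)                       # [P(Flash=False|x=1), P(Flash=True|x=1)]
```

Variable Elimination performs the same restriction and summation, but interleaves the sums with the products so that the full joint is never materialized.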
The inference process attempts to answer some key questions about the defect scenario using probability queries: whether a considerable setpoint value growth leads to an increase in one type of defect; whether increases in certain time series features are important factors in Flash; whether one setpoint increase leads to changes in some features; whether an increase in an environmental parameter is evidence of an increase in Flash; whether a reduction of one critical feature value is evidence of a reduction in one type of defect; and whether one setpoint value reduction combined with one feature value increase significantly impacts a specific defect at a specific location.
Structure learning with a Probabilistic Graphical Model demonstrates the dependency between variables. Associated variables are connected with each other by directed acyclic edges. One variable can be a child node of several parent nodes (variables). Evaluating the strength of dependency between the variables is critical for causality inference.
Two ways to calculate the strength of dependency are as follows. One is the Chi-Square test for dependency: in the pgmpy package, CITests.chi_square() returns the Chi statistic and p-value. The higher the Chi statistic and the lower the p-value, the stronger the dependency. The other is to use a score function such as K2, Bayesian Dirichlet equivalent uniform (BDeu), BDs, factorized Normalized Maximum Likelihood (fNML), or Bayesian Information Criterion (BIC), which measures how much a given variable is "influenced" by a given list of potential parents. The higher the score, the stronger the dependency.
The first method calculates the dependency score of only two variables, whereas the second can compute a score either for two variables or for a whole structure including a sequence of parent-child nodes. When computing the structure score, it relies on the probability distributions of the variables as well as the conditional probabilities between them. Together, the two scoring methods can provide the strength of dependency from both local and global perspectives.
In the illustrated example of
In the illustrated example, insight generator 1202 has access to a PGM 608 that has been obtained based on historic process data and product data processed for the injection molding represented in process graph 100A. An operator has selected the product data “Flash” feature descriptor node FD(i) from the GUI. In response to user selection of the “Flash” feature descriptor node FD(i), PGM processing module 1202 processes PGM 608 using context prediction generator 1206 to identify which of the feature descriptors FDs included in the process graph of
Referring again to
In the illustrated example, PGM processing module 1202 processes PGM 608 using insight generator 1202 to generate insights based on the current values of data nodes in the process graph 100A. In this regard,
In some examples, the GUI can also display image representations of measured data MD elements and/or feature descriptors FD. By way of example,
In some examples, the GUI can also display image representations of the industrial process overlaid with region of interest markers that correspond to detected faults and the process or data nodes that are identified as relevant to such defects. By way of example,
Some example aspects of the present disclosure are summarized in the following clauses.
Clause 1: Method or system to associate manufacturing process data to product data using measured data (MD), specified data (SD), and associated feature descriptors (FD) to construct application-specific causal networks from pre-configured process graphs for the purposes of automatically generating descriptive, predictive, and prescriptive insights, for manufacturing applications that can be communicated to process operators and process control systems for improved process decision making and performance improvements. In at least some examples, the application-specific causal networks are constructed without human input.
In some examples, expert judgements or user learning can be added to the causal networks if their correctness is supported by the data.
Clause 2: (Data collection interfaces) Clause 1 whereby the measured data (MD) and specified data (SD) for the process and product are collected with an edge device: inline at the machine through a PLC or other communication interface; inline during the production process from external sensors directly connected to the edge device; from manual operator inputs through a human-machine-interface; and/or from other factory data sources such as upstream machines and other process and user management systems.
Clause 3: (Product quality data) Method or system according to one or more of the previous clauses, whereby the specified data (SD) of the product may describe desired characteristics of the product preconfigured for the application, such as geometry, surface texture, and quality thresholds, and the measured data (MD) of the product may be collected from one or more quality-based inspection devices, such as a machine vision sensor, colour measurement sensor, or 3D measurement sensor, and one or more algorithms compute associated quality-based metrics that can be used as feature descriptors (FD) (e.g. using a trained machine learning model that computes ‘good’ and ‘defect’ quality labels).
Clause 4: (Standardized product data) Method or system according to one or more of the previous clauses, whereby the product has an associated CAD model and measured data (MD) collected from one or more machine vision sensors is standardized through a post-processing operation to remove undesired variabilities, such as pose variations and background changes, and the associated feature descriptors (FD) are derived from geometric characteristics of the product (e.g. warpage, defect proximity, colors/temperature profiles at particular locations/paths).
Clause 5: (Feature descriptors) Method or system according to one or more of the previous clauses, whereby: the collected data is pre or post-processed using specialized algorithms to standardize/normalize the data applied to improve the compatibility of data within the causal network; trained machine learning models are applied to the pre/post/un-processed data to generate associated lower dimensional feature vectors or logit scores that can be used as feature descriptors (FD) within the causal network; and feature extraction techniques are applied to the pre/post/un-processed data to generate associated lower dimensional feature descriptors (FD) within the causal network.
Clause 6: (Transferability) Method or system according to one or more of the previous clauses, whereby the process data included in the causal network may include: one or more machines manufacturing the same product or similar product (e.g. different materials, finishes, etc.); one or more machines manufacturing different products that incorporate similar processes; and/or one or more products from the same manufacturing process.
Clause 7: (Data processing devices) Method or system according to one or more of the previous clauses, whereby: the data collection, processing, and insight generation are completed inline at a factory with an edge device; the data collection is completed inline at a factory with an edge device and communicated to a connected cloud server that performs the processing and insight generation; and/or where the insight results are communicated from a cloud server to an edge device or human-machine-interface for operator interactions.
Clause 8: (Application-specific process graph) Method or system for creation of an application-specific process graph for manufacturing from a set of inputs containing measured data (MD) and specified data (SD) that is used to generate a causal network, where the process graph and causal network are generated from any combination of: Operator or subject-matter-expert inputs during configuration, either through the use of whitelists, blacklists, or connectivity graphs/matrices/maps; the use of Large-Language-Models (LLMs) to intelligently organize the process graph structure and to identify relevant connectivity graphs/matrices/maps based on learned application contexts; and/or the use of statistical algorithms to automatically identify relevant relationships and connections, such as Probabilistic Graphical Models (PGMs).
Clause 9: (Application-specific causal network) Method or system for development of an application-specific causal network from a process graph that associates process data and product data where the high-level causal network is expanded by incorporating further feature descriptor (FD) nodes, where feature descriptors are derived from applying customized feature extraction and feature selection pipelines or trained machine learning models to the measured data (MD) and specified data (SD) and inserting the resulting feature descriptor nodes into the causal network.
Clause 10: (Product insight) Method or system for leveraging an application-specific causal network to perform product focused analysis, such as product quality root cause investigations, and generate associated descriptive, predictive, and prescriptive manufacturing insights that can be communicated to process operators and process control systems for improved process decision making and performance improvements.
Clause 11: (Process insight) Method or system for leveraging an application-specific causal network to perform process focused analysis, such as process health investigations, and generate associated descriptive, predictive, and prescriptive manufacturing insights that can be communicated to process operators and process control systems for improved process decision making and performance improvements.
Clause 12: (Insight interface) Method or system of clause 10 or 11 whereby the generated manufacturing insights are: presented to the user through a local HMI device for in-factory actions, such as recommended operator interactions or recommended control action changes; presented to the user through a cloud-connected computing device accessing a cloud platform for the purposes of offline data exploration; presented to the machine control network for automatic adjustment of controller settings; and/or configured and generated through the cloud platform then communicated to a local edge device and/or HMI for in-factory actions, such as recommended operator actions or automated control actions.
Clause 13: (Insight generation) Method or system for leveraging an application-specific causal network to generate: descriptive insights for communicating process/product relationships/knowledge to a user; predictive insights for modeling process/product relationships and outcome prediction and communicating these aspects to a user; and/or prescriptive insights for recommending process improvements and risk mitigation to a user.
Clause 14: (LLM input/output interpretation layers) A natural language interpretation layer to transform expert knowledge into the causal network domain and to interpret contextualized causal data for interpretability and guided human intervention, whereby the interpretation layer leverages Large-Language-Model (LLM) machine learning architectures to: generate semantic causal structures from real-time natural expert knowledge capture system; generate semantic causal structures from publicly available literature; generate semantic embeddings for all elements of the contextualized causal network; and/or generate natural language explanations of context prediction and causal prediction in the formation of descriptive, predictive and prescriptive insights.
Clause 15: (LLM refinement) Clause 14 whereby the LLM is used to evaluate the long range historical relationships between generated insights and their impact on real-time operating environments to refine and adapt the data context for the purpose of improving the quality of interactions with the system users.
Although the present disclosure describes methods and processes with steps in a certain order, one or more steps of the methods and processes may be omitted or altered as appropriate. One or more steps may take place in an order other than that in which they are described, as appropriate. As used herein, statements that a second item (e.g., a signal, value, label, classification, attribute, scalar, vector, matrix, calculation) is "based on" a first item can mean that characteristics of the second item are affected or determined at least in part by characteristics of the first item. The first item can be considered an input to an operation or calculation, or a series of operations or calculations, that produces the second item as an output that is not independent from the first item. Where possible, any terms expressed in the singular form herein are meant to also include the plural form and vice versa, unless explicitly stated otherwise. In the present disclosure, use of the term "a," "an", or "the" is intended to include the plural forms as well, unless the context clearly indicates otherwise. Also, the term "includes," "including," "comprises," "comprising," "have," or "having" when used in this disclosure specifies the presence of the stated elements but does not preclude the presence or addition of other elements.
Although the present disclosure is described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that the present disclosure is also directed to the various components for performing at least some of the aspects and features of the described methods, be it by way of hardware components, software or any combination of the two. Accordingly, the technical solution of the present disclosure may be embodied in the form of a software product. A suitable software product may be stored in a pre-recorded storage device or other similar non-volatile or non-transitory computer readable medium, including DVDs, CD-ROMs, USB flash disk, a removable hard disk, or other storage media, for example. The software product includes instructions tangibly stored thereon that enable a processing device (e.g., a personal computer, a server, or a network device) to execute examples of the methods disclosed herein.
The present disclosure may be embodied in other specific forms without departing from the subject matter of the claims. The described example embodiments are to be considered in all respects as being only illustrative and not restrictive. Selected features from one or more of the above-described embodiments may be combined to create alternative embodiments not explicitly described, features suitable for such combinations being understood within the scope of this disclosure.
All values and sub-ranges within disclosed ranges are also disclosed. Also, although the systems, devices and processes disclosed and shown herein may comprise a specific number of elements/components, the systems, devices and assemblies could be modified to include additional or fewer of such elements/components. For example, although any of the elements/components disclosed may be referenced as being singular, the embodiments disclosed herein could be modified to include a plurality of such elements/components. The subject matter described herein intends to cover and embrace all suitable changes in technology.
The content of any publications identified in this disclosure are incorporated herein by reference.
This application claims the benefit and priority of U.S. Provisional Patent Application No. 63/516,733 “SYSTEM AND METHOD FOR IDENTIFYING PROCESS-TO-PRODUCT CAUSAL NETWORKS AND GENERATING PROCESS INSIGHTS” filed Jul. 31, 2023, the contents of which are incorporated herein by reference.
Number | Date | Country
---|---|---
63516733 | Jul 2023 | US