Life Cycle Assessment (LCA) is a process that evaluates the environmental impacts of products and services throughout their full life cycle. By providing a comprehensive view of the environmental impacts of a product or service, LCA helps quantify sustainability. For instance, LCA can be applied to quantify sustainability in cloud computing hardware. Accurate LCA for computer hardware equipment can enable an organization to better understand the environmental effects of its large computing systems such as cloud data centers. This understanding can help organizations make adjustments to purchasing, deployment, and utilization of such cloud hardware to better achieve sustainability goals.
A computing system for predicting substitute components of an assembled product is provided. According to one aspect, the computing system comprises a processor having associated memory storing instructions that cause the processor to execute a model training program configured to generate a component graph including a respective node for each of a plurality of component IDs of the assembled product. The component graph includes node embeddings for each node, in which the node embeddings are computed such that nodes with similar features are closer in a node embedding space than nodes with dissimilar features. The processor is further configured to execute an embeddings comparison algorithm to compare embeddings of the component IDs to thereby identify a substitute component ID for a target node on the component graph. The processor can be further configured to execute a life cycle analysis (LCA) program configured to compute an LCA result using an LCA algorithm based at least upon the substitute component ID.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
As discussed above, conducting an accurate Life Cycle Assessment (LCA) for an assembled product can help organizations better understand the environmental effects of deploying and using that product. For example, one typical step of LCA is to determine an environmental effect, such as greenhouse gas emissions, for a product. However, accurately estimating the greenhouse gas emissions produced during the life cycle of a product is a particularly challenging task. One approach to estimating a product's greenhouse gas emissions utilizes Bill of Materials (BOM) information, environmental impact data, and Full Material Disclosures (FMD) for the components listed in the BOM. The FMD data is typically combined with external environmental impact datasets to calculate the final environmental impact of a component. Conventional approaches in this technical field typically use custom metrics to compare BOMs and assume that a part database is well organized and can be compared easily. However, these assumptions do not hold at a large scale, since not all BOMs contain the FMD data and other relevant properties, and some properties are missing or incorrect. When BOM data is incomplete or inaccurate, LCA is difficult to perform accurately using these conventional techniques. For mass deployments of hardware, such as in a cloud data center, even small errors in BOM data can result in large errors in the greenhouse gas predictions for the entire fleet of deployed computer hardware equipment.
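The role of FMD data in this calculation, and the effect of a missing disclosure, can be sketched as follows. This is a hypothetical illustration; the component IDs, quantities, and emission factors are invented for the example and are not taken from the disclosure.

```python
# Hypothetical sketch: estimating a product's embodied greenhouse gas
# emissions from a Bill of Materials (BOM). All names and numbers are
# illustrative.

# BOM: component ID -> quantity used in the assembled product.
bom = {"12345": 2, "12346": 1, "12347": 4}

# Per-component emission factors (kg CO2e per unit), derived from FMD
# data combined with an external environmental impact dataset.
# Component "12347" is absent, as when its FMD data is unavailable.
emission_factors = {"12345": 1.8, "12346": 12.5}

def estimate_emissions(bom, factors):
    """Sum quantity-weighted emission factors; track components with no data."""
    total, missing = 0.0, []
    for component_id, quantity in bom.items():
        if component_id in factors:
            total += quantity * factors[component_id]
        else:
            missing.append(component_id)
    return total, missing

total, missing = estimate_emissions(bom, emission_factors)
print(round(total, 1))  # 16.1 kg CO2e from the components with data
print(missing)          # ['12347'] -- a gap that skews the fleet-level estimate
```

In a fleet-scale deployment, every unit of the product repeats the same gap, which is why substituting valid data for such components can materially improve the accuracy of the aggregate estimate.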
As schematically illustrated in
Continuing with
Based on the received component data 22, the model training program 18 is configured to generate, via a component graph generator 30 and a feature extractor 32, an initial instance of the component graph 34 with node-wise feature vectors. Turning briefly to
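The graph construction described above can be sketched as follows. This is a hypothetical illustration: the hierarchy, metadata strings, structural features, and hashed token features are illustrative choices standing in for the component graph generator 30 and feature extractor 32, not the actual implementation.

```python
# Hypothetical sketch of component graph construction: build an initial
# graph from component hierarchical data, derive a structural vector per
# node, tokenize each component's ID and metadata, and concatenate the
# two into node-wise feature vectors. Feature choices are illustrative.

hierarchy = [("server_A", "12345"), ("server_A", "12346"), ("12346", "12347")]
metadata = {
    "12345": "dimm ddr4 16gb",
    "12346": "motherboard atx",
    "12347": "capacitor smd 10uF",
}

# Adjacency list for the initial instance of the component graph.
graph = {}
for parent, child in hierarchy:
    graph.setdefault(parent, []).append(child)
    graph.setdefault(child, [])

def structural_vector(node):
    # Toy structural representation: out-degree and in-degree of the node.
    out_deg = len(graph[node])
    in_deg = sum(node in children for children in graph.values())
    return [float(out_deg), float(in_deg)]

def token_features(node, dim=8):
    # Hashed bag-of-tokens over the component ID and its metadata.
    vec = [0.0] * dim
    for token in [node] + metadata.get(node, "").split():
        vec[hash(token) % dim] += 1.0
    return vec

# Node-wise feature vector = structural vector ++ tokenized features.
features = {n: structural_vector(n) + token_features(n) for n in graph}
print(len(features["12347"]))  # 10-dimensional concatenated vector
```

The concatenation step mirrors the disclosure's combination of a per-node vector representation with tokenized features; a production system would use richer structural encodings and learned tokenizers.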
Turning back to
Referring back to
Thus, the positive and negative match training data pairs 38A, 38B are pairs of node-wise feature vectors 34 computed during the unsupervised learning stage 120. The model training program 18 is configured to perform the machine learning (ML) model training 40 based on the obtained positive match training data pairs 38A and negative match training data pairs 38B to output the node embeddings 62 for each of the component IDs in the component graph from the trained ML model. The ML model may include an optimizer 178 that optimizes the model based on an evaluation 174.
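One common way to train embeddings from positive and negative pairs is a margin-based contrastive objective, sketched below. The disclosure does not specify a particular loss; the function, margin, and toy embeddings here are illustrative assumptions.

```python
# Hypothetical sketch of a training objective over match pairs: a
# margin-based contrastive loss pulls positive-pair embeddings together
# and pushes negative-pair embeddings at least a margin apart.

import math

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(pairs, margin=1.0):
    """pairs: list of (embedding_a, embedding_b, is_positive_match)."""
    total = 0.0
    for a, b, positive in pairs:
        d = distance(a, b)
        if positive:
            total += d ** 2                      # pull positives closer
        else:
            total += max(0.0, margin - d) ** 2   # push negatives apart
    return total / len(pairs)

pairs = [
    ([0.1, 0.2], [0.1, 0.3], True),   # positive match training pair
    ([0.9, 0.1], [0.1, 0.9], False),  # negative match training pair
]
print(contrastive_loss(pairs))
```

Minimizing such a loss yields the property stated above: nodes with similar features end up closer in the node embedding space than nodes with dissimilar features.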
In addition to the unsupervised and semi-supervised learning discussed above, the model training program 18 is further configured to perform graph embedding model training 40 through feedback training. Turning briefly to
Turning back to
The processor 12 is further configured to receive a source LCA dataset 70 that includes the component IDs 26 along with Full Material Disclosures (FMD) data 74 and environmental impact data, in which the FMD data is absent or incomplete for the target node. In the depicted example, there is no FMD data or missing FMD data for the component ID #12347 in the source LCA dataset 70. The processor 12 is configured to execute a program to implement an embeddings comparison algorithm 78 to compare embeddings of the component IDs to thereby identify a substitute component ID 82 for the target node on the graph. In this example, the component ID #12367 is identified as a substitute for the component ID #12347. Turning briefly to
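The embeddings comparison step can be sketched as a nearest-neighbor lookup in the embedding space, restricted to components that do have FMD data. The embeddings, the cosine-similarity choice, and the availability flags below are illustrative assumptions, reusing the example component IDs from the disclosure.

```python
# Hypothetical sketch of the embeddings comparison algorithm: for a
# target node missing FMD data, pick the most similar component (by
# cosine similarity of node embeddings) among those with FMD data.

import math

node_embeddings = {
    "12345": [0.9, 0.1, 0.0],
    "12347": [0.1, 0.8, 0.6],   # target node: FMD data missing
    "12367": [0.1, 0.7, 0.7],   # nearby in the node embedding space
}
has_fmd = {"12345": True, "12347": False, "12367": True}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_substitute(target, embeddings, has_fmd):
    # Candidates: every other component that has usable FMD data.
    candidates = [c for c in embeddings if c != target and has_fmd[c]]
    return max(candidates, key=lambda c: cosine(embeddings[target], embeddings[c]))

print(find_substitute("12347", node_embeddings, has_fmd))  # 12367
```

Filtering to FMD-bearing candidates reflects the purpose of the lookup: the substitute must contribute the data the target node lacks.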
Table 1 of
In addition, an ablation study was conducted to study the impact of the negative match training data pairs 38B with random negative sample generation using a computer-assisted random generator. Results of this study are shown in Table 2 of
To generate the component graph with node embeddings for each node above, the following steps 304-316 are performed. At step 304, the method may include receiving component data including component hierarchical data, component IDs, and meta data of properties of each of the component IDs.
Continuing from step 304 to step 306, the method 300 may include generating an initial instance of the component graph based on the component hierarchical data and generating a vector representation for each node of the component graph.
Advancing from step 306 to step 308, the method 300 may include generating tokenized features based on the component IDs and the meta data of properties of the components indicated by each of the component IDs.
Proceeding from step 308 to step 310, the method 300 may include concatenating the vector representation for each node and the tokenized features to generate node-wise feature vectors for each of the component IDs.
Continuing from step 310 to step 312, the method 300 may include generating a training data set by obtaining positive match training data pairs and negative match training data pairs for each of a plurality of pairs of the node-wise feature vectors, wherein the positive match training pairs and negative match training pairs are pairs of component IDs for which the respective concatenated feature vectors positively and negatively match, respectively, according to a data distance algorithm.
Advancing from step 312 to step 314, the method 300 may include training a machine learning (ML) model based on the positive match training data pairs and negative match training data pairs.
Proceeding from step 314 to step 316, the method 300 may include outputting the node embeddings for each of the component IDs in the component graph from the trained ML model.
At step 318, the method 300 may further include executing, via a life cycle analysis (LCA) program, an embeddings comparison algorithm to compare embeddings of the component IDs to thereby identify a substitute component ID for a target node on the component graph.
Proceeding from step 318 to step 320, the method 300 may include computing an LCA result using an LCA algorithm based at least upon the substitute component ID.
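Steps 318 and 320 above can be sketched as substituting the identified component's data for the target node and then computing the LCA result over the modified dataset. The dataset shape, values, and the simple additive LCA algorithm are illustrative assumptions.

```python
# Hypothetical sketch of steps 318-320: substitute the identified
# component's impact data for the target node, then run the LCA
# computation over the modified dataset.

source_lca = {
    "12345": {"kg_co2e": 1.8},
    "12347": None,              # target node: no FMD-derived impact data
    "12367": {"kg_co2e": 2.1},
}

def apply_substitute(dataset, target, substitute):
    """Return a modified LCA dataset with the substitute's data filled in."""
    modified = dict(dataset)
    modified[target] = dataset[substitute]  # borrow the substitute's data
    return modified

def lca_result(dataset):
    # Toy LCA algorithm: sum per-component impacts that are available.
    return sum(v["kg_co2e"] for v in dataset.values() if v is not None)

modified = apply_substitute(source_lca, "12347", "12367")
print(round(lca_result(modified), 1))  # 6.0
```

Without the substitution, the target node contributes nothing to the total; with it, the modified LCA dataset yields a complete, if approximate, result, matching the aspect in which the FMD data is present for the substitute component ID in a modified LCA dataset.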
The computing system 100 and method 300 described herein provide mechanisms for predicting substitute components of the assembled product. These systems and methods can be used to aid Life Cycle Assessment (LCA) and enable an organization to provide a better assessment of the sustainability of its cloud hardware deployments at scale, even when the organization possesses only incomplete or inaccurate data for some components of the assembled product. By predicting substitute components for which valid data useful in the LCA exists, to be used in place of the incomplete or inaccurate data, organizations can perform LCA more accurately and thereby better understand the environmental impact of the assembled product. This enables such organizations to make decisions that reduce the overall environmental impact of the assembled products they deploy, for example by selecting equipment with a reduced environmental impact, minimizing waste, and using their equipment in a manner that increases its lifespan.
In some embodiments, the methods and processes described herein may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as a computer-application program or service, an application-programming interface (API), a library, and/or other computer-program product.
Computing system 600 includes a logic processor 602, volatile memory 604, and a non-volatile storage device 606. Computing system 600 may optionally include a display subsystem 608, input subsystem 610, communication subsystem 612, and/or other components not shown in
Logic processor 602 includes one or more physical devices configured to execute instructions. For example, the logic processor may be configured to execute instructions that are part of one or more applications, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.
The logic processor may include one or more physical processors (hardware) configured to execute software instructions. Additionally or alternatively, the logic processor may include one or more hardware logic circuits or firmware devices configured to execute hardware-implemented logic or firmware instructions. Processors of the logic processor 602 may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic processor optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic processor may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration. In such a case, it will be understood that these virtualized aspects may be run on different physical logic processors of various different machines.
Non-volatile storage device 606 includes one or more physical devices configured to hold instructions executable by the logic processors to implement the methods and processes described herein. When such methods and processes are implemented, the state of non-volatile storage device 606 may be transformed—e.g., to hold different data.
Non-volatile storage device 606 may include physical devices that are removable and/or built in. Non-volatile storage device 606 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., ROM, EPROM, EEPROM, FLASH memory, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), or other mass storage device technology. Non-volatile storage device 606 may include nonvolatile, dynamic, static, read/write, read-only, sequential-access, location-addressable, file-addressable, and/or content-addressable devices. It will be appreciated that non-volatile storage device 606 is configured to hold instructions even when power is cut to the non-volatile storage device 606.
Volatile memory 604 may include physical devices that include random access memory. Volatile memory 604 is typically utilized by logic processor 602 to temporarily store information during processing of software instructions. It will be appreciated that volatile memory 604 typically does not continue to store instructions when power is cut to the volatile memory 604.
Aspects of logic processor 602, volatile memory 604, and non-volatile storage device 606 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.
The terms “module,” “program,” and “engine” may be used to describe an aspect of computing system 600 typically implemented in software by a processor to perform a particular function using portions of volatile memory, which function involves transformative processing that specially configures the processor to perform the function. Thus, a module, program, or engine may be instantiated via logic processor 602 executing instructions held by non-volatile storage device 606, using portions of volatile memory 604. It will be understood that different modules, programs, and/or engines may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms “module,” “program,” and “engine” may encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.
When included, display subsystem 608 may be used to present a visual representation of data held by non-volatile storage device 606. The visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the non-volatile storage device, and thus transform the state of the non-volatile storage device, the state of display subsystem 608 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 608 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic processor 602, volatile memory 604, and/or non-volatile storage device 606 in a shared enclosure, or such display devices may be peripheral display devices.
When included, input subsystem 610 may comprise or interface with one or more user-input devices such as a keyboard, mouse, camera, microphone, or touch screen, for example.
When included, communication subsystem 612 may be configured to communicatively couple various computing devices described herein with each other, and with other devices. Communication subsystem 612 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some embodiments, the communication subsystem may allow computing system 600 to send and/or receive messages to and/or from other devices via a network such as the Internet.
The following paragraphs provide additional support for the claims of the subject application. One aspect provides a computing system for predicting substitute or missing parts of an assembled product. The computing system may include a processor having associated memory storing instructions that cause the processor to execute a model training program configured to generate a component graph including a respective node for each of a plurality of component IDs of the assembled product. The component graph may include node embeddings for each node, in which the node embeddings may be computed such that nodes with similar features are closer in a node embedding space than nodes with dissimilar features. The processor may be further configured to execute an embeddings comparison algorithm to compare embeddings of the component IDs to thereby identify a substitute component ID for a target node on the component graph.
According to this aspect, the processor may be further configured to execute a life cycle analysis (LCA) program configured to compute an LCA result using an LCA algorithm based at least upon the substitute component ID.
According to this aspect, to generate the component graph, the model training program may be further configured to receive component data including component hierarchical data, component IDs, and meta data of properties of each of the component IDs. The model training program may be further configured to generate an initial instance of the component graph based on the component hierarchical data. The model training program may be further configured to generate a vector representation for each node of the component graph. The model training program may be further configured to generate tokenized features based on the component IDs and the meta data of properties of the components indicated by each of the component IDs. The model training program may be further configured to concatenate the vector representation for each node and the tokenized features to generate the initial instance of the component graph with node-wise feature vectors for each of the component IDs.
According to this aspect, to generate the component graph, the model training program may be further configured to generate a training data set by obtaining positive match training data pairs and negative match training data pairs for each of a plurality of pairs of the node-wise feature vectors, wherein the positive match training pairs and negative match training pairs are pairs of component IDs for which the respective concatenated feature vectors positively and negatively match, respectively, according to a data distance algorithm. The model training program may be further configured to train a machine learning (ML) model based on the positive match training data pairs and negative match training data pairs. The model training program may be further configured to output the node embeddings for each of the component IDs in the component graph from the trained ML model.
According to this aspect, there may be no Full Material Disclosures (FMD) data or missing FMD data for the target node in a source LCA dataset.
According to this aspect, the FMD data may be present for the substitute component ID in a modified LCA dataset, in which the LCA program may be fed the modified LCA dataset as input to produce the LCA result.
According to this aspect, the model training program may be further configured to generate the initial instance of the component graph based on the component hierarchical data by identifying a common component and linking a plurality of subgraphs of units of the assembled product via the common component.
According to this aspect, the positive match training data pairs may be obtained when a Jaccard distance is larger than a positive match threshold, and the negative match training data pairs may be obtained when a Jaccard distance is larger than a minimum threshold and less than a negative match threshold for the pair of component IDs.
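Threshold-based mining of match pairs from a Jaccard measure can be sketched as follows. The sketch uses Jaccard similarity over tokenized component properties, treating high-similarity pairs as positive matches and low-similarity pairs above a minimum as negatives; the threshold values, the token sets, and the use of similarity rather than distance are illustrative assumptions, one plausible reading of the scheme described above.

```python
# Hypothetical sketch of threshold-based training-pair mining using a
# Jaccard measure over tokenized component properties. Thresholds and
# tokens are illustrative.

from itertools import combinations

tokens = {
    "12345": {"dimm", "ddr4", "16gb"},
    "12346": {"dimm", "ddr4", "32gb"},
    "12347": {"capacitor", "smd"},
}

def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b)

def mine_pairs(tokens, pos_thresh=0.4, neg_min=0.0, neg_max=0.2):
    positives, negatives = [], []
    for x, y in combinations(sorted(tokens), 2):
        s = jaccard(tokens[x], tokens[y])
        if s > pos_thresh:
            positives.append((x, y))          # positive match training pair
        elif neg_min <= s < neg_max:
            negatives.append((x, y))          # negative match training pair
    return positives, negatives

pos, neg = mine_pairs(tokens)
print(pos)  # [('12345', '12346')] -- the two DRAM-like components match
```

The minimum threshold on the negative side serves to exclude degenerate pairs, so that negatives are dissimilar but still informative; positives may additionally be drawn from a ground truth replacement component data set, as noted in the following aspect.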
According to this aspect, the positive match training data pairs may further include the pairs of component IDs included in a ground truth replacement component data set.
According to this aspect, the processor may be further configured to display a graphical user interface (GUI) including a part sourcing tool that shows the substitute component ID with the LCA result, along with other components having similarity scores within a predetermined range relative to the substitute component ID and other LCA results for each component, and further including an order selector enabling a user to order the substitute component or one of the other components.
According to this aspect, the processor may be further configured to display a graphical user interface (GUI) including a data validity tool that identifies that the LCA result for the substitute component is an outlier as compared to other LCA scores for other components having similarity scores within a predetermined range with respect to the substitute component.
According to another aspect of the present disclosure, a computerized method for predicting substitute or missing parts of an assembled product is provided. According to this aspect, the computerized method may include generating, via a model training program, a component graph including a respective node for each of a plurality of component IDs of the assembled product, the component graph including node embeddings for each node, the node embeddings being computed such that nodes with similar features are closer in a node embedding space than nodes with dissimilar features. The computerized method may further include executing an embeddings comparison algorithm to compare embeddings of the component IDs to thereby identify a substitute component ID for a target node on the component graph.
According to this aspect, the computerized method may further include computing, via a life cycle analysis (LCA) program, an LCA result using an LCA algorithm based at least upon the substitute component ID.
According to this aspect, the computerized method may further include, to generate the component graph, receiving component data including component hierarchical data, component IDs, and meta data of properties of each of the component IDs. The computerized method may further include generating an initial instance of the component graph based on the component hierarchical data. The computerized method may further include generating a vector representation for each node of the component graph. The computerized method may further include generating tokenized features based on the component IDs and the meta data of properties of the components indicated by each of the component IDs. The computerized method may further include concatenating the vector representation for each node and the tokenized features to generate node-wise feature vectors for each of the component IDs.
According to this aspect, the computerized method may further include, to generate the component graph, generating a training data set by obtaining positive match training data pairs and negative match training data pairs for each of a plurality of pairs of the node-wise feature vectors, wherein the positive match training pairs and negative match training pairs are pairs of component IDs for which the respective concatenated feature vectors positively and negatively match, respectively, according to a data distance algorithm. The computerized method may further include training a machine learning (ML) model based on the positive match training data pairs and negative match training data pairs. The computerized method may further include outputting the node embeddings for each of the component IDs in the component graph from the trained ML model.
According to this aspect, the component graph may be generated based on the component hierarchical data by identifying a shared component and linking a plurality of subgraphs of units of the assembled product via the shared component.
According to this aspect, the positive match training data pairs may be obtained when a Jaccard distance is larger than a positive match threshold, and the negative match training data pairs may be obtained when a Jaccard distance is larger than a minimum threshold and less than a negative match threshold for the pair of component IDs.
According to this aspect, the positive match training data pairs may further include the pairs of component IDs included in a ground truth replacement component data set.
According to this aspect, there may be no Full Material Disclosures (FMD) data or missing FMD data for the target node in a source LCA dataset.
According to another aspect of the present disclosure, a computing system for predicting substitute or missing parts of an assembled product is provided. The computing system may include a processor having associated memory storing instructions that cause the processor to execute a model training program. The model training program may be configured to receive component data including component hierarchical data, component IDs, and meta data of properties of each of the component IDs. The model training program may be further configured to generate an initial instance of the component graph based on the component hierarchical data. The model training program may be further configured to generate tokenized features based on the component IDs and the meta data of properties of the components indicated by each of the component IDs. The model training program may be further configured to generate node-wise feature vectors for each of the component IDs based on the initial instance of the component graph and the tokenized features. The model training program may be further configured to generate a training data set by obtaining positive match training data pairs and negative match training data pairs for each of a plurality of pairs of the node-wise feature vectors, in which the positive match training pairs and negative match training pairs may be pairs of component IDs for which the respective concatenated feature vectors positively and negatively match, respectively, according to a data distance algorithm. The model training program may be further configured to train a machine learning (ML) model based on the positive match training data pairs and negative match training data pairs to output the node embeddings for each of the component IDs in the component graph from the trained ML model.
The model training program may be further configured to generate, via the trained ML model, the component graph including a respective node for each of a plurality of component IDs of the assembled product based on the component data, the component graph including node embeddings for each node, the node embeddings being computed such that nodes with similar features are closer in a node embedding space than nodes with dissimilar features. The processor may be further configured to execute an embeddings comparison algorithm to compare embeddings of the component IDs to thereby identify a substitute component ID for a target node on the graph. The processor may be further configured to execute a life cycle analysis (LCA) program configured to compute an LCA result using an LCA algorithm based at least upon the substitute component ID.
It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.
The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.