Advances in deep neural networks (DNNs) are driving the demand for domain-specific accelerators for data-intensive applications such as image classification and segmentation, voice recognition, and natural language processing. The ubiquitous application of DNNs has led to a rise in demand for custom artificial intelligence (AI) accelerators. Many such use-cases, including autonomous driving, require high reliability. Built-in self-test (BIST) can be used to enable power-on self-test to detect in-field failures. However, DNN inferencing applications such as image classification are inherently fault-tolerant with respect to structural faults; it has been shown that many faults are not functionally critical, i.e., they do not lead to any significant error in inferencing. As a result, conventional pseudo-random pattern generation targeting all faults with BIST is overkill. Therefore, it can be desirable to identify which nodes are critical for in-field testing to reduce overhead.
Functional fault testing is commonly performed during design verification of a circuit to determine how resistant a circuit architecture is to errors manifesting from manufacturing defects, aging, wear-out, and parametric variations in the circuit. Each node can be tested by manually injecting a fault to determine whether that node is functionally critical—in other words, whether it changes a terminal output (i.e., an output for the circuit architecture as a whole) for one or more terminal inputs (i.e., an input for the circuit architecture as a whole). Indeed, the functional criticality of a fault is determined by the severity of its impact on functional performance. A fault at a node determined to be critical can degrade circuit performance or, in certain cases, eliminate functionality. Fault simulation of an entire neural network hardware architecture to determine the critical nodes is computationally expensive—taking days, months, years, or longer—due to large models and input data size. Therefore, it is desirable to identify mechanisms that reduce the time and computational expense of evaluating fault criticality while maintaining the accuracy of the criticality evaluation. Brute-force fault simulation for determining fault criticality is computationally expensive due to the many potential fault sites in the accelerator array and the dependence of criticality characterization of processing elements (PEs) on the functional input data. Supervised learning techniques can be used to accurately estimate fault criticality, but they require ground truth for model training. The ground-truth collection involves extensive and computationally expensive fault simulations.
Therefore, there continues to be a need in the art for mechanisms to determine fault criticality of nodes in circuits such as for DNNs.
Fault criticality assessment using neural twins is provided. Techniques and systems are provided that can predict the criticality of faults with minimal ground-truth data from functional fault simulation.
A method of fault criticality assessment using neural twins includes converting a netlist into a neural twin by replacing each circuit element of the netlist with a neural-network-readable cell equivalent; and replacing each wire with a neural connection. Bias value adders are inserted at locations in the neural twin; and these bias value adders are used to apply a bias that represents a perturbation in the signal propagated by that connection. For each perturbed bias at a corresponding site selected to be perturbed, a loss value is calculated for the neural twin; and the site is classified, using a neural-twin-trained classifier, as critical or benign based on that loss value.
In some cases, the particular perturbed bias applied at each corresponding site selected to be perturbed involves determining a sign of a bias sensitivity computed for that selected site; if the sign is determined to be positive, applying a first bias corresponding to a stuck-at-one fault; and if the sign is determined to be negative, applying a second bias corresponding to a stuck-at-zero fault. By using the sign of bias sensitivity to determine the fault type for injection and simulation, it is possible to identify more critical fault locations using fewer fault simulation runs compared to a scenario where one injects both stuck-at fault types to determine which type results in a critical functional fault.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Fault criticality assessment using neural twins is provided. Techniques and systems are provided that can predict the criticality of faults with minimal ground-truth data from functional fault simulation.
Neural twins are a promising tool that can be used to approximate circuits and test them using far less computationally expensive hardware and methodologies. Furthermore, neural twins are end-to-end differentiable, support backpropagation, and enable fast inferencing. As the name implies, neural twins incorporate gate-level structural and functional information of the circuit to be tested to create approximations of that circuit that are more easily analyzed by computers, especially neural networks. Thus, a software simulator of a physical system is used to accelerate the analysis of the physical system.
The fault criticality assessment of structural faults using neural twins involves modeling of a neural twin for a particular circuit, modeling of the neural-network-readable cell equivalents used for neural twins, training of the neural twin for evaluating functional fault criticality, and training of the classifier used to classify fault criticality when evaluating faults using a neural twin.
It should be understood that a structural fault is considered functionally critical if the structural fault leads to functional failure. For example, a functional failure can be evaluated in terms of the fault's impact on inferencing accuracy (for the inferencing use-case). A fault can be deemed to be benign if the fault does not affect the inferencing accuracy (for the inferencing use-case).
A method of fault criticality assessment using neural twins can include selecting sites at a neural twin of a particular netlist for perturbation of bias by computing a bias sensitivity of potential fault sites and selecting a number of fault sites as the selected sites according to the computed bias sensitivity; applying the perturbation of bias to each selected site via a corresponding bias value adder located in the neural twin; calculating a loss value for the neural twin corresponding to the application of the perturbation of bias for each selected site; and classifying, using a neural-twin-trained classifier, a particular site of the selected sites as critical or benign based on the loss value from perturbing the bias at that site. The resulting sites that are classified as critical can then be used in test generation and fault simulation software programs. Indeed, the sites classified as critical nodes can be used for applications of automatic test pattern generation (ATPG), a design for test application (e.g., for BIST), and test point insertion.
The described fault criticality assessment can be used in generating fault testing schemes for any application target. That is, a variety of circuits and their use-cases can be evaluated. These circuits can include any processing architecture (e.g., a systolic array of processing units) and associated deep learning application(s), including those used for training and inferencing. Examples include deep neural networks for image classification and segmentation (with applications to autonomous driving, manufacturing automation, and medical diagnostics as some examples), regression, voice recognition, and natural language processing.
The described fault criticality assessment can identify predicted critical nodes and these predicted critical nodes can be used in creating testing methodologies to determine if a particular instance of the circuit architecture can be used in a certain application, especially in the context of circuit architectures for neural networks.
By identifying the critical nodes, the testing methodologies for fault testing can be applied to those nodes identified by the described fault criticality assessment. By determining where critical nodes exist, with further knowledge of what terminal outputs are necessary, a testing methodology can be created to ensure that the particular instance of the circuit architecture can be used for that certain application, as well as the extent to which testing must be performed (or the extent of on-chip infrastructure that needs to be added, such as for BIST). Testing can be useful both before deployment and after deployment to ensure continued functionality.
Advantageously, fewer computational resources (and corresponding time and/or chip area) are required to carry out functional fault testing.
For example, referring to
In the illustrative example, Full Adder 112, which is a combinational gate with multiple outputs, is converted/mapped to multiple neural-network-readable cell equivalents that share the same inputs as the original gate but that each have a different single output. For example, consider a Full Adder (S,CO)=FA(A,B,C) with inputs A, B, and C, and outputs S (sum) and CO (carry-out). The Full Adder 112 is replaced by two vertices (126, 128) in the graph 120 and corresponding neural-network-readable cell equivalents in the neural twin 130: S=FAs(A,B,C) and CO=FAco(A,B,C). Here, A, B, and C are inputs shared by both FAs 126 and FAco 128; S and CO are the outputs of FAs and FAco, respectively. FAs 126 and FAco 128 are replaced with FA-S-NET 136 and FA-CO-NET 138 in the neural twin 130.
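As a minimal illustration (not taken from the source), the two single-output functions can be expressed directly in code; each function would then be approximated by its own neural-network-readable cell equivalent:

```python
# Illustrative sketch: a two-output full adder split into two single-output
# functions that share the inputs A, B, and C, mirroring FAs and FAco above.
def fa_s(a: int, b: int, c: int) -> int:
    """Sum output S = A XOR B XOR C."""
    return a ^ b ^ c

def fa_co(a: int, b: int, c: int) -> int:
    """Carry-out output CO = majority(A, B, C)."""
    return (a & b) | (a & c) | (b & c)

# Sanity check over the full truth table: (CO, S) equals the binary sum A+B+C.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert fa_s(a, b, c) == (a + b + c) % 2
            assert fa_co(a, b, c) == (a + b + c) // 2
```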
Combinational logic of a two-input OR gate, a two-input XOR gate, a two-input NOR gate, and a two-input XNOR gate is also replaced by the neural-network-readable cell equivalents (vertices labeled in
Since the neural twin 130 is intended to model the combinational logic (e.g., combinational logic 102 and combinational logic 106), the flip-flops in the pipelining 104 are replaced with buffers and inverters during the netlist-to-neural twin conversion. In the illustrative embodiment, let (Q,QN)=DFF(D,clk) indicate a flop with inputs D and clk (clock), and outputs Q and QN. The flop DFF is replaced by a buffer Q=BUF(D) and an inverter QN=INV(D). Here, D is tied to the inputs of both BUF and INV; Q and QN are the outputs of BUF and INV, respectively. In this way, it can be ensured that no sequential elements are present in the neural twin 130. In the illustrative example, it can be seen that each flip-flop is replaced with both a buffer and an inverter when both of those outputs connect to another cell; however, if only the Q output is used to connect to another cell, the flip-flop is replaced by only a buffer (“BUF”). Similarly, if only the QN output is used to connect to another cell, the flip-flop would be replaced by just an inverter (“INV”).
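A hedged sketch of this replacement rule is shown below; the dictionary-based netlist representation and the function name are illustrative assumptions rather than the source's data structures:

```python
# Hypothetical netlist cells are represented as simple dicts for illustration.
def replace_dff(dff, q_used, qn_used):
    """Replace a D flip-flop (Q, QN) = DFF(D, clk) with combinational stand-ins."""
    replacements = []
    if q_used:   # Q drives downstream logic -> buffer Q = BUF(D)
        replacements.append({"type": "BUF", "in": dff["D"], "out": dff["Q"]})
    if qn_used:  # QN drives downstream logic -> inverter QN = INV(D)
        replacements.append({"type": "INV", "in": dff["D"], "out": dff["QN"]})
    return replacements  # the clock pin is dropped; no sequential element remains

# A flop with both outputs in use becomes a buffer plus an inverter.
print(replace_dff({"D": "n1", "Q": "n2", "QN": "n3"}, q_used=True, qn_used=True))
```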
The neural-network-readable cell equivalents used to replace the various gates and other components of the circuit 100 may be obtained from a library of neural-network-readable cell equivalents of standard cells. The library may be associated with the program performing the conversion. An example process of creating/modeling neural-network-readable cell equivalents (with a corresponding example implementation of a neural network representation) is described with respect to
Turning to
The neural twin 130 network architecture is based on the topology of the graph 120 such that there exists a one-to-one physical correspondence between each wire (or fault site) in the netlist of circuit 100 and a neural connection in the neural twin 130 network. A bias, applied at the bias value adders 145, is associated with every neural connection between two neural-network-readable cell equivalents.
Bias value adders 145 are used to modify electrical characteristics at a particular point and the bias applied at a bias value adder 145 represents a perturbation in the signal being propagated along a neural connection. As shown by the legend 150, the output signal Zk of a neural-network-readable cell equivalent k is summed at bias value adder 145-K with a corresponding bias, biask. Since the network is modeled to have Boolean logic functional behavior, an activation function is used to constrain the summed value between 0 and 1, given as clp(zk + biask).
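A minimal sketch of this behavior, assuming clp(·) clamps its argument to the range [0, 1] and using an illustrative function name, is:

```python
# Minimal sketch of a bias value adder; clp() is assumed to clamp to [0, 1].
def bias_value_adder(z_k: float, bias_k: float) -> float:
    """Sum the cell output z_k with its bias and constrain the result to [0, 1]."""
    return min(max(z_k + bias_k, 0.0), 1.0)

print(bias_value_adder(0.97, 0.0))   # fault-free: the signal passes essentially unchanged
print(bias_value_adder(0.97, -1.0))  # a bias of -1 forces the line to 0 (stuck-at-zero)
print(bias_value_adder(0.03, +1.0))  # a bias of +1 forces the line to 1 (stuck-at-one)
```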
The fault criticality analysis using the neural twin 130 is described with respect to
Method 200 includes inserting (208) bias value adders at locations in the neural twin. Bias value adders can be inserted, for example, at an output of each of the neural-network-readable cell equivalents. Bias values can be initialized by a training process of the neural twin network (e.g., with a fault-free dataflow). The resulting neural twin network can then be used for fault criticality assessment.
Indeed, the method 200 includes selecting (210) sites at the neural twin for perturbation of bias. Selecting (210) the sites at the neural twin for perturbation of bias can be accomplished by computing a bias sensitivity of potential fault sites and selecting a number of fault sites according to the computed bias sensitivity. For example, the output of every circuit element can be considered a potential fault site; and, once all the bias sensitivities are calculated, a predetermined number (or other criteria) of fault sites having the largest bias sensitivities can be selected as the sites for the perturbation of bias.
In an example implementation, a misclassification-driven training (MDT) process is used to calculate bias sensitivities and determine the sites to select. It should be understood that other methods, including various geometrical and statistical approaches, can be used to calculate bias sensitivities and determine the sites to select. For the example implementation described here, the MDT process can begin by considering a fault-free functional dataflow through the circuit across inferencing cycles for a given application workload (e.g., images for an image classifier circuit). The neural twin network performs the fault-free functional dataflow and an approximation-loss value is computed, referred to herein as a fault-free loss value. The fault-free loss value represents the loss of the system when all biases are set to 0 (i.e., no stuck-at-one or stuck-at-zero faults perturbed in the system). The system performing the MDT can receive the fault-free loss value generated for the fault-free neural twin. During processes performed with respect to the fault-free neural twin, bias values (biask of
The following pseudocode illustrates an example procedure for obtaining the biask values for a processing element circuit, where ϕntPE refers to the mathematical representation of the neural twin of a particular processing element circuit, bs is the batch size for the dataset of a given application workload, MSE refers to the mean-squared-error function, and yr,c,ips is the floating-point partial-sum output of the processing element (given as PE(r,c)) in the i-th inferencing cycle. In the procedure, the approximation-loss value is initialized to zero and accumulated over the workload as the mean-squared error MSE(ŷr,c,ips, yr,c,ips) between the partial-sum output ŷr,c,ips predicted by the neural twin and the floating-point partial-sum output yr,c,ips of each processing element (iterating c ← c + 1 over the array columns); while the accumulated approximation loss remains greater than zero, the bias values are adjusted and the loss is reset to zero for the next pass.
Accordingly, the system performing the MDT can also receive the bias values biask for each site. For each site in the neural twin network (e.g., corresponding to each bias adder), a bias sensitivity can be calculated by taking a gradient of the loss value with respect to the corresponding bias. As mentioned above, this bias sensitivity is used to select (210) the sites at the neural twin for perturbation of bias. For example, every site having a bias sensitivity calculated is a potential fault site and the potential fault sites can be ranked by an absolute value of the gradient corresponding to each of the potential fault sites. After ranking, one or more of the potential fault sites can be selected as the selected sites to test based on one or more criteria with respect to the ranked potential fault sites. Potential criteria that may be used include, but are not limited to, a certain number (e.g., predetermined) of potential fault sites (selected in ranked order from largest gradient/bias sensitivity to smallest), a certain percentage of potential fault sites (e.g., indicating the certain number selected in ranked order), and all potential fault sites with a bias sensitivity above a certain value/threshold.
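A hedged sketch of this gradient-based selection is shown below; the toy neural-twin forward pass, the use of PyTorch autograd, and the top-k criterion are illustrative assumptions rather than the implementation described above:

```python
import torch

torch.manual_seed(0)
num_sites, k = 64, 8

# Stand-in workload inputs and reference outputs (e.g., the floating-point
# partial sums of the real circuit); both are random placeholders here.
inputs = torch.rand(128, num_sites)
y_reference = torch.rand(128)

# Stand-in neural twin: a tiny random network whose per-connection biases play
# the role of the bias value adders; 0 corresponds to the fault-free setting.
weights = torch.rand(num_sites)
biases = torch.zeros(num_sites, requires_grad=True)

y_predicted = torch.sigmoid((inputs + biases) @ weights)       # biases perturb each connection
loss = torch.nn.functional.mse_loss(y_predicted, y_reference)  # approximation-loss value
loss.backward()                                                # d(loss)/d(bias_k) for every site

bias_sensitivity = biases.grad                                 # one sensitivity per potential fault site
ranking = torch.argsort(bias_sensitivity.abs(), descending=True)
selected_sites = ranking[:k].tolist()                          # e.g., keep the k most sensitive sites
print(selected_sites)
```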
For each selected site (e.g., of the potential fault sites), a perturbation of bias is applied (212). The perturbation of bias is applied in a manner to maximize the loss value. In one implementation, the bias applied at a particular site is based on the sign of the bias sensitivity computed for the site. For example, the bias is applied according to the sign of the bias sensitivity. That is, to maximize the loss value for a perturbed bias, the sign indicates whether the stuck-at-one fault or the stuck-at-zero fault would result in a larger loss value. Accordingly, applying (212) the perturbation of bias to each selected site comprises, for each selected site, determining a sign of a bias sensitivity computed for that selected site; if the sign is determined to be positive, applying a first bias corresponding to a stuck-at-one fault; and if the sign is determined to be negative, applying a second bias corresponding to a stuck-at-zero fault. By using the sign of bias sensitivity to determine the fault type for injection and simulation, it is possible to identify more critical fault locations using fewer fault simulation runs compared to a scenario where one injects both stuck-at fault types to determine which type results in a critical functional fault. For example, the sign of the bias sensitivity for each site can be determined; and when the sign is negative, the bias is set to −1 and when the sign is positive, the bias is set to +1. As explained with respect to
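A minimal sketch of this sign rule (using the −1/+1 bias convention just described; the function name is illustrative) is:

```python
def bias_for_perturbation(bias_sensitivity: float) -> float:
    """Pick the stuck-at bias expected to maximize the loss at a site."""
    # Positive sensitivity: increasing the bias increases the loss, so inject a
    # stuck-at-one fault (bias +1); otherwise inject stuck-at-zero (bias -1).
    return 1.0 if bias_sensitivity > 0 else -1.0

print(bias_for_perturbation(+0.37))  # 1.0  -> stuck-at-one perturbation
print(bias_for_perturbation(-0.12))  # -1.0 -> stuck-at-zero perturbation
```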
Using the bias configured as described above (e.g., for a stuck-at-one or a stuck-at-zero fault), the method includes calculating (214) a loss value for the neural twin corresponding to the application of the perturbation of bias for each selected site. Based on the loss value, a selected site can be classified (216) as either critical or benign using a neural-twin-trained classifier. The neural-twin-trained classifier can be, for example, a decision tree (DT). Training of a decision tree implementation can be performed with methods as described in
In some cases, the neural twin can be further refined during operation. That is, the bias values and the loss values can be updated not just as part of the original training of the neural twin, but also as a result of or after the selection process of potential fault sites.
Once ground-truth is established, a neural network representation (e.g., the particular neural twin) can be trained on the labeled ground-truth data set, including the set 406 of data that notates sites that contain benign faults and the set 408 of data that notates sites that contain critical faults. The particular neural twin can have faults perturbed at one or more known fault sites, and sites from the set 406 of data that notates sites that contain benign faults can have faults perturbed (410) to establish (412) a maximum loss associated with benign faults, or an upper limit for fault tolerance that is considered benign. Likewise, sites from the set 408 of data that notates sites that contain critical faults can have biases perturbed (414) to establish (416) a minimum loss associated with critical faults, or a lower limit of fault tolerance that is considered critical. For a given fault site, both a stuck-at-zero fault and a stuck-at-one fault are injected. Inferencing by the neural twin is used to obtain stuck-at-zero loss values across a workload and stuck-at-one loss values across the workload. For a stuck-at-zero fault, a −1 can be input to the bias value adder at a particular site. Similarly, for a stuck-at-one fault, a 1 can be input to the bias value adder at a particular site.
Biases can be perturbed in the particular neural twin using the bias value adders as described in
Mathematically, this process can be described as follows. The error values are calculated for stuck-at-zero (s-a-0) and stuck-at-one (s-a-1) faults injected at location l in the neural twin (i.e., biasl).
The error eF for fault F ∈ {s-a-0, s-a-1} is calculated using:
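The equation itself appears in the referenced figure; one plausible form, under the assumption that eF is the workload-accumulated approximation loss of the neural twin with the corresponding bias perturbation applied (an illustrative assumption, not the source's exact definition), would be:

$$ e_{F} = \sum_{i} \mathrm{MSE}\left(\hat{y}_{r,c,i}^{\,ps}\Big|_{\,bias_{l}=b_{F}},\; y_{r,c,i}^{\,ps}\right), \qquad b_{\text{s-a-0}} = -1,\quad b_{\text{s-a-1}} = +1, $$

where the sum runs over the inferencing cycles of the workload.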
The larger of the s-a-0 and s-a-1 errors is recorded as the input feature of the decision tree-based criticality classifier. The corresponding label is the ground-truth criticality c of the location l, where c=+1 (−1) indicates the fault location l is critical (benign). The DT training procedure attempts to obtain decision thresholds that can appropriately classify the training data set. A fault is classified using the decision tree based on different decision thresholds and is eventually mapped to a leaf node in the tree, which indicates the fault criticality (benign or critical). Multiple decision thresholds are required by the decision tree model in order to learn the non-linear class boundary separating the benign and critical faults. In the simulated example, the decision tree model was implemented as a classification and regression tree (CART).
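A hedged sketch of this classifier training using scikit-learn's CART-based DecisionTreeClassifier is shown below; the feature and label arrays are illustrative placeholders for values that would be obtained from neural-twin fault injections and ground-truth fault simulation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # CART implementation

# Placeholder training data: for each labeled fault location, the feature is
# max(e_sa0, e_sa1) from the neural-twin fault injections, and the label is the
# ground-truth criticality c (+1 critical, -1 benign) from fault simulation.
e_sa0 = np.array([0.02, 0.31, 0.05, 0.44, 0.01, 0.27])
e_sa1 = np.array([0.04, 0.12, 0.55, 0.09, 0.03, 0.61])
features = np.maximum(e_sa0, e_sa1).reshape(-1, 1)   # one input feature per location
labels = np.array([-1, +1, +1, +1, -1, +1])          # ground-truth criticality labels

classifier = DecisionTreeClassifier(max_depth=3).fit(features, labels)

# Classifying new fault locations from their neural-twin loss-based errors:
print(classifier.predict([[0.40]]))   # likely classified critical (+1)
print(classifier.predict([[0.02]]))   # likely classified benign (-1)
```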
The pseudocode for the classifier training is given as follows, where DT refers to the decision tree, each pair (l, c) in the criticality ground truth provides a fault location l and its label c, and biasl is the bias applied at location l.
In the example implementation of
ϕcell(zin) = σsig(A3 σsig(A2 σsig(A1 zin + b1) + b2) + b3)
where zin denotes the binary input vector of size rz; A1, A2, and A3 are affine transformations of size 4×rz, 4×4, and 1×4 respectively; b1, b2, and b3 are bias vectors of the multi-layer perceptron network. The Sigmoid activation function present at the output of each neuron in this example implementation of a neural-network-readable cell equivalent is denoted by σsig(⋅). The sigmoid activation function (σsig) constrains the output to be a value from 0 to 1.
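A hedged PyTorch sketch of such a cell-equivalent network is shown below; the layer sizes follow the 4×rz, 4×4, and 1×4 affine transformations described above, while the function name and usage are illustrative assumptions:

```python
import torch
import torch.nn as nn

def make_cell_equivalent(r_z: int) -> nn.Sequential:
    """Three-layer perceptron phi_cell with a sigmoid after every affine layer."""
    return nn.Sequential(
        nn.Linear(r_z, 4),  # A1 z_in + b1, with A1 of size 4 x r_z
        nn.Sigmoid(),
        nn.Linear(4, 4),    # A2 (.) + b2, with A2 of size 4 x 4
        nn.Sigmoid(),
        nn.Linear(4, 1),    # A3 (.) + b3, with A3 of size 1 x 4
        nn.Sigmoid(),       # constrains the output to a value between 0 and 1
    )

# A two-input cell equivalent (r_z = 2) evaluated on the binary input vector [1, 1].
cell_net = make_cell_equivalent(r_z=2)
print(cell_net(torch.tensor([[1.0, 1.0]])))  # untrained output, some value in (0, 1)
```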
The input vector zin is composed of the binary inputs to the standard cell. When z1 and z2 are binary inputs to a standard cell, zin=[z1, z2]. The size of the input vector, rz, is equal to the number of input ports of the standard cell being modeled. For example, when the cell being modeled is an AND2 gate 500, as shown in
Turning to
A neural-network-readable cell equivalent can be trained by any suitable method using a set of Boolean inputs as features and a single Boolean output as a label. The training can also include noisy data—random noise can be added. For example, as shown in
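A hedged sketch of such training for a two-input AND cell equivalent is shown below; the noise level, optimizer settings, and loss choice are illustrative assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# AND2 truth table: Boolean inputs as features, the single Boolean output as label.
truth_table = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
and2_labels = torch.tensor([[0.], [0.], [0.], [1.]])

# Replicate the truth table and jitter the inputs with small random noise so the
# trained cell equivalent tolerates slightly non-binary signal values.
features = truth_table.repeat(256, 1) + 0.05 * torch.randn(1024, 2)
targets = and2_labels.repeat(256, 1)

cell_net = nn.Sequential(nn.Linear(2, 4), nn.Sigmoid(),
                         nn.Linear(4, 4), nn.Sigmoid(),
                         nn.Linear(4, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(cell_net.parameters(), lr=0.05)

for _ in range(500):  # simple full-batch training loop
    optimizer.zero_grad()
    loss = nn.functional.binary_cross_entropy(cell_net(features), targets)
    loss.backward()
    optimizer.step()

print(cell_net(truth_table).round().squeeze())  # expected: tensor([0., 0., 0., 1.])
```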
For example, system 700 includes a processor 705 (e.g., CPU, GPU, FPGA) that processes data according to instructions of various software programs, including software instructions 710 for performing fault criticality assessment using neural twins as described herein, stored in memory 715.
Memory 715 can be one or more of any suitable computer-readable storage medium including, but not limited to, volatile memory such as random-access memories (RAM, DRAM, SRAM); non-volatile memory such as flash memory, various read-only-memories (ROM, PROM, EPROM, EEPROM), phase change memory, magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM), and magnetic and optical storage devices (hard drives, magnetic tape, CDs, DVDs). As used herein, in no case does the memory 715 consist of transitory propagating signals.
As mentioned above, memory 715 can store instructions 710 for performing fault criticality assessment using neural twins as described herein. Instructions 710 may include instructions for method 200 described with respect to
System 700 may also include a radio/network interface 725 that performs the function of transmitting and receiving radio frequency communications. The radio/network interface 725 facilitates wireless connectivity between system 700 and the “outside world,” via a communications carrier or service provider. The radio/network interface 725 allows system 700 to communicate with other computing devices, including server computing devices and other client devices, over a network.
In various implementations, data/information used by and/or stored via the system 700 may include data caches stored locally on the device or the data may be stored on any number of storage media that may be accessed by the device via the radio/network interface 725 or via a wired connection between the device and a separate computing device associated with the device, for example, a server computer in a distributed computing network, such as the Internet. As should be appreciated, such data/information may be accessed through the device via the radio/network interface 725 or a distributed computing network. Similarly, such data/information may be readily transferred between computing devices for storage and use according to well-known data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.
System 700 can also include user interface system 730, which may include input and output devices and/or interfaces such as for audio, video, touch, mouse, and keyboard. Visual output can be provided via a display that may present graphical user interface (“GUI”) elements, text, images, video, notifications, virtual buttons, virtual keyboards, circuit layout, and any other information that is capable of being presented in a visual form.
System 700 can further include a neural network module 735. The neural network module 735 can include a dedicated processor and memory or use the processor 705 and memory 715. The memory of the neural network module can include code for execution of training methods as well as weights and models used by the neural network. The code for execution of training methods as well as weights and models used by the neural network can also be stored in the memory 715 for the rest of the system 700.
Accordingly, embodiments of the subject invention may be implemented as a computer process, a computing system, or as an article of manufacture, such as a computer program product or computer-readable storage medium. Certain embodiments of the invention contemplate the use of a machine in the form of a computer system within which a set of instructions, when executed, can cause the system to perform any one or more of the methodologies discussed above, including providing a software tool or a set of software tools that can be used during the physical design and test pattern generation of integrated circuits and/or printed circuit boards and/or system level design. The set of instructions for the software tool can be stored on a computer program product, which may be one or more computer readable storage media readable by a computer system and encoding a computer program including the set of instructions and other data associated with the software tool.
By way of example, and not limitation, computer-readable storage media may include volatile and non-volatile memory, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Examples of computer-readable storage media include volatile memory such as random-access memories (RAM, DRAM, SRAM); non-volatile memory such as flash memory, various read-only-memories (ROM, PROM, EPROM, EEPROM), phase change memory, magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM), and magnetic and optical storage devices (hard drives, magnetic tape, CDs, DVDs). As used herein, in no case does the term “storage media” consist of transitory propagating signals.
Although the subject matter has been described in language specific to structural features and/or acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as examples of implementing the claims and other equivalent features and acts are intended to be within the scope of the claims.