Advances in deep neural networks (DNNs) are driving the demand for domain-specific accelerators for data-intensive applications such as image classification and segmentation, voice recognition, and natural language processing. The ubiquitous application of DNNs has led to a rise in demand for custom artificial intelligence (AI) accelerators. Many such use-cases, including autonomous driving, require high reliability. Built-in self-test (BIST) can be used to enable power-on self-test to detect in-field failures. However, DNN inferencing applications such as image classification are inherently fault-tolerant with respect to structural faults; it has been shown that many faults are not functionally critical, i.e., they do not lead to any significant error in inferencing. As a result, conventional pseudo-random pattern generation targeting all faults with BIST is overkill. Therefore, it can be desirable to identify which nodes are critical for in-field testing to reduce overhead.
Functional fault testing is commonly performed during design verification of a circuit to determine how resistant a circuit architecture is to errors manifesting from manufacturing defects, aging, wear-out, and parametric variations in the circuit. Each node can be tested by manually injecting a fault to determine whether that node is functionally critical—in other words, whether it changes a terminal output (i.e., an output for the circuit architecture as a whole) for one or more terminal inputs (i.e., an input for the circuit architecture as a whole). Indeed, the functional criticality of a fault is determined by the severity of its impact on functional performance. A fault at a node determined to be critical can degrade circuit performance or, in certain cases, eliminate functionality. Fault simulation of an entire neural network hardware architecture to determine the critical nodes is computationally expensive—taking days, months, years, or longer—due to large models and input data size. Therefore, it is desirable to identify mechanisms that reduce the time and computational expense of evaluating fault criticality while maintaining the accuracy of the criticality evaluation. Brute-force fault simulation for determining fault criticality is computationally expensive due to the many potential fault sites in the accelerator array and the dependence of criticality characterization of processing elements (PEs) on the functional input data. Supervised learning techniques can be used to accurately estimate fault criticality, but they require ground truth for model training. The ground-truth collection involves extensive and computationally expensive fault simulations.
Therefore, there continues to be a need in the art for mechanisms to determine fault criticality of nodes in circuits such as for DNNs.
Fault criticality assessment using neural twins is provided. Techniques and systems are provided that can predict the criticality of faults with minimal ground-truth data from functional fault simulation.
A method of fault criticality assessment using neural twins includes converting a netlist into a neural twin by replacing each circuit element of the netlist with a neural-network-readable cell equivalent; and replacing each wire with a neural connection. Bias value adders are inserted at locations in the neural twin; and these bias value adders are used to apply a bias that represents a perturbation in the signal propagated by that connection. For each perturbed bias at a corresponding site selected to be perturbed, a loss value is calculated for the neural twin; and the site is classified, using a neural-twin-trained classifier, as critical or benign based on that loss value.
In some cases, the particular perturbed bias applied at each corresponding site selected to be perturbed involves determining a sign of a bias sensitivity computed for that selected site; if the sign is determined to be positive, applying a first bias corresponding to a stuck-at-one fault; and if the sign is determined to be negative, applying a second bias corresponding to a stuck-at-zero fault. By using the sign of bias sensitivity to determine the fault type for injection and simulation, it is possible to identify more critical fault locations using fewer fault simulation runs compared to a scenario where one injects both stuck-at fault types to determine which type results in a critical functional fault.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Fault criticality assessment using neural twins is provided. Techniques and systems are provided that can predict the criticality of faults with minimal ground-truth data from functional fault simulation.
Neural twins are a promising tool that can be used to approximate circuits and test them using far less computationally expensive hardware and methodologies. Furthermore, neural twins are end-to-end differentiable, support backpropagation, and enable fast inferencing. As the name implies, neural twins incorporate gate-level structural and functional information of the circuit to be tested to create approximations of that circuit that are more easily analyzed by computers, especially neural networks. Thus, a software simulator of a physical system is used to accelerate the analysis of the physical system.
The fault criticality assessment of structural faults using neural twins involves modeling of a neural twin for a particular circuit, modeling of the neural-network-readable cell equivalents used for neural twins, training of the neural twin for evaluating functional fault criticality, and training of the classifier used to classify fault criticality when evaluating faults using a neural twin.
It should be understood that a structural fault is considered functionally critical if the structural fault leads to functional failure. For example, a functional failure can be evaluated in terms of the fault's impact on inferencing accuracy (for the inferencing use-case). A fault can be deemed to be benign if the fault does not affect the inferencing accuracy (for the inferencing use-case).
A method of fault criticality assessment using neural twins can include selecting sites at a neural twin of a particular netlist for perturbation of bias by computing a bias sensitivity of potential fault sites and selecting a number of fault sites as the selected sites according to the computed bias sensitivity; applying the perturbation of bias to each selected site via a corresponding bias value adder located in the neural twin; calculating a loss value for the neural twin corresponding to the application of the perturbation of bias for each selected site; and classifying, using a neural-twin-trained classifier, a particular site of the selected sites as critical or benign based on the loss value from perturbing the bias at that site. The resulting sites that are classified as critical can then be used in test generation and fault simulation software programs. Indeed, the sites classified as critical nodes can be used for applications of automatic test pattern generation (ATPG), a design for test application (e.g., for BIST), and test point insertion.
The described fault criticality assessment can be used in generating fault testing schemes for any application target. That is, a variety of circuits and their use-cases can be evaluated. These circuits can include any processing architecture (e.g., a systolic array of processing units) and associated deep learning application(s), including those used for training and inferencing. Examples include deep neural networks for image classification and segmentation (with applications to autonomous driving, manufacturing automation, and medical diagnostics as some examples), regression, voice recognition, and natural language processing.
The described fault criticality assessment can identify predicted critical nodes and these predicted critical nodes can be used in creating testing methodologies to determine if a particular instance of the circuit architecture can be used in a certain application, especially in the context of circuit architectures for neural networks.
By identifying the critical nodes, the testing methodologies for fault testing can be applied to those nodes identified by the described fault criticality assessment. By determining where critical nodes exist, with further knowledge of what terminal outputs are necessary, a testing methodology can be created to ensure that the particular instance of the circuit architecture can be used for that certain application, as well as the extent to which testing must be performed (or the extent of on-chip infrastructure that needs to be added, such as for BIST). Testing can be useful both before deployment and after deployment to ensure continued functionality.
Advantageously, fewer computational resources (and corresponding time and/or chip area) are required to carry out functional fault testing.
For example, referring to
In the illustrative example, Full Adder 112, which is a combinational gate with multiple outputs, is converted/mapped to multiple neural-network-readable cell equivalents that share the same inputs as the original gate but that each have a different single output. For example, consider a Full Adder (S,CO)=FA(A,B,C) with inputs A, B, and C, and outputs S (sum) and CO (carry-out). The Full Adder 112 is replaced by two vertices (126, 128) in the graph 120 and corresponding neural-network-readable cell equivalents in the neural twin 130: S=FAs(A,B,C) and CO=FAco(A,B,C). Here, A, B, and C are inputs shared by both FAs 126 and FAco 128; S and CO are the outputs of FAs and FAco, respectively. FAs 126 and FAco 128 are replaced with FA-S-NET 136 and FA-CO-NET 138 in the neural twin 130.
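As a minimal illustration (not taken from the source), the two single-output functions can be expressed directly in code; each function would then be approximated by its own neural-network-readable cell equivalent:

```python
# Illustrative sketch: a two-output full adder split into two single-output
# functions that share the inputs A, B, and C, mirroring FAs and FAco above.
def fa_s(a: int, b: int, c: int) -> int:
    """Sum output S = A XOR B XOR C."""
    return a ^ b ^ c

def fa_co(a: int, b: int, c: int) -> int:
    """Carry-out output CO = majority(A, B, C)."""
    return (a & b) | (a & c) | (b & c)

# Sanity check over the full truth table: (CO, S) equals the binary sum A+B+C.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert fa_s(a, b, c) == (a + b + c) % 2
            assert fa_co(a, b, c) == (a + b + c) // 2
```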
Combinational logic of a two-input OR gate, a two-input XOR gate, a two-input NOR gate, and a two-input XNOR gate is also replaced by the neural-network-readable cell equivalents (vertices labeled in
Since the neural twin 130 is intended to model the combinational logic (e.g., combinational logic 102 and combinational logic 106), the flip-flops in the pipelining 104 are replaced with buffers and inverters during the netlist-to-neural twin conversion. In the illustrative embodiment, let (Q,QN)=DFF(D,clk) indicate a flop with inputs D and clk (clock), and outputs Q and QN. The flop DFF is replaced by a buffer Q=BUF(D) and an inverter QN=INV(D). Here, D is tied to the inputs of both BUF and INV; Q and QN are the outputs of BUF and INV, respectively. In this way, it can be ensured that no sequential elements are present in the neural twin 130. In the illustrative example, it can be seen that each flip-flop is replaced with both a buffer and an inverter when both of those outputs connect to another cell; however, if only the Q output is used to connect to another cell, the flip-flop is replaced by only a buffer (“BUF”). Similarly, if only the QN output is used to connect to another cell, the flip-flop would be replaced by just an inverter (“INV”).
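A hedged sketch of this replacement rule is shown below; the dictionary-based netlist representation and the function name are illustrative assumptions rather than the source's data structures:

```python
# Hypothetical netlist cells are represented as simple dicts for illustration.
def replace_dff(dff, q_used, qn_used):
    """Replace a D flip-flop (Q, QN) = DFF(D, clk) with combinational stand-ins."""
    replacements = []
    if q_used:   # Q drives downstream logic -> buffer Q = BUF(D)
        replacements.append({"type": "BUF", "in": dff["D"], "out": dff["Q"]})
    if qn_used:  # QN drives downstream logic -> inverter QN = INV(D)
        replacements.append({"type": "INV", "in": dff["D"], "out": dff["QN"]})
    return replacements  # the clock pin is dropped; no sequential element remains

# A flop with both outputs in use becomes a buffer plus an inverter.
print(replace_dff({"D": "n1", "Q": "n2", "QN": "n3"}, q_used=True, qn_used=True))
```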
The neural-network-readable cell equivalents used to replace the various gates and other components of the circuit 100 may be obtained from a library of neural-network-readable cell equivalents of standard cells. The library may be associated with the program performing the conversion. An example process of creating/modeling neural-network-readable cell equivalents (with a corresponding example implementation of a neural network representation) is described with respect to
Turning to
The neural twin 130 network architecture is based on the topology of the graph 120 such that there exists a one-to-one physical correspondence between each wire (or fault site) in the netlist of circuit 100 and a neural connection in the neural twin 130 network. A bias, applied at the bias value adders 145, is associated with every neural connection between two neural-network-readable cell equivalents.
Bias value adders 145 are used to modify electrical characteristics at a particular point and the bias applied at a bias value adder 145 represents a perturbation in the signal being propagated along a neural connection. As shown by the legend 150, the output signal Zk of a neural-network-readable cell equivalent k is summed at bias value adder 145-K with a corresponding bias, biask. Since the network is modeled to have Boolean logic functional behavior, an activation function is used to constrain the summed value between 0 and 1, given as clp(zk + biask).
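A minimal sketch of this behavior, assuming clp(·) clamps its argument to the range [0, 1] and using an illustrative function name, is:

```python
# Minimal sketch of a bias value adder; clp() is assumed to clamp to [0, 1].
def bias_value_adder(z_k: float, bias_k: float) -> float:
    """Sum the cell output z_k with its bias and constrain the result to [0, 1]."""
    return min(max(z_k + bias_k, 0.0), 1.0)

print(bias_value_adder(0.97, 0.0))   # fault-free: the signal passes essentially unchanged
print(bias_value_adder(0.97, -1.0))  # a bias of -1 forces the line to 0 (stuck-at-zero)
print(bias_value_adder(0.03, +1.0))  # a bias of +1 forces the line to 1 (stuck-at-one)
```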
The fault criticality analysis using the neural twin 130 is described with respect to
Method 200 includes inserting (208) bias value adders at locations in the neural twin. Bias value adders can be inserted, for example, at an output of each of the neural-network-readable cell equivalents. Bias values can be initialized by a training process of the neural twin network (e.g., with a fault-free dataflow). The resulting neural twin network can then be used for fault criticality assessment.
Indeed, the method 200 includes selecting (210) sites at the neural twin for perturbation of bias. Selecting (210) the sites at the neural twin for perturbation of bias can be accomplished by computing a bias sensitivity of potential fault sites and selecting a number of fault sites according to the computed bias sensitivity. For example, the output of every circuit element can be considered a potential fault site; and, once all the bias sensitivities are calculated, a predetermined number (or other criteria) of fault sites having the largest bias sensitivities can be selected as the sites for the perturbation of bias.
In an example implementation, a misclassification-driven training (MDT) process is used to calculate bias sensitivities and determine the sites to select. It should be understood that other methods, including various geometrical and statistical approaches, can be used to calculate bias sensitivities and determine the sites to select. For the example implementation described here, the MDT process can begin by considering a fault-free functional dataflow through the circuit across inferencing cycles for a given application workload (e.g., images for an image classifier circuit). The neural twin network performs the fault-free functional dataflow and an approximation-loss value is computed, referred to herein as a fault-free loss value. The fault-free loss value represents the loss of the system when all biases are set to 0 (i.e., no stuck-at-one or stuck-at-zero faults perturbed in the system). The system performing the MDT can receive the fault-free loss value generated for the fault-free neural twin. During processes performed with respect to the fault-free neural twin, bias values (biask of
The following pseudocode illustrates an example procedure for obtaining the biask values for a processing element circuit, where ϕntPE refers to the mathematical representation of the neural twin of a particular processing element circuit, bs is the batch size for the dataset of a given application workload, MSE refers to the mean-squared-error function, and yr,c,ips is the floating-point partial-sum output of the processing element (given as PE(r,c)) in the i-th inferencing cycle. In the procedure, the approximation-loss value is initialized to zero and accumulated over the workload as the mean-squared error MSE(ŷr,c,ips, yr,c,ips) between the partial-sum output ŷr,c,ips predicted by the neural twin and the floating-point partial-sum output yr,c,ips of each processing element (iterating c ← c + 1 over the array columns); while the accumulated approximation loss remains greater than zero, the bias values are adjusted and the loss is reset to zero for the next pass.
Accordingly, the system performing the MDT can also receive the bias values biask for each site. For each site in the neural twin network (e.g., corresponding to each bias adder), a bias sensitivity can be calculated by taking a gradient of the loss value with respect to the corresponding bias. As mentioned above, this bias sensitivity is used to select (210) the sites at the neural twin for perturbation of bias. For example, every site having a bias sensitivity calculated is a potential fault site and the potential fault sites can be ranked by an absolute value of the gradient corresponding to each of the potential fault sites. After ranking, one or more of the potential fault sites can be selected as the selected sites to test based on one or more criteria with respect to the ranked potential fault sites. Potential criteria that may be used include, but are not limited to, a certain number (e.g., predetermined) of potential fault sites (selected in ranked order from largest gradient/bias sensitivity to smallest), a certain percentage of potential fault sites (e.g., indicating the certain number selected in ranked order), and all potential fault sites with a bias sensitivity above a certain value/threshold.
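A hedged sketch of this gradient-based selection is shown below; the toy neural-twin forward pass, the use of PyTorch autograd, and the top-k criterion are illustrative assumptions rather than the implementation described above:

```python
import torch

torch.manual_seed(0)
num_sites, k = 64, 8

# Stand-in workload inputs and reference outputs (e.g., the floating-point
# partial sums of the real circuit); both are random placeholders here.
inputs = torch.rand(128, num_sites)
y_reference = torch.rand(128)

# Stand-in neural twin: a tiny random network whose per-connection biases play
# the role of the bias value adders; 0 corresponds to the fault-free setting.
weights = torch.rand(num_sites)
biases = torch.zeros(num_sites, requires_grad=True)

y_predicted = torch.sigmoid((inputs + biases) @ weights)       # biases perturb each connection
loss = torch.nn.functional.mse_loss(y_predicted, y_reference)  # approximation-loss value
loss.backward()                                                # d(loss)/d(bias_k) for every site

bias_sensitivity = biases.grad                                 # one sensitivity per potential fault site
ranking = torch.argsort(bias_sensitivity.abs(), descending=True)
selected_sites = ranking[:k].tolist()                          # e.g., keep the k most sensitive sites
print(selected_sites)
```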
For each selected site (e.g., of the potential fault sites), a perturbation of bias is applied (212). The perturbation of bias is applied in a manner to maximize the loss value. In one implementation, the bias applied at a particular site is based on the sign of the bias sensitivity computed for the site. For example, the bias is applied according to the sign of the bias sensitivity. That is, to maximize the loss value for a perturbed bias, the sign indicates whether the stuck-at-one fault or the stuck-at-zero fault would result in a larger loss value. Accordingly, applying (212) the perturbation of bias to each selected site comprises, for each selected site, determining a sign of a bias sensitivity computed for that selected site; if the sign is determined to be positive, applying a first bias corresponding to a stuck-at-one fault; and if the sign is determined to be negative, applying a second bias corresponding to a stuck-at-zero fault. By using the sign of bias sensitivity to determine the fault type for injection and simulation, it is possible to identify more critical fault locations using fewer fault simulation runs compared to a scenario where one injects both stuck-at fault types to determine which type results in a critical functional fault. For example, the sign of the bias sensitivity for each site can be determined; and when the sign is negative, the bias is set to −1 and when the sign is positive, the bias is set to +1. As explained with respect to
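A minimal sketch of this sign rule (using the −1/+1 bias convention just described; the function name is illustrative) is:

```python
def bias_for_perturbation(bias_sensitivity: float) -> float:
    """Pick the stuck-at bias expected to maximize the loss at a site."""
    # Positive sensitivity: increasing the bias increases the loss, so inject a
    # stuck-at-one fault (bias +1); otherwise inject stuck-at-zero (bias -1).
    return 1.0 if bias_sensitivity > 0 else -1.0

print(bias_for_perturbation(+0.37))  # 1.0  -> stuck-at-one perturbation
print(bias_for_perturbation(-0.12))  # -1.0 -> stuck-at-zero perturbation
```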
Using the bias configured as described above (e.g., for a stuck-at-one or a stuck-at-zero fault), the method includes calculating (214) a loss value for the neural twin corresponding to the application of the perturbation of bias for each selected site. Based on the loss value, a selected site can be classified (216) as either critical or benign using a neural-twin-trained classifier. The neural-twin-trained classifier can be, for example, a decision tree (DT). Training of a decision tree implementation can be performed with methods as described in
In some cases, the neural twin can be further refined during operation. That is, the bias values and the loss values can be updated not just as part of the original training of the neural twin, but also as a result of or after the selection process of potential fault sites.
Once ground-truth is established, a neural network representation (e.g., the particular neural twin) can be trained on the labeled ground-truth data set, including the set 406 of data that notates sites that contain benign faults and the set 408 of data that notates sites that contain critical faults. The particular neural twin can have faults perturbed at one or more known fault sites, and sites from the set 406 of data that notates sites that contain benign faults can have faults perturbed (410) to establish (412) a maximum loss associated with benign faults, or an upper limit for fault tolerance that is considered benign. Likewise, sites from the set 408 of data that notates sites that contain critical faults can have biases perturbed (414) to establish (416) a minimum loss associated with critical faults, or a lower limit of fault tolerance that is considered critical. For a given fault site, both a stuck-at-zero fault and a stuck-at-one fault are injected. Inferencing by the neural twin is used to obtain stuck-at-zero loss values across a workload and stuck-at-one loss values across the workload. For a stuck-at-zero fault, a −1 can be input to the bias value adder at a particular site. Similarly, for a stuck-at-one fault, a 1 can be input to the bias value adder at a particular site.
Biases can be perturbed in the particular neural twin using the bias value adders as described in
Mathematically, this process can be described as follows. The error values are calculated for stuck-at-zero (s-a-0) and stuck-at-one (s-a-1) faults injected at location l in the neural twin (i.e., biasl).
The error eF for fault F ∈ {s-a-0, s-a-1} is calculated using:
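The equation itself appears in the referenced figure; one plausible form, under the assumption that eF is the workload-accumulated approximation loss of the neural twin with the corresponding bias perturbation applied (an illustrative assumption, not the source's exact definition), would be:

$$ e_{F} = \sum_{i} \mathrm{MSE}\left(\hat{y}_{r,c,i}^{\,ps}\Big|_{\,bias_{l}=b_{F}},\; y_{r,c,i}^{\,ps}\right), \qquad b_{\text{s-a-0}} = -1,\quad b_{\text{s-a-1}} = +1, $$

where the sum runs over the inferencing cycles of the workload.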
The larger of the s-a-0 and s-a-1 errors is recorded as the input feature of the decision tree-based criticality classifier. The corresponding label is the ground-truth criticality c of the location l, where c=+1 (−1) indicates the fault location l is critical (benign). The DT training procedure attempts to obtain decision thresholds that can appropriately classify the training data set. A fault is classified using the decision tree based on different decision thresholds and is eventually mapped to a leaf node in the tree, which indicates the fault criticality (benign or critical). Multiple decision thresholds are required by the decision tree model in order to learn the non-linear class boundary separating the benign and critical faults. In the simulated example, the decision tree model was implemented as a classification and regression tree (CART).
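A hedged sketch of this classifier training using scikit-learn's CART-based DecisionTreeClassifier is shown below; the feature and label arrays are illustrative placeholders for values that would be obtained from neural-twin fault injections and ground-truth fault simulation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # CART implementation

# Placeholder training data: for each labeled fault location, the feature is
# max(e_sa0, e_sa1) from the neural-twin fault injections, and the label is the
# ground-truth criticality c (+1 critical, -1 benign) from fault simulation.
e_sa0 = np.array([0.02, 0.31, 0.05, 0.44, 0.01, 0.27])
e_sa1 = np.array([0.04, 0.12, 0.55, 0.09, 0.03, 0.61])
features = np.maximum(e_sa0, e_sa1).reshape(-1, 1)   # one input feature per location
labels = np.array([-1, +1, +1, +1, -1, +1])          # ground-truth criticality labels

classifier = DecisionTreeClassifier(max_depth=3).fit(features, labels)

# Classifying new fault locations from their neural-twin loss-based errors:
print(classifier.predict([[0.40]]))   # likely classified critical (+1)
print(classifier.predict([[0.02]]))   # likely classified benign (-1)
```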
The pseudocode for the classifier training is given as follows, where DT refers to the decision tree, each pair (l, c) in the criticality ground truth provides a fault location l and its label c, and biasl is the bias applied at location l.
In the example implementation of
ϕcell(zin) = σsig(A3 σsig(A2 σsig(A1 zin + b1) + b2) + b3)
where zin denotes the binary input vector of size rz; A1, A2, and A3 are affine transformations of size 4×rz, 4×4, and 1×4 respectively; b1, b2, and b3 are bias vectors of the multi-layer perceptron network. The Sigmoid activation function present at the output of each neuron in this example implementation of a neural-network-readable cell equivalent is denoted by σsig(⋅). The sigmoid activation function (σsig) constrains the output to be a value from 0 to 1.
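A hedged PyTorch sketch of such a cell-equivalent network is shown below; the layer sizes follow the 4×rz, 4×4, and 1×4 affine transformations described above, while the function name and usage are illustrative assumptions:

```python
import torch
import torch.nn as nn

def make_cell_equivalent(r_z: int) -> nn.Sequential:
    """Three-layer perceptron phi_cell with a sigmoid after every affine layer."""
    return nn.Sequential(
        nn.Linear(r_z, 4),  # A1 z_in + b1, with A1 of size 4 x r_z
        nn.Sigmoid(),
        nn.Linear(4, 4),    # A2 (.) + b2, with A2 of size 4 x 4
        nn.Sigmoid(),
        nn.Linear(4, 1),    # A3 (.) + b3, with A3 of size 1 x 4
        nn.Sigmoid(),       # constrains the output to a value between 0 and 1
    )

# A two-input cell equivalent (r_z = 2) evaluated on the binary input vector [1, 1].
cell_net = make_cell_equivalent(r_z=2)
print(cell_net(torch.tensor([[1.0, 1.0]])))  # untrained output, some value in (0, 1)
```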
The input vector zin is composed of the binary inputs to the standard cell. When z1 and z2 are binary inputs to a standard cell, zin=[z1, z2]. The size of the input vector, rz, is equal to the number of input ports of the standard cell being modeled. For example, when the cell being modeled is an AND2 gate 500, as shown in
Turning to
A neural-network-readable cell equivalent can be trained by any suitable method using a set of Boolean inputs as features and a single Boolean output as a label. The training can also include noisy data—random noise can be added. For example, as shown in
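A hedged sketch of such training for a two-input AND cell equivalent is shown below; the noise level, optimizer settings, and loss choice are illustrative assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# AND2 truth table: Boolean inputs as features, the single Boolean output as label.
truth_table = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
and2_labels = torch.tensor([[0.], [0.], [0.], [1.]])

# Replicate the truth table and jitter the inputs with small random noise so the
# trained cell equivalent tolerates slightly non-binary signal values.
features = truth_table.repeat(256, 1) + 0.05 * torch.randn(1024, 2)
targets = and2_labels.repeat(256, 1)

cell_net = nn.Sequential(nn.Linear(2, 4), nn.Sigmoid(),
                         nn.Linear(4, 4), nn.Sigmoid(),
                         nn.Linear(4, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(cell_net.parameters(), lr=0.05)

for _ in range(500):  # simple full-batch training loop
    optimizer.zero_grad()
    loss = nn.functional.binary_cross_entropy(cell_net(features), targets)
    loss.backward()
    optimizer.step()

print(cell_net(truth_table).round().squeeze())  # expected: tensor([0., 0., 0., 1.])
```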
For example, system 700 includes a processor 705 (e.g., CPU, GPU, FPGA) that processes data according to instructions of various software programs, including software instructions 710 for performing fault criticality assessment using neural twins as described herein, stored in memory 715.
Memory 715 can be one or more of any suitable computer-readable storage medium including, but not limited to, volatile memory such as random-access memories (RAM, DRAM, SRAM); non-volatile memory such as flash memory, various read-only-memories (ROM, PROM, EPROM, EEPROM), phase change memory, magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM), and magnetic and optical storage devices (hard drives, magnetic tape, CDs, DVDs). As used herein, in no case does the memory 715 consist of transitory propagating signals.
As mentioned above, memory 715 can store instructions 710 for performing fault criticality assessment using neural twins as described herein. Instructions 710 may include instructions for method 200 described with respect to
System 700 may also include a radio/network interface 725 that performs the function of transmitting and receiving radio frequency communications. The radio/network interface 725 facilitates wireless connectivity between system 700 and the “outside world,” via a communications carrier or service provider. The radio/network interface 725 allows system 700 to communicate with other computing devices, including server computing devices and other client devices, over a network.
In various implementations, data/information used by and/or stored via the system 700 may include data caches stored locally on the device or the data may be stored on any number of storage media that may be accessed by the device via the radio/network interface 725 or via a wired connection between the device and a separate computing device associated with the device, for example, a server computer in a distributed computing network, such as the Internet. As should be appreciated, such data/information may be accessed through the device via the radio/network interface 725 or a distributed computing network. Similarly, such data/information may be readily transferred between computing devices for storage and use according to well-known data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.
System 700 can also include user interface system 730, which may include input and output devices and/or interfaces such as for audio, video, touch, mouse, and keyboard. Visual output can be provided via a display that may present graphical user interface (“GUI”) elements, text, images, video, notifications, virtual buttons, virtual keyboards, circuit layout, and any other information that is capable of being presented in a visual form.
System 700 can further include a neural network module 735. The neural network module 735 can include a dedicated processor and memory or use the processor 705 and memory 715. The memory of the neural network module can include code for execution of training methods as well as weights and models used by the neural network. The code for execution of training methods as well as weights and models used by the neural network can also be stored in the memory 715 for the rest of the system 700.
Accordingly, embodiments of the subject invention may be implemented as a computer process, a computing system, or as an article of manufacture, such as a computer program product or computer-readable storage medium. Certain embodiments of the invention contemplate the use of a machine in the form of a computer system within which a set of instructions, when executed, can cause the system to perform any one or more of the methodologies discussed above, including providing a software tool or a set of software tools that can be used during the physical design and test pattern generation of integrated circuits and/or printed circuit boards and/or system level design. The set of instructions for the software tool can be stored on a computer program product, which may be one or more computer readable storage media readable by a computer system and encoding a computer program including the set of instructions and other data associated with the software tool.
By way of example, and not limitation, computer-readable storage media may include volatile and non-volatile memory, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Examples of computer-readable storage media include volatile memory such as random-access memories (RAM, DRAM, SRAM); non-volatile memory such as flash memory, various read-only-memories (ROM, PROM, EPROM, EEPROM), phase change memory, magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM), and magnetic and optical storage devices (hard drives, magnetic tape, CDs, DVDs). As used herein, in no case does the term “storage media” consist of transitory propagating signals.
Although the subject matter has been described in language specific to structural features and/or acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as examples of implementing the claims and other equivalent features and acts are intended to be within the scope of the claims.