The general field is artificial intelligence methods using deep learning. More specifically, the embodiments address the ability to explain DNN model decisions, which are particularly relevant to decision-making in key areas such as imaging and detection.
Artificial Intelligence (AI) methods can achieve incredible performance when learning to solve increasingly complex computational problems in many areas impacting human society. Current and future AI-powered systems are so sophisticated that many require almost no human intervention in arriving at their decisions, e.g., Deep Learning (DL) models such as Deep Neural Networks (DNN) comprised of many layers and parameters. How these systems arrive at their decisions is becoming less transparent, or more black-box-like, which becomes a barrier to the usefulness and acceptance of AI systems. This is especially true when the binary computed decisions directly impact human lives, such as in medicine, law, defense/security, transportation, and banking.
State-of-the-art Convolutional Neural Network (CNN) models provide a gold standard for AI systems in performing computer vision tasks, such as image classification, object detection, semantic segmentation, and instance segmentation. By way of example, CNN neural net activation map-based techniques have been researched alongside other attribution-based network approaches to enhance the resilience, interpretability, and adversarial attack defense in the underlying deep learning decision making process, e.g., deep k-nearest neighbors. For a detailed description of various historical approaches to explaining AI, including activation map-based techniques, see the article by Arrieta et al., Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI, Information Fusion 58 (2020) pages 82-118, which is incorporated herein by reference for all of its teachings. However, such research provides a proof of concept and does not address the real-world application of such capability.
Accordingly, before we can practically apply AI methods, there is a need for understanding how such decisions are made. The ability to explain DNN model decisions, which are particularly relevant to decision-making in key areas such as imaging and detection, is needed.
In a first non-limiting embodiment, a process for explaining deep neural network (DNN) decisions includes: attaching an activation map analysis (AMA) system to the DNN for employing a calibration process and an inferencing process, wherein the calibration process includes: feeding the DNN a calibration dataset (C), extracting, by an extraction module of the AMA system, baseline activation maps at each layer of the DNN for the calibration dataset, wherein the baseline activation maps are stored as activation map vectors, reducing dimensionality of the activation map vectors using a dimensionality reduction process, calculating a calibration non-conformity for each element of the calibration dataset to determine a calibration non-conformity dataset (A); and the inferencing process includes: feeding the DNN a query input (z) of dataset (Q), extracting, by an extraction module of the AMA system, query activation maps at each layer of the DNN for the query dataset, wherein the query activation maps are stored as activation map vectors, calculating a query non-conformity for each element of the query dataset to determine a query non-conformity dataset (B); and calculating by a specified-metric module a fraction value of the calibration non-conformity dataset that have higher non-conformity than a threshold established by the query non-conformity dataset, wherein the fraction value is used to determine at least one of a credibility score CS(z), a prediction label PL(z) and confidence CF(z) in one or more of the DNN decisions.
In a second non-limiting embodiment, a process for determining DNN decision credibility includes: attaching an activation map analysis (AMA) system to the DNN for employing a calibration process and an inferencing process, wherein the calibration process includes: feeding the DNN a calibration dataset (C), extracting, by an extraction module of the AMA system, baseline activation maps of at least one layer of the DNN for the calibration dataset, encoding the baseline activation maps using one or more binary HDC component vectors to generate at least one calibration hypervector, calculating a calibration non-conformity for each element of the calibration dataset, including the at least one calibration hypervector, to determine a calibration non-conformity dataset (A); and the inferencing process includes: feeding the DNN a query input (z) of dataset (Q), extracting, by an extraction module of the AMA system, query activation maps of at least one layer of the DNN for the query dataset, encoding the query activation maps using one or more binary HDC component vectors to generate at least one query hypervector, calculating a query non-conformity for each element of the query dataset, including the at least one query hypervector, to determine a query non-conformity dataset (B); and calculating by a specified-metric module a fraction value of the calibration non-conformity dataset that have higher non-conformity than a threshold established by the query non-conformity dataset, wherein the fraction value is used to determine a credibility score CS(z) for the DNN decision.
In a third non-limiting embodiment, a system for explaining deep neural network (DNN) decisions includes: a host network deploying a deep neural network (DNN); and an activation map analysis toolkit including an extraction module, a dimension reduction module and a non-conformity module; wherein the activation map analysis toolkit communicates with the host network to output use-case specific metrics which explain DNN decisions.
Example embodiments will become more fully understood from the detailed description given herein below and the accompanying drawings, wherein like elements are represented by like reference characters, which are given by way of illustration only and thus are not limitative of the example embodiments herein.
Activation Map Analysis (AMA) provides a system of multiple components to mature the neural net activation map-based analysis for solving the real-world scaling problems in explainable artificial intelligence (XAI). The AMA system attaches to an existing deep neural network (host network) as an observer that analyzes the internal activities of the host network to provide additional information, e.g., metrics for resilience, interpretability, and adversarial defense, to the end-user. A first baseline embodiment of the AMA system for resilience is developed using deep k-nearest neighbor (DkNN) coupled with principal component analysis (PCA) for data dimensionality reduction. The DkNN algorithm is described in Papernot et al., Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning, arXiv:1803.04765v1 [cs.LG] 13 Mar. 2018 (hereafter “Papernot”), which is incorporated herein by reference in its entirety. The second embodiment of an improved AMA system described herein uses Hyperdimensional Computing (HDC) to encode activation map data of the host for data reduction, which replaces the PCA component in the first embodiment.
Originally described in Kanerva, P., “Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors.” Cognitive Computation, vol. 1, no. 2, June 2009, HDC is a computational paradigm focused on mimicking the human brain's robustness, randomness, and holistic representation of information using high dimensional vectors (e.g., 10,000 elements) as stores of information, known as hyperdimensional vectors or “hypervectors”. In AI systems, HDC encodes input data into hypervectors and performs all core computations in-memory. It retains previous information in the memory, and processes new information based on the data and scenarios stored in the memory. It mimics human brain activity by directly searching for similar neural activity patterns previously encountered, i.e., the nearest hypervectors in the memory closely associated with the new input data. Although using hypervectors may seem counterintuitive, HDC can provide the benefits of a fast, lightweight, and efficient AI system for supervised learning tasks, especially on edge devices where there are restrictions on data storage and hardware resources. As the second embodiment of the AMA system, HDC replaces PCA and encodes the activation map data of the host network for data dimensionality reduction. With HDC, we can address two major weaknesses in using PCA within AMA that limit its scalability: 1) memory demand, and 2) online model updates. The PCA-based AMA requires a large PCA component matrix to be loaded into memory for AMA operations to take place, which limits its scalability. Furthermore, the PCA-based AMA requires updating the PCA component matrix when the host network model is augmented with new classes, which limits its ability for online learning in edge applications.
This application of HDC within AMA contrasts with the traditional application of HDC encoding; that is, HDC encoding is traditionally used to encode the input feature space of data, e.g., the raw pixel values of an image. In contrast, HDC within AMA is used to encode, for the purposes of data aggregation and dimensionality reduction, the complex feature space of the activation map data that is a result of passing the input image through the host network.
As described further herein, the following HDC encoding schemes have been developed: 1) each activation map in a single layer of the host network is encoded as a hypervector, 2) the hypervectors representing every activation map in a single layer of the host network are bundled into a single hypervector, and 3) all activation maps in all layers of the host network are bundled into a single hypervector, which we call a meta-hypervector.
Based on the characteristics of the host network activation map layers, the developed HDC encoding schemes may either 1) directly apply HDC to encode the raw pixel values, positions, and the layer indices of the activation maps for those layers with low spatial information and high semantic information, or 2) pre-process the activation maps to produce global image descriptors (GID) and apply HDC to encode the GID for those layers with high spatial information and low semantic information.
Accordingly, the improved AMA system using HDC provides a robust encoding of the complex activation map feature space, distilling a hypervector for each layer of the host CNN with lower memory use and faster speed. Furthermore, the use of HDC within AMA maintains performance comparable to that obtained using PCA on the activation map data for dimensionality reduction (discussed further below). Additionally, HDC within AMA provides the conduit for additional detection and analytic processing, e.g., finding recurring anomalies for continuous learning and providing model interpretability as an input data sample progresses through the host network using hypervector representations.
In accordance with the embodiments herein,
The systems and processes are implemented on one or more processors, i.e., an Intel 8 core CPU, 256 GB RAM, and an NVIDIA Tesla T4 GPU with 16 GB VRAM. The AMA modules are implemented in Python using various libraries, i.e., NumPy, Scikit-Learn, Scikit-Image, and OpenCV; the host networks may be implemented in PyTorch or TensorFlow/Keras; and the HDC encoding schemes are implemented using CuPy, a GPU/CUDA implementation of NumPy.
In a first embodiment of AMA shown in
As each activation map vector in the NBAM normally has very large dimensions, data dimension reduction techniques are invoked to reduce the dimensionality of vectors in NBAM. In this first embodiment, Principal Component Analysis (PCA) is applied as a data reduction technique, 34. After applying the PCA to the NBAM 34, the components of the NBAM are obtained as P and they are stored in the files on hard drive storage 35. To reduce the dimension for NBAM, each activation map vector in NBAM is projected 36 to the space spanned by the first n principal components of NBAM 36, which forms the PCA representation for NBAM. The principal components of NBAM are stored in the files on hard drive storage 37.
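The projection step described above can be illustrated with a small, dependency-free sketch. It assumes the NBAM is a short list of equal-length activation map vectors and approximates the first n principal components by power iteration with deflation. The function names `pca_components` and `project` are hypothetical and are not part of the described system, which would more plausibly rely on a library routine for vectors of realistic size.

```python
import math
import random


def pca_components(vectors, n_components, iters=200, seed=0):
    """Approximate the top principal components by power iteration with deflation."""
    rng = random.Random(seed)
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    centered = [[v[k] - mean[k] for k in range(d)] for v in vectors]
    # Sample covariance matrix (d x d); only practical for small d.
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    components = []
    for _ in range(n_components):
        vec = [rng.gauss(0.0, 1.0) for _ in range(d)]
        start_norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / start_norm for x in vec]
        for _ in range(iters):
            nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:  # no variance left in the deflated matrix
                break
            vec = [x / norm for x in nxt]
        eigval = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(d))
                     for a in range(d))
        components.append(vec)
        # Deflate: remove the variance explained by this component.
        cov = [[cov[a][b] - eigval * vec[a] * vec[b] for b in range(d)]
               for a in range(d)]
    return mean, components


def project(vector, mean, components):
    """Project one activation map vector onto the stored principal components."""
    return [sum((vector[k] - mean[k]) * comp[k] for k in range(len(mean)))
            for comp in components]


# Toy baseline: four 2-D "activation map vectors" lying on a line.
baseline = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean, comps = pca_components(baseline, n_components=1)
reduced = project([3.0, 3.0], mean, comps)
```

The mean and the component list play the role of the stored PCA components P, and `project` is applied in the same way to baseline, calibration, and query vectors.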
Next, calculate the baseline non-conformity, which represents how well each element in C (or training set Γ) conforms with the baseline activation maps 33. This is a multistep process. The calibration dataset 38 is fed into the host CNN 31. The activation map extraction module 32 produces the calibration activation maps 39 (CLAM). With the CLAM 39 and PCA components 35 as inputs, the projection module 40 produces the principal components of CLAM 41. Next, using similarity hashing, find the K-nearest neighbors 42 of each element of CLAM 41 in the NBAM principal components 37 for each element of C, x: Γ={K-nearest neighbors of x in the normal baseline}.
From the k-nearest neighbors found in 37 for each element of CLAM 41, which are stored in memory 43, create the label set 44 for x (a multi-set): $\Omega_\lambda = \{Y_i : i \in \Gamma\}$, where λ is the layer index, and $Y_i$ is the class label for the i-th nearest neighbor. From the label sets stored in memory 45, calculate the calibration non-conformity 46: $\alpha(x,y) = \sum_{\lambda \in 1 \ldots l} |\{i \in \Omega_\lambda : i \neq y\}|$ for each element of C, and the non-conformity set of the calibration dataset is $A = \{\alpha(x,y) : (x,y) \in C\}$. The calibration non-conformity score set A 47 is stored in the files on a hard drive.
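As a minimal sketch of the non-conformity calculation, the following counts, over all layers, the nearest neighbors whose class label differs from a candidate label. The function names and the toy labels are illustrative assumptions; the neighbor labels per layer are assumed to be available already from the nearest-neighbor search.

```python
from collections import Counter


def label_multiset(neighbor_labels):
    """Omega_lambda: multiset of class labels of the k nearest neighbors at one layer."""
    return Counter(neighbor_labels)


def nonconformity(neighbor_labels_per_layer, label):
    """alpha(x, label): neighbors, summed over layers, whose label differs from `label`."""
    total = 0
    for layer_labels in neighbor_labels_per_layer:
        omega = label_multiset(layer_labels)
        total += sum(count for lab, count in omega.items() if lab != label)
    return total


# Hypothetical example: 2 layers, k = 3 neighbors per layer.
neighbors = [["cat", "cat", "dog"], ["cat", "dog", "dog"]]
alpha_cat = nonconformity(neighbors, "cat")  # 1 + 2 disagreeing neighbors
alpha_dog = nonconformity(neighbors, "dog")  # 2 + 1 disagreeing neighbors
```

For a calibration element (x, y) the function is called once with the true label y; for a query it is called once per candidate class.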
The non-conformity set A serves as a reference non-conformity pattern against which the non-conformity of any other query input is compared to determine whether the query input is an in-distribution or out-of-distribution sample. If the query sample is out-of-distribution, then the host CNN output may not be trustworthy. Having built the foundations for deciding the trustworthiness of a CNN model prediction, we can proceed to the inferencing phase (
In the inferencing phase (
Next, find the K-nearest neighbors 42 for each element of principal components of QRAM 51 in the NBAM principal components 37 for each element of C, x: Γ={K-nearest neighbors of x in the normal baseline}. The k-nearest neighbors found in 37 for each element of QRAM 51 are in the memory 52. Create the label set 53 for each x (a multi-set): $\Omega_\lambda = \{Y_i : i \in \Gamma\}$, where λ is the layer index, and $Y_i$ is the class label for the i-th nearest neighbor. The label sets $\Omega_\lambda$ are in memory 54. Calculate the query non-conformity 55: $B = \{\alpha(z,j) = \sum_{\lambda \in 1 \ldots l} |\{i \in \Omega_\lambda : i \neq j\}| : j \in 1 \ldots n\}$ for each element of Q 48. The query non-conformity score dataset B 56 is in memory.
Once the non-conformity score of the QRAM 56 is available, and the non-conformity set of the calibration set 47, $A = \{\alpha(x,y) : (x,y) \in C\}$, is retrieved, the module of user specified metrics 57 calculates the fraction of A 58 that has higher non-conformity values than the threshold, $\alpha(z,j) \in B$, for each class j,
It can be observed from the above equation that the larger $\alpha(z,j)$ is, the smaller $P_j(z)$ is. In other words, higher non-conformity of the query sample leads to lower $P_j(z)$. Based on 58, the module of user specified metrics 59 calculates the credibility score (2), class label prediction (3), and prediction confidence (4), and the results are stored in memory 60. The credibility score, CS(z), is the trustworthiness of the prediction. The prediction label (PL(z)), confidence (CF(z)), and the credibility scores (CS(z)) are expressed as [Papernot et al., Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning, arXiv: 1803.04765v1]:
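A hedged sketch of these metrics follows. It assumes the DkNN formulation of the cited Papernot paper, in which the predicted label is the class with the largest fraction, the credibility is that fraction, and the confidence is one minus the second-largest fraction. The strict comparison mirrors the wording "higher non-conformity" used here; some formulations also count ties, so the handling of equal scores is an assumption, as are the function names.

```python
def empirical_p_values(calibration_scores, query_scores):
    """For each class j, fraction of A whose non-conformity exceeds alpha(z, j)."""
    n = len(calibration_scores)
    return {j: sum(1 for a in calibration_scores if a > alpha_zj) / n
            for j, alpha_zj in query_scores.items()}


def summarize(p_values):
    """Return (prediction label, credibility, confidence) from the per-class fractions."""
    ranked = sorted(p_values, key=p_values.get, reverse=True)
    prediction = ranked[0]
    credibility = p_values[prediction]
    # Confidence is one minus the best competing class; 0.0 if there is none.
    runner_up = p_values[ranked[1]] if len(ranked) > 1 else 0.0
    confidence = 1.0 - runner_up
    return prediction, credibility, confidence


# Toy calibration set A and query non-conformity set B (one score per class).
A = [0, 1, 2, 3, 4, 5, 6, 7]
B = {"cat": 1, "dog": 5}
p = empirical_p_values(A, B)
label, cs, cf = summarize(p)
```

A query that conforms poorly to every class yields small fractions for all classes and therefore a low credibility, which is the signal used to flag a possibly untrustworthy prediction.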
The data reduction part of the first embodiment of AMA requires the calculation of the PCA components of the covariance matrix of NBAM, which has very large memory requirements and has limited utility when applied to real-world host CNN networks.
In the second embodiment of the AMA, the data reduction technique PCA is replaced by HDC encoding. Activation maps are encoded by HDC methods as hypervectors to reduce the data dimensionality. As in the first embodiment, there are two phases: calibration (
In the calibration phase (
As each activation map vector in the NBAM normally has very large dimensions, data dimension reduction techniques are invoked to reduce the dimensionality of vectors in NBAM. In this embodiment, hypervectors representing the activation maps are calculated by Hyperdimensional Computing (HDC) 61. To encode NBAM, a set of HDC component vectors 63 is pre-computed and stored in the files on the hard drives. The HDC component vectors 63 are binary hypervectors that represent the different attributes of an activation map, i.e., pixel value, pixel position, and layer position within the host network. These are generated by randomly sampling their vector elements from a Bernoulli distribution with probability p=0.5. The NBAM are encoded with the HDC component vectors as their basis. The resulting vector HV_NBAM is stored in the files on hard drives 62.
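The following sketch illustrates one way such an encoding could work. The text specifies only that the component vectors are binary and sampled with p=0.5; the use of XOR for binding, bitwise majority vote for bundling, quantized pixel levels as indices, and all function names are assumptions drawn from common binary HDC practice rather than from this description. Hypervectors are packed into Python integers for brevity.

```python
import random

DIM = 10_000  # hypervector length; 10,000 elements is the example size given earlier


def random_hypervector(rng, dim=DIM):
    """Binary hypervector with each bit drawn from Bernoulli(p=0.5), packed in an int."""
    return rng.getrandbits(dim)


def bundle(hypervectors, dim=DIM):
    """Bitwise majority vote over a list of hypervectors (ties resolve to 0)."""
    counts = [0] * dim
    for hv in hypervectors:
        for bit in range(dim):
            counts[bit] += (hv >> bit) & 1
    half = len(hypervectors) / 2
    return sum(1 << bit for bit in range(dim) if counts[bit] > half)


def encode_activation_map(pixels, value_hvs, position_hvs, layer_hv, dim=DIM):
    """Bind value, position and layer hypervectors per pixel (XOR), then bundle.

    `pixels` holds quantized pixel levels used as indices into `value_hvs`.
    """
    bound = [value_hvs[v] ^ position_hvs[pos] ^ layer_hv
             for pos, v in enumerate(pixels)]
    return bundle(bound, dim)


rng = random.Random(0)
value_hvs = [random_hypervector(rng) for _ in range(4)]     # 4 quantized levels
position_hvs = [random_hypervector(rng) for _ in range(9)]  # 3x3 activation map
layer_hv = random_hypervector(rng)
hv_map = encode_activation_map([0, 1, 2, 3, 0, 1, 2, 3, 0],
                               value_hvs, position_hvs, layer_hv)
```

The same `bundle` operation could combine the per-map hypervectors of a layer into a layer hypervector, and the layer hypervectors into a meta-hypervector.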
There are two approaches for encoding activation maps of different layers. One approach directly performs HDC encoding on the activation map pixel values, positions, and layer indices of activation maps without any preprocessing. A second approach is to first perform some pre-processing calculations on the activation maps and produce some feature vectors. The HDC is then performed on the feature vectors resulting from the pre-processing, which is explained in more detail later.
Next, calculate the baseline non-conformity, which represents how well each element in C conforms with the NBAM. Start by feeding the calibration dataset 38 into the host network 31. The activation map extraction module 32 produces the calibration activation maps 39 (CLAM). With the calibration activation maps 39 and HDC components 63 as inputs, the HDC encoding module 64 produces the hypervectors for CLAM, HV_CLAM 65.
Use Hamming distance to find the K-nearest neighbors 66 of each element of HV_CLAM 65 in the HV_NBAM 62 for each element of C, x: Γ={K-nearest neighbors of x in the normal baseline}. The k-nearest neighbors of each element of HV_CLAM 65 are stored in the memory 67.
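A minimal sketch of the Hamming-distance neighbor search is given below, assuming bit-packed binary hypervectors stored as Python integers and a brute-force scan of the baseline. The names `hamming` and `k_nearest` and the toy 4-bit vectors are illustrative only.

```python
def hamming(a, b):
    """Hamming distance between two bit-packed binary hypervectors."""
    return bin(a ^ b).count("1")


def k_nearest(query_hv, baseline, k):
    """Return the labels of the k baseline hypervectors closest to the query.

    `baseline` is a list of (hypervector, class_label) pairs, e.g. HV_NBAM entries.
    """
    ranked = sorted(baseline, key=lambda item: hamming(query_hv, item[0]))
    return [label for _, label in ranked[:k]]


# Toy baseline with 4-bit hypervectors and two classes.
toy_baseline = [(0b1111, "a"), (0b1110, "a"), (0b0000, "b"), (0b0001, "b")]
nearest_labels = k_nearest(0b1111, toy_baseline, k=2)
```

The returned labels form the label multi-set for one layer, which feeds directly into the non-conformity calculation.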
Create the label set 68 for x (a multi-set): $\Omega_\lambda = \{Y_i : i \in \Gamma\}$, where λ is the layer index, and $Y_i$ is the class label for the i-th nearest neighbor. The label sets are stored in memory 69. Calculate the calibration non-conformity of each HV_CLAM 70: $\alpha(x,y) = \sum_{\lambda \in 1 \ldots l} |\{i \in \Omega_\lambda : i \neq y\}|$ for each element of C. The non-conformity set of the calibration set is $A = \{\alpha(x,y) : (x,y) \in C\}$, and is stored in the files on a hard drive 71.
In the inferencing phase (
Next, the QRAM is HDC encoded 64 as HV_QRAM 74. Using Hamming distance, find the K-nearest neighbors 66 of each element of HV_QRAM 74 in the HV_NBAM 62 for each element of C, x: Γ={K-nearest neighbors of x in the normal baseline}. The k-nearest neighbors of each element are stored in the memory 75. Label set 76 is created for each x (a multi-set): $\Omega_\lambda = \{Y_i : i \in \Gamma\}$, where λ is the layer index. The label sets are stored in memory 77. The non-conformity 78: $B = \{\alpha(z,j) = \sum_{\lambda \in 1 \ldots l} |\{i \in \Omega_\lambda : i \neq j\}| : j \in 1 \ldots n\}$ for each element of Q is calculated. The non-conformity score set B is stored in memory 79.
Once the non-conformity scores of the HV_QRAM 79 are available, and the non-conformity set of calibration, $A = \{\alpha(x,y) : (x,y) \in C\}$ 71, is retrieved, the AMA with HDC encoding follows the first embodiment to calculate the fraction of A that has higher non-conformity values than the threshold (equation 1), $\alpha(z,j) \in B$, for each class j, which results in 81. Based on 81, the module of user specified metrics 59 calculates the credibility score (equation 2), class label prediction, and prediction confidence, and the results are in memory 82.
There are three configurations of the AMA credibility calculation implemented using HDC: single-layer credibility, which calculates the credibility using the hypervectors from a single selected layer; multi-layer credibility, which calculates the credibility using the hypervectors from a set of selected layers; and meta-layer credibility, which calculates the credibility using the meta-hypervectors, which represent the activation maps of all layers of the host network.
The use of HDC within AMA maintains performance comparable to that obtained using PCA on the activation map data for dimensionality reduction. To illustrate this, we compare both versions of AMA (PCA vs. HDC) where the host network is a ResNet-101 CNN trained on only 13 classes of the ImageNet dataset. By way of example only, credibility calculations were determined using the following exemplary layers and activation map feature data:
a. Residual Block 1 (Layer F5): 256×56×56, ~800k features
b. Residual Block 2 (Layer F6): 512×28×28, ~400k features
c. Residual Block 3 (Layer F7): 1024×14×14, ~200k features
d. Residual Block 4 (Layer F8): 2048×7×7, ~100k features
e. Final Hidden Layer (Layer F9): 2048 features
With reference to the process of
Table 1 shows the storage size of the HDC components, and the encoding times required at each layer of the host network. The calculated time from HDC encoding to credibility calculation is approximately 150 seconds. Table 2 shows the memory and time complexity between the PCA and HDC versions of AMA. This illustrates the computational complexity required to complete the feature dimensionality reduction 14 module for the 3900 images in the calibration set storage 18. A significant memory reduction is achieved with the HDC version of AMA.
Further to the exemplary embodiment of
The Vector of Locally Aggregated Descriptors (VLAD) operation descriptions are shown in
for each image img_p in the calibration set, over all p:
    for each layer layer_{p,q} in the host net, over all q:
        for each activation map am_{p,q,r} in the layer, over all r:
            find all keypoints kp_{p,q,r,s}, over all s
            for each keypoint kp_{p,q,r,s} in the activation map, over all s:
                find the corresponding descriptor ldpt_{p,q,r,s}    (5)
In the more detailed operations of
Repeat the above operations of finding local descriptors over all calibration images $\{img_p : img_p \in X\}$ to form the set of all descriptors defined in Eq. (5), $LDS = \{LD_1, LD_2, \ldots, LD_p, \ldots, LD_M\}$ 85, where M is the number of images in the calibration set, and $LD_p = \{LDPT_{p,q,r,s}, \forall q, r, s\}$. By applying K-means clustering 86 to the set LDS, K cluster centroids $\{c_1, c_2, \ldots, c_K\}$ (words) are produced and stored on the hard drive 87; these centroids form the codebook (CB). The codebook (CB) is retrieved and used in the next phase, calibration/inferencing.
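The codebook construction can be sketched with a plain Lloyd's k-means in pure Python. This is an illustrative stand-in, assuming descriptors are short lists of floats; the initialization by random sampling, the fixed iteration count, and the function names are assumptions, and a production system would more likely use a library clustering routine.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(descriptor, centroids):
    """Index of the centroid (visual word) closest to a descriptor."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(descriptor, centroids[j]))


def build_codebook(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns K centroids that serve as the codebook."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_centroid(d, centroids)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                dim = len(members[0])
                centroids[j] = [sum(m[i] for m in members) / len(members)
                                for i in range(dim)]
    return centroids


# Toy descriptor set with two well-separated groups.
toy_descriptors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
codebook = build_codebook(toy_descriptors, k=2)
```

Each returned centroid corresponds to one word $c_j$ of the codebook CB.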
In the inferencing phase, the input amp,q,r is a grayscale activation map 88. Apply the local descriptor algorithm chosen in the training phase 84 to the input amp,q,r to find the keypoints and descriptors of the activation map LDPTp,q,r={{ldptp,q,r,1, ldptp,q,r,2, . . . , ldptp,q,r,s, . . . , ldptp,q,r,n
For simplicity, and without loss of generality, $\mathrm{ldpt}_{p,q,r,s}$ is replaced by $\mathrm{ldpt}_w$, and $S = \{(\mathrm{ldpt}_w, c_j), \forall w, j \in \{1, 2, \ldots, K\}\}$. Next, calculate the residual of each pair in S 92,
$$r_{w,j} = \mathrm{ldpt}_w - c_j \quad (6)$$
and place in memory 93. These residuals are then used to calculate 94 the accumulation of the residuals (ALDVR) 95 as follows:
There are K such accumulations of residual vectors, each of which is d-dimensional. These K vectors are concatenated 96, as follows,
$$v = [v_1, v_2, \ldots, v_j, \ldots, v_K]^T \quad (8)$$
to form a Kd-dimensional vector v 97.
Next, normalize the concatenated vector of (8) according to Eq. (9) and (10) as follows:
to produce the normalized vector $v_n$ 99, which is the VLAD of the input activation map $am_{p,q,r}$.
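The VLAD steps above can be sketched as follows. Two assumptions are made: each descriptor is paired only with its nearest codeword before residuals are accumulated, and the final normalization is a plain L2 normalization. Both are common choices for VLAD and may differ from the pairing set S and from Eqs. (9) and (10) of this description. The function name `vlad` is hypothetical.

```python
import math


def vlad(descriptors, codebook):
    """Aggregate residuals to the nearest codeword, concatenate, then L2-normalize."""
    k, d = len(codebook), len(codebook[0])
    accum = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        # Assign the descriptor to its nearest codeword c_j.
        j = min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(desc, codebook[c])))
        # Residual r_{w,j} = ldpt_w - c_j, accumulated per codeword.
        for i in range(d):
            accum[j][i] += desc[i] - codebook[j][i]
    flat = [value for vec in accum for value in vec]  # Kd-dimensional vector v
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat


# Toy example: K = 2 codewords, d = 2, three local descriptors.
toy_codebook = [[0.0, 0.0], [10.0, 10.0]]
toy_local_descriptors = [[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]]
v_n = vlad(toy_local_descriptors, toy_codebook)
```

The result has fixed length Kd regardless of how many keypoints an activation map produces, which is what makes it usable as a global image descriptor for subsequent HDC encoding.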
The embodiments described and claimed herein are not to be limited in scope by the specific examples herein disclosed since these examples are intended as illustrations of several aspects of the embodiments. Any equivalent examples are intended to be within the scope of the embodiments. Indeed, various modifications of the embodiments in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims. All references including patents, patent applications and publications cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication or patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.
The present application claims benefit of priority to U.S. Provisional Patent Application No. 63/254,758, entitled “SYSTEM AND METHOD FOR HYPERDIMENSIONAL COMPUTING (HDC) FOR ACTIVATION MAP ANALYSIS (AMA),” filed Oct. 12, 2021, the entirety of which is incorporated herein by reference.
Number | Date | Country
---|---|---
63254758 | Oct 2021 | US