This application is the national phase entry of International Application No. PCT/CN2024/090949, filed on Apr. 30, 2024, which is based upon and claims priority to Chinese Patent Application No. 202310922777.2, filed on Jul. 26, 2023, the entire contents of which are incorporated herein by reference.
The present invention belongs to the technical field of computer vision and interpretability, and specifically relates to a method and system for quantifying semantic variance between neural network representations.
In recent years, although deep neural networks have achieved remarkable performance on various tasks, they remain mysterious "black boxes". Visualizing the semantic concepts encoded in the intermediate layers of a neural network is a natural way to interpret it. Activation maximization finds a representative feature that maximizes the activation of a neuron, channel, or layer, and the visualization results can be used to roughly decode the semantic concepts contained in the intermediate layers of the neural networks. For example, low-level filters usually encode basic concepts such as edges and textures, while high-level filters often encode advanced objects and patterns. In addition to visualization, some researchers are dedicated to exploring the relationships between intermediate-layer filters and semantic concepts. Reference "David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network dissection: Quantifying interpretability of deep visual representations. In Proc. CVPR, 2017" collects a novel dataset, BRODEN, which provides pixel-level labels for visual concepts, and discovers alignments between individual filters and specific concepts. Reference "Ruth Fong and Andrea Vedaldi. Net2vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks. In Proc. CVPR, 2018" demonstrates how a DNN uses a plurality of filters to represent specific semantic concepts, and studies the embedding of concepts through combinations of the plurality of filters. However, most existing research only visualizes the semantic concepts contained in neural network representations or collects statistics on their distribution, such as the number of filters or the IoU value corresponding to each semantic concept in a representation; such research cannot measure the semantic information variance between two neural network representations. When the internal mechanism of a neural network is analyzed, various kinds of variance between two neural network representations, such as semantic information variance, usually need to be analyzed. Therefore, it is urgent to solve the problem that the semantic information variance between two neural network representations cannot be measured.
In view of the problems existing in the prior art, the present invention provides a method and system for quantifying semantic variance between neural network representations. Two neural network representations to be compared are first extracted; the weight of each filter in an intermediate layer corresponding to each semantic concept is learned on a reference dataset using the Net2Vec method; the set Intersection over Union (IoU) of each representation for every semantic concept in the reference dataset is then calculated; and finally the variances between the set IoU of the two representations for all the semantic concepts are integrated to obtain the semantic variance between the two neural network representations. The method solves the problem of the lack of an accurate measurement of the variance between neural network representations at the semantic information level, and achieves an accurate measurement effect.
In order to achieve the above objectives, the technical solution adopted by the present invention is: a method for quantifying semantic variance between neural network representations, comprising the following steps:
Step S1. Extract the representations of the two neural networks to be compared;
Step S2. Learn the weight of each filter in an intermediate layer corresponding to each semantic concept on a reference dataset using the Net2Vec method;
Step S3. Calculate the set IoU of each representation corresponding to each semantic concept;
Step S4. Integrate the variances between the set IoU of the two representations for all semantic concepts to obtain the semantic variance between the two neural network representations.
As an improvement of the present invention, the Net2Vec method in step S2 is specifically as follows:
As an improvement of the present invention, in step S3, for the set IoU of the concept c:
As an improvement of the present invention, in step S4, the variances between the set IoU of the representations of the two neural networks for all semantic concepts are integrated as follows:
As an improvement of the present invention, the semantic variance of common semantic concepts between representations R1 and R2 of the two neural networks is calculated according to the following equation to obtain the semantic variance of R2 relative to R1:
As a further improvement of the present invention, the value of λ is 2.
In order to achieve the above objectives, the technical solution adopted by the present invention is: a system for quantifying semantic variance between neural network representations, comprising a computer program, characterized in that the steps of any one of the above methods are implemented when the computer program is executed by a processor.
Compared with the prior art, the present invention has the following technical advantages and effects. The present invention designs a method and system for quantifying semantic variance between neural network representations, which calculate the set IoU of the representations corresponding to each semantic concept and integrate the variances between the set IoU of the representations for all semantic concepts to obtain the semantic variance between two representations. The present invention solves the problem that the semantic information variance between neural network representations cannot be measured, and provides a method for accurately measuring the variance between neural network representations at the semantic information level in the field of interpretability. The Net2Vec method is used to learn the weight of each filter in an intermediate layer corresponding to each semantic concept on a reference dataset, and different weights are assigned in consideration of the different learning difficulties of common and non-common semantic concepts between two representations, so the measurement results are accurate and conform to experimental experience. The present invention can be used to compare the semantic variance between representations in different intermediate layers of neural networks, thereby analyzing the internal mechanisms of the neural networks semantically. The present invention can also be used to measure the semantic variance between representations of different modalities in the form of activation maps, such as the RGB, depth, and infrared modalities, thereby providing a semantic measurement method for analyzing the variance between the different modal representations of multi-modal neural networks.
The sole FIGURE is a flowchart of the steps of the method for quantifying semantic variance between neural network representations according to the present invention.
The present invention will be further illustrated below in conjunction with the accompanying drawings and specific embodiments. It should be understood that the following specific embodiments are only used to illustrate the present invention and not to limit the scope of the present invention.
As shown in the FIGURE, a method for quantifying semantic variance between neural network representations is implemented as follows:
Step S1. Extract the representations of the two neural networks (such as ResNet50) to be compared: run the neural networks used for feature extraction on a reference dataset (such as the BRODEN dataset) to obtain predictions, and during this prediction process, retain the intermediate layer output of the neural networks for each sample.
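By way of illustration, a minimal sketch of this step in PyTorch is given below; the ResNet50 backbone from torchvision, the hooked layer ("layer4"), and the random tensor standing in for a reference-dataset sample are illustrative assumptions only, not requirements of the method.

```python
# Minimal sketch of step S1, assuming PyTorch/torchvision; the hooked layer
# ("layer4") and the random input are illustrative stand-ins.
import torch
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1").eval()

activations = []  # retained intermediate layer outputs, one entry per batch

def save_activation(module, inputs, output):
    # Retain the intermediate layer output produced while predicting a sample.
    activations.append(output.detach())

model.layer4.register_forward_hook(save_activation)

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)  # stand-in for one reference-dataset sample
    model(x)                         # prediction; the hook stores the activations

# activations[0] has shape (1, K, H, W): K filter activation maps per sample.
```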
Step S2. Learn the weight of each filter in an intermediate layer corresponding to each semantic concept on the reference dataset using the Net2Vec method:
First, the weight to be learned is represented by w∈R^K, where K is the total number of filters in the intermediate layer, and a predicted segmentation mask M(x; w) is calculated using a sigmoid function σ according to the following equation:

M(x; w) = σ(Σ_{k=1}^{K} w_k · A_k(x)),

where A_k(x) is the activation map of the k-th filter for a sample x.
Then, the weight w for the concept c is learned by minimizing the binary cross-entropy loss function of the following equation:

L(w) = −Σ_x Σ_p [L_c(x)_p · log M(x; w)_p + (1 − L_c(x)_p) · log(1 − M(x; w)_p)],

where L_c(x) is the ground-truth segmentation mask of the concept c for the sample x and the inner sum runs over pixels p.
Then, on the reference dataset, training is performed for 30 epochs using an SGD optimizer with a learning rate of 0.0001, a momentum of 0.9, and a batch size of 64, to obtain the weight corresponding to each semantic concept.
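A minimal sketch of this weight learning is given below, assuming the retained activations A (shape N×K×H×W) have already been upsampled to the resolution of the binary ground-truth concept masks Lc (shape N×H×W); the tensor names and this preprocessing are illustrative assumptions.

```python
# Minimal sketch of the Net2Vec weight learning in step S2. Assumes A (N, K, H, W)
# holds the retained activations, upsampled to the mask resolution, and Lc
# (N, H, W) holds the binary ground-truth masks for one concept c.
import torch

def learn_concept_weights(A, Lc, epochs=30, lr=0.0001, momentum=0.9, batch_size=64):
    """Learn w in R^K so that sigmoid(sum_k w_k * A_k) matches the mask Lc."""
    w = torch.zeros(A.shape[1], requires_grad=True)
    optimizer = torch.optim.SGD([w], lr=lr, momentum=momentum)
    bce = torch.nn.BCEWithLogitsLoss()  # sigmoid + binary cross entropy
    for _ in range(epochs):
        for i in range(0, A.shape[0], batch_size):
            a, m = A[i:i + batch_size], Lc[i:i + batch_size].float()
            # Linearly superimpose the K filter activations with the weights w.
            logits = torch.einsum("nkhw,k->nhw", a, w)
            loss = bce(logits, m)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return w.detach()
```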
Step S3. Calculate the set IoU of each representation corresponding to each semantic concept: using the weights of the intermediate-layer filters learned for each semantic concept by the Net2Vec method, linearly superimpose the activation values of the filters in the intermediate layer output retained in the previous step to obtain a total activation value corresponding to each semantic concept; binarize the total activation value to obtain a mask of each sample corresponding to each semantic concept; and finally calculate the set IoU of each representation corresponding to each semantic concept.
Specific steps are as follows:
The set IoU for the concept c is calculated according to the following equation:

IoU_set(c; w) = (Σ_x |M(x; w) ∩ L_c(x)|) / (Σ_x |M(x; w) ∪ L_c(x)|),

where the sums run over all samples x in the reference dataset and M(x; w) denotes the binarized predicted mask.
For the weight w of the concept c, this equation calculates the IoU (Jaccard coefficient) between the predicted segmentation mask M, obtained by binarizing the total activation value produced by linearly superimposing the activation values of the filters, and the ground-truth segmentation mask L_c;
The corresponding set IoU is calculated for each semantic concept.
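Under the same assumptions as the sketch above, and with the binarization threshold taken as 0.5 (an assumption consistent with the sigmoid mask), the set IoU of a representation for one concept can be sketched as follows.

```python
# Minimal sketch of the set IoU in step S3; the 0.5 threshold is an assumption.
import torch

def set_iou(A, w, Lc, threshold=0.5):
    """Dataset-level (set) IoU between predicted and ground-truth concept masks."""
    probs = torch.sigmoid(torch.einsum("nkhw,k->nhw", A, w))
    M = probs > threshold          # binarized predicted masks for every sample
    G = Lc.bool()                  # ground-truth masks
    intersection = (M & G).sum()   # counted over all samples and pixels at once
    union = (M | G).sum()
    return (intersection.float() / union.float()).item()
```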
Step S4. Integrate the variances between the set IoU of the two representations for all semantic concepts.
The step of integrating the variances between the set IoU of the two representations for all semantic concepts is as follows:
First, R1 and R2 denote the representations of the two neural networks, and {IoU_set(c_j; R_i)}_{j=1}^{C} denotes the set IoU of the representation R_i corresponding to each semantic concept, where C is the total number of concepts. The semantic variance of the common semantic concepts between the two representations is calculated according to the following equation to obtain the semantic variance of R2 relative to R1:
Then, the semantic variance of the non-common semantic concepts between the two representations is calculated according to the following equation to obtain the semantic variance of R2 relative to R1:
Finally, the semantic variances of all semantic concepts from the above two equations are integrated according to the following equation to calculate the final semantic variance between the two neural network representations:
λ=2 is set to emphasize the semantic variance caused by the non-common concepts between the two representations, namely, the semantic concepts newly added or disappearing in R2 relative to R1. The pixel ratio of the concept c_j in the entire reference dataset is also used: the semantic variance of each semantic concept is divided by this ratio to eliminate deviations caused by the different proportions of semantic concepts in the reference dataset.
When the semantic variance S·Var(R2; R1) is positive, it indicates that R2 has richer semantic information than R1, and vice versa.
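Because the integration equations themselves are not reproduced in this text, the sketch below only illustrates one plausible reading of step S4 that is consistent with the description above: common concepts contribute their set-IoU differences, non-common concepts contribute a λ-weighted term, and every term is divided by the concept's pixel ratio. All formula details beyond that description are assumptions, not the invention's exact equations.

```python
# Illustrative sketch of step S4 only; the exact equations are not reproduced in
# this text, so the formulas below are assumptions consistent with the description.
def semantic_variance(iou_r1, iou_r2, pixel_ratio, lam=2.0):
    """Semantic variance S.Var(R2; R1) of representation R2 relative to R1.

    iou_r1, iou_r2: dict concept -> set IoU (a missing key = concept not encoded)
    pixel_ratio:    dict concept -> pixel ratio of the concept in the dataset
    """
    common = iou_r1.keys() & iou_r2.keys()
    only_r1 = iou_r1.keys() - iou_r2.keys()   # concepts disappearing in R2
    only_r2 = iou_r2.keys() - iou_r1.keys()   # concepts newly added in R2

    # Common concepts: set-IoU difference, normalized by the pixel ratio.
    var_common = sum((iou_r2[c] - iou_r1[c]) / pixel_ratio[c] for c in common)

    # Non-common concepts: weighted by lambda to emphasize added/lost semantics.
    var_noncommon = lam * (sum(iou_r2[c] / pixel_ratio[c] for c in only_r2)
                           - sum(iou_r1[c] / pixel_ratio[c] for c in only_r1))
    return var_common + var_noncommon
```

With this reading, a positive result indicates that R2 carries richer semantic information than R1, matching the sign convention stated above.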
It should be noted that the above content only explains the technical idea of the present invention and cannot limit the scope of protection of the present invention thereby. For those of ordinary skill in the art, many improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications fall within the scope of protection of the claims of the present invention.
Other Publications:
Ruth Fong and Andrea Vedaldi, "Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks," arXiv:1801.03454v2 [cs.CV], 2018, 9 pages.
David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba, "Network Dissection: Quantifying Interpretability of Deep Visual Representations," in Proc. CVPR, 2017, pp. 6541-6549.
Ruth Fong and Andrea Vedaldi, "Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks," in Proc. CVPR, 2018, pp. 8730-8738.