ML predictors are widely used for processing or analyzing data, e.g., for classifying a sample of input data. To this end, the ML predictor may apply a trained model, which was previously learned on a set of training data, to the input data to obtain a result or decision; this is referred to as inference. As such, the inference of an ML predictor depends on the set of training data, and in particular, the ML predictor may be regarded as a kind of “black box”, because it is not obvious how a particular decision emerges from the input data, i.e., what the reasoning of the ML predictor relies on. However, understanding decisions of ML predictors may be important for the reliability and reliable applicability of such structures. For example, the analysis of an ML predictor may be used for correcting the underlying model to increase the reliability of the ML predictor, or for increasing the efficiency of the ML predictor, e.g., by pruning.
Therefore, a concept allowing for analyzing an inference of an ML predictor on a data structure is desirable, which concept gives information about the inference, which information may allow for increasing the reliability and/or the computational efficiency of the ML predictor.
An embodiment may have an apparatus, configured for assigning a relevance score to a predictor portion, PP, of a machine learning, ML, predictor for performing an inference on a data structure, the relevance score indicating a share with which propagation paths, which connect the PP with a first predetermined PP of the ML predictor, contribute to an activation of the first predetermined PP, which activation is associated with the inference performed by the ML predictor on the data structure, wherein the apparatus is configured for determining the relevance score for the PP by performing a reverse propagation of an initial relevance score, which is attributed to the first predetermined PP, along the propagation paths, and filtering the reverse propagation by weighting a first propagation path through the ML predictor, the first propagation path passing through a second predetermined PP of the ML predictor, differently than a second propagation path through the ML predictor, the second propagation path circumventing the second predetermined PP.
Another embodiment may have an apparatus, configured for assigning a relevance score to a portion of a data structure, the relevance score rating a relevance of the portion for an inference performed by a machine learning predictor on the data structure, wherein the apparatus is configured for determining the relevance score for the portion by performing a reverse propagation of an initial relevance score, which is attributed to a first predetermined predictor portion of the ML predictor, from the first predetermined PP through the ML predictor onto the portion of the data structure, and for filtering the reverse propagation by weighting a first propagation path through the ML predictor, the first propagation path passing through a second predetermined PP of the ML predictor, differently than a second propagation path through the ML predictor, the second propagation path circumventing the second predetermined PP.
Another embodiment may have an apparatus, configured for determining, for each out of a set of data structures, an affiliation score with respect to a concept associated with a predictor portion of a machine learning predictor by determining a relevance score for the PP with respect to an inference performed by the ML predictor on the respective data structure, wherein the relevance score indicates a contribution of the PP to an activation of a first predetermined PP of the ML predictor, which activation is associated with the inference performed by the ML predictor on the data structure, wherein the apparatus is configured for determining the relevance score by performing a reverse propagation of an initial relevance score from the first predetermined PP to the PP.
Another embodiment may have a method, comprising: assigning a relevance score to a portion of a data structure, the relevance score rating a relevance of the portion for an inference performed by a machine learning predictor on the data structure, wherein the method comprises determining the relevance score for the portion by performing a reverse propagation of an initial relevance score, which is attributed to a first predetermined predictor portion of the ML predictor, from the first predetermined PP through the ML predictor onto the portion of the data structure, and filtering the reverse propagation by weighting a first propagation path through the ML predictor, the first propagation path passing through a second predetermined PP of the ML predictor, differently than a second propagation path through the ML predictor, the second propagation path circumventing the second predetermined PP.
Another embodiment may have a method, comprising: assigning a relevance score to a predictor portion of a ML predictor for performing an inference on a data structure, the relevance score indicating a share with which propagation paths, which connect the PP with a first predetermined PP of the ML predictor, contribute to an activation of the first predetermined PP, which activation is associated with the inference performed by the ML predictor on the data structure, wherein the method comprises determining the relevance score for the PP by performing a reverse propagation of an initial relevance score, which is attributed to the first predetermined PP, along the propagation paths, and filtering the reverse propagation by weighting a first propagation path through the ML predictor, the first propagation path passing through a second predetermined PP of the ML predictor, differently than a second propagation path through the ML predictor, the second propagation path circumventing the second predetermined PP.
Another embodiment may have a method, comprising: determining, for each out of a set of data structures, an affiliation score with respect to a concept associated with a predictor portion of a machine learning predictor by determining a relevance score for the PP with respect to an inference performed by the ML predictor on the respective data structure, wherein the relevance score indicates a contribution of the PP to an activation of a first predetermined PP of the ML predictor, which activation is associated with the inference performed by the ML predictor on the data structure, wherein the method comprises determining the relevance score by performing a reverse propagation of an initial relevance score from the first predetermined PP to the PP.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the methods according to the invention when said computer program is run by a computer.
Embodiments of a first aspect of the present invention rely on the idea of assigning a relevance score to a portion of an ML predictor, or to a portion of a data structure serving as input to the ML predictor, the relevance score indicating a relevance of the portion with respect to a result of an inference performed by the ML predictor on the data structure, or indicating a relevance of the portion with respect to an activation of a first predetermined predictor portion of the ML predictor in an inference performed by the ML predictor on the data structure. To this end, the relevance score is determined by reversely propagating an initial relevance score from a first predetermined predictor portion of the ML predictor to the portion (of the ML predictor or the data structure), wherein the reverse propagation is filtered with respect to a second predetermined predictor portion, i.e., the reverse propagation differentiates between propagation paths passing or not passing through the second predetermined predictor portion. Doing so may reveal information about the second predetermined predictor portion, e.g., about a relevance of the portion, for which the relevance score is determined, with respect to the second predetermined predictor portion. For example, such information may be used to analyze a concept represented by the second predetermined predictor portion. In this respect, the term “concept” may refer to characteristics of a content to which a predictor portion is sensitive, i.e., in response to which it activates. The filtered reverse propagation allows for revealing which portions of a data structure, or which upstream portions of the ML predictor, are relevant for the activation of the first predetermined predictor portion. For example, in this respect, relevant sub-concepts of a concept attributed to the second predetermined predictor portion may be analyzed.
These findings in turn allow for finding artifacts in the ML predictor to increase reliability, and/or allow for pruning the ML predictor to increase computational efficiency.
Embodiments of a first aspect of the invention provide, in a first alternative, a method (e.g., for analyzing an inference performed by an ML predictor on a data structure), comprising: assigning a relevance score to a portion of a data structure (e.g., to a pixel of a digital image), the relevance score rating a relevance of the portion for an inference performed by an ML predictor (e.g., an artificial neural network (NN)) on the data structure (e.g., the inference being the network output of applying the NN to the data structure) (e.g., the relevance score indicating a share with which propagation paths, which connect the portion with the first predetermined PP, contribute to an activation of the first predetermined PP, which activation is associated with the inference), wherein the method comprises determining the relevance score for the portion by performing a reverse propagation of an initial relevance score, which is attributed to a first predetermined predictor portion (PP) of the ML predictor, from the first predetermined PP through the ML predictor (or along propagation paths of the ML predictor) onto the portion of the data structure, and by filtering the reverse propagation by weighting a first propagation path through the ML predictor, the first propagation path passing through a second predetermined PP (e.g., a set of one or more units or neurons; e.g., the second predetermined PP is upstream relative to the first predetermined PP with respect to a forward propagation direction (e.g., used for inference) of the ML predictor) of the ML predictor, differently than a second propagation path through the ML predictor, the second propagation path circumventing (or not passing through) the second predetermined PP (e.g., the propagation paths connecting the portion with the first predetermined PP; e.g., the apparatus derives the relevance score by aggregating relevance values resulting from reversely propagating the initial relevance value along propagation paths connecting the first predetermined PP and the portion of the data structure).
Further embodiments of the first aspect of the invention provide, in a second alternative, a method (e.g., for analyzing an inference performed by an ML predictor on a data structure), comprising: assigning a relevance score to a predictor portion (PP) (e.g., a target PP) of an ML predictor (e.g., an artificial neural network (NN)) for performing an inference on a data structure (e.g., an inference on the data structure), the relevance score indicating a share with which propagation paths, which connect the PP (e.g., the target PP) with a first predetermined PP of the ML predictor, contribute to an activation of the first predetermined PP, which activation is associated with the inference performed by the ML predictor on the data structure, wherein the method comprises determining the relevance score for the PP by performing a reverse propagation of an initial relevance score, which is attributed to the first predetermined PP, along the propagation paths, and by filtering the reverse propagation by weighting a first propagation path through the ML predictor, the first propagation path passing through a second predetermined PP (e.g., a set of one or more units or neurons) of the ML predictor, differently than a second propagation path through the ML predictor, the second propagation path circumventing (or not passing through) the second predetermined PP (e.g., the propagation paths connecting the (target) PP with the first predetermined PP; e.g., the apparatus derives the relevance score by aggregating relevance values resulting from reversely propagating the initial relevance value along propagation paths connecting the first predetermined PP and the (target) PP).
For example, the predictor portion, to which the relevance score is assigned, may be a filter or a channel, or part of a channel of the ML predictor, e.g. an intermediate channel, i.e., a channel between an input channel and an output channel of the ML predictor. In examples, the channel may be the input channel, and in this respect, the predictor portion may relate to a portion of the data structure, being input to the ML predictor. In this respect, it is clear that the portion of the first alternative of the first aspect may be regarded as a predictor portion in the sense of the second alternative.
Embodiments according to a second aspect rely on the idea of measuring an affiliation of each of a set of data structures with respect to a predictor portion of an ML predictor, or to a concept (e.g., a concept as introduced above) associated with the predictor portion, by using a relevance score, which measures a contribution of the predictor portion to an inference performed on the respective data structure. In other words, the data structures may be rated with respect to a relevance of the predictor portion for the inferences performed on the respective data structures. Doing so allows for identifying a subset of the set of data structures, which subset is representative of a concept encoded by the predictor portion.
The relevance scores may be determined by performing a reverse propagation of an initial relevance score of a first predetermined predictor portion to the considered predictor portion, e.g., as described with respect to embodiments of the first aspect. By using the reverse propagation for determining the affiliation of the data structures to the predictor portion, the affiliation score relies on the actual contribution of the predictor portion to the initial relevance score of the predetermined predictor portion (e.g., an output portion, such as a certain class of a classifying ML predictor). Therefore, in contrast to approaches which measure the affiliation based on the activation of the predictor portion, the disclosed method bases the affiliation on the relevance for a certain inference.
Embodiments according to the second aspect of the invention provide a method (e.g., for analyzing an inference behavior of a machine learning (ML) predictor on a data structure), comprising: determining, for each out of a set of data structures, an affiliation score (or relevance score) with respect to a concept associated with a predictor portion (PP) (e.g., a target PP, e.g., a PP under investigation) of a machine learning (ML) predictor (wherein the concept represents a type of content, to which the predetermined network portion is sensitive, or in response to which the predetermined network portion contributes to a predetermined inference result, or to an activation of a first predetermined PP in an inference of the data structure; e.g., the affiliation score rates to which extent a content represented in the respective data structure correlates with the concept associated with a predetermined predictor portion of an artificial neural network (NN)) by determining a relevance score for the PP with respect to an inference performed by the ML predictor on the respective data structure, wherein the relevance score indicates a contribution of the PP to an activation of a first predetermined PP of the ML predictor, which activation is associated with the inference performed by the ML predictor on the data structure, wherein the method comprises determining the relevance score by performing a reverse propagation of an initial relevance score (which is attributed to the first predetermined PP) from the first predetermined PP to the PP (the target PP).
Further embodiments provide an apparatus configured for performing one or more of the previously described methods.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
In the following, embodiments are discussed in detail; however, it should be appreciated that the embodiments provide many applicable concepts that can be embodied in a wide variety of specific contexts. The specific embodiments discussed are merely illustrative of specific ways to implement and use the present concept, and do not limit the scope of the embodiments. In the following description, a plurality of details is set forth to provide a more thorough explanation of embodiments of the disclosure. However, it will be apparent to one skilled in the art that other embodiments may be practiced without these specific details. In other instances, well-known structures and devices are shown in form of a block diagram rather than in detail in order to avoid obscuring examples described herein. In addition, features of the different embodiments described herein may be combined with each other, unless specifically noted otherwise. Some embodiments of the present invention are now described in more detail with reference to the accompanying drawings, in which elements that are the same or similar, or that have the same or similar function, have the same reference signs assigned or are identified with the same name.
This section describes embodiments of the invention. Further embodiments are described in section B. Furthermore, section B describes embodiments of section A in more detail. In other words, details and features described in section B may optionally be combined with the embodiments of section A, and the advantages described in section B equivalently apply to the embodiments of section A. Applications of the embodiments of section A and B are described in section C. In other words, section C describes embodiments, which make use of the apparatuses and methods described in section A and B and exploit their advantages with respect to exemplary applications.
For example, the output 18 may comprise one or more output portions 36, each of which may be represented by an output value or an output vector. In examples, the ML predictor may be for classifying the data structure, and each of the output portions may be associated with a respective class.
It is noted that a PP may perform a plurality of individual mappings, i.e., a PP may comprise a plurality of units, in examples referred to as nodes or neurons, each unit performing a mapping from an input to an output of the respective unit. E.g., for the example of a neural network, a PP may comprise one or more neurons of the NN.
An output of a PP in an inference may be referred to as an activation of the PP, and may, for example, correspond to the aggregation of respective activations of a plurality of units of which the PP is composed.
Similarly, an output of a unit or neuron of the ML predictor in an inference may be referred to as an activation of the unit or neuron, respectively.
For determining the relevance score associated with the inference of the data structure 16, apparatus 10 may perform a reverse propagation 30. The reverse propagation starts at a first predetermined predictor portion 24, referred to as first PPP in the following. As already described before, one or more units of the ML predictor may be regarded, or handled, as one portion of the ML predictor. In the illustrative example of
For the reverse propagation, an initial relevance score is attributed to the first PPP 24, which is propagated in reverse direction 32, e.g. reverse with respect to a propagation direction of the data flow in an inference on a data structure, through the ML predictor 12.
The reverse propagation may, but does not necessarily, start at the output of the ML predictor; it may start at any portion of the ML predictor.
The initial relevance value may be a predetermined value, e.g. 1. Alternatively, the initial relevance value may correspond to, or may be derived based on, the activation of the first PPP 24 in the inference, for which the reverse propagation is performed.
The apparatus 10 may perform the reverse propagation, for example, by successively determining respective relevance scores for upstream PPs of current PPs, for which current PPs respective relevance scores are already determined. In this context, the upstream direction corresponds to the reverse propagation direction 32, whereas the forward propagation direction of the inference may be referred to as downstream direction.
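The successive upstream determination of relevance scores may be sketched, purely for illustration, for a small fully connected ReLU network. The network sizes, weights, and the epsilon-stabilized redistribution rule used here are assumptions for the sketch; the embodiments are not limited to this particular rule:

```python
import numpy as np

def reverse_propagate(weights, activations, initial_relevance):
    """Successively determine relevance scores for upstream layers,
    starting from an initial relevance attributed to the first
    predetermined PP, walking the layers in reverse (upstream) direction."""
    relevance = initial_relevance
    for W, a in zip(reversed(weights), reversed(activations[:-1])):
        z = a @ W                                   # contributions to downstream units
        z = z + 1e-9 * np.where(z >= 0, 1.0, -1.0)  # stabilizer against division by zero
        s = relevance / z                           # relevance per unit of contribution
        relevance = a * (s @ W.T)                   # redistribute onto upstream units
    return relevance

# Illustrative 3 -> 2 -> 2 network with fixed weights (assumed values).
W1 = np.array([[0.5, -0.2], [0.3, 0.8], [-0.1, 0.4]])
W2 = np.array([[1.0, 0.5], [-0.3, 0.7]])
x = np.array([1.0, 2.0, 0.5])
activations = [x]
for W in (W1, W2):
    activations.append(np.maximum(activations[-1] @ W, 0.0))  # ReLU forward pass

# Initial relevance: activation of the first predetermined PP (output unit 0).
R_init = np.zeros(2)
R_init[0] = activations[-1][0]
R_input = reverse_propagate([W1, W2], activations, R_init)
```

With this rule, the sum of the relevance scores is (approximately) conserved from the first predetermined PP down to the input portions, illustrating the share-based character of the reverse propagation.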
Accordingly, the reverse propagation may constitute propagation paths in upstream direction. It should be noted that different propagation paths may overlap in one or more PPs and/or interconnections between PPs. For example, PP 14′ in
According to examples, the apparatus reversely propagates the relevance score from the first PPP 24 to the data structure 16. The propagation paths may end at, or point to, portions of the data structure. E.g., in the illustration of
According to embodiments of the first aspect of the present invention, the apparatus filters the reverse propagation by weighting a first propagation path through the ML predictor, the first propagation path passing through a second predetermined predictor portion of the ML predictor, referred to as second PPP in the following, differently than a second propagation path through the ML predictor, the second propagation path circumventing the second PPP.
The second PPP may be different from the first PPP. For example, the second PPP may be positioned upstream in the ML predictor with respect to the first PPP.
For example, considering in
In examples, the apparatus may, for the determination of relevance scores of PPs upstream of the second PPP, e.g., of the portion 22 of the data structure, consider only propagation paths which pass through the second PPP. In other words, the weights for filtering the reverse propagation may be one for paths passing through the second PPP and zero for paths circumventing the second PPP.
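A minimal sketch of such binary filtering, assuming the relevance scores arriving at the layer containing the second PPP are available as a vector (the layer size and unit index are illustrative assumptions):

```python
import numpy as np

def filter_relevance(layer_relevance, second_pp_units):
    """Weight propagation paths with one for paths passing through the
    second predetermined PP and zero for paths circumventing it, by
    masking the relevance at the layer containing the second PP."""
    mask = np.zeros_like(layer_relevance)
    mask[second_pp_units] = 1.0   # Kronecker-delta-style selection
    return layer_relevance * mask

# Relevance scores arriving at an intermediate layer of four units:
R_layer = np.array([0.3, 0.1, 0.4, 0.2])
# Treat unit 2 as the second predetermined PP: only paths through it
# carry relevance onward in the continued reverse propagation.
R_filtered = filter_relevance(R_layer, [2])
```

Continuing the reverse propagation from `R_filtered` instead of `R_layer` then attributes relevance only along paths passing through the selected portion, as in the filtered reverse propagation described above.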
For example, the CRP algorithm described in section 4.1 below may be an example of a filtered reverse propagation.
The reverse propagation is not necessarily performed until the input data structure 16, but may in alternative examples merely be performed until a PP of the ML predictor, which may be referred to as target PP. The target PP may be upstream with respect to the first PPP and/or the second PPP.
It is noted that embodiments of the first alternative of the first aspect and the second alternative of the first aspect may, in examples, merely differ in whether the reverse propagation is performed until the data structure (first alternative) or until a PP (second alternative). In this context it is noted that the data structure or portions thereof may also be regarded as PPs, at least with respect to the attributability of a relevance score. Accordingly, it is clear that details described herein equivalently apply to the subject-matter of both claim groups.
In other words, apparatus 10 may perform the reverse propagation up to the data structure 16, or a portion thereof, or may alternatively perform the reverse propagation until a PP of the ML predictor, e.g. PP 26.
It is further noted, that the concept of the subject matter of the first aspect is applicable to the subject-matter of the second aspect and vice versa.
Neuron 41 may be a current neuron in the reverse propagation, i.e., the neuron the relevance score of which is currently to be reversely propagated. That is, the relevance score of neuron 41 is known, e.g., as it is, or belongs to, the first PPP 14′, or by reverse propagation from the first PPP 14′. The relevance score of neuron 41 may be distributed to the upstream neurons of the current neuron 41. To this end, respective upstream shares of the relevance score of the current neuron may be determined for the upstream neurons. The distribution may be performed so that the fractions with which the activations of the upstream portions contribute to an activation of neuron 41 in the reversely propagated inference correspond to respective fractions of the relevance shares R11, R21, R31 determined for the upstream neurons of the current neuron 41. It is noted that the relevance share is not necessarily determined for each of the upstream neurons, e.g., in examples in which filtering of propagation paths is applied. Still, the relevance shares may be determined by distributing the relevance score of the current neuron to all upstream neurons of the current neuron.
The relevance score for an upstream neuron, e.g., neuron 32, may be determined by aggregating (e.g., adding, taking the maximum value, multiplying, or applying any other measure of aggregation or pooling) incoming relevance shares from downstream neurons. For example, the relevance score of neuron 32 may be composed of the upstream relevance share R21, and, if neuron 42 is part of a propagation path, that is, for example, has a non-zero relevance score and/or has received an upstream relevance share in previous propagation steps, further of an upstream relevance share R22 from neuron 42.
In the case that the above-mentioned filtering of the reverse propagation is applied, the upstream relevance shares attributed to a neuron may be weighted differently. E.g., in the exemplary scenario in which neuron 42 is the second PPP, in the determination of the relevance score for neuron 32, relevance shares R21 and R22 may be weighted differently, as in this scenario, from the perspective of neuron 32, relevance share R22 originates from a propagation path passing through the second PPP, whereas relevance share R21 originates from a propagation path circumventing the second PPP. In examples in which the filtering implies that merely propagation paths passing through the second PPP are considered, in the determination of the relevance score for neuron 32, share R21 may be disregarded in this scenario. Accordingly, in examples, not necessarily all upstream relevance shares attributed to a neuron are considered in the determination of the relevance score for a neuron.
The same concept of reverse propagation may be applied to portions comprising multiple neurons. For example, grouping neurons 41, 42 into one portion, the reverse propagation of relevance scores may be performed equivalently as described above.
For example, in section 4.1 below, Ri←j may refer to an upstream relevance share, Ri to a relevance score of a neuron or a portion, and ai to an activation of a neuron.
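The distribution and aggregation of upstream relevance shares may be illustrated numerically for the neurons discussed above. The activation values and connection weights below are hypothetical, chosen only to show the share-based redistribution; the neuron labels follow the description above:

```python
# Hypothetical activations of upstream neurons 31, 32, 33 and weights of
# their connections to downstream neurons 41, 42 (illustrative values).
a = {31: 1.0, 32: 2.0, 33: 0.5}
w = {(31, 41): 0.5, (32, 41): 0.3, (33, 41): 0.2,
     (31, 42): 0.1, (32, 42): 0.4, (33, 42): 0.0}

def upstream_shares(R_j, j):
    """Distribute the relevance score R_j of downstream neuron j onto its
    upstream neurons i, proportionally to the fractions a_i * w_ij / z_j
    with which their activations contribute to the activation of j
    (i.e., R_{i<-j} in the notation of section 4.1)."""
    z_j = sum(a[i] * w[(i, j)] for i in a)
    return {i: (a[i] * w[(i, j)] / z_j) * R_j for i in a}

# Known relevance scores of the current neurons 41 and 42:
R41, R42 = 0.6, 0.4
shares_41 = upstream_shares(R41, 41)
shares_42 = upstream_shares(R42, 42)
# Aggregate incoming shares, e.g. by adding, into upstream relevance scores:
R = {i: shares_41[i] + shares_42[i] for i in a}
```

Since each downstream relevance score is split proportionally, the aggregated upstream scores sum to R41 + R42, so the total relevance is conserved across the propagation step.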
Apparatus 20 determines the relevance score 51 of the PP 26 by performing a reverse propagation 30 of an initial relevance score attributed to a first PPP 24 of the ML predictor.
The PP 26 may be located upstream in the ML predictor with respect to the first PPP.
The affiliation score 53 may, in examples, correspond to the relevance score 51 determined for the respective data structure. Alternatively, the affiliation score 53 may be determined based on the relevance score 51.
All details described before with respect to
For example, the details of one or more of the description of the ML predictor 12, portions and units thereof, the reverse propagations 30, the initial relevance score, the first PPP 24, a location thereof (e.g. the first PPP 24 may be an output of the ML predictor or may be an intermediate portion (not an output portion)) may optionally be applied to the apparatus 20 of
In examples, apparatus 20 may further rank the data structures 16 of the set 17 according to their assigned relevance scores and/or apparatus 20 may select a subset of data structures 16 based on their assigned relevance scores. For example, the selected subset may be considered representative of a concept associated with the PP 26. That is, for example, a content represented by the selected and/or highest ranked data structures may be considered representative of content-related data structure properties which result in a certain contribution of the PP 26 to the activation of the first PPP 24. Accordingly, these content-related data structure properties may be considered to represent a concept of the PP 26.
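The ranking and subset selection may be sketched as follows; the sample identifiers and scores are hypothetical, and the choice of k is an assumption:

```python
def select_reference_samples(relevance_scores, k=3):
    """Rank data structures by the relevance score assigned for the PP in
    the inference on each of them, and select the k highest-ranked ones
    as representative of the concept associated with the PP."""
    ranked = sorted(relevance_scores, key=relevance_scores.get, reverse=True)
    return ranked[:k]

# Hypothetical relevance scores of one PP across a set of data structures:
scores = {"img_0": 0.02, "img_1": 0.91, "img_2": 0.40,
          "img_3": 0.77, "img_4": 0.05}
subset = select_reference_samples(scores, k=2)  # -> ["img_1", "img_3"]
```

The selected subset would then be inspected, e.g., to identify the content-related properties that the PP contributes relevance for.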
In the following, embodiments of the present invention, and advantages thereof, along with application examples, are described in more detail.
It is noted that details described in the following may be individually combined with the subject-matter of the claims. In particular, the methods described in section 4 are applicable independently of the specific implementation details described in the following.
In particular, without loss of generality, since the methods' properties are based on the decomposition of mapping functions, both the proposed Concept Relevance Propagation attribution approach (section 4.1) and, consequently, the Relevance Maximization approach (section 4.2) for selecting representative examples for (latent) model encodings/units can be applied to non-neural-network machine learning predictors which can be formulated as directed (acyclic) graphs (DAGs) of mappings, or can be transferred to such a form, e.g., via the process of Neuralization [47, 48, 49]. In other words, the techniques described in section 4 below are applicable to machine learning predictors other than neural networks, given that those predictors have been neuralized, or are described or describable as a DAG.
Accordingly, what is disclosed in
Alternatively (second alternative of the first aspect), apparatus 10 may be configured for assigning a relevance score to a predictor portion PP 26 (e.g., a target PP) of the ML predictor 12 (e.g., an artificial neural network (NN)) for performing an inference on the data structure 16 (e.g., an inference on the data structure), the relevance score indicating a share with which propagation paths, which connect the PP 26 (e.g., the target PP) with a first predetermined PP 24 of the ML predictor, contribute to an activation of the first predetermined PP, which activation is associated with the inference performed by the ML predictor on the data structure. Apparatus 10 is in this case configured for determining the relevance score for the PP 26 by performing a reverse propagation 30 of an initial relevance score, which is attributed to the first predetermined PP, along the propagation paths, and for filtering the reverse propagation by weighting a first propagation path 301 through the ML predictor, the first propagation path passing through a second predetermined PP 14′ (e.g., a set of one or more units or neurons) of the ML predictor, differently than a second propagation path 302 through the ML predictor, the second propagation path circumventing (or not passing through) the second predetermined PP (e.g., the propagation paths connecting the (target) PP with the first predetermined PP; e.g., the apparatus derives the relevance score by aggregating relevance values resulting from reversely propagating the initial relevance value along propagation paths connecting the first predetermined PP and the (target) PP).
As already mentioned, in examples, the first and second alternatives may merely differ in that, in the first alternative, the reverse propagation is performed up to the portion 22 of the data structure, while in the second alternative, the propagation may be performed up to PP 26, which may be a portion of the predictor, or, as far as the data structure, or the interconnections of portions of the data structure with portions of the ML predictor, may be considered as portions of the ML predictor, up to a portion of the data structure as in the first alternative. Accordingly, the details described in the following may apply to both alternatives, as long as they comply with this difference.
According to an embodiment, apparatus 10 is configured for filtering the reverse propagation by selectively taking into account propagation paths 301 connecting the first predetermined PP 24 and the portion 22 of the data structure 16, which propagation paths pass through the second predetermined PP 14′ (and, e.g., disregarding propagation paths 302 circumventing, or not passing through, the second predetermined PP 14′; e.g. equation 7 of section 4, see Kronecker-Delta, e.g., representing weights 0 and 1 for circumventing and passing through paths, respectively).
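The filtered reverse propagation just described may be illustrated by a minimal sketch: relevance is distributed backwards through a tiny two-layer network according to activation fractions, and a Kronecker-delta-style mask (weights 1 and 0) keeps only those propagation paths that pass through one selected hidden unit, playing the role of the second predetermined PP. All weights and activations below are invented illustration values, not values from the disclosure.

```python
# Sketch only: LRP-style reverse propagation, filtered so that only
# paths through hidden unit 0 (the "second predetermined PP") keep
# their relevance. Network values are made up for illustration.

def relprop(acts, weights, R_out):
    """Distribute relevances R_out of a layer's outputs onto its inputs
    according to the fractions a_i * w_ij / sum_k (a_k * w_kj)."""
    n_in = len(acts)
    R_in = [0.0] * n_in
    for j, Rj in enumerate(R_out):
        z = sum(acts[k] * weights[k][j] for k in range(n_in)) or 1e-9
        for i in range(n_in):
            R_in[i] += acts[i] * weights[i][j] / z * Rj
    return R_in

# toy forward pass: 3 inputs -> 2 hidden units -> 1 output
x = [1.0, 2.0, 0.5]
W1 = [[0.5, -0.2], [0.1, 0.8], [0.3, 0.3]]
h = [sum(x[i] * W1[i][j] for i in range(3)) for j in range(2)]
W2 = [[1.0], [0.5]]
y = [sum(h[j] * W2[j][0] for j in range(2))]

R_hidden = relprop(h, W2, y)          # unfiltered step: output -> hidden

# Kronecker-delta mask: keep only paths through hidden unit 0
mask = [1.0, 0.0]
R_hidden_f = [r * m for r, m in zip(R_hidden, mask)]
R_input = relprop(x, W1, R_hidden_f)  # filtered relevance on the input
```

Note that the relevance arriving at the input sums to the masked relevance of the selected unit, i.e., the filtering conserves exactly the share attributable to paths through that unit.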
According to an embodiment, the initial relevance score is one of a predetermined value (e.g. one), and an activation of the first predetermined PP 24, which activation is associated with the inference performed by the ML predictor 12 on the data structure 16.
According to an embodiment, see, e.g.,
According to an embodiment, the apparatus 10 is configured for determining the upstream relevance share R11, R22 (which may, e.g., as described before, be regarded as a relevance score of a connection between two neurons) for each of the set of downstream neurons 41, 42 according to a fraction at which an activation a1 of the neuron 31 in the inference on the data structure contributes to an activation of the respective downstream neuron 41, 42 in the inference.
For example, the apparatus may, in performing the reverse propagation 30, distribute a relevance score at a predetermined (e.g., current) neuron of the ML predictor onto upstream neurons of the predetermined neuron according to fractions at which the activations of these upstream neurons contribute to an activation of the predetermined neuron in the inference.
In addition or in alternative to the activation, the upstream relevance shares may be determined on the basis of one or more other measures. For example, a (modified) gradient backpropagation process, e.g., GradCam, may be used after the inference on the data structure to determine, for each of the set of downstream neurons 41, 42, a (modified) gradient value for each upstream neuron of the respective downstream neuron, and apparatus 10 may determine the upstream relevance shares R11, R22 based on the (modified) gradient values directed at the respective neuron 31.
In general, the apparatus 10 may also be configured for determining the upstream relevance share R11 (which may, e.g., as described before, be regarded as a relevance score of a connection between two neurons) of a connection between a neuron 31 and a downstream neuron 41 of the neuron 31 according to a ratio between a weight/parameter/measure a1 connecting the downstream neuron 41 to the neuron 31 and a total (or any aggregation) of the weights/parameters/measures of all connections between the downstream neuron 41 and any upstream neuron of the downstream neuron 41, e.g. neurons 31, 32, 33.
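The two variants for determining an upstream relevance share described above can be contrasted in a short sketch: one based on the fraction at which a weighted activation contributes to the downstream activation, one based on the weight alone. All numbers are invented for illustration.

```python
# Sketch of the two upstream-relevance-share variants described above;
# activations and weights are invented illustration values.

a = [2.0, 1.0, 1.0]    # activations of upstream neurons 31, 32, 33
w = [0.5, 0.25, 0.25]  # weights connecting them to downstream neuron 41

# variant 1: fraction at which a1 * w1 contributes to the
# (pre-)activation of downstream neuron 41
z = sum(ai * wi for ai, wi in zip(a, w))
share_activation = a[0] * w[0] / z

# variant 2: fraction of the weight itself among all incoming weights
share_weight = w[0] / sum(w)
```

The two shares generally differ (here 2/3 vs. 1/2), since variant 1 depends on the inference actually performed on the data structure, while variant 2 is a purely structural, data-independent measure.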
In examples, the apparatus 10 may determine the upstream relevance share R11, R22 (which may, e.g., as described before, be regarded as a relevance score of a connection between two neurons) based on a combination of one or more or all of the variants described above.
According to an embodiment, the second predetermined PP 14′ comprises one or more neurons of the ML predictor (e.g., the ML predictor comprises or is composed of a neural network (NN), or is representable as directed (acyclic) graph (DAG) of mappings, and comprises a plurality of predictor units, which may also be referred to as nodes, or neurons; e.g., the one or more neurons of the second predetermined PP may belong to a certain channel and/or layer of the ML predictor or NN).
According to an embodiment, in performing an inference, the second predetermined PP 14′ is sensitive to a specific concept (or pattern), which is potentially present in the content of the data structures.
According to an embodiment, the apparatus 10 is configured for determining the PP relevance score 23 for each of the set of PPs by performing, in an unfiltered manner, a reverse propagation of the initial relevance score attributed to the first predetermined PP 24 from the first predetermined PP 24 through the ML predictor (or along propagation paths of the ML predictor) onto the respective PP 14.
According to an embodiment, the apparatus 10 is configured for obtaining the second predetermined PP 14′ from the set of PPs by one or more of (i) ranking the PPs of the set of PPs according to their PP relevance scores and using one out of one or more highest ranked PPs as the second predetermined PP 14′ and (ii) using an input received via a user interface for selecting the predetermined PP.
In the following, the description of what is disclosed in
According to an embodiment, the apparatus 10 is configured for generating a relevance map (e.g., a heatmap, e.g. a conditional heatmap, such as heatmap 141 described with respect to
According to an embodiment, the apparatus 10 is configured for determining respective relevance scores for a plurality of portions of the data structure, and masking portions of the data structure depending on whether the respective relevance scores for the portions fulfill a condition (e.g. exceed a threshold) (or do not fulfill the condition). See, e.g., partial images 145 of
According to an embodiment, the apparatus 10 is configured for assigning respective relevance scores to a plurality of portions of the data structure by performing the reverse propagation from the first predetermined PP 24 to the data structure; and selecting a set of (one or more) portions of the data structure out of the plurality of portions of the data structure based on the respective relevance scores (e.g., by ranking the portions according to their relevance scores, and selecting one or more highest ranked portions; or by selecting portions, the relevance scores of which fulfil a predetermined criterion, e.g. exceed a predetermined threshold, see, e.g., partial images 145 described with respect to
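The threshold-based masking of portions described above may be sketched as follows; the per-pixel relevance scores, pixel values, and the threshold are invented illustration values, not values from the disclosure.

```python
# Sketch only: mask out pixels of a digital image whose relevance
# score does not exceed a threshold, keeping the portions most
# relevant to the selected predictor portion. Values are invented.

relevance = [0.9, 0.1, 0.6, 0.05]  # per-pixel relevance scores
pixels    = [200, 130, 180, 90]    # toy image as a flat pixel list
threshold = 0.5

masked = [p if r > threshold else 0
          for p, r in zip(pixels, relevance)]
# masked == [200, 0, 180, 0]
```

The retained pixels correspond to the partial images mentioned above, i.e., the regions of the data structure most relevant to the first predetermined PP under the chosen condition.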
According to an embodiment, the apparatus 10 is configured for labelling the portion 22 (or, cf. second alternative, the PP 26) of the data structure as being affiliated to the second predetermined PP 14′, and/or as being associated with a concept represented by the second predetermined PP 14′.
According to an embodiment, the apparatus 10 is configured for performing the inference on the data structure 16 by means of the ML predictor 12.
According to an embodiment, the ML predictor comprises a neural network (NN) comprising a plurality of layers 13 (e.g. layers 13 of
For example, the convolutional layer comprises a plurality of output channels, e.g., a channel is configured for applying a filter (which may be specific to the channel, and may differ from the filter applied by a different channel) to a plurality of portions of input data of the channel (e.g., the input data comprises a plurality of features, a portion comprising a subset of the features), so as to determine, for each of the portions, one output feature of the channel. E.g., the filter may be a tensor. E.g., the filter may be spatially invariant or equal for each of the portions of the input data. E.g., the input data may be represented in one or more 2D arrays, or multi-dimensional arrays. E.g., the input data may comprise one or more channels, each comprising a data array (2D or multi-dimensional). A portion of the input data may comprise the data of a region within one or more channels of the input data, e.g., in case of multiple channels, data within collocated regions in the arrays of the multiple channels.
The filter may be composed of a convolutional kernel (of one or multiple dimensions, e.g. three dimensions in case of the input data comprising multiple channels of 2D arrays), the kernel comprising a plurality of weights (e.g. weights wij of section 4). The channel may convolve the input data using the kernel. Each of the above-mentioned portions of the input data may be defined by one scan position of the kernel in performing the convolution. E.g., in the notation of section 4, ai may represent a feature of the input data, the sum in equation 3 being over all features (or activations) of one portion of the input data, e.g. input data covered by the kernel at a specific scan position of the convolution.)
In other words, in the context of a convolutional layer, one of the above-mentioned neurons may be represented by one scan position of a convolution performed in the convolutional layer. The output of the neuron may be one feature or activation (e.g. one value), e.g. aj in equation 4, which may be derived by aggregating the weighted input activations (the features of the portion of the input data, weighted using the weights of the kernel). The output of one neuron may be part of a feature map, or output channel, of one channel of the layer, which channel may be used, e.g. in conjunction with further channels of the layer, as an input of a further layer of the NN.
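The correspondence between a "neuron" and one scan position of the convolution may be sketched for a one-dimensional toy input; the input features and kernel weights are invented illustration values.

```python
# Sketch: in a convolutional layer, each "neuron" corresponds to one
# scan position of the kernel; its output a_j aggregates the weighted
# input features covered by the kernel at that position (cf. eq. 4).
# Input and kernel values are invented.

inp    = [1.0, 2.0, 3.0, 4.0]  # 1D input channel (toy feature values)
kernel = [0.5, -0.5]           # kernel weights w_ij, shared per position

feature_map = [sum(inp[p + i] * kernel[i] for i in range(len(kernel)))
               for p in range(len(inp) - len(kernel) + 1)]
# feature_map == [-0.5, -0.5, -0.5]
```

Each entry of `feature_map` is the output of one such neuron; together they form the output channel that may serve as input to a further layer.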
In view of the just described, the network portion, i.e., the predictor portion, may, for example, be one or multiple channels of a convolutional layer. Alternatively, the network portion may be a portion of a channel, e.g. one or more neurons, e.g. a portion of a channel representing a region within the channel.
It is noted that the above description may optionally apply to the embodiments of both alternatives of the first aspect and to embodiments of the second aspect, which may also relate to NNs with a convolutional layer.
According to an embodiment, the data structure 16 is a digital image, and the portion 22 comprises a region within the digital image, or comprises one or more samples (or pixels) of the digital image.
According to an embodiment, the apparatus 10 is configured for filtering the reverse propagation by weighting a first propagation path through the ML predictor, the first propagation path passing through each of a plurality of predetermined PPs, which includes the second predetermined PP, differently than a second propagation path through the ML predictor, the second propagation path circumventing (or not passing through) at least one of the plurality of predetermined PPs.
That is, for example, the reverse propagation may be filtered by (or conditioned on) multiple PPs, thus allowing analysis of sub-concepts of a concept. E.g., by filtering the reverse propagation using a predetermined PP, which is upstream of the second predetermined PP 14′, the second predetermined PP may be analyzed with respect to sub-concepts contributing to a concept represented by the second predetermined PP, see
According to an embodiment, the first predetermined PP 24 represents (or corresponds to) a predictor output of the ML predictor (e.g., the predictor output is associated with one out of a set of one or more inference results of the inference, which inference result may be, but is not necessarily, a “highest rated” inference result, e.g., in case of a classifying ML predictor, the first predetermined PP 24 may correspond to one or more classes, not necessarily the class having attributed the highest confidence), or an intermediate predictor portion (not being an output of the ML predictor).
According to an embodiment, the data structure is, or is a combination of,
According to an embodiment, the apparatus 10 is configured for determining respective relevance scores of the portion 22 of the data structure with respect to a plurality of first predetermined PPs (e.g., assigning respective relevance scores to the portion, the respective relevance scores indicating respective shares with which propagation paths, which connect the portion with the first predetermined PPs, contribute to an activation of the first predetermined PPs in the inference) by performing respective reverse propagations of respective initial relevance scores attributed to the first predetermined PPs (e.g., the plurality of first predetermined PPs comprises the first predetermined PP). The apparatus 10 may rank the first predetermined PPs according to the relevance scores determined for the portion of the data structure with respect to the first predetermined PPs and/or the apparatus 10 may select one or more first predetermined PPs out of the plurality of predetermined PPs based on the relevance scores determined for the portion of the data structure with respect to the first predetermined PPs.
Alternatively, cf. the second alternative of the first aspect, the apparatus 10 may perform the determination of the respective relevance scores for PP 26 rather than for portion 22. That is, apparatus 10 may be configured for determining respective relevance scores of the PP 26 with respect to a plurality of first predetermined PPs by performing respective reverse propagations of respective initial relevance scores attributed to the first predetermined PPs. The apparatus may rank the first predetermined PPs according to the relevance scores determined for the PP 26 with respect to the first predetermined PPs, and/or select one or more first predetermined PPs out of the plurality of predetermined PPs based on the relevance scores determined for the PP 26 with respect to the first predetermined PPs.
For example, the first predetermined PPs are output portions of the ML predictor, e.g. classes of a classifying ML predictor. For example, the ranking/selection may be indicative of first predetermined PPs, for which a concept associated with the second predetermined PP is relevant (i.e., e.g., contributes to an activation).
According to an embodiment, the apparatus 10 is configured for pruning the ML predictor in dependence on the relevance score.
According to an embodiment, the apparatus 10 is configured for performing the inference for the data structure, and assigning the relevance score to the portion 22 (or the PP 26 in case of the second aspect). If the relevance score fulfils a predetermined criterion (e.g., exceeds or does not exceed a predetermined threshold), apparatus 10 may perform a further inference for the data structure, wherein the apparatus is configured for deactivating (or altering/manipulating; e.g., deactivation may be regarded as a somewhat blunt form of pruning. Alternatively, the model-internal substructures may be altered in different ways, such as boosting or reducing the activation strength of particular concept encodings in order to alter the model behavior.) the second predetermined PP in performing the further inference (e.g., disregarding an activation of the second predetermined PP in the inference). Optionally, the apparatus 10 may compare an inference result of the inference with a further inference result of the further inference, so as to obtain a confidence score on the inference result.
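The deactivation and comparison described above may be sketched on a toy model: the second predetermined PP (here a single hidden unit) is zeroed out in a further inference, and the change in the inference result indicates how decisive that PP was. The network values below are invented for illustration.

```python
# Sketch only: deactivate one hidden unit (the "second predetermined
# PP") in a further inference and compare the two inference results.
# All weights and inputs are invented illustration values.

def forward(x, W1, W2, disabled=None):
    h = [sum(x[i] * W1[i][j] for i in range(len(x)))
         for j in range(len(W1[0]))]
    if disabled is not None:
        h[disabled] = 0.0  # disregard the activation of the selected PP
    return sum(h[j] * W2[j] for j in range(len(h)))

x  = [1.0, 2.0]
W1 = [[0.7, 0.1], [0.2, 0.4]]  # 2 inputs -> 2 hidden units
W2 = [1.0, 0.5]                # 2 hidden units -> 1 output

y_full    = forward(x, W1, W2)
y_ablated = forward(x, W1, W2, disabled=0)
drop = y_full - y_ablated  # large drop -> the deactivated PP was decisive
```

A small drop would suggest the inference result does not hinge on the concept encoded by the deactivated PP, which may serve as a confidence indication for the original result.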
Now reverting to the embodiments of the second aspect described with respect to
Apparatus 20 determines the affiliation score 53 of one of the data structures 16 by determining a relevance score 51 for the PP 26 with respect to an inference 8 performed by the ML predictor 12 on the respective data structure 16, wherein the relevance score indicates a contribution of the PP 26 to an activation of a first predetermined PP 24 of the ML predictor, which activation is associated with the inference 8 performed by the ML predictor on the data structure. The apparatus is configured for determining the relevance score by performing a reverse propagation of an initial relevance score (which is attributed to the first predetermined PP) from the first predetermined PP to the PP (the target PP), e.g., the reverse propagation as already described with respect to
Further details and advantages of apparatus 20 are described below in section B, 4.2.2, and in particular with respect to
According to an embodiment, (see, e.g.,
For example, the apparatus 20 performs a reverse propagation of an initial relevance score from the first predetermined PP, or from an output portion of the ML predictor, through the ML predictor, passing, in the reverse propagation, a plurality of PPs 14, thereby assigning them respective relevance scores. E.g., the apparatus 20 may reversely propagate the result of an inference performed on one of the data structures.
Accordingly, the relevance scores assigned to the PPs may indicate which of the PPs are particularly relevant to the output of the inference. A PP having a particularly high relevance score may, e.g., encode a concept which is relevant for the output. Apparatus 20 may select the target PP 26 out of the set 41 of PPs, e.g., by selecting a PP having been assigned a relevance score which fulfils a predetermined condition, such as exceeding a threshold.
For example, apparatus 20 may assign a data structure to the subset, if the assigned affiliation score fulfils a predetermined criterion, e.g. exceeds a threshold. Accordingly, the subset may be a collection of data structures, which provoke a high relevance of the PP. Accordingly, the subset may be representative of a concept of the PP 26.
According to an embodiment, apparatus 20 is configured for selecting the subset of data structures by comparing the affiliation scores of the data structures to a threshold, or ranking the data structures according to their affiliation scores, and selecting, out of the set of data structures, a predetermined number of data structures having the highest ranked affiliation scores.
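The ranking-based selection of the subset described above may be sketched as follows; the affiliation scores, data structure names, and the number of selected samples are invented illustration values.

```python
# Sketch only: rank data structures by their affiliation scores and
# keep a predetermined number of highest-ranked ones (RelMax-style
# reference selection). Names and scores are invented.

scores = {"img_a": 0.12, "img_b": 0.91, "img_c": 0.55, "img_d": 0.87}
k = 2  # predetermined number of data structures to keep

subset = sorted(scores, key=scores.get, reverse=True)[:k]
# subset == ["img_b", "img_d"]
```

Alternatively, as described above, the selection may compare each affiliation score against a threshold instead of keeping a fixed number of highest-ranked data structures.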
According to an embodiment, apparatus 20 is configured for presenting the selected subset of data structures, or respective portions thereof, at a user interface (e.g. a display).
According to an embodiment, see
For example, selecting the set of portions may localize portions of the data structure affiliated with the first predetermined PP. In other words, by determining the selected subset of data structures, the apparatus may identify data structures out of the set of data structures, which are affiliated with, or represent a concept of, the first predetermined PP.
In the following, the description of what is disclosed in
For example, in the reverse propagation for the PPs, a filter may be applied, e.g. as described below, or as described with respect to the first aspect. In this case, the relevance scores assigned to the portions may further indicate an affiliation of the portions to concepts associated with the "filtering" PP, i.e. the second predetermined PP. Accordingly, applying the filter may resolve the concept of the first predetermined PP at a finer granularity, or, in other words, may allow the identification of a sub-concept of the concept of the first predetermined PP, the sub-concept being associated with the second predetermined PP. The selected portions may be representative of this sub-concept, and may thus help to reveal a semantic meaning of the sub-concept.
According to an embodiment, apparatus 20 is configured for, for each of the selected subset of data structures (e.g. the data structure being part of the subset indicates that the PP, to which the relevance score is assigned, is representative of a concept of the first predetermined PP), labelling the PP (e.g., the PP (the target PP) is associated to or represents a portion of the respective data structure, e.g., the PP may correspond to one or more interconnections of the ML predictor; in this respect, also an interconnection from a portion of the data structure to an input portion or input unit/neuron may be regarded as PP; accordingly, the apparatus may determine a relevance score for a portion of the data structure, e.g. as defined by claim group A, wherein, the portion of the data structure may be understood as a PP of the ML predictor) as being affiliated to the first predetermined PP, and/or as being associated with a concept represented by the first predetermined PP.
According to an embodiment, apparatus 20 is configured for filtering the reverse propagation by weighting a first propagation path 301 through the ML predictor, the first propagation path passing through a second predetermined PP 14′ of the ML predictor, differently than a second propagation path 302 through the ML predictor, the second propagation path circumventing the second predetermined PP. For each of the selected subset of data structures (e.g. the data structure being part of the subset indicates that the PP, to which the relevance score is assigned, is representative of a concept of the first predetermined PP), apparatus 20 may label the PP (e.g., the PP (the target PP) is associated to or represents a portion of the respective data structure, e.g., the PP may correspond to one or more interconnections of the ML predictor; in this respect, also an interconnection from a portion of the data structure to an input portion or input unit/neuron may be regarded as PP; accordingly, the apparatus may determine a relevance score for a portion of the data structure, e.g. as described with respect to apparatus 10 of
According to an embodiment, apparatus 20 is configured for determining, for one of the selected data structures, an activation of the PP (the target PP) with respect to an inference on the data structure (by performing the inference using the ML predictor), and performing a reverse propagation of an initial relevance score derived from the activation of the PP, from the PP onto a further PP of the ML predictor. In other words, the PP may serve as the first predetermined PP 24 in a determination of a relevance score as described with respect to apparatus 10 before. Accordingly, the PP may be used as the first predetermined PP 24 of
According to an embodiment, apparatus 20 is configured for filtering the reverse propagation by weighting a first propagation path through the ML predictor, the first propagation path passing through a second predetermined PP (e.g., a set of one or more neurons) of the ML predictor, differently than a second propagation path through the ML predictor, the second propagation path circumventing (or not passing through) the second predetermined PP (e.g., the propagation paths connecting the predetermined network portion with the network output; e.g., the apparatus derives the relevance score by aggregating relevance values resulting from reversely propagating the initial relevance value along propagation paths connecting the network output and the predetermined network portion).
Details described with respect to filtering the reverse propagation, and the second predetermined PP, described with respect to the first aspect, i.e. apparatus 10 of
The described apparatuses may also serve as a description of respective methods in a sense that further embodiments provide methods, which are defined by comprising, as steps of the methods, the steps performed by the described apparatuses.
Considerable advances have been made in the field of Machine Learning (ML), with especially Deep Neural Networks (DNNs) [19] achieving impressive performances on a multitude of domains [10, 38, 14]. However, the reasoning of these highly complex and non-linear DNNs is generally not obvious [33, 35], and as such, their decisions may be (and often are) biased towards unintended or undesired features [41, 18, 37, 2]. This in turn hampers the transferability of ML models to many application domains of interest, e.g., due to the risks involved in high-stakes decision making [33], or the requirements set in governmental regulatory frameworks [12] and guidelines brought forward [9].
In order to alleviate the “black box” problem and gain insights into the model and its predictions, the field of explainable Artificial Intelligence (XAI) has recently emerged. In fact, multitudes of XAI methods have been developed that are able to provide explanations of a model's decision while approaching the subject from different angles, e.g., based on gradients [25, 42], as modified backpropagation processes [4, 40, 39, 27], by probing the model's reaction to changes in the input [46, 32] or visualizing stimuli specific neurons react strongly to [11, 29]. The field can roughly be divided into local XAI and global XAI. Methods from local XAI commonly compute attribution maps in input space highlighting input regions or features, which carry some form of importance to the individual prediction process (i.e., with respect to a specific sample).
However, the visualization of important input regions is often of only limited informative value on its own as it does not tell us what features in particular the model has recognized in those regions, as
According to local XAI, for example, the attribution of one of the classes of output 18 may be backpropagated (or reversely propagated) through the entire ML predictor to the input 16 to obtain a heatmap 139, which indicates to which extent individual parts or pixels of the input contribute to the attribution of the respective class. Accordingly, the heatmap 139 may indicate where the model is looking. Global XAI, on the other hand, may use a feature visualization, thereby answering the question of what features exist. To this end, a set 181 of categorized samples may be input to the ML predictor to check, e.g., the activation of a specific portion or unit of the ML predictor when processing samples of a specific category. That way, a concept represented by a category may be attributed to a specific portion or unit of the ML predictor.
According to embodiments of the present invention, an attribution attributed to a portion or a unit of the ML predictor (e.g. a class of the output 18 or a portion to which a specific concept is assigned, e.g. by measuring activations of categorized samples as described with respect to global XAI, or a portion to which a particularly high attribution is attributed in backpropagating the attribution of a class) may be backpropagated to the input 16 (or only up to a further portion of the ML predictor located upstream from the portion serving as origin) by filtering the backpropagation in a sense that only propagation paths through a certain portion of the ML predictor are considered, or in a sense that propagation paths passing through different portions of the ML predictor are weighted differently. E.g., as indicated in
In
As it is illustrated in
Assuming for example an image classification setting and an attribution map computed for a specific prediction, it might be clear where (in terms of pixels) important information can be found, but not what this information is, i.e., what characteristics of the raw input features the model has extracted and used during inference, or whether this information is a singular characteristic or an overlapping plurality thereof. This introduces many degrees of freedom to the interpretation of attribution maps generated by local XAI, rendering a precise understanding of the models' internal reasoning a difficult task.
Global XAI, on the other hand, attempts to address the very issue of understanding the what question, i.e., which features or concepts have been learned by a model or play an important role in a model's reasoning in general. Some approaches from this category synthesize example data in order to reveal the concept a particular neuron activates for [11, 43, 21, 26, 29], but do not inform which concept is in use in a specific classification or how it can be linked to a specific output. From these approaches, we can at most obtain a global understanding of all possible features the model can use, but how these features interact with each other given some specific data sample and how the model infers a decision remains hidden. Other branches of global XAI propose methods, e.g., to test a model's sensitivity to a priori known, expected or pre-categorized stimuli [15, 31, 5, 6]. These approaches require labeled data, thus limiting, and standing in contrast to, the exploratory potential of local XAI.
Some recent works have begun to bridge the gap between local and global XAI by, for example, drawing weight-based graphs that show how features interact in a global, yet class-specific scale, but without the capability to deliver explanations for individual data samples [13, 20]. Others plead for creating inherently explainable models in the hope of
replacing black box models [33]. These methods, however, either require specialized architectures, data and labels, or training regimes (or a combination thereof) [7, 8] and do not support the still widely used off-the-shelf end-to-end trained DNN models with their extended explanation capabilities.
Embodiments of the present invention connect lines of local and global XAI research by introducing Concept Relevance Propagation (CRP), a next-generation XAI technique that explains individual predictions in terms of localized and human-understandable concepts. Other than the related state-of-the-art, CRP answers both the "where" and "what" questions, thereby providing deep insights into the model's reasoning process. As a post-hoc XAI method, CRP can be applied to (almost) any ML model with no extra requirements on the data, model or training process. As demonstrated on multiple datasets, model architectures and application domains, CRP-based analyses according to the present invention may allow one to (1) gain insights into the representation and composition of concepts in the model as well as quantitatively investigate their role in prediction, (2) identify and counteract Clever Hans filters [18] focusing on spurious correlations in the data, and (3) analyze whole concept subspaces and their contributions to fine-grained decision making. Similar to Activation Maximization [28], embodiments according to a further aspect of the invention make use of the Relevance Maximization (RelMax) approach, which may use CRP in order to search for the most relevant samples in the training dataset, and show its advantages when "explaining by example". In summary, by lifting XAI to the concept level, CRP opens up a new way to analyze, debug and interact with a ML model, which can be particularly beneficial for safety-critical applications and ML-supported investigations in the sciences.
Beginning with Section 2.1, we present approaches to study the role of learned concepts in individual predictions using our glocal CRP approach. The understanding of hidden features and their function then allows one to interact with the model and to test its robustness against feature ablation in Section 2.2. In Section 2.3, we study concept subspaces in order to identify (dis)similarities and roles of concepts in fine-grained decision making.
Attribution maps, e.g. the heatmaps 139 of
By conditioning the explanation on relevant hidden-layer channels via CRP, embodiments of the invention can assist in concept understanding and overcome the interpretation gap.
With the selection of a specific neuron or concept, CRP allows investigating how relevance flows from and through the chosen network unit to lower-level neurons and concepts, as is discussed in Section 4.1. This gives information about which lower-level concepts carry importance for the concept of interest and how it is composed of more elementary conceptual building blocks, which may further improve the understanding of the investigated concept and model as a whole.
In this section, it is described how CRP can be leveraged as a Human in the Loop (HITL) solution for dataset analysis. In a first step, embodiments of methods are described, which uncover a Clever Hans artifact [18], suppress it by selectively eliminating the most relevant concepts in order to assess its decisiveness for the recognition of the correct class of a particular data sample. Then, embodiments of methods are described, which utilize class-conditional reference sampling (cf. Section 4) to perform an inverse search to identify multiple classes making use of the filter encoding the associated concept, both in a benign and a Clever Hans sense.
for the 6 most relevant channels 454, 361, 414, 203, 486, 443 in the selected region in descending order of their relevance contribution from left to right. Diagram 351 shows the relevance contribution of 20 most relevant filters inside the region. These filters are successively set to zero and the change in prediction confidence of different classes, namely class “safe” 301, class “lock” 302, class “monitor” 303, and class “pay-phone” 304 is recorded and shown in diagram 352. In particular,
Activation Maximization (ActMax) as illustrated in Section 4.2, we conclude that they approximately encode for white strokes. Using the herein disclosed Relevance Maximization (RelMax) approach, which uses CRP for identifying the most relevant samples, allows us to gain a deeper insight into the model's preferred usage of the filters and to discover that the model utilizes them to detect white strokes in “written characters”. To test the robustness of the model against this Clever Hans artifact, we successively set the activation output map of the 20 most relevant filters activating on the watermark to zero. In diagram 352 of
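The filter-ablation step described above may be sketched as follows; this is a minimal numpy illustration, in which the toy activation tensor, the channel and class counts, and the linear read-out are hypothetical stand-ins for the actual network and classes used in the experiment.

```python
import numpy as np

# Hypothetical toy setup: 4 channels, 2 classes (illustrative only).
rng = np.random.default_rng(0)
acts = rng.random((4, 3, 3))        # channels x height x width
readout = rng.random((2, 4))        # per-class weights on pooled channels

def ablate_channels(activations, channel_ids):
    """Zero the activation output maps of the selected filters."""
    out = activations.copy()
    out[list(channel_ids)] = 0.0
    return out

def confidence(activations):
    """Softmax class confidences from globally sum-pooled channels."""
    pooled = activations.sum(axis=(1, 2))
    logits = readout @ pooled
    e = np.exp(logits - logits.max())
    return e / e.sum()

before = confidence(acts)
after = confidence(ablate_channels(acts, [0, 1]))  # drop 2 "relevant" filters
```

Recording `before` and `after` for successively larger sets of ablated filters yields curves like those shown in the diagram.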
In an inverse search, it can then be explored for which samples and classes these filters also generate high relevance. This allows one to understand the behavior of the filter in more detail and to find other possibly contaminated classes.
In the above described embodiments, the conditional attribution flow or relevance propagation was described based on the non-limiting example of using single filters as functions assumed to (fully) encode learned concepts. Consequently, we have visualized examples and quantified effects based on per-filter granularity. While previous work suggests that individual neurons or filters often encode a single human-comprehensible concept, it can generally be assumed that concepts are encoded by sets of filters. The learned weights of potentially multiple filters might correlate and thus redundantly encode the same concept, or the directions described by several filters situated in the same layer might span a concept-defining subspace. In this section, we now aim to investigate the encodings of filters of a given neural network layer for similarities in terms of activation and use within the model.
obtained via RelMax. As per the reference images, the overall concept of the cluster seems to be related to keyboard keys, round buttons as well as rectangular roofing shingles, as shown by sets 447 of most relevant samples of these channels. In other words,
To summarize, embodiments of the present invention exploit the finding that although several filters may show signs of correlation in terms of output activation, they are not necessarily encoding redundant information or serving the same purpose. Rather, using the herein disclosed CRP in combination with the RelMax-based process for selecting reference examples representing seemingly correlating filters allows one to discover and understand the subtleties a neural network has learned to encode in its latent representations.
Embodiments of the present invention provide CRP, a post-hoc explanation method, which not only indicates which part of the input is relevant for an individual prediction, but may also communicate the meaning of involved latent representations by providing human-understandable examples. Since CRP combines the benefits of the local and global XAI perspective, it computes more detailed and contextualized explanations, considerably extending the state-of-the-art. Among its advantages are the high computational efficiency (order of a backward pass) and the out-of-the-box applicability to (almost) any model without imposing constraints on the training process, the data and label availability, or the model architecture. Furthermore, CRP introduces the idea of conditional backpropagation tied to a single concept or a combination of concepts as encoded by the model, within or across layers. Via this ansatz, the contribution of all neurons' concepts in a layer can be faithfully attributed, localized in the input space, and finally their interaction can be studied. As shown in this work, such an analysis allows one to disentangle and separately explain the multitude of in-parallel partial forward processes, which transform and combine features and concepts before culminating into a prediction. Finally, with RelMax we move beyond the decade-old practice of communicating latent features of neural networks based on examples obtained via maximized activation. In particular, we show that the examples which stimulate hidden features maximally are not necessarily useful for the model in an inference context, or representative for the data the model is familiar and confident with. By providing examples based on relevance, however, the user is presented with data with characteristics which actually play an important role in the prediction process. Since the user can select examples wrt. 
any (i.e., not necessarily the ground truth) output class, the disclosed approach constitutes a new tool to systematically investigate latent concepts in neural networks.
The above discussed experiments have qualitatively and quantitatively demonstrated the additional value of the CRP approach for common datasets and end-to-end trained models. Specifically, they showed that reference samples selected with relevance-based criteria, concept heatmaps and atlases, as well as concept composition graphs open up the ability to understand model reasoning on a more abstract and conceptual level. These insights then allow one to identify Clever Hans concepts, to investigate their impact, and finally to correct an ML model for these misbehaviors. Further, using relevance-based reference sample sets, embodiments of the disclosed method enable the identification of concept themes spanned by sets of filters in latent space. Although channels of a cluster have a similar function, they seem to be used by the model for fine-grained decisions regarding details in the data, such as the particular type of buttons, to partially decide whether an image shows a laptop keyboard, a mechanical typewriter or a TV remote control. Finally, embodiments of the herein disclosed CRP method are useful in non-image data domains, where traditional attribution maps are often difficult to interpret and comprehend by the user. Our experiments on time series data have shown that, as long as a visualization of the data can be found, the meaning of latent concepts can be communicated via reference examples.
Overall, embodiments of the tools proposed in this disclosure, and the resulting increase in semantics and detail to be found in sample-specific neural network explanations, allow advancing the applicability of post-hoc XAI to novel or previously difficult to handle models, problems and data domains.
This section presents embodiments of the techniques used and introduced in this disclosure.
In the following, an embodiment of CRP is described, a backpropagation-based attribution method extending the framework of Layer-wise Relevance Propagation (LRP) [4]. As such, CRP inherits the basic assumptions and properties of LRP.
The description starts with LRP. Assuming a predictor with L layers
LRP follows the flow of activations computed during the forward pass through the model in opposite direction, from the final layer fL back to the input mapping f1. Given a particular mapping f*(⋅), we consider its pre-activations zij mapping inputs i to outputs j and their aggregations zj at j. Commonly in neural network architectures such a computation is given with
where ai are the layer's inputs and wij its weight parameters. Finally, σ constitutes a (component-wise) non-linearity producing input activation for the succeeding layer(s). The LRP method distributes relevance quantities Rj corresponding to aj and received from upper layers towards lower layers proportionally to the relative contributions of zij to zj, i.e.,
Lower neuron relevance is obtained by losslessly aggregating all incoming relevance messages Ri←j as
This process ensures the property of relevance conservation between a neuron j and its inputs i, and thus adjacent layers. LRP is mathematically founded in Deep Taylor Decomposition [23].
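The proportional decomposition and lossless aggregation described above may be sketched for a single linear layer as follows; this is a minimal numpy illustration, in which the small epsilon stabilizer is a common practical choice and not part of the basic formulation, and the layer values are illustrative.

```python
import numpy as np

def lrp_linear(a, w, r_out, eps=1e-9):
    """Basic LRP step for one linear layer."""
    z = a[:, None] * w                            # pre-activations z_ij
    zj = z.sum(axis=0)                            # aggregations z_j
    zj = zj + eps * np.where(zj >= 0, 1.0, -1.0)  # stabilize division
    messages = (z / zj) * r_out                   # R_{i<-j} = z_ij / z_j * R_j
    return messages.sum(axis=1)                   # R_i = sum_j R_{i<-j}

# Illustrative values: 3 inputs, 2 outputs
a = np.array([1.0, 2.0, 3.0])
w = np.array([[0.5, -1.0],
              [1.0,  0.5],
              [-0.5, 1.0]])
r_out = np.array([1.0, 2.0])
r_in = lrp_linear(a, w, r_out)
```

The conservation property can be checked directly: the sum of the lower-layer relevances equals (up to the stabilizer) the sum of the upper-layer relevances.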
In the following, a method, referred to as CRP, according to embodiments of the present invention is described. CRP extends the formalism of LRP by introducing conditional relevance propagation determined by a set of conditions θ. Each condition c∈θ can be understood as an identifier for neural network elements, such as neurons j located in some layer, representing latent encodings of concepts of interest. One such condition could, for example, represent a particular network output to initiate the backpropagation process from. Within the CRP framework, the basic relevance decomposition formula of LRP given in Equation (5) then becomes
following the potential for a “filtering” functionality briefly discussed in [24]. Here, Rjl(x|θ) is the relevance assigned to layer output j given from the CRP process performed in upper layers under conditions θ, to be distributed to lower layers. The sum-loop over cl∈θl then “selects” via the Kronecker-Delta δjc
and
in the conditional heatmap 541 of
One effect of CRP over LRP and other attribution methods is an increase in detail of the obtained explanations, as illustrated with
Here, the tuple (p, q, j) uniquely addresses an output voxel of the activation tensor z(p,q,j) computed during the forward pass with p and q indicating the spatial tensor positions and j the channel.
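The "selection" of relevance quantities via the Kronecker-Delta described above amounts to masking the layer's output relevance before it is distributed further by the LRP rule; a minimal sketch, in which the per-channel relevance values and the chosen condition set are illustrative:

```python
import numpy as np

def apply_condition(r_out, concept_ids):
    """Keep relevance only for the conditioned concepts/channels;
    all other entries are masked with zeroes (Kronecker-delta selection)."""
    masked = np.zeros_like(r_out)
    masked[list(concept_ids)] = r_out[list(concept_ids)]
    return masked

r_layer = np.array([0.2, 0.5, 0.3])     # relevance per channel j
r_cond = apply_condition(r_layer, [1])  # condition set selects channel 1
```

The masked tensor `r_cond` then takes the place of the unconditioned relevance in the subsequent backward step.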
Due to the conservation property of CRP inherited from LRP, the global relevance of individual concepts to per-sample inference can be measured by summation over input units i as
in any layer l where θ has taken full effect. This can easily be extended to a localized analysis of conceptual importance, by restricting the relevance aggregations to regions of interest
as illustrated in
the tuple (u, v, i) addresses the spatial axes with u and v, and the channel axis i at layer l−1. An aggregation over spatial axes with
communicates the dependency between channel j to lower-layer channel i, and thus related concepts, in terms of relevance in the prediction context of sample x. Following the LRP methodology, an adaptation of the CRP approach beyond CNN, e.g., to recurrent [3] or graph [36] neural networks, is possible.
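The aggregation steps above may be sketched as follows, assuming a conditional relevance tensor with axes (channel, height, width); the tensor values and the chosen region of interest are illustrative.

```python
import numpy as np

# Conditional relevance tensor for one layer: channel x height x width
rel = np.arange(2 * 2 * 2, dtype=float).reshape(2, 2, 2)

# Global concept relevance: sum over all units
r_global = float(rel.sum())

# Localized variant: restrict the aggregation to a region of interest
region = (slice(0, 1), slice(0, 1))                  # top-left corner
r_local = float(rel[:, region[0], region[1]].sum())

# Aggregation over spatial axes: relevance per lower-layer channel i,
# communicating the channel-to-channel dependency
r_per_channel = rel.sum(axis=(1, 2))
```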
In the following, optional features of CRP are described in more detail, starting from LRP.
LRP may be regarded a white-box attribution method grounded on the principles of flow conservation and proportional decomposition. Its application is aligned to the layered structure of machine learning models. Assuming a model with L layers
LRP may follow the flow of activations and pre-activations computed during the forward pass through the model in opposite direction, from the final layer fL back to the input mapping f1. Let us consider some (internal) layer or mapping function f*(⋅) in the model. Within such a layer, LRP assumes the computation of pre-activations zij, mapping inputs i to outputs j, which are then aggregated as zj at j, e.g., by summation. Commonly, in neural network architectures, such a computation is given with
where ai are the input activations passed from the previous layer and wij are the layer's learned weight parameters, mapping inputs i to layer outputs j. Note that the aggregation by summation to zj can be generalized, e.g., to also support max-pooling by formulating the sum as a p-means pooling operation. Finally, σ constitutes a (component-wise) non-linearity producing input activation for the succeeding layer(s). In order to be able to perform its relevance backward pass, LRP assumes the relevance score of a layer output j as given as Rj. The algorithm usually starts by using any (singular) model output of interest as an initial relevance quantity. In its most basic form, the method then distributes the quantity Rj towards the neuron's inputs as
i.e., proportionally wrt. the relative contribution of zij to zj. Lower neuron relevance is obtained by simply aggregating all incoming relevance messages Ri←j without loss:
This proportionality simultaneously ensures a conservation of relevance during decomposition as well as between adjacent layers, i.e.,
Note that the above formalism, at the scope of a layer, introduces the variables i and j as the inputs and outputs of the whole layer mapping, and assumes zij=0 for unconnected pairs of i and j, as is the case in single applications of filters in, e.g., convolutional layers. For component-wise non-linearities σ in Equation (4), commonly implemented by, e.g., the tanh or ReLU functions, which by LRP typically are treated as separate layer instances, this results in zij=δijzj (with δij being the Kronecker-Delta representing the input-output connectivity between all i and j) and consequently in an identity backward pass through σ. This principle of attribution computation by relevance decomposition can be implemented and executed efficiently as a modification of gradient backpropagation.
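The efficiency remark can be made concrete: for a linear layer, the individual messages R_{i←j} never need to be materialized, since the same result follows from one matrix-vector product, as in gradient backpropagation. A numpy sketch with illustrative values:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
w = np.array([[0.5, -1.0],
              [1.0,  0.5],
              [-0.5, 1.0]])
r_out = np.array([1.0, 2.0])
zj = a @ w                                   # aggregated pre-activations z_j

# Explicit message passing: R_i = sum_j (z_ij / z_j) R_j
z = a[:, None] * w
r_explicit = ((z / zj) * r_out).sum(axis=1)

# Same result as one backward (vector-Jacobian) product:
# R_i = a_i * sum_j w_ij (R_j / z_j)
r_backprop = a * (w @ (r_out / zj))
```

Both routes yield identical attributions; the second is the form in which LRP is typically implemented on top of automatic differentiation.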
In order to ensure robust decompositions and thus stable heatmaps and explanations, several purpose-specific LRP rules may be applied, for which Equations (5) and (6) serve as a conceptual basis. A composite strategy, mapping different rules to different parts of a neural network, may qualitatively and quantitatively increase attribution quality for the intent of explaining prediction outcomes. In the following analysis, different composite strategies are therefore used.
LRP, like other backpropagation-based methods, may compute attribution scores for all hidden units of a neural network model in order to allot a score to the model input. While in some recent works those hidden layer attribution scores have been used as a (not further semantically interpreted) means to improve deep models, or as proxy representations of explanations for the identification of systematic Clever Hans behavior, they are usually disregarded as a “by-product” for obtaining per-sample explanations in input space. The reason is fairly simple: end-to-end learned representations of data in latent space are usually difficult (or impossible) to interpret, unlike the samples in input space, e.g., images. Using attribution scores for rating the importance of individual yet undecipherable features and their activations does not provide any further insight into the model's inference process.
Assume now an understanding of the distinct roles of latent filters and neurons in an end-to-end learned DNN. Then, another problem emerges for interpreting model decisions in input space, rooted in the mathematics of modified backpropagation approaches. As Equation (7) summarizes for intermediate layers, LRP (and related approaches) propagates quantities from all layer outputs j simultaneously to all layer inputs i. At a layer input, this leads to a weighted superposition of attribution maps received from all upper layer representations, where detailed information about the individual roles of interacting latent representations is lost. What remains is a coarse map identifying the (general) significance of an input feature (e.g., a pixel) to the preceding inference step. A notable difference to this superimposing backpropagation procedure within the model is the initialization of the backpropagation process, where usually only one network output, of which the meaning (e.g., representation of categorical membership) is generally known, is selected for providing an initial relevance attribution quantity, and all others are masked by zeroes. This guarantees that an explanation heatmap represents the significance of (input) features to only the model output of choice. Let us call this heatmap representation a (class or output) conditional relevance map R(x|y) specific to a network output y and a given sample x. Would one backpropagate all network outputs simultaneously, as is demonstrated in
In the following, embodiments are described which use different strategies for disentangling attribution scores for latent representations in order to increase the semantic fidelity of explaining heatmaps via CRP. We introduced the notion of a class- or output-conditional relevance quantity R(x|y) for describing the use of knowledge about the meaning of particular neural network neurons and filters and their represented concepts, here the categories represented by the neurons at the model output. The key idea for obtaining R(x|y) is the masking of unwanted network outputs prior to the backpropagation process via a multiplication with zeroes. Perpetuating the notation introduced in the previous Section 4.1.1, the attribution scores Ri1(x|y) for input units i, corresponding to the individual components, features or dimensions xi of input sample x at layer l=1 and model output category y, are obtained by initializing the layer-wise backpropagation process with the initial relevance quantity RjL(x|y)=δjyfjL(x), with fjL(x) being the model output of the j-th neuron at output layer L. Using the Kronecker-Delta δjy, only the output of the neuron corresponding to the output category y is propagated in the backward pass. Let us uphold our assumption of knowledge about the concepts encoded by each filter or neuron within a DNN. We generalize the principle of masking or selecting the model output for a particular outcome to be explained by introducing the variable θ, describing a set of conditions cl bound to representations of concepts and applying to layers l. Multiple such conditions in combination might extend over multiple layers of a network. Note that we use natural numbers as identifying indicators for neural network filters (or elements in general) in compliance with the Kronecker-Delta. Here, θ then allows for a (multi-) concept-conditional computation of relevance attributions R(x|θ).
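The initialization RjL(x|y)=δjyfjL(x) may be sketched as follows; the output values are illustrative.

```python
import numpy as np

def conditional_init(outputs, y):
    """Initial relevance: keep only the explained class y,
    mask all other network outputs with zeroes."""
    r = np.zeros_like(outputs)
    r[y] = outputs[y]
    return r

f_out = np.array([2.1, -0.3, 4.7])   # illustrative model outputs f_j^L(x)
r_init = conditional_init(f_out, y=2)
```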
We therefore extend the relevance decomposition formula in Equation (5), following the potential for a “filtering” functionality to
where δjc
The most basic form of relevance disentanglement is the masking of neural network outputs for procuring class-specific heatmaps. Here, heatmaps gain (more) detailed meaning by specifying a class output for attribution distribution, answering the question of “which features are relevant for predicting (against) a chosen class”. Backpropagation-based XAI methods also assign attribution scores to neurons of intermediate layers, and thus further reveal the relevance of hidden neurons for a prediction. Regarding DNNs, these hidden neurons can represent human-understandable concepts. It has been shown that the meaning of filters in a neural network is hierarchically organized within its sequence of layered representations, meaning that an abstract latent representation within the model is based on (weighted) combinations of simpler concepts in lower layers. Such concepts can be allocated to individual neurons or groups of neurons, e.g., a filter or filters of a convolutional layer of a DNN. By introducing (multi-)conditional CRP, i.e., via a masking of hidden neurons, the relevance contribution of individual concepts used by a neural network can be, in principle, disentangled and individually investigated as well. This expands the information horizon to questions such as “how relevant a particular concept is for the prediction”, or “which features are relevant for a specific concept”.
Visualizations of concept-conditional relevance in heatmaps show where concepts contributing to a chosen network output are localized and recognized by the model in the input space. Typically, as discussed in Section 4.1, an explanation heatmap regarding a singular specific output class may be described by a combination of interactions of individual concepts, as illustrated by the bar chart in the bottom left for neurons i in layer l. Specifically, relevance scores Ril(x|θc) in, e.g., input space can be aggregated meaningfully over image regions for the concept c to a localized relevance score
measuring the importance of a concept to the prediction on a set of given input features, e.g., pixels. Extending the notation introduced in Equation (9), for a convolutional layer with J channels, local relevance aggregation along the spatial axis is given by
aggregating over all positions (p, q) defined in the set .
For methods adhering to a conservation property such as CRP, this property permits the comparison of multiple local image regions and/or sets of concepts c in terms of (relative) importance of learned latent concepts, as illustrated in
Then we aggregate the resulting attribution maps according to Equation (11) over two input regions and
in order to locally measure the relative importance of concepts perceived by the model. As seen later, the capabilities of localized CRP can be utilized to visualize a “Concept Atlas”, which demonstrates which concepts models perceive and use locally for their decision making process.
In the following, we discuss the widely-used Activation Maximization approach to procuring representations for latent neurons, and present a novel CRP-based Relevance Maximization technique to improve concept identification and understanding.
A large part of feature visualization techniques rely on ActMax, where in its simplest form, input images are sought that give rise to the highest activation value of a specific network unit. Recent work [45, 8] proposes to select reference samples from existing data for feature visualization and analysis. In the literature, the selection of reference samples for a chosen concept c manifested in groups of neurons is often based on the strength of activation induced by a sample. For data-based reference sample selection, the possible input space is restricted to elements of a particular finite dataset, i.e. a subset of the full input space. The authors of [8] assume convolutional layer filters to be spatially invariant. Therefore, entire filter channels instead of single neurons are investigated for convolutional layers. One particular choice of maximization target T(x) is to identify samples x* in the dataset which maximize the sum over all channel activations, i.e.,
resulting in samples x*sumact which are likely to show a channel's concept in multiple (spatially distributed) input features, as maximizing the entire channel also maximizes the sum. However, while targeting all channel neurons, reference samples including both concept-supporting and contradicting features might result in a low output of the maximization target, as negative activations are taken into account by the sum. Alternatively, a non-linearity can be applied on zi(x), e.g., ReLU, to only consider positive activations. A different choice is to define maximally activating samples by observing the maximum channel activation
leading to samples x*maxact with a more localized and strongly activating set of input features characterizing a channel's concept. These samples x*maxact might be more difficult to interpret, as only a small region of a sample might express the concept.
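The two activation-based maximization targets may be sketched with toy channel activation maps; the maps are hypothetical and only illustrate how the two rankings can disagree.

```python
import numpy as np

def t_sum_act(channel_map):
    """Sum target: favours spatially distributed evidence of the concept."""
    return float(channel_map.sum())

def t_max_act(channel_map):
    """Max target: favours one strongly activating, localized spot."""
    return float(channel_map.max())

diffuse = np.full((3, 3), 0.3)      # weak activation everywhere
localized = np.zeros((3, 3))
localized[1, 1] = 1.0               # one strong, localized activation
```

The sum target ranks the diffuse sample first, while the max target prefers the localized one, in line with the trade-off discussed in the text.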
In order to collect multiple reference images describing a concept, the dataset consisting of n samples is first sorted in descending order according to the maximization target T(x). Subsequently, we define the set containing the k≤n samples ranked first according to the maximization target to represent the concept of the filter(s) under investigation. We denote the set of samples obtained from the sum-based target as X*sum, and the set obtained from the maximum-based target as X*max.
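The ranking-and-selection step may be sketched as follows; here, `target` stands for either maximization target, and the sample values are illustrative.

```python
def reference_set(dataset, target, k):
    """Sort samples in descending order by the maximization target
    and keep the k first-ranked ones as reference samples."""
    ranked = sorted(dataset, key=target, reverse=True)
    return ranked[:k]

# Illustrative scalar "samples" scored by an identity target
samples = [3.0, 9.0, 1.0, 7.0]
top2 = reference_set(samples, target=lambda x: x, k=2)
```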
We introduce the method of RelMax as a complement to ActMax. With RelMax, we do not search for images that produce a maximal activation response. Instead, we aim to find samples which contain the concepts relevant for a prediction. In order to select the most relevant samples, we define maximization targets by using the relevance Ri(x|θ) of neuron i for a given prediction, instead of its activation value zi. Specifically, the maximization targets are given as
and reference samples 602 selected based on relevance
for each two filters (upper and lower line). Samples selected based on ActMax only represent maximized latent neuron activation, while samples based on RelMax represent features which are actually useful and representative for solving a prediction task.
for x1 and x2, respectively, and feature groups 6041 and 6042 are selected by relevance
for x1 and x2, respectively. High activation does not necessarily result in high relevance or contribution to inference: the feature transformation {right arrow over (w)} of a linear layer with inputs x1 and x2, which is followed by a ReLU non-linearity, is shown. Here, samples from the blue cluster of feature activations lead to high activation values for both features x1 and x2, and would be selected by ActMax, but receive zero relevance, as they lead to an inactive neuron output after the ReLU, and are thus of no value to following layers. That is, even though the given samples activate features x1 and x2 maximally strongly, they cannot contribute meaningfully to the prediction process through the context determined by {right arrow over (w)}, and samples selected as representative via activation might not be representative of the overall decision process of the model. Representative examples selected based on relevance, however, are guaranteed to play an important role in the model's decision process.
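A numeric sketch of this argument, with an assumed weight vector w=(1, −1): a sample activating both inputs maximally can still yield a zero neuron output after the ReLU, and hence zero relevance downstream.

```python
import numpy as np

w = np.array([1.0, -1.0])                # assumed feature transformation

def neuron(x):
    """Linear layer followed by a ReLU non-linearity."""
    return max(0.0, float(w @ x))

x_blue = np.array([3.0, 3.0])            # maximal x1 AND x2 activation
x_useful = np.array([2.0, 0.5])          # moderate, w-aligned activation

out_blue = neuron(x_blue)                # 3 - 3 = 0 -> dead after ReLU
out_useful = neuron(x_useful)            # 2 - 0.5 = 1.5 -> contributes
```

ActMax would prefer `x_blue`, yet only `x_useful` can pass any signal, and thus relevance, to following layers.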
By utilizing relevance scores Ri(x|θ) instead of relying on activations only, the maximization target is class- (true, predicted or arbitrarily chosen, depending on θ), model-, and potentially concept-specific (depending on θ), as illustrated in
and
can occur, is depicted in
We propose a simple but qualitatively effective method for comparing filters in terms of activations based on reference samples, for grouping similar concepts in CNN layers. Based on the notation in previous sections, a set of k reference images is given for a channel q in layer l, and zql(W, xm) denotes the ReLU-activated outputs of channel q in layer l for a given input sample xm, with W comprising all network parameters required for its computation. Specifically, for each channel q and its associated full-sized (i.e. not cropped to the channels' filters' receptive fields) reference samples xm∈
we compute zmq=zql(W, xm), as well as zmp=zpl(W, xm) for all other channels p≠q, by executing the forward pass, yielding activation values for all spatial neurons for the channels. We then define the averaged cosine similarity ρqp between two channels q and p in the same layer l as
Note that we symmetrize ρqp in Equation (18) as the cosine similarities cos(ϕ)qp and cos(ϕ)pq are in general not identical, due to the potential dissimilarities in the reference sample sets and
. Thus, cos(ϕ)qp measures the cosine similarity between filter q and filter p wrt. the reference samples representing filter q. The symmetric similarity measures ρqp=ρpq∈[0,1] resulting from Equation (18) can now be clustered, and visualized via a transformation into a distance measure dqp=1−ρqp serving as an input to t-SNE [44], which visually clusters similar filters together in, typically, two dimensions. Note that, normally, the output value of the cosine similarity covers the interval [−1,1], where for −1 the two measured vectors are exactly opposite to one another, for 1 they are identical, and for 0 they are orthogonal. In case output channels of dense layers are analyzed, i.e. scalar values, the range of output values reduces to the set {−1,0,1}, as both values are either of same or different signs, or at least one of the values is zero. Since we are processing layer activations after the ReLU nonlinearities of the layer, only positive values occur for zmq and zmp. This results in ρqp∈[0,1], and permits a conversion to the canonical distance measure dqp∈[0,1].
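The averaged, symmetrized cosine similarity and its conversion to a t-SNE input distance may be sketched as follows; the activation vectors are illustrative and, as in the text, assumed ReLU-positive.

```python
import numpy as np

def cos_sim(u, v, eps=1e-12):
    """Cosine similarity between two activation vectors."""
    return float(u @ v) / (float(np.linalg.norm(u) * np.linalg.norm(v)) + eps)

def avg_cos(acts_q, acts_p):
    """cos(phi)_qp: mean cosine similarity over one reference sample set."""
    return sum(cos_sim(a, b) for a, b in zip(acts_q, acts_p)) / len(acts_q)

# Activations of channels q and p, evaluated on q's and on p's reference sets
q_on_qrefs = [np.array([1.0, 0.0, 2.0])]
p_on_qrefs = [np.array([2.0, 0.0, 4.0])]   # same direction as q
q_on_prefs = [np.array([0.5, 0.0, 1.0])]
p_on_prefs = [np.array([1.0, 0.0, 2.0])]

# Symmetrize cos(phi)_qp and cos(phi)_pq
rho_qp = 0.5 * (avg_cos(q_on_qrefs, p_on_qrefs) + avg_cos(q_on_prefs, p_on_prefs))
d_qp = 1.0 - rho_qp                        # distance fed to t-SNE
```

For the co-directional toy vectors above, rho_qp is (numerically) 1 and the resulting distance 0, i.e. the two channels would be clustered together.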
In general, the filtered reverse propagation disclosed herein, e.g. the CRP algorithm, may allow gaining more detailed insight into the inference compared to previous methods. While reverse propagating an inference result from the predictor output to the input data structure may limit the investigation to a predefined concept associated with the inference result, filtering the reverse propagation allows revealing concepts associated with predictor portions, which are not predefined in the model training, but which develop during model training, i.e. which are developed by the predictor itself.
Furthermore, by filtering the reverse propagation, for a concept associated with a specific inference result, sub-concepts on which the concept is built, may be revealed. In other words, the filtered reverse propagation may provide details about a concept of an inference result, and may thus lead to explanations that are more detailed.
In the following, advantages of the invention with respect to some applications are described.
A General Application would be to use the Relevance Score Assignment (RS assignment) proposed here as part of a larger, more complex algorithm (CA). One can think of situations where it is very expensive to apply algorithm CA, so our RS assignment could define some regions of interest where algorithm CA could be applied. For example,
Further, in the Image Application field,
In the Video Application field,
In the case of Text Applications,
In the case of Financial Data Applications,
In the field of Marketing/Sales,
In the Linguistics/Education field
In the above description, different embodiments have been provided for assigning relevance scores to a set of items. For example, examples have been provided with respect to pictures. In connection with the latter examples, embodiments have been provided with respect to a usage of the relevance scores, namely in order to highlight relevant portions in pictures by use of a conditional heatmap which may be overlaid with the original picture. In the following, embodiments which use or exploit the relevance scores are presented, i.e. embodiments which use the above-described relevance score assignment as a basis.
For the sake of completeness,
The processing performed by processing apparatus 1102 in
Alternatively, the processing performed by apparatus 1102 may represent a data replenishment. For example, the data replenishment may refer to a reading from a memory. As another alternative, the data replenishment may involve a further measurement. Imagine, for example, that set 16 is again an ordered collection, i.e. is a feature map belonging to a picture 1106, is a picture itself or a video. In that case, processing apparatus 1102 could derive, from the relevance scores Ri, information on an ROI, i.e. a region of interest, and could focus the data replenishment onto this ROI so as to avoid performing data replenishment with respect to the complete scene which set 16 refers to. For instance, the first relevance score assignment could be performed by apparatus 10 on a low resolution microscope picture, and apparatus 1102 could then perform another microscope measurement with respect to a local portion of the low resolution microscope picture for which the relevance scores indicate a high relevance. The processing result 1104 would accordingly be the data replenishment, namely the further measurement in the form of a high resolution microscope picture.
Thus, in the case of using system 1100 of
In other words,
According to an embodiment, the processing is a lossy processing and the apparatus for processing is configured to decrease a lossiness of the lossy processing for portions of the data structure having higher relevance scores assigned therewith than compared to portions of the data structure having lower relevance scores assigned therewith.
According to an embodiment, the processing is a visualizing, wherein the apparatus for adapting is configured to perform a highlighting in the visualization depending on the relevance scores.
According to an embodiment, the processing is a data replenishment by reading from memory or performing a further measurement wherein the apparatus 1102 for processing is configured to focus the data replenishment depending on the relevance scores.
However, the relevance graph 1114 may, alternatively, be represented in the form of a histogram or the like. A graph generator 1112 may include a display for displaying the relevance graph 1114. Beyond this, graph generator 1112 may be implemented using software, such as a computer program, which may be separate from or included within a computer program implementing relevance score assigner 10.
As a concrete example, imagine that the set 16 of items is an image. The pixel-wise relevance scores for each pixel obtained in accordance with the assigner may be discretized/quantized into/onto a set of values, and the discretization/quantization indices may be mapped onto a set of colors. The mapping may be done in graph generator 1112. The resulting assignment of pixels to colors, such as a “heatmap” in case the relevance-color mapping follows some CCT (correlated color temperature) measure for the colors, can be saved as an image file in a database or on a storage medium, or presented to a viewer by generator 1112.
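The discretization of pixel-wise scores and their mapping onto colors may, for instance, be sketched as follows; the four-color palette and the equal-width binning are hypothetical choices for illustration, not prescribed by the embodiment:

```python
def scores_to_colors(scores, palette):
    """Quantize relevance scores onto len(palette) equal-width bins and
    map each quantization index to a color from the palette
    (ordered from cold to hot)."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0          # avoid division by zero for flat maps
    n = len(palette)
    colors = []
    for s in scores:
        idx = int((s - lo) / span * n)
        colors.append(palette[min(idx, n - 1)])  # clamp the top score
    return colors

# Hypothetical 4-color palette ordered by color temperature.
palette = ["blue", "green", "yellow", "red"]
heat = scores_to_colors([0.0, 0.2, 0.5, 1.0], palette)
```

The resulting color list would then be rendered, or overlaid onto the original image, by graph generator 1112.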
Alternatively, the assignment of pixels to colors can be overlaid with the original image. In that case, the processor 1102 of
It has already been outlined above with respect to
or a quantile of the pixel-wise scores for the pixels of the region could be computed. The data set, e.g. the video, would then be subject to a compression algorithm by processor 1102, for which the compression rate can be adjusted per region according to the computed score. A monotonic (falling or rising) mapping of region scores to compression rates could be used. Each of the regions would then be encoded according to the mapping of the region scores to compression rates.
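The p-mean region score and a monotonically falling mapping of region scores to compression rates could, purely as an illustrative sketch, look as follows (the rate range [0.1, 0.9] is a hypothetical choice):

```python
def p_mean(values, p):
    """Generalized (power) mean of non-negative region scores."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

def compression_rate(region_score, lo_rate=0.1, hi_rate=0.9):
    """Monotonically falling map from a region score in [0, 1] to a
    compression rate: highly relevant regions are compressed less."""
    region_score = min(max(region_score, 0.0), 1.0)
    return hi_rate - (hi_rate - lo_rate) * region_score
```

Each region would then be encoded with the rate returned for its aggregated score, so that highly relevant regions survive compression with higher fidelity.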
Further, the processor 1102 could act as follows in case of an image as the set 16: The just outlined segmentation could be applied to the set of scores for all pixels, to an overlay image, or to the color map, and segments corresponding to regions with very high scores, or to regions with scores having large absolute values, may be extracted. The processor may then present these co-located segments of the original image 16 to a human or to another algorithm for checking the content for possibly conspicuous or anomalous content. This could be used, for example, in security guard applications. Likewise, the set 16 could be a video. The whole video, in turn, is composed of a set of frames. An item in the set 16 of items could be a frame, a subset of frames, or a set of regions from a subset of frames, as already stated above. Spatio-temporal video segmentation could be applied to the relevance score assignment to the items, so as to find spatio-temporal regions with either high average scores for the items or high average absolute values of scores for the items. As mentioned above, the average scores assigned to items within a region could be measured, for example, using a p-mean or a quantile estimator. The spatio-temporal regions with the highest such scores, such as scores above some threshold, can be extracted by processor 1102 (for example, by means of image or video segmentation) and presented to a human or to another algorithm for checking the content for possibly conspicuous or anomalous content. The algorithm for checking could be included in the processor 1102, or could be external thereto, which also applies to the above occasions of mentioning the checking of regions of high(est) score.
In accordance with an embodiment, the just-mentioned spatio-temporal regions with the highest such scores are used for the purpose of training improvement for predictions made on videos. As stated, the set 16 of items is the whole video, which can be represented by a set of frames. An item in the set of items is a frame, a subset of frames, or a set of regions from a subset of frames. Video segmentation is then applied to find spatio-temporal regions with either high average scores for the items or high average absolute values of scores for the items. Processor 1102 may select neurons of the neural network which are connected to other neurons such that, via indirect connections, the above regions are part of the input of the selected neurons. Processor 1102 may optimize the neural network in the following way: given the input image and a neuron selected as above (for example, by having direct or indirect inputs from regions with high relevance scores or high absolute values thereof), processor 1102 tries to increase the network output or the square of the network output, or to decrease the network output, by changing the weights of the inputs of the selected neuron and the weights of those neurons which are direct or indirect upstream neighbors of the selected neuron. Such a change can be done, for example, by computing the gradient of the neuron output for the given image with respect to the weights to be changed. Then the weights are updated by the gradient times a stepsize constant. Needless to say, the spatio-temporal region may also be obtained by segmentation of pixel-wise scores, i.e. by using pixels as the items of set 16, with the optimization outlined above then being performed.
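The gradient-based weight update for a selected neuron can be illustrated on a toy single-neuron example; the linear-plus-ReLU neuron and the stepsize value are illustrative assumptions, not a prescription of the network architecture:

```python
def neuron_output(weights, inputs):
    """Linear neuron followed by ReLU, a stand-in for a selected neuron."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return max(s, 0.0)

def gradient_ascent_step(weights, inputs, stepsize):
    """One update increasing the neuron output: for a ReLU neuron the
    gradient of the output w.r.t. weight w_i is x_i when the neuron is
    active, else 0. Weights move by gradient times a stepsize constant."""
    s = sum(w * x for w, x in zip(weights, inputs))
    grad = [x if s > 0 else 0.0 for x in inputs]
    return [w + stepsize * g for w, g in zip(weights, grad)]

inputs = [1.0, 2.0]          # values fed from a high-relevance region
weights = [0.5, -0.1]
before = neuron_output(weights, inputs)
weights = gradient_ascent_step(weights, inputs, stepsize=0.1)
after = neuron_output(weights, inputs)
```

In an actual network, the same step would be applied to the weights of the selected neuron and of its direct or indirect upstream neighbors, with gradients obtained by backpropagation.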
Even alternatively, the relevance assignment may be applied to graph data consisting of nodes and directed or undirected edges, with or without weights; an item of set 16 would then be a subgraph, for example. An element-wise relevance score would be computed for each subgraph. A subgraph can be an input to a neural network, for example, if it is encoded as an integer by encoding nodes and their edges with weights by integer numbers, while separating semantic units by integers which are reserved as stop signs. Alternatively, an item of set 16 for computing the relevance score per item could be a node. Item-wise relevance scores would then be computed. After that, a set of subgraphs with a high average score could be found (the average score can be computed by p-mean
or by a quantile of the scores over the nodes) by graph segmentation. The scores for each node are discretized into a set of values and the discretization indices are mapped onto a set of colors. The resulting assignment of nodes and subgraphs to colors and/or the extracted subgraphs can be saved as a file in a database or on a storage medium or presented to a viewer.
In other words,
Further, it may be that the relevance score assignment process outputs a heatmap, and that the same is analyzed with respect to, e.g., smoothness and other properties. Based on the analysis, some action may be triggered. For example, the training of a neural network may be stopped because it captures the concepts “good enough” according to the heatmap analysis. Further, it should be noted that the heatmap analysis result may be used along with the neural network prediction results, i.e. the prediction. In particular, relying on both heatmap and prediction results may be advantageous over relying on the prediction results alone because, for example, the heatmap may tell something about the certainty of the prediction. The quality of a neural network can thus potentially be evaluated by analyzing the heatmap.
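A simple smoothness property that could drive such a heatmap analysis is the total variation of the heatmap; this measure and the stopping criterion sketched here are merely one hypothetical choice:

```python
def total_variation(heatmap):
    """Sum of absolute differences between horizontally and vertically
    adjacent entries of a 2-D heatmap; a simple smoothness proxy
    (a lower value indicates a smoother heatmap)."""
    tv = 0.0
    for r in range(len(heatmap)):
        for c in range(len(heatmap[0])):
            if c + 1 < len(heatmap[0]):
                tv += abs(heatmap[r][c] - heatmap[r][c + 1])
            if r + 1 < len(heatmap):
                tv += abs(heatmap[r + 1][c] - heatmap[r][c])
    return tv

smooth = [[0.5, 0.5], [0.5, 0.5]]
noisy  = [[1.0, 0.0], [0.0, 1.0]]
# Training could be stopped once the heatmap is "smooth enough",
# e.g. once total_variation falls below a chosen threshold.
```

Other heatmap statistics (e.g. entropy or concentration of mass) could serve as alternative analysis properties in the same manner.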
In other words,
Finally, it is emphasized that the proposed relevance propagation has primarily been illustrated above with respect to networks trained on classification tasks, but, without loss of generality, the embodiments described above may be applied to any network that assigns a score to output units or output portions. These scores can be learned using other techniques, such as regression or ranking.
Thus, in the above description, embodiments have been presented which embody a methodology that may be termed conditional relevance propagation and that makes it possible to understand neural network predictors. Different applications of this novel principle were demonstrated. For images, it has been shown that pixel contributions can be visualized as heatmaps and can be provided to a human expert, who can not only intuitively verify the validity of the classification decision, but also focus further analysis on regions of potential interest. The principle can be applied to a variety of tasks, classifiers and types of data, i.e., it is not limited to images, as noted above.
This section describes implementation alternatives for the embodiments described in sections A, B and C.
Although some aspects have been described as features in the context of an apparatus, it is clear that such a description may also be regarded as a description of corresponding features of a method. Although some aspects have been described as features in the context of a method, it is clear that such a description may also be regarded as a description of corresponding features concerning the functionality of an apparatus.
Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software or at least partially in hardware or at least partially in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
In the foregoing Detailed Description, it can be seen that various features are grouped together in examples for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed examples require more features than are expressly recited in each claim. Rather, as the following claims reflect, subject matter may lie in less than all features of a single disclosed example. Thus the following claims are hereby incorporated into the Detailed Description, where each claim may stand on its own as a separate example. While each claim may stand on its own as a separate example, it is to be noted that, although a dependent claim may refer in the claims to a specific combination with one or more other claims, other examples may also include a combination of the dependent claim with the subject matter of each other dependent claim or a combination of each feature with other dependent or independent claims. Such combinations are proposed herein unless it is stated that a specific combination is not intended. Furthermore, it is intended to include also features of a claim to any other independent claim even if this claim is not directly made dependent to the independent claim.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
22177382.3 | Jun 2022 | EP | regional |
This application is a continuation of copending International Application No. PCT/EP2023/065138, filed Jun. 6, 2023, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 22 177 382.3, filed Jun. 6, 2022, which is incorporated herein by reference in its entirety. Embodiments of the present invention relate to apparatuses and methods for handling, or analyzing, an inference performed by a machine learning (ML) predictor. Some embodiments relate to apparatuses and methods for providing information for revealing concepts or meaningfulness of an inference performed by a ML predictor. Some embodiments relate to Human-Understandable Explanations of ML predictors through Concept Relevance Propagation.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/EP2023/065138 | Jun 2023 | WO |
Child | 18966574 | US |