This invention relates to a robustness verification device, a robustness verification method, and a recording medium.
Adversarial training has been proposed as a countermeasure against attacks using adversarial examples (AX). Adversarial training is the practice of including adversarial examples in the training data when training a feature amount extraction model. A feature amount extraction model that has undergone adversarial training is expected to produce output results that are less likely to be affected by the input of adversarial examples.
For example, Non Patent Document 1 shows experimentally that adversarial training is effective against attacks on content-based image retrieval using adversarial examples.
Non Patent Document 1: Mo Zhou, et al. “Adversarial Ranking Attack and Defense,” The 2020 European Conference on Computer Vision (ECCV 2020), 2020.
Non Patent Document 1 shows adversarial training that depends on the attack method and assumes that the attack method, such as how the adversarial examples are generated, is known.
On the other hand, there may be unknown attack methods against content-based image retrieval using adversarial examples, which should be addressed. It should be possible to verify the degree of impact on search results given an adversarial example and to ascertain the impact in a case where the attack method is unknown.
An example of an object of the present invention is to provide a robustness verification device, a robustness verification method, and a recording medium that can solve the above-mentioned problems.
According to the first example aspect of the present invention, a robustness verification device includes: a similar image identification means that uses the similarity between feature amounts obtained by a feature amount extractor to identify a similar image having a predetermined rank of similarity with respect to an input image within a candidate image group; a rank counting means that counts the rank of the similar image with respect to the input image in the candidate image group in a case where adversarial perturbation is applied to the image; a rank calculation means that calculates the rank of the similar image with respect to the input image in the candidate image group in a case where adversarial perturbation is not applied to the image; and a rank verification means that verifies whether or not the rank of the similar image counted by the rank counting means is within a predetermined range that includes the rank of the similar image calculated by the rank calculation means.
According to the second example aspect of the present invention, a robustness verification method includes: a step that uses the similarity between feature amounts obtained by a feature amount extractor to identify a similar image having a predetermined rank of similarity with respect to an input image within a candidate image group; a step that counts the rank of the similar image with respect to the input image in the candidate image group in a case where adversarial perturbation is applied to the image; a step that calculates the rank of the similar image with respect to the input image in the candidate image group in a case where adversarial perturbation is not applied to the image; and a step that verifies whether or not the rank of the similar image counted in a case where adversarial perturbation is applied to an image is within a predetermined range that includes the rank of the similar image calculated in a case where adversarial perturbation is not applied to an image.
According to the third example aspect of the present invention, a recording medium records a program for causing a computer to execute: a step that uses the similarity between feature amounts obtained by a feature amount extractor to identify a similar image having a predetermined rank of similarity with respect to an input image within a candidate image group; a step that counts the rank of the similar image with respect to the input image in the candidate image group in a case where adversarial perturbation is applied to the image; a step that calculates the rank of the similar image with respect to the input image in the candidate image group in a case where adversarial perturbation is not applied to the image; and a step that verifies whether or not the rank of the similar image counted in a case where adversarial perturbation is applied to an image is within a predetermined range that includes the rank of the similar image calculated in a case where adversarial perturbation is not applied to an image.
According to the robustness verification device, robustness verification method, and recording medium described above, it is possible to verify the degree of influence on search results in content-based image retrieval in a case where an adversarial example with adversarial perturbation is applied.
The following describes example embodiments of the present invention, but these example embodiments are not intended to limit the invention as claimed. Not all of the combinations of features described in the example embodiments are essential to the solution of the invention.
First, an example of a content-based image retrieval device 900 that is subject to robustness verification by a robustness verification device 100 (200) shall be described.
In the real world, content-based image retrieval (CBIR) is used in medical image retrieval systems, similar product retrieval systems, facial recognition systems, and others. Content-based image retrieval is a system that, given an input image q∈χ as a search query, finds an image ci∈C that is highly similar to q from a set of candidate images C={ci∈χ}(i=1 to N). Here, χ represents the input space of images.
In content-based image retrieval, a feature amount extraction model f is used, which is learned by a machine learning technique such as Deep Metric Learning (DML), for example. For example, the feature amount extraction model f is a function f: χ→R^n from the input space of images to the n-dimensional vector space of real numbers representing feature amounts. Deep Metric Learning learns the feature amount extraction function f so that the distance between the feature amounts of images with high similarity is small and the distance between the feature amounts of images with low similarity is large.
Content-based image retrieval outputs results based on the Euclidean distance dist(f(q), f(c)) of feature amounts between the input image q and any candidate image c∈C. For example, content-based image retrieval outputs the top k candidate images c∈C with the smallest distance from the input image q as similar images of q.
In a case where an input image q∈χ is given as a search query, the content-based image retrieval device 900 retrieves and outputs images similar to q from a set of candidate images C={ci∈χ}(i=1 to N). Here, χ represents the input space of images. The content-based image retrieval device 900 includes an image storage portion 902, a feature amount extraction portion 904, and a rank calculation portion 906.
The image storage portion 902 stores a group of candidate images C={ci∈χ}(i=1 to N) (hereinbelow referred to as the candidate image group). Each ci is called a candidate image. Note that the candidate image group C may be input to the content-based image retrieval device 900 from outside instead of being stored in the image storage portion 902.
The feature amount extraction portion 904 extracts the feature amounts of the input image q and of each image ci∈C (i=1 to N) obtained from the image storage portion 902 using the feature amount extractor f. The feature amount extractor f is, for example, a function f: χ→R^n from the input space of images χ to an n-dimensional vector space of real numbers representing the feature amount (hereinbelow referred to as the feature amount space). This feature amount extractor f is a function that has been pre-learned using a deep learning technique such as deep metric learning, for example. Deep metric learning learns the feature amount extractor f so that the distance between the feature amounts of images with high similarity is small and the distance between the feature amounts of images with low similarity is large.
The rank calculation portion 906 calculates the Euclidean distance dist(f(q), f(ci)) between the extracted feature amount f(q) and each f(ci) (i=1 to N). Then, the rank calculation portion 906 outputs the predetermined number of images ci in order of increasing distance as images similar to the input image q. The image that is j-th most similar (j-th smallest distance) to the input image q is denoted as IR(q, C)j. IR stands for Image Retrieval.
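The ranking by smallest feature distance described above can be sketched as follows. This is a minimal illustration in plain Python rather than the device's actual implementation: the function names `euclidean`, `retrieve`, and `ir`, the representation of images as tuples of numbers, and the toy identity extractor are all assumptions introduced here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def retrieve(f, q, candidates):
    """Return candidate indices ordered from smallest to largest distance to f(q)."""
    fq = f(q)
    dists = [euclidean(fq, f(c)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: dists[i])


def ir(f, q, candidates, j):
    """Index of IR(q, C)j, the j-th most similar candidate (j is 1-based)."""
    return retrieve(f, q, candidates)[j - 1]


# Toy data (hypothetical): the extractor is the identity on 2-D "images".
identity = lambda x: x
C = [(0.0, 3.0), (1.0, 0.0), (0.0, 2.0)]
q = (0.0, 0.0)
order = retrieve(identity, q, C)  # distances are 3, 1, 2
```

In this toy setting the candidate at index 1 is IR(q, C)1 and the candidate at index 2 is IR(q, C)2.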
The content-based image retrieval device that the robustness verification device 100 (200) targets for robustness verification is not limited to the content-based image retrieval device 900, as long as the feature amount extractor f can be used to rank the candidate image groups C with respect to the input image q.
The robustness verification device 100 (200) has as part of its input the input image q, the candidate image group C, and the feature amount extractor f, which are the parameters of the content-based image retrieval device 900.
Next, an explanation shall be given about the noise, which is a small adversarial perturbation intentionally added to images, as assumed by the robustness verification device 100 (200).
Adversarial examples (AX) are known to be a serious problem for the security of machine learning models. An adversarial example is data that is created by intentionally adding minute perturbations that cause machine learning models, such as feature amount extraction models, to make incorrect decisions. Perturbations added to create an adversarial example are referred to as adversarial perturbations. When given adversarial examples as input, machine learning models may output classes or values different from those output for data without adversarial perturbations.
Attacks by adversarial examples are also possible against content-based image retrieval using feature amount extraction models. In this case, the adversarial perturbation is noise, etc. added to the image. Two potential threats to content-based image retrieval by adversarial examples are the query attack and the candidate attack. A query attack is one that manipulates the output of content-based image retrieval by entering an adversarial example as an input image, which is the search query. A candidate attack is one that manipulates the output of content-based image retrieval by inputting adversarial examples as candidate images. Both attacks are accomplished by manipulating the output of the feature amount extraction model with adversarial examples.
An example of an attack using an adversarial example would be to give priority to recommending one's own products in a similar product search system used for online sales using content-based image retrieval. Another example would be impersonation of another person's face in a face recognition system using content-based image retrieval.
The robustness verification device 100 (200) verifies that, given the input space of images χ, the rank of the image IR(q, C)j, which is the j-th nearest image to the input image q in the candidate image group C, varies by at most α even if noise δ∈χ with a radius of ε or less in the infinity norm L∞ is added to an image x∈χ. That is, the robustness verification device 100 (200) verifies that the rank of image IR(q, C)j varies by at most α even if image x is replaced by the noise-laden image x+δ for ∀δ∈{δ∈χ|‖δ‖∞≤ε}. The image x to which noise is added is the input image q in the case of the robustness verification device 100 and any candidate image ci∈C in the case of the robustness verification device 200.
Next, the first example embodiment of the present invention shall be described. The first example embodiment is a robustness verification device 100 that verifies the robustness of the content-based image retrieval device 900 against query attacks.
Let q be the input image that is the query of the content-based image retrieval device 900, and q+δ be the input image with noise δ of magnitude ε or less. Let IR(q, C)j be the j-th similar image to q that the content-based image retrieval device 900 retrieves from the candidate image group C={ci∈χ}(i=1 to N) in a case where the input image q is input.
In this case, the robustness verification device 100 verifies whether the rank of IR(q, C)j remains largely unchanged in a case where the input image q+δ is input to the content-based image retrieval device 900. Specifically, the robustness verification device 100 verifies whether the rank of IR(q, C)j varies by at most α within the target image group Cβ, which consists of IR(q, C)j and the candidate images with low similarity to IR(q, C)j, even if the input image q+δ is input to the content-based image retrieval device 900.
[(α, β)-robustness verification against query attack]
First, (α, β)-robustness verification, which is a fundamental concept when verifying robustness with the robustness verification device 100, shall be explained.
The (α, β)-robustness verification against query attacks is defined as follows.
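Let α be a natural number greater than or equal to 0 and let β be a real number greater than or equal to 0. At this time, with respect to any noise δ satisfying
$$\delta \in \{\delta \in \chi \mid \|\delta\|_p \le \varepsilon\} \tag{1}$$
IR(q, C)j being (α, β)-robustness verified means that
$$\mathrm{Rank}(q, \mathrm{IR}(q, C)_j, C_\beta) - \alpha \le \mathrm{Rank}(q+\delta, \mathrm{IR}(q, C)_j, C_\beta) \le \mathrm{Rank}(q, \mathrm{IR}(q, C)_j, C_\beta) + \alpha \tag{2}$$
holds true. Here, the target image group Cβ is given by
$$C_\beta = \{\mathrm{IR}(q, C)_j\} \cup \{c \mid c \in C,\ \beta \le \|f(c) - f(\mathrm{IR}(q, C)_j)\|_q\} \tag{3}$$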
Expression (1) represents the range of noise δ imparted to the input image. χ represents the input space of images. “δ∈χ” denotes that δ is also an element of the input space of images. “‖δ‖p” denotes the infinity norm L∞ of δ. “‖δ‖p≤ε” indicates that the magnitude of δ is less than or equal to ε in a case where the infinity norm L∞ of δ is taken. This is illustrated in
Expression (3) expresses the set of images Cβ (hereafter referred to as the target image group) subject to the robustness verification in Expression (2). “{IR(q, C)j}” indicates that IR(q, C)j (the candidate image j-th most similar to q that the content-based image retrieval device 900 retrieves from the candidate image group C in a case where input image q is input) is included in Cβ.
For “{c|c∈C, β≤‖f(c)−f(IR(q, C)j)‖q}”, first, “c∈C” represents the condition that c is an element of the candidate image group C. “f(c)” represents the feature amount extracted by the feature amount extractor f for image c. This feature amount extractor f is the feature amount extractor of the content-based image retrieval device 900. “β≤‖f(c)−f(IR(q, C)j)‖q” represents the condition that the magnitude in the q-norm of the difference between the feature amount of image IR(q, C)j and the feature amount of image c is greater than or equal to β. That is, such an image c is included in Cβ. β is a parameter that determines the candidate images considered for variation in ranking. By excluding from Cβ the images whose feature amount distance from IR(q, C)j is less than β, β expresses that variation in ranking relative to images similar to IR(q, C)j is acceptable. The larger the value of β, the easier it is to achieve (α, β)-robustness verification. Note that the q-norm may be any norm, such as the 1-norm, the 2-norm, a general p-norm, or the infinity norm.
Expression (2) represents the specific conditions for (α, β)-robustness verification. Rank (q, c, C) represents the rank of image c in the candidate image group C with respect to similarity using the feature amount extractor f in a case where the input image is q. Therefore, “Rank (q+δ, IR(q, C)j, Cβ)” represents the rank of image IR(q, C)j in the target image group Cβ calculated by Expression (3), with q+δ being the input image. “Rank (q, IR(q, C)j, Cβ)” represents the rank of image IR(q, C)j in the target image group Cβ calculated by Expression (3), where q is the input image. Therefore, Expression (2) expresses the condition that the rank of image IR(q, C)j in the target image group Cβ calculated by Expression (3) in a case where the input image is q+δ varies only at most α from the rank of image IR(q, C)j in a case where the input image is q. α is a parameter indicating the amount of variation in rank that is acceptable, i.e., that a ranking variation of at most α is permissible. The larger α is, the easier it is to verify (α, β)-robustness.
As shown in
Since the ranking by the content-based image retrieval device 900 using the input image q before noise is applied is in order of proximity to f(q) in the images included in Cβ, the images are ranked in the order of
On the other hand, since the ranking by the content-based image retrieval device 900 using the input image q+δ after the noise δ is applied is in order of proximity to f(q+δ) in the images included in Cβ, the images are ranked in the order of
Thus, in Expression (2), since Rank(q+δ, IR(q, C)j, Cβ)=3, and Rank(q, IR(q, C)j, Cβ)=1, Expression (2) is 1−α≤3≤1+α. Therefore, if α≥2, IR(q, C)j is (α, β)-robustness verified, and if α=0 or 1, IR(q, C)j is not (α, β)-robustness verified.
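The rank change from 1 to 3 discussed above can be reproduced with a small sketch. It assumes a toy identity feature extractor on 2-D points and hypothetical helper names (`dist`, `rank`, `target_group`, `within_alpha`); the coordinates are invented for illustration and are not taken from the figure. Note that the sketch evaluates the condition of Expression (2) for one particular noise δ only, whereas the definition requires it for every δ within the allowed range.

```python
import math


def dist(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank(f, query, target, group):
    """1-based rank of `target` within `group`, ordered by distance to f(query)."""
    fq = f(query)
    d_target = dist(fq, f(target))
    return 1 + sum(1 for c in group if dist(fq, f(c)) < d_target)


def target_group(f, similar, candidates, beta):
    """C_beta: the similar image plus candidates at feature distance >= beta from it."""
    fs = f(similar)
    far = [c for c in candidates if c != similar and dist(f(c), fs) >= beta]
    return [similar] + far


def within_alpha(f, q, delta, similar, group, alpha):
    """Check the rank condition for ONE given noise delta (not for all delta)."""
    noisy_q = tuple(x + d for x, d in zip(q, delta))
    clean = rank(f, q, similar, group)
    noisy = rank(f, noisy_q, similar, group)
    return clean - alpha <= noisy <= clean + alpha


# Hypothetical toy data: identity extractor, 2-D "images".
identity = lambda x: x
q = (0.0, 0.0)
similar = (0.0, 1.0)                       # plays the role of IR(q, C)j with j = 1
C = [similar, (0.2, 1.0), (3.0, 0.0), (3.2, 0.5)]
group = target_group(identity, similar, C, beta=1.0)  # drops the near-duplicate (0.2, 1.0)
delta = (2.0, 0.0)                         # deliberately large noise, for illustration only
```

With these values the rank of the similar image within the target image group is 1 for q and 3 for q+δ, so the check passes for α=2 and fails for α=1.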
The robustness verification device 100 performs the (α, β)-robustness verification described above, but exact computation of (α, β)-robustness verification is difficult due to its computational complexity. In other words, it is difficult for the robustness verification device 100 to verify Expression (2) for every δ that satisfies Expression (1). Therefore, the robustness verification device 100, with respect to
$$\forall \delta \in \{\delta \in \chi \mid \|\delta\|_p \le \varepsilon\} \tag{4}$$
utilizes the fact that the upper and lower limits of d(f(q+δ), f(c)) can be calculated with little computational effort. Here, q is the input image while c is an element of the target image group Cβ.
In other words, the robustness verification device 100 calculates a lower limit $\underline{d}(q, c)$ and an upper limit $\overline{d}(q, c)$ that satisfy
$$\underline{d}(q, c) \le d(f(q+\delta), f(c)) \le \overline{d}(q, c) \tag{5}$$
Here, “d(f(q+δ), f(c))” represents the Euclidean distance d(f(q+δ), f(c)) between the feature amount f(q+δ) of the input image q with noise δ and the feature amount f(c) of image c.
The robustness verification device 100 performs calculations using, for example, the well-known technique Interval Bound Propagation (IBP), described in the following non patent document. For noise δ∈{δ|‖δ‖∞≤ε} with a magnitude in the infinity norm equal to or less than ε, IBP computes the upper and lower limits of each element i of the feature amount f(q+δ) obtained when the image q+δ with the noise δ added is input, by sequentially computing the upper and lower limits of each element of the intermediate representation in each layer. Here, i represents the i-th element (1≤i≤n) of the feature amount, assuming that the feature amount is an n-dimensional vector.
The robustness verification device 100 uses IBP to calculate the upper limit
In Expression (6), “|
In Expression (7), “
In a case where the robustness verification device 100 calculates the upper and lower limits of Expression (5) using the IBP-based calculation method described above, the norm in Expression (4) is the infinity norm.
The robustness verification device 100 may also calculate the upper and lower limits of d(f(q+δ), f(c)) using calculation methods other than IBP, in which case the norm in Expression (4) is not limited to the infinity norm.
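The idea of propagating intervals layer by layer and then bounding the distance can be sketched as follows. This is an illustrative sketch under stated assumptions, not the expressions of this document: the feature extractor is assumed to be a small fully connected network with ReLU activations between layers, the names `affine_bounds`, `ibp_feature_bounds`, and `distance_bounds` are hypothetical, and the distance bounds use a standard element-wise argument (the farthest and nearest point of each interval from the fixed value f(c)).

```python
import math


def affine_bounds(lower, upper, W, b):
    """Interval bounds of W x + b when each x[k] lies in [lower[k], upper[k]]."""
    out_l, out_u = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_l.append(lo)
        out_u.append(hi)
    return out_l, out_u


def ibp_feature_bounds(layers, q, eps):
    """Element-wise bounds of f(q + delta) for all delta with infinity norm <= eps."""
    lower = [x - eps for x in q]
    upper = [x + eps for x in q]
    for k, (W, b) in enumerate(layers):
        lower, upper = affine_bounds(lower, upper, W, b)
        if k < len(layers) - 1:  # ReLU after every layer except the last (assumption)
            lower = [max(0.0, v) for v in lower]
            upper = [max(0.0, v) for v in upper]
    return lower, upper


def distance_bounds(lower, upper, fc):
    """Lower and upper limits of the Euclidean distance between f(q+delta) and f(c)."""
    lo_sq = hi_sq = 0.0
    for l, u, c in zip(lower, upper, fc):
        hi_sq += max(abs(l - c), abs(u - c)) ** 2
        if c < l:
            lo_sq += (l - c) ** 2
        elif c > u:
            lo_sq += (c - u) ** 2
        # if l <= c <= u, this element may coincide with c and contributes 0
    return math.sqrt(lo_sq), math.sqrt(hi_sq)


# Hypothetical two-layer network and data.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
f_lower, f_upper = ibp_feature_bounds(layers, q=(1.0, 0.0), eps=0.1)
d_lower, d_upper = distance_bounds(f_lower, f_upper, fc=(1.0, 1.0))
```

Bounds obtained this way are sound but generally loose: the true distance always lies between `d_lower` and `d_upper`, but the interval may be wider than necessary.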
The robustness verification device 100 performs (α, β)-robustness verification using the upper limit
The rank verification portion 108 corresponds to an example of a rank verification means.
The similar image identification portion 102 receives the input image q, candidate image group C, feature amount extractor f, and rank j, and outputs the image IR(q, C)j that is the j-th most similar to the input image q in the candidate image group C. Specifically, the similar image identification portion 102 uses the feature amount extractor f to calculate the feature amounts f(q), f(ci) (i=1 to N) of the input image q and each candidate image ci∈C. Then, the similar image identification portion 102 calculates the Euclidean distance dist(f(q), f(ci)) between the feature amount f(q) and each f(ci). The similar image identification portion 102 then outputs the image that is j-th similar (j-th smallest distance) to the input image q as IR(q, C)j. The similar image identification portion 102 corresponds to the search of the content-based image retrieval device 900.
The similar image identification portion 102 is an example of a similar image identification means.
IR(q, C)j, the candidate image group C, the feature amount extractor f, and the parameter β are input to the comparison target image calculation portion 104, which calculates the target image group Cβ, which is the set of images subject to robustness verification as shown in Expression (8).
Specifically, the comparison target image calculation portion 104 includes IR(q, C)j in Cβ. The comparison target image calculation portion 104 calculates the feature amounts f(IR(q, C)j) and f(c) for IR(q, C)j and each target image c of the candidate image group C by the feature amount extractor f. Then, the comparison target image calculation portion 104 determines whether “‖f(c)−f(IR(q, C)j)‖q”, the magnitude of the difference in the q-norm of the feature amounts, is equal to or greater than β. If it is equal to or greater than β, the comparison target image calculation portion 104 includes the target image c in Cβ.
Note that β is a parameter that determines the candidate images to be considered for ranking variation. By excluding from Cβ the images whose feature amount distance from IR(q, C)j is less than β, β expresses that variation in ranking relative to images similar to IR(q, C)j is acceptable. The larger the value of β, the easier it is to achieve (α, β)-robustness verification. Note that the q-norm may be any norm, such as the 1-norm, the 2-norm, a general p-norm, or the infinity norm.
The comparison target image calculation portion 104 corresponds to an example of a comparison target image calculation means.
The target image group Cβ to be verified for robustness, the input image q, the feature amount extractor f, and the perturbation size ε are input to the upper limit/lower limit calculation portion 106, which, for each target image c∈Cβ, calculates the upper limit
Specifically, the upper limit/lower limit calculation portion 106 uses the aforementioned IBP to calculate, for each target image c∈Cβ, the upper limit
The method by which the upper and lower limits of d(f(q+δ), f(c)) are calculated by the upper limit/lower limit calculation portion 106 is not limited to IBP, and other methods may be used.
The upper limit/lower limit calculation portion 106 is an example of the upper limit/lower limit calculation means.
The rank verification portion 108 receives as input the input image q, the image IR(q, C)j, the target image group Cβ that is subject to robustness verification, the upper limit
The conditions for (α, β)-robustness verification performed by the rank verification portion 108 are not based on the definition in Expression (2), but on the upper and lower limits of d(f(q+δ), f(c)). Specifically, the rank verification portion 108 verifies whether or not the following Expressions (9) and (10) are satisfied.
The rank calculation portion 110 of the rank verification portion 108 finds “Rank(q, IR(q, C)j, Cβ)” in Expressions (9) and (10). “Rank (q, IR(q, C)j, Cβ)” represents the rank of image IR(q, C)j in the target image group Cβ calculated by Expression (8) in a case where the input image is q. Specifically, the rank calculation portion 110 calculates the rank by using the feature amount extractor f to find the Euclidean distance between each feature amount f(q), f(IR (q, C)j), and f(c) for q, IR(q, C)j, ∀c∈Cβ. The rank calculation portion 110 is an example of a rank calculation means.
The rank counting portion 112 of the rank verification portion 108 calculates the right side of Expression (9) and the left side of Expression (10).
The rank counting portion 112 first calculates the right side of Expression (9). “
The rank counting portion 112 counts “1[
The rank counting portion 112 then calculates the left side of Expression (10). “
The rank counting portion 112 counts “1[
The rank counting portion 112 is an example of a rank counting means.
The rank verification portion 108 verifies whether the value of the right side of the calculated Expression (9) is equal to or greater than “Rank(q, IR(q, C)j, Cβ)−α”, and the value of the left side of the calculated Expression (10) is equal to or less than “Rank(q, IR(q, C)j, Cβ)+α”. Here, α is a parameter indicating the amount of variation in ranking that is acceptable, i.e., that a ranking variation of at most α is permissible. The larger α is, the easier it is to verify (α, β)-robustness.
If the condition is satisfied, the rank verification portion 108 outputs that (α, β)-robustness is verified, and if the condition is not satisfied, it outputs that (α, β)-robustness is not verified.
If the conditions in Expressions (9) and (10), which are verified by the rank verification portion 108, hold, then the conditions in Expression (2) of the (α, β)-robustness verification are known to hold (sufficient conditions). Thus, the conditions in Expressions (9) and (10) mean that the rank of image IR(q, C)j in the target image group Cβ in a case where the input image is q+δ varies only at most α with respect to the rank of image IR(q, C)j in a case where the input image is q.
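A sufficient-condition check of this kind can be sketched from distance bounds alone. The sketch below is an assumption-laden illustration, not a reproduction of Expressions (9) and (10): it assumes the rank is one plus the number of strictly closer images, and the names `certified_rank_range` and `verify` and the example bound values are hypothetical. An image is counted as certainly closer when its upper limit is below the lower limit of IR(q, C)j, and as possibly closer when its lower limit is below the upper limit of IR(q, C)j.

```python
def certified_rank_range(bounds, similar_key):
    """Smallest and largest rank the similar image can take under any allowed noise.

    `bounds` maps each image key in the target image group to a pair
    (lower limit, upper limit) of its distance to f(q + delta).
    """
    lo_s, hi_s = bounds[similar_key]
    surely_closer = sum(
        1 for k, (lo, hi) in bounds.items() if k != similar_key and hi < lo_s
    )
    possibly_closer = sum(
        1 for k, (lo, hi) in bounds.items() if k != similar_key and lo < hi_s
    )
    return 1 + surely_closer, 1 + possibly_closer


def verify(bounds, similar_key, clean_rank, alpha):
    """True if the rank provably stays within clean_rank +/- alpha."""
    rank_min, rank_max = certified_rank_range(bounds, similar_key)
    return clean_rank - alpha <= rank_min and rank_max <= clean_rank + alpha


# Hypothetical distance bounds for a target image group of three images.
bounds = {"s": (0.9, 1.1), "a": (1.0, 1.3), "b": (2.0, 2.5)}
rank_range = certified_rank_range(bounds, "s")
```

Here image "a" might overtake "s" because their intervals overlap, so the rank of "s" is only certified to lie between 1 and 2; verification with a clean rank of 1 fails for α=0 and succeeds for α=1. A failed check does not prove that an attack exists, since the bounds may be loose.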
Because the robustness verification device 100 uses the upper and lower limits of d(f(q+δ), f(c)) calculated by the upper limit/lower limit calculation portion 106 rather than the exact distances, there is a possibility that inputs which would be verified according to the definition of (α, β)-robustness verification in Expression (2) are deemed not verified by the robustness verification device 100.
Next, the operation of the robustness verification device 100 is described with reference to
First, the robustness verification device 100 receives as input the input image q∈χ, which is the query, the candidate image group C={ci∈χ}(i=1 to N), the feature amount extractor f, the perturbation size ε, the parameters α and β, and the rank j (Step S101).
Next, the similar image identification portion 102 identifies the image IR(q, C)j that is the j-th most similar to the input image q in the candidate image group C. Specifically, the similar image identification portion 102 uses the feature amount extractor f to calculate the feature amounts f(q), f(ci) (i=1 to N) of the input image q and each candidate image ci∈C, and calculates the Euclidean distance dist(f(q), f(ci)) between the feature amount f(q) and each f(ci). Then, the similar image identification portion 102 identifies the image with the j-th smallest distance from the input image q as IR(q, C)j (Step S102).
Next, the comparison target image calculation portion 104 selects the target image group Cβ, which is the set of images to be subject to robustness verification. Specifically, the comparison target image calculation portion 104 includes IR(q, C)j in Cβ. The comparison target image calculation portion 104 calculates the feature amounts f(IR(q, C)j) and f(c) for IR(q, C)j and each target image c of the candidate image group C. Then, the comparison target image calculation portion 104 includes that target image in Cβ if “‖f(c)−f(IR(q, C)j)‖q”, the magnitude of the difference in the q-norm of the feature amounts, is equal to or greater than β (Step S103).
Next, for each target image c in the target image group Cβ, the upper limit/lower limit calculation portion 106 calculates the upper limit
Next, the rank calculation portion 110 of the rank verification portion 108 calculates Rank(q, IR(q, C)j, Cβ). Specifically, the rank calculation portion 110 calculates the rank by finding the Euclidean distance between each feature amount f(q), f(IR(q, C)j), f(c) of q, IR(q, C)j, ∀c∈Cβ using the feature amount extractor f (Step S105).
Next, the rank counting portion 112 of the rank verification portion 108 calculates the right side of Expression (9). That is, the rank counting portion 112 counts “1[
Next, the rank verification portion 108 verifies whether the value of the right side of the calculated Expression (9) is equal to or greater than “Rank(q, IR(q, C)j, Cβ)−α”, and the value of the left side of the calculated Expression (10) is equal to or less than “Rank(q, IR(q, C)j, Cβ)+α”. If the condition holds, the rank verification portion 108 outputs that (α, β)-robustness is verified, and if the condition does not hold, it outputs that (α, β)-robustness is not verified (Step S107).
After Step S107, the robustness verification device 100 ends the process in
The robustness verification device 100 is not limited to performing (α, β)-robustness verification for a single specific rank j; it may perform (α, β)-robustness verification for multiple values of j, or for all j with 1≤j≤N.
As explained above, the similar image identification portion 102 identifies similar images IR(q, C)j. The comparison target image calculation portion 104 calculates the target image group Cβ to be subject to robustness verification. The upper limit/lower limit calculation portion 106 calculates the upper and lower limits of d(f(q+δ), f(c)) for each target image. The rank verification portion 108 verifies the conditions in Expressions (9) and (10).
Thereby, the robustness verification device 100 can perform (α, β)-robustness verification, i.e., can verify that the rank of image IR(q, C)j in the target image group Cβ in a case where the input image is q+δ varies by at most α with respect to the rank of image IR(q, C)j in a case where the input image is q. In other words, the robustness verification device 100 can verify the extent to which the search results of content-based image retrieval are affected when an adversarial example, in which adversarial perturbation is applied to the input image serving as the query, is input.
For each target image c in the target image group Cβ subject to robustness verification, the upper limit
In addition, the comparison target image calculation portion 104 uses the parameter β to allow for variation in ranking when determining the target image group Cβ.
This allows the robustness verification device 100 to adjust the accuracy of the verification, such as making (α, β)-robustness verification easier as β increases.
The rank verification portion 108 also uses the parameter α to determine the amount of rank variation that is acceptable.
This allows the robustness verification device 100 to adjust the accuracy of the verification, in that the larger α is, the easier it is for (α, β)-robustness to be verified.
Next, the second example embodiment of the present invention shall be described. The second example embodiment is a robustness verification device 200 that verifies the robustness of the content-based image retrieval device 900 against candidate attacks.
Let q be the input image that is the query of the content-based image retrieval device 900. Let IR(q, C)j be the j-th most similar image to q that the content-based image retrieval device 900 retrieves from the candidate image group C={ci∈χ}(i=1 to N) in a case where the input image q is input. The candidate image group to which noise δi (i=1 to N) of magnitude ε or less is added is denoted as ˜C={ci+δi|ci∈C}(i=1 to N).
In this case, the robustness verification device 200 verifies whether the rank of IR(q, C)j remains largely unchanged, even if the candidate image group of the content-based image retrieval device 900 is the noise-added candidate image group ˜C. Specifically, the robustness verification device 200 verifies whether the rank of IR(q, C)j varies by at most α from the rank j obtained in a case where the candidate image group is C, even if the candidate image group of the content-based image retrieval device 900 is ˜C.
First, α-robustness verification, which is a fundamental concept in verifying robustness with the robustness verification device 200, shall be described.
α-robustness verification against candidate attacks is defined as follows.
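Let α be a natural number greater than or equal to 0. At this time, with respect to any noise satisfying
$$\delta_i \in \{\delta \in \chi \mid \|\delta\|_p \le \varepsilon\} \quad (i = 1 \text{ to } N) \tag{11}$$
IR(q, C)j is considered to be α-robustness verified if
$$j - \alpha \le \mathrm{Rank}(q, \mathrm{IR}(q, C)_j, \tilde{C}) \le j + \alpha \tag{12}$$
holds true. Here, the target image group ˜C is given by
$$\tilde{C} = \{c_i + \delta_i \mid c_i \in C\} \quad (i = 1 \text{ to } N) \tag{13}$$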
Expression (11) represents the range of noise δi applied to each image ci(i=1 to N) of the candidate image group C. χ represents the input space of images. “δ∈χ” denotes that δ is also an element of the input space of images. “∥δ∥p” denotes the infinity norm L∞ of δ. “∥δ∥p≤ε” indicates that the magnitude of δ is less than or equal to ε in a case where the infinity norm L∞ of δ is taken. This is illustrated in
Expression (13) expresses the set of images ˜C (hereafter referred to as the target image group) subject to the robustness verification in Expression (12). “˜C={ci+δi|ci∈C}(i=1 to N)” indicates that for each candidate image ci of the candidate image group C, the image ci+δi with any noise δi is an element of the target image group ˜C. Note that β, the parameter that determines the candidate images to be considered for ranking changes, is not introduced here, unlike in the case of query attacks. This is because the robustness verification in a candidate attack assumes that noise can be added to any candidate image ci (i=1 to N) in the candidate image group C. In other words, since the feature amount f(ci) (i=1 to N) obtained by the feature amount extractor f can be changed by the noise, it is not possible to exclude images based on distance in the feature amount space, as is done in the case of a query attack.
Expression (12) represents the specific condition for α-robustness verification. Rank (q, c, C) represents the rank of image c within the candidate image group C with respect to similarity using the feature amount extractor f in a case where the input image is q. The feature amount extractor f is the feature amount extractor of the content-based image retrieval device 900. Therefore, “Rank (q, IR(q, C)j, ˜C)” represents the rank of image IR(q, C)j in the target image group ˜C obtained by Expression (13), where the input image is q. Also, “j” stands for Rank (q, IR(q, C)j, C), which is the rank j of image IR(q, C)j in a case where the input image in candidate image group C is q. Therefore, Expression (12) expresses the condition that the rank of image IR(q, C)j in the target image group ˜C calculated by Expression (13) in a case where the input image is q varies only at most α with respect to the rank of image IR(q, C)j in a case where the input image is q. α is a parameter indicating the amount of variation in rank that is acceptable, i.e. that a ranking variation of at most α is acceptable. The larger α is, the easier it is to verify α-robustness.
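As an illustration of the condition in Expression (12), the following minimal sketch checks whether the rank of IR(q, C)j stays within j−α to j+α for one concrete choice of noise. It is only a sketch under assumptions: the names `euclid`, `rank_of`, `holds_for_noise`, and `toy_f` are hypothetical, images are represented as plain lists of numbers, the feature amount extractor is a toy identity function, and the magnitude of the noise is not checked against ε. Checking a single noise can show that the condition fails, but it cannot verify α-robustness, which requires the condition to hold for every admissible noise.

```python
import math

def euclid(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_of(fq, index, feats):
    """1-based rank of feats[index]: one plus the number of strictly closer features."""
    d_target = euclid(fq, feats[index])
    return 1 + sum(1 for k, g in enumerate(feats)
                   if k != index and euclid(fq, g) < d_target)

def holds_for_noise(f, q, cands, noises, j, alpha):
    """Check j - alpha <= Rank <= j + alpha for ONE given noise assignment."""
    fq = f(q)
    clean = [f(c) for c in cands]
    # Index of the j-th most similar candidate without noise, i.e. IR(q, C)j.
    order = sorted(range(len(cands)), key=lambda k: euclid(fq, clean[k]))
    target = order[j - 1]
    # Features of the noise-added candidate image group.
    noisy = [f([x + n for x, n in zip(c, d)]) for c, d in zip(cands, noises)]
    return j - alpha <= rank_of(fq, target, noisy) <= j + alpha

def toy_f(x):
    """Hypothetical feature amount extractor (identity), for illustration only."""
    return list(x)

q = [0.0, 0.0]
cands = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
noises = [[0.0, 0.0], [1.5, 0.0], [0.0, 0.0]]  # pushes the 2nd image to rank 3
print(holds_for_noise(toy_f, q, cands, noises, j=2, alpha=0))  # False
print(holds_for_noise(toy_f, q, cands, noises, j=2, alpha=1))  # True
```

In this toy case the rank moves from 2 to 3, so the condition fails for α=0 and holds for α=1, which matches the statement that a larger α makes verification easier.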
The robustness verification device 200 performs the α-robustness verification described above, but exact computation of α-robustness verification is difficult due to its computational complexity. In other words, it is difficult for the robustness verification device 200 to verify Expression (12) for every δ that satisfies Expression (11). Therefore, the robustness verification device 200, with respect to
$$\delta \in \{\delta \mid \|\delta\|_p \le \varepsilon\} \tag{14}$$
utilizes the ability to calculate the upper and lower limits of d(f(q), f(c+δ)) with a small amount of computation. Here, q is the input image while c is an element of the candidate image group C.
In other words, the robustness verification device 200 calculates the lower and upper limits that satisfy
$$\underline{d}(f(q), f(c+\delta)) \le d(f(q), f(c+\delta)) \le \overline{d}(f(q), f(c+\delta)) \tag{15}$$
where “d(f(q), f(c+δ))” is the Euclidean distance between the feature amount f(q) of the input image q and the feature amount f(c+δ) of image c with noise δ, and the underlined and overlined d denote the lower limit and the upper limit of that distance, respectively.
The robustness verification device 200 performs calculations using, for example, the well-known technique Interval Bound Propagation (IBP), described in the aforementioned non patent document. IBP is a method for computing the upper and lower limits of each element i of the feature amount f(x+δ) by sequentially computing the upper and lower limits of each element of the intermediate layer representation in each layer in a case where an image x+δ with noise δ added is input, for noise δ∈{δ|∥δ∥∞≤ε} with a magnitude in the infinity norm equal to or less than ε. Here, i represents the i-th element (1≤i≤n) of the feature amount, assuming that the feature amount is an n-dimensional vector.
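The layer-by-layer propagation described above can be sketched as follows for a small fully connected network with ReLU activations. This is a minimal illustration under assumptions, not the feature amount extractor of the content-based image retrieval device 900: the function names `affine_bounds`, `relu_bounds`, and `ibp`, the representation of a layer as a `(W, b)` pair of nested lists, and the choice to apply ReLU after every layer except the last are all hypothetical.

```python
def affine_bounds(W, b, lo, hi):
    """Elementwise bounds of W x + b when lo <= x <= hi holds elementwise."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = u = bias
        for w, a, c in zip(row, lo, hi):
            if w >= 0:
                # A non-negative weight keeps the order of the interval ends.
                l += w * a
                u += w * c
            else:
                # A negative weight swaps the interval ends.
                l += w * c
                u += w * a
        out_lo.append(l)
        out_hi.append(u)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to both ends of the interval."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

def ibp(layers, x, eps):
    """Bounds of each output element for any noise with infinity norm <= eps."""
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    for idx, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if idx < len(layers) - 1:  # assumed: no activation after the last layer
            lo, hi = relu_bounds(lo, hi)
    return lo, hi

# Toy two-layer network (hypothetical weights).
layers = [([[1.0, -1.0], [2.0, 0.0]], [0.0, -1.0]),
          ([[1.0, 1.0]], [0.5])]
lo, hi = ibp(layers, [1.0, 0.5], eps=0.1)
print(lo, hi)
```

The bounds obtained this way are guaranteed to contain the true output but may be looser than the exact range, which is the price paid for the small amount of computation.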
The robustness verification device 200 uses IBP to calculate the upper limit
In Expression (16), “|
In Expression (17), “
In a case where the robustness verification device 200 calculates the upper and lower limits of Expression (15) using the IBP-based calculation method described above, the norm in Expression (14) is the infinity norm.
The robustness verification device 200 may also calculate the upper and lower limits of d(f(q), f(c+δ)) using calculation methods other than IBP, in which case the norm in Expression (14) is not limited to the infinity norm.
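One way to turn elementwise bounds on f(c+δ) into bounds on the Euclidean distance d(f(q), f(c+δ)) is sketched below. This is an assumed construction for illustration and is not claimed to reproduce Expressions (16) and (17): the function name `distance_bounds` and its arguments are hypothetical, and f(q) is treated as exact because, in a candidate attack, noise is added only to the candidate images.

```python
import math

def distance_bounds(fq, lo, hi):
    """Lower and upper limits of the Euclidean distance between fq and any
    vector z with lo[i] <= z[i] <= hi[i] for every element i."""
    lower_sq = 0.0
    upper_sq = 0.0
    for v, l, u in zip(fq, lo, hi):
        # Farthest point of the interval [l, u] from v gives the upper limit.
        upper_sq += max(abs(v - l), abs(v - u)) ** 2
        # Nearest point of the interval gives the lower limit; the gap is 0
        # when v lies inside [l, u].
        gap = max(l - v, v - u, 0.0)
        lower_sq += gap ** 2
    return math.sqrt(lower_sq), math.sqrt(upper_sq)

lower, upper = distance_bounds([0.0, 0.0], [3.0, -1.0], [4.0, 1.0])
print(lower, upper)
```

Because each element is bounded independently, the two limits hold for every noise covered by the elementwise bounds.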
The robustness verification device 200 performs α-robustness verification by using the upper limit
The similar image identification portion 202 receives the input image q, candidate image group C, feature amount extractor f, and rank j, and outputs the image IR(q, C)j that is the j-th most similar to the input image q in the candidate image group C. Specifically, the similar image identification portion 202 uses the feature amount extractor f to calculate the feature amounts f(q), f(ci) (i=1 to N) of the input image q and each candidate image ci∈C. Then, the similar image identification portion 202 calculates the Euclidean distance dist(f(q), f(ci)) between the feature amount f(q) and each f(ci). The similar image identification portion 202 then outputs the image that is the j-th most similar (j-th smallest distance) to the input image q as IR(q, C)j. The similar image identification portion 202 corresponds to the search of the content-based image retrieval device 900.
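The identification of IR(q, C)j described above amounts to sorting the candidates by feature distance and taking the j-th entry. A minimal sketch follows, assuming images are lists of numbers and using a hypothetical identity function `toy_f` in place of the feature amount extractor f; the names `euclid` and `identify_similar` are likewise illustrative, and ties are broken by candidate index, which the description does not specify.

```python
import math

def euclid(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def identify_similar(f, q, cands, j):
    """Return the index in cands of the j-th most similar image (1-based j)."""
    fq = f(q)
    dists = [euclid(fq, f(c)) for c in cands]
    # sorted() is stable, so equal distances keep their original index order.
    order = sorted(range(len(cands)), key=lambda i: dists[i])
    return order[j - 1]

def toy_f(x):
    """Hypothetical feature amount extractor (identity), for illustration only."""
    return list(x)

cands = [[3.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
print(identify_similar(toy_f, [0.0, 0.0], cands, j=1))  # 1
```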
The candidate image group C, the input image q, the feature amount extractor f, and the perturbation size ε are input to the upper limit/lower limit calculation portion 206, which, for each target image c∈C, calculates the upper limit and the lower limit of d(f(q), f(c+δ)).
Specifically, the upper limit/lower limit calculation portion 206 uses the aforementioned IBP to calculate, for each target image c∈C, the upper limit and the lower limit of d(f(q), f(c+δ)).
The method by which the upper and lower limits of d(f(q), f(c+δ)) are calculated by the upper limit/lower limit calculation portion 206 is not limited to IBP, and other methods may be used.
The rank verification portion 208 receives as input the input image q, the image IR(q, C)j, the candidate image group C, the upper limit
The conditions for α-robustness verification performed by the rank verification portion 208 are not based on the definition in Expression (12), but on the upper and lower limits of d(f(q), f(c+δ)). Specifically, the rank verification portion 208 verifies whether or not the following Expressions (18) and (19) are satisfied.
The rank counting portion 212 of the rank verification portion 208 calculates the right side of Expression (18) and the left side of Expression (19).
The rank counting portion 212 first calculates the right side of Expression (18). “
The rank counting portion 212 counts “1[
The rank counting portion 212 then calculates the left side of Expression (19). “
The rank counting portion 212 counts “1[
The rank verification portion 208 verifies whether the value of the right side of the calculated Expression (18) is equal to or greater than “j−α”, and the value of the left side of the calculated Expression (19) is equal to or less than “j+α”. Note that “j” is Rank(q, IR(q, C)j, C), which is the rank of image IR(q, C)j in terms of similarity with input image q in a case where no noise is added to candidate image group C. Here, α is a parameter indicating the amount of variation in ranking that is permitted, i.e., that a ranking variation of at most α is permissible. The larger α is, the easier it is to verify α-robustness.
If the condition holds, the rank verification portion 208 outputs that α-robustness is verified, and if the condition does not hold, it outputs that α-robustness is not verified.
If the conditions in Expressions (18) and (19), which are verified by the rank verification portion 208, hold, then the condition in Expression (12) of the α-robustness verification is known to hold; that is, Expressions (18) and (19) are sufficient conditions for Expression (12). Accordingly, the conditions in Expressions (18) and (19) mean that, in a case where the target image group with noise added to the candidate image group C is denoted as ˜C, the rank of image IR(q, C)j in the target image group ˜C in a case where the input image is q varies only at most α with respect to the rank j of image IR(q, C)j in a case where the input image is q. Note that, because the robustness verification device 200 uses the upper and lower limits of d(f(q), f(c+δ)) calculated by the upper limit/lower limit calculation portion 206 rather than exact values, there is a possibility that inputs which would be verified under the definition of α-robustness verification are deemed not verified by the robustness verification device 200.
As with the robustness verification device 100, the rank verification portion 208 of the robustness verification device 200 may include a rank calculation portion 210. In this case, the rank calculation portion 210 outputs, as it is, the rank “j” that was input to the robustness verification device 200. “j” is Rank(q, IR(q, C)j, C), which is the rank of image IR(q, C)j in terms of similarity with input image q in a case where no noise is added to candidate image group C.
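The idea of verifying the rank from upper and lower limits alone can be sketched as follows. This is an assumed, conservative variant written for illustration and is not claimed to match Expressions (18) and (19) term by term: the function name `verify_rank` and its arguments are hypothetical, and the rank is taken to be one plus the number of strictly closer images. An image whose upper limit is below the lower limit of IR(q, C)j is certainly closer, and an image whose lower limit does not exceed the upper limit of IR(q, C)j is possibly closer.

```python
def verify_rank(lower, upper, target, j, alpha):
    """Sufficient check that the rank of the target stays within j +/- alpha.

    lower[c] and upper[c] bound d(f(q), f(c + noise)) for candidate c, and
    target is the index of IR(q, C)j. Returns (verified, min_rank, max_rank).
    """
    others = [c for c in range(len(lower)) if c != target]
    # Images that are closer than the target for every admissible noise.
    surely_closer = sum(1 for c in others if upper[c] < lower[target])
    # Images that could be closer than the target for some admissible noise.
    possibly_closer = sum(1 for c in others if lower[c] <= upper[target])
    min_rank = 1 + surely_closer
    max_rank = 1 + possibly_closer
    verified = (j - alpha <= min_rank) and (max_rank <= j + alpha)
    return verified, min_rank, max_rank

# Tight bounds: the second image cannot leave rank 2.
print(verify_rank([0.8, 1.8, 2.9, 5.0], [1.2, 2.2, 3.1, 6.0], target=1, j=2, alpha=0))
# Loose bounds: the rank may be anywhere from 1 to 3.
print(verify_rank([0.8, 1.0, 1.5], [1.6, 2.5, 3.0], target=1, j=2, alpha=0))
```

With loose bounds the check can fail even if the true rank never moves, which is the conservative behaviour noted above: the check is sufficient but not necessary.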
Next, the operation of the robustness verification device 200 shall be described with reference to
First, the robustness verification device 200 receives as input the input image q∈χ, which is the query, the candidate image group C={ci∈χ}(i=1 to N), the feature amount extractor f, the perturbation size ε, the parameter α, and the rank j (Step S201).
Next, the similar image identification portion 202 identifies the image IR(q, C)j that is the j-th most similar to the input image q in the candidate image group C. Specifically, the similar image identification portion 202 uses the feature amount extractor f to calculate the feature amounts f(q), f(ci) (i=1 to N) of the input image q and each candidate image ci∈C, and calculates the Euclidean distance dist(f(q), f(ci)) between the feature amount f(q) and each f(ci). Then, the similar image identification portion 202 identifies the image with the j-th smallest distance from the input image q as IR(q, C)j (Step S202).
Next, for each target image c in the candidate image group C, the upper limit/lower limit calculation portion 206 calculates the upper limit and the lower limit of d(f(q), f(c+δ)).
Next, the rank counting portion 212 of the rank verification portion 208 calculates the right side of Expression (18). That is, the rank counting portion 212 counts “1[
Next, the rank verification portion 208 verifies whether the value of the right side of the calculated Expression (18) is equal to or greater than “j−α”, and the value of the left side of the calculated Expression (19) is equal to or less than “j+α”. If the condition holds, the rank verification portion 208 outputs that α-robustness is verified, and if the condition does not hold, it outputs that α-robustness is not verified (Step S205).
After Step S205, the robustness verification device 200 ends the process in
The robustness verification device 200 need not limit α-robustness verification to a specific rank j; it may perform α-robustness verification for multiple values of j, or for all j with 1≤j≤N.
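Verification over all ranks can be sketched as a loop over j, reusing the same upper and lower limits for every rank. The sketch below is an assumption-laden illustration: the function name `verified_ranks` and its arguments are hypothetical, the noise-free distances are passed in as `clean`, and the counting rule is a conservative variant rather than a reproduction of Expressions (18) and (19).

```python
def verified_ranks(clean, lower, upper, alpha):
    """Return every rank j (1-based) for which the bounds prove that the
    j-th most similar image stays within j - alpha .. j + alpha.

    clean[c] is the noise-free distance of candidate c, and lower[c] and
    upper[c] bound its distance under any admissible noise.
    """
    n = len(clean)
    order = sorted(range(n), key=lambda c: clean[c])
    result = []
    for j, target in enumerate(order, start=1):
        others = [c for c in range(n) if c != target]
        # Smallest and largest rank the target can take under the bounds.
        min_rank = 1 + sum(1 for c in others if upper[c] < lower[target])
        max_rank = 1 + sum(1 for c in others if lower[c] <= upper[target])
        if j - alpha <= min_rank and max_rank <= j + alpha:
            result.append(j)
    return result

clean = [1.0, 2.0, 3.0, 3.2]
lower = [0.8, 1.8, 2.9, 3.0]
upper = [1.2, 2.2, 3.1, 3.4]
print(verified_ranks(clean, lower, upper, alpha=0))  # [1, 2]
print(verified_ranks(clean, lower, upper, alpha=1))  # [1, 2, 3, 4]
```

In this toy data the third and fourth images have overlapping distance intervals, so their ranks can be verified only when a variation of one position is tolerated.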
As explained above, the similar image identification portion 202 identifies similar images IR(q, C)j. The upper limit/lower limit calculation portion 206 calculates the upper and lower limits of d(f(q), f(c+δ)) for each target image. The rank verification portion 208 verifies the conditions in Expressions (18) and (19).
Thereby, the robustness verification device 200 can perform α-robustness verification, i.e., can verify, in a case where the target image group with noise added to the candidate image group C is denoted as ˜C, that the rank of image IR(q, C)j in the target image group ˜C in a case where the input image is q varies only at most α with respect to the rank j of image IR(q, C)j in a case where the input image is q. That is, the robustness verification device 200 can verify the degree of influence on search results in content-based image retrieval in a case where an adversarial example to which adversarial perturbation is added is applied.
For each target image c in the candidate image group C, the upper limit/lower limit calculation portion 206 calculates the upper limit and the lower limit of d(f(q), f(c+δ)).
This allows the robustness verification device 200 to perform α-robustness verification with a small amount of computation (practical computation time).
The rank verification portion 208 also uses the parameter α to determine the amount of variation in rank that is acceptable.
This allows the robustness verification device 200 to adjust the strictness of the verification: the larger α is, the easier it is for α-robustness to be verified.
In such a configuration, the similar image identification portion 411 uses the similarity between feature amounts obtained by the feature amount extractor to identify a similar image in the candidate image group that has a predetermined rank of similarity with respect to the input image. The rank counting portion 412 counts the rank of the similar image with respect to the input image in the candidate image group in a case where adversarial perturbation is applied to the image. The rank calculation portion 413 calculates the rank of the similar image with respect to the input image in the candidate image group in a case where adversarial perturbation is not applied to the image. The rank verification portion 414 verifies whether the rank of the similar image counted by the rank counting portion 412 is within a predetermined range including the rank of the similar image calculated by the rank calculation portion 413.
The similar image identification portion 411 corresponds to an example of a similar image identification means. The rank counting portion 412 corresponds to an example of a rank counting means. The rank calculation portion 413 corresponds to an example of a rank calculation means. The rank verification portion 414 corresponds to an example of a rank verification means.
The robustness verification device 410 can verify α-robustness, i.e., that the rank of a similar image with respect to the input image in the target image group to which noise is added varies only at most α relative to the rank of a similar image to the input image in the candidate image group. That is, the robustness verification device 410 can verify the degree of influence on search results in content-based image retrieval in a case where an adversarial example to which adversarial perturbation is added is applied.
In identifying similar images (Step S411), the similarity between features by the feature amount extractor is used to identify a similar image in the candidate image group having a predetermined rank in similarity to the input image. In counting the rank (Step S412), the ranks of similar images relative to the input image in the candidate image group are counted in a case where the image is subjected to adversarial perturbation. In calculating the rank (Step S413), the rank of similar images relative to the input image in the candidate image group is calculated in a case where adversarial perturbation is not applied to the image. In verifying the rank (Step S414), it is verified whether the rank of similar images counted in a case where the image is subjected to adversarial perturbation is within a predetermined range including the rank of similar images calculated in a case where the adversarial perturbation is not applied to the image.
According to the robustness verification method shown in
In the configuration shown in
Any one or more of the above robustness verification devices 100, 200, and 410 may be implemented in the computer 300. In that case, the operations of each of the above-mentioned processing portions are stored in the auxiliary memory device 330 in the form of a program. The CPU 310 reads the program from the auxiliary memory device 330, expands it in the main memory device 320, and executes the aforementioned processing according to the program. The CPU 310 also reserves a memory area in the main memory device 320 corresponding to each of the above-mentioned memory portions according to the program. Communication between each device and other devices is executed by the interface 340, which has a communication function and communicates according to the control of the CPU 310.
In a case where the robustness verification device 100 is implemented in the computer 300, the operations of the similar image identification portion 102, the comparison target image calculation portion 104, the upper limit/lower limit calculation portion 106, and the rank verification portion 108 are stored in auxiliary memory device 330 in program form. The CPU 310 reads the program from the auxiliary memory device 330, expands it in the main memory device 320, and executes the aforementioned processing according to the program.
The CPU 310 also allocates a memory area in the main memory device 320 for the robustness verification device 100 to perform processing according to the program. The output of the (α, β)-robustness verification of the robustness verification device 100 is executed by the interface 340, which has output functions such as communication or display functions and performs output processing according to the control of the CPU 310. Communication between the robustness verification device 100 and other devices is executed by the interface 340, which has communication functions and operates according to the control of the CPU 310. The interaction between the robustness verification device 100 and the user is executed by the interface 340, which has a display and input device and operates according to the control of the CPU 310.
In a case where the robustness verification device 200 is implemented in the computer 300, the operations of the similar image identification portion 202, upper limit/lower limit calculation portion 206, and the rank verification portion 208 are stored in the auxiliary memory device 330 in the form of programs. The CPU 310 reads the program from the auxiliary memory device 330, expands it in the main memory device 320, and executes the aforementioned processing according to the program.
The CPU 310 also allocates a memory area in the main memory device 320 for the robustness verification device 200 to perform processing according to the program. The output of the α-robustness verification of the robustness verification device 200 is executed by the interface 340, which has output functions such as communication or display functions and performs output processing according to the control of the CPU 310. Communication between the robustness verification device 200 and other devices is executed by the interface 340, which has communication functions and operates according to the control of the CPU 310. The interaction between the robustness verification device 200 and the user is executed by the interface 340, which has a display and input device and operates according to the control of the CPU 310.
In a case where the robustness verification device 410 is implemented in the computer 300, the operations of the similar image identification portion 411, the rank counting portion 412, the rank calculation portion 413, and the rank verification portion 414 are stored in the auxiliary memory device 330 in the form of programs. The CPU 310 reads the program from the auxiliary memory device 330, expands it in the main memory device 320, and executes the aforementioned processing according to the program.
The CPU 310 also allocates a memory area in the main memory device 320 for the robustness verification device 410 to perform processing according to the program. The output of the robustness verification device 410 is executed by the interface 340, which has output functions such as communication or display functions and performs output processing according to the control of the CPU 310. Communication between the robustness verification device 410 and other devices is executed by the interface 340, which has communication functions and operates according to the control of the CPU 310. The interaction between the robustness verification device 410 and the user is executed by the interface 340, which has a display and input device and operates according to the control of the CPU 310.
A program for executing all or part of the processes performed by the robustness verification devices 100, 200, and 410 may be recorded on a computer-readable recording medium, and a computer system may read and execute the program recorded on this recording medium to perform the processes of each part. The term “computer system” here shall include an operating system and hardware such as peripherals.
In addition, “computer-readable recording medium” means a portable medium such as a flexible disk, magneto-optical disk, ROM (Read Only Memory), CD-ROM (Compact Disc Read Only Memory), or other storage device such as a hard disk built into a computer system. The above program may be used to realize some of the aforementioned functions, and may also be used to realize the aforementioned functions in combination with programs already recorded in the computer system.
While preferred example embodiments of the invention have been described and illustrated above, it should be understood that these are exemplary of the invention and are not to be considered as limiting. Additions, omissions, substitutions, and other modifications can be made without departing from the scope of the present invention. Accordingly, the invention is not to be considered as being limited by the foregoing description, and is only limited by the scope of the appended claims.
Example embodiments of the present invention may be applied to a robustness verification device, a robustness verification method, a program, and a recording medium.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/JP2022/009573 | 3/4/2022 | WO | |