The present invention relates generally to computer vision techniques, and more particularly, to techniques of generating an adversarial patch-based image for a computer vision neural network.
Computer vision techniques are now widely applied in scenarios such as surveillance and autonomous driving. Deep learning models, especially those based on convolutional neural networks (CNNs), have been used successfully in these techniques. However, recent research has shown that deep neural networks (DNNs) are vulnerable to adversarial attacks. This vulnerability introduces significant potential safety risks in scenarios such as autonomous driving, which makes it necessary to study adversarial attacks against computer vision neural networks.
Perturbation-based attacks and patch-based attacks are the two mainstream types of attack methods. A perturbation-based method learns full-image additive noise that can change the prediction of a deep learning model while remaining nearly imperceptible to humans. Since this method manipulates every pixel of an image, it is not feasible for attacks in the physical world. A patch-based method uses one or more adversarial patches to attack certain parts of an image and produces patch-level changes on the image. Since a patch-based attack changes only one or several regions of the image, it has the potential to occur in the physical world, for example by hiding a person or a stop sign, which is dangerous for autonomous driving.
Therefore, patch-based adversarial attacks warrant further research, in order to study the vulnerability of computer vision neural networks to physical attacks and to improve their safety accordingly.
The following presents a simplified summary of one or more aspects according to the present invention in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
In an aspect of the present invention, a method for generating a set of adversarial patches for an image is provided. According to an example embodiment of the present invention, the method may comprise segmenting the image into a plurality of regions; selecting a set of target regions that satisfies an attacking criterion by discretely searching of the plurality of regions; and generating a set of adversarial patches by using the set of target regions.
In another aspect of the present invention, an apparatus for generating a set of adversarial patches for an image is provided. According to an example embodiment of the present invention, the apparatus may comprise a memory and at least one processor coupled to the memory. The at least one processor may be configured to segment the image into a plurality of regions; select a set of target regions that satisfies an attacking criterion by discretely searching of the plurality of regions; and generate a set of adversarial patches by using the set of target regions.
In another aspect of the present invention, a computer readable medium storing computer code for generating a set of adversarial patches for an image is provided. According to an example embodiment of the present invention, the computer code when executed by a processor may cause the processor to segment the image into a plurality of regions; select a set of target regions that satisfies an attacking criterion by discretely searching of the plurality of regions; and generate a set of adversarial patches by using the set of target regions.
In another aspect of the present invention, a computer program product for generating a set of adversarial patches for an image is provided. According to an example embodiment of the present invention, the computer program product may comprise processor executable computer code for segmenting the image into a plurality of regions; selecting a set of target regions that satisfies an attacking criterion by discretely searching of the plurality of regions; and generating a set of adversarial patches by using the set of target regions.
Other aspects or variations of the present invention will become apparent by consideration of the following detailed description and the figures.
The following figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the methods and structures disclosed herein may be implemented without departing from the spirit and principles of the present invention described herein.
Before any embodiments of the present disclosure are explained in detail, it is to be understood that the disclosure is not limited in its application to the details of construction and the arrangement of features set forth in the following description. The disclosure is capable of other embodiments and of being practiced or of being carried out in various ways.
Object detection is one type of computer vision task, which deals with identifying and locating objects of certain classes in an image. The present disclosure will be described by taking object detection as an example, while it should be noted that the present disclosure may be applied to other computer vision neural networks which may provide various types of prediction based on an input image.
Deep neural networks have been applied in object detection with great success. Object detectors based on deep neural networks, especially convolutional neural networks, may be classified into one-stage detectors and two-stage detectors.
For two-stage detectors, the prediction consists of region proposal followed by classification. The first detector with a deep neural network was OverFeat, which combines a sliding window and a convolutional neural network (CNN) to perform detection. After that, Regions with Convolutional Neural Networks (R-CNN) was proposed. Two-stage detectors first search for region proposals and then classify each of them. One problem of R-CNN is that it runs too slowly. Therefore, more recent detectors such as Fast R-CNN, Faster R-CNN and Mask R-CNN have been proposed to speed up detection.
One-stage detectors, also known as “single shot detectors”, may predict the bounding box, object score and class score with only a single pass through the network. A one-stage detector extracts features with only one CNN and produces the results for object localization and classification right away. This difference makes a one-stage detector faster but easier to attack. In contrast, a two-stage detector is slower, but more precise and more difficult to attack.
Many adversarial attack schemes have been proposed against computer vision systems, including two-stage object detectors, in order to evaluate the safety of a learned computer vision system based on a DNN. From the perspective of pattern form, adversarial attacks may be classified as perturbation-based attacks and patch-based attacks. Since a perturbation-based attack manipulates every pixel of an image and is not feasible in the physical world, while a patch-based attack uses one or more adversarial patches to attack an image and may cause potential safety risks in the physical world, the present disclosure mainly focuses on the patch-based method for generating an adversarially attacked image, in order to evaluate and improve the safety of a computer vision neural network.
A patch-based attack generally modifies certain parts of an image with visible changes such as adversarial patches. The textures in the modified parts are not constrained by the original image, which means patches can have perceptible textures. However, existing patch-based attack methods are not flexible in terms of the positions and shapes of patches, which may limit the attack performance when these properties are constrained.
Firstly, the positions of patches in existing patch-based methods are fixed. For example, the DPatch method uses the upper-left corner of an image, the AdvPatch method takes the center of a person in an image as the attack region, and the UPC method puts its adversarial patches on 8 manually chosen parts of a person. Secondly, the shapes of patches in the existing patch-based methods are also fixed, usually as rectangles. A rectangle is easy to define in a digital image, but is not correlated with the shape of the target object. The shape and position settings of these existing patch-based methods may lead to poor attack performance in different object detection attack tasks, such as misclassification, position shift and disappearing, which shows that these constrained attack regions are not efficient enough for evaluating the safety of a computer vision neural network.
In an embodiment in which the position, shape, and texture of each patch are all optimized, the i-th patch may be denoted as a tuple $\mathcal{P}_i = (s_i, p_i, t_i)$, where $s_i$, $p_i$, and $t_i$ represent the shape, position, and texture of the patch respectively, as illustrated in the accompanying figures. The optimization objective may be formulated as equation (1):
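A form of equation (1) consistent with the definitions that follow can be sketched as below. The objective terms follow the description of maximizing the loss while penalizing patch area; the constraint symbols $\mathcal{C}$ (set of convex shapes) and $\mathcal{B}$ (image background) are placeholder names assumed here, not taken from the original.

```latex
\max_{\{\mathcal{P}_i\}} \; L\Big(f\big(x \oplus \textstyle\sum_i \mathcal{P}_i\big),\, y\Big)
  \;-\; \lambda \sum_i \varphi(\mathcal{P}_i)
\quad \text{s.t.} \quad s_i \in \mathcal{C},\;\; p_i \notin \mathcal{B}
\tag{1}
```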
where $f$ is an object detection model (such as Faster R-CNN, Mask R-CNN, etc.), $x$ is the original image before the attack (such as image 110), $y$ is the ground-truth annotation of the image (such as the bounding box 125 in image 120), $\oplus$ represents covering patches onto an image, $x \oplus \sum_i \mathcal{P}_i$ represents the image with modified textures at the regions given by each $\mathcal{P}_i$, $L(\cdot)$ is a loss function measuring the difference between the prediction of the object detection model on the attacked image, $f(x \oplus \sum_i \mathcal{P}_i)$, and the ground-truth annotation $y$, $\varphi(\cdot)$ is a function that calculates the region area of each patch, and $\lambda$ is a balance parameter. The specific loss function may depend on the particular attack task.
In one embodiment, the patches may be constrained with $s_i \in \mathcal{C}$ and $p_i \notin \mathcal{B}$, where $\mathcal{C}$ is the set of all convex shapes on the 2D plane, and $\mathcal{B}$ is the background of the image. In other words, the patches are constrained to be convex and to exist only in the foreground. By maximizing $L(f(x \oplus \sum_i \mathcal{P}_i), y)$, the difference between the predicted output and the ground-truth annotation is maximized, which degrades the performance of the detector in the different attack tasks. Meanwhile, $\sum_i \varphi(\mathcal{P}_i)$ penalizes the region area, weighted by the balance parameter $\lambda$. This encourages the optimization to find patches with a smaller area that still achieve satisfactory attack performance.
However, since the parameters of position and shape are defined in a discrete space at the pixel level, the objective in equation (1) is non-differentiable with respect to them. Therefore, the positions and shapes of patches cannot reach their optimal points with traditional gradient descent methods alone.
Instead of directly optimizing the parameters of positions and shapes, an image may be over-segmented into a plurality of small regions, and then some of these small regions may be selected as attack regions, in accordance with one aspect of the present disclosure. In this way, the original optimization problem in equation (1) may be reformulated into a discrete search problem.
In other words, a generated patch may be comprised of several adjacent regions, and thus to some extent, the shape of a patch may be optimized by optimizing the selection of each region.
In this way, the optimization of the parameters of positions and shapes may be transformed into a discrete search problem of whether or not to select a region as a target region, such as the region $R_i$ (the i-th region) shown in the figures.
Then, for the selected regions, their textures may be modified by iterative gradient back-propagation and minimizing the confidence score of the predicted bounding boxes on the attacked image covered by textures in the corresponding selected regions. The textures for each of the selected regions may also be maintained as the textures predetermined or calculated during the region selection stage.
Since superpixels may be used as an important cue to measure the performance of object detector prediction with the degree of superpixels straddling the bounding box, an adversarial attack on superpixels can influence the predictions of the object detector.
Therefore, as shown in
Then, a set of regions may be selected as attack regions, and textures for each of the selected regions may be determined, similarly as described above with reference to
At block 610, method 600 may comprise segmenting an image into a plurality of regions. The image may be an original image with ground-truth labels for training and/or testing a computer vision neural network. The computer vision neural network may be used for object detection, instance segmentation, etc. The segmentation may be based on a predetermined shape or a predetermined number of regions. The predetermined shape may be a regular polygon shape, such as, triangle, square, rhomb, trapezoid, pentagon, hexagon, etc. The predetermined shape may also be an irregular polygon shape. The segmentation may be based on different patterns, which may be dependent on the characteristic of the input image. The segmentation may be constrained to a foreground object of the image.
In one embodiment, the segmenting at block 610 may comprise segmenting the image into a plurality of regions based on pixels having values within a threshold range. For example, each of the plurality of regions of the image may have uniform color or texture values, and may be referred to as a superpixel. The shape of each superpixel may differ at different positions of the image, depending on the pattern of the image. In this embodiment, the segmenting at block 610 may further comprise changing the plurality of regions into convex shapes by taking the convex envelope (convex hull) of each of the plurality of regions.
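The convex-envelope step can be illustrated with a minimal Python sketch. It assumes a region is given as a list of integer pixel coordinates and uses Andrew's monotone chain algorithm, which is one common way to compute a convex hull and is not necessarily the method used in the embodiment. The function names `convex_hull` and `in_hull` are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(pixels):
    """Return the hull vertices of a pixel set in counter-clockwise order."""
    pts = sorted(set(pixels))
    if len(pts) <= 2:
        return pts

    def half(points):
        # Build one half of the hull, dropping points that do not turn left.
        chain = []
        for p in points:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # last point is the first point of the other half

    return half(pts) + half(reversed(pts))


def in_hull(p, hull):
    """True if p lies inside or on a counter-clockwise hull with >= 3 vertices."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


# An L-shaped (non-convex) region standing in for one superpixel.
region = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]
hull = convex_hull(region)

# Pixels covered by the convex envelope, including ones the region lacked.
envelope = [(x, y) for x in range(3) for y in range(3) if in_hull((x, y), hull)]
```

Here the envelope of the L-shaped region is a triangle, so it also covers the pixel (1, 1), which was not part of the original region.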
At block 620, method 600 may comprise selecting a set of target regions that satisfies an attacking criterion by discretely searching of the plurality of regions. In one embodiment, a selection vector of the plurality of regions may be used to indicate whether each of the plurality of regions is to be selected into the set of target regions. The dimension of the selection vector depends on the number of the plurality of regions. Each element of the selection vector has a value of 1 or 0 indicating whether a corresponding region is to be selected or not, and is assumed to obey a Bernoulli distribution respectively and independently. Therefore, the selecting a set of target regions at block 620 may comprise optimizing a probability distribution of the selection vector by calculating a search gradient; and selecting the set of target regions based on a selection vector sampled based on the optimized probability distribution.
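As a minimal sketch of the selection vector, the Python snippet below draws each element independently from a Bernoulli distribution and lists the indices of the selected regions. The names `sample_selection_vector` and `selected_regions`, and the example probabilities, are illustrative assumptions.

```python
import random


def sample_selection_vector(probs, rng):
    """Draw m in {0, 1}^M, one independent Bernoulli trial per region."""
    return [1 if rng.random() < p else 0 for p in probs]


def selected_regions(m):
    """Indices of the regions whose selection element equals 1."""
    return [i for i, mi in enumerate(m) if mi == 1]


rng = random.Random(0)
# Hypothetical per-region selection probabilities for M = 3 regions.
probs = [1.0, 0.0, 1.0]
m = sample_selection_vector(probs, rng)
targets = selected_regions(m)
```

With probabilities of exactly 1 and 0 the sample is deterministic; intermediate values give a random subset of regions on each draw.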
In other words, the selecting a set of target regions at block 620 may be based on different objective functions of the output of a computer vision neural network for an image to which the set of patches is applied, a ground-truth label of the original image, and the total area of the generated set of patches, in accordance with different embodiments. In one example, the objective function may be formulated as equation (2):
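One form of equation (2) consistent with the definitions that follow is sketched below. The loss term follows the text directly; the exact argument of the area function $\varphi$ is an assumption.

```latex
\max_{m,\,\{R_i\}} \; L\Big(f\big(x \oplus \textstyle\sum_i m_i R_i\big),\, y\Big)
  \;-\; \lambda\, \varphi\big(\textstyle\sum_i m_i R_i\big)
\tag{2}
```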
where $f$ is an object detection model, $x$ is the original image before the attack (such as image 110), $y$ is the ground-truth annotation of the image, $m = (m_1, m_2, \ldots, m_M) \in \{0, 1\}^M$ is the selection vector indicating whether the corresponding region $R_i$ is selected or not, $M$ is the number of segmented regions, and $R_i$ represents the i-th segmented region. Since the shapes of the regions are determined after segmentation and the positions may be represented by the corresponding elements of the selection vector, only the texture is considered for $R_i$. $\oplus$ represents covering patches (based on the regions and textures) onto an image, $L(\cdot)$ is a loss function measuring the difference between the prediction of the object detection model on the attacked image, $f(x \oplus \sum_i m_i R_i)$, and the ground-truth annotation $y$, $\varphi(\cdot)$ is a function that calculates the total area of all attack regions, and $\lambda$ is a balance parameter. The constraints in equation (1) may be removed by a well-designed segmentation method. In an example based on superpixels, the objective function may be formulated as equation (3):
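One form of equation (3) consistent with the definitions that follow is sketched below. As with equation (2), the argument of the area function $\varphi$ is an assumption.

```latex
\max_{m,\,\{s_i\}} \; L\Big(f\big(x \oplus \Omega(\textstyle\sum_i m_i s_i)\big),\, y\Big)
  \;-\; \lambda\, \varphi\big(\Omega(\textstyle\sum_i m_i s_i)\big)
\tag{3}
```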
where $f$ is an object detection model, $x$ is the original image before the attack (such as image 110), $y$ is the ground-truth annotation of the image, $m = (m_1, m_2, \ldots, m_M) \in \{0, 1\}^M$ is the selection vector, $M$ is the number of segmented superpixels, $s_i$ represents the i-th superpixel, and $\Omega(\cdot)$ is a function that returns the convex envelopes of the superpixels. Since the shapes of the regions depend on the superpixels after over-segmentation and the positions may be represented by the corresponding elements of the selection vector, only the texture is considered for $s_i$. $\oplus$ represents covering patches (based on the regions and textures) onto an image, $L(\cdot)$ is a loss function measuring the difference between the prediction of the object detection model on the attacked image, $f(x \oplus \Omega(\sum_i m_i s_i))$, and the ground-truth annotation $y$, $\varphi(\cdot)$ is a function that calculates the total area of all attack regions, and $\lambda$ is a balance parameter.
According to equations (2) and (3), the optimization of shapes, positions, and textures in equation (1) is transformed into the optimization of a selection vector ($m$) and textures ($\{R_i\}$ or $\{s_i\}$). In one embodiment, the selection vector may be optimized by using natural evolution strategies (NES), and the textures may be optimized by using iterative gradient ascent. For example, according to the reformulated objective function in equation (3), to optimize $m$ with NES, a fitness function may be defined as:
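A fitness function consistent with equation (3) and with the surrounding definitions might take the following form. The symbol $\mathcal{F}$ and the area term are assumptions made for this sketch.

```latex
\mathcal{F}(m, t^*; y) \;=\; L\Big(f\big(x \oplus \Omega(\textstyle\sum_i m_i s_i^*)\big),\, y\Big)
  \;-\; \lambda\, \varphi\big(\Omega(\textstyle\sum_i m_i s_i)\big)
\tag{4}
```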
where $s_i^*$ is the superpixel $s_i$ with its optimal texture, determined given $m$ and $y$. The optimal textures of all superpixels in $\{s_i^*\}$ may be denoted together as $t^*$, which is calculated through gradient ascent. The expected fitness under a search distribution may be defined as:
$$J(\theta_m) = \mathbb{E}_{\pi(m \mid \theta_m)}\big[\mathcal{F}(m, t^*; y)\big]$$
where $\pi$ is a search distribution of $m$. For $m \in \{0, 1\}^M$, it is assumed to obey the Bernoulli distribution $\mathrm{Bern}(g(\theta_m))$, where $\theta_m \in \mathbb{R}^M$ is the distribution parameter and $g(\cdot) = \tfrac{1}{2}(\tanh(\cdot) + 1)$ is the function that constrains the value of the probability to $[0, 1]$. Then, the search gradient may be calculated as:
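Under the standard log-likelihood identity used in NES, the search gradient takes the form below. The fitness symbol $\mathcal{F}$ is assumed notation. The second line is a short derivation, added here for illustration, of the per-element log-derivative for the Bernoulli parameterization with $g$ given above.

```latex
\nabla_{\theta_m} J(\theta_m)
  = \mathbb{E}_{\pi(m \mid \theta_m)}\big[\mathcal{F}(m, t^*; y)\,
      \nabla_{\theta_m} \log \pi(m \mid \theta_m)\big]

% With p_j = g(\theta_{m,j}) and dp_j / d\theta_{m,j} = 2 p_j (1 - p_j):
\frac{\partial}{\partial \theta_{m,j}} \log \pi(m \mid \theta_m)
  = \frac{m_j - p_j}{p_j (1 - p_j)} \cdot 2 p_j (1 - p_j)
  = 2\,\big(m_j - g(\theta_{m,j})\big)
```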
In one embodiment, an estimate of the search gradient may be obtained from samples $m^{(1)}, m^{(2)}, \ldots, m^{(K)}$ as:
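The usual Monte Carlo estimate in NES averages over the sampled selection vectors, as sketched below. The sample index $k$ and the symbol $\mathcal{F}$ are notational assumptions.

```latex
\nabla_{\theta_m} J(\theta_m) \;\approx\; \frac{1}{K} \sum_{k=1}^{K}
  \mathcal{F}\big(m^{(k)}, t^*; y\big)\,
  \nabla_{\theta_m} \log \pi\big(m^{(k)} \mid \theta_m\big)
```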
where $K$ is the population size, which may be an integer from 20 to 50.
The loss function in equations (1) to (3) may depend on the particular attack task, such as misclassification, position shift, and disappearing. The misclassification task involves two different sub-tasks, targeted attack and untargeted attack. In a targeted attack, the adversarial attack should make the detector predict the target class for the target object. In an untargeted attack, the adversarial attack should make the detector fail to predict the correct class. In the position shift task, the adversarial attack should shift the predicted bounding box of the victim object as far as possible. In the disappearing task, the adversarial attack should make the given object invisible to the detector.
For example, for the untargeted misclassification task, the loss function in equation (3) may be set as:
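One loss consistent with the definitions that follow is sketched below. The negative sign, so that maximizing the loss lowers the ground-truth class score, and the sum over boxes are assumptions.

```latex
L \;=\; -\sum_{b \in B^*} \zeta\Big(C\big(b,\; x \oplus \Omega(\textstyle\sum_i m_i s_i)\big)\Big)
```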
where $b$ is a predicted bounding box, $B^*$ is the set of bounding boxes detected at the same position as the ground-truth bounding box, $C$ is the classifier of the detector, $C(b, x \oplus \Omega(\sum_i m_i s_i))$ is the classification score predicted by the classifier on bounding box $b$, and $\zeta(\cdot)$ outputs the classification score of the ground-truth category predicted by the model.
For the position shift task, the loss function in equation (3) may be set as:
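One loss consistent with the definitions that follow is sketched below. Aggregating by a sum over the boxes in $B'$ is an assumption.

```latex
L \;=\; \sum_{b \in B'} \big\lvert\, p_c(b) - p_c(y) \,\big\rvert
```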
where $b$ is a predicted bounding box, $B'$ is the set of detected bounding boxes that are closest to the ground-truth object, $|p_c(b) - p_c(y)|$ is the L1 norm of the coordinate difference, and $p_c(\cdot)$ gives the central coordinates of the predicted bounding box or the ground-truth bounding box.
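A minimal Python sketch of the position shift loss follows. It assumes boxes are given as `(x1, y1, x2, y2)` corner tuples and sums the L1 center distances over the boxes. Both the box format and the function names are illustrative assumptions.

```python
def box_center(box):
    """Central coordinates p_c of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def position_shift_loss(pred_boxes, gt_box):
    """Sum of L1 distances between predicted and ground-truth box centers."""
    gx, gy = box_center(gt_box)
    total = 0.0
    for box in pred_boxes:
        cx, cy = box_center(box)
        total += abs(cx - gx) + abs(cy - gy)
    return total


gt = (0, 0, 10, 10)                                 # center (5, 5)
shifted = position_shift_loss([(2, 0, 12, 10)], gt)  # center (7, 5)
unshifted = position_shift_loss([gt], gt)
```

A larger value means the predicted box has moved further from the ground-truth object, which is what the attack tries to achieve.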
For the disappearing task, the negative of the sum of the object confidence and classification scores that exceed a certain threshold may be set as the loss function in equation (3). Object confidence may be used to measure the probability that any object exists in a bounding box. The classification score may be used to measure the probability that an object of a certain class exists in a bounding box.
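The disappearing loss can be sketched in Python as follows, under one possible reading of the description: each object confidence and each classification score is counted only if it individually exceeds the threshold. The input format, the default threshold, and the function name are assumptions.

```python
def disappearing_loss(detections, threshold=0.5):
    """Negative sum of confidences and class scores above the threshold.

    detections: list of (object_confidence, classification_score) pairs,
    one per predicted bounding box (an assumed input format).
    """
    total = 0.0
    for obj_conf, cls_score in detections:
        if obj_conf > threshold:
            total += obj_conf
        if cls_score > threshold:
            total += cls_score
    return -total


confident = disappearing_loss([(0.9, 0.8), (0.3, 0.6)])
suppressed = disappearing_loss([(0.2, 0.1)])
```

Maximizing this loss pushes the scores down, and it reaches zero once no score remains above the threshold.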
At block 630, method 600 may comprise generating a set of adversarial patches by using the set of target regions.
In one embodiment, the textures of the set of adversarial patches may be kept as the textures used while selecting the set of target regions. In another embodiment, at block 630, the generating the set of adversarial patches further comprises modifying textures for the set of adversarial patches. The textures of the set of adversarial patches may be based on default configuration. For example, the textures of the set of adversarial patches may be modified with iterative gradient ascent. The textures of the set of adversarial patches may also be selected from a texture dictionary.
Method 600 may further comprise applying the set of adversarial patches to the original image to generate an adversarial attacked image. The set of adversarial patches may be applied to the original image by covering the set of adversarial patches onto the corresponding target regions of the original image, or replacing the corresponding target regions of the original image with the set of adversarial patches.
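The covering operation can be illustrated with a short Python sketch. It assumes the image and each texture are 2D lists of pixel values of equal size, and that each patch comes with a binary mask marking its target region. These data structures and the name `apply_patches` are assumptions.

```python
def apply_patches(image, patches):
    """Cover the target regions of `image` with patch textures.

    image:   2D list of pixel values (left unmodified).
    patches: list of (mask, texture) pairs, each the same size as `image`;
             mask entries of 1 mark pixels inside the target region.
    """
    attacked = [row[:] for row in image]  # copy so the original is preserved
    for mask, texture in patches:
        for r, mask_row in enumerate(mask):
            for c, inside in enumerate(mask_row):
                if inside:
                    attacked[r][c] = texture[r][c]
    return attacked


original = [[0, 0], [0, 0]]
mask = [[1, 0], [0, 1]]
texture = [[9, 9], [9, 9]]
attacked = apply_patches(original, [(mask, texture)])
```

Pixels outside every mask keep their original values, so only the selected target regions are changed.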
The method 700 may begin at block 710, comprising receiving an input image $x$, a ground-truth label $y$, an initial value of the distribution parameter $\theta_m^{\mathrm{init}}$, and a population size $K$. In one example, the distribution parameter may be the initial parameter of a Bernoulli distribution, and the population size may be configured as 30 or 40. At block 720, method 700 comprises initializing superpixels, i.e., over-segmenting the image $x$ into superpixels, each superpixel corresponding to a region of the image and having a different shape. The over-segmentation may be constrained to a foreground object of the image.
After the segmentation, method 700 may comprise, at block 730, drawing samples of the selection vector $m^{(i)}$ from a search distribution $\pi(m \mid \theta_m)$, where $i = 1, \ldots, K$; calculating the optimal $t^*$ based on $y$ and $m^{(i)}$; evaluating the fitness $\mathcal{F}(m^{(i)}, t^*; y)$ in equation (4); and calculating the log-derivatives $\nabla_{\theta_m} \log \pi(m^{(i)} \mid \theta_m)$.
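The sampling, fitness evaluation, and log-derivative steps can be sketched as a small NES loop in Python. This is an illustration under stated assumptions, not the method of the embodiment: `toy_fitness` is a hypothetical stand-in for the detector-based fitness, the texture optimization is omitted, and the gradient-ascent update of the distribution parameter with learning rate `lr` is an assumed continuation of the steps above.

```python
import math
import random


def g(theta):
    """Map distribution parameters to Bernoulli probabilities in [0, 1]."""
    return [0.5 * (math.tanh(t) + 1.0) for t in theta]


def sample(theta, rng):
    """Draw one selection vector m from Bern(g(theta))."""
    return [1 if rng.random() < p else 0 for p in g(theta)]


def log_derivative(m, theta):
    """d/d theta_j of log pi(m | theta), which equals 2 * (m_j - g(theta_j))."""
    return [2.0 * (mj - pj) for mj, pj in zip(m, g(theta))]


def nes_step(theta, fitness, K, lr, rng):
    """One iteration: sample K vectors, estimate the search gradient, ascend."""
    grad = [0.0] * len(theta)
    for _ in range(K):
        m = sample(theta, rng)
        score = fitness(m)
        for j, d in enumerate(log_derivative(m, theta)):
            grad[j] += score * d / K
    return [t + lr * gj for t, gj in zip(theta, grad)]


def toy_fitness(m, useful=(1, 0, 1, 0), lam=0.5):
    """Hypothetical fitness: gain from 'useful' regions minus an area penalty."""
    gain = sum(mi * ui for mi, ui in zip(m, useful))
    return gain - lam * sum(m)


rng = random.Random(0)
theta = [0.0] * 4  # initial parameter: every region selected with probability 0.5
for _ in range(200):
    theta = nes_step(theta, toy_fitness, K=30, lr=0.1, rng=rng)
probs = g(theta)
```

In this toy setting the selection probabilities of the two regions that raise the fitness drift towards 1, while those of the regions that only add area drift towards 0.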
The various operations, models, and networks described in connection with the disclosure herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. According to an embodiment of the disclosure, a computer program product for generating adversarial patches may comprise processor executable computer code for performing the method 600 and method 700 described above with reference to the figures.
The preceding description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the various embodiments. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the various embodiments. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2021/088875 | 4/22/2021 | WO |