This application is based upon and claims the benefit of priority of the prior Israel Patent Application No. 310277, filed on Jan. 21, 2024, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a computer-readable recording medium, an information processing method, and an information processing device that use image AI for object detection and the like.
Adversarial patch attacks against image artificial intelligence (AI) for object detection and the like cause false recognition of objects to be detected, for example, in fraud detection at self-checkouts in the retail domain and license plate recognition in the public security area. Adversarial patches are a variant of adversarial example attacks and are also called adversarial example patches.
To detect adversarial patch attacks, for example, a class output by object detection for an input image subjected to image processing is compared with a class output by object detection for an input image without image processing, and, if there is a mismatch, it is determined that a patch attack has occurred. Here, examples of the image processing include processing of adding missing straight lines with a randomly set width and spacing, and processing of adding a missing square with a randomly set size and position to the input image.
According to an aspect of an embodiment, a non-transitory computer-readable recording medium stores therein an information processing program that causes a computer to execute a process including: acquiring a first value representing a size of a region of an object included in a first image; determining a missing rate for an adversarial patch, based on a second value representing a minimum size of the adversarial patch acquired according to the first value; generating a second image in which missingness exceeding the missing rate is added to the first image; and comparing a first detection result obtained by inputting the first image into an object detection model with a second detection result obtained by inputting the second image into the object detection model. This configuration can improve detection accuracy against adversarial patch attacks.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
However, when missingness with a random size and position is added, for example, the missingness may fail to cover a patch; in that case, it neither reduces the effectiveness of the patch nor allows the patch attack to be detected, so that fraudulent use of self-checkouts, license plates, and the like may be overlooked.
Preferred embodiments of the present invention will be explained with reference to accompanying drawings. The present embodiment is not intended to be limited by the examples. The examples may be combined as appropriate as long as there is no contradiction.
First, conventional techniques for detecting adversarial patch attacks and their drawbacks will be described. An adversarial patch attack is, for example, an attack that causes image AI (for example, an object detection model) to falsely recognize an object to be detected when the object is detected from an image. Existing object detection algorithms include YOLO (You Only Look Once) and Faster R-CNN (Faster Region-based Convolutional Neural Networks), for example.
Such an adversarial patch attack using the adversarial patch 99 poses a significant threat to image AI such as object detection AI.
The patch detection technique illustrated in
The patch detection technique as illustrated in
Although
However, even such a patch detection technique using missingness addition, which is more effective against adversarial patch attacks, has a drawback.
One of the objects in the present embodiment is therefore to improve the detection accuracy against adversarial patch attacks by performing image processing so that missingness covers the adversarial patch 99.
In the present embodiment, for example, as countermeasures against predicting a missing position and circumventing missingness addition to the adversarial patch 99, (2) the orientation and offset (for example, the number of pixels from a predetermined reference position) of the missingness are randomly set, as illustrated in
In the present embodiment, for example, as illustrated in
Thus, as illustrated in
An information processing system for implementing the present embodiment will now be described.
The network 50 may be either wired or wireless, and, for example, a variety of communication networks such as the Internet or an intranet can be employed. The network 50 is not necessarily a single network and, for example, may be configured with an intranet and the Internet through a network device such as a gateway or other devices (not illustrated).
The information processing device 10 is, for example, an information processing device such as a desktop personal computer (PC), a notebook PC, or a server computer that is installed inside a facility to be monitored and used by facility staff or administrators. The inside of a facility may include outdoor areas as well as indoor areas.
The information processing device 10 receives, for example, a video taken by the camera device 100 from the camera device 100 in order to detect an adversarial patch attack. Strictly speaking, the video taken by the camera device 100 is a plurality of images taken by the camera device 100, that is, a series of frames of moving images.
The information processing device 10 detects objects such as products and license plates from images taken by the camera device 100, for example, using image AI such as an existing object detection model. In detecting objects such as products from the taken images, for example, as illustrated in
The information processing device 10 acquires, for example, a first value representing the size of the region of an object included in a first image taken by the camera device 100. The information processing device 10 also determines, for example, a missing rate for the adversarial patch, based on a second value representing the minimum size of the adversarial patch 99 acquired according to the first value. The information processing device 10 generates, for example, a second image in which missingness exceeding the determined missing rate is added to the first image. The information processing device 10 compares, for example, a first detection result obtained by inputting the first image into the object detection model with a second detection result obtained by inputting the second image into the object detection model. For example, if the comparison between the first detection result and the second detection result indicates a mismatch, the information processing device 10 determines that the adversarial patch 99 has been used to cause false recognition of an object, detects the adversarial patch 99, and issues an alert or the like.
In
The camera device 100 is, for example, a surveillance camera installed inside a facility to be monitored. As illustrated in
Devices other than the devices illustrated in
A functional configuration of the information processing device 10 will now be described.
The communication unit 20 is a processing unit that controls communication with other information processing devices such as the camera device 100, for example, a communication interface such as a network interface card or a universal serial bus (USB) interface.
The storage unit 30 has a function of storing various data and computer programs to be executed by the control unit 40 and is implemented, for example, by a memory, a hard disk, or other storage device. The storage unit 30 stores, for example, image information 31, model information 32, and adversarial patch information 33.
The image information 31 stores, for example, an image taken by the camera device 100. The image stored in the image information 31 is an image taken by the camera device 100 and transmitted to the information processing device 10. The image information 31 may also store, for example, an identifier to uniquely identify the camera device 100 that took the image, and the date and time of taking the image. The image information 31 may store, for example, a processed image after image processing such as object detection, missingness addition, image segmentation and edge extraction, which will be described later, on the image taken by the camera device 100.
The model information 32 stores, for example, information about an object detection model, which is a machine learning model for detecting an object from an image taken by the camera device 100, and model parameters for constructing the model. The machine learning model is generated by machine learning using, for example, videos taken by the camera device 100, that is, the taken images as input data, as well as a region including an object, a class indicating what the object is, and the confidence level of the class as correct labels. The region including an object may be, for example, a bounding box surrounding such a region with a rectangle on the taken image. The model information 32 stores, for example, information about a segmentation model described later and model parameters for constructing the model.
The adversarial patch information 33 stores, for example, information about the adversarial patch 99. The adversarial patch 99, for example, has a certain size relative to the size of an object to be falsely recognized in order to exert the effect on the object. In other words, the minimum size of the adversarial patch 99 can be predefined for each object. The adversarial patch information 33 therefore may store, for example, the minimum size of the adversarial patch 99 for each object such as a product, and a missing rate for the minimum size. The missing rate is, for example, a value indicating how much missingness in the adversarial patch 99 can reduce the impact of the adversarial patch 99.
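As a concrete illustration, the adversarial patch information 33 might be held as a simple per-object lookup structure such as the following sketch; the object classes, sizes, and rates below are assumptions for illustration only, not values prescribed by the embodiment.

```python
# Hypothetical layout of the adversarial patch information 33: for each
# object class, the minimum patch size and the missing rate that suffices
# to reduce the impact of a patch of that size. All values are assumed.
ADVERSARIAL_PATCH_INFO = {
    "product":       {"min_patch_size": (50, 50), "missing_rate": 0.20},
    "license_plate": {"min_patch_size": (30, 30), "missing_rate": 0.25},
}

def lookup_patch_info(object_class):
    # return (minimum patch size, missing rate) stored for the class
    entry = ADVERSARIAL_PATCH_INFO[object_class]
    return entry["min_patch_size"], entry["missing_rate"]
```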
The above information stored in the storage unit 30 is only an example, and the storage unit 30 can store various other information in addition to the above information.
The control unit 40 is a processing unit that controls the entire information processing device 10, for example, a processor. The control unit 40 includes an acquisition unit 41, a determination unit 42, a generation unit 43, and a detection unit 44. Each processing unit is an example of an electronic circuit that the processor has or a process that the processor executes.
The acquisition unit 41 acquires, for example, the first value representing the size of the region of an object included in the first image taken by the camera device 100. The acquisition unit 41 also acquires, for example, the position of the object in the first image. The first value representing the size of the region of the object may be, for example, “200×300” indicating the width×height of pixels in the first image. The position of the object in the first image may be, for example, “(120, 130)” indicating the coordinates of the center of the object in the first image. For example, when a plurality of objects are included in the first image, the first value and the position are acquired for each object. The size of the region of the object may be, for example, the size of the bounding box surrounding the object region obtained from the segmentation model.
Processing using the segmentation model is a kind of image recognition processing based on deep learning, which is called, for example, image segmentation or object segmentation. The processing using the segmentation model divides the region for each object in the input image and recognizes the kind of the object. Examples of the processing using the segmentation model include semantic segmentation, instance segmentation, and panoptic segmentation. Semantic segmentation is, for example, a technique that labels each pixel in an image and is good at extracting irregular shapes such as the sky and roads. Instance segmentation is, for example, a technique that divides a region for each object according to an object class and is good at extracting cars, people, and the like. Panoptic segmentation is, for example, a technique that combines semantic segmentation and instance segmentation.
The acquisition unit 41 acquires the edges of an object in the first image taken by the camera device 100, for example, using an existing edge extraction algorithm such as a Canny filter. Data obtained by edge extraction is, for example, a value indicating the intensity of the edge at each pixel in the image. The acquisition unit 41 also acquires an edge, that is, edge intensity, for example, using an existing edge extraction algorithm, from an image taken in advance by the camera device 100 in an initial state in which no objects are seen (only the background; hereinafter referred to as the "initial image"). The initial image and the processed image obtained by performing edge extraction on the initial image may be stored in advance, for example, in the image information 31.
The determination unit 42 determines the missing rate for the adversarial patch 99, for example, based on the second value representing the minimum size of the adversarial patch 99 that is acquired according to the first value acquired by the acquisition unit 41. The adversarial patch 99, for example, has a certain size relative to the size of an object to be falsely recognized in order to exert the effect on the object. The minimum size of the adversarial patch 99 therefore can be predefined according to the size of the object. Thus, the determination unit 42, for example, acquires the minimum size of the adversarial patch 99 stored in advance for each object from the adversarial patch information 33, and determines the missing rate for the adversarial patch 99 based on the minimum size. For example, when the first value representing the size of the region of an object included in the first image is “200×300” in width×height, the second value may be “50×50” in width×height, using 50, which is 25 percent of the smaller value (200 in width) of the size. The determination unit 42 then determines, for example, the missing rate predefined for the minimum size of the adversarial patch 99, that is, the second value. The missing rate is, for example, a value (for example, 20%) indicating how much missingness in the adversarial patch 99 can reduce the impact of the adversarial patch 99. The missing rate is associated with the minimum size of the adversarial patch 99 and stored in advance in the adversarial patch information 33.
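The determination described above can be sketched as follows. The 25 percent ratio and the 200×300 example follow this paragraph, while the stored rate association is an illustrative assumption standing in for the adversarial patch information 33.

```python
def min_patch_size(obj_w, obj_h, ratio=0.25):
    # assumed: the minimum effective patch side is 25 percent of the
    # object's smaller dimension, as in the 200x300 -> 50x50 example
    side = round(min(obj_w, obj_h) * ratio)
    return (side, side)

def determine_missing_rate(patch_size, stored_rates):
    # stored_rates maps a minimum patch size to its predefined missing
    # rate, as held in the adversarial patch information 33
    return stored_rates[patch_size]

stored_rates = {(50, 50): 0.20}            # illustrative association
second_value = min_patch_size(200, 300)    # -> (50, 50)
rate = determine_missing_rate(second_value, stored_rates)  # -> 0.20
```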
Returning to the description with reference to
As indicated by step (2) in
The generation unit 43 may add, for example, missingness having a predetermined shape with a predetermined spacing, a predetermined orientation, and a predetermined offset, and at the position according to the spacing. The predetermined spacing, predetermined orientation, and predetermined offset as well as the position according to the spacing may be set randomly within a predetermined range. The orientation of the missingness may be, for example, the angle of the missingness relative to the horizontal direction, such as 0°, 45°, 90°, or 135°. The offset may be a value greater than 0 and equal to or smaller than the set predetermined spacing.
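A minimal sketch of generating such missingness as a mask of missing lines follows, with the spacing, orientation (restricted here to 0° and 90° for simplicity), and offset set randomly; the spacing range is an assumed example, and the constraint 0 &lt; offset ≤ spacing follows this paragraph.

```python
import random

def line_mask(width, height, spacing_range=(5, 15)):
    # spacing, offset, and orientation are set randomly within assumed
    # ranges; offset is greater than 0 and at most the chosen spacing
    spacing = random.randint(*spacing_range)
    offset = random.randint(1, spacing)
    vertical = random.choice([True, False])  # 90 deg or 0 deg orientation
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            pos = x if vertical else y
            if (pos - offset) % spacing == 0:
                mask[y][x] = True            # this pixel will be blanked
    return mask
```

Because the spacing and offset change on every call, an attacker cannot predict where the missing lines will fall.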
As indicated by step (3) in
The generation unit 43 may add missingness, for example, outside a predetermined range from the edges of the corresponding object acquired by the acquisition unit 41, in the first image taken by the camera device 100. More specifically, for example, the generation unit 43 calculates the difference between the edge intensity of the initial image and the edge intensity of the first image (hereinafter referred to as "differential edge") for each pixel. The generation unit 43 then adds missingness to the first image, for example, while avoiding a section where the differential edge is equal to or greater than a predetermined intensity (optionally with a certain surrounding range). For example, when the section where pixels with a nonzero differential edge overlap the missingness reaches a certain number of pixels, or when multiple shapes of missingness, multiple widths or other sizes of missingness, and/or a plurality of pieces of missingness are used, the generation and addition of missingness may be redone from the setting of the spacing, position, and the like. In this way, the generation unit 43 can add, for example, missingness with fewer interruptions while minimizing the section where the edges of the object and the vicinity thereof overlap the missingness. Alternatively, missingness may be added using only the edge intensity of the first image, instead of the differential edge, while avoiding the edges of the object and the vicinity thereof in the first image (outside a predetermined range from the edges of the object). When a plurality of objects are included in the first image taken by the camera device 100, the generation unit 43 adds missingness for each object.
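The differential-edge avoidance described above can be sketched as follows, using plain nested lists as edge-intensity maps; the intensity threshold and the fill value for blanked pixels are illustrative assumptions.

```python
def differential_edge(initial_edges, current_edges):
    # per-pixel difference between the first image's edge intensity and
    # the background-only initial image's edge intensity
    return [[abs(c - i) for i, c in zip(irow, crow)]
            for irow, crow in zip(initial_edges, current_edges)]

def add_missingness(image, mask, diff_edge, threshold=10, fill=0):
    # blank the masked pixels, but skip any pixel whose differential edge
    # is at or above the threshold so the object's edges survive; the
    # threshold and fill values are assumptions for illustration
    out = [row[:] for row in image]
    for y in range(len(out)):
        for x in range(len(out[y])):
            if mask[y][x] and diff_edge[y][x] < threshold:
                out[y][x] = fill
    return out
```

In this sketch, a missing line is interrupted only where it would cross an object edge, which keeps the major features used for identification intact.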
The generation unit 43 can add missingness, for example, outside the object region in the second image at a predetermined angle, width, spacing, and the like. The predetermined angle, width, and spacing may be, for example, 90°, 1 pixel, and 10 pixels, respectively. The outside of the object region in the second image may be, for example, turned into solid black (that is, all missing), instead of adding partial missingness.
The detection unit 44 compares, for example, the first detection result obtained by inputting the first image taken by the camera device 100 into an object detection model with the second detection result obtained by inputting the second image generated by the generation unit 43 adding missingness into the object detection model. The detection unit 44 then, for example, issues an alert if the comparison between the first detection result and the second detection result indicates a mismatch. For example, the detection unit 44 may issue an alert if the output class does not match in at least a certain number of second images in the detection results obtained by inputting each of a plurality of second images generated by adding missingness with gradually changing missing rates into the object detection model.
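The comparison performed by the detection unit 44 can be sketched as follows; here `model` is a stand-in for the object detection model (returning the detected class), and the mismatch-count threshold over multiple missingness-added images is an assumption following the paragraph above.

```python
def patch_attack_detected(model, first_image, second_images, min_mismatches=1):
    # compare the class detected on the original first image with the
    # classes detected on each missingness-added second image; flag an
    # attack when at least min_mismatches of them disagree
    baseline = model(first_image)
    mismatches = sum(1 for img in second_images if model(img) != baseline)
    return mismatches >= min_mismatches
```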
Referring now to
First, as illustrated in
Next, the information processing device 10 monitors, for example, an input image queue (step S102). The input image queue may be, for example, a region in which an image taken by the camera device 100 and transmitted from the camera device 100 in a timely manner is stored as an input image in order to perform fraud detection on the image.
Next, the information processing device 10 determines, for example, whether there is an input image in the input image queue (step S103). If there is no input image in the input image queue (No at step S103), the process returns to step S102.
On the other hand, if there is an input image in the input image queue (Yes at step S103), the information processing device 10 detects an object from the input image, for example, using the segmentation model (step S104). If a plurality of objects are detected from the input image at step S104, the processing at subsequent steps S105 to S107 is performed for each object.
Next, the information processing device 10 refers to, for example, the adversarial patch information 33 for the minimum size of the adversarial patch 99 according to the size of the region of the object detected at step S104 (step S105).
Next, the information processing device 10 determines, for example, the missing rate for the adversarial patch 99, based on the minimum size of the adversarial patch 99 referred to at step S105 (step S106).
Next, the information processing device 10 generates a missingness-added image, for example, by adding missingness exceeding the missing rate determined at step S106 to the input image (step S107). A more detailed flow of the missingness adding process at step S107 will be described later with reference to
Next, the information processing device 10 determines, for example, whether there is an object detected from the input image at step S104 that has not yet been processed at steps S105 to S107 (step S108). If there is an unprocessed object (Yes at step S108), the process returns to step S105 and the information processing device 10 repeats steps S105 to S108, for example, until there are no more unprocessed objects.
On the other hand, if there is no unprocessed object (No at step S108), the information processing device 10, for example, adds partial or global missingness outside the object region in the missingness-added image generated at step S107 (step S109).
Next, the information processing device 10 compares, for example, the detection result obtained by inputting the input image into an object detection model with the detection result obtained by inputting the missingness-added image into the object detection model and, if the comparison indicates a mismatch, detects the adversarial patch 99 (step S110). After execution of step S110, the patch detection process illustrated in
Referring now to
First, as illustrated in
Next, the information processing device 10 randomly sets, for example, the angle and offset of the missing lines, within a predetermined range (step S202).
Next, the information processing device 10 extracts, for example, the edges of the input image (step S203).
Next, the information processing device 10 calculates, for example, the differential edge for each pixel between the edge intensity of the initial image acquired at step S101 in the patch detection process illustrated in
Next, the information processing device 10 adds, for example, missing lines to the input image while avoiding the differential edge calculated at step S204 and the vicinity thereof (step S205). After execution of step S205, the missingness adding process illustrated in
As described above, the information processing device 10 acquires the first value representing the size of the region of an object included in the first image, determines the missing rate for the adversarial patch 99, based on the second value representing the minimum size of the adversarial patch 99 acquired according to the first value, generates the second image in which missingness exceeding the missing rate is added to the first image, and compares the first detection result obtained by inputting the first image into an object detection model with the second detection result obtained by inputting the second image into the object detection model.
In this way, the information processing device 10 generates the second image by adding, to the first image, missingness exceeding the missing rate for the minimum patch size according to the first value indicating the size of the object detected from the first image, and performs object detection from each of the first image and the second image. The information processing device 10 then compares the object detection results between the first image and the second image, thereby improving detection accuracy against adversarial patch attacks.
The process of generating the second image that is performed by the information processing device 10 includes a process of generating the second image in which missingness is added inside the region in the first image.
With this configuration, the information processing device 10 can improve detection accuracy against adversarial patch attacks.
The process of generating the second image that is performed by the information processing device 10 includes a process of generating the second image in which a predetermined shape is added as missingness with a predetermined spacing, a predetermined orientation, and a predetermined offset, and at a position according to the spacing.
With this configuration, the information processing device 10 can improve detection accuracy against adversarial patch attacks.
The information processing device 10 randomly sets the predetermined spacing, the predetermined orientation, the predetermined offset, and the position, within a predetermined range.
With this configuration, the information processing device 10 can take countermeasures against predicting a missing position and circumventing missingness addition to the adversarial patch 99, thereby improving detection accuracy against adversarial patch attacks.
The information processing device 10 acquires the edges of an object in the first image, and the process of generating the second image that is performed by the information processing device 10 includes a process of generating the second image in which missingness is added outside a predetermined range from the edges in the first image.
With this configuration, the information processing device 10 can remove the missingness around the periphery of the edges serving as major features for identifying an object, thereby improving detection accuracy against adversarial patch attacks.
The information processing device 10 issues an alert when the comparison between the first detection result and the second detection result indicates a mismatch.
With this configuration, the information processing device 10 can determine that the adversarial patch 99 has been used to cause false recognition of an object and can issue an alert indicating that the adversarial patch 99 has been detected.
The processing procedures, control procedures, specific names, and information including various data and parameters referred to in the description and drawings may be changed as desired unless otherwise specified. The specific examples, distributions, numerical values, and the like described in examples are only by way of example and may be changed as desired.
The specific forms of distribution and integration of the components of each device are not limited to those depicted in the drawings. In other words, all or some of the components may be functionally or physically distributed or integrated in arbitrary units, depending on various loads and use conditions.
Furthermore, the processing functions of each device can be entirely or partially implemented by a central processing unit (CPU) and a computer program analyzed and executed by the CPU, or by hardware using wired logic.
The communication interface 10a is, for example, a network interface card that communicates with other servers. The HDD 10b stores computer programs and DBs for operating the functions illustrated in
The processor 10d is a hardware circuit that operates a process that executes each of the functions illustrated in
In this way, the information processing device 10 operates as an information processing device that performs an operation control process by reading and executing a computer program for executing the same processing as each of the processing units illustrated in
The computer program for executing the same processing as each of the processing units illustrated in
According to one aspect of an embodiment, the detection accuracy against adversarial patch attacks can be improved.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventors to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 310277 | Jan 2024 | IL | national |