This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2017-207087, filed on Oct. 26, 2017, the entire contents of which are incorporated herein by reference.
The embodiments described herein relate to an image processing system, an image processing method, and an image processing program for generating an attack image.
In recent years, deep learning, in which machine learning is performed using multilayered neural networks, has attracted attention, and its use has been put into practice in various fields such as image recognition, speech recognition, machine translation, control and abnormality detection of industrial robots, processing of medical information, and diagnosis of medical images.
Incidentally, application of image classification models using neural networks is expanding, for example as image recognition for automatic driving, and it is conceivable that a malicious attacker may try to cause such a model to make a wrong output. More specifically, in an automatic driving system, it is common to use in-vehicle camera images to recognize the surrounding situation, and in that case inaccurate recognition causes serious problems; therefore, high precision is required for recognition of pedestrians, vehicles, signals, traffic signs, and the like.
Conventionally, research and development of automatic driving has verified the recognition accuracy of the automatic driving system and the safety of driving in a normal environment where no malicious attacker exists. However, as automatic driving gradually becomes practical in real life, attackers with malicious intent based on mischief, terrorism, and the like may appear. For that reason, a recognition function based on a robust classifier is indispensable for image recognition.
Here, in order to realize a robust classifier capable of accurately recognizing (classifying) an image even when a malicious attacker attempts to cause the model to make a wrong output, two things are needed: a method of adversarial attack that aims at wrong classification by adding arbitrary noise to an image sample, and, against such an attack, a method of defense (defense against adversarial attack) that generates a more generic and robust classifier. This has become a hot research topic.
For example, as in network security research, research on attack methods and research on defensive methods to prevent such attacks form a pair. In other words, devising a more powerful attack method enables study, research, and development of countermeasures before a malicious person or organization executes such an attack; attacks can therefore be prevented beforehand, which has great social significance.
As described above, a classifier that performs image recognition for automatic driving, for example, is required to be more versatile and robust against attacks conducted by malicious persons and organizations. Generating such a versatile and robust classifier is inextricably linked to an attack that adds arbitrary noise to an image sample and causes wrong classification, so proposing a more powerful attack method is desired.
It should be noted that the stronger attack methods required to generate classifiers that are versatile and robust against various attacks are not limited to the generation of classifiers for image recognition in automatic driving (image classifiers, image classification models, and image classification networks), and are also applicable to the generation of classifiers used in various other fields.
In general, an attack image is generated by adding certain noise to a given actual image. For example, changing predetermined pixels in the actual image or pasting a patch onto the actual image can also be considered a kind of noise. However, such an approach does not produce an attack image that reliably has an attack effect against an arbitrary actual image, and it is not satisfactory as an attack method that adds arbitrary noise to the actual image and causes wrong classification.
The present embodiments have been made in view of the above-mentioned problems, and it is an object of the present embodiments to provide an image processing system, an image processing method, and an image processing program relating to an attack method that adds arbitrary noise to an image sample to cause wrong classification, in order to enable generation of a more versatile and robust classifier.
According to an aspect of the present embodiments, there is provided an image processing system for generating an attack image, including an attack network and a plurality of image classification networks serving as attack targets, each having different characteristics. The attack network is configured to generate the attack image by performing forward processing on a given image.
Each of the image classification networks is configured to classify the attack image by performing forward processing on the attack image, and to calculate gradients that make the classification result inaccurate by performing backward processing. The attack network is configured to perform learning by using the gradients calculated by the plurality of image classification networks.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
The present invention will be understood more clearly by referring to the following accompanying drawings.
First, before the embodiments of an image processing system, an image processing method, and an image processing program according to the present embodiments are explained in detail, an example of an image processing system and problems associated therewith will be explained with reference to
In the example of the image processing system as illustrated in
However, the image processing system (image processing method, attack method) illustrated in
In another example of the image processing system illustrated in
The attack network 101 directly generates an attack image Ia0 obtained by adding noise to the actual image Ir0, and since the attack network 101 can learn the effective noise generation process itself (machine learning, deep learning), the attack network 101 is considered to have a high degree of versatility. However, the image processing system illustrated in this
Hereinafter, the embodiments of the image processing system, the image processing method, and the image processing program according to the present embodiments will be described in detail with reference to the accompanying drawings. It should be noted that the embodiments described in detail below relate to attacks that aim for incorrect classification by adding arbitrary noise to an image sample (input image, actual image); however, as described above, similarly to the technical field of network security, defense that prevents attacks consists of devising a more powerful attack method, considering countermeasures against it, and thereby making it possible, through research and development, to prevent attacks before a malicious person or organization executes such an attack.
Also, considering the intent of the attacker, specifying a category into which the image is to be misclassified (erroneous classification) makes it possible to perform a more destructive attack with a higher degree of severity (targeted adversarial attack). Specifically, in recognition of traffic signs, for example, a major problem may arise in cases such as when a temporary stop sign is erroneously recognized as a sign indicating a maximum speed of 50 km/h.
Embodiments of an image processing system, an image processing method, and an image processing program according to the present embodiments can also be used for attacks in which such a category to be misidentified is specified; this is made possible by calculating the gradient and the noise at the input layer such that the prediction result of the attack target network tilts toward the category to be erroneously determined, as will be described later, and a purely illustrative sketch of this idea is given below.
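The following sketch is offered only as an illustration of the targeted variant, assuming a PyTorch-style classifier; the names `classifier`, `attack_image`, and `target_labels` are hypothetical and do not correspond to elements defined in this specification.

```python
# Illustrative sketch only (assumes PyTorch); not the configuration of the present embodiments.
import torch
import torch.nn.functional as F

def targeted_input_gradient(classifier, attack_image, target_labels):
    """Gradient at the input layer for a targeted (erroneous) classification."""
    attack_image = attack_image.clone().requires_grad_(True)
    loss = F.cross_entropy(classifier(attack_image), target_labels)
    loss.backward()
    # Stepping the input against this gradient (gradient descent on the loss)
    # tilts the prediction toward the attacker-chosen target_labels.
    return attack_image.grad
```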
Reference numeral G10 indicates a gradient calculated by the backward processing Pb10 of the attack network 10, reference numeral G21 indicates a gradient (gradient in the direction of incorrectness) in which the classification result calculated by the backward processing Pb21 of the image classification network 21 becomes inaccurate, and reference numeral G22 indicates a gradient in which the classification result calculated by the backward processing Pb22 of the image classification network 22 becomes inaccurate.
As illustrated in
In
Here, in the image processing system of the first embodiment, the classification devices selected (set) as the image classification networks 21, 22, . . . serving as the attack targets can be determined, for example, based on the classification device that is actually known or predictable, in the case where such a classification device is known or can be predicted. Since the image processing system according to the first embodiment simultaneously gives the attack image Ia to the plurality of image classification networks 21, 22, . . . to train the attack network 10, the image processing system according to the first embodiment can be executed efficiently in a multi-computer environment.
The attack network 10 includes forward processing Pf10 that receives an actual image Ir and generates an attack image Ia, and backward processing Pb10 that calculates a gradient G10 based on the attack image Ia. Here, the attack image Ia is generated by using, for example, the gradients (gradients in which the classification result becomes inaccurate) G21, G22, . . . calculated by the backward processing Pb21, Pb22, . . . of the plurality of image classification networks 21, 22, . . . , so as to be an image that is likely to induce incorrect determination by the plurality of image classification networks 21, 22, . . . . That is, the attack network 10 learns based on the gradient (G10+G21+G22+ . . . ) obtained by adding the gradients G21, G22, . . . to the gradient G10 to generate the attack image Ia.
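For reference, a minimal sketch of this learning scheme is shown below, assuming PyTorch; the small placeholder networks, the negated cross-entropy loss, and the hyperparameters are illustrative assumptions standing in for the attack network 10 and the image classification networks 21, 22, and are not the configuration of the present embodiments.

```python
# Minimal illustrative sketch (assumes PyTorch); placeholder networks only.
import torch
import torch.nn as nn
import torch.nn.functional as F

attack_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                           nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())
clf_a = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
clf_b = nn.Sequential(nn.Conv2d(3, 8, 5, padding=2), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
optimizer = torch.optim.Adam(attack_net.parameters(), lr=1e-4)

def training_step(real_images, true_labels, epsilon=8.0 / 255.0):
    optimizer.zero_grad()
    noise = attack_net(real_images) * epsilon               # forward processing Pf10
    attack_images = torch.clamp(real_images + noise, 0.0, 1.0)

    loss = torch.zeros(())
    for clf in (clf_a, clf_b):                               # attack targets 21, 22, ...
        logits = clf(attack_images)                          # forward processing Pf21, Pf22
        # Maximizing each classifier's loss corresponds to the gradients G21, G22
        # "in the direction of incorrectness"; negation turns it into a minimization.
        loss = loss - F.cross_entropy(logits, true_labels)

    # Backpropagation accumulates the summed gradients (G21 + G22 + ...) through
    # Pb21, Pb22, and Pb10 into the attack network's parameters.
    loss.backward()
    optimizer.step()
```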
As described above, the image processing system (image processing method, image processing program) according to the first embodiment sets the plurality of image classification networks 21, 22, . . . as the attack targets of the single attack network 10, and accordingly, the attack network 10 learns (machine learning, deep learning) so that the loss function of all the image classification networks 21, 22, . . . becomes worse. Thus, for example, an attack image Ia can be generated that suppresses overfitting to one model (one image classification network) and strongly induces incorrect determination even on another image classification network that was not used as an attack target. As described above, according to the first embodiment, an attack network 10 that is more accurate than existing methods can be constructed efficiently, particularly in a multi-computer environment.
Next, the second embodiment of the image processing system according to the present embodiments will be explained, which is to let the attack network 10 in
For example, in the convolution part of the attack network (e.g., CNN) 10, the known image processing system outputs noise of 3 (RGB) × image size, which is multiplied by ε (for example, 4 times if ε=4, or 16 times if ε=16) and added to the image (actual image Ir); however, this cannot properly generate noise that cancels the texture and cannot produce a satisfactory attack image Ia. Therefore, in order to be able to generate noise of different scales at the same time, that is, to generate noise of (13 channels: the possible values of the scale ε, each integer value from 4 to 16) × 3 × image size from one attack network 10, the image processing system according to the second embodiment is configured to learn the scales as separate tasks and produce separate outputs.
For example, 13 channels corresponding to the multiple scales where ε is 4 to 16 are introduced, and noise of 13 × 3 × image size (actual image Ir) is output. In this case, the 13 channels are based on the 13 possible values of ε (each integer value from 4 to 16), i.e., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and 16. Then, the attack network 10 generates an attack image Ia using the noise corresponding to an externally given scale ε, for example.
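As one possible reading of this multi-scale output, the following sketch assumes a PyTorch convolutional output layer that emits 13 × 3 channels of noise in a single pass and selects the channels for an externally given ε; the layer sizes and the 0 to 255 pixel range are illustrative assumptions, not the claimed configuration.

```python
# Illustrative sketch only (assumes PyTorch); layer sizes are hypothetical.
import torch
import torch.nn as nn

EPS_VALUES = list(range(4, 17))                 # the 13 possible scales, 4 .. 16

# Hypothetical final layer of the attack network: 13 x 3 noise channels at once.
noise_head = nn.Conv2d(64, len(EPS_VALUES) * 3, kernel_size=3, padding=1)

def attack_image_for_scale(real_image, features, epsilon):
    """real_image: (B, 3, H, W) in 0..255; features: (B, 64, H, W) from the attack network body."""
    raw = torch.tanh(noise_head(features))                         # (B, 13*3, H, W), values in -1..1
    raw = raw.view(raw.size(0), len(EPS_VALUES), 3, *raw.shape[-2:])
    noise = raw[:, EPS_VALUES.index(epsilon)] * epsilon            # channels for the given scale
    return torch.clamp(real_image + noise, 0.0, 255.0)
```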
Thus, according to the image processing system of the second embodiment, the attack network 10 can generate the attack image Ia corresponding to the scale ε at high speed. In other words, according to the image processing system of the second embodiment, an attack image Ia in which the scale ε is flexibly set can be generated in a short time. When the classification device on the defending side (the image classification network of the attack target) is considered, the attack network 10 can generate various attack images Ia with different scales ε at high speed and give the attack images Ia to the image classification network. Therefore, a more versatile and robust classification device (image classification network of the attack target) can be generated in a short period of time.
More specifically, as illustrated in
Here, a neural network includes an input layer, an intermediate layer, and an output layer. However, in a case where the intermediate layers of the image classification network 20 and the intermediate layers of the attack network 10 do not correspond directly, for example, processing is appropriately performed so that the data Df21, Df22, Db21, Db22 obtained from the image classification network 20 can be used by the attack network 10. More specifically, when the number of intermediate layers of the attack network 10 is much larger than the number of intermediate layers of the image classification network 20, for example, the data Df21, Df22, Db21, Db22 obtained from the intermediate layers of the image classification network 20 are given to the attack network 10 for each of the plurality of intermediate layers. Thus, according to the image processing system of the third embodiment, an attack image Ia that makes the classification result even more inaccurate can be generated.
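By way of illustration, intermediate-layer activations such as Df21 and Df22 can be extracted from an attack-target classifier with forward hooks in a framework such as PyTorch; in the following sketch, the small classifier and the chosen layer indices are assumptions, and the gradients Db21, Db22 could be captured analogously with backward hooks.

```python
# Illustrative sketch only (assumes PyTorch); the classifier and layer choices are hypothetical.
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

intermediate = {}

def save_output(name):
    def hook(module, inputs, output):
        intermediate[name] = output             # activation data such as Df21, Df22
    return hook

# Hooks on two intermediate layers of the attack-target classifier.
classifier[1].register_forward_hook(save_output("Df21"))
classifier[3].register_forward_hook(save_output("Df22"))

attack_images = torch.rand(2, 3, 32, 32)
logits = classifier(attack_images)              # forward processing Pf20 fills `intermediate`
# The stored activations can then be given to the attack network as additional inputs;
# Db21, Db22 could be collected with register_full_backward_hook during the backward pass Pb20.
```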
Although the image processing system according to the fifth embodiment has the same configuration as the image processing system according to the fourth embodiment described above, one of the attack images Ia1, Ia2, . . . respectively generated by the image processing units U1, U2, . . . is adopted as the final attack image Ia. The image processing system according to the fifth embodiment adopts the final attack image on the basis of how the image classification network of the actual attack target reacts to the plurality of attack image candidates Ia1, Ia2, . . . respectively generated by the plurality of image processing units U1, U2, . . . , and more specifically by confirming the classification accuracy of that image classification network.
It should be noted that the usage of the gradients G201, G202 obtained by the backward processing Pb20 of each image classification network 20, and the usage of the data Df21, Df22, Db21, Db22 of the activations and gradients in the intermediate layers obtained by the forward processing Pf20 and the backward processing Pb20 of the image classification network 20, are similar to those explained with reference to
Here, in the image processing system of the fourth embodiment and the fifth embodiment shown in
When implementing the image processing system according to the present embodiments, it is preferable to use, for example, a computer equipped with a GPGPU (General-Purpose computing on Graphics Processing Units, or GPU) capable of executing parallel data processing at high speed. Of course, it is also possible to use a computer equipped with an accelerator based on an FPGA (Field-Programmable Gate Array) for accelerating parallel data processing, or a computer to which a special processor dedicated to neural network processing is applied; both such computers can perform parallel data processing at high speed. When using such a computer to implement the image processing system according to the present embodiments, the processing of the image classification network is lighter than the processing of the attack network (10) (the load is smaller and the processing time is shorter); therefore, resources can be used effectively when multiple sets of processing are performed at the same time.
More specifically, when implementing the image processing system according to the present embodiments, both the attack network (10) and the attack target networks (21 to 24) consume large amounts of GPU (GPGPU) memory as a precondition; therefore, it is difficult to perform multi-target attacks and the like using a single GPU or a single computer, and an environment using multiple computers (a multi-computer environment) is required. Therefore, it is preferable that each attack target network of the multiple targets is assigned to a different computer, the attack images generated by the computers are shared, and the number of images processed at one time (batch size) is changed between when they are used by the attack network and when they are used by the attack target networks.
As illustrated in
More specifically, in the image processing system according to the sixth embodiment, the image classification networks 21 to 24 in the workers W1 to W4 simultaneously receive and process four different attack images Ia11 to Ia41 from the four workers W1 to W4. By parallelizing in this manner, learning efficiency can be improved.
Next, in the image processing system of the seventh embodiment, attention is given to the fact that parallelism in the data direction and parallelism in the model direction are independent, i.e., the learning of the different image classification networks (21 to 24) is independent of the processing of the different input images (actual images Ir11 to Ir41), whereby efficiency is further improved.
In the image processing system according to the sixth embodiment described above, the four attack images (attack image candidates) Ia11 to Ia41 generated by the respective workers W1 to W4 are collected and commonly given to (shared by) the image classification networks 21 to 24 of the four workers W1 to W4. In contrast, in the image processing system according to the seventh embodiment, five images are given as actual images to each of the workers W1 to W4. More specifically, in addition to the actual image Ir11 of a panda, for example, the actual image Ir12 of a tiger, the actual image Ir13 of a mouse, the actual image Ir14 of a squirrel, and the actual image Ir15 of a cat are given to the attack network of the worker W1, and processing is performed in parallel. Likewise, five actual images Ir21 to Ir25, Ir31 to Ir35, and Ir41 to Ir45 are given to the workers W2, W3, and W4, respectively, and processing is performed in parallel. More specifically, the attack network of each of the workers W1 to W4 receives five actual images, performs forward processing, and outputs five attack images (batch size 5).
As a result, the attack images generated by the workers W1, W2, W3, and W4 are five attack images each, i.e., Ia11 to Ia15, Ia21 to Ia25, Ia31 to Ia35, and Ia41 to Ia45, and the image classification networks 21 to 24 of the workers W1 to W4 process 5×4=20 attack images (Ia11 to Ia15, Ia21 to Ia25, Ia31 to Ia35, and Ia41 to Ia45) (allgather). More specifically, each of the image classification networks 21 to 24 of the workers W1 to W4 receives 20 images and performs the forward processing and the backward processing (batch size 20). Further, the gradients are reduce-scattered and given back to the workers W1 to W4 for their respective attack image candidates (attack images), and the backward processing is performed in each attack network (batch size 5).
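As a rough sketch of this communication pattern (allgather of attack images, reduce-scatter of gradients, and the change of batch size from 5 to 20 and back), the following code assumes PyTorch's torch.distributed initialized with four workers; the placeholder loss and all names are illustrative assumptions rather than the procedure of the present embodiments.

```python
# Illustrative sketch only (assumes torch.distributed is initialized with 4 workers).
import torch
import torch.distributed as dist

def worker_step(attack_net, classifier, real_images, world_size=4):
    """real_images: this worker's local batch of 5 actual images."""
    attack_images = attack_net(real_images)                        # attack network, batch size 5

    # Allgather: collect every worker's attack images into one batch of 20.
    gathered = [torch.empty_like(attack_images) for _ in range(world_size)]
    dist.all_gather(gathered, attack_images.detach())
    big_batch = torch.cat(gathered, dim=0).requires_grad_(True)    # classifier batch size 20

    logits = classifier(big_batch)                                 # forward processing Pf20
    loss = -logits.logsumexp(dim=1).mean()                         # placeholder adversarial loss
    loss.backward()                                                # Pb20 fills big_batch.grad

    # Reduce-scatter: sum the input gradients across workers and hand each worker
    # back only the slice for its own 5 attack images.
    local_grad = torch.empty_like(attack_images)
    chunks = [g.contiguous() for g in big_batch.grad.chunk(world_size, dim=0)]
    dist.reduce_scatter(local_grad, chunks)

    # Backward through the local attack network again with batch size 5.
    attack_images.backward(local_grad)
```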
As described above, the image processing system according to the seventh embodiment behaves such that communication is performed in the middle of computation and the batch size changes. This is because the attack network is quite large, so its batch size must be kept rather small, whereas the attack target network is somewhat smaller, so its batch size can be increased (and processing is not efficient unless the batch size is increased). As described above, according to the image processing systems of the sixth embodiment and the seventh embodiment, high-speed and efficient processing can be realized. It should be noted that the number of workers in
It should be understood that the image processing system according to each embodiment described above can be provided as an image processing program or an image processing method for a computer capable of the above-described high-speed parallel data processing, for example.
Note that the image processing system, the image processing method, and the image processing program according to the present embodiments are not limited to application on the attacking side, which generates the attack image; for example, since the attack target network can be improved by using the output of the attack network, the image processing system, the image processing method, and the image processing program according to the present embodiments can also be applied to the defending side.
According to the image processing system, the image processing method, and the image processing program of the present embodiments, the proposal of an attack method that adds arbitrary noise to an image sample to cause incorrect classification achieves the effect of enabling generation of a more generic and robust classification device.
It should be understood that one or a plurality of processors may realize functions of the image processing system or the image processing unit described above by reading and executing a program stored in one or a plurality of memories.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.