The present invention concerns a method for obtaining an adversarial input signal, a method for using and/or training a classifier, a method for assessing a robustness of said classifier, and a method for operating an actuator, a computer program and a machine-readable storage medium, a classifier, a control system, and a training system.
U.S. Pat. No. 10,007,866 BB describes a method comprising: accessing, from a memory, a neural network image classifier, the neural network image classifier having been trained using a plurality of training images from an input space, the training images being labeled for a plurality of classes; computing a plurality of adversarial images by, for each adversarial image, searching a region in the input space around one of the training images, the region being one in which the neural network is linear, to find an image which is incorrectly classified into the plurality of classes by the neural network; applying the training image to the neural network and observing a response of the neural network; computing a constraint system which represents the input space using the observed response; and further training the neural network image classifier to have improved accuracy using at least the adversarial images.
“Universal Adversarial Perturbations Against Semantic Image Segmentation”, arXiv preprint arXiv:1704.05712v3, Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, Volker Fischer, describe a method for generating adversarial perturbations.
Classifiers, e.g., neural network classification systems, can easily be fooled. It is well known that classifiers based on deep learning may be sensitive to small perturbations. In order to deploy such systems in the physical world, it may be important to provide a proof of the system's robustness.
It is possible to compute robust classifiers with respect to adversarial noise that lies within a small Lp ball. Nonetheless, adversarials with respect to more natural perturbations are not necessarily covered by these robustness statements. More natural perturbations include partial translations, rotations and motion blur. Moving for example a dark object by one pixel will lead to a very large L∞-distance if the background is very bright, but will usually be considered as a small change in the physical world. As a consequence, these small physical changes are not covered by L∞-robustness.
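The contrast between the two distances can be checked numerically. The following is a minimal sketch, assuming a one-dimensional row of pixels whose intensities are normalized to unit mass, and using the closed form of the one-dimensional Wasserstein-1 distance (sum of absolute cumulative differences). The helper names `linf_distance` and `wasserstein_1d` are illustrative and not part of the described method.

```python
import numpy as np

def linf_distance(a, b):
    """Largest absolute per-pixel difference."""
    return float(np.max(np.abs(a - b)))

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two 1-D signals normalized to unit mass.

    Assumes a ground distance of 1 between neighboring pixels.
    """
    p = a / a.sum()
    q = b / b.sum()
    return float(np.sum(np.abs(np.cumsum(p - q))))

# Bright background (1.0) with one dark pixel (0.0), and the same row
# with the dark pixel moved one position to the right.
row = np.ones(11)
row[4] = 0.0
shifted = np.ones(11)
shifted[5] = 0.0

linf = linf_distance(row, shifted)   # maximal: a full-range change at two pixels
w1 = wasserstein_1d(row, shifted)    # small: one tenth of the mass moves one pixel
```

In this toy case the L∞ distance takes its largest possible value of 1.0, whereas the Wasserstein distance is only about 0.1.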
Methods in accordance with example embodiments of the present invention may have the advantage of improving robustness with respect to such perturbations.
In a first aspect, the present invention therefore is concerned with a computer-implemented method for obtaining an adversarial input signal (xadv) to a classifier for classifying input signals (x) to said classifier, wherein said input signals (x) may have been obtained from a sensor, wherein said adversarial input signal (xadv) is obtained from an original input signal (xorg) which may have been obtained from said sensor, and wherein said adversarial input signal (xadv) and said original input signal (xorg) cause the classifier to classify said original input signal (xorg) as belonging to a first class (0) and said adversarial input signal (xadv) as belonging to a second class () different from said first class (0), wherein the method comprises the steps of:
The term “projecting a given point onto a metric ball” may be construed to mean “determining a point from said metric ball that is closest to said given point”.
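To illustrate this definition, the following minimal sketch projects a point onto an L2 ball, for which the closest point is available in closed form. The L2 ball is chosen here only as an assumption for illustration; projecting onto a Wasserstein ball requires the optimization described further below. The function name `project_onto_l2_ball` is hypothetical.

```python
import numpy as np

def project_onto_l2_ball(point, center, radius):
    """Return the point of the closed L2 ball that is closest to `point`."""
    offset = point - center
    norm = np.linalg.norm(offset)
    if norm <= radius:
        # Already inside the ball: the closest point is the point itself.
        return point.copy()
    # Outside the ball: move radially onto the boundary.
    return center + offset * (radius / norm)

center = np.array([0.0, 0.0])
outside = np.array([3.0, 4.0])
projected = project_onto_l2_ball(outside, center, radius=1.0)  # approx. [0.6, 0.8]
```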
Said at least approximate Wasserstein distance may be characterized by predefined parameters, like e.g. a predefined radius.
Said Wasserstein distance is hard to compute, and it has not been known so far how to carry out a projection on a Wasserstein ball.
It is therefore proposed to determine said projected input signal (xproj) by minimizing a distance, for example measured by an L2-metric, to said modified input signal (xmod) under a constraint that a distance to said original input signal (xorg) according to said at least approximate Wasserstein distance is not larger than a predefined radius (∈) of said metric ball.
This enables an exact solution, i.e., a solution where said at least approximate Wasserstein distance is a Wasserstein distance. Conveniently, said aforementioned minimization may be obtained by maximizing a dual problem corresponding to a primal problem that is given by said minimization under said constraints, wherein said dual problem comprises a Lagrangian multiplier variable corresponding to said constraints. This is described in detail in
In a preferred embodiment, said at least approximate Wasserstein distance is a Sinkhorn distance which differs from a Wasserstein distance by an entropic term, wherein for any pair of first distribution (P) and second distribution (Q), said entropic term characterizes an entropy of a distribution Π that satisfies Π1n=P, Πᵀ1n=Q. If P and Q are distributions defined over the same domain Ω, then Π is a distribution over the domain Ω×Ω with P and Q as its marginals.
It has been discovered that the inclusion of said entropic term enables an approximate solution to the projection on a Wasserstein ball that is a lot faster to compute.
It should be noted that, as described in “Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances”, arXiv preprint arXiv: 1306.0895v1, Marco Cuturi (2013), a Sinkhorn distance is in fact not a metric in the mathematical sense, since it is possible to have a zero distance between two distributions that are not the same. Instead, in a mathematical sense, it is a pseudo-metric.
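A distribution Π with prescribed marginals P and Q can be computed with the standard Sinkhorn iterations from the cited Cuturi paper. The sketch below is a minimal illustration under stated assumptions: the kernel convention `K = exp(-lam * D)`, the ground cost `D[i, j] = |i - j|`, and the function name `sinkhorn_plan` are all chosen for this example and are not taken from the present description. It computes an entropy-regularized transport plan only; it does not perform the projection onto a Wasserstein ball.

```python
import numpy as np

def sinkhorn_plan(P, Q, D, lam=1.0, n_iter=500):
    """Entropy-regularized transport plan between distributions P and Q.

    Assumed convention: K = exp(-lam * D), taken element-wise.
    """
    K = np.exp(-lam * D)
    u = np.ones_like(P)
    v = np.ones_like(Q)
    for _ in range(n_iter):
        u = P / (K @ v)      # rescale rows towards marginal P
        v = Q / (K.T @ u)    # rescale columns towards marginal Q
    return u[:, None] * K * v[None, :]

n = 5
positions = np.arange(n, dtype=float)
D = np.abs(positions[:, None] - positions[None, :])  # assumed ground cost |i - j|
P = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
Q = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

Pi = sinkhorn_plan(P, Q, D)
transport_cost = float(np.sum(Pi * D))  # cost of moving mass according to Pi
```

Summing `Pi` over its columns recovers P, and summing over its rows recovers Q, up to the accuracy reached by the iterations.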
In fact, it has been discovered that a good way of determining said projected input signal (xproj) is by solving a convex optimization corresponding to said minimization under said constraints. This is described in detail in the description corresponding to
In one further aspect of the present invention, said adversarial input signal (xadv) may be provided by a targeted attack, i.e., provided to cause said classifier to classify it as belonging to a predefined second class. An efficient way for doing so can be provided if said classifier when provided with said input signal (x), is configured to output a first classification value (fl
In an alternative embodiment to said targeted attack, said adversarial input signal (xadv) may be provided by an untargeted attack, i.e., provided to cause said classifier to classify it as belonging to any different second class. In this case, conveniently said modified input signal (xmod) is provided such as to cause said first classification value (fl
In a further aspect of the present invention, the steps of modifying said original input signal (xorg) to yield said modified input signal (xmod) and projecting said modified input signal (xmod) onto said predefined subset to yield said projected input signal (xproj) are carried out iteratively by using said projected input signal (xproj) of a preceding iteration as original input signal (xorg) of a subsequent iteration, wherein said step of projecting said modified input signal (xmod) onto said predefined subset is carried out after each step of modifying said original input signal (xorg). Such an iterative method is preferable, because it ensures that intermediate modified input signals (xmod) remain close to a boundary of the at least approximate Wasserstein ball, thus enhancing convergence of the method.
Embodiments of the present invention are discussed with reference to the figures in more detail.
Shown in
Thereby, control system 40 receives a stream of sensor signals S. It then computes a series of actuator control commands A depending on the stream of sensor signals S, which are then transmitted to actuator 10.
Control system 40 receives the stream of sensor signals S of sensor 30 in an optional receiving unit 50.
Receiving unit 50 transforms the sensor signals S into input signals x. Alternatively, in case of no receiving unit 50, each sensor signal S may directly be taken as an input signal x. Input signal x may, for example, be given as an excerpt from sensor signal S. Alternatively, sensor signal S may be processed to yield input signal x. Input signal x comprises image data corresponding to an image recorded by sensor 30. In other words, input signal x is provided in accordance with sensor signal S.
Input signal x is then passed on to an image classifier 60, which may, for example, be given by an artificial neural network.
Classifier 60 is parametrized by parameters ϕ, which are stored in and provided by parameter storage St1.
Classifier 60 determines output signals y from input signals x. The output signal y comprises information that assigns one or more labels to the input signal x. Output signals y are transmitted to an optional conversion unit 80, which converts the output signals y into the control commands A. Actuator control commands A are then transmitted to actuator 10 for controlling actuator 10 accordingly. Alternatively, output signals y may directly be taken as control commands A.
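The signal flow from sensor signal S to actuator control command A can be sketched as a single function. This is a minimal illustration, assuming that receiving unit 50 and conversion unit 80 default to the identity when absent; `control_step`, `toy_classifier` and `toy_conversion` are hypothetical stand-ins and not components of the described system.

```python
from typing import Callable, Sequence

def control_step(
    sensor_signal: Sequence[float],
    classifier: Callable[[Sequence[float]], str],
    receive: Callable[[Sequence[float]], Sequence[float]] = lambda s: s,
    convert: Callable[[str], str] = lambda y: y,
) -> str:
    """Map one sensor signal S to one actuator control command A."""
    x = receive(sensor_signal)   # optional receiving unit 50
    y = classifier(x)            # classifier 60
    return convert(y)            # optional conversion unit 80

# Hypothetical stand-ins, for illustration only.
def toy_classifier(x):
    return "obstacle" if max(x) > 0.5 else "free"

def toy_conversion(y):
    return "brake" if y == "obstacle" else "continue"

command = control_step([0.1, 0.9, 0.2], toy_classifier, convert=toy_conversion)
```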
Actuator 10 receives actuator control commands A, is controlled accordingly and carries out an action corresponding to actuator control commands A. Actuator 10 may comprise a control logic which transforms actuator control command A into a further control command, which is then used to control actuator 10.
In further embodiments, control system 40 may comprise sensor 30. In even further embodiments, control system 40 alternatively or additionally may comprise actuator 10.
In still further embodiments, it may be envisioned that control system 40 controls a display 10a instead of an actuator 10.
Furthermore, control system 40 may comprise a processor 45 (or a plurality of processors) and at least one machine-readable storage medium 46 on which instructions are stored which, if carried out, cause control system 40 to carry out a method according to one aspect of the invention.
Sensor 30 may comprise one or more video sensors and/or one or more radar sensors and/or one or more ultrasonic sensors and/or one or more LiDAR sensors and/or one or more position sensors (like e.g. GPS). Some or all of these sensors are preferably but not necessarily integrated in vehicle 100.
Alternatively or additionally sensor 30 may comprise an information system for determining a state of the actuator system. One example for such an information system is a weather information system which determines a present or future state of the weather in environment 20.
For example, using input signal x, the classifier 60 may detect objects in the vicinity of the at least partially autonomous robot. Output signal y may comprise information which characterizes where objects are located in the vicinity of the at least partially autonomous robot. Control command A may then be determined in accordance with this information, for example to avoid collisions with said detected objects.
Actuator 10, which is preferably integrated in vehicle 100, may be given by a brake, a propulsion system, an engine, a drivetrain, or a steering of vehicle 100. Actuator control commands A may be determined such that actuator (or actuators) 10 is/are controlled such that vehicle 100 avoids collisions with said detected objects. Detected objects may also be classified according to what the classifier 60 deems them most likely to be, e.g. pedestrians or trees, and actuator control commands A may be determined depending on the classification.
In further embodiments, the at least partially autonomous robot may be given by another mobile robot (not shown), which may, for example, move by flying, swimming, diving or stepping. The mobile robot may, inter alia, be an at least partially autonomous lawn mower, or an at least partially autonomous cleaning robot. In all of the above embodiments, actuator control command A may be determined such that propulsion unit and/or steering and/or brake of the mobile robot are controlled such that the mobile robot may avoid collisions with said identified objects.
In a further embodiment, the at least partially autonomous robot may be given by a gardening robot (not shown), which uses sensor 30, preferably an optical sensor, to determine a state of plants in the environment 20. Actuator 10 may be a nozzle for spraying chemicals. Depending on an identified species and/or an identified state of the plants, an actuator control command A may be determined to cause actuator 10 to spray the plants with a suitable quantity of suitable chemicals.
In even further embodiments, the at least partially autonomous robot may be given by a domestic appliance (not shown), like e.g. a washing machine, a stove, an oven, a microwave, or a dishwasher. Sensor 30, e.g. an optical sensor, may detect a state of an object which is to undergo processing by the household appliance. For example, in the case of the domestic appliance being a washing machine, sensor 30 may detect a state of the laundry inside the washing machine. Actuator control signal A may then be determined depending on a detected material of the laundry.
Shown in
Sensor 30 may be given by an optical sensor which captures properties of e.g. a manufactured product 12. Classifier 60 may determine a state of the manufactured product 12 from these captured properties. Actuator 10 which controls manufacturing machine 11 may then be controlled depending on the determined state of the manufactured product 12 for a subsequent manufacturing step of manufactured product 12. Or, it may be envisioned that actuator 10 is controlled during manufacturing of a subsequent manufactured product 12 depending on the determined state of the manufactured product 12.
Shown in
Control system 40 then determines actuator control commands A for controlling the automated personal assistant 250. The actuator control commands A are determined in accordance with sensor signal S of sensor 30. Sensor signal S is transmitted to the control system 40. For example, classifier 60 may be configured to e.g. carry out a gesture recognition algorithm to identify a gesture made by user 249. Control system 40 may then determine an actuator control command A for transmission to the automated personal assistant 250. It then transmits said actuator control command A to the automated personal assistant 250.
For example, actuator control command A may be determined in accordance with the identified user gesture recognized by classifier 60. It may then comprise information that causes the automated personal assistant 250 to retrieve information from a database and output this retrieved information in a form suitable for reception by user 249.
In further embodiments, it may be envisioned that instead of the automated personal assistant 250, control system 40 controls a domestic appliance (not shown) controlled in accordance with the identified user gesture. The domestic appliance may be a washing machine, a stove, an oven, a microwave or a dishwasher.
Shown in
Shown in
Shown in
Shown in
Classifier 60 is configured to compute output signals y from input signals x. These output signals y are also passed on to assessment unit 180.
A modification unit 160 determines updated parameters ϕ′ depending on input from assessment unit 180. Updated parameters ϕ′ are transmitted to parameter storage St1 to replace present parameters ϕ.
For example, it may be envisioned that assessment unit 180 determines the value of a loss function L depending on output signals y and desired output signals ys. Modification unit 160 may then compute updated parameters ϕ′ using e.g. stochastic gradient descent to optimize the loss function L.
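One parameter update of this kind can be sketched as follows. The example is a minimal illustration, assuming a linear softmax classifier with a cross-entropy loss and a fixed learning rate; the model, the loss, and the names `cross_entropy` and `sgd_step` are assumptions made for this sketch and are not prescribed by the description.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(phi, x, label):
    """Loss for one sample of a linear softmax classifier with weights phi."""
    return -np.log(softmax(phi @ x)[label])

def sgd_step(phi, x, label, lr=0.1):
    """Return updated parameters after one stochastic gradient step."""
    probs = softmax(phi @ x)
    probs[label] -= 1.0      # gradient of the loss with respect to the logits
    return phi - lr * np.outer(probs, x)

rng = np.random.default_rng(0)
phi = rng.normal(size=(3, 4))   # 3 classes, 4 input features
x = rng.normal(size=4)
label = 2                       # desired output for this sample

loss_before = cross_entropy(phi, x, label)
phi_new = sgd_step(phi, x, label)
loss_after = cross_entropy(phi_new, x, label)
```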
Furthermore, modification unit 160 may compute an adversarial dataset T′ comprising modified input signals xadv based on original input signals x taken, for example, from training set T and their respective desired output signals ys.
Furthermore, training system 140 may comprise a processor 145 (or a plurality of processors) and at least one machine-readable storage medium 146 on which instructions are stored which, if carried out, cause training system 140 to carry out a method according to one aspect of the invention.
Shown in
First (901), classifier 60 is trained with training data of set T in a conventional manner, as discussed above.
Then (902), one or more adversarial input signals xadv and corresponding desired output signals ys are generated with the method illustrated in
Now (903), classifier 60 is trained with training data of adversarial dataset T′. The trained classifier 60 may then (904) be used for providing an actuator control signal A by receiving sensor signal S comprising data from sensor 30, determining the input signal x depending on said sensor signal S, and feeding said input signal x into classifier 60 to obtain output signal y that characterizes a classification of input signal x. Actuator 10 or 10a may then be controlled in accordance with provided actuator control signal A. This concludes the method.
Shown in
First (911), parameters ϕ that characterize the operation of classifier 60 are provided. Conventionally, they are obtained by a training method for training classifier 60, e.g. by supervised training as outlined above.
The trained classifier 60 may then (912) be used for providing a first output signal y1 by receiving sensor signal S comprising data from sensor 30, determining the input signal x depending on said sensor signal S, and inputting said input signal x into classifier 60 to obtain first output signal y1 that characterizes a classification of input signal x.
Then (913), an adversarial input signal xadv is generated with the method illustrated in
This adversarial input signal xadv is then (914) inputted into classifier 60 to obtain a second output signal y2 that characterizes a classification of adversarial input signal xadv.
Next (915), a parameter vu indicating a vulnerability of classifier 60 is computed based on said first output signal y1 and said second output signal y2. For example, it is possible to set said parameter vu to a first value (for example “1”) indicating a vulnerability, if said first output signal y1 is not equal to said second output signal y2, and to a second value (for example “0”) indicating a non-vulnerability, if said first output signal y1 is equal to said second output signal y2.
An actuator control signal (A) may then (916) be determined in accordance with said parameter vu, and actuator (10) may be controlled in accordance with said actuator control signal (A). For example, if said parameter vu indicates a non-vulnerability, said actuator control signal (A) may then be determined to correspond to normal operation mode, whereas, if said parameter vu indicates a vulnerability, said actuator control signal (A) may then be determined to correspond to a fail-safe operation mode, by e.g. reducing a dynamics of a motion of said actuator (10).
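Steps (915) and (916) amount to a comparison followed by a mode selection. A minimal sketch, assuming classifications are represented as comparable labels and operation modes as strings (both representations, and the function names, are assumptions for illustration):

```python
def vulnerability(y1, y2):
    """Return 1 if the two classifications differ (vulnerable), else 0."""
    return 1 if y1 != y2 else 0

def operation_mode(vu):
    """Select the operation mode from the vulnerability parameter vu."""
    return "fail-safe" if vu == 1 else "normal"

mode_same = operation_mode(vulnerability("pedestrian", "pedestrian"))
mode_diff = operation_mode(vulnerability("pedestrian", "tree"))
```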
Then (1100), modified input signal xmod is inputted into classifier 60 and corresponding vector f(xmod) is determined. Then, a scalar function g(x)=−f0 is evaluated, its gradient ∇g(x) at x=xmod is determined, and modified input signal xmod is updated as
xmod=xmod+τ·∇g(x)|x=xmod
Next (1200), a projected input signal xproj is determined by projecting modified input signal xmod onto a Wasserstein ball with a predefined radius ∈ centered around original input signal xorg. This projection may be carried out with one of the methods illustrated in
Then (1300), the counter is incremented counter←counter+1 and it is checked (1400), if the counter is a multiple of a predefined number, e.g. 20. If that is the case (1500), the counter is reset to counter=0 and step size τ is increased by a predefined factor, e.g. τ←τ·1.1.
Both of steps (1400) and (1500) are followed by checking (1600) whether the counter is less than a predefined maximum counter countermax, i.e., if counter<countermax. Furthermore, modified input signal xmod is set equal to projected input signal xproj, and scalar g(xmod) is evaluated. If counter<countermax and if g(xmod)≤ub with an upper bound ub which may be set to any non-negative number, e.g. ub=0 (i.e., classification has not changed from correct classification 0 to the target classification), the method iterates back to step (1200). If not (1700), adversarial input signal xadv is provided as equal to modified input signal xmod. Optionally, if g(xadv)≤ub, an error message may be provided indicating that no adversarial has been found with the desired confidence. This concludes the method.
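The overall loop of gradient step, projection, and periodic step-size increase can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the description: the classifier is a hypothetical linear model, g is assumed to be the target-class score minus the original-class score, the projection is a stand-in onto an L2 ball rather than a Wasserstein ball, and the counter is not reset when the step size is increased.

```python
import numpy as np

def project_l2(x, center, radius):
    """Stand-in projection onto an L2 ball (NOT a Wasserstein ball)."""
    offset = x - center
    norm = np.linalg.norm(offset)
    return x if norm <= radius else center + offset * (radius / norm)

def attack(W, x_org, orig_class, target_class, radius,
           tau=0.1, counter_max=200, ub=0.0, project=project_l2):
    """Alternate gradient steps and projections until g exceeds ub."""
    grad = W[target_class] - W[orig_class]   # gradient of g for a linear model

    def g(x):
        # Assumed form of g: target-class score minus original-class score.
        return float(grad @ x)

    x_mod = x_org.copy()
    counter = 0
    while counter < counter_max and g(x_mod) <= ub:
        x_mod = project(x_mod + tau * grad, x_org, radius)  # modify, then project
        counter += 1
        if counter % 20 == 0:
            tau *= 1.1                       # enlarge the step size periodically
    return x_mod, g(x_mod) > ub

W = np.array([[1.0, 0.0], [0.0, 1.0]])       # hypothetical 2-class linear classifier
x_org = np.array([1.0, 0.5])                 # classified as class 0
x_adv, found = attack(W, x_org, orig_class=0, target_class=1, radius=1.0)
```

Passing a different callable as `project` swaps in another projection without changing the loop; when the radius is too small for the class to change, the function returns `found` as `False` after `counter_max` iterations.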
(Here, the 1 in Π1=P, Πᵀ1=Q denotes an n-dimensional vector of ones). Determining said projected input signal xproj then corresponds to solving equation
(Of course, the L2-metric may be replaced by any other metric).
First (1310), it is determined whether the Wasserstein distance WD(xmod, xorg) between modified input signal xmod and original input signal xorg is not larger than predefined radius ∈, i.e., if
WD(xmod,xorg)≤∈. (4)
If that is the case (1320), projected input signal xproj is set equal to modified input signal xmod, and the method ends.
If that is not the case (1330), and denoting P=xmod and Q=xorg, equation
is solved with e.g. projected gradient ascent to yield maximizing values Φ*, Ψ*, ρ*. It will be appreciated that equation (5) is a dual formulation to a primal problem given by equations (3) and (2).
Next (1340), Π as defined in equation (2) is determined from the maximizing values Φ*, Ψ*, ρ* using e.g. the method illustrated in
Then (1350), projected input signal xproj is set equal to xproj=Πᵀ1. This concludes the method.
with a predefined variable λ≠0, e.g. λ=1.
First (1311), it is determined whether the Sinkhorn distance WDλ(xmod,xorg) between modified input signal xmod and original input signal xorg is not larger than predefined radius ∈, i.e., if
WDλ(xmod,xorg)≤∈. (8)
If that is the case (1321), projected input signal xproj is set equal to modified input signal xmod, and the method ends.
If that is not the case (1331), and denoting P=xmod and Q=xorg, a variable ρ is initialized as ρ=1 and two n-dimensional vectors R, S are initialized by setting each of their components equal to e.g. Ri=Si=1/n.
Then (1341), an exponential n×n-dimensional matrix K is computed as
K=exp(−λ·ρ·D). (9)
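Equation (9) and the subsequent update of R can be sketched in a few lines. The sketch rests on assumptions: the matrix D is not defined in this passage and is taken here to be a ground cost `D[i, j] = |i - j|` between pixel positions on a line, and the exponential is taken element-wise, as is usual for Sinkhorn-type iterations. The function names are hypothetical.

```python
import numpy as np

def ground_cost_1d(n):
    """Assumed cost matrix: D[i, j] = |i - j| for n pixels on a line."""
    idx = np.arange(n, dtype=float)
    return np.abs(idx[:, None] - idx[None, :])

def kernel_matrix(D, lam, rho):
    """Element-wise exponential K = exp(-lam * rho * D), cf. equation (9)."""
    return np.exp(-lam * rho * D)

n = 4
D = ground_cost_1d(n)
K = kernel_matrix(D, lam=1.0, rho=1.0)

# Update of R as R_i = P_i / (K S)_i, starting from the initialization S_i = 1/n.
P = np.array([0.4, 0.3, 0.2, 0.1])
S = np.full(n, 1.0 / n)
R = P / (K @ S)
```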
Next (1351), components Ri of vector R are updated as
Ri=Pi/(K·S)i (10)
and components Si of vector S are updated as
with
tmp=λKᵀR. (12)
Then (1361), scalars g and h are computed as
g=⟨R,DKS⟩−∈ (13)
h=−λ⟨R,DDKS⟩ (14)
where ⟨. . .⟩ denotes a scalar product, i.e., entry-wise multiplication followed by a sum over all multiplied entries. A value α is set to a positive value, e.g. α=1.
Then (1371), α is set such that
but also
e.g. by updating
as long as
Now (1381), ρ is updated as
Next (1391), it is checked whether the method has converged, e.g., if changes to R and/or S over the last iteration are sufficiently small (e.g. less than a predefined threshold). If that is not the case, the method iterates back to step (1341). If the method has converged, however, step (1392) follows.
In this step, one sets
Πij=Ri·Kij·Sj (15)
xproj=Πᵀ1 (16)
This concludes the method.
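As a consistency check, and assuming that equation (15) is read as Πij=Ri·Kij·Sj, the update (10) implies that the first marginal of Π equals P. The identity below holds exactly directly after R has been updated with the current S, and therefore approximately once the iteration has converged:

```latex
\sum_{j=1}^{n} \Pi_{ij}
  = R_i \sum_{j=1}^{n} K_{ij} S_j
  = R_i \,(K S)_i
  = \frac{P_i}{(K S)_i}\,(K S)_i
  = P_i ,
\qquad\text{i.e.}\qquad \Pi \mathbf{1} = P .
```

The other marginal, Πᵀ1, is what equation (16) returns as the projected input signal xproj.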
The term “computer” covers any device for the processing of pre-defined calculation instructions. These calculation instructions can be in the form of software, or in the form of hardware, or also in a mixed form of software and hardware.
It is further understood that the procedures can not only be implemented completely in software as described. They can also be implemented in hardware, or in a mixed form of software and hardware.
Number | Date | Country | Kind |
---|---|---|---|
18213925.3 | Dec 2018 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2019/082757 | 11/27/2019 | WO | 00 |