The embodiments discussed in the present disclosure are related to Deep Neural Networks and systems and methods of measuring the robustness thereof.
Deep Neural Networks (DNNs) are increasingly being used in a variety of applications. Despite this popularity, recent research has shown that DNNs are vulnerable to noise in the input. More specifically, even a small amount of noise injected into the input can cause a DNN that is otherwise considered highly accurate to return inaccurate predictions.
The subject matter claimed in the present disclosure is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one example technology area where some embodiments described in the present disclosure may be practiced.
According to an aspect of an embodiment, a method of evaluating the robustness of a Deep Neural Network (DNN) model includes obtaining a set of training data-points correctly predicted by the DNN model, obtaining a set of realistic transformations of the set of training data-points correctly predicted by the DNN model, the set of realistic transformations corresponding to additional data-points within a predetermined mathematical distance from each training data-point of the set of training data-points, creating a robustness profile corresponding to whether the DNN model accurately predicts an outcome for the additional data-points of the set of realistic transformations, and generating a robustness evaluation of the DNN model based on the robustness profile.
The objects and advantages of the embodiments will be realized and achieved at least by the elements, features, and combinations particularly pointed out in the claims.
Both the foregoing general description and the following detailed description are given as examples and are explanatory and are not restrictive of the invention, as claimed.
Example embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Some embodiments described in the present disclosure relate to methods and systems of measuring the robustness of Deep Neural Networks (DNNs). A DNN is an artificial neural network (ANN) which generally includes an input layer and an output layer with multiple layers between the input and output layers. As the number of layers between the input and output increases, the depth of the neural network increases, which may improve the performance of the neural network.
The DNN finds the correct mathematical manipulation to turn the input into the output, whether it be a linear relationship or a non-linear relationship. The network moves through the layers, calculating the probability of each output. Each mathematical manipulation as such is considered a layer, and complex DNNs have many layers, hence the name “deep” networks.
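By way of example and not limitation, the following Python sketch illustrates the layered computation described above using only numpy; the layer sizes, random weights, and softmax output are illustrative assumptions rather than features of any particular DNN model described herein.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Illustrative layer sizes; each weight matrix and bias defines one "mathematical manipulation".
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 8)), rng.normal(size=(8, 3))]
biases = [np.zeros(8), np.zeros(8), np.zeros(3)]

def forward(x):
    """Move the input through the layers and return the probability of each output class."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)                           # hidden layers: linear map plus non-linearity
    return softmax(x @ weights[-1] + biases[-1])      # output layer: probability of each class

print(forward(np.array([0.5, -1.2, 3.0, 0.1])))
```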
Deep Neural Networks (DNNs) are increasingly being used in a variety of applications. Examples of a few fields of application include autonomous driving, medical diagnostics, malware detection, image recognition, visual art processing, natural language processing, drug discovery and toxicology, recommendation systems, mobile advertising, image restoration, and fraud detection. Despite this popularity and the clear utility of DNNs across a vast array of technological areas, recent research has shown that DNNs are vulnerable to noise in the input, which can result in inaccurate predictions and erroneous outputs. In the normal operation of a DNN, a small amount of noise may cause only small perturbations in the output, such as an object recognition system mischaracterizing a lightly colored sweater as a diaper, but in other instances these inaccurate predictions can result in significant errors, such as an autonomous automobile mischaracterizing a school bus as an ostrich.
In order to create a DNN that is more resilient to such noise and returns fewer inaccurate predictions, an improved system of adversarial testing is disclosed, with an improved ability to find example inputs that produce inaccurate predictions and thereby cause the DNN to fail or to be unacceptably inaccurate. One benefit of finding such example inputs may be the ability to successfully gauge the reliability of a DNN. Another benefit may be the ability to use the example inputs that result in inaccurate predictions to “re-train” or improve the DNN so that the inaccurate predictions are corrected.
Embodiments of the present disclosure are explained with reference to the accompanying drawings.
The DNN model 110 being evaluated may include electronic data, such as, for example, a software program, code of the software program, libraries, applications, scripts, or other logic or instructions for execution by a processing device. More particularly, the DNN model 110 may be part of a broader family of machine learning methods or algorithms based on learning data representations, instead of task-specific algorithms. This learning can be supervised, semi-supervised, or unsupervised. In some embodiments, the DNN model 110 may include a complete instance of the software program. The DNN model 110 may be written in any suitable type of computer language that may be used for performing machine learning. Additionally, the DNN model 110 may be partially or exclusively implemented on specialized hardware, rather than as a software program running on a computer.
The robustness computation module 102 may include code and routines configured to enable a computing device to perform one or more evaluations of the DNN model 110 to generate the robustness computation and evaluation. Additionally or alternatively, the robustness computation module 102 may be implemented using hardware including a processor, a microprocessor (e.g., to perform or control performance of one or more operations), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). In some other instances, the robustness computation module 102 may be implemented using a combination of hardware and software. In the present disclosure, operations described as being performed by the robustness computation module 102 may include operations that the robustness computation module 102 may direct a corresponding system to perform.
Modifications, additions, or omissions may be made to
In general, the processor 250 may include any suitable special-purpose or general-purpose computer, computing entity, or processing device including various computer hardware or software modules and may be configured to execute instructions stored on any applicable computer-readable storage media. For example, the processor 250 may include a microprocessor, a microcontroller, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a Field-Programmable Gate Array (FPGA), or any other digital or analog circuitry configured to interpret and/or to execute program instructions and/or to process data. Although illustrated as a single processor in
In some embodiments, the processor 250 may be configured to interpret and/or execute program instructions and/or process data stored in the memory 252, the data storage 254, or the memory 252 and the data storage 254. In some embodiments, the processor 250 may fetch program instructions from the data storage 254 and load the program instructions in the memory 252. After the program instructions are loaded into memory 252, the processor 250 may execute the program instructions.
For example, in some embodiments, the repair module may be included in the data storage 254 as program instructions. The processor 250 may fetch the program instructions of the repair module from the data storage 254 and may load the program instructions of the repair module in the memory 252. After the program instructions of the repair module are loaded into memory 252, the processor 250 may execute the program instructions such that the computing system may implement the operations associated with the repair module as directed by the instructions.
The memory 252 and the data storage 254 may include computer-readable storage media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable storage media may include any available media that may be accessed by a general-purpose or special-purpose computer, such as the processor 250. By way of example, and not limitation, such computer-readable storage media may include tangible or non-transitory computer-readable storage media including Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory devices (e.g., solid state memory devices), or any other storage medium which may be used to carry or store particular program code in the form of computer-executable instructions or data structures and which may be accessed by a general-purpose or special-purpose computer. Combinations of the above may also be included within the scope of computer-readable storage media. Computer-executable instructions may include, for example, instructions and data configured to cause the processor 250 to perform a certain operation or group of operations.
Modifications, additions, or omissions may be made to the computing system 202 without departing from the scope of the present disclosure. For example, in some embodiments, the computing system 202 may include any number of other components that may not be explicitly illustrated or described.
Because the training data-points 351a-351c are used to develop the target DNN model 110, there is an expectation that the DNN model 110 will be highly accurate at points near or within a predetermined distance of those training data-points 351a-351c. In this illustration, the areas within a predetermined distance of those training points 351a-351c are referred to as areas 350a-350c of training points 351a-351c. In reality, however, the DNN model 110 can often fail, even spectacularly, within an area of a training point. For example, in the conception shown in
As may be understood, this small, predictable amount of variation, which may arise from the example traffic sign being improperly mounted on a pole, resulting in a slight skew of the sign, may have significant consequences. This would be particularly true in applications where the image classification is utilized by an autonomous automobile, which may, as a result, fail to slow for speed bumps or be directed in an incorrect direction.
At 610, the robustness of a first DNN model is evaluated using a given, domain-specific set of parameterized transforms, which are described more fully below. More particularly, in one embodiment, the parameterized transforms represent real-world sources of variation which approximate a realistic area within which to evaluate the robustness of a DNN model and which may correspond to predictable real-life variations of the training data-points. This evaluation may result in the generation of a first robustness profile of the first DNN model, where the first robustness profile represents the average accuracy of prediction of the DNN model over a set of training data-points, as they are suitably perturbed, as a function of the distance of the perturbed points from the original training data-points.
At 620, the robustness of a second DNN model is evaluated using the same given, domain-specific set of parameterized transforms. This evaluation may result in the generation of a second robustness profile of the second DNN model.
At 630, a selection may be made between the first DNN model and the second DNN model based on the robustness profiles and/or the calculated robustness of the first and second DNN models.
The method 600 may improve the ability to properly evaluate and improve DNN models and their ability to effectively and efficiently perform machine learning.
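By way of example and not limitation, the selection at 630 may be sketched as follows, assuming each robustness profile has already been computed as a mapping from perturbation distance to average prediction accuracy; the dictionary format, the area-under-profile summary, and the helper names are assumptions of this sketch rather than a required implementation.

```python
def area_under_profile(profile):
    """Approximate the area under a robustness profile, given as {distance: average accuracy},
    using the trapezoidal rule; a larger area indicates accuracy that decays more slowly."""
    points = sorted(profile.items())
    return sum(0.5 * (a0 + a1) * (d1 - d0) for (d0, a0), (d1, a1) in zip(points, points[1:]))

def select_model(first_profile, second_profile):
    """Prefer the DNN model whose robustness profile retains more accuracy with distance."""
    return "first" if area_under_profile(first_profile) >= area_under_profile(second_profile) else "second"

# Hypothetical robustness profiles for the first and second DNN models.
first_profile = {0.0: 1.00, 0.1: 0.95, 0.2: 0.80, 0.3: 0.60}
second_profile = {0.0: 1.00, 0.1: 0.90, 0.2: 0.85, 0.3: 0.75}
print(select_model(first_profile, second_profile))   # prints "second" for these hypothetical values
```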
Modifications, additions, or omissions may be made to the method 600 without departing from the scope of the present disclosure. For example, the operations of method 600 may be implemented in differing order. Additionally or alternatively, two or more operations may be performed at the same time. For example, the calculation of robustness of the first DNN model at 610 and the calculation of robustness of the second DNN model at 620 may be simultaneously performed. Furthermore, the outlined operations and actions are only provided as examples, and some of the operations and actions may be optional, combined into fewer operations and actions, or expanded into additional operations and actions without detracting from the essence of the disclosed embodiments.
At 710, the robustness of the DNN model is calculated based on a domain-specific set of parameterized transforms, as is described in more detail below. This may include generating a robustness profile which represents the aggregate robustness of the DNN model as the average accuracy of prediction over all the training data-points used to generate the DNN model, where the training data-points are suitably perturbed in manners which correspond to predictable variations, and where the average accuracy is represented as a function of the distance of the perturbed points from the original training data-points.
At 720, the calculated robustness of the DNN model and/or the robustness profile may be analyzed to generate a confidence measure corresponding to the DNN model's ability to be resilient to predictable variations from the training data-points and to noise. This confidence measure may be a function that maps each test input that the user might present to the model to a confidence value that indicates the likelihood of the model having robust predictive behavior in the neighborhood of this input point. At 730, the confidence measure may be used to compute and return to the user a robustness confidence value corresponding to a test input presented to the model by the end-user.
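A minimal, non-limiting sketch of one such confidence measure follows; the k-nearest-neighbor averaging, the Euclidean distance, and the variable names are assumptions chosen for illustration, since the disclosure does not mandate a particular functional form.

```python
import numpy as np

def confidence_measure(test_input, training_points, pointwise_robustness, k=3):
    """One possible confidence measure: map a test input presented by the user to the
    average point-wise robustness of its k nearest training data-points (Euclidean
    distance), as a proxy for robust predictive behavior in the input's neighborhood."""
    distances = [np.linalg.norm(test_input - p) for p in training_points]
    nearest = np.argsort(distances)[:k]
    return float(np.mean([pointwise_robustness[i] for i in nearest]))

# Hypothetical training points with previously computed point-wise robustness values.
training_points = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 0.5])]
pointwise_robustness = [0.95, 0.60, 0.88]
print(confidence_measure(np.array([0.9, 1.1]), training_points, pointwise_robustness, k=2))
```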
As may be understood, modifications, additions, or omissions may be made to the method 700 without departing from the scope of the present disclosure. Furthermore, the outlined operations and actions are only provided as examples, and some of the operations and actions may be optional, combined into fewer operations and actions, or expanded into additional operations and actions without detracting from the essence of the disclosed embodiments.
More particularly, in an ideally robust system, given a training data-point p which is currently correctly classified by the DNN model 110, the distance function d captures the perceived or human similarity between two data-points. In this example, the robustness R(p, δ) with respect to p and a distance δ is the fraction of input data-points at distance δ from p that are correctly classified by the DNN model 110.
It should be noted that because there are a potentially infinite number of variations, there is potentially an infinite number of data-points which may be found within the distance δ from the data-point p. In order to limit the number of realistic variations which may be considered, and as is described more fully below, embodiments herein attempt to define and utilize a closed set of realistic transformations, which simulate situations or circumstances which are likely to occur in the natural world during the process of input data capture. As such, the set of transformations T={T1, T2, . . . , Tk} is designed to simulate situations or circumstances which introduce realistic variations that are likely or most likely to occur.
For example, for image data there may be predictable or foreseeable variations in image capture, such as camera angle, lighting conditions, artifacts in the optical equipment, or other imperfections in the image capturing process, such as motion blur, variance in focus, etc. These variations introduce realistic variations of an original subject image which may serve as a training data-point.
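By way of example and not limitation, a few such parameterized image transformations may be sketched in Python as follows; the specific transforms, parameter ranges, and the use of scipy.ndimage are illustrative assumptions and are not intended to exhaust the realistic variations contemplated herein.

```python
import numpy as np
from scipy import ndimage

def rotate(image, angle_degrees):
    """Simulate a change in camera angle (e.g., a slightly skewed sign)."""
    return ndimage.rotate(image, angle_degrees, reshape=False, mode="nearest")

def adjust_brightness(image, factor):
    """Simulate different lighting conditions by scaling pixel intensities."""
    return np.clip(image * factor, 0.0, 1.0)

def blur(image, sigma):
    """Crudely simulate motion blur or variance in focus with a Gaussian filter."""
    return ndimage.gaussian_filter(image, sigma)

# Sweeping each transform over a range of parameter values yields realistic
# variations of an original subject image (here a random stand-in image).
image = np.random.default_rng(0).random((32, 32))
variations = ([rotate(image, a) for a in (-5, -2, 2, 5)]
              + [adjust_brightness(image, f) for f in (0.7, 0.9, 1.1, 1.3)]
              + [blur(image, s) for s in (0.5, 1.0, 1.5)])
print(len(variations), "realistic variations generated")
```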
Given a set of parameterized transformations T={T1(ρ1), T2(ρ2), . . . , Tk(ρk)} that yield realistic, parametric variations of a given data-point p, the point-wise robustness may be a function of T, which may be used to compute a robustness measure R(p, δ, T) that computes robustness over only the points produced by the parameterized transformations in T.
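A minimal sketch of such a point-wise robustness computation follows, assuming a model prediction function model_predict, a distance function d, and a distance tolerance tol; none of these is specified by the disclosure, so all are illustrative assumptions.

```python
import numpy as np

def robustness(p, delta, transforms, parameter_values, model_predict, d, tol=0.05):
    """Sketch of R(p, delta, T): among transformed points T(p, rho) whose distance from p
    falls approximately at delta, return the fraction classified the same as p."""
    target = model_predict(p)
    at_delta, correct = 0, 0
    for T in transforms:
        for rho in parameter_values:
            pt = T(p, rho)
            if abs(d(p, pt) - delta) <= tol:          # consider only points at distance ~delta
                at_delta += 1
                correct += int(model_predict(pt) == target)
    return correct / at_delta if at_delta else None

# Tiny stand-in example: a "model" that thresholds the mean pixel value.
model_predict = lambda x: int(np.mean(x) > 0.5)
d = lambda a, b: float(np.linalg.norm(a - b))
brighten = lambda x, rho: np.clip(x + rho, 0.0, 1.0)
p = np.full((4, 4), 0.6)
print(robustness(p, delta=0.4, transforms=[brighten],
                 parameter_values=[0.05, 0.10, 0.15],
                 model_predict=model_predict, d=d, tol=0.3))
```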
It should be noted that the Lp-norm is a metric used in the computer vision and imaging art to measure a distance between two images by measuring the difference between two vectors in a given vector space. In some instances, embodiments herein may use the L2-norm in the pixel space of the images, also known as the Euclidean norm (the square root of the Sum of Squared Differences (SSD)), to measure the distance between two images. This norm is defined as:
∥x₁ − x₂∥₂ = √( Σᵢ (x₁ᵢ − x₂ᵢ)² )
where (x₁ᵢ − x₂ᵢ) denotes the difference between the ith pixels of the two images.
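By way of example and not limitation, this distance may be computed over images represented as numpy arrays as follows; the function name and the toy images are assumptions of this sketch.

```python
import numpy as np

def l2_distance(x1, x2):
    """Euclidean (L2) distance between two images in pixel space: the square root of
    the sum of squared per-pixel differences."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

# Two small stand-in grayscale images.
a = np.zeros((2, 2))
b = np.array([[0.0, 0.1], [0.2, 0.0]])
print(l2_distance(a, b))   # sqrt(0.01 + 0.04) ≈ 0.2236
```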
Returning to
At 810, a point-wise perturbation-distance-classification distribution is created. In one embodiment this is created according to the method 900 shown in
At 820, the point-wise perturbation-distance-classification distribution is used to calculate a robustness profile of the target DNN model 110. This is described more fully below, with one example illustrated as a block diagram of a method 1000 shown in
At 830, an optional process may be performed in which the point-wise perturbation-distance-classification distribution is used to identify robustness holes in the target DNN model 110. This is described more fully below, with one example illustrated as a block diagram of a method 1100 shown in
At 904, a parameter value ρ of T is obtained. Then, at 905, the transformed data-point pt=T(p, ρ) is computed. At 906, a determination is made as to whether the predicted class M(pt) of pt is the same as M(p). If not, the prediction status s is set as being equivalent to “false.” If the determination at 906 is yes, the method 900 proceeds to 908, where the prediction status s for the data-point is set as being “true.” In either case, the term s is equivalent to the value (true or false) of the equality comparison between the class M(p) of the point p as predicted by the model M and the class M(pt) of the point pt as predicted by the model M.
At 909, a distance δ=d(p, pt) is calculated. At 912, a tuple <p, T, ρ, s> is hashed by the distance δ. At 914, a determination is made as to whether there are additional parameter values to be evaluated. If so, the method 900 returns to 904. If there are no more parameter values to be evaluated, the method 900 determines at 915 whether there are more transformations to be evaluated. If there are more transformations to be evaluated, the method 900 returns to 903. If there are no more transformations to be evaluated, the method 900 proceeds to 916, where a determination is made as to whether there are more data-points to be evaluated. If there are more data-points to be evaluated, the method 900 returns to 902. If there are no more data-points to be evaluated, the method 900 generates and outputs the hashed δ-bin distribution as a calculated perturbation-distance distribution.
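A non-limiting sketch of this looping and hashing structure follows; the model prediction function, the distance function, and the fixed bin width are assumptions introduced for illustration, since the disclosure leaves these choices to the particular embodiment. The returned bins may then serve as the calculated perturbation-distance distribution consumed by the method 1000 described below.

```python
from collections import defaultdict

def perturbation_distance_distribution(data_points, transforms, parameter_values,
                                        model_predict, d, bin_width=0.1):
    """Sketch of the looping structure of method 900: for each data-point p, each
    transformation T, and each parameter value rho, compute pt = T(p, rho), set the
    prediction status s by comparing predicted classes, and hash the tuple
    (p, T, rho, s) into a bin keyed by the distance delta = d(p, pt)."""
    bins = defaultdict(list)
    for p in data_points:
        target = model_predict(p)                     # class M(p)
        for T in transforms:
            for rho in parameter_values:
                pt = T(p, rho)                        # transformed data-point
                s = (model_predict(pt) == target)     # prediction status: true or false
                delta = d(p, pt)
                key = round(delta / bin_width) * bin_width   # delta-bin
                bins[key].append((p, T, rho, s))
    return bins                                       # the hashed delta-bin distribution
```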
At 1010, the method 1000 retrieves the hashed δ-bin distribution as a calculated perturbation-distance distribution. This may be the result of the method described as the method 900 shown in
At 1025, the δ-value of a δ-bin of the hashed δ-bin distribution is retrieved and, at 1030, the average robustness of the bin (e.g., the fraction of tuples hashed into the bin whose prediction status s is “true”) vs. the δ-value of the bin is plotted.
At 1035, a determination is made as to whether there are remaining δ-bins in the δ-bin distribution requiring evaluation. If so, then the method 1000 returns and the next δ-bin is retrieved. If not, then the method 1000 outputs the plotted or calculated robustness profile at 1040.
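By way of example and not limitation, the profile computation described above may be sketched as follows, consuming the hashed δ-bin distribution in the dictionary-of-lists form assumed in the earlier sketch; the output format is likewise an assumption.

```python
def robustness_profile(bins):
    """Sketch of method 1000: for each delta-bin, compute the average robustness as the
    fraction of hashed tuples whose prediction status s is true, and record it against
    the bin's delta value to form the robustness profile."""
    profile = []
    for delta in sorted(bins):
        tuples = bins[delta]
        average_robustness = sum(1 for (_, _, _, s) in tuples if s) / len(tuples)
        profile.append((delta, average_robustness))
    return profile   # list of (delta, average robustness) pairs, suitable for plotting
```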
At 1125, a determination is made as to whether u exceeds a particular threshold. If so, the point p is output as an identified robustness hole at 1130. If not, then a determination is made at 1135 as to whether there are any more points p. If so, then the method 1100 returns to block 1115. If not, then the method 1100 ends, with the identified robustness holes having been output.
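A minimal sketch of this thresholding step follows; because the precise definition of u is given with reference to the accompanying figure, the measure_u callable and the default threshold used here are assumptions for illustration only.

```python
def find_robustness_holes(points, measure_u, threshold=0.5):
    """Sketch of method 1100: output each point p whose measure u exceeds the threshold
    as an identified robustness hole."""
    holes = []
    for p in points:
        u = measure_u(p)          # per-point measure u (assumed here to quantify how often
                                  # perturbations of p are misclassified)
        if u > threshold:         # u exceeds the particular threshold
            holes.append(p)
    return holes
```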
As was previously described, the systems and methods described herein calculate a point-wise robustness and/or an overall robustness of a DNN model 110, which may be used to differentiate between various DNN models for a given machine learning application. As may be understood, providing the ability to calculate or quantify the robustness of a DNN model 110 enables a user to identify areas of the DNN model 110 which need improvement and/or to identify a particular DNN model 110 which is better suited to a particular application.
As may be understood, identifying classes of the DNN model 110 which need improvement may be used as a means for improving existing DNN models 110 or identifying areas of weakness of DNN models 110. Hence, the systems and methods described herein provide the ability to evaluate, quantify, and, in some instances, improve DNN models and provide more accurate machine learning.
As indicated above, the embodiments described in the present disclosure may include the use of a special purpose or general purpose computer (e.g., the processor 250 of
As used in the present disclosure, the terms “module” or “component” may refer to specific hardware implementations configured to perform the actions of the module or component and/or software objects or software routines that may be stored on and/or executed by general purpose hardware (e.g., computer-readable media, processing devices, etc.) of the computing system. In some embodiments, the different components, modules, engines, and services described in the present disclosure may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). While some of the systems and methods described in the present disclosure are generally described as being implemented in software (stored on and/or executed by general purpose hardware), specific hardware implementations or a combination of software and specific hardware implementations are also possible and contemplated. In this description, a “computing entity” may be any computing system as previously defined in the present disclosure, or any module or combination of modules running on a computing system.
Terms used in the present disclosure and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including, but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes, but is not limited to,” etc.).
Additionally, if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations.
In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” or “one or more of A, B, and C, etc.” is used, in general such a construction is intended to include A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B, and C together, etc.
Further, any disjunctive word or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” should be understood to include the possibilities of “A” or “B” or “A and B.”
All examples and conditional language recited in the present disclosure are intended for pedagogical objects to aid the reader in understanding the present disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Although embodiments of the present disclosure have been described in detail, various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the present disclosure.