This application claims priority from Republic of Korea Patent Application No. 10-2023-0095277, filed on Jul. 21, 2023, which is hereby incorporated by reference in its entirety.
The present disclosure relates to an adversarial attack method and apparatus, and particularly to an adversarial attack method and apparatus for verifying vulnerabilities and improving the robustness of artificial intelligence models that process programming languages.
Recently, artificial intelligence technology that helps software developers by automatically processing and analyzing source code has been developed in various fields. These artificial intelligence models are vulnerable to adversarial attacks. Most conventional adversarial attack methods against these source code processing artificial intelligence models rely on sampling-based variable name changes, which may be inefficient because they require a large number of attempts and achieve a low attack success rate. In particular, the conventional methods may fail to completely preserve compilability, which is an important property of source code.
An aspect of the present disclosure is directed to providing a black box adversarial attack method and apparatus that shows a high attack success rate with few attempts when a model is attacked using an adversarial attack technique in a black box situation.
An adversarial attack method according to an embodiment of the present disclosure may include: selecting vulnerable positions of an original source code; acquiring open source codes based on an open source code set; selecting dissimilar codes among the open source codes based on dissimilarity of the open source codes; acquiring an attention score for each of the dissimilar codes; extracting a snippet from at least one of the dissimilar codes based on the attention scores; and generating an adversarial source code based on the at least one snippet.
The selection of the dissimilar codes among the open source codes based on dissimilarity of the open source codes may include: acquiring similarity between the open source codes; acquiring a dissimilarity score of each of the open source codes based on the similarity; and selecting some of the open source codes as the dissimilar codes based on the dissimilarity score.
The acquisition of the similarity between the open source codes may include acquiring cosine similarity between the open source codes.
The extraction of the snippet from at least one of the dissimilar codes based on the attention scores may include: extracting sentences from each of the dissimilar codes; and extracting the sentence extracted from the dissimilar code with the highest attention score among the extracted sentences as the snippet.
The generation of the adversarial source code based on the at least one snippet may include: generating a dead code by inserting the at least one snippet into a character string variable name; and generating the adversarial source code by inserting the dead code into the vulnerable positions of the original source code.
An adversarial source code generation apparatus according to an embodiment of the present disclosure may include a processor and a memory storing one or more instructions executed by the processor, wherein the one or more instructions may include: selecting vulnerable positions of an original source code; acquiring open source codes based on an open source code set; selecting dissimilar codes among the open source codes based on dissimilarity of the open source codes; acquiring an attention score for each of the dissimilar codes; extracting a snippet from at least one of the dissimilar codes based on the attention scores; and generating an adversarial source code based on the at least one snippet.
The selection of the dissimilar codes among the open source codes based on dissimilarity of the open source codes may include: acquiring similarity between the open source codes; acquiring a dissimilarity score of each of the open source codes based on the similarity; and selecting some of the open source codes as the dissimilar codes based on the dissimilarity score.
The acquisition of the similarity between the open source codes may include acquiring cosine similarity between the open source codes.
The extraction of the snippet from at least one of the dissimilar codes based on the attention scores may include: extracting sentences from each of the dissimilar codes; and extracting the sentence extracted from the dissimilar code with the highest attention score among the extracted sentences as the snippet.
The generation of the adversarial source code based on the at least one snippet may include: generating a dead code by inserting the at least one snippet into a character string variable name; and generating the adversarial source code by inserting the dead code into the vulnerable positions of the original source code.
An adversarial attack performance analysis method according to an embodiment of the present disclosure may include: receiving an original source code from an original source code provision apparatus; receiving an adversarial source code from an adversarial source code generation apparatus; and determining whether an artificial intelligence model has been attacked by the adversarial source code based on the original source code and the adversarial source code, wherein the adversarial source code may be generated by: selecting vulnerable positions of an original source code; acquiring open source codes based on an open source code set; selecting dissimilar codes among the open source codes based on dissimilarity of the open source codes; acquiring an attention score for each of the dissimilar codes; extracting a snippet from at least one of the dissimilar codes based on the attention scores; and generating the adversarial source code based on the at least one snippet.
The selection of the dissimilar codes among the open source codes based on dissimilarity of the open source codes may include: acquiring similarity between the open source codes; acquiring a dissimilarity score of each of the open source codes based on the similarity; and selecting some of the open source codes as the dissimilar codes based on the dissimilarity score.
The acquisition of the similarity between the open source codes may include acquiring cosine similarity between the open source codes.
The extraction of the snippet from at least one of the dissimilar codes based on the attention scores may include: extracting sentences from each of the dissimilar codes; and extracting the sentence extracted from the dissimilar code with the highest attention score among the extracted sentences as the snippet.
The generation of the adversarial source code based on the at least one snippet may include: generating a dead code by inserting the at least one snippet into a character string variable name; and generating the adversarial source code by inserting the dead code into the vulnerable positions of the original source code.
According to an embodiment of the present disclosure, vulnerabilities in an artificial intelligence model served in the real world can be efficiently identified through adversarial attacks in a black box situation where no information from the model is used.
According to an embodiment of the present disclosure, the robustness of the artificial intelligence model can be improved by adding the results of the adversarial attack to a learning data set and performing additional adversarial learning.
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the disclosure and together with the description serve to explain the principle of the disclosure. In the drawings:
Description will now be given in detail according to exemplary embodiments disclosed herein, with reference to the accompanying drawings. For the sake of brief description with reference to the drawings, the same or equivalent components may be denoted by the same reference numbers, and description thereof will not be repeated. In general, suffixes such as “module” and “unit” may be used to refer to elements or components. Use of such suffixes herein is merely intended to facilitate description of the specification, and the suffixes do not have any special meaning or function. In the present disclosure, that which is well known to one of ordinary skill in the relevant art has generally been omitted for the sake of brevity. The accompanying drawings are used to assist in easy understanding of various technical features and it should be understood that the embodiments presented herein are not limited by the accompanying drawings. As such, the present disclosure should be construed to extend to any alterations, equivalents, and substitutes in addition to those which are particularly set out in the accompanying drawings.
It will be understood that although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
It will be understood that when an element is referred to as being “electrically connected with” or “connected with” another element, there may be intervening elements present. In contrast, it will be understood that when an element is referred to as being “directly and electrically connected with” or “directly connected with” another element, there are no intervening elements present. A singular representation may include a plural representation unless context clearly indicates otherwise.
In the following description, it should be understood that terms such as “include” or “have” are intended to designate the presence of a property, a figure, a step, an operation, an element, a component, or a combination thereof, and do not preclude the presence or addition of one or more other properties, figures, steps, operations, elements, components, or combinations thereof.
Referring to
The original source code provision apparatus 100 may include an original source code. The original source code provision apparatus 100 may provide the original source code to the adversarial attack performance analysis apparatus 300.
The adversarial source code generation apparatus 200 may include the original source code. The adversarial source code generation apparatus 200 may generate an adversarial source code based on the original source code. The adversarial source code generation apparatus 200 may generate the adversarial source code based on the algorithm shown in Table 1 below.
Referring to Table 1, the adversarial source code generation apparatus 200 may define the positions between lines of the source code as candidate attack positions and determine which of those positions are vulnerable, extract, as a snippet, one line from a source code that is most dissimilar to the original source code, generate a dead code by placing the extracted snippet in an unused character string variable, and generate the adversarial source code by inserting the dead code into a vulnerable position. The adversarial source code generation apparatus 200 may provide the adversarial source code to the adversarial attack performance analysis apparatus 300.
The adversarial attack performance analysis apparatus 300 may receive the original source code from the original source code provision apparatus 100. The adversarial attack performance analysis apparatus 300 may receive the adversarial source code from the adversarial source code generation apparatus 200.
The adversarial attack performance analysis apparatus 300 may perform learning based on the original source code and the adversarial source code. The adversarial attack performance analysis apparatus 300 may input the original source code and the adversarial source code into the artificial intelligence model. The adversarial attack performance analysis apparatus 300 may compare an output value of the artificial intelligence model based on the original source code and an output value of the artificial intelligence model based on the adversarial source code. The adversarial attack performance analysis apparatus 300 may determine whether the artificial intelligence model has been attacked by the adversarial source code through comparison of output values and determine the robustness of the artificial intelligence model.
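The output comparison described above can be illustrated with a minimal sketch. It assumes that an attack counts as successful whenever the model's output for the adversarial source code differs from its output for the original source code; the names `is_attack_successful`, `attack_success_rate`, and `toy_model` are hypothetical and are used only for illustration.

```python
from typing import Callable, Sequence, Tuple


def is_attack_successful(model: Callable[[str], int],
                         original_code: str,
                         adversarial_code: str) -> bool:
    """Return True when the model output changes under the adversarial input."""
    return model(original_code) != model(adversarial_code)


def attack_success_rate(model: Callable[[str], int],
                        pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (original, adversarial) pairs whose outputs differ."""
    if not pairs:
        return 0.0
    successes = sum(is_attack_successful(model, o, a) for o, a in pairs)
    return successes / len(pairs)


def toy_model(code: str) -> int:
    """Hypothetical stand-in for the artificial intelligence model:
    predicts label 1 if the text contains "eval", otherwise label 0."""
    return 1 if "eval" in code else 0
```

A lower success rate over a set of such pairs may be read as an indication of a more robust model, under the assumption stated above.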
Referring to
The vulnerable position selection unit 210 may select vulnerable positions from the original source code. Herein, the vulnerable positions may be positions where a dead code may be inserted and may be positions that are targets of attack.
For example, the vulnerable position selection unit 210 may acquire a [CLS] expression vector for each vulnerable position of the original source code. Here, the [CLS] expression vector may be acquired from a pre-trained programming language (PL) model that has not been fine-tuned. The non-fine-tuned and pre-trained PL model may include, but is not limited to, a CodeBERT model.
The vulnerable position selection unit 210 may insert a sequence of [UNK] tokens into the vulnerable positions to acquire a changed [CLS] expression vector at each of the vulnerable positions. The [UNK] token may be a token representing unknown. The sequence of [UNK] tokens may be expressed as Equation 1 below.
In Equation 1, u may be a sequence of [UNK] tokens.
The vulnerable position selection unit 210 may acquire cosine similarity between the changed [CLS] expression vector and the [CLS] expression vector of the original source code at each of the vulnerable positions. The vulnerable position selection unit 210 may acquire a vulnerability score at each of the vulnerable positions based on the cosine similarity at each of the vulnerable positions. The vulnerable position selection unit 210 may sort the vulnerable positions based on the vulnerability score at each of the vulnerable positions. The vulnerable position selection unit 210 may transmit information about the vulnerable positions to the adversarial source code generation unit 240.
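The position ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_embed` is a hypothetical stand-in for the [CLS] expression vector of a pre-trained PL model, the vulnerability score is assumed to be one minus the cosine similarity, and the length of the [UNK] sequence (`unk_len`) is an arbitrary illustrative parameter.

```python
import math
import zlib
from typing import Callable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_embed(code: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for a [CLS] vector: a position-weighted
    bag of whitespace-separated tokens hashed into `dim` buckets."""
    vec = [0.0] * dim
    for idx, token in enumerate(code.split()):
        bucket = zlib.crc32(token.encode("utf-8")) % dim
        vec[bucket] += 1.0 / (1 + idx)
    return vec


def rank_vulnerable_positions(code: str,
                              embed: Callable[[str], Sequence[float]],
                              unk_len: int = 3) -> List[Tuple[int, float]]:
    """Insert a [UNK] sequence between each pair of lines and rank the
    positions by how far the embedding moves from the original one."""
    lines = code.splitlines()
    base = embed(code)
    unk_line = " ".join(["[UNK]"] * unk_len)
    scored = []
    for pos in range(len(lines) + 1):  # pos means "insert before line pos"
        perturbed = "\n".join(lines[:pos] + [unk_line] + lines[pos:])
        # Assumed score: lower similarity means a more vulnerable position.
        scored.append((pos, 1.0 - cosine_similarity(base, embed(perturbed))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

Calling `rank_vulnerable_positions(source, toy_embed)` returns `(position, score)` pairs sorted from most to least vulnerable; in practice `toy_embed` would be replaced by the [CLS] vector of the chosen PL model.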
The dissimilar code selection unit 220 may acquire open source codes from an open source code set. Herein, the open source codes may be source codes that are dissimilar to the original source code. The dissimilar code selection unit 220 may acquire a [CLS] expression vector for each of the open source codes. The dissimilar code selection unit 220 may acquire the cosine similarity between the [CLS] expression vectors of the open source codes. The dissimilar code selection unit 220 may acquire a dissimilarity score for each of the open source codes based on the cosine similarity. The dissimilar code selection unit 220 may extract the K open source codes with the highest dissimilarity scores among the open source codes. The dissimilar code selection unit 220 may transmit the K open source codes to the snippet extraction unit 230.
The snippet extraction unit 230 may receive the K open source codes from the dissimilar code selection unit 220. The snippet extraction unit 230 may acquire an attention score for each of the K open source codes. The snippet extraction unit 230 may acquire the attention score of each token in a statement from a second-to-last layer of the non-fine-tuned and pre-trained PL model. That is, the snippet extraction unit 230 may acquire the attention score of each token directly from the pre-trained PL model, without fine-tuning the model. The non-fine-tuned and pre-trained PL model may include, but is not limited to, the CodeBERT model.
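The top-K selection can be sketched as below. This is a minimal illustration, assuming that the dissimilarity score of each open source code is one minus its cosine similarity to the [CLS] expression vector of the original source code; the function name `select_dissimilar` and the use of precomputed vectors as inputs are hypothetical choices made for the example.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_dissimilar(original_vec: Sequence[float],
                      candidate_vecs: Sequence[Sequence[float]],
                      k: int) -> List[int]:
    """Return the indices of the k candidates least similar to the original."""
    # Assumed dissimilarity score: 1 - cosine similarity.
    scores = [1.0 - cosine_similarity(original_vec, v) for v in candidate_vecs]
    order = sorted(range(len(candidate_vecs)),
                   key=lambda i: scores[i], reverse=True)
    return order[:k]
```

The returned indices identify which open source codes would be passed on for snippet extraction.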
The snippet extraction unit 230 may extract sentences from each of the K open source codes. The snippet extraction unit 230 may select the sentence extracted from the open source code with the highest attention score among the extracted sentences as the snippet. The snippet extraction unit 230 may transmit the snippet to the adversarial source code generation unit 240.
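The statement selection described above can be sketched as follows. The description does not state how token-level attention scores are aggregated into a statement-level score, so this sketch assumes a simple mean; `statement_score` and `extract_snippet` are hypothetical names, and the attention values are supplied directly rather than read from a PL model.

```python
from typing import Sequence, Tuple


def statement_score(token_attention: Sequence[float]) -> float:
    """Assumed aggregation: mean attention over the tokens of a statement."""
    if not token_attention:
        return 0.0
    return sum(token_attention) / len(token_attention)


def extract_snippet(candidates: Sequence[Tuple[str, Sequence[float]]]) -> str:
    """Pick the statement with the highest aggregated attention score.

    `candidates` pools (statement text, per-token attention scores) pairs
    taken from the K dissimilar codes.
    """
    if not candidates:
        raise ValueError("no candidate statements")
    best = max(candidates, key=lambda item: statement_score(item[1]))
    return best[0]
```

Other aggregations, such as the sum or the maximum over tokens, would fit the same interface and may rank statements differently.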
The adversarial source code generation unit 240 may receive information about the vulnerable positions from the vulnerable position selection unit 210. The adversarial source code generation unit 240 may receive the snippet from the snippet extraction unit 230. The adversarial source code generation unit 240 may generate the dead code based on the snippet. The adversarial source code generation unit 240 may generate the dead code by inserting the snippet into an unused character string variable name. The adversarial source code generation unit 240 may generate an adversarial dead code based on the algorithm shown in Table 2 below.
← {c_n | c_n ∈ train/test data, n ≠ i}
← {c_1, . . . , c_k}
← {att_T(tok_1^c), . . . , att_T(tok_n^c)}
← [(1, a), (a + 1, b), . . . , (n + 1, m)]
← {d_1, . . . , d_k}
The adversarial source code generation unit 240 may generate the adversarial source code based on the information about vulnerable positions and dead code. The adversarial source code generation unit 240 may generate the adversarial source code by inserting the dead code into the vulnerable positions.
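Dead code generation and insertion can be sketched as below. This is a minimal illustration, assuming that the snippet is stored as the value of an unused string variable in Java-style syntax and that special characters are escaped so the string literal stays well formed; `make_dead_code`, `insert_dead_code`, and the default variable name `unused_str` are hypothetical.

```python
def make_dead_code(snippet: str, var_name: str = "unused_str") -> str:
    """Wrap a snippet in an unused string variable declaration.

    Backslashes, double quotes, and newlines are escaped so that the
    snippet cannot terminate the string literal early.
    """
    escaped = (snippet.replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n"))
    return f'String {var_name} = "{escaped}";'


def insert_dead_code(original_code: str, dead_code: str, position: int) -> str:
    """Insert the dead code before line `position` (0 = top of the code)."""
    lines = original_code.splitlines()
    if not 0 <= position <= len(lines):
        raise ValueError("position out of range")
    return "\n".join(lines[:position] + [dead_code] + lines[position:])
```

Because the declared variable is never read, the inserted line does not change program behavior. Whether the result still compiles also depends on the chosen position being one where a declaration is allowed and on the variable name not clashing with an existing identifier, which this sketch does not check.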
The programming language may be Java, C/C++, or Python. MHM and ALERT may be conventional adversarial attack methods applied to the victim model, and DIP may be an adversarial attack method according to an embodiment of the present disclosure, and may correspond to the adversarial source code generation apparatus 200 of
Referring to
Referring to
The adversarial source code generation apparatus may acquire open source codes (S520). The adversarial source code generation apparatus may acquire open source codes from an open source code set. Herein, the open source codes may be source codes that are dissimilar to the original source code.
The adversarial source code generation apparatus may select dissimilar codes among the open source codes (S530). The adversarial source code generation apparatus may acquire similarity between the open source codes. The adversarial source code generation apparatus may acquire dissimilarity scores of the open source codes based on similarity. The adversarial source code generation apparatus may select K open source codes with high dissimilarity scores among the open source codes as dissimilar codes.
The adversarial source code generation apparatus may extract a snippet from at least one of the dissimilar codes (S540). The adversarial source code generation apparatus may acquire an attention score for each of the dissimilar codes and extract the snippet from the at least one of the dissimilar codes based on the attention scores.
The adversarial source code generation apparatus may extract sentences from each of the dissimilar codes, and select, as the snippet, the sentence extracted from the dissimilar code with the highest attention score among the extracted sentences.
The adversarial source code generation apparatus may generate an adversarial source code based on the snippet (S550). The adversarial source code generation apparatus may generate a dead code by inserting at least one snippet into a character string variable name, and generate an adversarial source code by inserting the dead code into vulnerable positions in the original source code.
The adversarial source code generation apparatus 400 of
The processor 410 may execute a program command stored in at least one of the memory 420 and the storage apparatus 460. The processor 410 may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to the embodiments of the present disclosure are performed. The memory 420 and the storage apparatus 460 may each include at least one of a volatile storage medium and a non-volatile storage medium. For example, the memory 420 may include at least one of a read only memory (ROM) and a random access memory (RAM).
Although most terms used in the present disclosure have been selected from general ones widely used in the art, some terms have been arbitrarily selected by the applicant and their meanings are explained in detail in the following description as needed. Thus, the present disclosure should be understood based upon the intended meanings of the terms rather than their simple names or meanings.
It is apparent to those skilled in the art that the present disclosure may be embodied in other specific forms without departing from essential characteristics of the present disclosure. Accordingly, the aforementioned detailed description should not be construed as restrictive in any respect and should be considered illustrative. The scope of the present disclosure should be determined by reasonable interpretation of the appended claims, and all modifications within an equivalent scope of the present disclosure are included in the scope of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
10-2023-0095277 | Jul 2023 | KR | national |