The present invention relates to a vulnerability detection device, a vulnerability detection method, and a vulnerability detection program.
One of the fundamental causes of cyber attacks and malware infections is a vulnerability that exists in software. Attackers perform malicious acts on a computer through malware and attack code that exploits a vulnerability. In order to prevent such attacks beforehand, it is important to detect and correct a vulnerability before attackers exploit it, so that attackers cannot use the vulnerability as a foothold for attacks.
Under these circumstances, research is being carried out on techniques for testing software and detecting a vulnerability that exists in the software. One technique for detecting a vulnerability that exists in software is a vulnerability detection technique using a code clone.
A code clone is code that is similar or identical to other code in software. A code clone arises when a software developer, in order to realize a program having a specific function, copies and pastes the source code of another program having a similar function. Here, when a vulnerability is detected in the source code of a copy source, it is necessary to correct not only the source code of the copy source but also the source code of every copy destination. However, even if the vulnerability is detected in the copy source, it is difficult to correct the vulnerability propagated by the code clone unless the developer knows all of the code clones of the vulnerability portion. The vulnerability detection technique using the code clone detects an unknown vulnerability in software to be tested by detecting, in that software, a code clone of a portion in which a vulnerability has already been detected.
As a vulnerability detection technique using the code clone, there is a method using a source code of software (see Non Patent Literature 1 and Non Patent Literature 2). In this method, a code clone of a vulnerability portion contained in software to be tested is detected by extracting the source code of the vulnerability portion from the software where vulnerability is detected in the past and by testing the source code of the software to be tested.
However, there is no technique for detecting a vulnerability using the code clone that takes the program code of the software, rather than its source code, as the test target. In other words, the source code of the software to be tested is required in order to detect a vulnerability of the software using the code clone. Therefore, for software whose source code is difficult to obtain or use (for example, privately owned software or software subject to exclusive rights), it is difficult to detect an unknown vulnerability.
It is therefore an object of the present invention to solve the problems and to detect an unknown vulnerability even when there is no source code of software to be tested.
To solve the problems, the present invention is a vulnerability detection device including: an extracting unit that extracts a first program code corresponding to an uncorrected vulnerability portion of software; a normalization processing unit that normalizes a parameter varying depending on compilation environment, among parameters included in the first program code extracted by the extracting unit and in a second program code of software as a target to be tested for the vulnerability portion; a similarity calculating unit that calculates a first similarity which is a similarity of an arbitrary portion of the second program code after normalization as a comparison target to the first program code; a determining unit that refers to vulnerability related information for a portion of the second program code in which the calculated first similarity exceeds a predetermined threshold, and that determines whether the portion of the second program code is an unknown vulnerability portion; and an output unit that outputs the portion of the second program code determined as the unknown vulnerability portion.
According to the present invention, even when there is no source code of software to be tested, it is possible to detect an unknown vulnerability.
Exemplary embodiments of the present invention will be explained below with reference to the accompanying drawings. The present invention is not limited by the embodiments explained below.
First of all, a configuration of a vulnerability detection device 10 will be explained below with reference to
The vulnerability detection device 10 includes a vulnerability related DB 11, a disassembling unit 12, a vulnerability portion extracting unit 13, a normalization processing unit 14, a similarity calculating unit 15, a determining unit 16, and an output unit 17.
The vulnerability related DB 11 stores vulnerability related information. The vulnerability related information is, for example, an attack verification code, CVE (Common Vulnerabilities and Exposures) information, a security patch related to the vulnerability, and a patch-applied (corrected) program code.
The disassembling unit 12 disassembles software. For example, the disassembling unit 12 disassembles input test target software and software with an uncorrected vulnerability.
The vulnerability portion extracting unit 13 extracts the program code of the vulnerability portion from the disassembly result of the software. For example, when receiving the disassembly result of the software with the uncorrected vulnerability from the disassembling unit 12, the vulnerability portion extracting unit 13 refers to the vulnerability related information in the vulnerability related DB 11 to extract the program code of the vulnerability portion from the disassembly result.
To give a specific example, when using an attack verification code included in the vulnerability related information, the vulnerability portion extracting unit 13 executes the attack verification code against the disassembly result of the software with the uncorrected vulnerability, and extracts the portion serving as the starting point of the attack as the program code of the vulnerability portion. Alternatively, when using the CVE information included in the vulnerability related information, the vulnerability portion extracting unit 13 refers to a CVEDB (Common Vulnerabilities and Exposures Data Base) and extracts, from the disassembly result of the software with the uncorrected vulnerability, the portion specified based on the information for the software as the program code of the vulnerability portion.
The normalization processing unit 14 performs a normalization process on the program code. The normalization process is a process of abstracting a portion (e.g., a type of register, a value of a memory address to be accessed, and a variable parameter such as an immediate value) that varies depending on compilation environment, of the program code obtained by disassembling.
For example, the normalization processing unit 14 acquires a program code of an uncorrected vulnerability portion from the vulnerability portion extracting unit 13, and acquires a disassembly result (program code of the test target software) of the test target software from the disassembling unit 12. The normalization processing unit 14 performs the normalization process on the program code of the uncorrected vulnerability portion and on the program code of the test target software.
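The normalization process described above can be sketched as follows. This is a minimal illustration only: the regular expressions cover just a subset of x86 register names in Intel-style disassembly text, and the placeholder tokens REG, MEM, and VAL as well as the function names are assumptions introduced here, not part of the embodiment.

```python
import re
from typing import List

# Hypothetical patterns for a subset of x86 operands (Intel syntax, lowercased).
REGISTER = re.compile(
    r"\b(?:r(?:[abcd]x|[sd]i|[sb]p|\d+[dwb]?)|e?(?:[abcd]x|[sd]i|[sb]p)|[abcd][lh])\b"
)
MEMORY = re.compile(r"\[[^\]]*\]")            # any bracketed memory operand
IMMEDIATE = re.compile(r"\b(?:0x[0-9a-f]+|\d+)\b")  # hex or decimal constant


def normalize_instruction(line: str) -> str:
    """Abstract the parts of one disassembled instruction that vary by build."""
    text = " ".join(line.lower().split())     # lowercase and collapse whitespace
    text = MEMORY.sub("MEM", text)            # memory operands first, brackets and all
    text = REGISTER.sub("REG", text)          # then register names
    text = IMMEDIATE.sub("VAL", text)         # then remaining immediate values
    return text


def normalize(code: List[str]) -> List[str]:
    """Normalize a whole disassembly listing, one instruction per element."""
    return [normalize_instruction(line) for line in code]


if __name__ == "__main__":
    print(normalize(["mov eax, [ebp+0x8]", "ADD ESP, 0x10", "mov ecx, 5"]))
```

Under these assumed rules, two instructions that differ only in the register or constant chosen by the compiler, such as `mov ecx, 5` and `mov edx, 7`, normalize to the same string and therefore compare as equal in the later similarity calculation.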
To give a specific example, as illustrated in
The similarity calculating unit 15 calculates a similarity of an arbitrary portion of the program code of the test target software after normalization as a comparison target to the program code of the uncorrected vulnerability portion after normalization.
For example, as illustrated in
Regarding a portion (e.g., a portion indicated by reference sign 301 of
The output unit 17 outputs the portion determined as the unknown vulnerability portion by the determining unit 16, as a candidate for the unknown vulnerability portion.
Similarity Calculating Unit
Details of the processing performed by the similarity calculating unit 15 will be explained next with reference to
Herein, it is assumed that the length of A is |A|=M, the length of B is |B|=N, A=a_1^M=a_1, a_2, a_3, . . . , a_M, and B=b_1^N=b_1, b_2, b_3, . . . , b_N. The score can be calculated by applying a technique called “affine gap” (see Non Patent Literature 5), which distinguishes the penalty according to the position within an inserted or deleted portion of a character string, to Needleman-Wunsch (see Non Patent Literature 4), which is a similar-string search algorithm based on dynamic programming, and further by changing the portion over which the score is calculated. Then, when the score between A and B is F (A, B), the similarity calculating unit 15 can calculate a similarity between A and B by calculating F (A, B)/F (A, A).
Specific processing content for score calculation will be explained below. First of all, the similarity calculating unit 15 calculates each element of three score matrices, X={x_ij|0≤i≤M, 0≤j≤N}, Y={y_ij|0≤i≤M, 0≤j≤N}, and Z={z_ij|0≤i≤M, 0≤j≤N}, between A and B by the following Equation (1) to Equation (3). The score matrix X is a matrix for managing a match/mismatch score between A and B. The score matrix Y is a matrix for managing a gap score of insertion in B. Moreover, the score matrix Z is a matrix for managing a gap score of deletion in A.
Although scores of match (character strings match) and mismatch (character strings do not match) in Equation (1) can be arbitrarily set, it is preferable that match (first value)>mismatch (second value) and values of |match| and |mismatch| are not too far apart from each other. In Equation (2) and Equation (3), o (open gap) is a starting score of gap (insertion or deletion of character string), and e (extended gap) is a continuing score of gap. Although the scores of o (third value) and e (fourth value) can be arbitrarily set, it is preferable that the values are e>mismatch, e>o, o<mismatch, e<0, and (mismatch×2)<(e+o). The reason will be explained later.
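The preferred relationships among the four score values listed above can be checked mechanically. The following sketch simply encodes those inequalities; the function name is hypothetical, and the example values (+2, −2, −3, −0.5) are the ones used later in this description.

```python
def satisfies_preferred_relations(match: float, mismatch: float,
                                  o: float, e: float) -> bool:
    """Return True when the score values meet every preferred inequality."""
    return (
        match > mismatch           # a match scores higher than a mismatch
        and e > mismatch           # continuing a gap costs less than a mismatch
        and e > o                  # continuing a gap costs less than opening one
        and o < mismatch           # opening a gap costs more than a mismatch
        and e < 0                  # continuing a gap is still a penalty
        and mismatch * 2 < e + o   # a two-element gap beats two mismatches
    )


if __name__ == "__main__":
    print(satisfies_preferred_relations(2, -2, -3, -0.5))
```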
For example, as illustrated in
That is, the similarity calculating unit 15 sets the score for a portion of a character string the same as the character string of A among the character strings of B to “match=+2”, and sets the score for a portion of a character string different from the character string of A among the character strings of B to “mismatch=−2”.
When there is a section in B in which a character string different from A is inserted or a section in B (section where a gap occurs) in which character strings of A are partially deleted, the similarity calculating unit 15 sets the score for a character string at the starting point of the section to “o=−3”, and sets the score for a character string at the continuing point of the section to “e=−0.5”.
For example, when the similarity calculating unit 15 compares the portion indicated by the reference sign 302 of B illustrated in
Although the description is omitted herein, when there is a character string different from the character string of A in the portion indicated by the reference sign 302 of B, the similarity calculating unit 15 subtracts “2” for the character string (mismatch=−2), and when there is a section in B in which the character strings of A are partially deleted, the similarity calculating unit 15 subtracts “3” for the character string at the starting point of the section (o=−3) and subtracts “0.5” for the respective character strings at the continuing points of the section (e=−0.5).
In calculating the score described above, by using values of e>o and (mismatch×2)<(e+o), such as o=−3 and e=−0.5, a score to which the insertion or the deletion is reflected can be calculated when there is a section in B in which a part of A is inserted or deleted (section where a gap occurs).
For example, when the similarity calculating unit 15 calculates respective scores as “−2” by determining all the character strings of the section indicated by the reference sign 302 in
Moreover, by using values satisfying e>o, such as o=−3 and e=−0.5, the similarity calculating unit 15 can prevent a large difference from occurring in the values of the score depending on the length of the section where the gap occurs in B. For example, when the length of the section where the gap occurs in B is “2”, the score is “−3+(−0.5)=−3.5”. On the other hand, when the length of the section where the gap occurs in B is “5”, the score is “−3+(−0.5)×4=−5”. Therefore, between the case in which the length of the section where the gap occurs in B is “2” and the case in which it is “5”, the difference between the respective scores is kept to “1.5”.
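The gap arithmetic above can be reproduced with a short sketch. The function names are hypothetical; the formula is the one used in the worked figures, namely the open-gap score for the first element of the section plus the extended-gap score for each remaining element.

```python
def gap_score(length: int, o: float = -3.0, e: float = -0.5) -> float:
    """Score of one gap section: o for its first element, e for each further one."""
    if length <= 0:
        return 0.0
    return o + e * (length - 1)


def all_mismatch_score(length: int, mismatch: float = -2.0) -> float:
    """Score if every element of the same section were counted as a mismatch."""
    return mismatch * length


if __name__ == "__main__":
    for n in (2, 5):
        print(n, gap_score(n), all_mismatch_score(n))
```

With these values, the gap score changes by 1.5 between lengths 2 and 5, whereas scoring the same sections as mismatches only would change the score by 6.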
The similarity calculating unit 15 uses the three score matrices calculated using the method to calculate F (A, B)/F (A, A) based on a maximum score point jmax obtained by the following Equation (4).
For example, when the similarity calculating unit 15 uses the Equation (1) to Equation (3) targeting A and B illustrated in
To search B for further portions similar to A, the similarity calculating unit 15 executes the same processing on any section of B other than the section for which the maximum score point jmax (e.g., 18.5) was obtained in the previous similarity calculation, calculates the maximum score point jmax for that section, and calculates F (A, B)/F (A, A). In this way, the similarity calculating unit 15 can calculate the similarity of any portion in B to A. The calculation result is stored in a predetermined area of a storage (not illustrated) of the vulnerability detection device 10 and is read when the determining unit 16 performs determination processing.
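The score and similarity calculation can be sketched as follows. Because Equation (1) to Equation (4) are not reproduced in this text, the recurrences, the boundary conditions (an alignment may begin at any position of B at no cost), and the choice of F (A, B) as the largest value in the last row of X are assumptions inferred from the surrounding description and from the pointer sets of Equation (5). The function names are hypothetical, and each element of A and B is assumed to be one normalized instruction.

```python
from typing import Sequence, Tuple

NEG_INF = float("-inf")


def alignment_score(a: Sequence[str], b: Sequence[str], match: float = 2.0,
                    mismatch: float = -2.0, o: float = -3.0,
                    e: float = -0.5) -> Tuple[float, int]:
    """Return (F(A, B), j_max): best score of all of A against a portion of B."""
    m, n = len(a), len(b)
    # Row 0 of X is all zeros, so the matched portion may start anywhere in B.
    x = [[0.0] * (n + 1) for _ in range(m + 1)]
    y = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    z = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        # Column 0: the first i elements of A are aligned against nothing.
        y[i][0] = max(x[i - 1][0] + o, y[i - 1][0] + e)
        x[i][0] = y[i][0]
        for j in range(1, n + 1):
            y[i][j] = max(x[i - 1][j] + o, y[i - 1][j] + e)  # open or extend (up)
            z[i][j] = max(x[i][j - 1] + o, z[i][j - 1] + e)  # open or extend (left)
            s = match if a[i - 1] == b[j - 1] else mismatch
            x[i][j] = max(x[i - 1][j - 1] + s, y[i][j], z[i][j])
    j_max = max(range(n + 1), key=lambda j: x[m][j])
    return x[m][j_max], j_max


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity of the best-matching portion of B to A: F(A, B) / F(A, A)."""
    if not a:
        raise ValueError("A must not be empty")
    return alignment_score(a, b)[0] / alignment_score(a, a)[0]


if __name__ == "__main__":
    a = ["push REG", "mov REG, REG", "sub REG, VAL", "call MEM"]
    b = ["nop", "push REG", "mov REG, REG", "nop", "sub REG, VAL", "call MEM"]
    print(similarity(a, b))
```

Dividing by F (A, A) normalizes the score so that an exact copy of A inside B yields a similarity of 1, regardless of the length of A.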
Determining Unit
The processing performed by the determining unit 16 will be explained in detail next with reference to
Specifically, at first, the determining unit 16 reads the calculation results of the similarities of portions of the test target program code by the similarity calculating unit 15 from the storage (not illustrated), and determines whether the similarity (Sim1) of each portion of the test target program code to the program code of the uncorrected vulnerability portion exceeds a predetermined threshold (S1). Here, when there is any portion in the test target program code, in which the similarity (Sim1) to the program code of the uncorrected vulnerability portion exceeds the predetermined threshold (Yes at S1), the determining unit 16 calculates a similarity (Sim2) of the portion to the program code of a corrected vulnerability portion (S2). The calculation of the similarity herein should be performed, for example, by using the same method as the similarity calculation performed in the similarity calculating unit 15, and, for the program code of the corrected vulnerability portion, for example, the information for patch applied program code included in the vulnerability related information in the vulnerability related DB 11 is referred to. On the other hand, when it is determined that there is no portion, in the test target program code, in which the similarity (Sim1) to the program code of the uncorrected vulnerability portion exceeds the predetermined threshold (No at S1), the determining unit 16 ends the process.
After S2, the determining unit 16 compares the similarity (Sim2) of the portion to the program code of the corrected vulnerability portion calculated at S2 with the similarity (Sim1) thereof to the program code of the uncorrected vulnerability portion, and ends the process when it is determined that Sim2>Sim1 (Yes at S3). That is, when the portion is more similar to the program code of the corrected vulnerability portion than to the program code of the uncorrected vulnerability portion, the determining unit 16 ends the process. On the other hand, when it is determined that Sim2≤Sim1 (No at S3), the determining unit 16 determines the portion as a candidate for an unknown vulnerability portion (S4). In other words, when the similarity (Sim1) of the portion to the program code of the uncorrected vulnerability portion is not less than the similarity (Sim2) to the program code of the corrected vulnerability portion calculated at S2, the determining unit 16 determines the portion as a candidate for the unknown vulnerability portion. Conversely, the determining unit 16 regards a portion for which Sim2 is greater than Sim1 as more likely to be a known vulnerability portion that has already been corrected, and excludes the portion from the candidates for the unknown vulnerability portion.
Which portion of the test target program code the portion corresponds to can be identified by starting at the maximum score point jmax calculated by the similarity calculating unit 15 and tracing back, to i=1, the order in which elements were selected in the calculation formulas of the matrices (score matrices X, Y, and Z) on the way to jmax. This operation is called “traceback”. In the traceback, the process steps back to the previous element from which the element currently in focus was calculated, that is, any one of (i−1, j−1), (i−1, j), and (i, j−1). Specifically, in order to perform the traceback, the similarity calculating unit 15 separately creates pointer matrices P, Q, and R indicated in the following Equation (5), which hold the selection order for the respective score matrices when the three score matrices X, Y, and Z are calculated, and stores the created pointer matrices in the storage (not illustrated). Each pointer holds the type of the matrix containing the element used for calculating the current element and the location of that element.
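The determination flow of S1 to S4 can be summarized in a short sketch. The function name and the callable used to obtain Sim2 are hypothetical; Sim2 is passed as a callable so that, as in the flow, it is calculated only for a portion whose Sim1 exceeds the threshold. The threshold value is left to the caller because the description does not fix one.

```python
from typing import Callable


def is_unknown_vulnerability_candidate(sim1: float,
                                       compute_sim2: Callable[[], float],
                                       threshold: float) -> bool:
    """Apply S1 to S4 to one portion of the test target program code."""
    # S1: Sim1 must exceed the predetermined threshold.
    if not sim1 > threshold:
        return False
    # S2: similarity to the program code of the corrected vulnerability portion.
    sim2 = compute_sim2()
    # S3: a portion closer to the corrected code is excluded.
    if sim2 > sim1:
        return False
    # S4: otherwise the portion is a candidate for an unknown vulnerability portion.
    return True


if __name__ == "__main__":
    print(is_unknown_vulnerability_candidate(0.9, lambda: 0.7, threshold=0.8))
```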
P = {p_ij ∈ {↖x, ⋅y, ⋅z} | 1≤i≤M, 1≤j≤N}
Q = {q_ij ∈ {↑x, ↑y} | 1≤i≤M, 1≤j≤N}
R = {r_ij ∈ {←x, ←z} | 1≤i≤M, 1≤j≤N} (5)
where
↖: (i−1, j−1)
↑: (i−1, j)
←: (i, j−1)
⋅: (i, j)
The similarity calculating unit 15 calculates each element of the three pointer matrices using the following Equation (6) to Equation (8).
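A traceback over pointer matrices of this kind can be sketched as follows. Equation (6) to Equation (8) are not reproduced in this text, so the way the pointers are filled here, the tie-breaking order, the boundary conditions, and the function name are all assumptions; each pointer stores only the source matrix ("x", "y", or "z"), and the move is implied by the matrix being read, following the pointer sets of Equation (5).

```python
from typing import Sequence, Tuple

NEG_INF = float("-inf")


def locate_similar_portion(a: Sequence[str], b: Sequence[str],
                           match: float = 2.0, mismatch: float = -2.0,
                           o: float = -3.0, e: float = -0.5) -> Tuple[int, int]:
    """Return (start, end) so that b[start:end] is the portion aligned with A."""
    m, n = len(a), len(b)
    x = [[0.0] * (n + 1) for _ in range(m + 1)]
    y = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    z = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    p = [["x"] * (n + 1) for _ in range(m + 1)]  # X: "x" = diagonal, "y"/"z" = same cell
    q = [["x"] * (n + 1) for _ in range(m + 1)]  # Y: one row up, from "x" or "y"
    r = [["x"] * (n + 1) for _ in range(m + 1)]  # Z: one column left, from "x" or "z"
    for i in range(1, m + 1):
        for j in range(n + 1):
            open_y, ext_y = x[i - 1][j] + o, y[i - 1][j] + e
            y[i][j], q[i][j] = (open_y, "x") if open_y >= ext_y else (ext_y, "y")
            if j == 0:
                # Column 0 can only be reached by gapping elements of A.
                x[i][0], p[i][0] = y[i][0], "y"
                continue
            open_z, ext_z = x[i][j - 1] + o, z[i][j - 1] + e
            z[i][j], r[i][j] = (open_z, "x") if open_z >= ext_z else (ext_z, "z")
            s = match if a[i - 1] == b[j - 1] else mismatch
            best, src = x[i - 1][j - 1] + s, "x"   # ties keep the diagonal move
            if y[i][j] > best:
                best, src = y[i][j], "y"
            if z[i][j] > best:
                best, src = z[i][j], "z"
            x[i][j], p[i][j] = best, src
    j_max = max(range(n + 1), key=lambda col: x[m][col])
    # Traceback from (M, j_max) in X until row 0 is reached.
    i, j, current = m, j_max, "x"
    while i > 0:
        if current == "x":
            if p[i][j] == "x":
                i, j = i - 1, j - 1        # diagonal step
            else:
                current = p[i][j]          # same cell, switch to Y or Z
        elif current == "y":
            current = q[i][j]
            i -= 1                         # step up
        else:
            current = r[i][j]
            j -= 1                         # step left
    return j, j_max


if __name__ == "__main__":
    a = ["push REG", "mov REG, REG", "sub REG, VAL", "call MEM"]
    b = ["nop"] + a + ["ret"]
    print(locate_similar_portion(a, b))
```

The column at which the traceback reaches row 0 marks where the matched portion begins in B, and jmax marks where it ends.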
According to the vulnerability detection device 10 as explained above, it is possible to detect a candidate for an unknown vulnerability portion using the code clone from the test target program code.
Moreover, the vulnerability detection device 10 explained in the embodiment can be implemented by installing a vulnerability detection program that executes the processing into a desired information processing device (computer). For example, by causing the information processing device to execute the vulnerability detection program provided as package software or online software, the information processing device can be made to function as the vulnerability detection device 10. The information processing device mentioned here includes a desktop personal computer and a notebook personal computer. In addition, the information processing device includes mobile communication terminals such as a smartphone, a mobile phone, and a PHS (Personal Handyphone System), and further includes slate terminals such as a PDA (Personal Digital Assistant). Moreover, the vulnerability detection device 10 may be implemented as a Web server or a cloud.
Programs
The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. For example, a removable storage medium such as a magnetic disk or an optical disk is inserted in the disk drive 1100. The serial port interface 1050 is connected with, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected with, for example, a display 1130.
As illustrated in
The vulnerability detection program is stored in the hard disk drive 1090 as, for example, the program module 1093 in which instructions executed by the computer 1000 are written. Specifically, the program module 1093, in which the processes executed by the vulnerability detection device 10 explained in the embodiment are written, is stored in the hard disk drive 1090.
The data used for information processing by the vulnerability detection program is stored, for example, in the hard disk drive 1090 as the program data 1094. The CPU 1020 loads the program module 1093 or the program data 1094 stored in the hard disk drive 1090 into the RAM 1012 as needed, and executes the procedures.
The program module 1093 and the program data 1094 according to the vulnerability detection program are not limited to the case where both are stored in the hard disk drive 1090, and may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 according to the vulnerability detection program may be stored in another computer connected via a network such as a LAN (Local Area Network) or a WAN (Wide Area Network) and may be read by the CPU 1020 via the network interface 1070.
Number | Date | Country | Kind |
---|---|---|---|
2015-201165 | Oct 2015 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2016/077738 | 9/20/2016 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2017/061270 | 4/13/2017 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
6282698 | Baker et al. | Aug 2001 | B1 |
7284273 | Szor | Oct 2007 | B1 |
8819856 | Tiffe et al. | Aug 2014 | B1 |
20150058984 | Shen | Feb 2015 | A1 |
20170286692 | Nakajima et al. | Oct 2017 | A1 |
Number | Date | Country |
---|---|---|
3 159 823 | Apr 2017 | EP |
2009-193161 | Aug 2009 | JP |
2011-86147 | Apr 2011 | JP |
Entry |
---|
Extended European Search Report dated Mar. 1, 2019, in Patent Application No. 16853424.6, 8 pages. |
International Search Report dated Nov. 1, 2016 in PCT/JP2016/077738 filed Sep. 20, 2016. |
Jiyong Jang et al., “ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions”, IEEE Symposium on Security and Privacy, 2012, 15 total pages. |
Hongzhe Li et al., “A Scalable Approach for Vulnerability Discovery Based on Security Patches”, Application and Techniques in Information Security, 2014, 14 total pages. |
Andreas Saebjoernsen et al., “Detecting Code Clones in Binary Executables”, Proceedings of ISSTA '09, 2009, pp. 1-11. |
Saul B. Needleman et al., “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins”, Journal of Molecular Biology, 1970, vol. 48, pp. 443-453. |
Osamu Gotoh, “An Improved Algorithm for Matching Biological Sequences”, Journal of Molecular Biology, 1982, vol. 162, pp. 705-708. |
Jannik Pewny et al., “Cross-Architecture Bug Search in Binary Executables”, 2015 IEEE Symposium on Security and Privacy, May 18, 2015, pp. 709-724. |
Number | Date | Country | |
---|---|---|---|
20180225460 A1 | Aug 2018 | US |