Detecting exploitable paths in application software that uses third-party libraries

Information

  • Patent Grant
  • Patent Number
    11,836,258
  • Date Filed
    Thursday, July 22, 2021
  • Date Issued
    Tuesday, December 5, 2023
  • Inventors
  • Original Assignees
  • Examiners
    • Li; Meng
  • Agents
    • KLIGLER & ASSOCIATES PATENT ATTORNEYS LTD
Abstract
A method for software code analysis includes receiving source code of an application program, which includes one or more calls from respective entry points in the source code to a library program. The source code is automatically analyzed in order to generate a first data flow graph (DFG), representing a flow of data to be engendered upon running the application program. One or more vulnerabilities are identified in the library program. The library program is automatically analyzed to generate a second DFG linking at least one of the entry points in the source code to at least one of the vulnerabilities. The first DFG is combined with the second DFG in order to track the flow of data from the application program to the at least one of the vulnerabilities and to report at least one of the vulnerabilities as being exploitable.
Description
FIELD OF THE INVENTION

The present invention relates generally to protecting against security vulnerabilities in computer programs, and particularly to methods, apparatus and software for evaluating application security.


BACKGROUND

Various techniques are known in the art for testing and protecting software applications against security vulnerabilities. A “vulnerability” in this context is a flaw or weakness in the application program that can be exploited by an unauthorized party (also referred to as an attacker) to gain access to secure information or otherwise modify the behavior of the program. For example, static application security testing (SAST) techniques are typically applied in order to detect security vulnerabilities in source code before the code is compiled and run.


Methods for finding vulnerabilities using SAST are described, for example, in U.S. Pat. No. 9,128,728, whose disclosure is incorporated herein by reference. This patent describes a tool that automatically analyzes source code for application-level vulnerabilities. Operation of the tool is based on static analysis, but it makes use of a variety of techniques, for example methods of dealing with obfuscated code.


The Common Vulnerabilities and Exposures (CVE) system provides a database of known vulnerabilities in publicly-available software packages. The Mitre Corporation maintains the system with funding from the United States Department of Homeland Security. The system assigns a CVE Identifier (also known as a CVE number) to serve as a unique, common identifier for each vulnerability that is reported to it.


SUMMARY

Embodiments of the present invention that are described hereinbelow provide improved methods, apparatus and software for detection of vulnerabilities in software code.


There is therefore provided, in accordance with an embodiment of the invention, a method for software code analysis, which includes receiving, into a memory of a computer, source code of an application program, which includes one or more calls from respective entry points in the source code to at least one library program. The source code is automatically analyzed in the computer in order to generate a first data flow graph (DFG), representing a flow of data to be engendered upon running the application program. One or more vulnerabilities are identified in the at least one library program. The at least one library program is automatically analyzed to generate at least one second DFG linking at least one of the entry points in the source code to at least one of the vulnerabilities. The first DFG is combined with the at least one second DFG in order to track the flow of data from the application program to the at least one of the vulnerabilities. It is reported, responsively to the tracked flow, that the at least one of the vulnerabilities is exploitable.


In a disclosed embodiment, identifying the one or more vulnerabilities includes looking up the at least one library program in a database of known vulnerabilities. Additionally or alternatively, the one or more calls from the respective entry points in the source code include invocations of methods in an application program interface (API) of the at least one library program. Further additionally or alternatively, the method includes identifying at least one of the library calls by detecting an unresolved method in the source code.


In some embodiments, identifying the one or more vulnerabilities includes identifying a vulnerable parameter in the at least one library program, and automatically analyzing the at least one library program includes finding a data flow path from the at least one of the entry points to the vulnerable parameter in the at least one library program. In a disclosed embodiment, combining the first DFG with the at least one second DFG includes tracking the flow of data through the first DFG and the at least one second DFG to the vulnerable parameter, and the at least one of the vulnerabilities is reported to be exploitable when the tracked flow links an input of the application program to the vulnerable parameter. Additionally or alternatively, the at least one of the vulnerabilities is reported as not being exploitable when the tracked flow indicates that the one or more calls from the application program to the at least one library program are not accessible via a data flow from inputs of the application program.


In one embodiment, the at least one library program includes a first library program, which includes a call to a second library program, and identifying the one or more vulnerabilities includes identifying a vulnerability in the second library program, and automatically analyzing the at least one library program includes generating the at least one second DFG so as to track the flow of data through the first library program to the vulnerability in the second library program.


There is also provided, in accordance with an embodiment of the invention, apparatus for software code analysis, including a memory configured to receive source code of an application program, which includes one or more calls from respective entry points in the source code to at least one library program. A processor is configured to automatically analyze the source code in the computer in order to generate a first data flow graph (DFG), representing a flow of data to be engendered upon running the application program, to identify one or more vulnerabilities in the at least one library program, to automatically analyze the at least one library program to generate at least one second DFG linking at least one of the entry points in the source code to at least one of the vulnerabilities, to combine the first DFG with the at least one second DFG in order to track the flow of data from the application program to the at least one of the vulnerabilities, and to report, responsively to the tracked flow, that the at least one of the vulnerabilities is exploitable.


There is additionally provided, in accordance with an embodiment of the invention, a computer software product, including a non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive source code of an application program, which includes one or more calls from respective entry points in the source code to at least one library program, to automatically analyze the source code in the computer in order to generate a first data flow graph (DFG), representing a flow of data to be engendered upon running the application program, to identify one or more vulnerabilities in the at least one library program, to automatically analyze the at least one library program to generate at least one second DFG linking at least one of the entry points in the source code to at least one of the vulnerabilities, to combine the first DFG with the at least one second DFG in order to track the flow of data from the application program to the at least one of the vulnerabilities, and to report, responsively to the tracked flow, that the at least one of the vulnerabilities is exploitable.


The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a block diagram that schematically illustrates a system for detection of vulnerabilities in software code, in accordance with an embodiment of the invention;



FIG. 2 is a flow chart that schematically illustrates a method for detection of vulnerabilities in a software application program, in accordance with an embodiment of the invention; and



FIG. 3 is an example data flow graph, in accordance with an embodiment of the invention.





DETAILED DESCRIPTION OF EMBODIMENTS

Software developers make frequent use of libraries of software code written by others in order to save time and effort in application program development. Third-party open-source libraries, such as the extensive libraries available on public platforms such as GitHub®, are particularly attractive to software developers, as they offer ready-made solutions to common problems encountered in programming. Third-party library programs, however, often have vulnerabilities, which can be exploited in mounting attacks on application programs that call the library programs.


Conscientious software developers may check for known vulnerabilities in library programs that they use, for example by looking them up in the CVE system mentioned above. This sort of checking is burdensome, however, and also tends to give a large number of false positive results, i.e., warnings of potential security flaws where in fact none exist. Specifically, a vulnerability in a library program will generally be exploitable, i.e., accessible to an attacker, only if the application program has an input that is linked through the program data flow to a vulnerable parameter in the library program. An attacker who discovers this link may then be able to trigger the vulnerability in the library program by setting the input of the application program to an appropriate value. If the application program does not contain such a link—for example because it calls the vulnerable method only with a constant parameter—the vulnerability will still exist, but it will not be exploitable.


There is thus a need for software analysis tools that are able to distinguish between exploitable and non-exploitable vulnerabilities in library programs used in an application under test, in order to alert software developers to actual security risks while minimizing false positive results. Such a tool can enable developers to focus their efforts where they are required and to avoid wasting time on vulnerabilities that are not exploitable. Embodiments of the present invention that are described herein address this need by providing software composition analysis tools that are capable of efficiently analyzing library programs together with the application program that calls them, and thus assessing which vulnerabilities of the library programs are exploitable and which are not.


In the disclosed embodiments, a computer receives into its memory source code of an application program, which includes one or more calls from respective entry points in the source code to one or more library programs. (These entry points typically correspond, for example, to invocations of methods in an application program interface (API) of the library program in question.) The computer automatically analyzes the source code in order to generate a data flow graph (DFG), which represents the flow of data engendered upon running the application program. This DFG typically represents only the application code, and not the library programs that it calls.


The computer also identifies one or more vulnerabilities in the library program or programs called by the application program, for example by looking up the library programs in a vulnerability database. It analyzes the library programs to generate DFGs linking the entry points from the application source code to the locations of the vulnerabilities that were identified in the library programs. These DFGs typically represent only the library code, and not the application program. The library DFGs may be computed on demand or, alternatively or additionally, they may be precomputed and stored in memory, for example in a database, for use when needed.
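
The choice between computing a library DFG on demand and reusing a stored one can be pictured as a simple cache. The following is a minimal sketch only: the class name LibraryDfgCache, the "name@version" key format, and the analyzer function are assumptions introduced for illustration and are not part of the described embodiments.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache of per-library data flow graphs, keyed by "name@version". */
class LibraryDfgCache<G> {
    private final Map<String, G> cache = new HashMap<>();
    // Stands in for the actual library analysis, which is not shown here.
    private final Function<String, G> analyzer;
    private int analysisRuns = 0;

    LibraryDfgCache(Function<String, G> analyzer) {
        this.analyzer = analyzer;
    }

    /** Returns the stored DFG if present; otherwise runs the analysis once and stores the result. */
    G get(String libraryKey) {
        G dfg = cache.get(libraryKey);
        if (dfg == null) {
            dfg = analyzer.apply(libraryKey);
            analysisRuns++;
            cache.put(libraryKey, dfg);
        }
        return dfg;
    }

    /** Number of times the analyzer was actually invoked. */
    int analysisRuns() {
        return analysisRuns;
    }
}
```

In this sketch a second request for the same library and version returns the stored graph without repeating the analysis; a persistent database could play the same role as the in-memory map.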


The computer then combines the DFG of the application program with the DFGs of the library programs in order to track the flow of data from the application program to the vulnerabilities in the library programs. Specifically, in the embodiments described below, the DFG of a given library program may identify a data flow path from an entry point (such as an API call) in the application source code to a vulnerable parameter in the library program. The computer tracks the flow of data through the DFGs of both the application program and the library program to the vulnerable parameter. The computer will report that a given vulnerability is exploitable when the tracked flow is found to link an input of the application program through the library program to the vulnerable parameter. Otherwise, when the tracked flow indicates that the relevant calls from the application program to the library program are not accessible through inputs of the application program (in the sense that there is no data flow from the inputs of the application program to the library program), the computer will report that the vulnerability is not exploitable and thus does not pose a security threat.
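
The exploitability test described above amounts to a reachability query over the combined graphs. The sketch below is one possible way to express it, assuming that graph nodes are plain strings and that the application DFG and library DFG share the name of the entry-point node; the class DataFlowGraph and all node labels are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Minimal directed graph whose nodes are named data items (inputs, variables, parameters). */
class DataFlowGraph {
    final Map<String, Set<String>> edges = new HashMap<>();

    /** Records that data flows from one node to another. */
    DataFlowGraph flow(String from, String to) {
        edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        edges.computeIfAbsent(to, k -> new HashSet<>());
        return this;
    }

    /** Returns a new graph holding the edges of both graphs; shared node names join them. */
    static DataFlowGraph combine(DataFlowGraph a, DataFlowGraph b) {
        DataFlowGraph merged = new DataFlowGraph();
        for (DataFlowGraph g : new DataFlowGraph[] {a, b}) {
            for (Map.Entry<String, Set<String>> e : g.edges.entrySet()) {
                merged.edges.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
            }
        }
        return merged;
    }

    /** Breadth-first search: can data from any source node reach the target node? */
    boolean reaches(Set<String> sources, String target) {
        Deque<String> queue = new ArrayDeque<>(sources);
        Set<String> seen = new HashSet<>(sources);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(target)) {
                return true;
            }
            for (String next : edges.getOrDefault(node, Collections.emptySet())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return false;
    }
}
```

With an application graph in which a user input flows into the library entry point, the query returns true and the vulnerability would be reported as exploitable. If the entry point is fed only by a constant, no path starts at an application input, and the same query returns false.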


It commonly occurs that a first library program, which is called by an application program, will itself call a second library program. Thus, a vulnerability in the second library program may put the application program at risk. To deal with this sort of chained relationship (referred to herein as a “transient dependency”), the computer generates respective DFGs of both the first and second library programs. It uses both of these DFGs, together with the DFG of the application program, in order to track the flow of data from the application program through the library programs to the vulnerability in the second library program, and thus determines whether the vulnerability is exploitable.



FIG. 1 is a block diagram that schematically illustrates a system 20 for detection of vulnerabilities in software code of an application program, in accordance with an embodiment of the invention. The software code in the pictured example is stored on an application server 22, such as a suitable Web server, and runs on either or both of server 22 and client computers 24, which connect to server 22 via a network 25, such as the Internet. The application software includes calls to library programs, which may be stored and accessed on a library server 26.


Prior to deployment of the application program, a verification server 28 checks the application source code for security flaws. Server 28 comprises a memory 32, which receives and stores the source code of the application program, as well as of library programs, which are called from respective entry points in the application source code. Server 28 also comprises a processor 30, typically embodied in a general-purpose or special-purpose computer, which is programmed in software to carry out the functions that are described herein. The software may be downloaded to server 28 in electronic form, over a network, for example. Additionally or alternatively, the software may be provided and/or stored on tangible, non-transitory computer-readable media, such as magnetic, optical, or electronic memory. Memory 32 also holds the software that is run by processor 30 in performing the functions that are described herein.


As a part of its functionality in checking and reporting on security flaws in an application program, processor 30 identifies vulnerabilities in the library programs called by the application program. For this purpose, processor 30 typically looks up the library programs in a listing of vulnerabilities, which may be compiled as a vulnerability database 34, which may be based on the above-mentioned CVE database. Database 34 may be stored in memory 32 or, additionally or alternatively, the database may be accessed on a remote server (not shown) via network 25. Further additionally or alternatively, processor 30 may perform an independent analysis of relevant library programs in order to discover and record their vulnerabilities in memory 32.


In the present embodiment, for each known vulnerability, database 34 identifies the vulnerable method name and metadata, as well as vulnerable parameters in the method. These particular parameters are “vulnerable” in the sense that by manipulating the values of such parameters, an attacker can exploit a corresponding vulnerability in the software package. In other words, the parameters of a vulnerable method become “vulnerable parameters” by virtue of their usage in the method.


By way of example, the following is a record maintained in database 34 for CVE-2018-1000808—a vulnerability in a Python library called pyOpenSSL:

    • Library name: pyOpenSSL
    • Affected library versions: 1.0-17.4.9
    • Vulnerable method metadata
      • Relative path to file with vulnerable method: OpenSSL/crypto.py
      • Class name of vulnerable method: no class
      • Vulnerable method signature
        • Method name: pkcs12
        • Input parameters in order: buffer, passphrase
        • Output parameters in order: no output
      • Vulnerable parameters: buffer input parameter
    • Vulnerability description
      • CVE number: CVE-2018-1000808
      • CVSS number: 5.9
      • Textual description of vulnerability and implications: Python Cryptographic Authority pyopenssl version before 17.5.0 contains a CWE-401: Failure to Release Memory Before Removing Last Reference vulnerability in PKCS #12 Store that can result in denial of service if memory runs low or is exhausted.


The “CVSS number” listed above is a score between 0 and 10 indicating the relative severity of the vulnerability.
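
A record of this kind could be held in memory roughly as follows. This is a sketch under stated assumptions: the class VulnerabilityRecord, its field names, and the version comparison (dotted, digits-only version strings with an inclusive range) are illustrative and are not taken from database 34.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory form of one vulnerability record; all field names are illustrative. */
class VulnerabilityRecord {
    final String libraryName;
    final String firstAffectedVersion; // inclusive lower bound
    final String lastAffectedVersion;  // inclusive upper bound
    final String relativePath;
    final String methodName;
    final List<String> inputParameters;
    final List<String> vulnerableParameters;
    final String cveNumber;
    final double cvssScore;

    VulnerabilityRecord(String libraryName, String firstAffectedVersion, String lastAffectedVersion,
                        String relativePath, String methodName, List<String> inputParameters,
                        List<String> vulnerableParameters, String cveNumber, double cvssScore) {
        this.libraryName = libraryName;
        this.firstAffectedVersion = firstAffectedVersion;
        this.lastAffectedVersion = lastAffectedVersion;
        this.relativePath = relativePath;
        this.methodName = methodName;
        this.inputParameters = inputParameters;
        this.vulnerableParameters = vulnerableParameters;
        this.cveNumber = cveNumber;
        this.cvssScore = cvssScore;
    }

    /** True when the named library at the given version falls inside the affected range. */
    boolean affects(String library, String version) {
        return libraryName.equals(library)
                && compareVersions(version, firstAffectedVersion) >= 0
                && compareVersions(version, lastAffectedVersion) <= 0;
    }

    /** Compares dotted numeric versions component by component (assumes digits only). */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** The pyOpenSSL record from the text, expressed with this hypothetical class. */
    static VulnerabilityRecord pyOpenSslExample() {
        return new VulnerabilityRecord("pyOpenSSL", "1.0", "17.4.9", "OpenSSL/crypto.py", "pkcs12",
                Arrays.asList("buffer", "passphrase"), Arrays.asList("buffer"),
                "CVE-2018-1000808", 5.9);
    }
}
```

Under these assumptions, a call such as pyOpenSslExample().affects("pyOpenSSL", "17.4.0") reports a match, while version 17.5.0 falls outside the affected range.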


Server 28 applies various static application security testing (SAST) techniques in order to detect security vulnerabilities in the application source code. These functions are described in detail in the above-mentioned U.S. Pat. No. 9,128,728, and a full description is beyond the scope of the present patent application, which relates specifically, as explained above, to the problem of analyzing vulnerabilities that arise from the use of library programs.


For this latter purpose, server 28 applies the techniques of data flow graph (DFG) construction that are described specifically in columns 11-14 of U.S. Pat. No. 9,128,728 and illustrated in FIGS. 7-9. To summarize briefly, processor 30 derives an object-based representation, known as a document object model (DOM), of the software source code under analysis (such as the application code or the relevant library code). The processor uses the DOM to extract flow graphs of the code. These flow graphs typically include the data flow graph (DFG), which represents a flow of data that will be engendered when the code is run, and they may also include a control flow graph (CFG) and a control dependence graph (CDG). Processor 30 stores the analysis results in memory 32, for example in the form of a database to enable convenient access to the data thereafter.



FIG. 2 is a flow chart that schematically illustrates a method for detection of vulnerabilities in a software application program, in accordance with an embodiment of the invention. The method is described, for the sake of clarity and concreteness, with reference to the elements of system 20, and particularly verification server 28. Alternatively, the method can be carried out in other suitable hardware configurations, on a dedicated verification server or on any other suitable computer.


To initiate the method, server 28 receives source code of an application program under test into memory 32, at a code reception step 40. The code includes one or more calls from respective entry points in the source code to at least one library program. Processor 30 applies the source code analysis techniques described above in building a DFG of the application program, at an application DFG generation step 42.


In addition, processor 30 identifies library calls in the application source code, at a call identification step 44. The library calls are the entry points from the application to the library programs and typically (although not necessarily) can be identified as invocations of methods in an API of a library program. (In identifying and analyzing the library calls, processor 30 may use data concerning libraries and their versions stored in a packages or manifest file, for example, although other data sources and logic may alternatively be used.) A given application program can include multiple library calls, which may be directed to the same library program or to multiple different library programs. Furthermore, at step 44, server 28 may retrieve and analyze the source code of the libraries called by the application program, in order to identify and analyze calls from these library programs to other library programs (creating “transient dependencies,” as defined above).


Alternatively or additionally, processor 30 may identify library calls at step 44 by detecting “unresolved methods,” i.e., methods that do not have a declaration in application source code. Such methods can be detected, for example, by generating and querying a model of the application code using a suitable SAST tool. If a method has no declaration in the application source code, it can be inferred that the declaration is in a library, and the relevant libraries can then be searched based on the methods imported to the application.
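
The unresolved-method heuristic can be illustrated with plain sets of method names. The sketch below assumes that the declared methods, the invoked methods, and the methods exported by each imported library have already been extracted by some other tool; the class UnresolvedMethodFinder and its inputs are hypothetical, and a real SAST model would carry far more detail (signatures, classes, overloads).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that infers library calls from methods lacking a local declaration. */
class UnresolvedMethodFinder {
    /**
     * Returns each invoked method that has no declaration in the application,
     * mapped to the imported library that exports it ("unknown" if none matches).
     */
    static Map<String, String> findLibraryCalls(Set<String> declaredInApp,
                                                List<String> invokedInApp,
                                                Map<String, Set<String>> exportsByLibrary) {
        Map<String, String> libraryCalls = new LinkedHashMap<>();
        for (String method : invokedInApp) {
            if (declaredInApp.contains(method)) {
                continue; // resolved inside the application, so not a library call
            }
            String owner = "unknown";
            for (Map.Entry<String, Set<String>> lib : exportsByLibrary.entrySet()) {
                if (lib.getValue().contains(method)) {
                    owner = lib.getKey();
                    break;
                }
            }
            libraryCalls.put(method, owner);
        }
        return libraryCalls;
    }
}
```

For an application that declares getNextPacketRawData and main but also invokes handleHttpPacket, the sketch would flag handleHttpPacket as unresolved and attribute it to whichever imported library exports a method of that name.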


Processor 30 identifies vulnerabilities in the library programs found at step 44, for example by lookup in database 34, at a vulnerability identification step 46. For each library, the processor collects known vulnerabilities from the database, including details such as metadata of vulnerable methods in the library (including the method name, parameters, containing class, and relative path in the library) and the vulnerable parameter or parameters in each vulnerable method. For example, the following data can be collected and saved per vulnerability:

    • Library metadata:
      • Library name
      • Affected library versions
    • Vulnerable method metadata:
      • Relative path in library to the file that contains the vulnerable method
      • Class name of vulnerable method, if exists
      • Vulnerable method signature
        • Method name
        • Input parameters in order
        • Output parameters in order
    • Vulnerable parameters—input parameters, class members, global objects, etc., which can be used in exploiting the vulnerability.


For each vulnerability found at step 46, processor 30 analyzes the corresponding library program in order to build a DFG of the library program, at a library DFG generation step 48. Alternatively or additionally, the processor may retrieve a pre-computed DFG, for example, from memory 32. In general, each such DFG links one of the entry points found at step 44 in the application source code to a vulnerability in a library program. In the case of transient dependencies, however, the DFG will track the flow of data from an entry point in one library program that calls a second library program to a vulnerable parameter in the second library program.


Processor 30 combines the DFG of the application program built at step 42 with the DFGs of vulnerable library programs built at step 48, in order to track the flow of data from the application program to the vulnerable parameters in the library programs, at a DFG combination step 50. Based on the combined DFG, processor 30 decides whether the vulnerability in the library program is exploitable, at an exploitability evaluation step 52.


Generally speaking, when the data flow tracked through the combined DFGs is found to link an input of the application program to a vulnerable parameter in a library program, processor 30 will identify the vulnerability of the library program as exploitable and will initiate a protective action, at a protection step 54. For example, if a user input to the application program is tracked to the queue of a vulnerable library program, which has an inner thread for handling the elements in the queue, the combined DFG will identify the vulnerability as exploitable. The protective action taken at step 54 typically comprises issuance of a report that an exploitable vulnerability has been found. Alternatively or additionally, step 54 may involve more proactive protection, such as halting compilation of the application program until the vulnerability is resolved, or inserting a routine to “sanitize” the problematic input before it reaches the vulnerable library program.


Alternatively, when the tracked flow indicates at step 52 that a given vulnerability found at step 46 is not exploitable, the method will terminate at a completion step 56. (Server 28 may still report the vulnerability that was found, but will indicate that it presents little or no risk, since it is not exploitable in the existing application configuration.) Specifically, when analysis of the DFGs indicates that the vulnerability in a certain library program is not accessible through any input of the application program that calls the library program (meaning, as noted above, that there is no data flow from the inputs of the application program to the vulnerability), the vulnerability will be considered non-exploitable. This sort of finding will arise, for example, when the application program calls the library program with a constant or other parameter that is not affected by the inputs to the application program.


Example with Transient Dependency

In the following example, an application program calls Library A, which is a library for writing a REST web server. Library A calls Library B to handle HTTP requests. Hence, there is a transient dependency from the application code to Library B. Meanwhile, Library B is known to be vulnerable to an attack via a crafted HTTP request, making Library A, and thus the application program, vulnerable to this sort of attack, as well. By tracing an HTTP request using the DFGs, server 28 constructs the exploitable path from the application program to the vulnerable parameter in Library B, as illustrated below.


Library B:

    class HttpHandler {
        // Has a vulnerability that can be triggered by a crafted "rawData"
        public static HttpPacket parseRawData(byte[] rawData) {...}
    }

Library A:

    import HttpHandler;

    class RestApiHandler {
        public RestApiHandler(Map<String, Method> endpointToMethod) {...}

        public void handleHttpPacket(byte[] rawData) {
            // Triggers the vulnerability
            HttpPacket packet = HttpHandler.parseRawData(rawData);
            // Find relevant endpoint from URL and call the appropriate method
        }
    }

Application code:

    import RestApiHandler;

    class MyRestApi {
        // Method that receives user input
        public byte[] getNextPacketRawData() {...}

        public void main() {
            RestApiHandler restApiHandler = new RestApiHandler(...);
            while (true) {
                byte[] rawData = getNextPacketRawData();
                restApiHandler.handleHttpPacket(rawData);
            }
        }
    }

FIG. 3 is a combined DFG representing the path from a user input to the vulnerability in Library B, in accordance with an embodiment of the invention. In this example, the application code imports the class RestApiHandler from Library A, which in turn imports the class HttpHandler from Library B. The combined DFG is built across application code, Library A and Library B. The application code receives user input via the getNextPacketRawData( ) method. Since the data flow reaches the vulnerable method and its vulnerable parameter in Library B, server 28 will report that the vulnerability is exploitable.
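
The path through the three code units in this example can be traced with a breadth-first search that remembers each node's predecessor. The sketch below is illustrative only: the class ExploitablePathFinder and the node labels are assumptions chosen to mirror the listings above, and they do not reproduce the actual contents of FIG. 3.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Hypothetical combined graph over application code, Library A and Library B. */
class ExploitablePathFinder {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    /** Records that data flows from one node to another. */
    void addFlow(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Returns a shortest path from source to target, or an empty list if none exists. */
    List<String> findPath(String source, String target) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(source, null);
        queue.add(source);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(target)) {
                // Walk the predecessor links back to the source.
                LinkedList<String> path = new LinkedList<>();
                for (String n = node; n != null; n = parent.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            for (String next : edges.getOrDefault(node, Collections.emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    /** Builds the flows of the example; node labels are illustrative. */
    static ExploitablePathFinder transientDependencyExample() {
        ExploitablePathFinder g = new ExploitablePathFinder();
        // Application code: user input is stored in rawData and passed to Library A.
        g.addFlow("MyRestApi.getNextPacketRawData()", "MyRestApi.main:rawData");
        g.addFlow("MyRestApi.main:rawData", "RestApiHandler.handleHttpPacket(rawData)");
        // Library A: the parameter is forwarded to the vulnerable method in Library B.
        g.addFlow("RestApiHandler.handleHttpPacket(rawData)", "HttpHandler.parseRawData(rawData)");
        return g;
    }
}
```

Searching from the getNextPacketRawData node to the parseRawData node in this sketch yields a four-node path, which corresponds to reporting the vulnerability as exploitable; an empty result would correspond to a non-exploitable finding.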


It will be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and subcombinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.

Claims
  • 1. A method for software code analysis, comprising: receiving, into a memory of a computer, source code of an application program, which includes one or more calls from respective entry points in the source code to at least one library program; automatically analyzing the source code in the computer in order to generate a first data flow graph (DFG), representing a flow of data to be engendered upon running the application program; identifying one or more vulnerabilities in the at least one library program; automatically analyzing the at least one library program to generate at least one second DFG linking at least one of the entry points in the source code to at least one of the vulnerabilities; combining the first DFG with the at least one second DFG in order to track the flow of data from the application program to the at least one of the vulnerabilities; and reporting, responsively to the tracked flow, that the at least one of the vulnerabilities is exploitable.
  • 2. The method according to claim 1, wherein identifying the one or more vulnerabilities comprises looking up the at least one library program in a database of known vulnerabilities.
  • 3. The method according to claim 1, wherein the one or more calls from the respective entry points in the source code comprise invocations of methods in an application program interface (API) of the at least one library program.
  • 4. The method according to claim 1, and comprising identifying at least one of the library calls by detecting an unresolved method in the source code.
  • 5. The method according to claim 1, wherein identifying the one or more vulnerabilities comprises identifying a vulnerable parameter in the at least one library program, and wherein automatically analyzing the at least one library program comprises finding a data flow path from the at least one of the entry points to the vulnerable parameter in the at least one library program.
  • 6. The method according to claim 5, wherein combining the first DFG with the at least one second DFG comprises tracking the flow of data through the first DFG and the at least one second DFG to the vulnerable parameter, and wherein the at least one of the vulnerabilities is reported to be exploitable when the tracked flow links an input of the application program to the vulnerable parameter.
  • 7. The method according to claim 6, wherein the at least one of the vulnerabilities is reported as not being exploitable when the tracked flow indicates that the one or more calls from the application program to the at least one library program are not accessible via a data flow from inputs of the application program.
  • 8. The method according to claim 1, wherein the at least one library program comprises a first library program, which includes a call to a second library program, and wherein identifying the one or more vulnerabilities comprises identifying a vulnerability in the second library program, and wherein automatically analyzing the at least one library program comprises generating the at least one second DFG so as to track the flow of data through the first library program to the vulnerability in the second library program.
  • 9. Apparatus for software code analysis, comprising: a memory configured to receive source code of an application program, which includes one or more calls from respective entry points in the source code to at least one library program; and a processor configured to automatically analyze the source code in the computer in order to generate a first data flow graph (DFG), representing a flow of data to be engendered upon running the application program, to identify one or more vulnerabilities in the at least one library program, to automatically analyze the at least one library program to generate at least one second DFG linking at least one of the entry points in the source code to at least one of the vulnerabilities, to combine the first DFG with the at least one second DFG in order to track the flow of data from the application program to the at least one of the vulnerabilities, and to report, responsively to the tracked flow, that the at least one of the vulnerabilities is exploitable.
  • 10. The apparatus according to claim 9, wherein the processor is configured to identify the one or more vulnerabilities by looking up the at least one library program in a database of known vulnerabilities.
  • 11. The apparatus according to claim 9, wherein the one or more calls from the respective entry points in the source code comprise invocations of methods in an application program interface (API) of the at least one library program.
  • 12. The apparatus according to claim 9, wherein the processor is configured to identify at least one of the library calls by detecting an unresolved method in the source code.
  • 13. The apparatus according to claim 9, wherein the processor is configured to identify a vulnerable parameter in the at least one library program, and to find a data flow path from the at least one of the entry points to the vulnerable parameter in the at least one library program.
  • 14. The apparatus according to claim 13, wherein the processor is configured to track the flow of data through the first DFG and the at least one second DFG to the vulnerable parameter, and wherein the at least one of the vulnerabilities is reported to be exploitable when the tracked flow links an input of the application program to the vulnerable parameter.
  • 15. The apparatus according to claim 14, wherein the at least one of the vulnerabilities is reported as not being exploitable when the tracked flow indicates that the one or more calls from the application program to the at least one library program are not accessible via a data flow from inputs of the application program.
  • 16. The apparatus according to claim 9, wherein the at least one library program comprises a first library program, which includes a call to a second library program, and wherein the processor is configured to identify a vulnerability in the second library program and to generate the at least one second DFG so as to track the flow of data through the first library program to the vulnerability in the second library program.
  • 17. A computer software product, comprising a non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive source code of an application program, which includes one or more calls from respective entry points in the source code to at least one library program, to automatically analyze the source code in the computer in order to generate a first data flow graph (DFG), representing a flow of data to be engendered upon running the application program, to identify one or more vulnerabilities in the at least one library program, to automatically analyze the at least one library program to generate at least one second DFG linking at least one of the entry points in the source code to at least one of the vulnerabilities, to combine the first DFG with the at least one second DFG in order to track the flow of data from the application program to the at least one of the vulnerabilities, and to report, responsively to the tracked flow, that the at least one of the vulnerabilities is exploitable.
  • 18. The product according to claim 17, wherein the instructions cause the computer to identify the one or more vulnerabilities by looking up the at least one library program in a database of known vulnerabilities.
  • 19. The product according to claim 17, wherein the one or more calls from the respective entry points in the source code comprise invocations of methods in an application program interface (API) of the at least one library program.
  • 20. The product according to claim 17, wherein the instructions cause the computer to identify at least one of the library calls by detecting an unresolved method in the source code.
  • 21. The product according to claim 17, wherein the instructions cause the computer to identify a vulnerable parameter in the at least one library program, and to find a data flow path from the at least one of the entry points to the vulnerable parameter in the at least one library program.
  • 22. The product according to claim 21, wherein the instructions cause the computer to track the flow of data through the first DFG and the at least one second DFG to the vulnerable parameter, and wherein the at least one of the vulnerabilities is reported to be exploitable when the tracked flow links an input of the application program to the vulnerable parameter.
  • 23. The product according to claim 22, wherein the at least one of the vulnerabilities is reported as not being exploitable when the tracked flow indicates that the one or more calls from the application program to the at least one library program are not accessible via a data flow from inputs of the application program.
  • 24. The product according to claim 17, wherein the at least one library program comprises a first library program, which includes a call to a second library program, and wherein the instructions cause the computer to identify a vulnerability in the second library program and to generate the at least one second DFG so as to track the flow of data through the first library program to the vulnerability in the second library program.
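By way of illustration only, the combination of data flow graphs recited in the claims above can be sketched as a reachability search over merged edge lists. The sketch below is a minimal example under stated assumptions and is not the implementation described in the specification: the node names (`app.http_param`, `lib1.render`, `lib2.eval#expr` and so on), the dictionary-based graph representation, and the helper functions `combine`, `reachable` and `classify` are all hypothetical. A vulnerable parameter is marked exploitable only when a data flow links an application input to it, including a flow that passes through a first library into a second library.

```python
from collections import deque

# Hypothetical first DFG (application). Keys are data sources, values are sinks.
APP_DFG = {
    "app.http_param": ["app.parse"],
    "app.parse": ["lib1.render"],       # call into a library entry point
    "app.const_banner": ["lib1.log"],   # call fed only by a constant
}

# Hypothetical second DFG (libraries), linking entry points to vulnerable
# parameters. The first library calls into a second library.
LIB_DFG = {
    "lib1.render": ["lib1.build"],
    "lib1.build": ["lib2.eval#expr"],
    "lib1.log": ["lib2.write#path"],
}

APP_INPUTS = {"app.http_param"}
VULNERABLE_PARAMS = {"lib2.eval#expr", "lib2.write#path"}


def combine(*graphs):
    """Merge several edge-list graphs into one adjacency map."""
    merged = {}
    for graph in graphs:
        for src, dsts in graph.items():
            merged.setdefault(src, set()).update(dsts)
    return merged


def reachable(graph, sources):
    """Return every node reachable from the given sources (breadth-first)."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def classify(app_dfg, lib_dfg, inputs, vulnerable):
    """Map each vulnerable parameter to True if an input flows into it."""
    tainted = reachable(combine(app_dfg, lib_dfg), inputs)
    return {param: param in tainted for param in vulnerable}


report = classify(APP_DFG, LIB_DFG, APP_INPUTS, VULNERABLE_PARAMS)
for param in sorted(report):
    status = "exploitable" if report[param] else "not exploitable"
    print(f"{param}: {status}")
```

In this sketch, `lib2.eval#expr` is reported as exploitable because the application input reaches it through the first library, whereas `lib2.write#path` is reported as not exploitable because the only call leading to it is not fed by any application input.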
CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Patent Application 63/057,534, filed Jul. 28, 2020, which is incorporated herein by reference.

Related Publications (1)
Number Date Country
20220035928 A1 Feb 2022 US
Provisional Applications (1)
Number Date Country
63057534 Jul 2020 US