Software bugs are errors, flaws, mistakes, or faults in a computer program that prevent the program from behaving as intended and/or cause it to produce an incorrect result. Software testing, i.e., bug checking, is a process used to assess and improve the qualities of computer software by identifying bugs in the implementation of the code (e.g., source code, object code, binary/executable code, etc.) so that they may be removed. The qualities of computer software may include the correctness, completeness, security, capability, reliability, efficiency, portability, maintainability, compatibility, usability and any other suitable characteristic.
Software testing that involves manually inspecting code may be tedious and repetitive, as most software systems are on the order of thousands to millions of lines of code. Accordingly, dynamic and static program analysis methods have been developed to test software code.
Dynamic software testing, i.e., dynamic program analysis, involves the analysis of executable code during execution of the software to identify errors within the code. Static software testing, i.e., static program analysis, involves the testing of code that is not being executed to identify errors within the code, and is usually performed on some version of the source code or object code.
Static program analysis of software code allows for classification of a portion or all of the statements within the software code as bug free statements or actual bug statements. Classification of the software code improves the efficiency of manual code inspection as a software tester is able to selectively inspect and/or modify the software code based on the classification.
In general, in one aspect, the invention relates to a method for analyzing a plurality of potential bug statements in source code. The method includes obtaining a plurality of static program analyses; recursively reducing the plurality of potential bug statements in the source code by: selecting a static program analysis for each recursion from a plurality of static program analyses in order from least time consuming to most time consuming; evaluating the plurality of potential bug statements using the static program analysis of the plurality of static program analyses to determine a subgroup of bug free statements of the plurality of potential bug statements in each recursion; and removing the subgroup of the bug free statements from the plurality of potential bug statements to reduce the plurality of potential bug statements in each recursion; thereby filtering at least one subgroup of bug free statements out of the plurality of potential bug statements in the source code.
In general, in one aspect, the invention relates to a system for analyzing a plurality of potential bug statements. The system includes a results repository comprising: a plurality of potential bug statements; a static analysis engine comprising functionality to recursively: select a static program analysis for each recursion from a plurality of static program analyses in order from least time consuming to most time consuming; evaluate the plurality of potential bug statements using the static program analysis of the plurality of static program analyses to determine a subgroup of bug free statements of the plurality of potential bug statements in each recursion; and remove the subgroup of the bug free statements from the plurality of potential bug statements to reduce the plurality of potential bug statements in each recursion; a statement modifier comprising functionality to: modify at least one potential bug statement of the plurality of potential bug statements.
In general, in one aspect, the invention relates to a computer readable medium comprising instructions for analyzing a plurality of potential bug statements. The instructions comprise functionality for obtaining a plurality of static program analyses; recursively reducing the plurality of potential bug statements in the source code by: selecting a static program analysis for each recursion from a plurality of static program analyses in order from least time consuming to most time consuming; evaluating the plurality of potential bug statements using the static program analysis of the plurality of static program analyses to determine a subgroup of bug free statements of the plurality of potential bug statements in each recursion; and removing the subgroup of the bug free statements from the plurality of potential bug statements to reduce the plurality of potential bug statements in each recursion; thereby filtering at least one subgroup of bug free statements out of the plurality of potential bug statements in the source code.
Other aspects and advantages of the invention will be apparent from the following description and the appended claims.
Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency.
In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
In general, embodiments of the invention provide a method for analyzing potential bug statements in software code. Specifically, embodiments of the invention provide a method and system for recursively reducing potential bug statements in the software code using multiple static program analyses, applied in order from least time consuming to most time consuming.
In one or more embodiments of the invention, the system (100) is implemented using a client-server topology. The system (100) itself may correspond to an enterprise application running on one or more servers, and in some embodiments could be a peer-to-peer system, or resident upon a single computing system. In addition, the system (100) is accessible from other machines using one or more interfaces (not shown). In one or more embodiments of the invention, the system (100) is accessible over a network connection (not shown), such as the Internet, by one or more users. Information and/or services provided by the system (100) may also be stored and accessed over the network connection.
In one or more embodiments of the invention, the source code (105) corresponds to software code including, but not limited to, code in a high level programming language, low level programming language, and/or machine language, intermediate representations generated by a compiler, executable code, graphical representations of code (e.g., diagrams representing code), or any other form of code. Statements (106) within the source code (105) correspond to simple statements, compound statements, declarations, or any other component of the source code (105). Statements (106) may be separated using statement separators and/or statement terminators defined in the programming language. Further, statements (106) within the source code (105) may include bugs or may be free of bugs.
In one or more embodiments of the invention, the results repository (110) corresponds to a data storage device that includes functionality to store source code. The results repository may include the source code (105) itself or a copy of the source code (105) where statements (106) within the source code (105) are classified as potential bug statements (112), bug free statements (114) and/or actual bug statements (116). In one or more embodiments of the invention, where statements (106) that are classified as potential bug statements (112), bug free statements (114), and/or actual bug statements (116) are actually stored in the results repository (110), access to the results repository (110) may be restricted and/or secured. As such, access to the results repository (110) may require authentication using passwords, secret questions, personal identification numbers (PINs), biometrics, and/or any other suitable authentication mechanism. Those skilled in the art will appreciate that elements or various portions of data stored in the results repository (110) may be distributed and stored in multiple data repositories. In one or more embodiments of the invention, the results repository (110) is flat, hierarchical, network based, relational, dimensional, object modeled, or structured otherwise. For example, the results repository may be maintained as a table of a SQL database. In addition, data in the results repository (110) may be verified against data stored in other repositories.
In one or more embodiments of the invention, a potential bug statement (112) may correspond to any statement (106) of the source code (105) which has not been classified as a bug free statement (114) or an actual bug statement (116) (e.g., by a static program analysis or by a user). In one or more embodiments of the invention, a portion or all of the statements (106) in the source code (105) may initially be classified as potential bug statements (112), until the potential bug statements (112) are re-classified as (e.g., deduced to be) bug free statements (114) or actual bug statements (116).
In one or more embodiments of the invention, a bug free statement (114) is a statement without any errors. A bug free statement (114) may also correspond to a statement that has been identified as not having a particular type of error. An example of a bug free statement is a statement that assigns a constant numerical value to a newly declared integer variable.
In one or more embodiments of the invention, an actual bug statement (116) is a statement that has been identified as containing an error. The actual bug statement (116) may be identified as having a specified error, a type of error, or the specific error itself. Furthermore, an actual bug statement (116) may be stored/identified together or separately from the potential bug statements (112). An example of an actual bug statement is a statement(s) that uses a numerical value entered by a user as an address in memory. In this example, data at an unknown memory address may be accessed for execution. Another example involves a buffer overflow, where an index value for referencing an array may be out of bounds based on input.
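By way of illustration only, the three classifications described above may be sketched as a small data structure. The Python names `Classification` and `ResultsRepository` and the use of integer statement identifiers are assumptions made for this sketch; they stand in for the results repository (110) and are not taken from the embodiments.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    POTENTIAL_BUG = "potential bug"
    BUG_FREE = "bug free"
    ACTUAL_BUG = "actual bug"


@dataclass
class ResultsRepository:
    """Hypothetical in-memory stand-in for the results repository (110)."""

    labels: dict = field(default_factory=dict)  # statement id -> Classification

    def add_statements(self, statement_ids):
        # Every statement initially counts as a potential bug statement.
        for sid in statement_ids:
            self.labels[sid] = Classification.POTENTIAL_BUG

    def reclassify(self, sid, new_label):
        # Only statements still classified as potential bugs are re-classified.
        if self.labels[sid] is not Classification.POTENTIAL_BUG:
            raise ValueError(f"statement {sid} is already classified")
        self.labels[sid] = new_label

    def with_label(self, label):
        # Sorted ids of all statements carrying the given classification.
        return sorted(sid for sid, current in self.labels.items() if current is label)


repo = ResultsRepository()
repo.add_statements([1, 2, 3])
repo.reclassify(1, Classification.BUG_FREE)
repo.reclassify(3, Classification.ACTUAL_BUG)
still_potential = repo.with_label(Classification.POTENTIAL_BUG)
```

In this sketch only statement 2 remains a potential bug statement after the two re-classifications.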
Continuing with
In one or more embodiments of the invention, each of the static program analyses (120) is associated with an expected time consumption for execution. The expected time consumption may be a runtime complexity estimate (e.g., linear, exponential, logarithmic, or other suitable estimate). Further, the expected time consumption may be general or code specific. For example, an expected time consumption for execution of a static program analysis (e.g., static program analysis A (122) and static program analysis N (128)) may be based on the type of software (e.g., an operating system, a word processing application, a game). Alternatively, the expected time consumption may be based on a particular software application. Historical data for a previous execution of a static program analysis on the software application (i.e., a different version of the software application or same version of the software application) may be used to determine an estimated time consumption for execution of the static program analysis on the software application.
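As a minimal sketch of how an expected time consumption might be derived, the following Python example uses the mean of historical runtimes when such data exists and otherwise falls back to a default tied to the complexity class. The `AnalysisProfile` type, the analysis names, and the default values in `DEFAULT_COST_S` are illustrative assumptions, not values given in the embodiments.

```python
from dataclasses import dataclass, field
from statistics import mean

# Assumed fallback estimates (in seconds) per complexity class; purely illustrative.
DEFAULT_COST_S = {"logarithmic": 1.0, "linear": 10.0, "exponential": 1000.0}


@dataclass
class AnalysisProfile:
    name: str
    complexity: str  # e.g., "linear" or "exponential"
    past_runtimes_s: list = field(default_factory=list)  # historical data, if any


def expected_cost(profile):
    # Prefer historical data from previous executions on the same application.
    if profile.past_runtimes_s:
        return mean(profile.past_runtimes_s)
    # Otherwise fall back to the general, complexity-based estimate.
    return DEFAULT_COST_S[profile.complexity]


def order_by_cost(profiles):
    # Least time consuming first, most time consuming last.
    return sorted(profiles, key=expected_cost)


profiles = [
    AnalysisProfile("symbolic_analysis", "exponential", [120.0]),
    AnalysisProfile("partial_evaluation", "linear"),
    AnalysisProfile("constant_propagation", "linear", [2.0, 4.0]),
]
ordered_names = [p.name for p in order_by_cost(profiles)]
```

Here the ordering places the analysis with the smallest estimate first, so that it would be selected for the first recursion.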
In one or more embodiments of the invention, the static analysis engine (130) includes functionality to identify a set of potential bug statements (112) within the code for a software application. The static analysis engine (130) may simply identify the entire source code (105) as potential bug statements (112) or may select a subgroup of statements (106) from the source code (105). For example, the static analysis engine may perform a taint analysis to identify statements affected by user input as the set of potential bug statements. The static analysis engine (130) may also identify statements (106) within the source code (105) that may cause one or more specific error types. For example, the static analysis engine may identify all statements that are related to array indexing, to check for buffer overflow errors, as potential bug statements.
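The selection of statements related to array indexing may be sketched as follows. This example assumes, purely for illustration, that the source code (105) is Python and uses the standard `ast` module; the function name `array_indexing_lines` and the sample program are hypothetical.

```python
import ast

SAMPLE = """\
buf = [0] * 8
n = 10
for i in range(n):
    buf[i] = i
total = n + 1
print(buf[0])
"""


def array_indexing_lines(source):
    """Return the sorted line numbers that contain a subscript expression."""
    tree = ast.parse(source)
    lines = set()
    for node in ast.walk(tree):
        # ast.Subscript covers reads and writes such as buf[i] and buf[0].
        # It also matches subscripted type annotations, which a real
        # selection step would likely want to exclude.
        if isinstance(node, ast.Subscript):
            lines.add(node.lineno)
    return sorted(lines)


potential_bug_lines = array_indexing_lines(SAMPLE)
```

For the sample program, only lines 4 and 6 are selected as the initial set of potential bug statements; the list literal on line 1 is not an indexing operation and is left out.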
In one or more embodiments of the invention, the static analysis engine (130) may recursively evaluate the identified potential bug statements (112) using one or more static program analyses (120) to determine if the potential bug statements (112) are bug free statements (114) or actual bug statements (116). The static analysis engine (130) includes functionality to identify the least time consuming static program analysis based on the estimated time consumption of each of the static program analyses. Furthermore, the static analysis engine (130) may include functionality to remove bug free statements (114) and/or actual bug statements (116) from the potential bug statements (112) based on evaluating the potential bug statements (112) using the static program analyses (120).
In one or more embodiments of the invention, the statement modifier (140) corresponds to a program (e.g., a text editor) and/or system to modify statements (106) within the source code (105). The statement modifier (140) may be used to modify potential bug statements (112) and/or actual bug statements (116), identified by the static analysis engine (130), within the source code (105). The statement modifier (140) may also include functionality to add and delete statements (106) to and from the source code (105).
In one or more embodiments of the invention, the system (100) may be accessed using a user interface (not shown). The user interface may be a web interface, a graphical user interface (GUI), a command line interface, an application interface or any other suitable interface. The interface may also include one or more web pages that can be accessed from a computer with a web browser and/or internet connection. Alternatively, the interface may be an application that resides on a computing system, such as a PC, a mobile device, a PDA, and/or another computing device of the users, and that communicates with the system (100) via one or more network connections and protocols.
Specifically,
In one or more embodiments of the invention, a set of static program analyses is obtained for evaluation of the potential bug statements (Step 220). Obtaining the set of static program analyses may involve selecting the static program analyses from a larger pool of available static program analyses. For example, based on the code, the type of software application, historical data associated with the software application (e.g., prior evaluation results), or other suitable criteria, a set of static program analyses may be selected for evaluating the potential bug statements. In another embodiment of the invention, the static program analyses may be dynamically selected. For example, a static program analysis for evaluation of the remaining potential bug statements may be selected based on the result of a previous evaluation (see Step 250 discussed below).
Continuing with
In one or more embodiments of the invention, the potential bug statements are evaluated using the selected static program analysis to determine a subgroup of bug free statements (Step 250). For example, the potential bug statements may be parsed and checked for specific errors (e.g., buffer overflows, double frees, etc.), and if the specific errors are not found the potential bug statements may be deemed as bug free statements. An example involves a static program analysis directed at model checking that checks the structure of a software module for a logical formula. In this example, code may be translated to a finite state machine, where each node in the finite state machine is defined by a set of values (e.g., global variables, stacks, and heaps). The code may then be tested if it is possible to reach a set of values that does not match one of the nodes in the finite state machine. If such a set of values exist, the potential bug statements are not bug free statements. Accordingly, they may be deemed as actual bug statements and may remain classified as potential bug statements or may be removed from the set of potential bug statements and reclassified as actual bug statements. If such a set of values does not exist, then the statements may be deemed as bug free statements.
Once a set of bug free statements is identified based on the evaluation of the potential bug statements, the bug free statements may be removed (Step 260) if they do not need to be tested for other bugs or using other static program analyses. The removed statements may be stored separately or simply deleted from the potential bug statements. In addition, the set of actual bug statements may also be filtered out for modification by a user (not shown).
In one or more embodiments of the invention, a determination is made whether the potential bug statements remaining after filtering out the bug free statements need to be further reduced (Step 270). The determination may be based on whether the potential bug statements have been reduced to a certain predetermined number of potential bug statements. In another embodiment of the invention, the decision to further reduce the potential bug statements may be based on the number of actual bugs found. For example, if a high number of actual bugs is found, the potential bug statements may all need to be reviewed and, accordingly, not further reduced. Alternatively, if a very low number of potential bug statements is found, it is less likely that a large number of errors still exist in the remaining potential bug statements. Accordingly, if needed, the remaining potential bug statements may be further reduced by another recursion (Step 230-Step 260). Thereby, one or more embodiments of the invention allow for reduction of the statements within the source code that are classified as potential bug statements. Filtering out bug free statements reduces the total number of potential bug statements that need to be manually inspected by a programmer or other user to check for errors. Furthermore, identifying actual bug statements from the potential bug statements alerts a user to correct the source code.
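The recursion described above may be sketched in Python as follows. This is a minimal illustration under stated assumptions: statements are represented as strings, each analysis is a function returning the subgroup it deems bug free, and the two toy analyses, their costs, and the `ARRAY_SIZE` constant are hypothetical stand-ins rather than the analyses of the embodiments.

```python
import re

ARRAY_SIZE = 8  # assumed size of the hypothetical array `a`


def no_array_access(statements):
    # Cheap toy analysis: a statement without any indexing cannot overflow `a`.
    return {s for s in statements if "[" not in s}


def constant_index_in_bounds(statements):
    # Costlier toy analysis: clears accesses whose constant index is in bounds.
    cleared = set()
    for s in statements:
        match = re.search(r"\[(\d+)\]", s)
        if match and int(match.group(1)) < ARRAY_SIZE:
            cleared.add(s)
    return cleared


def reduce_potential_bugs(potential, analyses, target_size=0):
    """Recursively filter bug free statements, cheapest analysis first.

    `analyses` is a list of (expected_cost, analysis) pairs.
    """
    # Stop when no analysis is left or the set is already small enough.
    if not analyses or len(potential) <= target_size:
        return potential
    ordered = sorted(analyses, key=lambda pair: pair[0])
    (_, cheapest), rest = ordered[0], ordered[1:]
    bug_free = cheapest(potential)
    return reduce_potential_bugs(potential - bug_free, rest, target_size)


statements = {"x = 1", "a[3] = 0", "a[9] = 0", "a[i] = 0"}
analyses = [(5.0, constant_index_in_bounds), (1.0, no_array_access)]
remaining = reduce_potential_bugs(statements, analyses)
```

In this sketch the cheaper analysis runs first even though it is listed second, and the two statements that neither analysis can clear remain for manual inspection. The `target_size` parameter plays the role of the predetermined number of potential bug statements checked in the determination step.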
In one or more embodiments of the invention, one or more of the steps shown in
In one or more embodiments of the invention, in Step 304, which may be executed concurrently with Step 302, static program analysis B is executed using the potential bug statements A (325) as input to determine whether the potential bug statements A (325) are bug free statements B (326), potential bug statements B (330), and/or actual bug statements B (327). Accordingly, a portion of the output of static program analysis A from Step 302 is used as input for static program analysis B in Step 304. Similar to static program analysis A, static program analysis B results in a continuous and/or periodic flow of input statements and output statements.
In one or more embodiments of the invention, any number of static program analyses may be executed concurrently using the output of the previous static program analysis as input. After the final static program analysis has been executed (Step 310), the remaining potential bug statements (395) may be reviewed manually for errors. In one or more embodiments of the invention, the remaining potential bug statements (395) are fewer than the initially identified potential bug statements (320). Accordingly, embodiments of the invention may allow for a reduction of the number of potential bug statements that need to be manually reviewed for finding errors. Further, embodiments of the invention may allow for finding actual bug statements in a continuous and/or periodic manner for correction. Furthermore, embodiments of the invention may allow for the concurrent analysis of different portions of the code.
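The flow of statements from one static program analysis into the next may be approximated with chained Python generators, as sketched below. Generators interleave the stages lazily rather than running them on separate processors, so this only models the continuous flow of statements, not true concurrency. The function names, the verdict strings, and the `ARRAY_SIZE` constant are assumptions of the sketch.

```python
import re

ARRAY_SIZE = 8  # assumed size of the hypothetical array `a`


def analysis_a(statement):
    # Toy analysis A: statements without indexing are bug free.
    return "bug_free" if "[" not in statement else "undecided"


def analysis_b(statement):
    # Toy analysis B: decides accesses that use a constant index.
    match = re.search(r"\[(\d+)\]", statement)
    if not match:
        return "undecided"
    return "bug_free" if int(match.group(1)) < ARRAY_SIZE else "actual_bug"


def run_stage(analysis, statements, bug_free, actual_bugs):
    """Yield undecided statements one by one; record the decided ones."""
    for statement in statements:
        verdict = analysis(statement)
        if verdict == "bug_free":
            bug_free.append(statement)
        elif verdict == "actual_bug":
            actual_bugs.append(statement)
        else:
            yield statement  # flows on to the next analysis immediately


bug_free, actual_bugs = [], []
initial = ["x = 1", "a[3] = 0", "a[9] = 0", "a[i] = 0"]
stage_a = run_stage(analysis_a, initial, bug_free, actual_bugs)
stage_b = run_stage(analysis_b, stage_a, bug_free, actual_bugs)
remaining = list(stage_b)  # statements left for manual review
```

Each statement that analysis A cannot decide is passed to analysis B before analysis A has finished with the remaining input, which mirrors the use of a portion of one analysis's output as the next analysis's input.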
In this example, in order to check for buffer overflows (i.e., three of the above four errors) a static program analysis is first used to analyze the sample code based on constant propagation and check on any write array accesses where the index is constant, to determine if it is out of bounds. In the example, variable n, initialized in line 5 with a constant value, can be constant folded into its uses at lines 12 and 15, leading to the snippet of code, shown in
Next, a static program analysis uses a partial evaluation technique to find the second bug in the sample code. Any loop that accesses an array and has a constant number of iterations can be analyzed by creating a slice of the loop that contains the statements that are relevant for the array access. This small slice of code can be augmented with a test for out of bounds access, and code can be generated for the augmented slice (using a JIT compiler) to very quickly determine if any memory is accessed outside the bounds of the loop. This technique is relevant to the code in lines 12-14 of
Thereafter, a static program analysis is used to perform a symbolic analysis using affine constraints. Array accesses that are based on indexes that are non-constant require a more complex technique to analyze them. The slice for the array access at line 20 is shown in
The invention may be implemented on virtually any type of computer regardless of the platform being used. For example, as shown in
Further, those skilled in the art will appreciate that one or more elements of the aforementioned computer system (500) may be located at a remote location and connected to the other elements over a network. Further, the invention may be implemented on a distributed system having a plurality of nodes, where each portion of the invention (e.g., results repository, static analysis engine, static program analyses, statement modifier, etc.) may be located on a different node within the distributed system. In one embodiment of the invention, the node corresponds to a computer system. Alternatively, the node may correspond to a processor with associated physical memory. The node may alternatively correspond to a processor with shared memory and/or resources. Further, software instructions to perform embodiments of the invention may be stored on a computer readable medium such as a compact disc (CD), a diskette, a tape, a file, or any other computer readable storage device.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.
Number | Date | Country | |
---|---|---|---|
20090259989 A1 | Oct 2009 | US |