Dataflow analysis is often used to determine program state with respect to a particular point of a software program. For example, dataflow analysis may track program state at a particular point of a software program and determine whether or not the particular point of the software program contains a programming defect. Dataflow analysis may be path-insensitive or path-sensitive. Path-insensitive dataflow analysis computes program state at the particular point of the software program without regard to the particular execution path taken to reach the particular point. Such path-insensitive dataflow analysis may be relatively efficient (e.g., linear complexity proportional to program length, O(n)), but the results of the path-insensitive dataflow analysis are limited. For example, the results may not detect defects in the software program that appear only when specific execution paths are taken. The results may also report false positives (i.e., defects that do not actually exist in the software program).
Although path-sensitive analysis may be used to improve the accuracy of the analysis, current systems for performing path-sensitive dataflow analysis typically incorporate theorem provers, which are more computationally expensive than path-insensitive dataflow analysis. The increase in computational cost may be at least partly attributed to modification and duplication of the control flow graphs that are generated during analysis of the software program. For example, certain systems may generate a new copy of a control flow graph each time a conditional statement is encountered. Thus, such systems may consume a large amount of memory and processor resources.
The present disclosure describes an on-demand path-sensitive dataflow analysis that includes path refinement. Path refinement may provide more accuracy than computationally inexpensive path-insensitive dataflow analysis with less resource consumption than computationally expensive path-sensitive dataflow analysis. Path refinement may also be performed without use of resource-intensive operations, such as use of a theorem prover, modification of control flow graphs (CFGs), and duplication of CFGs.
An initial path-insensitive dataflow analysis is conducted to produce a set of potential defects in a computer program. The potential defects may be examined for infeasible paths, resulting in a reduced set of potential defects. The reduced set of potential defects is more accurate than the original set and may be used to make a defect determination regarding the computer program.
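The reduction step can be pictured as a filter over the initial set of potential defects. The following Python sketch is illustrative only: it assumes that each potential defect carries the candidate execution paths along which it could occur (the hypothetical `PotentialDefect` type and its fields are not from the disclosure) and that an infeasibility test is available as a callable. A potential defect is kept if at least one of its paths is feasible, or if it carries no path information at all.

```python
from dataclasses import dataclass
from typing import Callable

Path = tuple[str, ...]  # a path is a sequence of CFG node names

@dataclass(frozen=True)
class PotentialDefect:
    # Hypothetical record: where the defect may occur, what it is, and the
    # paths along which the path-insensitive analysis could not rule it out.
    location: str
    description: str
    paths: tuple[Path, ...]

def reduce_defects(defects: list[PotentialDefect],
                   is_infeasible: Callable[[Path], bool]) -> list[PotentialDefect]:
    """Drop every potential defect whose associated paths are all infeasible."""
    reduced = []
    for defect in defects:
        # Defects without path information are kept conservatively.
        if not defect.paths or any(not is_infeasible(p) for p in defect.paths):
            reduced.append(defect)
    return reduced

# Usage: a null dereference reachable only along an infeasible path is removed.
null_deref = PotentialDefect("B6", "possible null dereference of p",
                             (("B1", "B3", "B4", "B6"),))
print(reduce_defects([null_deref], lambda p: p == ("B1", "B3", "B4", "B6")))  # []
```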
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Systems, methods, and computer-readable media to perform path-sensitive dataflow analysis including path refinement are disclosed. In a particular embodiment, a computer-readable medium includes instructions that, when executed by a computer, cause the computer to perform a path-insensitive dataflow analysis on a control flow graph (CFG) of a computer program to detect a set of potential defects in the computer program. The computer-readable medium also includes instructions that, when executed by the computer, cause the computer to perform a path-sensitive dataflow analysis to identify one or more infeasible paths of the CFG without modifying the CFG. The computer-readable medium further includes instructions that, when executed by the computer, cause the computer to remove potential defects associated with the one or more infeasible paths from the set of potential defects to generate a reduced set of potential defects. The computer-readable medium includes instructions that, when executed by the computer, cause the computer to output the reduced set of potential defects.
In another particular embodiment, a computer-implemented method is disclosed that includes determining a control flow graph (CFG) for a computer program. The CFG includes a plurality of nodes, where each node represents an execution point of the computer program. The method includes performing a path-insensitive dataflow analysis of the CFG to determine a value of a state expression representing program state of the computer program at a particular node, and determining whether the value of the state expression is path-insensitive or path-sensitive. When the value of the state expression is path-insensitive, the method further includes outputting the path-insensitive value. When the value of the state expression is path-sensitive, the method further includes outputting a path-refined value of the state expression, where the path-refined value is determined without modifying the CFG.
In another particular embodiment, a system is disclosed that includes a memory and a processor coupled to the memory. The processor is configured to execute instructions to perform a path-insensitive dataflow analysis with respect to nodes of a control flow graph (CFG) representing source code of a computer program to detect a set of potential defects in the computer program. The processor is also configured to execute instructions to perform a path-sensitive dataflow analysis to identify one or more infeasible paths of the CFG without modifying the CFG. The processor is further configured to execute instructions to remove potential defects associated with the one or more infeasible paths from the set of potential defects to generate a reduced set of potential defects. The processor is configured to execute instructions to output the reduced set of potential defects.
The system 100 of
The CFG determination logic 110 may generate a CFG 120 for the computer program 102. In a particular embodiment, the CFG 120 is a directed graph of nodes connected via edges, where each node represents a different execution point of the computer program 102. Thus, the CFG 120 may represent various possible execution paths of the computer program 102. CFG generation is further described with reference to
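A CFG of this kind is commonly stored as an adjacency list. The Python sketch below shows one possible representation (the node names and the helper `paths_between` are illustrative assumptions, not part of the disclosure); it enumerates the acyclic execution paths between two nodes, which is the kind of path set a path-sensitive analysis reasons about.

```python
from collections.abc import Iterator

# A CFG as an adjacency list: node name -> list of successor node names.
CFG = dict[str, list[str]]

def paths_between(cfg: CFG, source: str, target: str) -> Iterator[tuple[str, ...]]:
    """Yield every acyclic path from source to target (depth-first search)."""
    stack = [(source, (source,))]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        for succ in cfg.get(node, []):
            if succ not in path:  # skip back edges so cycles are not followed forever
                stack.append((succ, path + (succ,)))

# A diamond: B1 branches to B2 or B3, both of which merge at B4 before B6.
diamond: CFG = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": ["B6"]}
print(sorted(paths_between(diamond, "B1", "B6")))
# [('B1', 'B2', 'B4', 'B6'), ('B1', 'B3', 'B4', 'B6')]
```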
Path-insensitive dataflow analysis logic 130 may perform a path-insensitive dataflow analysis of the CFG 120. For example, the path-insensitive dataflow analysis logic 130 may track program state of the computer program 102 from a beginning of the computer program 102 to a particular execution point of the computer program 102 and may represent the program state at the particular execution point in a state expression 140. When the value of the state expression 140 is path-insensitive, the system 100 may output a defect determination 106 (e.g., a set of potential defects) based on the path-insensitive value of the state expression 140. For example, the system 100 may determine whether or not a pointer that is dereferenced at the particular execution point of the computer program 102 can be zero or null based on a path-insensitive value (e.g., “pointer=always null” or “pointer=always not null”) of a state expression that represents program state at the particular execution point. Path-insensitive dataflow analysis is further described with reference to
Path refinement logic 150 may perform a path refinement procedure on the state expression 140 when a value of the state expression 140 is path-sensitive (e.g., “pointer=maybe null”). For example, a first execution path leading to the particular execution point may have a state expression “pointer=always null” and a second execution path leading to the particular execution point may have a state expression “pointer=always not null.” Thus, the value of the state expression 140 may be path-sensitive (e.g., “maybe null”), because the value of the state expression 140 depends on whether the first path or the second path is taken to reach the particular execution point.
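The “maybe null” value arises from the merge (join) that a path-insensitive analysis applies where execution paths meet. A minimal Python sketch of a three-valued nullness domain, assuming the join keeps equal facts and otherwise yields “maybe null” (the names `Nullness`, `join`, and `is_path_sensitive` are illustrative):

```python
from enum import Enum

class Nullness(Enum):
    ALWAYS_NULL = "always null"
    NOT_NULL = "always not null"
    MAYBE_NULL = "maybe null"

def join(a: Nullness, b: Nullness) -> Nullness:
    """Combine the facts arriving on two incoming edges of a merge node."""
    return a if a is b else Nullness.MAYBE_NULL

def is_path_sensitive(value: Nullness) -> bool:
    # In this sketch, only "maybe null" depends on which path was taken.
    return value is Nullness.MAYBE_NULL

# First path: pointer=always null; second path: pointer=always not null.
merged = join(Nullness.ALWAYS_NULL, Nullness.NOT_NULL)
print(merged.value, is_path_sensitive(merged))  # maybe null True
```

Because the join forgets which path produced which fact, the merged value cannot distinguish the two paths; that lost information is what path refinement tries to recover.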
In a particular embodiment, the path refinement procedure includes detecting and removing values associated with infeasible paths of the CFG 120 from the state expression 140. The path refinement procedure may also recursively split sub-paths of the CFG 120. Thus, the path refinement procedure may be considered a path-sensitive dataflow analysis, because execution of the path refinement procedure is dependent on the particular paths of the CFG 120. For example, the path refinement logic 150 may determine a path-refined value (e.g., “pointer=always not null”) of the state expression 140 that is more accurate than the path-sensitive value (e.g., “pointer=maybe null”) of the state expression 140. The system 100 may output a defect determination 108 (e.g., a reduced set of potential defects) based on the path-refined value that the path refinement logic 150 determines from the state expression 140. Path refinement is further described with reference to
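Removing the values contributed by infeasible paths and re-merging the remainder can be sketched as follows. This Python example assumes that per-path values are already available and that the set of infeasible paths has been computed elsewhere; both are illustrative assumptions, since the described procedure obtains them by splitting paths.

```python
from functools import reduce
from typing import Optional

Path = tuple[str, ...]

def join(a: str, b: str) -> str:
    # Equal facts survive a merge; differing facts become "maybe null".
    return a if a == b else "maybe null"

def path_refined_value(per_path: dict[Path, str], infeasible: set[Path]) -> Optional[str]:
    """Merge only the values contributed by paths not known to be infeasible."""
    values = [value for path, value in per_path.items() if path not in infeasible]
    if not values:
        return None  # no feasible path reaches the node
    return reduce(join, values)

per_path = {
    ("B1", "B2", "B4", "B6"): "always not null",
    ("B1", "B3", "B4", "B6"): "always null",
}
print(path_refined_value(per_path, set()))                       # maybe null
print(path_refined_value(per_path, {("B1", "B3", "B4", "B6")}))  # always not null
```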
In operation, the CFG determination logic 110 may initiate defect determination via dataflow analysis by determining the CFG 120 for the computer program 102. The path-insensitive dataflow analysis logic 130 may determine a value of the state expression 140 that represents the state of the computer program 102 at a particular execution point (i.e., particular node of the CFG 120). When the value of the state expression 140 is path-insensitive, the system 100 outputs the defect determination 106 based on the path-insensitive value of the state expression 140. When the value of the state expression 140 is path-sensitive, the path refinement logic 150 may determine a path-refined value of the state expression 140. In many cases, the path-refined value is more accurate than the path-sensitive value. The system 100 may output the defect determination 108 based on the path-refined value of the state expression 140. For example, a user of the system 100 may be notified whether the path-refined value indicates a programming defect in the computer program 102.
It will be appreciated that the system 100 of
An exemplary path sensitive dataflow analysis in accordance with the disclosure is further illustrated with reference to
The source code 200 includes two variables: an integer “y” and a pointer to an integer “p.” The source code 200 further accepts an integer “x” as a parameter. Thus, a program state at any line of the source code 200 will include one or more of a value of “y,” a value of “p,” and a value of “x.”
The source code 200 includes a first conditional statement 210. If the value of “x” is equal to zero (i.e., a comparison between the value of “x” and zero is “true”), execution proceeds to a first assignment statement 220, where the address of “y” (i.e., its location in memory) is assigned to “p.” If the value of “x” is not equal to zero (i.e., the comparison between the value of “x” and zero is “false”), execution proceeds to a second assignment statement 230, where a null pointer value is assigned to “p.” Regardless of which assignment statement 220, 230 is executed, execution then proceeds to the unrelated code portion 240.
After the unrelated code portion 240 is executed, execution proceeds to a second conditional statement 250. Like the first conditional statement 210, the second conditional statement 250 compares the value of “x” with zero. If the comparison is “true,” execution proceeds to a third assignment statement 260, where the value 5 is stored at the memory location pointed to by “p.” It will thus be noted that the third assignment statement 260 includes a pointer dereference operation. It will also be noted that if the value (i.e., address) stored in “p” is zero or null, an error condition may arise. Upon completion of the third assignment statement 260, execution proceeds to a function return 280. Alternatively, when the comparison of the second conditional statement 250 is “false,” execution proceeds to the function return 280 via an empty “else” branch 270 of the second conditional statement 250.
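For experimentation, the source code 200 and the part of its CFG that leads to the dereference can be written down as data. In the Python sketch below, the C-like listing in the comment is reconstructed from the description above (its exact text is an assumption), node B4 is assumed to hold both the unrelated code portion 240 and the second conditional statement 250, and edges that do not lead to node B6 are omitted. Each conditional edge records the outcome of `x == 0` that it requires; a path that requires both outcomes is infeasible, provided that “x” is not reassigned in the unrelated code portion 240.

```python
# Reconstructed sketch of source code 200 (assumed, for illustration only):
#   void f(int x) {
#       int y;
#       int *p;
#       if (x == 0) p = &y;      /* 210, 220 */
#       else        p = NULL;    /* 230 */
#       /* unrelated code 240 */
#       if (x == 0) *p = 5;      /* 250, 260; the else branch 270 is empty */
#       return;                  /* 280 */
#   }

# Edge -> outcome of the test "x == 0" that the edge requires (None: unconditional).
EDGES = {
    ("B1", "B2"): True,    # x == 0: p = &y
    ("B1", "B3"): False,   # x != 0: p = NULL
    ("B2", "B4"): None,
    ("B3", "B4"): None,
    ("B4", "B6"): True,    # x == 0: *p = 5
}

def is_infeasible(path: tuple[str, ...]) -> bool:
    """True if the path needs x == 0 to be both true and false."""
    outcomes = {EDGES[edge] for edge in zip(path, path[1:])} - {None}
    return len(outcomes) > 1

for path in [("B1", "B2", "B4", "B6"), ("B1", "B3", "B4", "B6")]:
    print(" -> ".join(path), "infeasible" if is_infeasible(path) else "feasible")
```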
It should be noted that although the particular source code 200 illustrated in
Control flow begins at node B1 310 corresponding to the first conditional statement 210 of
At node B2 320 corresponding to the first assignment statement 220 of
As illustrated in
At node B6 360 corresponding to the third assignment statement 260 of
The source code 200 of
A path-insensitive dataflow analysis 400 may initially be performed on the CFG 300 of
Immediately prior to the merge node B4 340 of
Thus, the path-insensitive dataflow analysis 400 results in a path-sensitive value 402 “maybe null,” indicating that the source code 200 may have a programming defect. To improve the accuracy of defect determination, a path-sensitive dataflow analysis may be performed via a path refinement algorithm. In a particular embodiment, the path refinement algorithm is executed based on recursive subdivision of control flow paths as follows:
In accordance with the path-refinement algorithm, an initial set S that includes the pair [(B1→B6), Merge(B2, B3)] and an empty result set R are created at 410. That is, the initial set S includes the path B1→B6 and the corresponding state expression Merge(B2, B3) having the value 402 “maybe null” as determined by the path-insensitive dataflow analysis 400.
Advancing to 420, the path B1→B6 is split because the state expression Merge(B2, B3) is path-sensitive. As illustrated by the CFG 300 of
Proceeding to 430, the first pair is examined and added to the result set R because the first pair includes a path-insensitive state expression “not null.” When the second pair is examined, it is determined that the path (B1→B3→B4→B6) is infeasible.
That is, the path (B1→B3→B4→B6) cannot occur during execution of the source code 200 of
Advancing to 440, the state expression(s) in the result set R are output because the initial set S is empty. That is, a path-refined value 404 “not null” is output, indicating that “p” is not null when it is dereferenced at the third assignment statement 260 and, therefore, that the source code 200 does not include the potential null pointer defect.
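Steps 410 through 440 can be replayed in a few lines of Python. This is a sketch under simplifying assumptions, not the disclosed implementation: the CFG is the acyclic fragment leading to node B6, a path-sensitive pair is split directly into its concrete paths (enough for this example, whereas the described algorithm subdivides recursively), infeasibility is detected by contradictory `x == 0` outcomes, and all helper names are hypothetical.

```python
SUCCS = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": ["B6"], "B6": []}
ASSIGNS = {"B2": "not null", "B3": "null"}  # p = &y at B2, p = NULL at B3
BRANCH = {("B1", "B2"): True, ("B1", "B3"): False, ("B4", "B6"): True}  # x == 0 outcome

def concrete_paths(src, dst):
    """All paths from src to dst in the acyclic fragment above."""
    if src == dst:
        return [(dst,)]
    return [(src,) + rest for nxt in SUCCS[src] for rest in concrete_paths(nxt, dst)]

def is_infeasible(path):
    # Infeasible when the path needs x == 0 to be both true and false.
    return len({BRANCH[e] for e in zip(path, path[1:]) if e in BRANCH}) > 1

def value_of_p(path):
    # Along a single concrete path, the last assignment to p determines its value.
    value = "maybe null"
    for node in path:
        value = ASSIGNS.get(node, value)
    return value

def refine(path, value):
    S, R = [(path, value)], []                 # 410: initial set S, empty result set R
    while S:
        path, value = S.pop(0)
        if is_infeasible(path):                # 430: infeasible path, drop the pair
            continue
        alternatives = concrete_paths(path[0], path[-1])
        if value != "maybe null" or path in alternatives:
            R.append((path, value))            # 430: path-insensitive, or cannot split
            continue
        for p in alternatives:                 # 420: split the path-sensitive pair
            S.append((p, value_of_p(p)))
    return R                                   # 440: S is empty, output R

result = refine(("B1", "B6"), "maybe null")    # the pair [(B1->B6), Merge(B2, B3)]
print(result)  # [(('B1', 'B2', 'B4', 'B6'), 'not null')]
```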
It will thus be appreciated that path refinement may improve the accuracy of defect determination by improving the accuracy of state expressions. For example, in the particular embodiment illustrated in
The method 500 includes performing path-insensitive dataflow analysis on a control flow graph (CFG) of a computer program to detect a set of potential defects in the computer program, at 502. For example, in
The method 500 also includes performing a path-sensitive dataflow analysis to identify one or more infeasible paths of the CFG without modifying the CFG, at 504. For example, in
The method 500 further includes removing potential defects associated with the one or more infeasible paths from the set of potential defects to generate a reduced set of potential defects in the computer program, at 506. For example, in
The method 500 includes outputting the reduced set of potential defects, at 508. For example, in
The method 600 includes identifying a CFG for a computer program, at 602. The CFG includes a plurality of nodes, where each node represents an execution point of the computer program. For example, the CFG 300 of
The method 600 includes performing a path-insensitive dataflow analysis of the CFG to determine a value of a state expression representing program state of the program at a particular node, at 604. For example, the path-insensitive dataflow analysis 400 of
The method 600 further includes determining whether the value of the state expression is path-insensitive or path-sensitive, at 606. When the value of the state expression is path-insensitive, the method 600 includes outputting the path-insensitive value, at 608. When the value of the state expression is path-sensitive, the method 600 includes determining a path-refined value of the state expression without modifying or duplicating any node of the CFG, at 610. For example, the path-refined value 404 “not null” may be determined as illustrated in
The method 600 also includes outputting the path-refined value, at 614. For example, the path-refined value 404 of
The method 700 includes determining whether an initial set of paths is empty, at 702. For example, referring to
When the initial set of paths is not empty, the method 700 includes determining whether a particular path in the initial set of paths is infeasible, at 704. When the particular path is infeasible, the method 700 includes removing the particular path from the initial set of paths, at 705. For example, referring to
When the particular path is not infeasible, the method 700 includes determining whether the particular path includes a cycle, at 706. When the particular path includes a cycle, the method 700 includes removing the particular path from the initial set of paths and adding the particular path to a result set of paths, at 707. The method 700 returns to 702 from 707. When the particular path does not include a cycle, the method 700 includes determining whether a value of a state expression associated with the particular path is path-insensitive, at 708. When the value is path-insensitive, the method 700 includes adding the particular path to the result set of paths, at 709. For example, referring to
When the value of the state expression is not path-insensitive, the method 700 includes determining whether a maximum number of splitting operations has been performed, at 710. If the maximum number of splitting operations has been performed, the method 700 includes treating the path-sensitive value of the state expression like a path-insensitive value by advancing to 709. If the maximum number of splitting operations has not been performed, the method 700 includes splitting the particular path, at 711. Splitting the particular path may include removing the particular path from the initial set of paths and adding two or more distinct (e.g., non-identical) alternative paths to the initial set of paths. For example, referring to
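The control flow of the method 700 maps onto a worklist loop. The Python sketch below illustrates it under stated assumptions: the infeasibility test, the path-insensitivity test, and the `split` function are passed in as callables (hypothetical parameters), a path is treated as containing a cycle when it visits some node twice, and the splitting budget `max_splits` defaults to an arbitrary 32.

```python
from collections import deque
from typing import Callable, Iterable

Path = tuple[str, ...]
Pair = tuple[Path, str]  # (path, value of the state expression on that path)

def refine_paths(initial: Iterable[Pair],
                 is_infeasible: Callable[[Path], bool],
                 is_path_insensitive: Callable[[str], bool],
                 split: Callable[[Path], list[Pair]],
                 max_splits: int = 32) -> list[Pair]:
    S = deque(initial)    # initial set of paths
    R: list[Pair] = []    # result set of paths
    splits = 0
    while S:                                    # 702: stop once S is empty
        path, value = S.popleft()               # remove the path from S
        if is_infeasible(path):                 # 704 -> 705: discard the path
            continue
        if len(set(path)) < len(path):          # 706 -> 707: cycle, keep as is
            R.append((path, value))
            continue
        if is_path_insensitive(value):          # 708 -> 709
            R.append((path, value))
            continue
        if splits >= max_splits:                # 710 -> 709: budget exhausted, treat
            R.append((path, value))             # the value as path-insensitive
            continue
        splits += 1                             # 711: replace the path with
        S.extend(split(path))                   # two or more alternatives
    return R

alternatives = {("B1", "B6"): [(("B1", "B2", "B4", "B6"), "not null"),
                               (("B1", "B3", "B4", "B6"), "null")]}
print(refine_paths([(("B1", "B6"), "maybe null")],
                   is_infeasible=lambda p: p == ("B1", "B3", "B4", "B6"),
                   is_path_insensitive=lambda v: v != "maybe null",
                   split=lambda p: alternatives[p]))
# [(('B1', 'B2', 'B4', 'B6'), 'not null')]
```

Bounding the number of splits keeps the cost predictable: once the budget is spent, any remaining path-sensitive value is reported as is rather than refined further.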
It will be appreciated that the method 700 of
The computing device 810 includes at least one processor 820 and a system memory 830. Depending on the configuration and type of computing device, the system memory 830 may be volatile (such as random access memory or “RAM”), non-volatile (such as read-only memory or “ROM,” flash memory, and similar memory devices that maintain stored data even when power is not provided), or some combination of the two. The system memory 830 typically includes an operating system 832, one or more application platforms (e.g., an integrated development environment (IDE) 834), one or more applications (e.g., a compiler/debugger 836 and a defect tracking tool 837), and program data (e.g., source code 838) associated with the one or more applications. In an illustrative embodiment, the IDE 834, the compiler/debugger 836, and the defect tracking tool 837 include one or more of the logic 110, 130, 150 of
The computing device 810 may also have additional features or functionality. For example, the computing device 810 may also include removable and/or non-removable additional data storage devices such as magnetic disks, optical disks, tape, and standard-sized or miniature flash memory cards. Such additional storage is illustrated in
The computing device 810 may also have input device(s) 860, such as a keyboard, mouse, pen, voice input device, touch input device, etc. Output device(s) 870, such as a display, speakers, printer, etc. may also be included. The computing device 810 also contains one or more communication connections 880 that allow the computing device 810 to communicate with other computing devices 890 over a wired or a wireless network.
It will be appreciated that not all of the components or devices illustrated in
The illustrations of the embodiments described herein are intended to provide a general understanding of the structure of the various embodiments. The illustrations are not intended to serve as a complete description of all of the elements and features of apparatus and systems that utilize the structures or methods described herein. Many other embodiments may be apparent to those of skill in the art upon reviewing the disclosure. Other embodiments may be utilized and derived from the disclosure, such that structural and logical substitutions and changes may be made without departing from the scope of the disclosure. Accordingly, the disclosure and the figures are to be regarded as illustrative rather than restrictive.
Those of skill would further appreciate that the various illustrative logical blocks, configurations, modules, and process steps or instructions described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. Various illustrative components, blocks, configurations, modules, or steps have been described generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in computer readable media, such as random access memory (RAM), flash memory, read only memory (ROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to a processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor or the processor and the storage medium may reside as discrete components in a computing device or computer system.
Although specific embodiments have been illustrated and described herein, it should be appreciated that any subsequent arrangement designed to achieve the same or similar purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all subsequent adaptations or variations of various embodiments.
The Abstract of the Disclosure is provided with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, various features may be grouped together or described in a single embodiment for the purpose of streamlining the disclosure. This disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter may be directed to less than all of the features of any of the disclosed embodiments.
The previous description of the embodiments is provided to enable a person skilled in the art to make or use the embodiments. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.