Software bugs generally refer to errors, flaws, mistakes, and/or other faults in computer programs that can produce incorrect or unexpected results. For example, some software bugs may cause a computer to crash or freeze because of memory access violation, memory leaks, or other types of defects. Other software bugs may allow attackers to take control of other users' computers, to spy on other users, and/or otherwise injure unsuspecting users.
Programming mistakes and errors are believed to cause most software bugs. Various debugging techniques have been developed to discover such mistakes and errors. Examples of such debugging techniques include code coverage testing, fault injection, mutation testing, fuzz testing, and exploratory testing. However, these debugging techniques may still be unsuitable and/or ineffective for catching various types of software bugs.
Aspects of the present technology are directed to computer testing systems and processes for testing and/or debugging computer programs. In certain embodiments, the present technology may include techniques to discover use-after-free bugs, type confusion bugs, and/or other types of software bugs. In other embodiments, the present technology may also include techniques to at least facilitate and/or assist in debugging computer programs.
In one aspect, the present technology provides a computer testing system that includes an initial processing component and a runtime component. The initial processing component can insert testing instructions into a computer program. The runtime component can then execute the computer program with the inserted instructions and monitor a type, a version, and/or other suitable characteristics of individual objects of the executed computer program in a computer memory (e.g., heap, stack, etc.).
In certain embodiments, the testing system may assign a unique version (or identifier) to memory locations holding the individual objects and/or corresponding pointers. When the computer program dereferences a pointer during execution, the testing system may compare (1) a version of the dereferenced pointer location (i.e., the pointer version) to (2) a version of an object in the memory location pointed to by the pointer (i.e., the object version). If the pointer version does not match the object version, the testing system may raise and/or record an alarm for use-after-free bugs.
In other embodiments, the testing system may associate and record a type for individual objects in the computer memory. For example, the testing system may associate an integer, floating point, and/or other suitable type with a particular memory location holding a structure or parameter. During execution, the testing system may compare the types of memory locations referred to by a source operand or destination operand. If the types of the memory locations do not match, the testing system may raise and/or record an alarm or flag for type confusion bugs.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Various embodiments of computer testing systems, components, modules, routines, and processes are described below. In the following description, example software codes, values, and other specific details are included to provide a thorough understanding of various embodiments of the present technology. A person skilled in the relevant art will also understand that the technology may have additional embodiments. The technology may also be practiced without several of the details of the embodiments described below with reference to
As discussed in the Background section, software bugs can cause computer programs to produce incorrect or unexpected results. One type of such software bug is a type confusion bug that occurs when a program assigns a source operand of a particular type (e.g., integer) to a destination operand of a different type (e.g., floating point). Another type of software bug is a use-after-free bug that occurs when a program reuses a memory location after the memory location has been de-allocated. The following text describes certain embodiments of type checking and version checking techniques that a user may apply to address at least some of the foregoing software bugs. In other embodiments, a user may apply the described techniques for addressing other suitable types of software bugs.
As shown in
The initial processing component 102 may be configured to insert test instructions into an original program 120. The original program 120 with the inserted test instructions forms a processed program 122 for execution by the runtime component 104. In certain embodiments, the test instructions may be added for monitoring a function entry, function return, memory read, memory write, dynamic memory allocation, and/or dynamic memory de-allocation. In other embodiments, the test instructions may also be added for monitoring usage of memory, usage of particular instructions, and/or frequency and duration of function calls.
In one embodiment, the original program 120 may be in source code (e.g., C++). The initial processing component 102 may include a source code editor configured to add the test instructions to the original program 120. The initial processing component 102 may also include a compiler configured to compile the original program 120 with the added test instructions to generate the processed program 122 in object or machine code. In one example, Microsoft Visual Studio provides both a suitable source code editor and a compiler when the source code of the original program 120 is in C or C++.
In another embodiment, the original program 120 may be in object or machine code. The initial processing component 102 may include a binary instrumentation tool configured to add the test instructions in binary form to the original program 120 to generate the processed program 122. One suitable binary instrumentation tool is Pin provided by Intel Corp. of Santa Clara, Calif.
In further embodiments, a user may determine that the original program 120 includes a first portion in source code and a second portion in object code. In one embodiment, the user may compile the first portion of the original program 120 into object code before the test instructions are added using the binary instrumentation tool. In other embodiments, the user may process the first portion using the source code editor and compile it with the compiler. The user may then process the second portion with the binary instrumentation tool and subsequently combine it with the compiled first portion having the added test instructions.
The initial processing component 102 may also be configured to identify and collect type information for static variables, function parameters, local variables, structures, and/or other objects in the original program 120. As used herein, the phrase “type” generally refers to a classification identifying at least one of various categories of data. The classification may determine possible values for a type, operations that may be performed on values of the type, and the way values of the type may be stored. In one example, a type may include a primitive type, e.g., floating-point, integer, Boolean, etc. In other examples, a type may include a program-defined type or types. In further examples, a type may include a combination of and/or nested primitive and program-defined types.
In one embodiment, the initial processing component 102 may be configured to collect the type data from information generated by a compiler/debugger during compilation of the original program 120. For instance, if the original program 120 is in C or C++ and is compiled with Microsoft Visual Studio, the initial processing component 102 may collect the type information from a program database (“PDB”) file generated by the compiler. For example, the original program 120 may include the following C statement:
The initial processing component 102 may also be configured to associate the collected type information with individual objects of the original program 120. In the illustrated embodiment, the objects and type data are organized and stored in an initial type database 124. For the example statement above, the initial type database 124 can store the memory location of the object pA with the type struct A*. In other embodiments, the objects and the collected type information may be organized and/or stored in other suitable data structures.
Optionally, the initial processing component 102 may be configured to identify and generate use-define data 126 (e.g., use-define chains, shown in phantom lines for clarity) of function parameters, local variables, structures, and/or other objects in the original program 120. As used herein, the phrase “use-define data” generally refers to data that include a use of a variable and all the definitions of the variable that can reach the use without any other intervening definitions.
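The definition of use-define data can be illustrated with a minimal sketch. It assumes straight-line code with no branches, so each use has at most one reaching definition; the names `Instr`, `UseDefChains`, and `buildUseDefChains` are hypothetical and are not taken from the described system.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical instruction model: at most one variable written, any number read.
struct Instr {
    std::string text;               // textual form of the instruction
    std::string def;                // variable defined (written); empty if none
    std::vector<std::string> uses;  // variables used (read)
};

// Maps (index of using instruction, variable) to the index of the defining instruction.
using UseDefChains = std::map<std::pair<std::size_t, std::string>, std::size_t>;

// Assumes straight-line code: the most recent earlier definition is the one
// that reaches a use without any intervening definition.
UseDefChains buildUseDefChains(const std::vector<Instr>& code) {
    UseDefChains chains;
    std::map<std::string, std::size_t> lastDef;
    for (std::size_t i = 0; i < code.size(); ++i) {
        // Record uses first, so an instruction that reads and writes the same
        // variable is linked to the earlier definition, not to itself.
        for (const auto& v : code[i].uses) {
            auto it = lastDef.find(v);
            if (it != lastDef.end()) {
                chains[{i, v}] = it->second;
            }
        }
        if (!code[i].def.empty()) {
            lastDef[code[i].def] = i;
        }
    }
    return chains;
}
```

A program with branches would need a full reaching-definitions analysis in which a single use may map to several definitions; the sketch deliberately omits that case.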
In one embodiment, the initial processing component 102 is configured to perform a static use-define analysis on the original program 120. For example, the original program 120 may include the following machine code instructions:
As shown above, the use-define analysis shows that the variable ebx is used in “mov ecx, [ebx]” and is defined by memory location [ebp-8] as shown in a use-define chain below:
The runtime component 104 is configured to (a) receive the processed program 122, the initial type database 124, and the optional use-define data 126 and (b) execute the processed program 122 to generate test results 128. As shown in
The type module 106 may be configured to associate a type with individual memory locations holding objects when the processed program 122 is executed. For example, the type module 106 may associate an integer, floating point, and/or other suitable type with a memory location holding a particular structure or parameter. The type module 106 may compare the type of the particular memory location holding the structure or parameter with that of a source (or destination) operand location. If the types do not match, the type module 106 may raise and/or record an alarm or flag for a type confusion bug. Embodiments of the type module 106 and the foregoing type checking techniques are described in more detail below with reference to
The version module 108 may be configured to assign a unique version to individual memory locations holding objects and corresponding pointers. As used herein, the word “version” generally refers to a unique identifier. For example, the version may include globally unique identifiers, sequential numbers, random numbers, or random alphanumeric codes assigned individually to each memory location. In other examples, the version may include other suitable identifiers.
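One of the options listed above, sequential numbers, can be sketched as follows. This is only an illustration: the `VersionGenerator` class and the `Version` alias are hypothetical names, and the choice of a 64-bit atomic counter is an assumption made so that the sketch remains usable if the instrumented program runs multiple threads.

```cpp
#include <atomic>
#include <cstdint>

using Version = std::uint64_t;  // hypothetical representation of a version

// Hypothetical generator that hands out sequential versions starting at 1.
// Each call returns a value not returned before (until the 64-bit counter wraps).
class VersionGenerator {
public:
    Version next() {
        // fetch_add returns the previous value, so add 1 to get the new one.
        return counter_.fetch_add(1, std::memory_order_relaxed) + 1;
    }

private:
    std::atomic<Version> counter_{0};
};
```

Random or globally unique identifiers would serve equally well for the comparison described next; the only property the check relies on is that two distinct allocations do not receive the same version.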
The assigned versions may then be used during runtime for finding use-after-free bugs. For example, in certain embodiments, when a pointer is dereferenced, the version module 108 can compare (1) the version of the memory location holding the dereferenced pointer (i.e., the pointer version) to (2) a current version of the object in the memory location pointed to by the pointer (i.e., the object version). If the pointer version does not match the object version, the version module 108 may raise and/or record an alarm for use-after-free bugs. Embodiments of the version module 108 and the foregoing version checking techniques are described in more detail below with reference to
In other embodiments, the computer testing system 100 may optionally include a cause analysis component 110. The cause analysis component 110 may be configured to analyze test results 128 from at least one of the type module 106 and the version module 108 and provide at least an estimate or general indication of the cause of a particular software bug. The cause analysis component 110 may include instructions for software tracing, event logging, and/or other suitable instructions. In the illustrated embodiment, the cause analysis component 110 is independent from the runtime component 104. In other embodiments, the cause analysis component 110 may be integral to the runtime component 104. In further embodiments, the cause analysis component 110 may be omitted.
In operation, the initial processing component 102 may perform a static analysis on the original program 120. During the static analysis, the initial processing component 102 inserts test instructions into the original program 120 to generate the processed program 122, identifies and collects initial type information for objects in the original program 120, and optionally generates use-define data 126 for objects in the original program 120. In other embodiments, the initial processing component 102 may perform the foregoing functions dynamically or in a just-in-time fashion.
The runtime component 104 then executes the processed program 122 to check for type confusion errors, use-after-free errors, and/or other errors in the original program 120. In one embodiment, the runtime component 104 uses the use-define data 126 for version checking. In other embodiments, the use-define data 126 may be omitted. Instead, the initial processing component 102 may be configured to insert instructions for monitoring register moves in the processed program 122 during runtime. The optional cause analysis component 110 may then analyze, suggest, or identify a cause of the individual errors in the original program 120. Embodiments of operation of the runtime component 104 are described in more detail below with reference to
The process 200 also includes a decision block 204 to determine whether to perform a type check. In one embodiment, the process 200 may perform a type check on every executed instruction. As a result, the block 204 may be omitted. In other embodiments, the process 200 may perform a type check on some instructions based on certain conditions. For example, in one embodiment, if the executed instruction does not involve a memory operation, the process 200 may determine that a type check is not needed. In another example, if both a source operand and a destination operand of an assignment instruction have a matching type, the process 200 may determine that a type check is not needed. As used herein, a “source operand” generally refers to an object whose value is to be assigned to another object (referred to as the “destination operand”). In further embodiments, the determination may be based on other suitable conditions.
If a type check is to be performed, the process 200 proceeds to performing a type check on the executed instruction at block 206. Embodiments of performing a type check are described in more detail below with reference to
The process 200 then includes a decision block 210 to determine whether the process should continue. In one embodiment, the process 200 continues if the processed program 122 includes additional instructions. In other embodiments, the process 200 may continue based on other suitable conditions. If the process 200 is to continue, the process reverts to executing another instruction of the processed program 122 at block 202. Otherwise, the process ends.
Even though
The type inspection routine 302 may be configured to check and determine whether a source or destination operand has a type in at least one of the initial type database 124 and the object type database 308. As described above, the initial type database 124 can include type information for function parameters, local variables, structures, and/or other objects in the processed program 122. Similarly, the object type database 308 can include type information for memory locations allocated to objects in the processed program 122. For example, in one embodiment, the object type database 308 can include the following data records:
Thus, as shown above, each memory address (e.g., [value 1] or [value 2]) identifies a memory location that is associated with a type (e.g., struct A or struct B). In other embodiments, the object type database 308 may be organized and/or stored in other suitable data structures.
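The address-to-type association described above can be sketched as a simple in-memory map. This is a minimal illustration under stated assumptions: the class name `ObjectTypeDatabase`, its methods, and the use of a string to name a type are all hypothetical and are not the record layout of the described object type database 308.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the object type database: one record per memory
// address, each record naming the type associated with that address.
class ObjectTypeDatabase {
public:
    // Associates (or re-associates) a type with a memory address.
    void set(std::uintptr_t address, std::string type) {
        records_[address] = std::move(type);
    }

    // Returns the recorded type, or std::nullopt when no type is available.
    std::optional<std::string> lookup(std::uintptr_t address) const {
        auto it = records_.find(address);
        if (it == records_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::unordered_map<std::uintptr_t, std::string> records_;
};
```

Returning an empty optional for an unknown address mirrors the "type not available" outcome that the type inspection routine has to distinguish from a known type.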
The type comparison routine 304 may be configured to compare types of a source operand and a destination operand and determine if the compared types match. For example, in one embodiment, the type comparison routine 304 may be configured to determine if the type of the source operand exactly matches that of the destination operand. In another embodiment, the type comparison routine 304 may be configured to reduce the type of at least one of the source operand and the destination operand into a set of primitive data types (e.g., integer, floating point, Boolean, etc.) and their respective type locations (e.g., bit offset) in the source and/or destination operand. The type comparison routine 304 may then determine whether the set of the source operand is a subset of that of the destination operand at the same type locations, or vice versa. In further embodiments, the type comparison routine 304 may be configured to compare types based on other suitable rules determined by a user, a programming language, and/or other suitable sources. Results from the type comparison routine 304 may then be stored in the test results 128.
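The subset-based comparison can be sketched as follows, assuming each type has already been reduced to a set of (bit offset, primitive type) pairs. The names `Primitive`, `FlatType`, `isSubset`, and `typesCompatible` are hypothetical, and the reduction step itself is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical primitive categories, following the examples in the text.
enum class Primitive { Integer, FloatingPoint, Boolean };

// A type reduced to its primitive members, each tagged with its bit offset.
using FlatType = std::set<std::pair<std::size_t, Primitive>>;

// True when every (offset, primitive) pair of `inner` also appears in `outer`.
// std::includes requires sorted ranges, which std::set guarantees.
bool isSubset(const FlatType& inner, const FlatType& outer) {
    return std::includes(outer.begin(), outer.end(), inner.begin(), inner.end());
}

// Types are treated as matching when either flattened set is contained in the
// other at the same offsets ("or vice versa" in the description above).
bool typesCompatible(const FlatType& source, const FlatType& destination) {
    return isSubset(source, destination) || isSubset(destination, source);
}
```

Under this rule, a structure whose first member is an integer at offset 0 is compatible with a plain integer, while an integer and a floating-point value at the same offset are not.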
The type database routine 306 may be configured to organize records, including the initial type database 124 and the object type database 308, and to facilitate storing and retrieving of these records. Any type of database organization may be utilized, including a flat file system, hierarchical database, relational database, or distributed database, such as one provided by a database vendor like the Microsoft Corporation of Redmond, Wash.
If the source type is available, the process 400 proceeds to inspecting a destination operand with the type inspection routine 302 at block 406. Embodiments of inspecting the destination operand are described in more detail below with reference to
If the destination type is not available, the process 400 proceeds to updating the destination type in the object type database 308 (
If the destination type is available, the process 400 then includes comparing the destination type with the source type using the type comparison routine 304 (
If the source type and the destination type do not match, the process 400 includes raising an alarm at block 416 to indicate that a type mismatch error has occurred. As a result, a type confusion bug may exist in the executed instruction. The process 400 may then include storing the alarm in the test results 128 (
The process 402 may then include a decision block 422 to determine whether the source type is available in the object type database 308. If the source type is available in the object type database 308, the process 402 proceeds to indicating that the source type is available at block 424. Then, the process returns.
If the source type is not available in the object type database 308, the process 402 includes inspecting the source operand in the initial type database 124 (
The process 402 may then include a decision block 428 to determine whether the source type is available in the initial type database 124. If the source type is available in the initial type database 124, the process 402 includes updating the source type in the object type database 308 with the retrieved records from the initial type database 124 at block 430. In one embodiment, the source type in the object type database 308 may be updated with the same records from the initial type database 124. In other embodiments, the source type may be updated with a subset of the records from the initial type database 124. In further embodiments, the source type in the object type database 308 may be associated with select records from the initial type database 124 following rules determined by a user, a programming language, and/or other suitable sources. The process 402 proceeds to indicating that the source type is available at block 424, and then the process returns.
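The lookup order of the process 402 can be sketched as a two-level query: the object type database first, then the initial type database, with a hit in the latter copied into the former. The sketch below assumes both databases are keyed by memory address and copies the record unchanged (the first of the update options described above); the names `TypeDatabase` and `inspectSource` are hypothetical, and reporting "not available" when neither database has a record is an assumption.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical shared shape for both databases: memory address -> type name.
using TypeDatabase = std::unordered_map<std::uintptr_t, std::string>;

// Returns the source type if one is available, or std::nullopt otherwise.
std::optional<std::string> inspectSource(std::uintptr_t source,
                                         const TypeDatabase& initialTypes,
                                         TypeDatabase& objectTypes) {
    // First consult the object type database (decision block 422).
    if (auto it = objectTypes.find(source); it != objectTypes.end()) {
        return it->second;
    }
    // Fall back to the initial type database (decision block 428).
    if (auto it = initialTypes.find(source); it != initialTypes.end()) {
        objectTypes[source] = it->second;  // update the object type database (block 430)
        return it->second;
    }
    // Assumption: with no record in either database, the type is unavailable.
    return std::nullopt;
}
```

After the first successful fallback, later inspections of the same operand are answered from the object type database alone.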
The memory monitor routine 502 may be configured to monitor and/or determine memory operations when the processed program 122 is executed. For example, in one embodiment, the memory monitor routine 502 can be configured to monitor at least one of function entry, function return, memory read, memory write, dynamic memory allocation, and dynamic memory de-allocation. In other embodiments, the memory monitor routine 502 may be configured to monitor other suitable memory operations.
The version comparison routine 504 is configured to compare a version of a dereferenced pointer (i.e., the pointer version) to a version of an object in the memory location pointed to by the pointer (i.e., the object version). If the pointer version does not match the object version, the version comparison routine 504 may raise and/or record an alarm for use-after-free bugs.
The version database routine 506 is configured to organize records, including the optional use-define data 126, the object version database 508, and the pointer version database 510, and to facilitate storing and retrieving of these records. Any type of database organization may be utilized, including a flat file system, hierarchical database, relational database, or distributed database, such as one provided by a database vendor like the Microsoft Corporation of Redmond, Washington.
The object version database 508 can include version data for memory locations that hold allocated objects. For example, in one embodiment, the object version database 508 can include the following data record:
Similarly, the pointer version database 510 can include a data structure as follows:
Thus, in the example above, the object address in the object version database 508 has a value of the pointer (e.g., P). The pointer address in the pointer version database 510 has a value that is the memory address holding the pointer P, i.e., &P. In other embodiments, at least one of the object version database 508 and the pointer version database 510 may have other suitable types of data structures.
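The distinction between the two keys, the pointer's value P for the object version database and the pointer's own address &P for the pointer version database, can be sketched as follows. The record types and helper functions are hypothetical illustrations and are not the data structures of the described databases.

```cpp
#include <cstdint>

using Version = std::uint64_t;
using Address = std::uintptr_t;

// Hypothetical record shapes for the two version databases.
struct ObjectVersionRecord {
    Address objectAddress;   // the value of the pointer, i.e., P
    Version version;
};

struct PointerVersionRecord {
    Address pointerAddress;  // the location holding the pointer, i.e., &P
    Version version;
};

// The pointer is taken by reference so that &p is the address of the caller's
// pointer variable rather than the address of a local copy.
template <typename T>
ObjectVersionRecord makeObjectRecord(T* const& p, Version v) {
    return {reinterpret_cast<Address>(p), v};
}

template <typename T>
PointerVersionRecord makePointerRecord(T* const& p, Version v) {
    return {reinterpret_cast<Address>(&p), v};
}
```

Two pointer variables that hold the same value P therefore share one object record but have separate pointer records, which is what allows a stale pointer to be told apart from a fresh one.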
If a memory write is detected, the process 600 includes updating the pointer version database 510 with a unique version value for the pointer based on the version value for the object at block 610. In one embodiment, the version values are equal to each other. In other embodiments, the version values may have other relationships. Then, the process returns. If a memory write is not detected, the process 600 proceeds to another decision block 612 to determine whether the executed instruction involves a memory read or dereferencing a pointer. If a memory read is not detected, the process returns.
If a memory read is detected, the process 600 proceeds to inspecting both the object version database 508 (
Based on the memory location for the pointer &P, the version database routine 506 may query the pointer version database 510 to determine if a version value is present for &P. Based on the value of the pointer (i.e., P), the version database routine 506 may query the object version database 508 to determine if a version value is present for P.
The process 600 may include a decision block 616 to determine if the memory locations for both the object and the pointer have version values. If at least one of the version values is not present, the process returns. If both the version values are present, the process 600 proceeds to comparing the version values to each other at block 618. The process 600 may then include another decision block 620 to determine whether the version values match. In one embodiment, the version values are indicated as a match if they are equal to each other. In other embodiments, the version values may be indicated as a match based on rules determined by a user, a programming language, and/or other suitable sources. If the version values match, the process returns. If the version values do not match, the process 600 includes raising an alarm at block 622. The alarm may be stored in the results 128 (
Several embodiments of the process 600 may at least facilitate finding use-after-free bugs in the original program 120 (
At line 1, a memory location is allocated to an integer with pointer A. As a result, the process 600 assigns a first version (e.g., 1) to both the object at the memory location and pointer A at blocks 604 and 608. At line 2, the memory location is freed and may be allocated again. At line 3, the same memory location is re-allocated to an integer with pointer B. Thus, the process 600 assigns a second version (e.g., 2) to both the object at the memory location and pointer B.
At line 4, a memory read is attempted using pointer A. At this point, the version values of pointer A, pointer B, and the object at the memory location 0x42 are as follows:
Thus, when the process 600 compares the first version (i.e., 1) of pointer A to the second version (i.e., 2) of the object at the memory location at block 620, a mismatch is detected. As a result, the process 600 may raise an alarm at block 622 and indicate that the original program 120 attempts to access a memory location through a pointer to an object that has since been freed.
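The four-step scenario can be replayed against a small simulation of the version bookkeeping. This is a sketch under stated assumptions, not the described implementation: the `VersionTracker` class and its method names are hypothetical, versions are sequential integers, and clearing the object version on de-allocation is an assumption, since the text states only that the location may be allocated again.

```cpp
#include <cstdint>
#include <unordered_map>

using Version = std::uint64_t;
using Address = std::uintptr_t;

// Hypothetical tracker combining an object version map (keyed by the object's
// address) and a pointer version map (keyed by the location holding the pointer).
class VersionTracker {
public:
    // Allocation: the object and the pointer that receives it share a fresh version.
    void onAllocate(Address pointerSlot, Address object) {
        Version v = ++next_;
        objectVersions_[object] = v;
        pointerVersions_[pointerSlot] = v;
    }

    // De-allocation (assumption): drop the object version and leave any stale
    // pointer versions untouched.
    void onFree(Address object) { objectVersions_.erase(object); }

    // Dereference: returns true when an alarm should be raised, i.e., both
    // versions are present and they differ. A missing version raises no alarm.
    bool onDereference(Address pointerSlot, Address object) const {
        auto p = pointerVersions_.find(pointerSlot);
        auto o = objectVersions_.find(object);
        if (p == pointerVersions_.end() || o == objectVersions_.end()) {
            return false;
        }
        return p->second != o->second;
    }

private:
    Version next_ = 0;
    std::unordered_map<Address, Version> objectVersions_;
    std::unordered_map<Address, Version> pointerVersions_;
};
```

Calling `onAllocate` for pointer A, `onFree`, `onAllocate` for pointer B at the same object address, and then `onDereference` through pointer A's slot reproduces the mismatch of version 1 against version 2, while a dereference through pointer B's slot raises no alarm.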
The version module 108 (
Specific embodiments of the technology have been described above for purposes of illustration. However, various modifications may be made without deviating from the foregoing disclosure. In addition, many of the elements of one embodiment may be combined with other embodiments in addition to or in lieu of the elements of the other embodiments. Accordingly, the technology is not limited except as by the appended claims.