METHOD OF DETERMINING WHICH COMPUTER PROGRAM FUNCTIONS ARE CHANGED BY AN ARBITRARY SOURCE CODE MODIFICATION

Information

  • Patent Application
  • Publication Number
    20100269105
  • Date Filed
    April 21, 2009
  • Date Published
    October 21, 2010
Abstract
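In a method of determining which computer program functions are changed by a source code modification to a computer program's source code, the improvement of including the following steps, not necessarily performed in the order indicated: (a) compiling the computer program's source code, using a compiler modified or configured to generate, in compiler output, a relocation entry for each program access to a program function or a program data item; (b) compiling the source code resulting from modifying the computer program's source code with the source code modification, using a compiler so modified or configured; and (c) constructing a list of object code differences by comparing the object code produced from step (a) with the object code produced from step (b), excluding any object code difference at a position where both object codes contain relocation entries that are equivalent.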
Description
BACKGROUND OF THE INVENTION

When software developers discover a problem in a computer program (such as an operating system kernel), they typically create a patch to fix the problem. A patch is an arbitrary source code modification to the computer program, and it can result in changes to many functions within the computer program. Automatically determining what computer program functions are changed by an arbitrary source code modification can be useful for many software processes, such as determining how to “hot update” a computer program (i.e., apply a source code modification to a running program without restarting the program).


Determining which computer program functions are changed by a source code modification is an important task that a hot update system must accomplish. Prior hot update systems determined which functions changed as a result of a source code modification at the source code layer, and thus were subject to a number of limitations (for example, they did not handle function inlining or implicit casting correctly).


BRIEF SUMMARY OF THE INVENTION

The present invention is an improved method for determining which functions within a computer program are changed as a result of a source code modification.


Determining which functions within a computer program are changed as a result of a source code modification can be challenging in many cases. Consider a source code modification that changes a data type in a function prototype in a C header file (e.g., from an “int” to a “long long”). Because of implicit casting, this patch implies changes to the executable code of any functions that call the prototyped function. Any method that attempts to determine which functions are changed by this patch by looking only at source code, not at object code, will encounter the problem that the callers of the prototyped function have not had their source code modified at all, even after C preprocessing.


The present invention can identify which functions are changed by an arbitrary source code modification, while avoiding detecting extraneous differences. The present invention does not require any information about programming language semantics, such as information about the semantics of implicit casting in C.





BRIEF DESCRIPTION OF THE DRAWING

The present invention will become more fully understood from the detailed description given below and the accompanying drawing, which are given by way of illustration only and thus are not limitative of the present invention, wherein:



FIG. 1 illustrates a data storage medium having instructions stored therein for a computer to perform the method of the present invention.





DETAILED DESCRIPTION

As used herein, the term “computer program” or “program” refers to any computer program, including an operating system kernel.


The present invention determines which functions are changed by a source code patch while operating entirely at the object code layer—in other words, by looking at compiler output rather than the source-level contents of the patch.


The present invention must deal with the complication that compiler output can obscure the desired changes by introducing extraneous differences. Without any special measures, the object code compiled before the source code modification and the object code compiled after it will contain many extraneous differences. These extraneous differences are not the result of semantic changes (i.e., changes that adjust the meaning of the code) introduced by the source code modification. For example, a number of extraneous differences result from location assumptions inherent in object code, e.g., the offsets provided to immediate jump instructions, which are calculated relative to the program counter. The present invention makes it possible to generate a list of functions that are changed by an arbitrary source code modification, gaining the benefit of working at the object code layer while avoiding extraneous differences.
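The location assumption described above can be illustrated with a small sketch. It assumes the x86 near-call encoding (opcode 0xE8 followed by a 32-bit displacement measured from the end of the instruction); the addresses, the symbol name "helper", and both function names are hypothetical, and the relocatable form is a simplification of what a real object file records.

```python
import struct

CALL_REL32 = 0xE8  # x86 near-call opcode, followed by a 32-bit displacement


def encode_call(call_site: int, target: int) -> bytes:
    """Encode a call whose displacement is already resolved.

    The displacement is measured from the end of the 5-byte instruction,
    so the bytes depend on where both caller and callee are placed.
    """
    disp = target - (call_site + 5)
    return bytes([CALL_REL32]) + struct.pack("<i", disp)


def encode_call_relocatable(symbol: str):
    """Simplified relocatable form: a zero placeholder plus a relocation.

    The relocation is modeled as (offset_of_field, symbol_name).
    """
    return bytes([CALL_REL32]) + b"\x00\x00\x00\x00", (1, symbol)


# The caller is unchanged, but the callee moved by 16 bytes because some
# unrelated code between them grew (hypothetical addresses).
before = encode_call(call_site=0x1000, target=0x2000)
after = encode_call(call_site=0x1000, target=0x2010)

print(before != after)  # resolved bytes differ: an extraneous difference
print(encode_call_relocatable("helper") == encode_call_relocatable("helper"))
```

With resolved displacements the caller's bytes differ although its meaning did not change; in the relocatable form the bytes and the relocation are the same in both builds.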


In order to avoid extraneous differences, the present invention employs compiler modifications and a specially-designed comparison process. Specifically, the compiler is modified to ensure that it generates relocations for all references to functions and data structures, which results in more general code that does not make assumptions about where other functions and data structures are located in memory. This compiler behavior can be accomplished using many different techniques, such as the “-ffunction-sections” and “-fdata-sections” configuration controls for the GNU C compiler.


As used below, the term “pre object code” refers to the output of compiling the computer program's original source code using the modified compiler. As used below, the term “post object code” refers to the output of compiling the computer program's modified source code using the modified compiler.


In order to determine which functions were changed by the source code patch, the present invention compares the object files by comparing corresponding object code sections between the pre object code and the post object code.


If a function has not changed, the non-relocation contents of the pre object code and the post object code will be identical, and all of the relocations will be equivalent. Two relocations are equivalent if they refer to program functions that have the same name, or refer to program data objects that have the same contents.
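The equivalence rule just stated can be written down as a short sketch. The `Relocation` record, its field names, and the section labels used in the example are hypothetical; a real implementation would read these details from the object file's relocation and symbol tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relocation:
    kind: str              # "function" or "data" (hypothetical labels)
    symbol: str            # name of the referenced function or data object
    contents: bytes = b""  # contents of the referenced data object, if any


def relocations_equivalent(a: Relocation, b: Relocation) -> bool:
    """Functions match by name; data objects match by contents."""
    if a.kind != b.kind:
        return False
    if a.kind == "function":
        return a.symbol == b.symbol
    return a.contents == b.contents


# A compiler-generated label for a string constant may be renumbered
# between builds; comparing contents keeps that from counting as a change.
pre_str = Relocation("data", ".LC0", b"hello\x00")
post_str = Relocation("data", ".LC3", b"hello\x00")
print(relocations_equivalent(pre_str, post_str))
```

Comparing data objects by contents rather than by name is what lets a renamed but otherwise identical constant be treated as unchanged.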


In order to determine which functions are changed by a source code patch to a computer program, the present invention performs the following process.


First, the present invention compiles the computer program's original source code, using a compiler modified or configured to generate, in compiler output, a relocation entry for each program access to a function or data item. The result of this compilation is the pre object code.


Second, the present invention compiles the computer program's source code, modified by applying the source code patch, using a compiler modified or configured to generate, in compiler output, a relocation entry for each program access to a function or data item. The result of this compilation is the post object code.
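As a rough sketch of these two compilation steps, the following builds the compiler command lines for the original and the patched source trees. It assumes GCC invoked with the `-ffunction-sections` and `-fdata-sections` options that the description names as one way to obtain the required behavior; the file paths, directory layout, and function name are hypothetical.

```python
import shlex
from pathlib import Path
from typing import List

# GCC options the description names; whether they suffice for a given
# toolchain is an assumption to verify.
RELOC_FLAGS = ["-ffunction-sections", "-fdata-sections"]


def compile_command(source: str, out_dir: str, cc: str = "gcc") -> List[str]:
    """Return the argv for compiling one source file to an object file."""
    obj = Path(out_dir) / (Path(source).stem + ".o")
    return [cc, "-c", *RELOC_FLAGS, source, "-o", str(obj)]


# Hypothetical layout: the original tree and a copy with the patch applied.
pre_cmd = compile_command("orig/foo.c", "build/pre")
post_cmd = compile_command("patched/foo.c", "build/post")

print(shlex.join(pre_cmd))
print(shlex.join(post_cmd))
```

Each command list could then be passed to `subprocess.run(cmd, check=True)`. Using identical options for both builds matters, since any difference in compiler settings would itself show up as object code differences.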


Third, using the results of the previous two steps, the present invention constructs a list of object code differences by comparing the pre object code with the post object code, excluding any object code differences where the pre object code and post object code both contain relocations that are equivalent.


The object code differences resulting from this process provide a list of the functions that have changed as a result of the source code modification.


Thus, the improved method of the present invention can be summarized as follows:


In a method of determining what computer program functions are changed by a source code modification to a computer program's source code, the improvement of including the following steps in said method, with the order of steps (a) and (b) being interchangeable:


step (a)—compiling said computer program's source code, using a compiler modified or configured to generate, in compiler output, a relocation entry for each program access to a program function or a program data item; and


step (b)—compiling source code resulting from modifying said computer program's source code with said source code modification, using a compiler modified or configured to generate, in compiler output, a relocation entry for each program access to a program function or a program data item; and


step (c)—constructing a list of object code differences by comparing the object code produced from step (a) versus the object code produced from step (b), and excluding from the list any object code difference for which:

    • the object code produced from step (a) contains a relocation entry at the position of said object code difference; and
    • the object code produced from step (b) contains a relocation entry at the position of said object code difference; and
    • said relocation entries are equivalent.
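Step (c) can be sketched as follows, under the assumption that each function has been placed in its own section and that sections have already been parsed into bytes plus a map from offset to relocation. The `Reloc` and `Section` types, the 4-byte field width, and the example byte strings are all hypothetical; parsing a real object file format is outside this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Reloc:
    kind: str              # "function" or "data"
    symbol: str
    contents: bytes = b""  # contents of the referenced data object
    size: int = 4          # width in bytes of the relocated field (assumed)


@dataclass
class Section:
    code: bytes
    relocs: Dict[int, Reloc] = field(default_factory=dict)  # offset -> reloc


def equivalent(a: Reloc, b: Reloc) -> bool:
    if a.kind != b.kind or a.size != b.size:
        return False
    return a.symbol == b.symbol if a.kind == "function" else a.contents == b.contents


def section_changed(pre: Section, post: Section) -> bool:
    if len(pre.code) != len(post.code):
        return True
    if pre.relocs.keys() != post.relocs.keys():
        return True
    excluded = set()
    for off, r in pre.relocs.items():
        if not equivalent(r, post.relocs[off]):
            return True
        # Bytes under an equivalent relocation are excluded from comparison.
        excluded.update(range(off, off + r.size))
    return any(pre.code[i] != post.code[i]
               for i in range(len(pre.code)) if i not in excluded)


def changed_functions(pre: Dict[str, Section], post: Dict[str, Section]) -> List[str]:
    names = sorted(set(pre) | set(post))
    return [n for n in names
            if n not in pre or n not in post or section_changed(pre[n], post[n])]


pre = {"caller": Section(b"\xe8\x00\x00\x00\x00\xc3", {1: Reloc("function", "helper")}),
       "helper": Section(b"\x31\xc0\xc3")}
post = {"caller": Section(b"\xe8\x00\x00\x00\x00\xc3", {1: Reloc("function", "helper")}),
        "helper": Section(b"\xb8\x01\x00\x00\x00\xc3")}
print(changed_functions(pre, post))
```

In this example only the body of `helper` differs, so `caller` is not reported even though it refers to `helper`. A function present in only one build is treated as changed, which is a design choice of this sketch rather than something the description specifies.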



FIG. 1 illustrates a data storage medium 1 having instructions stored therein for a computer 2 to perform the method of the present invention.


The invention being thus described, it will be obvious that the same may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention. Rather, the scope of the invention shall be defined as set forth in the following claims and their legal equivalents. All such modifications as would be obvious to one skilled in the art are intended to be included within the scope of the following claims.

Claims
  • 1. (canceled)
  • 2. (canceled)
  • 3. A method comprising: identifying a function changed by source code modification in a portion of executable object code; and identifying extraneous differences by examining the executable code.
  • 4. The method of claim 3, wherein identifying the function includes: generating relocations for references to the functions and data structures in the object code; building pre object code using original source code; and building post object code using modified source code.
  • 5. The method of claim 4 further comprising: determining that the function has not changed, in response to determining that non-relocation contents of the pre object code and the post object code are identical, and all of the corresponding relocations are equivalent.
  • 6. The method of claim 5, wherein determining that all of the corresponding relocations are equivalent includes determining that two relocations are equivalent if the corresponding relocations refer to a program function with the same name and refer to program data objects having the same contents for data objects.
  • 7. The method of claim 5 further comprising constructing a list of object code differences by comparing the pre object code with the post object code, excluding object code differences where the pre object code and post object code both include relocations that are equivalent.
  • 8. The method of claim 5 further comprising determining whether extraneous differences result from location assumptions inherent in object code.
  • 9. The method of claim 4 further comprising providing a modified compiler to generate the relocations for references.
  • 10. The method of claim 9, wherein the modified compiler is used to build the pre object code.
  • 11. The method of claim 9, wherein the modified compiler is used to build the post object code.
  • 12. The method of claim 4 further comprising generating relocations for all functions to provide general object code independent of the memory location of functions and data structures thereby avoiding extraneous differences.
  • 13. A method of determining what computer program functions are changed by a source code modification to a computer program source code, comprising: compiling the computer program source code to generate first object code including a relocation entry for each program access to a program function; and compiling the computer program source code resulting from modifying the computer program source code with the source code modification, to generate second object code including, in compiler output, a relocation entry for each program access to the corresponding program function; and constructing a list of object code differences by comparing the first object code to the second object code.
  • 14. The method of claim 13 further comprising using a modified compiler configured to generate the relocation entry for each program access.
  • 15. The method of claim 13 further comprising generating, in the compiler output, a relocation entry for each program access to a program data item.
  • 16. The method of claim 13 further comprising excluding from the list any object code difference for which: the object code produced from step (a) contains a relocation entry at the position of the object code difference; the object code produced from step (b) contains a relocation entry at the position of the object code difference; and the relocation entries produced from steps (a) and (b) are equivalent.
  • 17. A computer program product having a computer-readable medium including computer program logic encoded thereon that, when executed on a computer system provides a method of determining what computer program functions are changed by a source code modification, that causes the computer system to perform operations of: compiling computer program source code to produce a first portion of object code, using a compiler configured to generate, in compiler output, a relocation entry for each program access to at least one of a program function and a program data item; compiling the computer program source code resulting from modifying the computer program source code with the source code modification to produce a second portion of object code, using a compiler configured to generate, in compiler output, a relocation entry for each program access to the corresponding at least one of the program function and the program data item; and constructing a list of object code differences by comparing the first portion of object code to the second portion of object code, and excluding from the list any object code difference for which: the first portion of object code includes a relocation entry at the position of the object code difference; the second portion of object code includes a relocation entry at the position of the object code difference; and the relocation entries are equivalent.