1. Field of the Invention
The present invention relates generally to data processing and, in particular, to compiling computer instructions in a data processing system. Still more particularly, the present invention relates to optimizing application code to line up frequent blocks based on profile directed feedback while maintaining code locality.
2. Description of the Related Art
When an application is compiled, the instructions may be divided into basic blocks. A basic block is a series of instructions that ends with a conditional branch or an unconditional branch. Because a basic block ends in a branch, the series of instructions within the block executes successively. At the end of a basic block, execution may transfer to the very first instruction of the same basic block, transfer to an earlier basic block, or proceed to a succeeding basic block.
When the compiled code is executed, one or more basic blocks may be fetched into the instruction cache to improve runtime performance. Code straightening, or code positioning, is a compiler optimization technique that reorders the procedures inside a program, or the basic blocks inside a procedure, to reduce the miss ratio of the instruction cache and to better utilize the hardware branch prediction mechanisms of modern processors, thus improving the runtime performance of the program or application code.
Known code straightening methods reorder basic blocks solely based on execution frequency. These known methods place the most frequently executed blocks together to avoid cache misses. Often, infrequent blocks are placed at the end of the critical path. Also, an important drawback of some known methods is that even the critical path may not be ordered by execution order. For example, a successor of a block may be placed before the block itself.
The present invention recognizes the disadvantages of the prior art and provides a region-based method for optimizing application code. A compiler creates a control flow graph for a procedure. The control flow graph represents the procedure and the flow of control between instruction blocks of the procedure, and includes profile information for the instruction blocks. A region based code straightening mechanism in the compiler performs a depth-first search of the control flow graph to form an ordered list of instruction blocks. The region based code straightening mechanism then moves at least one instruction block closer to its predecessor and generates a final list of instruction blocks.
The novel features of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
With reference now to the figures and in particular with reference to
Computer 100 can be implemented using any suitable computer, such as an IBM eServer™ computer or IntelliStation® computer, which are products of International Business Machines Corporation, located in Armonk, N.Y. Although the depicted representation shows a computer, other embodiments of the present invention may be implemented in other types of data processing systems, such as a network computer. Computer 100 also preferably includes a graphical user interface (GUI) that may be implemented by means of systems software residing in computer readable media in operation within computer 100.
With reference now to
In the depicted example, local area network (LAN) adapter 212, audio adapter 216, keyboard and mouse adapter 220, modem 222, read only memory (ROM) 224, hard disk drive (HDD) 226, CD-ROM drive 230, universal serial bus (USB) ports and other communications ports 232, and PCI/PCIe devices 234 may be connected to ICH 210. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, PC cards for notebook computers, etc. PCI uses a cardbus controller, while PCIe does not. ROM 224 may be, for example, a flash basic input/output system (BIOS). Hard disk drive 226 and CD-ROM drive 230 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. A super I/O (SIO) device 236 may be connected to ICH 210.
An operating system runs on processor 202 and is used to coordinate and provide control of various components within data processing system 200 in
In accordance with exemplary aspects of the present invention, a compiler converts source code into machine instructions for execution on a computer, such as data processing system 200. In general, the compiler can perform many transformations (or optimizations), based on a user-specified optimization level, to reduce the code size and to generate better code for the application program. Usually, the generated code executes much faster than code compiled without such transformations. One of the transformations may be code straightening. Code straightening reorders the position of the procedures inside a program or the position of the basic blocks inside a procedure to improve the performance of the application code by reducing the instruction cache miss ratio and better utilizing the hardware instruction fetch mechanism and branch prediction mechanisms of modern processors. According to an exemplary embodiment of the present invention, the compiler performs code straightening based on profile directed feedback to line up the most frequently executed instructions together while maintaining code locality.
Those of ordinary skill in the art will appreciate that the hardware in
As stated above, in accordance with exemplary aspects of the present invention, a compiler performs region-based code straightening to line up frequently executed basic blocks together based on profile directed feedback. The region-based code straightening lines up the basic blocks in order of execution sequence, but does not move infrequently executed basic blocks too far away from predecessors. As such, code locality is maintained, thus reducing instruction cache misses.
A system for optimizing an application includes a compiler. The compiler transforms high-level user programs into machine instructions and performs many optimizations along the way. The compiler builds a control flow graph for each procedure in the application code.
The control flow graph is constructed by examining the instructions of a procedure, creating a node for each basic block, and adding edges between the nodes to represent the control transfers introduced by branch instructions. The compiler may instrument a procedure by providing counters that count the number of times each basic block is executed. The counters are inserted during the first pass of compilation and are updated when the instrumented code runs. The information contained in the counters is referred to as profile directed feedback or profile information.
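By way of illustration only, the construction of a control flow graph with per-block counters might be sketched as follows. This is a minimal sketch, not the embodiment itself: the names BasicBlock, build_cfg, and record_trace, the edge-list input, and the use of an execution trace as a stand-in for real instrumentation counters are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    """One node of the control flow graph (hypothetical representation)."""
    name: str
    successors: list = field(default_factory=list)    # blocks control may flow to
    predecessors: list = field(default_factory=list)  # blocks control may come from
    count: int = 0                                     # profile counter for this block


def build_cfg(edges):
    """Create one node per basic block and one edge per control transfer."""
    blocks = {}
    for src, dst in edges:
        for name in (src, dst):
            blocks.setdefault(name, BasicBlock(name))
        blocks[src].successors.append(dst)
        blocks[dst].predecessors.append(src)
    return blocks


def record_trace(blocks, trace):
    """Stand-in for instrumentation: bump the counter of each executed block."""
    for name in trace:
        blocks[name].count += 1


# Block A ends in a conditional branch to B or C; both rejoin at End.
cfg = build_cfg([("Begin", "A"), ("A", "B"), ("A", "C"), ("B", "End"), ("C", "End")])

# Three assumed training runs: two take the B side, one takes the C side.
record_trace(cfg, ["Begin", "A", "B", "End"] * 2 + ["Begin", "A", "C", "End"])
```

After the assumed training runs, the counters in cfg play the role of the profile information that the second compilation pass consumes.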
The compiler uses the profile information by compiling the application code twice. In the first pass compilation, the compiler inserts instrumentation code into the application code and generates an instrumented version of the application code. Then, a user or the compiler (in the case of a dynamic compiler, such as a just-in-time (JIT) compiler) runs the instrumented version of the application code with training data, that is, a small set of input data that represents a typical workload of the application. The instrumented code generates the profile information and stores it, for example, in a file. Thereafter, in the second pass compilation, the compiler uses the generated profile information to guide other optimizations. The region based code straightening of the present invention also requires the profile information.
More particularly, at the second pass compilation, the code straightening mechanism of the compiler reorders or repositions those basic blocks according to the execution frequency and execution sequence. The code straightening mechanism of the compiler first performs a depth-first search of the control flow graph. The search starts from the entry of the control flow graph, block Begin in the example of
If a block has multiple successors, such as when a block ends in a conditional branch, like block A in
In the example shown in
The depth-first search generates a list. All the frequently executed blocks are lined up according to execution sequence. This is referred to as the critical path. All the infrequently executed blocks are moved to the position after the critical path.
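One way to picture how such a list arises is a pre-order depth-first walk that always follows the most frequently executed successor first, so that less frequently executed successors are reached only after the hot path has been laid out. The sketch below assumes that visiting policy and assumes a dictionary-based graph; the function name depth_first_order and the sample counts are hypothetical and are not taken from the figures.

```python
def depth_first_order(successors, counts, entry):
    """Pre-order depth-first walk that follows the hottest successor first.

    successors: dict mapping a block name to a list of successor names
    counts:     dict mapping a block name to its profile counter
    entry:      name of the entry block of the control flow graph
    """
    order, seen = [], set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        order.append(block)
        # Push the coldest successor first so the hottest is popped next.
        for succ in sorted(successors.get(block, []), key=lambda s: counts.get(s, 0)):
            if succ not in seen:
                stack.append(succ)
    return order


# Assumed example: A branches to a hot block B and a cold block C.
successors = {"Begin": ["A"], "A": ["B", "C"], "B": ["End"], "C": ["End"]}
counts = {"Begin": 100, "A": 100, "B": 90, "C": 10, "End": 100}
order = depth_first_order(successors, counts, "Begin")
```

In this assumed example the walk lays out Begin, A, B, and End as the critical path, and the cold block C lands after it.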
As shown in
In accordance with exemplary aspects of the present invention, the code straightening mechanism then generates a final list based on the depth-first list. A block is appended to the final list if the immediate containing region of the block to be appended is the same immediate containing region as the previous block. A region is the context of a loop. Loops may be nested; therefore, a region may have a sub-region. If a block is not in the same immediate region as a preceding block, the block may be in the same region on a higher level.
If the last appended block in the final list ends a region and the next block N in the depth-first list starts a new region, and the new region is not contained by the region of the last appended block, then any infrequently executed blocks that are contained by the previous region are inserted at this point under certain circumstances (e.g., if the infrequently executed block is more frequently executed than block N). For instance, the code straightening mechanism may determine whether the profile directed feedback block counter of the infrequently executed block is greater than the counter of block N by a predetermined threshold or a predetermined factor. For example, the infrequently executed block may be placed before block N if the infrequently executed block executes twice as frequently as block N.
All predecessors of the infrequently executed block in question may also be placed before block N. Whenever possible, the blocks are inserted so that each predecessor comes before its successor. This preserves locality with respect to the infrequently executed block and reduces the likelihood of cache misses by keeping the instructions in execution order.
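The frequency test and the predecessor-first ordering described above might be sketched as follows. This is an illustration under stated assumptions: the function names, the default factor of 2.0 (taken from the "twice as frequently" example), and the extra requirement that the block has executed at least once are choices made for the sketch, not requirements of the embodiment.

```python
def should_insert_before(count_b, count_n, factor=2.0):
    """Decide whether cold block B is hot enough, relative to N, to precede N.

    Assumption: B qualifies when it ran at least `factor` times as often as N.
    Requiring count_b > 0 keeps never-executed blocks out of the way.
    """
    return count_b > 0 and count_b >= factor * count_n


def with_unplaced_predecessors(block, predecessors, placed):
    """Return `block` preceded by its not-yet-placed predecessors.

    Predecessors come before successors whenever possible; the `visiting`
    set stops the walk on loop back edges, where no such order exists.
    """
    result, visiting = [], set()

    def visit(b):
        if b in placed or b in visiting:
            return
        visiting.add(b)
        for p in predecessors.get(b, []):
            visit(p)
        result.append(b)

    visit(block)
    return result


# Assumed example: cold block E is reached through D, and A is already placed.
preds = {"E": ["D"], "D": ["A"]}
to_insert = with_unplaced_predecessors("E", preds, placed={"A"})
```

Here to_insert lists D ahead of E, so the pair can be inserted before block N in execution order.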
The code straightening mechanism then uses the final list to change the layout of the control flow graph. The code straightening mechanism inserts unconditional branches wherever necessary. For example, if the only successor of a block is not the next block in the control flow graph, or the fall-through block of a block that ends with a conditional branch is not the next block in the control flow graph, an unconditional branch will be inserted. In the example shown in
The examples shown in
More particularly, in
In step C, a user or compiler runs instrumented application 714 with some training data to generate profile information 716. Thereafter, in step D, compiler 710 takes profile information 716 as an additional input and recompiles source application 712 (second pass compilation). During the second pass compilation, compiler 710 may perform additional optimizations, as well as region based code straightening. Compiler 710 generates compiled application 718, which is reordered to maintain locality and, thus, reduce instruction cache misses.
It will be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the processor or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block or blocks. These computer program instructions may also be stored in a computer-readable memory or storage medium that can direct a processor or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory or storage medium produce an article of manufacture including instruction means which implement the functions specified in the flowchart block or blocks.
Accordingly, blocks of the flowchart illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and computer usable program code for performing the specified functions. It will also be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or by combinations of special purpose hardware and computer instructions.
With particular reference to
During first pass compilation, the compiler compiles the application and inserts instrumentation code into the compiled application (block 802) and generates instrumented application code (block 804). Then, the compiler or user runs the instrumented application with some training data (block 806). The instrumented application gathers profile information and stores the profile information (block 808).
Next, the user or compiler invokes the compiler again to perform a second pass compilation. The compiler makes use of the profile information to guide optimizations (block 810). In a typical case, in the second pass compilation, the compiler may perform many more optimizations than in the first pass. In accordance with exemplary aspects of the present invention, one of the optimizations the compiler performs is region based code straightening (block 812). After all the optimizations are performed, the compiler may perform still more optimizations (block 814). Thereafter, the compiler generates machine executable application code (block 816) and operation ends.
With reference now to
If A is in the same immediate containing region, then the code straightening mechanism appends A to the final list (block 912). The code straightening mechanism determines whether block A is the end of the depth-first list (block 914). If A is the last block in the depth-first list, then operation ends. If, however, A is not the last block in the depth-first list in block 914, then operation transfers to block 908 to consider the next basic block in the depth-first list.
If A is not in region R in block 910, then the code straightening mechanism determines whether block A starts a new region and the new region is not contained in R (block 916). If A does not start a new region that is not contained in R, then operation proceeds to block 912 to append A to the final list. However, if A does start a new region that is not contained in R in block 916, then the code straightening mechanism determines whether there is any infrequently executed block (B) in R and B is not yet appended into the final list (block 918). If there is not any infrequently executed block (B) in R or there is a B in R, but B is already appended into the final list in block 918, then operation proceeds to block 912 to append A to the final list.
If there is any infrequently executed block (B) in R and B is not yet appended to the final list, then the region based code straightening mechanism determines whether: (1) all predecessors of B are appended to the final list and B is more frequently executed than A (block 920), or (2) all predecessors and successors of B are appended to the final list and B is executed at least once (block 922). If block B does not satisfy either of these two conditions, then operation proceeds to block 912 to append A to the final list.
If block B satisfies at least one of the two conditions at block 920 and block 922, the code straightening mechanism appends B to the final list before block A (block 924). Thereafter, operation returns to block 918 to determine whether any other infrequently executed block in R is not yet processed. Blocks 918-924 repeat until all infrequently executed blocks in R are processed.
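The flow of blocks 908 through 924 might be sketched as follows. This is a rough sketch under several assumptions: regions are given as a map from block to immediate containing region plus a parent map for nesting; the set of infrequently executed blocks is supplied by the caller; "A starts a new region not contained in R" is approximated as "A's immediate region differs from R and is not nested inside R"; and a block that was already pulled forward is skipped when the depth-first list reaches it. All names are hypothetical.

```python
def is_contained(region, outer, parent):
    """True if `region` is `outer` or is nested somewhere inside it."""
    while region is not None:
        if region == outer:
            return True
        region = parent.get(region)
    return False


def build_final_list(dfs_list, region_of, parent, preds, succs, counts, cold):
    final, placed = [], set()

    def append(block):
        final.append(block)
        placed.add(block)

    for a in dfs_list:
        if a in placed:  # assumption: already inserted ahead of its list position
            continue
        if final:
            r = region_of[final[-1]]  # region R of the last appended block
            starts_new = region_of[a] != r and not is_contained(region_of[a], r, parent)
            if starts_new:
                # Consider each unplaced cold block B in R (blocks 918 to 924).
                for b in dfs_list:
                    if b == a or b not in cold or b in placed or region_of[b] != r:
                        continue
                    preds_done = all(p in placed for p in preds.get(b, []))
                    succs_done = all(s in placed for s in succs.get(b, []))
                    hotter = preds_done and counts[b] > counts[a]             # block 920
                    closed = preds_done and succs_done and counts[b] >= 1     # block 922
                    if hotter or closed:
                        append(b)  # B lands before A (block 924)
        append(a)  # block 912
    return final


# Assumed example: two sibling loops L1 and L2 inside the procedure region "root".
region_of = {"Begin": "root", "H1": "L1", "X1": "L1", "C1": "L1",
             "H2": "L2", "X2": "L2", "End": "root"}
parent = {"L1": "root", "L2": "root"}
preds = {"H1": ["Begin"], "X1": ["H1"], "C1": ["H1"], "H2": ["X1"],
         "X2": ["H2"], "End": ["X2"]}
succs = {"Begin": ["H1"], "H1": ["X1", "C1"], "X1": ["H2"], "C1": ["X1"],
         "H2": ["X2"], "X2": ["End"]}
counts = {"Begin": 1, "H1": 100, "X1": 95, "C1": 5, "H2": 2, "X2": 2, "End": 1}
dfs_list = ["Begin", "H1", "X1", "H2", "X2", "End", "C1"]
final = build_final_list(dfs_list, region_of, parent, preds, succs, counts, {"C1"})
```

In this assumed example the cold block C1 of the first loop is appended before H2, the first block of the second loop, rather than being left after End, which keeps it close to its predecessor H1.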
Thus, the exemplary aspects of the present invention solve the disadvantages of the prior art by providing a region-based code straightening mechanism to line up frequently executed basic blocks together based on profile directed feedback. The region-based code straightening lines up the basic blocks in order of execution sequence, but does not move infrequently executed basic blocks too far away from their predecessors. As such, code locality is maintained, thus reducing instruction cache misses.
The invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.