The example system 100 comprises an instruction receiver 102, a pseudo critical section end notification (pCSend) creator 104, a critical section end (CSend) notification hoister 106, a correctness verifier 108, a pCSend remover 110, and an instruction emitter 112. An example method to implement the example system 100 is illustrated in
In the illustrated example, the instruction receiver 102 receives a set of instructions that are to be optimized and passes them to the pCSend creator 104. The example instruction receiver 102 receives the set of instructions from a compiler that is associated with the example system 100. However, the instruction receiver 102 may receive the set of instructions from any source such as, for example, a file, a user input, data stored in a memory, etc. Before the set of instructions is received by the example instruction receiver 102, other operations and/or optimizations may be applied to the instructions such as, for example, loop optimizations. The instruction receiver 102 may not be included in an implementation of the example system 100 in which the methods and apparatus disclosed herein are integrated with a compiler.
The example pCSend creator 104 iterates through the set of instructions received from the instruction receiver 102, inserts a pCSend instruction in the set of instructions, and passes the set of instructions to the CSend hoister 106. In the illustrated example, the pCSend instruction is inserted in the set of instructions on a line after the CSend instruction. Alternatively, the pCSend instruction may be inserted on a line before the CSend instruction. The inserted pCSend instruction can be used to prevent instructions in the critical section from being moved to a location outside of the critical section. In addition, the pCSend instruction can prevent other instructions from being moved into the critical section as the CSend instruction is hoisted by the CSend hoister 106. Also, the pCSend instruction can be used by the correctness verifier 108 to indicate the original location of the CSend instruction. The pCSend instruction may not be needed in all implementations. For example, the original location of CSend instructions may be stored as a variable in memory. Additionally, a pCSend instruction may not be inserted at all instances of CSend instructions. For example, pCSend instructions may be inserted based on the performance of the instructions on traces. Any iteration algorithm may be used to insert pCSend instructions.
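The insertion step performed by the pCSend creator 104 can be pictured with a minimal sketch. The sketch assumes that the set of instructions is a flat list of mnemonic strings and that the names `CSEND`, `PCSEND`, and `insert_pcsend` are hypothetical labels chosen for illustration; none of them is taken from the disclosure.

```python
CSEND = "CSend"    # assumed mnemonic for the critical section end notification
PCSEND = "pCSend"  # assumed mnemonic for the pseudo notification


def insert_pcsend(instructions, after=True):
    """Return a new list with a pCSend placed next to every CSend.

    after=True places the pCSend on the line after the CSend (the
    illustrated example); after=False places it on the line before.
    """
    result = []
    for instr in instructions:
        if instr == CSEND and not after:
            result.append(PCSEND)
        result.append(instr)
        if instr == CSEND and after:
            result.append(PCSEND)
    return result


# Hypothetical critical section followed by one unrelated instruction.
program = ["CSbegin", "load", "store", "CSend", "add"]
marked = insert_pcsend(program)
```

A simple linear scan is used here only because any iteration algorithm may be used; a selective variant could skip CSend instructions based on trace performance.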
The example CSend hoister 106 moves CSend instructions to earlier locations (i.e., modifies the execution order) in the set of instructions and passes the set of instructions to the correctness verifier 108. Modifying the execution order to cause the CSend instructions to execute earlier may compensate for notification delays that cause delays between the executions of critical sections of threads. The CSend hoister 106 of the illustrated example moves the CSend instruction to the earliest location in the critical section that is not earlier than an instruction that may invoke a context switch (e.g., wait instructions, context switch requests, blocking instructions, stalling instructions, etc.). An example method to implement the CSend hoister 106 is illustrated in
The example correctness verifier 108 iterates through the CSend instructions in the set of instructions received from the CSend hoister 106 and reverses the hoisting of the CSend instruction where the hoisting has altered the logic of the set of instructions. An example method for implementing the correctness verifier 108 of the illustrated example is illustrated in
The example pCSend remover 110 removes any pCSend instructions that remain after the correctness verifier 108 has verified the set of instructions. The pCSend remover 110 may not be included in the example system 100 if the pCSend instructions are merely used as placeholders, which will be ignored during execution of the set of instructions. The modified set of instructions is passed to the instruction emitter 112.
The instruction emitter 112 of the illustrated example emits the set of instructions following the optimization performed by the example system 100. For example, the instruction emitter 112 may output the set of instructions as machine code, as the same type of instructions as the set of instructions, as instructions of a type that is different than the set of instructions, etc. The instruction emitter 112 may be integrated with the code of a compiler in which the example system 100 is integrated.
Having described the architecture of an example system that may be used to optimize computer instructions, various processes are described in
The example process begins when the instruction receiver 102 receives a set of instructions to be optimized (block 302). The pCSend creator 104 then inserts a pCSend instruction on the line after each CSend instruction in the set of instructions (block 304). Then, the CSend hoister 106 moves each of the CSend instructions in the set of instructions to earlier locations in the execution order of the set of instructions (block 306). An example method for hoisting the CSend instructions is illustrated in
Next, the correctness verifier 108 verifies the correctness of the movement of each CSend instruction in the set of instructions and corrects any errors (block 308). An example method for verifying the correctness of the movement of the CSend instructions is illustrated in
The example process 306 first determines whether the instruction prior to the current CSend instruction is a context switching instruction (e.g., wait instructions, context switch requests, blocking instructions, stalling instructions, etc.) (block 402). If the instruction prior to the current CSend instruction is a context switching instruction, process 306 is complete and control proceeds to block 308 of
If the instruction prior to the current CSend instruction is not a context switching instruction, the CSend instruction is moved to the line before the prior instruction (block 404). Control then returns to block 402 to analyze the instruction that is prior to the new location of the CSend instruction.
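The loop formed by blocks 402 and 404 may be sketched as follows. The sketch assumes a flat list of mnemonic strings; the mnemonics in `CONTEXT_SWITCHING`, the `CSbegin` marker used as an additional stopping point, and the function name `hoist_csend` are hypothetical and are introduced only for illustration.

```python
# Hypothetical mnemonics for instructions that may invoke a context switch.
CONTEXT_SWITCHING = {"wait", "ctx_swt", "block", "stall"}
# Assumed marker for the start of the critical section; the hoist never
# moves the CSend above it.
CS_BEGIN = "CSbegin"


def hoist_csend(instructions, index):
    """Hoist the CSend at `index` in place and return its new index.

    The CSend is swapped with the prior instruction until the prior
    instruction is a context switching instruction (or the assumed
    start-of-section marker, or the start of the list).
    """
    while index > 0:
        prior = instructions[index - 1]
        if prior in CONTEXT_SWITCHING or prior == CS_BEGIN:
            break  # the check of block 402 succeeded: hoisting is complete
        # Block 404: move the CSend to the line before the prior instruction.
        instructions[index - 1], instructions[index] = (
            instructions[index],
            instructions[index - 1],
        )
        index -= 1
    return index


program = ["CSbegin", "load", "ctx_swt", "add", "store", "CSend", "pCSend"]
new_index = hoist_csend(program, program.index("CSend"))
```

In this hypothetical input the CSend stops directly after `ctx_swt`, while the pCSend stays at the original location of the CSend.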
In the flow diagram 502, nodes 1, 2, and 3 include CSend instructions that have been hoisted by the CSend hoister 106. Nodes 4, 5, and 6 include pCSend instructions that have been added by the pCSend creator 104. Node 7 includes a context switching instruction (CTX_SWT).
The pseudo_set[c] in the equation set 504 includes the nodes of all of the instructions corresponding to the critical section to be analyzed. The GEN[i] set includes all of the nodes that give the definitions generated by node i where node i includes a pCSend instruction. The KILL[i] set includes all of the nodes that change the definitions of node i where node i includes a CSend instruction and every j is in the pseudo_set[c] set. The IN[i] set includes all of the nodes that have definitions that exist at the start of node i. The OUT[i] set includes all of the nodes that have definitions that reach the end of node i.
The table 506 illustrates the results of applying the equation set 504 to the flow diagram 502. Using the table it can be found that the mappings of CSend to pCSend are: node 3 to node 6, node 2 to node 5, node 1 to node 4, and node 1 to node 5 based on the OUT[i] sets. Accordingly, the set of CSend to pCSend includes node 1, node 2, node 3, node 4, node 5, and node 6. Based on the set of CSend to pCSend, an initialized partition set is {{1} {2} {3} {4} {5} {6}}. The following table shows the effect of each of the CSend to pCSend mappings on the partition set:
Accordingly, the completed partition set is {{1 2 4 5} {3 6}}. The table 506 indicates that node 4 includes the pCSend related to node 7 (i.e., because IN[7]={(1,4)}). The partition set including node 4 and corresponding to node 7 is {1 2 4 5}. This partition set is used by the process illustrated in
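The merging of the initialized partition set by the CSend to pCSend mappings can be reproduced with a short sketch. A union-find structure is used here as one possible way to perform the merging; it is an assumption for illustration, as are the names `build_partition` and `find`. The node numbers and the mappings are the ones listed above.

```python
def build_partition(nodes, mappings):
    """Merge singleton sets using (CSend node, pCSend node) mappings."""
    parent = {n: n for n in nodes}  # every node starts in its own set

    def find(n):
        # Follow parent links to the representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for csend, pcsend in mappings:
        root_a, root_b = find(csend), find(pcsend)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two sets

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


nodes = [1, 2, 3, 4, 5, 6]
mappings = [(3, 6), (2, 5), (1, 4), (1, 5)]
partition = build_partition(nodes, mappings)
# partition == [[1, 2, 4, 5], [3, 6]]
```

Applying the mappings in the listed order gives {{1} {2} {3 6} {4} {5}}, then {{1} {2 5} {3 6} {4}}, then {{1 4} {2 5} {3 6}}, and finally {{1 2 4 5} {3 6}}.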
Process 308 begins by locating the first node in the IN[i] set corresponding to the CSend instruction that is to be verified (block 602). Then, the correctness verifier 108 determines if the located node is in one of the sets of the partition set corresponding to the instructions (e.g., the partition set determined for the flow diagram 502 of
If the located node is in one of the sets (referred to as set s) of the partition set (block 604), the correctness verifier 108 locates the first node in set s (referred to as the partition set node) (block 606). The correctness verifier 108 then determines if the partition set node includes a pCSend instruction (block 608). If the partition set node does not include a pCSend instruction, control proceeds to block 618.
If the partition set node includes a pCSend instruction (block 608), the correctness verifier 108 replaces the pCSend instruction with a CSend instruction (block 610). Control then proceeds to block 612.
If the partition set node does not include a pCSend instruction (block 608), the correctness verifier 108 determines if the partition set node includes a CSend instruction (block 618). If the partition set node does not include a CSend instruction, control proceeds to block 612. If the partition set node includes a CSend instruction, the correctness verifier 108 removes the CSend instruction (block 620). Control then proceeds to block 612.
After replacing the pCSend instruction with a CSend instruction (block 610), determining that the partition set node does not include a CSend (block 618), or removing the CSend instruction (block 620), the correctness verifier 108 removes set s from the partition set (block 612).
The correctness verifier 108 then determines if all partition set nodes have been processed (block 614). If there are further partition set nodes to be processed, the correctness verifier 108 locates the next partition set node (block 622) and control proceeds to block 608 to process the next node.
If there are no further partition set nodes to be processed, the correctness verifier 108 determines if there are further nodes in the IN[i] set to be processed. If there are further nodes in the IN[i] set to be processed, the correctness verifier 108 locates the next node in the IN[i] set (block 624) and control proceeds to block 604 to process the next node. If there are no further nodes in the IN[i] set to process, control proceeds to block 310 of
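The verification loop of process 308 may be sketched as follows. The sketch assumes that each node holds a single instruction stored in a dictionary keyed by node number, that the nodes taken from the IN[i] set are supplied as a plain list of node numbers, and that a removed CSend is represented by `None`. The function name `verify_hoisting` is hypothetical. As a simplification, set s is removed from the partition set once, after all of its nodes have been processed.

```python
def verify_hoisting(node_instr, partition, in_nodes):
    """Reverse hoisting for every partition set reached from `in_nodes`.

    node_instr: dict mapping node number -> instruction mnemonic (or None)
    partition:  list of sets of node numbers (modified in place)
    in_nodes:   node numbers taken from the IN[i] set being verified
    """
    for node in in_nodes:
        # Find the set s of the partition set that contains the node.
        set_s = next((s for s in partition if node in s), None)
        if set_s is None:
            continue  # the node is not in any remaining set
        for member in sorted(set_s):
            if node_instr[member] == "pCSend":
                # Restore the CSend at its original location.
                node_instr[member] = "CSend"
            elif node_instr[member] == "CSend":
                # Remove the hoisted copy of the CSend.
                node_instr[member] = None
        partition.remove(set_s)


# Nodes of the flow diagram 502: hoisted CSends at 1-3, pCSends at 4-6.
node_instr = {1: "CSend", 2: "CSend", 3: "CSend",
              4: "pCSend", 5: "pCSend", 6: "pCSend", 7: "CTX_SWT"}
partition = [{1, 2, 4, 5}, {3, 6}]
verify_hoisting(node_instr, partition, in_nodes=[4])  # from IN[7] = {(1, 4)}
```

Under these assumptions the hoisted CSend instructions at nodes 1 and 2 are removed, the pCSend instructions at nodes 4 and 5 become CSend instructions again, and the set {3 6} is left untouched.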
The system 800 of the instant example includes a processor 812 such as a general purpose programmable processor. The processor 812 includes a local memory 814, and executes coded instructions 816 present in random access memory 818, coded instructions 817 present in the read only memory 820, and/or instructions present in another memory device. The processor 812 may execute, among other things, machine readable instructions that implement the processes illustrated in
The processor 812 is in communication with a main memory including a volatile memory 818 and a non-volatile memory 820 via a bus 825. The volatile memory 818 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM) and/or any other type of random access memory device. The non-volatile memory 820 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 818, 820 is typically controlled by a memory controller (not shown) in a conventional manner.
The computer 800 also includes a conventional interface circuit 824. The interface circuit 824 may be implemented by any type of well-known interface standard, such as an Ethernet interface, a universal serial bus (USB), and/or a third generation input/output (3GIO) interface.
One or more input devices 826 are connected to the interface circuit 824. The input device(s) 826 permit a user to enter data and commands into the processor 812. The input device(s) can be implemented by, for example, a keyboard, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
One or more output devices 828 are also connected to the interface circuit 824. The output devices 828 can be implemented, for example, by display devices (e.g., a liquid crystal display, a cathode ray tube display (CRT), a printer and/or speakers). The interface circuit 824, thus, typically includes a graphics driver card.
The interface circuit 824 also includes a communication device such as a modem or network interface card to facilitate exchange of data with external computers via a network (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, coaxial cable, a cellular telephone system, etc.).
The computer 800 also includes one or more mass storage devices 830 for storing software and data. Examples of such mass storage devices 830 include floppy disk drives, hard disk drives, compact disk drives, and digital versatile disk (DVD) drives.
As an alternative to implementing the methods and/or apparatus described herein in a system such as the device of
Although certain example methods, apparatus, and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN06/02006 | Aug 2006 | US |
| Child | 11537931 | | US |