This invention relates in general to the coding of instructions to be executed in a computer or microprocessor having instructions of variable length. More particularly, the present invention is directed to a method for embedding rarely executed code sequences into code sequences which are frequently executed without concomitantly introducing longer execution times.
Computer programs usually have sequences for rare (cold) code that are executed under exceptional conditions. Sometimes these sequences of rare code occur in close vicinity of hot (frequently executed) code. The existence of this code in the vicinity of hot code requires a compiler, interpreter, assembler or programmer to branch around the rare sequence in the usual case. The branch-around causes a performance overhead on the frequently executed path. Alternatively, the compiler or programmer has an option to generate the rare code sequence in an out-of-line code sequence (outlining). This avoids the performance overhead but it adds complexity to the code and/or to the compiler, especially when the rare code sequences are small.
The present invention is applicable to machines which have instructions of variable lengths. The invention uses the details of binary encoding of larger instructions to embed a sma11, rare code sequence within (a sequence of) larger (that is, longer length) instructions. The larger instructions are intelligently chosen to have no impact on the correct execution of the program, and thus they effectively operate as null operations or No-Ops (NOPs). They are chosen to be fast instructions that do not significantly impact the hot code path. In the rare case, when the rare code sequence needs to be executed, it is made reachable by branching into the middle of the larger instruction(s). This allows one to avoid the performance overhead of having to include branch-around instructions and also to avoid the complexity of outlining.
Thus, in accordance with the present invention, there is provided a method, system and program product for structuring instructions in a stored program computer having instructions of variable length. The invention includes the step of encoding an instruction executed on an exceptional basis that actually lies within one or more fields of a second instruction whose execution is substantially unaffected by coding present in this field. In essence, the present invention creates a form of computer instruction which has dual characteristics depending upon the point at which it is entered. Put another way, it is two instructions in one.
The advantages of the present invention are best realized when the exceptional condition being handled is less frequently encountered. However, it is noted that there are entire classes of instructions which are apt to produce exceptional conditions which need to be handled. These certainly include the arithmetic, logical and shifting operations, but there are many other types and groupings of instructions that also exhibit this characteristic. These include instructions that provide system administration functions, so-called “atomic instructions” such as “compare and swap,” and string instructions. The present invention is applicable to all such instructions and, in general, is applicable for use with any instruction that exhibits a need for exceptional condition handling.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention.
The recitation herein of a list of desirable objects which are met by various embodiments of the present invention is not meant to imply or suggest that any or all of these objects are present as essential features, either individually or collectively, in the most general embodiment of the present invention or in any of its more specific embodiments.
The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of practice, together with the further objects and advantages thereof, may best be understood by reference to the following description taken in connection with the accompanying drawings in which:
The following Intel A32 architecture code sequence is an example of code which includes a small sequence of rare code in a hot path. The programmer/compiler has to branch around the rare code sequence most of the time:
The code above and its concomitant limitations are exemplified in
In the usual approach, as exemplified in
The present approach is to implement the above code as follows:
The idea is to use a larger instruction (test in this case) to embed the rare sequence of code. It is noted that the binary encoding of the instruction “or eax, 3” results in the machine code “83 C8 03.” We observe that the binary encoding of the “test” instruction places the 4-byte immediate field at the end of the sequence. We embed this machine code directly inside the immediate field of the instruction. By branching to just the right location inside the “test” instruction it is possible to execute the “or” instruction in the rare cases that it is needed.
The test instruction does not modify any machine state except for the FLAGS register. This technique is used in all places where the FLAGS register is not “live.” It is observed that the FLAGS register on IA32 microprocessors rarely “hold live” across multiple instructions. Accordingly, it is seen that this method is applicable in almost all scenarios. In other words, the “test” instruction is effectively a No-Op at this point in the program because it does not have an impact in observable program state. Also it executes sufficiently fast to make this solution preferable to branching-around.
The improved code structure is illustrated in
However, importantly for the present invention the code sequence includes instruction 155 which is typically a longer length instruction which includes an immediate field or some other field whose presence is controllably irrelevant to the instruction portion shown in “op code” portion 156. Thus, the leftmost three portions of instruction 155 are employed to store the bit representation of an exception handling instruction. Instruction 155 is also chosen not only to have a field which is ignorable, it is also selected to be an instruction which executes relatively quickly. The code sequence provided above are exemplars of this criteria.
It is possible to use other large instructions that only modify processor state, for example general purpose registers whose contents are never read before being set on all paths reachable from that instruction For example:
The “lea edi, [immediate]” instruction can execute a bit faster than the “test” instruction. However, it destroys the target register (edi in the example above). Accordingly, the method of the present invention can also be employed in circumstances in which there is a register available that does not hold a live value.
This method of the present invention is also applicable in other architectures that support variable instruction lengths such as 390. The principle requirement for the applicability of the present invention is that the architecture support variable length instructions with a longer length instruction being present that includes an “immediate” field or any other field where an arbitrary binary value may be used without causing the instruction to change machine state in some way observable by the program or any field whose presence does not affect the performance or actions of the instruction typically as specified by its “opcode” portion. It is also noted that the present invention does not require that the embedded code which is executed via a jump to it to be embedded in a single field of the dual use instruction. Multiple and overlapping fields are also usable. It is also noted that the present invention may be practiced automatically as with a compiler, an emulator or other similar program that generates sequences of machine instructions. Clearly, in the practice of the present invention also contemplates eventual execution of the encoded instruction, no matter how it may come to be encoded. The encoding of more than one such instruction is also contemplated.
The present invention operates in a data processing environment which effectively includes one or more of the computer elements shown in
While the invention has been described in detail herein in accordance with certain preferred embodiments thereof, many modifications and changes therein may be effected by those skilled in the art. Accordingly, it is intended by the appended claims to cover all such modifications and changes as fall within the true spirit and scope of the invention.