An embodiment of the invention is directed to the generation and optimization of computer programming code. Other embodiments are also described.
When an application program is launched or run in a computer, the computer is executing what is referred to as a binary image (or simply, binary) of the program. That is not, however, the version in which the program was originally created by its author. Due to the inherent design and complexity of a computer, programs are written using a higher level programming language that is more readily understandable to a human programmer. A program is initially written in what is called a source programming language (resulting in source code or a source file). It is then translated down into the binary image version (also referred to as the executable or executable file) before being loaded into the computer's memory for execution. Software programs or tools, referred to collectively here as code generators, are used by the programmer to perform this translation. A code generator is selected that is able to translate a particular source file into an executable file that is to be run on a given computer hardware platform (e.g., one that is based on a Pentium® processor by Intel Corp., Santa Clara, Calif.).
A code generator may have the following components. A compiler translates one or more input source files that are written in a high level language (e.g., C; C++; Fortran; Pascal; Basic; as well as others) into object code or object files which are in a low level language referred to as machine language. Next, a linker joins the object files, together with library object files that have been previously compiled, into a binary image (the executable file). The binary may then be loaded into the main memory of the computer and executed by one or more of its processors.
Modern integrated circuit technologies used in advanced computer components are being adopted at a rapid pace. Advances are being rapidly made in computer platform architectures, such as one based on a Pentium® processor, and new hardware components are being designed and manufactured that allow the same platform to be applied to different fields. These include, for example, personal computer (PC) desktops, laptops, home entertainment PCs, servers, home appliances, dedicated video game machines, and mobile hand-held devices such as cellular telephones and multifunction personal digital assistants (PDAs). Different fields, however, present different requirements for the binaries that will be running on top of the hardware platform. For example, a program that is to run on a server is expected to have high performance while it is running, while programs that are for mobile devices may have more stringent code size as well as power consumption constraints. In other instances, a program is to be stored in non-volatile, solid state memory of the platform, which has even more stringent limits on storage space due to cost concerns. Such programs are sometimes referred to as firmware, and may need to be compressed prior to being stored.
Current code generation tools, including compilers, linkers, and binary optimizers, provide optimization controls that can be selected by the user in an effort to generate code that has a higher performance, smaller code size, or lower power consumption. A binary optimizer, also sometimes referred to as a post-link optimizer, is a tool that is used to improve the performance of a program after it has been compiled and linked. The tool directly operates on the executable file and is thus said to rewrite the executable, in accordance with certain user specified optimization controls. Each of these tools may expose its own set of optimization controls to the user.
Current code generation tools, however, do not provide a systematic and automated approach to meet sophisticated code generation requirements. For example, the current tools do not allow the user to specify simultaneously both a code size optimization setting, i.e., one that is expected to reduce the size of the binary, and a performance optimization setting, i.e., one that is expected to increase the performance of the binary.
The embodiments of the invention are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” embodiment of the invention in this disclosure are not necessarily to the same embodiment, and they mean at least one.
Each of these input binaries 106 is generated with a different code generator optimization setting, for the same processor instruction set architecture. The input binaries 106 may also be based on the same set of one or more source files (not shown). The binaries 106 may all be generated using the same code generator tool set, configured according to different optimization settings. The tool set may include components from different software vendors (e.g., a compiler and linker from one vendor, and a binary rewriter from another). Note that the input binaries 106 may be generated either manually by the user one at a time or, as described below, automatically according to a script.
The evaluator 104 in this example is to measure the performance of each input binary. This may be done by having the binary be executed by a hardware platform that implements the processor instruction set architecture for which the binary has been generated. Alternatively, the evaluator 104 may include a software simulation tool, which simulates the hardware platform, including the processor and I/O device resources that are present in the actual hardware platform. The binary is thus executed on its intended hardware platform, either actually or through simulation, and its performance is measured. Performance may be measured by feeding the running binary a predefined set of inputs and measuring how fast the expected outputs are produced. The measured performance is then translated into the FOM 108. For example, faster execution of a particular task may translate to a lower FOM, while slower execution of the same task translates to a higher FOM. The “best” binary in that case would be the one with the lowest FOM 108.
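As an illustration of how a measured run time might be translated into a FOM, the following is a minimal sketch rather than the evaluator described above. The function names, the best-of-several-runs policy, and the millisecond scaling are assumptions made for the example; the only property taken from the text is that faster execution yields a lower FOM.

```python
import subprocess
import time


def measure_runtime_seconds(command, stdin_data=b"", repeats=3):
    """Run `command` with a predefined input and return the fastest wall-clock time.

    `command` is an argument list that launches the binary under test
    (hypothetical; the real evaluator may run on hardware or in a simulator).
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, input=stdin_data,
                       stdout=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best


def performance_fom(seconds, scale=1000):
    """Map a run time to a positive integer FOM; faster runs give lower values.

    The scale (milliseconds per FOM unit) is an assumption for illustration.
    """
    return max(1, round(seconds * scale))
```

For instance, `performance_fom(measure_runtime_seconds(["./binary1"]))` would produce the FOM for a binary stored at the hypothetical path `./binary1`.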
The computed FOMs 108 are fed to a binary selector 110, which compares them and selects one of them as having the highest or lowest overall FOM 112. Although various ways of defining the FOMs and overall FOMs are possible, an easy-to-implement approach is to define each FOM as being a positive integer. As a simple example, the overall FOM 112 may be the same as its corresponding FOM 108, that is,
$$\text{overall FOM}_1 = \text{FOM}_1,$$
$$\text{overall FOM}_2 = \text{FOM}_2, \text{ etc.} \qquad \text{(Equation 1)}$$
In that case, binary selector 110 performs a straight ranking of the overall FOMs 112 and, in this example, determines that overall FOM3 (corresponding to binary 3) has the highest or lowest value. The binary selector 110 thus indicates to the user that binary 3 is the “best” of the four input binaries 106, from the standpoint of performance (being the measured characteristic). The combination of the evaluator 104 and binary selector 110 thus provides the user with an automatic methodology for selecting the best binary image in a systematic manner. This general framework allows sophisticated code generation requirements to be evaluated, using multiple evaluators as described below.
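The straight ranking performed by the binary selector can be sketched as follows. The function name and the FOM values are hypothetical and chosen only so that binary 3 comes out best, as in the example above; the `lower_is_better` flag reflects that either the lowest or the highest value may be defined as “best”.

```python
def select_best(overall_foms, lower_is_better=True):
    """Return the name of the binary whose overall FOM is best."""
    pick = min if lower_is_better else max
    return pick(overall_foms, key=overall_foms.get)


# Hypothetical overall FOMs for four input binaries (Equation 1: overall FOM = FOM).
overall_foms = {"binary1": 42, "binary2": 37, "binary3": 18, "binary4": 25}
best = select_best(overall_foms)
```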
The FOMs 120 are also fed to the binary selector 110 which compares the FOMs 120, while aiming at selecting the binary with the lowest or highest overall FOM. The “comparison” involving the FOMs 108 and FOMs 120 is broadly defined here, and may be implemented in several ways. As one example, an overall FOM is computed for each input binary 106, as a function of the FOM 108 and FOM 120. This may be a simple equation such as
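$$\text{overall FOM}_1 = \text{FOM}_{1,\text{perf.}} + \text{FOM}_{1,\text{compr.size}}$$
$$\text{overall FOM}_2 = \text{FOM}_{2,\text{perf.}} + \text{FOM}_{2,\text{compr.size}} \qquad \text{(Equation 2)}$$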
In yet another alternative, the comparison amongst the different FOMs may use the concept of a vector for each binary. For example,
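$$\text{overall FOM}_1 = \sqrt{\text{FOM}_{1,\text{perf.}}^{2} + \text{FOM}_{1,\text{compr.size}}^{2}}$$
$$\text{overall FOM}_2 = \sqrt{\text{FOM}_{2,\text{perf.}}^{2} + \text{FOM}_{2,\text{compr.size}}^{2}} \qquad \text{(Equation 3)}$$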
If the performance FOM is defined as above, namely, the greater the performance of a binary, the smaller its associated FOM 108, then the compressed size FOM should be defined so that the smaller the compressed file size of a binary, the smaller its associated FOM 120. This approach thus defines the “best” overall FOM as the one having the lowest value.
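The two combinations given in Equations 2 and 3 can be sketched as below, assuming both FOMs follow the lower-is-better convention just described. The function names and the sample values are illustrative only.

```python
import math


def overall_fom_sum(fom_perf, fom_compr_size):
    """Equation 2: the overall FOM is the plain sum of the two FOMs."""
    return fom_perf + fom_compr_size


def overall_fom_norm(fom_perf, fom_compr_size):
    """Equation 3: treat the two FOMs as a vector and take its length."""
    return math.hypot(fom_perf, fom_compr_size)


# Hypothetical FOMs for one binary: performance 3, compressed size 4.
sum_value = overall_fom_sum(3, 4)    # 7
norm_value = overall_fom_norm(3, 4)  # 5.0
```

Under either definition, a binary that is both faster and smaller after compression receives a lower overall FOM.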
Note that in the comparisons described above involving two measured characteristics, the equations for overall FOM weight the performance and compressed size FOMs equally. As an alternative, the equation may specify different weights to the FOMs 108 and 120. In one scenario, the performance of a binary may be less important than its compressed file size. In that case, appropriate scaling factors may be included in the equation above, to de-emphasize the contribution from the performance FOM 108, and emphasize the contribution from the compressed size FOM 120. Other ways of defining the overall FOM, including non-linear relationships with the FOMs 108, 120, are possible. The system may also give the user the option to manually define the overall FOM (a “configurable” overall FOM), e.g. in view of a particular field of use.
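A weighted variant of the overall FOM might look like the following sketch. The weights, binary names, and FOM values are hypothetical; they are chosen to show that de-emphasizing performance can change which binary is selected.

```python
def weighted_overall_fom(fom_perf, fom_compr_size, w_perf=1.0, w_size=1.0):
    """Linear overall FOM with a scaling factor on each contribution."""
    return w_perf * fom_perf + w_size * fom_compr_size


# Hypothetical (performance FOM, compressed size FOM) pairs; lower is better.
foms = {"binaryA": (10, 40), "binaryB": (40, 20)}


def best_binary(w_perf, w_size):
    """Pick the binary with the lowest weighted overall FOM."""
    return min(foms, key=lambda name: weighted_overall_fom(*foms[name], w_perf, w_size))


equal_weights_choice = best_binary(1.0, 1.0)   # binaryA: 50 versus 60
size_focused_choice = best_binary(0.2, 0.8)    # binaryB: 24 versus 34
```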
This methodology may be extended to more than two evaluators. For example, in
In addition to performance, compressed file size, and power consumption, the measured characteristics may also include code size (the size of the binary, typically given in bytes), and its memory footprint (the size of the code and/or data portion of the binary once it is loaded and running on the intended hardware platform). Any two or more of such measured characteristics may be evaluated, by including two or more corresponding evaluators within the system. A system may be delivered that is custom designed with only two evaluators, whereas a fully featured system may have all five or more evaluators integrated in the same software tool. As yet another alternative, the system may be delivered with only a single evaluator. A system with multiple evaluators can be advantageously used for generating optimized binary images in more than one field of use.
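Two of the simpler evaluators, code size and compressed file size, can be sketched as follows. This is an assumption-laden illustration: the functions operate on the binary image as an in-memory byte string, and zlib stands in for whatever compression scheme the platform's firmware storage actually uses.

```python
import zlib


def code_size_fom(image: bytes) -> int:
    """Code size evaluator: the size of the binary image in bytes."""
    return len(image)


def compressed_size_fom(image: bytes, level: int = 9) -> int:
    """Compressed size evaluator: size in bytes after zlib compression.

    zlib is an illustrative choice, not the compressor named by the source.
    """
    return len(zlib.compress(image, level))


# Two stand-in "images" of equal code size but different compressibility.
uniform_image = b"\x00" * 1024
varied_image = bytes(range(256)) * 4
```

The two stand-in images have the same code size FOM, yet the uniform one receives a lower compressed size FOM, which is why the two characteristics are evaluated separately.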
Turning now to
The system in
Each optimization setting differs from the others, and may be defined based on the user's knowledge of what each optimization setting is expected to accomplish in a general sense (in terms of the associated binary being more suitable for a given field of use). With the help of the evaluator and binary selector of
Turning now to
In another embodiment of the invention, the code generator used to produce the binaries 106 may include not just the compiler 204 and linker 206, but also the binary rewriter 304 (processing an output of the linker 206). Each optimization setting in that case includes an optimization control for the compiler, another for the linker, and another for the rewriter. If the code generator exposes more fine-grained optimization controls, then this will allow more explorations of the different optimization combinations to be made, making it possible to generate even better or more optimized code as evaluated by the evaluators.
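One way to enumerate optimization settings that each combine a compiler control, a linker control, and a rewriter control is a Cartesian product, sketched below. The control labels are placeholders invented for this example and do not correspond to the options of any particular tool.

```python
from itertools import product

# Placeholder control labels (hypothetical, not real tool options).
COMPILER_CONTROLS = ["speed", "size"]
LINKER_CONTROLS = ["default", "strip-unused"]
REWRITER_CONTROLS = ["none", "reorder-blocks"]


def optimization_settings():
    """Return every combination of one compiler, linker, and rewriter control."""
    return [
        {"compiler": c, "linker": l, "rewriter": r}
        for c, l, r in product(COMPILER_CONTROLS, LINKER_CONTROLS, REWRITER_CONTROLS)
    ]


settings = optimization_settings()  # 2 * 2 * 2 = 8 combinations
```

Adding one more control value to any tool multiplies the number of combinations, which illustrates how finer-grained controls enlarge the space that can be explored.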
Turning now to
In operation 408, the current overall FOM is compared with a prior overall FOM. The latter is an overall FOM that may have been previously computed and that is associated with a prior version of the binary. For the initial pass, there may be no prior computed FOM, such that operation 408 may be skipped.
After the comparison in operation 408, if there are further optimization settings to be evaluated (operation 412) then the process cycles, with another optimization setting. This time, operation 408 is performed since there is a prior overall FOM available now. Note that in subsequent cycles, operation 408 may involve multiple comparisons between the current overall FOM and each of several prior overall FOMs that are stored.
Once the last optimization setting has been evaluated, the loop is exited at operation 412, and the process proceeds with operation 414, operation 416, or both. In the former, the system indicates to the user which version of the input binaries has the highest or lowest (“best”) overall FOM, as determined from the comparisons that were made in the iterative process. Note that the system may be designed such that only the binary that has the highest or lowest overall FOM value at any given point in the iterative process is saved (thereby helping conserve memory resources).
In addition to, or as an alternative to, operation 414, there is operation 416 in which the system can display to the user a ranking of the different binary versions, in accordance with their overall FOMs, e.g. from highest to lowest. This embodiment of the invention allows the user to quickly determine how “far apart” the different versions of the binaries are from each other in view of the respective optimization settings used to generate them. Other ways of displaying the results of the comparison performed by the binary selector are possible.
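The iterative process of operations 408 through 416 can be sketched as a single loop. This is an illustration under stated assumptions: `generate_binary` and `overall_fom` are hypothetical callables supplied by the caller, lower overall FOMs are taken as better, and the ranking is listed best first.

```python
def explore(settings, generate_binary, overall_fom):
    """Evaluate each optimization setting; return the best one and a ranking."""
    best = None    # (overall FOM, setting) of the best binary seen so far
    history = []   # every (overall FOM, setting) pair, kept for the ranking
    for setting in settings:
        binary = generate_binary(setting)
        current = overall_fom(binary)
        history.append((current, setting))
        # Operation 408: compare with the prior overall FOM (skipped on the first pass).
        if best is None or current < best[0]:
            best = (current, setting)
    # Operation 416: rank all versions by overall FOM, best (lowest) first.
    ranking = sorted(history, key=lambda pair: pair[0])
    return best, ranking


# Toy stand-ins: the "binary" is just the setting name, with made-up FOMs.
toy_foms = {"A": 30, "B": 12, "C": 20}
best, ranking = explore(["A", "B", "C"], lambda s: s, toy_foms.get)
```

Only `best` needs to be retained if the ranking of operation 416 is not required, which matches the memory-saving variant in which a single binary is kept at any point.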
The flow diagram of
Turning now to
The above described pseudo code thus provides the most optimized (lowest cost) code, by rewriting the binary multiple times (each time using a different optimization setting) in the inner loop, and then recompiling the source program and relinking the recompiled object files (outer loop).
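The nested structure described here, with recompiling and relinking in the outer loop and rewriting in the inner loop, might be sketched as follows. This is not the pseudo code referred to above; the callables `compile_and_link`, `rewrite`, and `cost`, as well as the toy values, are hypothetical stand-ins.

```python
def search(compile_link_settings, rewrite_settings, compile_and_link, rewrite, cost):
    """Return (cost, outer setting, inner setting) for the lowest-cost binary."""
    best = None
    for outer in compile_link_settings:        # outer loop: recompile and relink
        linked = compile_and_link(outer)
        for inner in rewrite_settings:         # inner loop: rewrite the linked binary
            candidate = rewrite(linked, inner)
            candidate_cost = cost(candidate)
            if best is None or candidate_cost < best[0]:
                best = (candidate_cost, outer, inner)
    return best


# Toy model: a "binary" is represented only by its cost (made-up numbers).
linked_costs = {"speed": 100, "size": 80}
rewrite_savings = {"none": 0, "compact": 15}
result = search(
    ["speed", "size"],
    ["none", "compact"],
    compile_and_link=lambda s: linked_costs[s],
    rewrite=lambda linked, r: linked - rewrite_savings[r],
    cost=lambda binary: binary,
)
```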
Integrating the compiler, linker, and the binary rewriter in the manner described above brings additional capabilities for code optimization. Also, the system flexibility of bundling together several evaluators gives the general framework the ability to take additional factors into consideration when selecting the best binary image. The system also provides a framework to better study the correlation between a particular optimization control and the cost implications that are brought as a result into the binary image.
Turning now to
A machine-readable medium may include any mechanism for storing or transmitting information (such as any one or more of the software components described above) in a form readable by a machine (e.g., a computer), including but not limited to Compact Disc Read-Only Memory (CD-ROM), Read-Only Memory (ROM), Random Access Memory (RAM), Erasable Programmable Read-Only Memory (EPROM), and a transmission over the Internet.
The invention is not limited to the specific embodiments described above. For example, although shown to be in parallel, the measurements of operations 404 and 406 may occur sequentially. In general, the order of the operations, as they are illustrated in
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CN2005/002384 | 12/30/2005 | WO | 00 | 5/25/2006 |