The present invention relates generally to computer program execution. More particularly, the present invention relates to improving program execution by manipulating program modules and by loading program modules in local storage of a processor based upon object modules.
Computing systems are becoming increasingly more complex, achieving higher processing speeds while at the same time shrinking component size and reducing manufacturing costs. Such advances are critical to the success of many applications, such as real-time, multimedia gaming and other computation-intensive applications. Often, computing systems incorporate multiple processors that operate in parallel (or in concert) to increase processing efficiency.
At a basic level, the processor or processors manipulate code and/or data (collectively “information”). Information is typically stored in a main memory. The main memory can be, for example, a dynamic random access memory (“DRAM”) chip that is physically separate from the chip containing the processor(s). When the main memory is physically or logically separate from the processor, there can be significant delays (“high latency”) that may be, for example, tens or hundreds of milliseconds in additional time required to access the information contained in the main memory. High latency adversely affects processing because the processor may have to idle or pause operation until the necessary information has been delivered from the main memory.
In order to address high latency problems, many computer systems implement cache memory. Cache memory is a temporary storage located between the processor and the main memory. Cache memory generally has small access latency (“low latency”) compared to the main memory, but has a much smaller storage size. When used, cache memory helps improve processor performance by temporarily storing data for repeated access. The effectiveness of cache memory relies on the locality of access. For example, using a “9 to 1” rule, where 90% of the time is spent accessing 10% of the data, retrieving even a small amount of data from main memory or external storage is not very effective since too much time is spent accessing that little amount of data. Thus, often-used data should be stored in the cache.
A conventional hardware cache system contains “cache lines” which are basic units of storage management. Cache lines are selected to be the optimal size of data transfer between the cache memory and the main memory. As is known in the art, cache systems operate with certain rules mapping the cache lines to the main memory. For instance, Cache “tags” are utilized showing which part(s) of the main memory is stored on the cache lines, and the status of that portion of main memory.
Another limitation besides memory access that can adversely affect program execution is memory size. The main memory may simply be too small to perform needed operations. In this case, “virtual memory” can be used to provide larger system address space than physically exists in main memory by utilizing external storage. However, external storage typically has much higher latency than main memory.
In order to implement virtual memory, it is common to utilize the memory management unit (“MMU”) of the processor, which can be a part of the CPU or a separate element. The MMU manages mapping of virtual addresses (the addresses used by the program software) to physical addresses in memory. The MMU can detect when an access is made to a virtual address that is not tied to a physical address. When this occurs, the virtual memory manager software is called. If the virtual address has been saved in external storage, it will be loaded into main memory and a mapping will be made for the virtual address.
In advanced processor architectures, particularly multiprocessor architectures, the individual processing units may have local memories, which can supplement the storage in main memory. The local memories are often high speed, but with limited storage capacity. There is no virtualization between the address used by software and the physical address of the local memory. This limits the amount of memory that a processing unit can use. While the processing unit may access main memory via a direct memory access (“DMA”) controller (“DMAC”) or other hardware, there is no hardware mechanism which links the local memory address space with the system address space.
Unfortunately, the high latency main memory still contributes to reduced processing efficiency, and for multiprocessor systems can create a serious bottleneck for performance. Therefore, a need exists for enhanced information handling to overcome such problems. The present invention addresses these and other problems, and is particularly suited to multiprocessor architectures with strict memory constraints.
In accordance with an embodiment of the present invention, a method of managing operations in a processing apparatus which has a local memory is provided. The method comprises determining if a program module is loaded in the local memory, the programming module being associated with a programming reference; loading the program module into the local memory if the program module is not loaded in the local memory; and obtaining information from the program module based upon the programming reference.
In one alternative, the information obtained from the program module comprises at least one of data and code. In another alternative, the program module comprises an object module loaded in the local memory from a main memory. In yet another alternative, the programming reference comprises a direct reference within the program module. In a further alternative, the programming reference comprises an indirect reference to a second program module.
In another alternative, the program module is a first program module and the method further comprises storing the first program module and a second program module in a main memory, wherein the loading step includes loading the first program module into the local memory from the main memory. In this case, the programming reference may comprise a direct reference within the first program module. Alternatively, the programming reference may comprise an indirect reference to the second program module. In this example, when the information is obtained from the second program module, the method preferably further comprises determining if the second program module is loaded in the local memory; loading the second program module into the local memory if the second program module is not loaded in the local memory; and providing the information to the first program module.
In accordance with another embodiment of the present invention, a method of managing operations in a processing apparatus which has a local memory is provided. The method comprises obtaining a first program module from a main memory; obtaining a second program module from the main memory; determining if a programming reference used by the first program module comprises an indirect reference to the second program module; and forming a new program module if the programming reference comprises the indirect reference, the new program module comprising at least a portion of the first program module so that the programming reference becomes a direct reference between portions of the new program module.
In one alternative, the method further comprises loading the new program module into the local memory. In another alternative the first and second program modules are loaded in the local memory before forming the new program module. In a further alternative, the first program module comprises a first code function, the second program module comprises a second code function, and the new program module is formed to include at least one of the first and second code functions. In this case, the first program module preferably further comprises a data group, and the new program module is formed to further include the data group.
In another alternative, the programming reference is an indirect reference to the second program module and the method further comprises determining a new programming reference for use by the new program module based on the programming reference used by the first program module; wherein the new program module is formed to comprise at least the portion of the first program module and at least a portion of the second program module so that the new programming reference is a direct reference within the new program module.
In accordance with yet another embodiment of the present invention, a method of processing operations in a processing apparatus which has a local memory is provided. The method comprises executing a first program module loaded in the local memory; determining an insertion point for a second program module; loading the second program module in the local memory during execution of the first program module; determining an anticipated execution time to begin execution of the second program module; determining whether loading of the second program module is complete; and executing the second program module after execution of the first program module is terminated.
In one alternative, the method further comprises delaying execution of the second program module if loading is not complete. In this case, delaying execution desirably comprises performing one or more NOPs until loading is complete. In another alternative, the insertion point is determined statistically. In a further alternative, the validity of the insertion point is determined based on runtime conditions.
In accordance with another embodiment of the present invention, a processing system is provided. The processing system comprises a local memory capable of storing a program module; and a processor connected to the local memory. The processor includes logic to perform a management function comprising associating a programming reference with the program module, determining if the program module is currently loaded in the local memory, loading the program module into the local memory if the program module is not currently loaded in the local memory, and obtaining information from the program module based upon the programming reference. The local memory is preferably integrated with the processor.
In accordance with yet another embodiment of the present invention, a processing system is provided. The processing system comprises a local memory capable of storing program modules; and a processor connected to the local memory. The processor includes logic to perform a management function comprising storing first and second ones of the program modules in a main memory, loading a selected one the first and second program modules into the local memory from the main memory, associating a programming reference with the selected program module, and obtaining information based upon the programming reference. Preferably the main memory comprises an on-chip memory. More preferably, the main memory is integrated with the processor.
In accordance with a further embodiment of the present invention, a processing system is provided. The processing system comprises a local memory capable of storing program modules; and a processor connected to the local memory. The processor includes logic to perform a management function comprising obtaining a first program module from a main memory, obtaining a second program module from the main memory, determining a first programming reference for use by the first program module, forming a new program module comprising at least a portion of the first program module so that the first programming reference becomes a direct reference within the new program module, and loading the new program module into the local memory.
In accordance with another embodiment of the present invention, a processing system is provided. The processing system comprises a local memory capable of storing the program modules; and a processor connected to the local memory. The processor includes logic to perform a management function comprising determining an insertion point for a first program module, loading the first program module in the local memory during execution of a second program module by the processor, and executing the first program module after execution of the second program module is terminated and loading is complete.
In accordance with a further embodiment of the present invention, a storage medium storing a program for use by a processor is provided. The program cause the processor to: identify a program module associated with a programming reference; determine if the program module is currently loaded in a local memory associated with the processor; load the program module into the local memory if the program module is not currently loaded in the local memory; and obtain information from the program module based upon the programming reference.
In accordance with another embodiment of the present invention, a storage medium storing a program for use by a processor is provided. The program causes the processor to: store first and second program modules in a main memory; load the first program module into a local memory associated with the processor from the main memory, the first program module being associated with a programming reference; and obtain information based upon the programming reference.
In accordance with yet another embodiment of the present invention, a storage medium storing a program for use by a processor is provided. The program causes the processor to obtain a first program module from a main memory; obtain a second program module from the main memory; determine if a programming reference used by the first program module comprises an indirect reference to the second program module; and form a new program module if the programming reference comprises the indirect reference, the new program module comprising at least a portion of the first program module so that the programming reference becomes a direct reference between portions of the new program module.
In accordance with a further embodiment of the present invention, a storage medium storing a program for use by a processor is provided. The program causes the processor to execute a first program module loaded in a local memory associated with the processor; determine an insertion point for a second program module; load the second program module in the local memory during execution of the first program module; determine an anticipated execution time to begin execution of the second program module; determine whether loading of the second program module is complete; and execute the second program module after execution of the first program module is terminated.
In accordance with another embodiment of the present invention, a processing system is provided. The processing system comprises a processing element including a bus, a processing unit and at least one sub-processing unit connected to the processing unit by the bus. At least one of the processing unit and the at least one sub-processing units are operable to determine whether a programming reference belongs to a first program module, to load the first program module into a local memory, and to obtain information from the first program module based upon the programming reference.
In accordance with yet another embodiment of the present invention, a computer processing system is provided. The computer processing system comprises a user input device; a display interface for attachment of a display device; a local memory capable of storing program modules; and a processor connected to the local memory. The processor comprises one or more processing elements. At least one of the processor elements includes logic to perform a management function comprising determining whether a programming reference belongs to a first program module, loading the first program module into the local memory, and obtaining information from the first program module based upon the programming reference.
In accordance with yet another embodiment of the present invention, a computer network is provided. The computer network comprises a plurality of computer processing systems connected to one another via a communications network. Each of the computer processing systems comprises a user input device; a display interface for attachment of a display device; a local memory capable of storing program modules; and a processor connected to the local memory. The processor comprises one or more processing elements. At least one of the processor elements includes logic to perform a management function comprising determining whether a programming reference belongs to a first program module, loading the first program module into the local memory, and obtaining information from the first program module based upon the programming reference. Preferably, at least one of the computer processing systems comprises a gaming unit capable of processing multimedia gaming applications.
In accordance with a further embodiment of the present invention, a processing system comprising a compiler implemented on a processing device is provided. The processing device is operatively coupled to a local memory that is capable of storing program modules having code and data. The compiler is operable to perform a management function comprising analyzing the code and data of a given program module to determine references between code functions and groups of the data of the given program module, including identifying a number of original external references and a number of original internal references of the given program module; and repartitioning the program module to produce one or more new program modules based on the analysis so that a number of new external references of the one or more new program modules is less than the number of original external references and a number of new internal references of the one or more new program modules is greater than the number of original internal references; wherein the compiler selects the one or more new program modules as a best fit combination based on at least one of a size of the local memory and a data transfer size.
In one alternative, the compiler determines at least one insertion point for the one or more new program modules to be inserted into a program. In another alternative, the one or more new program modules are further selected as the best fit combination by the compiler based on alignment. In a further alternative, the management function of the compiler further identifies a plurality potential module groupings, compares the potential groupings against one another to maximize the number of new internal references and to minimize the number of new external references, and selects a best fit from the plurality of potential module groupings to obtain a best fit combination.
In another alternative, the compiler utilizes the analysis to preload or unload the one or more new program modules in the local memory prior to execution. In one example, the compiler preloads a selected one of the one or more new program modules in the local memory if there is at least about a 75% probability that the selected program module is to be used. In another example, the compiler minimizes the preloading or unloading for the one or more new program modules which are loaded at runtime. In a further example, the compiler chooses arbitrary load locations for selected ones of the one or more new program modules which do not have further calls.
In a further alternative, the compiler assigns at least one weighting factor to quantify the best fit combination. In this case, the at least one weighting factor may include weighting functional references with frequencies of calls, a number of times a given new program module is called and the size of the given new program module. Alternatively, the compiler reduces or sets the weighting factor to zero for a call in a given new program module if the call is a local reference.
And in another alternative, the compiler repartitions the program module into the one or more new program modules so that caller and callee modules fit into the local memory together.
In describing the preferred embodiments of the invention illustrated in the appended drawings, specific terminology will be used for the sake of clarity. However, the invention is not intended to be limited to the specific terms used, and it is to be understood that each specific term includes all technical equivalents that operate in a similar manner to accomplish a similar purpose.
Reference is now made to
PE 100 can be constructed using various methods for implementing digital logic. PE 100 preferably is constructed, however, as a single integrated circuit employing CMOS on a silicon substrate. PE 100 is closely associated with a memory 130 through a high bandwidth memory connection 122. The memory 130 desirably functions as the main memory for PE 100. In certain implementations, the memory 130 may be embedded in or otherwise integrated as part of the processor chip incorporating the PE 100, as opposed to being a separate, external “off chip” memory. For instance, the memory 130 can be in a separate location on the chip or can be integrated with one or more of the processors that comprise the PE 100. Although the memory 130 is preferably a DRAM, the memory 130 could be implemented using other means, such as a static random access memory (“SRAM”), a magnetic random access memory (“MRAM”), an optical memory, a holographic memory, etc. DMAC 106 and memory interface 110 facilitate the transfer of data between the memory 130 and the SPUs 108 and PU 104 of the PE 100.
PU 104 can be, for instance, a standard processor capable of stand-alone processing of data and applications. In operation, the PU 104 schedules and orchestrates the processing of data and applications by the SPUs 108. In an alternative configuration, the PE 100 may include multiple PUs 104. Each of the PUs 104 may control one, all, or some designated group of the SPUs 108. The SPUs 108 are preferably single instruction, multiple data (“SIMD”) processors. Under the control of PU 104, the SPUs 108 may perform the processing of the data and applications in a parallel and independent manner. DMAC 106 controls accesses by PU 104 and the SPUs 108 to the data and applications stored in the shared memory 130. Preferably, a number of PEs, such as PE 100, may be joined or packed together, or otherwise logically associated with one another, to provide enhanced processing power.
The PEs 200 are preferably tied to a shared bus 202. A memory controller or DMAC 206 may be connected to the shared bus 202 through a memory bus 204. The DMAC 206 connects to a memory 208, which may be of one of the types discussed above with regard to memory 130. In certain implementations, the memory 208 may be embedded in or otherwise integrated as part of the processor chip incorporating one or more of the PEs 200, as opposed to being a separate, external off chip memory. For instance, the memory 208 can be in a separate location on the chip or can be integrated with one or more of the PEs 200. An I/O controller 212 may also be connected to the shared bus 202 through an I/O bus 210. The I/O controller 212 may connect to one or more I/O devices 214, such as frame buffers, disk drives, etc.
It should be understood that the above processing modules and architectures are merely exemplary, and the various aspects of the present invention may be employed with other structures, including, but not limited to multiprocessor systems of the types disclosed in U.S. Pat. No. 6,526,491, entitled “Memory Protection System and Method for Computer Architecture for Broadband Networks,” issued on Feb. 25, 2003, and U.S. application Ser. No. 09/816,004, entitled “Computer Architecture and Software Cells for Broadband Networks,” filed on Mar. 22, 2001, which are hereby expressly incorporated by reference herein.
SPU 300 preferably includes or is otherwise logically associated with local store (“LS”) 302, registers 304, one or more floating point units (“FPUs”) 306 and one or more integer units (“IUs”) 308. The components of SPU 300 are, in turn, comprised of subcomponents, as will be described below. Depending upon the processing power required, a greater or lesser number of FPUs 306 and IUs 308 may be employed. In a preferred embodiment, LS 302 contains at least 128 kilobytes of storage, and the capacity of registers 304 is 128×128 bits. FPUs 306 preferably operate at a speed of at least 32 billion floating point operations per second (32 GFLOPS), and IUs 308 preferably operate at a speed of at least 32 billion operations per second (32 GOPS).
LS 302 is preferably not a cache memory. Cache coherency support for the SPU 300 is unnecessary. Instead, the LS 302 is preferably constructed as an SRAM. A PU 104 may require cache coherency support for direct memory access initiated by the PU 104. Cache coherency support is not required, however, for direct memory access initiated by the SPU 300 or for accesses to and from external devices, for example, I/O device 214. LS 302 may be implemented as, for example, a physical memory associated with a particular SPU 300, a virtual memory region associated with the SPU 300, a combination of physical memory and virtual memory, or an equivalent hardware, software and/or firmware structure. If external to the SPU 300, the LS 302 may be coupled to the SPU 300 such as via a SPU-specific local bus or via a system bus such as the local PE bus 120.
SPU 300 further includes bus 310 for transmitting applications and data to and from the SPU 300 through a bus interface (Bus I/F) 312. In a preferred embodiment, bus 310 is 1,024 bits wide. SPU 300 further includes internal busses 314, 316 and 318. In a preferred embodiment, bus 314 has a width of 256 bits and provides communication between local store 302 and registers 304. Busses 316 and 318 provide communications between, respectively, registers 304 and FPUs 306, and registers 304 and IUs 308. In a preferred embodiment, the width of busses 316 and 318 from registers 304 to the FPUs 306 or IUs 308 is 384 bits, and the width of the busses 316 and 318 from the FPU 306 or IUs 308 to the registers 304 is 128 bits. The larger width of the busses from the registers 304 to the FPUs 306 and the IUs 308 accommodates the larger data flow from the registers 304 during processing. In one example, a maximum of three words are needed for each calculation. The result of each calculation, however, is normally only one word.
With the present invention, it is possible to overcome the lack of virtualization and other bottleneck issues between the local memory address space and the system address space. Because data loading and unloading in the LS 302 is desirably performed through software, it is possible to utilize the fact that the software can determine whether data and/or code should be loaded at a certain time or not. This is accomplished through the use of program modules. As used herein, the term “program module” includes, but is not limited to, any logical set of program resources allocated in a memory. By way of example only, a program module may comprise data and/or code, which can be grouped by any logical means, such as a compiler. A program or other computing operations may be implemented using one or more program modules.
Preferably, programs are loaded into the LS 302 per program module. More preferably, programs are loaded into the LS 302 per object module. As seen in
If program modules are implemented using object modules formed during compilation, how the object modules are structured can impact the effectiveness of the storage management process. For example, if the data for a code function is not properly associated with that code function, this could create a processing bottleneck. Thus, one should be cautious when separating programs and/or data into multiple source files.
This problem can be avoided by analyzing the program, including the code and data (if any). In one alternative, the code and/or data are preferably divided into separate modules. In another alternative, the code and/or data are divided into functions or groups of data, depending upon their usage. A compiler or other processing tool can analyze the references made between functions and groups of data. Then, existing program modules can be repartitioned by grouping the data and/or code into new program modules based on the analysis to optimize the program module grouping. This, in turn, will minimize the overhead created by out-of-module access. The process of determining how to split a module preferably begins by separating the module's code by functions. By way of example only, a tree structure can be extracted from the “call out” relationships of the functions. A function with no external call out, or a function which is not being referenced externally, can be identified as a “local” function. Functions having external references can be grouped by reference target modules, and should be identified as having an external reference. Similar groupings can be implemented for functions that are referenced externally, and such functions should be identified as being subject to an external reference. The data portion(s) of a module preferably undergo an equivalent analysis. The module groupings are preferably compared/matched to select a “best fit” combination. The best fit could be selected, for instance, based on the size of the LS 302, preferred transfer size, and/or alignment. Preferably, the more likely a reference is to be used, the higher it is weighted in the best fit analysis. Tools can also be used to automate the optimized grouping. For instance, the compiler and/or the linker may perform one or more compile/link iterations in order to generate a best fit executable file. References can also be statistically analyzed by runtime profiling.
In a preferred embodiment, the input to the regrouping process includes multiple object files that will be linked together to form a program. In such an embodiment, the desired output includes multiple load modules grouped to minimize the delay caused in waiting for a load completion.
In the example of
In the regrouping of
In a more complicated example,
The second module 604 includes code functions 630 and 632. The code function 630 includes the code for operation F. The code function 632 includes the code for operation G. The second module 604 includes data groups 634 and 636, which are associated with the code functions 630 and 632, respectively. Data group 638 is also included in the second module 604. The data group 634 includes data set (or group) F. The data group 636 includes data set G. The data group 638 includes data set FG.
The third module 606 includes code functions 640 and 642. The code function 640 includes the code for operation H. The code function 642 includes the code for operation I. The third module 606 includes data groups 644 and 646, which are associated with the code functions 640 and 642, respectively. Data group 648 is also included in the third module 606. The data group 644 includes data set (or group) H. The data group 646 includes data set I. The data group 648 includes data set IE.
The fourth module 608 includes code functions 650 and 652. The code function 650 includes the code for operation J. The code function 652 includes the code for operation K. The fourth module 608 includes data groups 654 and 656, which are associated with the code functions 640 and 642, respectively. The data group 654 includes data set (or group) J. The data group 656 includes data set K.
In the example of
With respect to the second code module 604, the code function 630 directly references data group 638 (arrow 637). The code function 632 also directly references data group 638 (arrow 639). With respect to the third code module 606, the code function 640 indirectly references code function 650 (dashed arrow 651). The code function 640 also indirectly references code function 652 (dashed arrow 653). The code function 642 directly references data group 648 (arrow 649). With respect to the fourth code module 608, the code function 650 directly references code function 652 (arrow 655).
There are eight local calls (direct references) and eight external calls (indirect references) in the function call tree 600. The eight external calls may create a significant amount of unwanted overhead. Therefore, it is preferable to regroup the components of the call tree 600 to minimize the indirect references.
In the example of
With respect to the second code module 664, the code function 614 now directly references code function 630 (arrow 631′) and code function 632 (arrow 633′). The code function 630 still directly references data group 638 (arrow 637), and the code function 632 still directly references data group 638 (arrow 639).
With respect to the third code module 666, the code function 616 indirectly references code function 640 (dashed arrow 641), but now directly references code function 642 (arrow 643′). The code function 618 now directly references code function 642 (arrow 645′) and data group 648 (arrow 647′). The code function 642 still directly references data group 648 (arrow 649).
With respect to the fourth code module 668, the code function 640 now directly references code function 650 (arrow 651′). The code function 640 also directly references code function 652 (arrow 653′). The code function 650 still directly references code function 652 (arrow 655).
There are now twelve local calls (direct references) and only four external calls (indirect references) in the function call tree 660. By reducing the number of indirect references in half, the amount of unwanted overhead can be minimized.
The number of modules that can be loaded into the LS 302 is limited by the size of the LS 302 and by the size of the modules themselves. However, code analysis on how references are addressed provides a powerful tool, which may enable the loading or unloading of program modules in the LS 302 before they are needed. If it can be determined at a certain point in the program that a program module will be needed, the loading can be performed ahead of time to reduce the latency of loading modules on demand. Even if it is not completely certain that a given module will be used, in many cases it is more efficient to predictively load the module if it is very likely (e.g., 75% or more) to be used.
The references can be made strict, or on-demand checking may be permitted, depending upon the likeliness that the reference will actually be used. The insertion point in the program for such load routines can be determined statistically using a compiler or equivalent tool. The insertion point can also be determined statically before the module is created. The validity of the insertion point can be determined based upon runtime conditions. For example, a load routine may be utilized that judges whether the load should or should not be performed. Preferably, the amount of loading and unloading is minimized for a set of program modules loaded at run time. Runtime profiling analysis can provide up to date information to determine the locations of each module to be loaded. Due to typical stack management, arbitrary load locations should be chosen for modules that do not have further calls. For instance, in a conventional stack management process, stack frames are constructed by return pointers. When a function returns, the module containing the calling module must be located in the same location as when it was called. As long as a module is loaded to the same location when it returns, it is possible to load it to a different location each time the module is newly called. However, when returning from an external function call, the management routine loads the calling module to the original location.
,
A key benefit of program module optimization in accordance with aspects of the present invention is the minimization of the time spent waiting for the loading and unloading of modules. One factor that comes into play is the latency and the bandwidth of module transfers. The time spent during the actual transfer is directly related to the following factors: (a) the number of times a reference is made; (b) the latency for a transfer setup; (c) the transfer size; and (d) the transfer bandwidth. Another factor is the size of the available memory space.
While static analysis may be used as part of the code organization process, it generally is limited to providing relationships between the functions and does not provide information on how many times calls are made to a given function in a set period of time. Preferably, a reference to such static data is used as a factor in regrouping. Additional analysis of the code may also be used to provide some level of information on the frequency and number of times function calls are made within a function. In one embodiment, optimization may be limited to the information that can be obtained using only a static analysis.
Another element that can be included in the optimization algorithm is the size and expected layout of the modules. For example, if a caller module has to be unloaded to load the callee module, the unloading would add more latency to complete the function call.
In designing optimization algorithms, one or more factors (e.g., weighting factors) are preferably included, which are used to quantify the optimization. In one factor, the functional references are preferably weighted with the frequency of calls, the number of times the module is called, and the size of the module. For instance, the number of times a module may be called can be multiplied by the size of the module. In a static analysis mode, function calls farther down the call tree could be given more weighting to indicate that the call would be made more frequently.
In another factor, if a call remains within a module (a local reference), the weighting can be reduced or given a weight of zero. In a further factor, different weights can be set to call from a function with analysis of the code structure. For example, a call made only one time is desirably weighted lower than a call made numerous times as part of a loop. Furthermore, if the number of loop iterations can be determined, that number could be used as the weighting factor for the loop call. In yet another factor, a static data reference used only by a single function should be considered as attached to that function. In another factor, if static data is shared between different functions, it may be desirable to include those functions in a single module.
In a further factor, if an entire program is small enough, the program should be placed into a single module. Otherwise, the program should be split into multiple modules. In another factor, if the program module is split into multiple modules, it is preferable to organize the modules so that both caller and callee modules fit into the memory together. The last two factors relating to splitting a program into a module should be evaluated in view of the other factors in order to achieve a desirable optimization algorithm. The figures discussed above illustrate various reorganizations in accordance with one or more selected factors.
Each computer processing system can include, for example, one or more computing devices having user inputs such as a keyboard 811 and mouse 812 (and various other types of known input devices such as pen-inputs, joysticks, buttons, touch screens, etc.), a display interface 813 (such as connector, port, card, etc.) for connection to a display 814, which could include, for instance, a CRT, LCD, or plasma screen monitor, TV, projector, etc. Each computer also preferably includes the normal processing components found in such devices such as one or more memories and one or more processors located within the computer processing system. The memories and processors within such computing device are adapted to perform, for instance, processing of program modules using programming references in accordance with the various aspects of the present invention as described herein. The memories can include local and external memories for storing code functions and data groups in accordance with the present invention.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims.
This application is a continuation, of U.S. application Ser. No. 10/957,158, filed on Oct. 1, 2004, the entire disclosure of which is incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
Parent | 10957158 | Oct 2004 | US |
Child | 12228689 | US |