In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Embodiments of the present invention are directed to systems, methods, and computer program products for utilizing prior usage data for software build optimization. In one embodiment of this invention, a computer system performs a method for optimizing the processing of a set of data objects. The method involves the computer system packaging a first set of data objects into a first software build. The computer system evaluates at least a portion of the usage of the first software build in accordance with usage training scenarios. The computer system monitors the evaluation of the first software build in accordance with a first software build usage detection process to detect the use of data objects within the first software build. The computer system generates profile data for the data objects and the generated profile data includes an indication of usage for each data object. The computer system packages a second set of data objects into a second software build in accordance with the generated profile data from the first software build, wherein the second set of data objects is different from but includes one or more data objects from the first set of data objects.
Embodiments within the scope of the present invention also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise physical storage and/or memory media such as RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired and wireless) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media.
Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described herein. Rather, the specific features and acts described herein are disclosed as example forms of implementing the claims.
A software application is a program that allows a user to interface with and perform one or more tasks on a computer system. Once instantiated, software applications perform functions and each function may utilize one or more data objects (e.g., 101-105) in the performance of a function. A software build (e.g., 111 and 131) can comprise one or more applications.
In some embodiments, (re)packaging module 110 may also be capable of repackaging one or more data objects based on generated profile data 126. The (re)packaging module 110 may package the data objects of set 152 (101A, 102, 103A, 104 and 105) in any order: in some instances, the data objects may be packaged in the same order as in software build 111 (as shown in software build 131); in other cases, the data objects may be packaged in an order different from software build 111. Data objects 101A and 103A indicate that these data objects are modified forms of data objects 101 and 103, respectively. The data objects may have been modified by one or more developers, as explained above. In some embodiments, the (re)packaging module 110 repackages the second set of data objects (e.g., 101A-105) according to the profile data 126 received from the monitoring module 125.
In some embodiments, (Re)packaging module 110 may be capable of packaging one or more data objects (101A-105) that have been previously packaged. The (re)packaging module 110 may package the data objects (101A-105) in any order: in some instances, the data objects may be packaged in the same order as in software build 111; in other cases, the data objects may be packaged in an order different from software build 111. In some embodiments, the (re)packaging module 110 repackages set 152 (101A-105) according to the profile data 126 (generated from monitoring the use of software build 111) received from the monitoring module 125. In this manner, the (re)packaging module 110 may package set 152 (101A-105) according to use during the training scenarios 122.
In some cases, build 131 may be tested using the profile data collected from training scenarios 122. For example, software build 111 can be put through one or more training scenarios 122, and the usage data 123 gathered by the usage detection module 124. Usage data 123 can be recorded by the monitoring module 125 as profile data 126, the (re)packaging module may package set 152 (101A-105) according to profile data 126.
Additionally or alternatively, the testing module 120 may test software build 131 for functionality based on the profile data 126 gathered for build 111. In this manner, each new build may avoid going through training scenarios 122 because the usage data 123 is reused. This can potentially save large amounts of time and computing power. Optionally, a new software build may be sent through the training scenarios 122 and the profile data 126 may be modified according to the usage data objects in the new build.
Method 200 includes an act of packaging a first set of data objects into a first software build (act 210). For example, a (re)packing module 110 may package a set 151 (101-105) into a first software build 111. Packaging, as used herein, refers to combining or compiling data objects into a group or conglomerate of data objects. The conglomerate of data objects can be referred to as a software build 111, as explained above.
Method 200 also includes an act of evaluating at least a portion of the usage of the first software build in accordance with usage training scenarios (act 220). For example, a testing module 120 may evaluate at least a portion of the usage of software build 111 in accordance with training scenarios 122. In some embodiments, training scenarios 122 include one or more usage scenarios that usage detection module 124 can use to evaluate at least a portion of the usage of binaries in software build 111 caused by running application commands, calling functions, executing code paths, loading and unloading processes, or using other means of activating software build functionality. Thus, training scenarios 122 can exercise important and/or more common code-paths in software build 111 and record what portions of the application's binaries (e.g., data objects 101-105) get touched when those code-paths are exercised. For example, every use of software build 111 will have to launch software build 111 in order to use software build 111. As such, training scenarios 122 can include a usage scenario to detect binary usage when software build 111 is launched.
Method 200 also includes an act of monitoring the evaluation of the first software build in accordance with a usage detection process to detect the use of data objects within the first software build (act 230). For example, monitoring module 125 may monitor the evaluation of the first software build 111 in accordance with a software build usage detection process to detect the use of data objects (101-105) within the first software build 111.
In some embodiments, monitoring module 125 may monitor the evaluation of a first software build 111, as explained above, and record each time a function or piece of source code or data object is used. These records are referred to herein as profile data, as explained below.
The method of
In some embodiments, the profile data 126 may include block offsets or block weights. Blocks are portions of software source code. In some cases, the act of monitoring the evaluation of the first software build in accordance with a software build usage detection process to detect the use of data objects within the first software build (act 230) monitors and stores information indicating the number of times that particular portion of source code was executed during the usage detection process. The number of times the block was executed during the usage detection process is referred to herein as the block weight.
In some embodiments, profile data may additionally or alternatively include metadata tokens. Metadata tokens are identifiers which may be used to identify functions or data structures within software source code. Metadata tokens may also be used to identify references to functions within builds of other applications. Furthermore, metadata tokens may be used to identify which data structures have been called or executed during a software build usage detection process. In some embodiments, during the act of monitoring the evaluation of the first software build in accordance with a software build usage detection process to detect the use of data objects within the first software build (act 230), monitoring module 125 may monitor metadata tokens that are configured to indicate which functions were executed during the software build usage detection process.
In some embodiments, profile data may additionally or alternatively include blobs. Blobs are identifiers capable of identifying generic methods in a software build. In some cases, blobs may be signatures for generic methods. Additionally or alternatively, blobs may include metadata strings, or strings of identifiers such as metadata tokens. In some embodiments, during monitoring, monitoring module 125 may monitor blobs that indicate which generic methods were executed during the software build usage detection process. The blobs may also identify other identifiers such as metadata tokens that indicate which functions or sections of source code were executed during the software usage detection testing process.
In some embodiments, profile data includes information indicating which Common Language Runtime (CLR) data structures were referenced during the software build usage detection process. These CLR data structures may be identified by metadata tokens or blobs, as described above. Moreover, in some embodiments, profile data may include at least one of block weights, metadata tokens or blobs indicating which code paths were executed during the software build usage detection process. In some cases, the executed code paths may be mapped and stored and are thus available for use in repackaging a build, as explained below.
Method 200 also includes an act of packaging the second set of data objects into a second software build in accordance with the generated profile data from the first software build, wherein the second set of data objects is different from but includes one or more data objects from the first set of data objects (act 250). For example, (re)packing module 110 can repackage set 152 (101A-105) into software build 131 based on profile data 126. In some embodiments, usage data 123 is packaged with the generated build 131 so that the usage data 123 can be used to optimally lay out the code/CLR data structures in the binary at application run-time. Each data object (101-105) that was accessed during the training scenarios 122 is considered “hot” and all “hot” data objects are packaged together. All other data objects are placed in a “cold” section. The idea is that if only data objects or code from the hot section are required at application runtime, then everything required can be accessed by touching very few pages on the hard-disk (the cold section pages have been sifted out, and hence do not need to be accessed). In some cases, this may lead to significant application start-up time and working set size improvements.
In some embodiments, the act of packaging the second set of data objects in accordance with the generated profile data comprises mapping the metadata tokens from a first version of the software build to a second version of the software build. As explained above, metadata tokens may identify which portions of source code were executed during the software build usage detection process. The metadata tokens of the first version of a software build (i.e. build 111) may be mapped and stored during the software build usage detection process. Next, the mapped metadata tokens may be used by the repackaging module 110 to repackage the second set of data objects (i.e. build 131) according to which functions and/or portions of source code were identified by the metadata tokens. Furthermore, in some embodiments, basic block weights for the code are mapped from build 111 to 131 by analyzing the basic block graphs (sometimes referred to as Control Flow Graphs) for the two builds (111 and 131) and doing a best-effort job of transferring block weights from one graph to the other.
Optionally, in some embodiments, the computer system may evaluate at least a portion of the functionality of any data objects in the second software build using profile data generated from the first software build. For example, testing module 120 can evaluate the functionality of software build 131 using profile data 126 generated from build 111.
The method of
The method of
The method of
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.