The present invention relates to the field of performance analysis, and more particularly to performance analysis using semantic knowledge.
Computer systems execute programs that solve complex computational problems.
Preferably, the programs achieve high levels of performance, reduce wasted computer resources, and execute at peak speed. “Performance analysis” is the process of analyzing and understanding the execution characteristics of programs to identify impediments that prevent programs from running at peak speed, or their highest level of performance.
The amount of information required to completely characterize the execution of a program is massive, however, and it is therefore difficult or impossible to analyze all the data manually. Current automatic “performance analyzers” present performance data textually or graphically and direct the user's attention to patterns that may indicate a performance problem. These tools, however, lack an understanding of the meaning, or “semantic knowledge,” of the analyzed program, which limits their effectiveness in solving performance problems.
For example, performance analyzers generally attempt to identify algorithms that ineffectively use computer resources. To do this, conventional performance analyzers may identify parts of a program that take a long time to execute. This heuristic, however, may be deceptive. For instance, such an analyzer would identify a well-written algorithm as a poorly-performing algorithm simply because it unavoidably requires a lot of time to execute. Such an analyzer would also fail to identify poorly-performing algorithms because they do not take a long time to execute or because they are not central to the program. Without knowledge of the semantics of the programs, or how program components are supposed to run, an automatic performance analyzer cannot adequately determine whether a particular component of a program exhibits poor performance.
Performance analysis is also important in multiprocessing computer systems. A multiprocessing computer system comprises multiple processors in which different portions of a program execute in parallel, or it is a system in which a program executes in parallel over multiple computers, each with a different processor. In such a computer system, resources may be wasted if processors are idle (i.e., not executing a program instruction) for any length of time. Thus, an automatic performance analyzer identifies algorithms that do not effectively divide tasks over the available processors, i.e., algorithms that have low “parallelism.” Conventional performance analyzers generally attempt to identify algorithms with low parallelism by indicating instances during program execution when one or more of the processors are idle. This may indicate when the program is not using the available processor resources as well as it could. Such a heuristic, however, may also identify instances when processors are expected to be idle, such as during the traversal of a linked list by a single processor. Further, even during the course of executing an extremely efficient program, the number of instances that one or more processors may be idle could be one billion or more. Conventional automated performance analyzers are incapable of distinguishing instances when the processors are expected to be idle from instances when they are not. Therefore, without knowledge of the semantics of the program, or how program components are supposed to run, automatic performance analyzers cannot adequately determine low parallelism portions of programs.
Thus, there is a need for performance analysis that identifies performance impediments based on an understanding of the meaning, or semantic knowledge, of the portions of the program being analyzed.
Methods and systems consistent with this invention analyze the performance of a program executed in a data processing system. Such methods and systems assign a semantic to the performance of the program, and measure the level of performance of the program based on the semantic. As part of assigning a semantic, such methods and systems indicate a class of processing of which to measure performance, and may define a suctitude, i.e. a degree of poor performance, associated with that class. Such methods and systems define the class as a processing function that could contribute to the poor performance of the program. As part of measuring the level of performance, such methods and systems measure the suctitude of the indicated class during program execution.
The summary and the following detailed description should not restrict the scope of the claimed invention. Both provide examples and explanations to enable others to practice the invention.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an implementation of the invention and, together with the description, serve to explain the advantages and principles of the invention. In the drawings,
Overview
The following description of embodiments of this invention refers to the accompanying drawings. Where appropriate, the same reference numbers in different drawings refer to the same or similar elements.
Methods and systems consistent with this invention provide a performance analysis tool that identifies performance impediments based on an understanding of the meaning, or semantic knowledge, of the portions of the program being analyzed. Such methods and systems assign a semantic to the performance of the program, and then measure the level of performance of the program based on the semantic. The semantic may comprise a class of processing and a suctitude, i.e., a degree of poor performance, associated with the class. A class is anything that could contribute to the poor performance of a computer program. Use of semantic knowledge facilitates identification of impediments that prevent the program from executing at peak speed.
Implementation Details
Methods and systems consistent with this invention first gather data concerning the execution characteristics of program 108. The process of gathering information for performance analysis is called “instrumentation.” Instrumentation requires adding instructions to analyzed program 108 so that, when it executes, these instructions generate data from which performance analyzer 106 derives performance information. For example, in “subprogram level instrumentation,” each subprogram is instrumented with a set of instructions that generate data reflecting calls to the subprogram. This allows, for example, tracking the number of times the subprogram is called. Performance analyzer 106 may analyze the instrumentation data generated during execution of program 108 after program 108 finishes executing. Alternatively, performance analyzer 106 may analyze the data during execution of program 108. An example of instrumentation consistent with this invention is described in more detail below. Together, the instrumentation and performance analyzer 106 comprise “performance analysis.”
Methods and systems consistent with this invention analyze the performance of program 108, which is executed in data processing system 100. Such methods and systems assign a semantic to the performance of program 108, and measure the level of performance of the program based on the semantic. For instance, the semantic may take the form of a “class” and a “suctitude,” which is a degree of poor performance associated with the class. The class indicates the type of processing in terms that are meaningful in the context of application 108. For example, in a scientific application, classes may include “input,” “matrix multiplication,” and “output.” For multiprocessor computer systems, a class may be “idle processor,” or “stalled processor.” Another class may be “cache misses,” which occur when a cache memory for storing variable data is overwritten with other data. Other classes may be “sorting,” “searching,” “convolution,” or “decryption.” Essentially, a class may be anything that could contribute to the poor performance of a computer program, and not all classes consistent with this invention are listed here.
The suctitude indicates the degree to which the class constitutes a performance problem, where higher values may indicate larger problems. For example, if the class were “idle processor,” the suctitude may be defined as one. Thus, if processor 116 were idle, this would produce a calculated suctitude of one per unit of time. Ten idle processors would produce a calculated suctitude of 10 per unit of time. The suctitude of “stalled processor” may be 10, larger than the suctitude of an “idle processor,” because a stalled processor is more of a performance impediment than an idle processor. The unit of the elapsed time may be seconds, for example.
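The per-unit-time calculation described above can be sketched as follows. This is a minimal illustration, not code from the patent: the function name is an assumption, and the weights 1 and 10 are the example values given in the text for idle and stalled processors.

```c
/* Suctitude accrued per unit of time, using the example weights from
   the text: 1 for each idle processor and 10 for each stalled
   processor (a stalled processor being the larger impediment). */
static int suctitude_per_unit_time(int idle_cpus, int stalled_cpus)
{
    return idle_cpus * 1 + stalled_cpus * 10;
}
```

With one idle processor this yields 1 per unit of time; with ten idle processors, 10; with one stalled processor, 10, matching the examples above.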
The user of performance analyzer 106 may dynamically indicate the classes that he or she believes are important for analysis, i.e., the classes he or she believes may inhibit good performance. Further, the user may dynamically define the suctitudes associated with the classes, reflecting his or her belief about the relative suctitudes of the different classes. Alternatively, the classes and suctitudes may already be indicated or defined in performance analyzer 106 by the software vendor.
Methods and systems consistent with this invention may divide classes into subclasses. For example, if the class were “stalled processor,” the subclasses could be (1) “stalled as a result of user,” or (2) “stalled as a result of system 100 activity.” Processor 116 may stall as a result of system 100 activity if, for instance, processor 116 is waiting for input/output to complete (or waiting for access to data), waiting for system resources to be assigned, or waiting for a dependent piece of the program to complete. Processor 116 may stall as a result of user activity, for instance, if processor 116 is waiting for the user to input information through input device 122.
Methods and systems consistent with this invention may also indicate a plurality of classes of which to measure performance, and define a suctitude associated with each class. Such methods and systems may also calculate the aggregate suctitude. The aggregate suctitude at any given time is the sum of the calculated suctitudes of all classes.
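The aggregate computation is a simple sum, which can be sketched as follows; the function name and the array representation of per-class suctitudes are assumptions for illustration.

```c
/* Aggregate suctitude: the sum of the calculated suctitudes of all
   classes at a given time, as described in the text. */
static double aggregate_suctitude(const double *class_suctitudes, int num_classes)
{
    double total = 0.0;
    for (int i = 0; i < num_classes; i++)
        total += class_suctitudes[i];
    return total;
}
```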
A class's defined suctitude may be a constant or it may be a function of other execution data. For example, an idle processor may have a constant suctitude per unit time, while other classes, such as “matrix multiplication,” may have a defined suctitude that is a function of the amount of work done and the time taken to do the work. In matrix multiplication, a square matrix M may have N rows and columns. The amount of time taken to square this matrix (matrix multiplication) may be proportional to the cube of the size of the matrix, or N³. Therefore, the suctitude may be a function defined by B = T − N³, where B is the suctitude and T is the total time it took for the matrix multiplication. If N were 10, and if the total amount of time T to square the matrix were 1000 units, the calculated suctitude B would be zero. If, on the other hand, the total amount of time T to square that matrix were 4000 units, then the suctitude B would be 3000. The more time the matrix multiplication takes beyond what is expected, the higher the calculated suctitude.
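The formula B = T − N³ can be expressed directly; the function name is an assumption, and the time units are whatever units T was measured in.

```c
/* Matrix-multiplication suctitude from the text: B = T - N^3, where
   T is the measured total time and N is the matrix size. */
static long matmul_suctitude(long total_time_t, long n)
{
    return total_time_t - n * n * n;
}
```

For N = 10, a measured time of 1000 units yields a suctitude of zero, and a measured time of 4000 units yields 3000, matching the worked example above.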
The “matrix multiplication” class is used here to describe one example of performance instrumentation. When program 108 enters the matrix multiplication subroutine, it records in secondary storage 120 (1) a start time when it entered the subroutine, and (2) the size of matrix M. When program 108 leaves the matrix multiplication subroutine, it records (3) a leave time in secondary storage device 120. Performance analyzer 106 may analyze this data at a later time, after program 108 finishes execution. Program 108 is instrumented to record all the data necessary for performance analyzer 106 to perform the semantic analysis described above. In this instance, performance analyzer 106 knows the size of matrix M and the amount of time it took to perform the matrix multiplication, and can calculate the suctitude. Because the suctitude need not be calculated during the execution of the program, the user may dynamically indicate and define the classes and suctitudes, as described above. The other classes in program 108 may similarly be instrumented, ensuring that the necessary data for the suctitudes to be calculated is recorded in secondary storage 120.
An example of an instrumented matrix multiplication subroutine follows:
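The subroutine listing itself does not survive in this excerpt. The following is a minimal sketch in C (the original language is unspecified), assuming row-major arrays; an in-memory trace record stands in for secondary storage device 120, and the standard-C clock() stands in for a high-resolution timer.

```c
#include <time.h>

/* One trace record, per the description in the text: start time,
   end time, and matrix size N. A real tool would append this to
   secondary storage device 120; a global record keeps the sketch
   self-contained. */
struct trace_record { long start_nsec, end_nsec; int n; };
static struct trace_record last_record;

/* Stand-in for the patent's SAVE_START_STOP_AND_SIZE subroutine. */
static void save_start_stop_and_size(long start_nsec, long end_nsec, int n)
{
    last_record.start_nsec = start_nsec;
    last_record.end_nsec = end_nsec;
    last_record.n = n;
}

/* clock() is standard C; a real analyzer would use a monotonic
   nanosecond clock instead. */
static long now_nsec(void)
{
    return (long)clock() * (1000000000L / CLOCKS_PER_SEC);
}

/* Multiply square matrices a and b of size n into c, recording the
   start time, end time, and matrix size as instrumentation. */
void matmul_instrumented(int n, const double *a, const double *b, double *c)
{
    long start_nsec = now_nsec();
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    save_start_stop_and_size(start_nsec, now_nsec(), n);
}
```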
This subroutine multiplies a matrix A by a matrix B and stores the result in a matrix C. Matrices A, B, and C are square and of size N. Variable START_NSEC stores the time the subroutine starts, and variable END_NSEC stores the time the subroutine ends. Subroutine SAVE_START_STOP_AND_SIZE stores the variables START_NSEC, END_NSEC, and N to secondary storage device 120 so that performance analyzer 106 may analyze this data at a later time. An example of the data stored to secondary storage device 120 follows:
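The data example itself does not survive in this excerpt. A hypothetical record layout consistent with the description (one record per call: start time, end time, and matrix size N; all values are illustrative, not from the patent) might look like:

```
START_NSEC   END_NSEC     N
1000000      2000000      10
7500000      11500000     10
```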
Other classes whose suctitudes may be defined as a function of other execution data are “cache misses,” “input,” and “output.” For example, if the class were “cache misses,” five percent of memory requests ending up in cache misses may result in a calculated suctitude of 5 per unit time. If the class were “input” or “output,” the suctitude may be defined as a function of the amount of data input or output during a unit of time.
In a multiprocessor environment, two of the factors in the total performance of analyzed program 108 are (1) the amount of time spent waiting for another processor to finish, and (2) the amount of time spent executing code that cannot be executed in parallel. An example of code in a multiprocessor environment follows:
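The code listing itself does not survive in this excerpt. The following is a minimal C sketch of the structure described, with the subroutine names taken from the text below; the stub bodies, the per-phase time counters, the CPU count, and the placeholder time costs (8 and 2 units) are all assumptions, and the parallel dispatch is reduced to a loop for simplicity.

```c
#define NUM_CPUS 4  /* assumed processor count for illustration */

/* Per-phase time accounting, an assumed form of instrumentation. */
static long time_in_read_data = 0;   /* serial; cannot run in parallel */
static long time_in_wait = 0;        /* waiting for other CPUs to finish */

static void read_data(void)                  { time_in_read_data += 8; }
static void process_data(int cpu)            { (void)cpu; /* parallel work */ }
static void wait_for_all_cpus_to_finsh(void) { time_in_wait += 2; }

void run_program(void)
{
    read_data();                    /* single-CPU region */
    for (int cpu = 0; cpu < NUM_CPUS; cpu++)
        process_data(cpu);          /* would execute on separate CPUs */
    wait_for_all_cpus_to_finsh();   /* synchronization */
}
```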
Subroutine READ_DATA may be code that cannot be executed in parallel; subroutine PROCESS_DATA may execute efficiently in parallel processors; and subroutine WAIT_FOR_ALL_CPUS_TO_FINSH is executed while waiting for all the CPUs to finish executing code. Such code may be instrumented by storing in secondary storage device 120 (1) the amount of time spent in READ_DATA, and (2) the amount of time spent in WAIT_FOR_ALL_CPUS_TO_FINSH, and then assigning a suctitude to each. READ_DATA may have a higher defined suctitude relative to WAIT_FOR_ALL_CPUS_TO_FINSH because single-CPU regions may be more of an impediment to parallel performance than synchronization.
Once the execution data has been gathered and suctitudes defined, performance analyzer 106 may use the calculated suctitude data to identify performance impediments in program 108. Methods and systems consistent with this invention display the calculated suctitude of a class as a function of time. In this case, the display may be a line graph with the suctitude on the ordinate, or Y-axis, and time on the abscissa, or X-axis. U.S. Pat. No. 6,434,714, entitled “Methods, Systems, and Articles of Manufacture for Analyzing Performance of Application Programs,” hereby incorporated by reference, describes ways of displaying performance analysis results. Performance analyzer 106 may display the calculated aggregate suctitude of all the classes as a function of time, or it may display the calculated suctitude of only one or a select few classes. Thus, the user may isolate the suctitude of a particular class.
Alternatively, methods and systems consistent with this invention display when the calculated suctitude of a class or group of classes exceeds a threshold. Or, performance analyzer 106 may indicate when during execution of the program the calculated suctitude reached a maximum. The calculated suctitude may also be displayed as a color, with different colors representing different numerical levels of calculated suctitude.
Systems consistent with this invention are applicable to all programs written in all computer programming languages, including Fortran 77, Fortran 95, Java, C, C++, and assembler for any given computer.
One skilled in the art will appreciate that numerous variations to this system exist. For example, the performance data may be tabulated and displayed in any fashion. Although methods and systems consistent with this invention have been described with reference to a preferred embodiment thereof, those skilled in the art will recognize that various changes in form and detail may be made without departing from the spirit and scope of this invention as defined in the appended claims and their full scope of equivalents.
Number | Name | Date | Kind |
---|---|---|---|
4675832 | Robinson et al. | Jun 1987 | A |
4685082 | Cheung et al. | Aug 1987 | A |
4812996 | Stubbs | Mar 1989 | A |
5073851 | Masterson et al. | Dec 1991 | A |
5075847 | Fromme | Dec 1991 | A |
5079707 | Bird et al. | Jan 1992 | A |
5119465 | Jack et al. | Jun 1992 | A |
5146593 | Brandle et al. | Sep 1992 | A |
5168563 | Shenoy et al. | Dec 1992 | A |
5179702 | Spix et al. | Jan 1993 | A |
5274813 | Itoh | Dec 1993 | A |
5274821 | Rouquie | Dec 1993 | A |
5297274 | Jackson | Mar 1994 | A |
5301312 | Christopher, Jr. et al. | Apr 1994 | A |
5325499 | Kummer et al. | Jun 1994 | A |
5325533 | McInerney et al. | Jun 1994 | A |
5353401 | Iizawa et al. | Oct 1994 | A |
5390314 | Swanson | Feb 1995 | A |
5438659 | Notess et al. | Aug 1995 | A |
5450542 | Lehman et al. | Sep 1995 | A |
5463775 | DeWitt et al. | Oct 1995 | A |
5485574 | Bolosky et al. | Jan 1996 | A |
5485619 | Lai et al. | Jan 1996 | A |
5497458 | Finch et al. | Mar 1996 | A |
5499349 | Nikhil et al. | Mar 1996 | A |
5500881 | Levin et al. | Mar 1996 | A |
5519866 | Lawrence et al. | May 1996 | A |
5530816 | Holt | Jun 1996 | A |
5535364 | Resman et al. | Jul 1996 | A |
5535393 | Reeve et al. | Jul 1996 | A |
5539907 | Srivastava et al. | Jul 1996 | A |
5553235 | Chen et al. | Sep 1996 | A |
5574922 | James | Nov 1996 | A |
5613063 | Eustace et al. | Mar 1997 | A |
5636374 | Rodgers et al. | Jun 1997 | A |
5640550 | Coker | Jun 1997 | A |
5673387 | Chen et al. | Sep 1997 | A |
5675790 | Walls | Oct 1997 | A |
5675802 | Allen et al. | Oct 1997 | A |
5689712 | Heisch | Nov 1997 | A |
5696937 | White et al. | Dec 1997 | A |
5710727 | Mitchell et al. | Jan 1998 | A |
5724262 | Ghahramani | Mar 1998 | A |
5734822 | Houha et al. | Mar 1998 | A |
5737605 | Cunningham et al. | Apr 1998 | A |
5740431 | Rail | Apr 1998 | A |
5740433 | Carini | Apr 1998 | A |
5742793 | Sturges et al. | Apr 1998 | A |
5745897 | Perkins et al. | Apr 1998 | A |
5748892 | Richardson | May 1998 | A |
5748961 | Hanna et al. | May 1998 | A |
5754820 | Yamagami | May 1998 | A |
5761426 | Ishizaki et al. | Jun 1998 | A |
5774724 | Heisch | Jun 1998 | A |
5784698 | Brady et al. | Jul 1998 | A |
5787480 | Scales et al. | Jul 1998 | A |
5805795 | Whitten | Sep 1998 | A |
5812799 | Zuravleff et al. | Sep 1998 | A |
5835705 | Larsen et al. | Nov 1998 | A |
5850554 | Carver | Dec 1998 | A |
5860024 | Kyle et al. | Jan 1999 | A |
5864867 | Krusche et al. | Jan 1999 | A |
5867649 | Larson | Feb 1999 | A |
5867735 | Zuravleff et al. | Feb 1999 | A |
5872977 | Thompson | Feb 1999 | A |
5890171 | Blumer et al. | Mar 1999 | A |
5905488 | Demers et al. | May 1999 | A |
5905856 | Ottensooser | May 1999 | A |
5913223 | Sheppard et al. | Jun 1999 | A |
5920895 | Perazzoli, Jr. et al. | Jul 1999 | A |
5963975 | Boyle et al. | Oct 1999 | A |
5968114 | Wentka et al. | Oct 1999 | A |
5970510 | Sher et al. | Oct 1999 | A |
5974510 | Cheng et al. | Oct 1999 | A |
5974536 | Richardson | Oct 1999 | A |
5978892 | Noel et al. | Nov 1999 | A |
5987479 | Oliver | Nov 1999 | A |
5991708 | Levine et al. | Nov 1999 | A |
5991893 | Snider | Nov 1999 | A |
6006031 | Andrews et al. | Dec 1999 | A |
6009514 | Henzinger et al. | Dec 1999 | A |
6014517 | Shagam et al. | Jan 2000 | A |
6016474 | Kim et al. | Jan 2000 | A |
6018793 | Rao | Jan 2000 | A |
6023583 | Honda | Feb 2000 | A |
6044438 | Olnowich | Mar 2000 | A |
6049798 | Bishop et al. | Apr 2000 | A |
6049855 | Jeddeloh | Apr 2000 | A |
6052708 | Flynn et al. | Apr 2000 | A |
6052763 | Maruyama | Apr 2000 | A |
6055368 | Kunioka | Apr 2000 | A |
6065019 | Ault et al. | May 2000 | A |
6066181 | DeMaster | May 2000 | A |
6072951 | Donovan et al. | Jun 2000 | A |
6077312 | Bates et al. | Jun 2000 | A |
6081868 | Brooks | Jun 2000 | A |
6085029 | Kolawa et al. | Jul 2000 | A |
6088771 | Steely, Jr. et al. | Jul 2000 | A |
6098169 | Ranganathan | Aug 2000 | A |
6101325 | Flaat | Aug 2000 | A |
6101525 | Hecker | Aug 2000 | A |
6108343 | Cruickshank et al. | Aug 2000 | A |
6119198 | Fromm | Sep 2000 | A |
6125430 | Noel et al. | Sep 2000 | A |
6141692 | Loewenstein et al. | Oct 2000 | A |
6145054 | Mehrotra et al. | Nov 2000 | A |
6167565 | Kanamori | Dec 2000 | A |
6173327 | De Borst et al. | Jan 2001 | B1 |
6173368 | Krueger et al. | Jan 2001 | B1 |
6205537 | Albonesi | Mar 2001 | B1 |
6223134 | Rust et al. | Apr 2001 | B1 |
6249906 | Levine et al. | Jun 2001 | B1 |
6253252 | Schofield | Jun 2001 | B1 |
6263485 | Schofield | Jul 2001 | B1 |
6269457 | Lane | Jul 2001 | B1 |
6282702 | Ungar | Aug 2001 | B1 |
6286130 | Poulsen et al. | Sep 2001 | B1 |
6295600 | Parady | Sep 2001 | B1 |
6304951 | Mealey et al. | Oct 2001 | B1 |
6311320 | Jibbe | Oct 2001 | B1 |
6314429 | Simser | Nov 2001 | B1 |
6317871 | Andrews et al. | Nov 2001 | B1 |
6341338 | Dennie | Jan 2002 | B1 |
6351845 | Hinker et al. | Feb 2002 | B1 |
6353829 | Koblenz et al. | Mar 2002 | B1 |
6353869 | Ofer et al. | Mar 2002 | B1 |
6366994 | Kalyur | Apr 2002 | B1 |
6369725 | Busaba | Apr 2002 | B1 |
6430657 | Mittal et al. | Aug 2002 | B1 |
6434714 | Lewis et al. | Aug 2002 | B1 |
6438745 | Kanamaru et al. | Aug 2002 | B1 |
6442162 | O'Neill et al. | Aug 2002 | B1 |
6473833 | Arimilli et al. | Oct 2002 | B1 |
6480818 | Alverson et al. | Nov 2002 | B1 |
6496902 | Faanes et al. | Dec 2002 | B1 |
6502136 | Higuchi et al. | Dec 2002 | B1 |
6523090 | Tremblay | Feb 2003 | B2 |
6542919 | Wendorf et al. | Apr 2003 | B1 |
6574725 | Kranich et al. | Jun 2003 | B1 |
6629214 | Arimilli et al. | Sep 2003 | B1 |
6647546 | Hinker et al. | Nov 2003 | B1 |
6684296 | Hayter et al. | Jan 2004 | B2 |
20010003831 | Boland | Jun 2001 | A1 |
20010051974 | Saad | Dec 2001 | A1 |
20020046201 | Hambry | Apr 2002 | A1 |
20020073360 | Lewis et al. | Jun 2002 | A1 |
20020078010 | Ehrman et al. | Jun 2002 | A1 |
20030061395 | Kingsbury et al. | Mar 2003 | A1 |
Number | Date | Country |
---|---|---|
199 34 515 | Jan 2000 | DE |
0390339 | Jan 1990 | EP |
0 390 339 | Oct 1990 | EP |
0 703 534 | Mar 1996 | EP |
0 817 044 | Jan 1998 | EP |
0 965 921 | Dec 1999 | EP |
1 024 432 | Aug 2000 | EP |
1 026 592 | Aug 2000 | EP |
1 081 585 | Mar 2001 | EP |
2 793 908 | Nov 2000 | FR |
2 324 942 | Nov 1998 | GB |
2 343 029 | Apr 2000 | GB |
2 357 873 | Jul 2001 | GB |
03-282731 | Dec 1991 | JP |
07-056716 | Mar 1995 | JP |
WO 9910812 | Mar 1999 | WO |