As computerized systems have increased in popularity, so has the complexity of the software and hardware employed within such systems. In general, there are a number of reasons that drive software and hardware changes for computerized systems. For example, as hardware capabilities improve, software often needs to change to accommodate new hardware requirements. Similarly, as software becomes more demanding, a similar effect occurs that can push hardware capabilities forward. In addition to these reciprocal pressures, end-users continue to demand that software and hardware add convenience by improving automation of certain tasks or features, or by adding automation where none previously existed.
For at least these reasons, software is continually being developed. In some cases, new software programs are written entirely from scratch, while in other cases, some software programs continue through ongoing, incremental updates. Developing software, however, is not a simple matter. In particular, software development typically involves not only the creation of executable code, but also extensive testing techniques to ensure that the executable code works properly. In this regard, there are a variety of metrics and considerations that can be used to gauge whether a program works as intended, or in accordance with certain hardware and software expectations.
One such consideration is the basic level of input/output operation, where an executable computer program simply provides certain expected outputs in response to certain inputs. For example, a tester might want to determine if a particular user interface of an application program displays certain data or results in response to certain provided inputs. Other considerations in software testing can include how well a given application program allocates or uses resources during execution of certain functions. That is, beyond whether an application program actually performs a particular function, a tester might be interested in whether the application program is well-written, in that it does not tax a computer's resources any more than it needs to during certain executions.
One example of this is the consideration referred to herein as “memory locality,” which is also sometimes referred to as “locality of reference,” or more simply as “locality.” In general, locality refers at least partly to the notion that more frequently accessed data items should be fetched into cache memory. Cache memory, in turn, tends to be much faster than main memory, which is where the application program and data are otherwise loaded during execution. Along these lines, locality also refers at least partly to the notion that sequentially-accessed data items should be brought into cache memory together, or at least in sequence, since it is faster to read items already in cache than to continually pull them in from outside of cache. Thus, an application program can often be optimized for speed by improving the locality (i.e., sequential accessibility) of its data items (i.e., functions, information, etc.).
In general, one way that application programs can improve efficiency through optimized locality is by arranging data items in main memory so that items needed in sequence are stored together and are thus cached together. This is particularly the case since memory items are typically pulled into cache memory in “chunks” (ranges of addresses). That is, data items in neighboring memory addresses are pulled into cache memory at the same time as the targeted data items. Well-written application programs can thus be configured to ensure that frequently and sequentially accessed data items are pulled into cache memory together, so that they can be accessed from cache memory without interruption.
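By way of illustration only, and not by way of limitation, the following C++ sketch contrasts a traversal with good locality against a traversal of the same data with poor locality; the array dimensions, element type, and cache behavior described in the comments are assumptions chosen solely for this example.

#include <cstddef>

// Illustrative only: a 2D array stored in row-major order, so the elements of
// a row occupy neighboring addresses and arrive in cache in the same chunk.
constexpr std::size_t kRows = 1024;
constexpr std::size_t kCols = 1024;
static double table[kRows][kCols];

// Good locality: consecutive accesses touch neighboring addresses that are
// typically already resident in cache from the same chunk.
double sumRowMajor() {
    double sum = 0.0;
    for (std::size_t r = 0; r < kRows; ++r)
        for (std::size_t c = 0; c < kCols; ++c)
            sum += table[r][c];
    return sum;
}

// Poor locality: each access jumps kCols * sizeof(double) bytes, so nearly
// every access lands in a different chunk and tends to force a new cache fill.
double sumColumnMajor() {
    double sum = 0.0;
    for (std::size_t c = 0; c < kCols; ++c)
        for (std::size_t r = 0; r < kRows; ++r)
            sum += table[r][c];
    return sum;
}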
One can appreciate, therefore, that arranging data references in memory poorly can result in less-efficient execution. This occurs when sequentially-accessed data items are not arranged near or next to each other, and/or are otherwise pulled into cache memory in different, non-sequential chunks. Specifically, poor memory reference locality leads to more costly paging and caching behavior (i.e., more page faults and more cache misses). Accordingly, developers will often endeavor to optimize locality considerations when writing or developing code.
Unfortunately, current hardware specifications and developments have made such memory locality optimizations increasingly difficult. For example, the memory system behavior of present software is typically determined by a complex set of factors that include code size, the program data structures used, the mapping of code and data to memory addresses, how those memory addresses are accessed, and architectural considerations, such as cache and memory configuration. Current tools generally do not make it easy for an average programmer to understand whether his/her software has a reference locality problem, or to identify problem areas in the code or data structures used. Consequently, programmers have very little idea of a program's memory system behavior, and often write programs with poor memory reference locality.
It is not surprising, therefore, that a program's memory system performance is often the main determinant of its overall performance, particularly in light of the large performance gap between processor speeds and memory and disk access times.
Implementations of the present invention provide systems, methods, and computer program products configured to visibly represent an application program's memory footprint (i.e., memory locality, or reference locality). In at least one implementation, for example, a memory footprint user interface is configured to receive one or more memory address traces of an application program. The address traces include data regarding minimum and maximum memory addresses that are being accessed during execution of the application program. The memory footprint user interface can then provide a number of visible indicia for the given trace, where the indicia show memory accesses during application program execution. The memory footprint user interface can be adjusted with a number of different configurations and/or filters to display the memory access patterns, and to show the underlying code (or other information) for a particular memory access.
For example, a method of visually representing a memory footprint of an application program can involve identifying a time interval during which the application program executes a plurality of memory accesses. The method can also involve creating one or more memory address traces for the application program during the identified time interval. In addition, the method can involve generating pixel information corresponding to the memory accesses of the one or more memory address traces, where the memory accesses can be displayed in accordance with the identified time interval. Furthermore, the method can involve visibly displaying in a display window the pixel information in accordance with one or more filtration selections by the user, wherein the display of pixels indicates a memory access footprint for the application program during the selected time interval.
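By way of illustration only, the following C++ sketch outlines how such a method might map traced memory accesses to pixel positions subject to a filtration selection; the record fields, function names, and assumed 8-byte word size are assumptions made for this sketch rather than features of any particular implementation.

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one traced memory access.
enum class AccessKind { Read, Write, Instruction };

struct TraceRecord {
    std::uint64_t address;    // memory address touched
    std::uint64_t timestamp;  // when the access occurred within the interval
    AccessKind kind;          // read, write, or instruction (PC) access
    std::uint32_t threadId;   // thread that issued the access
};

// Sketch of the described method: map each traced access that passes the
// user's filtration selection to a pixel index within the display window.
std::vector<std::size_t> buildPixels(const std::vector<TraceRecord>& trace,
                                     std::uint64_t minAddr,
                                     std::uint64_t wordsPerPixel,
                                     AccessKind filter) {
    std::vector<std::size_t> pixels;
    for (const TraceRecord& rec : trace) {
        if (rec.kind != filter) continue;                   // filtration selection
        std::uint64_t word = (rec.address - minAddr) / 8;   // assumed word size
        pixels.push_back(static_cast<std::size_t>(word / wordsPerPixel));
    }
    return pixels;
}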
In addition, a user interface configured to visually represent a memory footprint of an application program can include a display window for viewing a memory footprint over a selected time interval. The user interface can also include a set of one or more memory controls configured to adjust the number of memory words, the cache line range, the page size, or the number of disk blocks displayed per sample. In addition, the user interface can include a set of one or more playback controls configured to display a plurality of different application program execution samples during the selected time interval. Furthermore, the user interface can include a plurality of selectable option controls configured to filter display of the application program execution during the selected time interval.
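By way of illustration only, the user interface elements just described might be grouped as in the following C++ sketch; the structure and field names are assumptions made solely for purposes of explanation.

#include <cstdint>

// Illustrative grouping of the described user interface elements.
struct MemoryControls {
    std::uint32_t wordsPerPixel;   // number of memory words per pixel/sample
    std::uint32_t cacheLineBytes;  // cache line range
    std::uint32_t pageBytes;       // page size
    std::uint32_t diskBlocks;      // number of disk blocks displayed per sample
};

struct PlaybackControls {
    bool playing;                  // animate successive execution samples
    std::uint64_t currentSample;   // position along the selected time interval
    double samplesPerSecond;       // playback speed
};

enum class FilterOption { ReadWrite, Frequency, Threads };

struct FootprintUserInterface {
    MemoryControls memory;         // memory controls
    PlaybackControls playback;     // playback controls
    FilterOption filter;           // selectable option controls
};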
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.
In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Implementations of the present invention extend to systems, methods, and computer program products configured to visibly represent an application program's memory footprint (i.e., memory locality, or reference locality). In at least one implementation, for example, a memory footprint user interface is configured to receive one or more memory address traces of an application program. The address traces include data regarding minimum and maximum memory addresses that are being accessed during execution of the application program. The memory footprint user interface can then provide a number of visible indicia for the given trace, where the indicia show memory accesses during application program execution. The memory footprint user interface can be adjusted with a number of different configurations and/or filters to display the memory access patterns, and to show the underlying code (or other information) for a particular memory access.
Accordingly, and as will be understood more fully from the following specification and claims, implementations of the present invention can provide a wide range of advantages for optimizing memory accesses by an application program. In at least one implementation, this can be done via one or more tools that provide effective memory usage visualizations. In particular, the one or more tools can be configured to animate memory access and instruction address trace information over a time interval of application program execution.
These visualizations and/or animations, in turn, can allow a programmer to quickly learn the total memory footprint of a program. These visualizations and/or animations can indicate to a developer/programmer how memory is being (or not being) reused, how the size of the dynamic working set changes over time, and data access patterns (e.g., linear scans, sequential strides, cyclic patterns, etc.). As a result, a developer/programmer can more easily identify problem areas in poorly behaving code and data structures.
In general, the tool in accordance with at least one implementation of the present invention has essentially three modes of operation. At least one mode of operation emphasizes reuse of memory, while another emphasizes the working set over time, and yet another emphasizes the total memory footprint of the application program. In each case, and as discussed more fully herein, a region of display space is used to represent the total range of memory addresses touched by the given application program. In particular, each memory access highlights a region of the display, whereby each pixel represents some number of words of memory. In one implementation, the mapping of pixel(s) to word(s) of memory depends on the size of the display space (the window size), the range of memory addresses accessed during the trace, a user-specified zoom factor (the user can zoom in for more detail), and an optional user-specified block size (e.g., the user could ask to view memory in cache-line, page, or disk-block units).
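By way of illustration only, the following C++ sketch shows one possible way to derive the number of memory words represented by each pixel from the window size, the traced address range, a zoom factor, and an optional block size; the assumed word size and the rounding choices are assumptions made for this sketch.

#include <cstdint>

// Illustrative only: compute how many memory words each pixel represents.
std::uint64_t wordsPerPixel(std::uint64_t minAddr,
                            std::uint64_t maxAddr,
                            std::uint64_t windowPixels,
                            double zoomFactor,          // > 1.0 zooms in for more detail
                            std::uint64_t blockWords) { // 1 if no block size requested
    const std::uint64_t bytesPerWord = 8;               // assumed word size
    std::uint64_t totalWords = (maxAddr - minAddr) / bytesPerWord + 1;

    // Spread the whole traced address range across the window, then apply zoom.
    std::uint64_t words = totalWords / windowPixels;
    words = static_cast<std::uint64_t>(words / zoomFactor);
    if (words == 0) words = 1;

    // If the user asked to view memory in cache-line, page, or disk-block
    // units, round up to a whole number of blocks per pixel.
    if (blockWords > 1)
        words = ((words + blockWords - 1) / blockWords) * blockWords;
    return words;
}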
As described or otherwise illustrated more fully herein, when an address is referenced, the corresponding pixel(s) can be color coded based on one of several alternative schemes. One scheme might use, for example, green for memory accesses by application instructions, blue for data read operations, and red for data write operations. Another scheme could encode address reference frequency into the color, so that warmer colors (e.g., red) indicate the most frequently accessed parts of memory. Yet another scheme could encode information about cache misses or paging faults into the color. Still another scheme could encode thread identifiers into a given color, which can be particularly valuable for multi-threaded programs.
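By way of illustration only, the following C++ sketch shows how two of the alternative color schemes described above might be encoded; the 0xRRGGBB packing and the simple red/blue frequency ramp are assumptions made for this sketch.

#include <cstdint>

enum class AccessKind { Read, Write, Instruction };

// Scheme 1 (illustrative): green for instruction accesses, blue for reads,
// and red for writes, packed as 0xRRGGBB.
std::uint32_t colorForAccess(AccessKind kind) {
    switch (kind) {
        case AccessKind::Instruction: return 0x00FF00;  // green
        case AccessKind::Read:        return 0x0000FF;  // blue
        case AccessKind::Write:       return 0xFF0000;  // red
    }
    return 0x000000;
}

// Scheme 2 (illustrative): encode reference frequency so that warmer colors
// mark the most frequently accessed parts of memory (red rises, blue falls).
std::uint32_t colorForFrequency(std::uint32_t count, std::uint32_t maxCount) {
    std::uint64_t heat =
        maxCount ? (static_cast<std::uint64_t>(count) * 255u) / maxCount : 0u;
    return static_cast<std::uint32_t>((heat << 16) | (255u - heat));
}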
Referring now to the Figures,
Any time a particular data reference is processed, the chunk of memory locations corresponding to that data reference is cached into one or more of the cache 109 locations (e.g., L1, L2, etc.). Since it is faster to process data access requests from cache 109 than from the rest of the regular memory 107, it is preferable to cache (into cache 109) as many as possible of the memory references that are going to be used, rather than caching them (into cache 109) on an as-needed basis. Accordingly, and to determine the efficiency with which these various data access requests are made, application program 105 is processed through trace generator 110.
In general, trace generator 110 is used to instrument the application program 105 over a particular time interval. In one implementation, a user that desires to instrument application program 105 might select an interval of five seconds, ten seconds, a few minutes, or even an hour or so, which represents some time during which the application program 105 executes one or more memory access requests in memory 107, whether a read, a write, a PC (program counter) reference, etc. Trace generator 110 then identifies the specific memory address (and/or address range) of each of the different memory accesses, as well as any additional information during the time interval, such as the name of the function associated with the data access, the underlying program code, when the data access occurred, and so on. In one implementation, the trace can also identify whether the location of the memory reference resulted in a “cache miss,” whereby a function executed from one cache 109 block resulted in a next function having to be pulled from main memory, rather than from the same or an adjacent cache 109 block.
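By way of illustration only, the following C++ sketch shows one simplified way a trace generator might flag a likely cache miss by comparing the cache-line-sized chunks of successive references; the fixed line size is an assumption, and actual cache behavior depends on the full cache configuration.

#include <cstdint>

// Illustrative only: treat a reference as a likely cache miss when it falls
// outside the cache-line-sized chunk of the immediately preceding reference.
constexpr std::uint64_t kLineBytes = 64;  // assumed cache line size

bool likelyCacheMiss(std::uint64_t prevAddr, std::uint64_t currAddr) {
    return (prevAddr / kLineBytes) != (currAddr / kLineBytes);
}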
In general, the user will view the trace data access points through a display window 200 using a variety of different usage controls (e.g., 135). As shown again with respect to
For example, selecting a read/write option 230 provides all the data access points that correspond to reads, writes, and/or PC. By contrast, selecting the frequency option 235 displays all the different memory accesses with various color codes, depending on how frequently those memory locations are accessed or referenced by the application program 105. Similarly, the threads option 240 can represent the memory accesses in a manner that is color-coded by thread, so that a user can view how the data access patterns might differ from one thread to another within the same application program. For example, an application program 105 comprising multiple threads could show the memory accesses in display window 200 as sets of blue colors for one thread and sets of red colors for another.
Similar filtering options are available for selecting memory footprint 205, memory reuse 250, and working set 255. For example, selecting memory footprint, which is a default, simply shows all memory accesses by all components of an application program as they occur. By contrast, selecting memory reuse 250 shows an animated playback of the memory references, fading each reference out relatively quickly so as to show a dynamic temporal view of how memory is being used. Selecting working set 255 shows a similar animated playback of memory references, but fades each reference out relatively slowly so as to show how the working set is changing over time.
Along these lines,
Thus,
Of course, these particular arrangements of memory accesses 263 may be different from one selected illustration to the next, and may mean different things depending on the selected trace. As previously mentioned, for example, the various aggregations of memory accesses 263 along specific lines can alternatively refer to memory accesses 263 corresponding to different application program 105 threads. In any event, the memory accesses of
For example,
In any event,
For example, each sample (represented by each horizontal position of interval selector 217) may be based on a sampling rate of several hundreds, thousands, or even millions of the total access requests during the entire interval. One will appreciate, therefore, that the interval selector 217 can be configured to move as slowly or as quickly along the time interval path as desired, based in part on the particular sampling rate. If a user selected a sampling rate of 100, the interval selector 217 might move quite slowly along the time interval path, while a sampling rate of 10,000 would move the interval selector 217 faster by a factor of 100.
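By way of illustration only, the following C++ sketch captures the arithmetic relationship described above between the sampling rate and how quickly the interval selector 217 traverses the time interval path; the fixed playback speed is an assumption made for this sketch.

#include <cstdint>

// Illustrative only: if each sample aggregates samplingRate access requests
// and samples are played back at a fixed rate, then a larger sampling rate
// yields fewer samples and a proportionally faster traversal of the interval.
double playbackSeconds(std::uint64_t totalAccesses,
                       std::uint64_t samplingRate,     // accesses per sample (nonzero)
                       double samplesPerSecond) {      // fixed playback speed
    std::uint64_t samples = (totalAccesses + samplingRate - 1) / samplingRate;
    return samples / samplesPerSecond;
}

// For example, 1,000,000 accesses at a sampling rate of 100 yields 10,000
// samples, while a rate of 10,000 yields only 100 samples -- 100 times fewer,
// so the interval selector advances 100 times faster at the same playback speed.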
In either case, one will appreciate that a user may desire to adjust the sampling rate (e.g.,
In one implementation, the user can adjust what is displayed (as well as the resolution of what is displayed) using the set of memory controls 210. For example,
For example,
In any event, one will appreciate that the density of memory accesses 263 on display window 200 will increase as interval selector 217 moves along the interval selector path. For example,
As previously mentioned in
In this example, the user identifies corresponding data 270a for this specific memory access request 263a, which indicates the address of the last PC, the name of the method referencing this address (“Allocation Req::Allocate Pages”), and the application program for this request (“sqlserver.exe”). As a preliminary matter, reference herein to any components, functions, or modules that are specific (or appear to be specific) to a MICROSOFT operating environment is made primarily as a matter of convenience in explanation. In particular, one will appreciate that implementations of the present invention can be applied to a wide range of operating environments and operating systems. Accordingly, reference herein to any specific component or module should not be construed as limiting to any particular operating environment or operating system.
In any event, and with further respect to
For example,
Accordingly, one will appreciate that a user can zoom further in or out of the display window 200 to find memory footprint data at virtually any granularity. In at least one implementation, this ability to zoom inward and outward using zoom control 213 can be used regardless of the type of memory control 210 option selected. In particular, changing the zoom by changing the number of words per pixel can be used with the default settings, as described above, as well as when “cache line,” “page size,” or “disk block” is selected.
Ultimately, one will appreciate that these and other controls of user interface 120 are specifically designed to enable a user to adjust the operation of a given application program. As previously mentioned, for example, a user can review the memory footprint and focus primarily on memory accesses 263 that fall significantly outside of the application instruction grouping 264, the memory stack grouping 265, or the memory heap grouping 275. In some cases, the user may even focus more narrowly on clusters of memory accesses 263 that fall outside of one of the expected groupings 264, 265, and 275. The intent of these adjustments would generally be to modify the corresponding code so that the memory accesses 263 are located near other memory accesses of the same grouping, and are thus more likely to be pulled into cache at the same time as the other memory accesses.
Along these lines,
As previously mentioned,
In one implementation, the fade controls 305 are used with the “memory reuse” control 250 and the “working set” control 255. In this example, each block of pixels can be configured to fade to white from its designated color over a time interval. In general, the time interval can be set by the user, although two particular time intervals can work well in at least some implementations. For example, if the fade-out occurs over about one (1) second, the resulting view is a good indication of working set size over time. If the fade-out is faster (about one-third (⅓) second), the resulting view shows more immediate memory reuse over time.
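By way of illustration only, the following C++ sketch shows one way the fade toward white might be computed for each color channel of a displayed reference, using the approximately one-second and one-third-second fade intervals mentioned above; the linear fade is an assumption made for this sketch.

#include <algorithm>
#include <cstdint>

// Illustrative only: fade a 0-255 color channel linearly toward white over a
// user-set interval, e.g., about 1.0 s to emphasize the working set or about
// 0.33 s to emphasize immediate memory reuse.
std::uint32_t fadedChannel(std::uint32_t channel,        // 0-255 color channel
                           double secondsSinceAccess,
                           double fadeSeconds) {          // e.g., 1.0 or 0.33
    double t = std::min(secondsSinceAccess / fadeSeconds, 1.0);  // 0 = fresh, 1 = fully faded
    return static_cast<std::uint32_t>(channel + t * (255.0 - channel));  // toward white
}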
In addition,
Similarly, the user can select one or more colors 315 corresponding to the frequency option 235. In this case, the animated playback through display window 200 would render the pixels based on the colors 315 chosen in interface 261, so that a user could easily distinguish frequent memory accesses from less frequent memory accesses. This could be yet another way in which a user can narrow in on problem areas, such as by focusing on the memory accesses that are both out of the generalized patterns, and also more frequent.
Furthermore,
In addition to the foregoing,
Similarly,
Accordingly,
For example,
In addition,
Furthermore,
Accordingly,
The embodiments of the present invention may comprise a special purpose or general-purpose computer including various computer hardware, as discussed in greater detail below. Embodiments within the scope of the present invention also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer.
By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media.
Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.