With the advent of multi-core processor technology, parallel programming has become ubiquitous. However, because parallel programs are non-deterministic, multiple executions of the same parallel program with identical input can produce different outcomes.
Memory race recording (MRR) techniques enable the execution of multi-threaded programs to be recorded, thereby logging the order in which memory accesses interleave. The recordings can be replayed for debugging purposes. When replayed, the recordings produce the same results as those obtained by the original execution. Whereas point-to-point MRR techniques track memory access interleavings at the level of individual shared memory instructions, chunk-based techniques track memory access interleavings by observing the number of memory operations that execute atomically (i.e., without interleaving with a conflicting remote memory access).
The concepts described herein are illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. Where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.
While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.
References in the specification to “one embodiment,” “an embodiment,” “an illustrative embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on a transitory or non-transitory machine-readable (e.g., computer-readable) storage medium, which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device).
In the drawings, some structural or method features may be shown in specific arrangements and/or orderings. However, it should be appreciated that such specific arrangements and/or orderings may not be required. Rather, in some embodiments, such features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or method feature in a particular figure is not meant to imply that such feature is required in all embodiments and, in some embodiments, may not be included or may be combined with other features.
Referring now to
The computing device 100 may be embodied as any type of computing device for displaying animated graphical information to a viewer and performing the functions described herein. Although one computing device is shown in
The processor 110 may be embodied as any type of processor currently known or developed in the future and capable of performing the functions described herein. For example, the processor 110 may be embodied as a single or multi-core processor(s), digital signal processor, microcontroller, or other processor or processing/controlling circuit. Similarly, the memory 112 may be embodied as any type of volatile or non-volatile memory or data storage currently known or developed in the future and capable of performing the functions described herein. In operation, the memory 112 may store various data and software used during operation of the system 124 such as operating systems, applications, programs, libraries, and drivers. The memory 112 is communicatively coupled to the processor 110 via the I/O subsystem 114, which may be embodied as circuitry and/or components to facilitate input/output operations with the processor 110, the memory 112, and other components of the computing device 100. For example, the I/O subsystem 114 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, firmware devices, communication links (e.g., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.) and/or other components and subsystems to facilitate the input/output operations. In some embodiments, the I/O subsystem 114 may form a portion of a system-on-a-chip (SoC) and be incorporated, along with the processor 110, the memory 112, and other components of the computing device 100, on a single integrated circuit chip.
The data storage 116 may be embodied as any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid-state drives, or other data storage devices. In the illustrative embodiment, the visualization system 124 and/or the memory race recorder 118 may maintain program execution data 128, including the MRR log files 130, the instruction traces 132, the graphical representation 134, portions thereof and/or other information, in the data storage 116. As discussed in more detail below, the log files 130 and instruction traces 132 may be used to create the graphical representation 134. Portions of the program execution data 128 may be embodied as any type of digital data capable of display on the display 120. For example, portions of the program execution data 128 may be embodied as binary code, machine- or assembly-level code, text, graphics, and/or other types of content. Portions of the program execution data 128 may be stored in digital files, arrays, databases, tables, and/or other suitable data structures.
The memory race recorder 118 may be embodied as any suitable type of system for recording the execution of a multi-threaded software program in a chunk-based fashion. For example, the memory race recorder 118 may be embodied as a hardware or software system, such as a hardware system implemented in the architecture of the processor 110. The memory race recorder 118 records the execution of the multi-threaded software program 126 for later deterministic replay. The memory race recorder 118 is configured so that when the recorded execution is replayed, it is reproduced in the same way as it was recorded during the original execution. To do this, the memory race recorder 118 records the memory access interleavings across the threads so that during replay, those threads can be re-synchronized in the same way as in the original execution. In other words, the memory race recorder 118 logs the order in which the memory accesses interleave.
As noted above, the memory race recorder 118 uses a chunk-based approach to track memory access interleavings by observing the number of memory operations that can execute without the intervention of a conflicting shared memory dependency. A “chunk” represents a block of instructions that execute in isolation; that is, without any interleavings with conflicting memory accesses from another thread. In other words, a chunk captures shared memory accesses that occur between adjacent cache coherence requests that cause a conflict between multiple threads. Shared memory refers to memory (e.g., random access memory or RAM) that can be accessed by different processors or processor cores, e.g., in a multiple-core processor. A shared memory system often involves the use of cache memory. Cache coherence refers to the need to update the cache memory used by all processors or processor cores whenever one of the caches is updated with information that may be used by other processors or cores. Thus, a “conflict” or “dependency” can occur if, for example, a processor needs access to information stored in shared memory but must wait for its cache to be updated with data written to the shared memory by another processor. Further discussion of chunk-based memory race recording can be found in, for example, Pokam et al., Architecting a Chunk-based Memory Race Recorder in Modern CMPs, presented at MICRO '09, Association for Computing Machinery (ACM), Dec. 12-16, 2009.
The display 120 of the computing device 100 may be embodied as any one or more display screens on which information may be displayed to the viewer. The display may be embodied as, or otherwise use, any suitable display technology including, for example, an interactive display (e.g., a touch screen), a liquid crystal display (LCD), a light emitting diode (LED) display, a cathode ray tube (CRT) display, a plasma display, and/or other display technology currently known or developed in the future. Although only a single display 120 is illustrated in
The user controls 122 may be embodied as any one or more physical or virtual controls that can be activated by the viewer to, for example, adjust the display of the graphical representation 134. The user controls 122 may be embodied as any suitable user control technology currently known or developed in the future, including, for example, physical or virtual (e.g., touch screen) keys, keyboard or keypad, a mouse, physical or virtual buttons, switches, slides, dials and the like, as well as non-tactile controls such as voice or gesture-activated controls.
The software program 126 may be embodied as any type of multi-threaded or “parallel” machine-executable software program whose execution can be recorded by the memory race recorder 118. The term “multi-threaded” refers, generally, to a software program that is implemented using a programming technique that allows multiple threads to execute independently, e.g., on different processors or cores, where a “thread” refers to a small sequence of programming instructions and the different threads can access shared memory, regardless of the type of synchronization (e.g., locks, transactional memory, or some other synchronization technique) that is used. For example, the visualization system 124 can visualize shared memory dependency conflicts and/or synchronization contentions, depending on the type of synchronization that is used. An example of a system for visualizing transactional memory is described in Gottschlich, et al., Visualizing Transactional Memory, presented at PACT '12, Association for Computing Machinery (ACM), Sep. 19-23, 2012.
Referring now to
The instruction traces 132 are used as input to the dynamic replay module 212. The dynamic replay module 212 interfaces with the display 120 and the user controls 122 to create and interactively present the animated graphical representation 134 to the viewer. Referring now to
The real-time controller 310 controls the animated display of the graphical representation 134 based on its associated visualization parameters 340. The visualization parameters 340 may include playback direction, rate, magnification, and/or orientation, for example. Rather than presenting all of the program execution data at once, the real-time controller 310 allows the recorded execution to be “played back” in “real time,” that is, at the speed or rate of the original execution. Additionally, the real-time controller 310 can adjust the direction (e.g., forward or backward), magnification, orientation (e.g., rotation), and/or rate or speed at which it replays the original program execution. This allows the viewer, for example, to observe events as they unfold, to slow down the playback to pay closer attention to areas of interest, or to speed up the playback to skip over irrelevant or less important areas. To support these adjustments, the real-time controller 310 interfaces with the user input controller 316 to process the viewer's requests for changes in the presentation of the animated graphical representation 134.
The real-time controller 310 interfaces with the instruction simulation module 312, to control the display of text corresponding to the instructions executed during the recorded execution, and with the graphical modeler 314, to control the display of the graphical representation 134, in response to input received by the user input controller 316. The user input controller 316 detects activation or deactivation of the user controls 122 and translates those user actions into instructions that can be executed by the real-time controller 310 and the graphical modeler 314, as needed. For instance, if the user input controller 316 detects that the viewer has tapped a “+” graphical control on the display 120, the user input controller 316 may instruct the real-time controller 310 to increase the speed of the playback. Likewise, if the user input controller 316 detects that the user has tapped a magnifying glass icon or made a certain gesture (e.g., moving thumb and forefinger away from each other), the user input controller 316 may instruct the graphical modeler 314 to increase the magnification of the graphical representation 134.
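The interplay between the user input controller 316 and the real-time controller 310 described above can be sketched as follows. This is a minimal illustration, assuming hypothetical event names, action names, and step sizes (doubling or halving the rate, 1.25x zoom steps, 90-degree rotations); none of these values come from the description.

```python
from dataclasses import dataclass


@dataclass
class VisualizationParameters:
    direction: int = 1            # +1 replays forward, -1 replays backward
    rate: float = 1.0             # 1.0 = speed of the original execution
    magnification: float = 1.0
    rotation_deg: float = 0.0


class RealTimeController:
    """Applies viewer requests to the visualization parameters."""

    def __init__(self, params=None):
        self.params = params if params is not None else VisualizationParameters()

    def handle(self, action):
        p = self.params
        if action == "speed_up":
            p.rate *= 2.0
        elif action == "slow_down":
            p.rate /= 2.0
        elif action == "reverse":
            p.direction = -p.direction
        elif action == "zoom_in":
            p.magnification *= 1.25
        elif action == "zoom_out":
            p.magnification /= 1.25
        elif action == "rotate":
            p.rotation_deg = (p.rotation_deg + 90.0) % 360.0
        else:
            raise ValueError(f"unknown action: {action}")
        return p


# Role of the user input controller: translate raw viewer events into actions.
EVENT_TO_ACTION = {
    "tap_plus": "speed_up",
    "tap_minus": "slow_down",
    "tap_magnifier": "zoom_in",
    "pinch_out": "zoom_in",     # thumb and forefinger moving apart
    "pinch_in": "zoom_out",
}


def on_user_event(controller, event):
    """Forward a detected viewer event to the controller as an action."""
    return controller.handle(EVENT_TO_ACTION[event])
```

Keeping the event table separate from the controller mirrors the split between detecting user actions and acting on them, so new gestures can be mapped without touching the playback logic.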
The graphical modeler 314 may be embodied as an animation logic module 320 and a graphics rendering module 322. The animation logic module 320 controls the rate at which the visual features of the graphical representation 134 are presented (e.g., the refresh rate), to provide the animation of the graphical representation 134. For example, in some embodiments, the refresh rate may be about 50 frames per second, or another suitable rate, to present the graphical representation 134 in a manner that simulates the original execution in real time. The graphics rendering module 322 initially develops the graphical representation 134 based on the textual information provided by the instruction traces 132, and displays the graphical representation 134 according to the visualization parameters 340, as they may be adjusted or updated from time to time by the user input controller 316. The graphics rendering module 322 may apply, e.g., polygon rendering techniques and/or other suitable techniques to display the graphical representation 134 on the display 120.
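One way the animation logic could tie the refresh rate to real-time replay is sketched below, assuming (hypothetically) that simulated execution times are expressed in milliseconds and that each frame advances a simulated clock by `rate * 1000 / fps` milliseconds. The function name and parameters are illustrative, not taken from the description.

```python
from bisect import bisect_right
from itertools import accumulate


def instructions_shown(exec_times_ms, frame, fps=50.0, rate=1.0):
    """Count the instructions whose simulated execution has finished by `frame`.

    exec_times_ms: non-negative simulated execution time of each instruction,
    in milliseconds, in replay order. Each frame advances the simulated clock
    by rate * (1000 / fps) milliseconds, so rate=1.0 tracks the original
    execution in real time.
    """
    clock_ms = frame * rate * 1000.0 / fps
    finish_ms = list(accumulate(exec_times_ms))   # completion time of each instruction
    return bisect_right(finish_ms, clock_ms)
```

At 50 frames per second each frame covers 20 ms of simulated time, so for instruction times `[10, 20, 10, 20]` frames 1, 2, and 3 reveal 1, 3, and 4 instructions; doubling `rate` reveals 3 instructions already at frame 1.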
The graphical representation 134 of the original, recorded execution of the multi-threaded software program 126 is stored in a data structure such as an array, container, table, hash, or combination or plurality thereof. The graphical representation 134 includes data relating to the threads 330 executed during the original execution, the chunks 332 executed by each of the threads 330 and the order in which they were executed, the machine-executable instructions 334 associated with each of the chunks 332, the execution times 336 associated with each of the instructions 334 (which may be absolute or relative values), the visual features 338 associated with each of the threads 330, chunks 332, and instructions 334, and the visualization parameters 340 associated with the graphical representation 134. The visual features 338 may include, for example, different colors associated with the different threads 330. The visual features 338 may also include, for example, graphics, such as shapes, which are associated with each chunk 332. For instance, for a given chunk 332, a visual feature 338 may be defined by the number of instructions 334 in the chunk 332 and/or the total execution time for all of the instructions 334 in the chunk 332. In the illustrative visualizations of
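A minimal sketch of how the graphical representation 134 might be organized in memory follows. The class names, the color palette, and the choice of mapping instruction count to the shape's width are assumptions for illustration; the description only states that a visual feature may depend on the number of instructions and/or the total execution time of a chunk, and that colors distinguish threads.

```python
from dataclasses import dataclass, field

# Hypothetical palette; any set of distinguishable colors would do.
THREAD_COLORS = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


@dataclass
class Instruction:
    mnemonic: str
    exec_time: float            # simulated execution time (absolute or relative)


@dataclass
class Chunk:
    thread_id: int
    instructions: list = field(default_factory=list)

    def visual_feature(self):
        """Describe the shape drawn for this chunk."""
        return {
            # Every chunk of a thread gets that thread's color.
            "color": THREAD_COLORS[self.thread_id % len(THREAD_COLORS)],
            # Length follows the chunk's total execution time.
            "length": sum(i.exec_time for i in self.instructions),
            # Illustrative choice: width follows the instruction count.
            "width": len(self.instructions),
        }


@dataclass
class GraphicalRepresentation:
    chunks_by_thread: dict = field(default_factory=dict)  # thread id -> [Chunk] in order
    params: dict = field(default_factory=lambda: {"rate": 1.0, "magnification": 1.0})

    def add_chunk(self, chunk):
        """Append a chunk to its thread, preserving execution order."""
        self.chunks_by_thread.setdefault(chunk.thread_id, []).append(chunk)
```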
Referring now to
Referring now to
At block 612, the computing device 100 reads the next instruction from the instruction trace 132. The instruction line read at block 612 includes the information about the instruction that the visualization system 124 needs to create the textual and graphical simulations of the instruction, e.g., instruction type, mnemonic string, memory operations and arguments. If the computing device 100 has read the last instruction in the instruction trace 132 (block 614), then at block 616, the computing device 100 adds the information for the last chunk (of which the last instruction is a part) to an active threads array. The active threads array stores the chunk-based information needed for the visualization of the program execution. If the computing device 100 has not reached the end of the file, then at block 618, the computing device 100 checks to see if the currently read instruction line is associated with the currently active thread or a new thread. To do so, the computing device 100 may compare the value of the active thread tracker to the value of the current thread tracker. If the instruction line currently being read is associated with a new thread, then at blocks 620 and 622, the computing device 100 adds the current chunk (e.g., the chunk to which the previously read instruction belongs) to the active threads array, dynamically resizes the threads container as needed for the new thread, initializes the container for the new thread and updates the active thread tracker to indicate that the new thread is now the active thread. The threads container is a data store that holds the data for all of the executed threads. Dynamic resizing of the threads container allows the computing device 100 to handle any number of threads of various sizes, without knowing that information in advance. 
In other words, in some embodiments, the computing device 100 parses the instruction traces 132 without knowing ahead of time how many threads are involved in the recorded program execution or their sizes. As a result, the computing device 100 needs to read the instruction traces 132 only once.
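The single-pass parsing flow described above can be sketched as follows. The trace line format used here (`thread_id,chunk_id,ip_hex,type,mnemonic`) is a hypothetical stand-in, since the description does not specify one; in particular, this sketch assumes each line carries a chunk identifier so that chunk boundaries within a thread are visible. The threads container grows on demand, so neither the number of threads nor their sizes need to be known in advance.

```python
def parse_trace(lines):
    """Parse an instruction trace in one pass.

    Assumed (hypothetical) line format: thread_id,chunk_id,ip_hex,type,mnemonic
    The number of threads and their sizes are discovered while reading.
    """
    threads = []     # threads container: threads[tid] -> list of chunks
    active = None    # active tracker: (thread id, chunk id) being filled
    chunk = []       # instruction records of the current chunk

    def store(key, instrs):
        tid = key[0]
        if tid >= len(threads):                      # grow the container on demand
            threads.extend([] for _ in range(tid + 1 - len(threads)))
        threads[tid].append(instrs)

    for line in lines:
        tid, cid, ip, itype, mnemonic = line.strip().split(",")
        key = (int(tid), int(cid))
        if key != active:                            # a new thread or a new chunk starts
            if active is not None:
                store(active, chunk)
            active, chunk = key, []
        chunk.append({"ip": int(ip, 16), "type": itype, "mnemonic": mnemonic})

    if active is not None:                           # end of trace: keep the last chunk
        store(active, chunk)
    return threads
```

For example, the lines `0,0,0x400,load,mov`, `0,0,0x404,store,mov`, `1,0,0x500,load,mov`, and `0,1,0x408,other,add` yield two chunks (of two and one instructions) for thread 0 and a single one-instruction chunk for thread 1.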
Whether the current instruction line involves a new thread or the same thread as the previously read instruction line, the computing device 100 proceeds from block 618 or block 622, as the case may be, to block 624. At block 624, the computing device 100 processes the instruction to prepare the instruction information needed for the visualization. At block 626, the computing device 100 sets the instruction type and determines the simulated execution time for the instruction based on its instruction type. For example, “load” instructions may be defined as having an execution time half that of “store” instructions (i.e., loads execute twice as fast). Other types of instructions may have the same or similar execution times. In some embodiments, the execution times of the instructions are used to determine the length dimension of the visual features 338, as mentioned above.
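A small sketch of block 626 follows: a lookup table assigns each instruction type a simulated execution time, and a chunk's length dimension is the scaled sum of those times. Only the 1:2 ratio between “load” and “store” comes from the example above; the unit values, the default for other types, and the scale factor are assumptions.

```python
# Simulated execution times in arbitrary units. Only the 1:2 ratio between
# "load" and "store" comes from the description; the rest is assumed.
SIMULATED_TIME = {"load": 1.0, "store": 2.0}
DEFAULT_TIME = 1.0   # assumption: other instruction types cost the same as a load


def simulated_time(itype):
    """Simulated execution time for an instruction type."""
    return SIMULATED_TIME.get(itype, DEFAULT_TIME)


def chunk_length(itypes, units_per_time=10.0):
    """Length dimension of a chunk's visual feature."""
    return units_per_time * sum(simulated_time(t) for t in itypes)
```

With these assumed values, a chunk containing a load, a store, and one other instruction has a length of 10 × (1 + 2 + 1) = 40 units.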
At block 628, the computing device 100 sets the instruction pointer value for the current instruction based on the instruction line read from the instruction trace 132. The instruction pointer value is used, in some embodiments, to allow the viewer to, during the visualization, refer back to the actual disassembled binary code (e.g., in a log file 130) that is associated with the instruction line of the instruction trace 132. This may be useful for debugging purposes and/or other reasons. At block 630, the computing device 100 sets the mnemonic string associated with the current instruction, based on the information provided in the instruction trace 132. For instance, whereas the log file 130 may contain a binary representation of the current instruction, the mnemonic is a human-readable equivalent of the binary opcode (e.g., “store,” “load,” “jump,” etc.), as may be used in assembly code or source code, for example. The mnemonics can be determined by using a translation table or a standard disassembler utility, which is often provided with the operating system installed on the computing device 100. With all of the foregoing information about the current instruction, the computing device 100 then inserts the instruction information into the data store or container for the current chunk. As noted above, the information needed for the visualization is arranged by chunk, and the chunk-based information is then stored in the threads array, which serves as input to the visualization process (e.g., the dynamic replay module 212). In some embodiments, the threads array may be stored in or as a portion of the graphical representation 134.
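The translation-table approach and the use of the instruction pointer can be sketched as below. The table is deliberately tiny: its keys are genuine one-byte x86 opcodes for MOV (0x89, 0x8B), JMP rel32 (0xE9), and RET (0xC3), but a practical tool would rely on a full disassembler rather than a lookup table like this. `InstructionInfo`, `decode`, and `find_by_ip` are hypothetical names.

```python
from dataclasses import dataclass

# Tiny translation table from a leading opcode byte to a mnemonic.
OPCODE_TO_MNEMONIC = {0x89: "mov", 0x8B: "mov", 0xC3: "ret", 0xE9: "jmp"}


@dataclass
class InstructionInfo:
    ip: int          # instruction pointer, kept to refer back to the binary code
    itype: str       # e.g., "load" or "store"
    mnemonic: str    # human-readable form of the opcode


def decode(ip, itype, raw):
    """Build the per-instruction record that is inserted into the current chunk."""
    mnemonic = OPCODE_TO_MNEMONIC.get(raw[0], f"db 0x{raw[0]:02x}")  # unknown: raw byte
    return InstructionInfo(ip=ip, itype=itype, mnemonic=mnemonic)


def find_by_ip(chunks, ip):
    """Return (chunk index, position) of the instruction at `ip`, or None."""
    for ci, chunk in enumerate(chunks):
        for pos, ins in enumerate(chunk):
            if ins.ip == ip:
                return ci, pos
    return None
```

Keeping the instruction pointer alongside the mnemonic lets a viewer select a shape in the visualization and jump to the matching disassembled instruction.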
Referring now to
Referring now to
Referring now to
To assist a programmer in analyzing the correctness of the program 126, the system 124 can present the programmer with a visualization of the entire program execution (e.g., the view 1000) or a visualization of a specific segmented portion of the execution (e.g., the view 1100). In either case, the programmer can use the visualization to identify shared-memory accesses between the threads as discussed above. If the programmer notices that many chunks exist during a particular segment of the program, the programmer can review the portion of the program code associated with those chunks using, for example, the instruction pointer information described above and/or debug symbols associated with the program execution. The programmer may then determine whether those chunks represent intentional interleavings of the threads or whether the program is lacking specific serialization in that segment (where serialization could result in larger serialized chunks). In other words, the system 124 can help the programmer determine whether intended interleavings, or the lack thereof, have been implemented correctly, or whether such programming techniques have been inadvertently omitted, in addition to identifying performance features such as shared memory dependency conflicts and synchronization contentions.
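The “many chunks in a segment” observation above could also be approximated automatically. The sketch below is a heuristic not taken from the description: it buckets chunk start times into fixed-size windows of simulated time and reports the windows whose chunk count reaches a threshold, as candidates for the manual review described above. The window size and threshold are hypothetical tuning parameters.

```python
from collections import Counter


def busy_segments(chunk_starts, window, threshold):
    """Return indices of windows containing at least `threshold` chunk starts.

    chunk_starts: start time of every chunk (any thread), in simulated time.
    Many chunks in one window suggest frequent conflicting interleavings
    that may be worth checking for missing serialization.
    """
    counts = Counter(int(t // window) for t in chunk_starts)
    return sorted(w for w, n in counts.items() if n >= threshold)
```

For chunk start times `[0, 1, 2, 10, 11, 12, 13, 25]` with a window of 10 and a threshold of 3, windows 0 and 1 are reported, while window 2 (a single chunk) is not.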
Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any one or more, and any combination of, the examples described below.
Example 1 includes a visualization system to graphically display performance and correctness features of an execution of a multi-threaded software program on a computing device. The visualization system includes a parser module to prepare program execution data recorded during the execution of the multi-threaded software program for visualization; and a graphical modeler to display an animated graphical representation of the program execution data, where the animated graphical representation highlights one or more of the performance and correctness features. The visualization system also includes a controller module to interactively control the display of the animated graphical representation on a display.
Example 2 includes the subject matter of Example 1, and wherein the parser module prepares instruction traces comprising data relating to instructions executed by the multi-threaded software program during the execution and the threads on which the instructions were executed.
Example 3 includes the subject matter of Example 1 or Example 2, and wherein the parser module reads the program execution data from a plurality of log files generated by a chunk-based memory race recording system during the execution of the multi-threaded software program.
Example 4 includes the subject matter of any of Examples 1-3, wherein the parser module arranges the data according to chunks, and each chunk represents a plurality of instructions executed by the same thread without interleaving with a conflicting memory access.
Example 5 includes the subject matter of Example 4, and wherein the graphical modeler displays a plurality of visual features and each visual feature includes a color representing each chunk such that chunks associated with the same thread are displayed using the same color.
Example 6 includes the subject matter of Example 5, and wherein each instruction in each chunk has an execution time, and each visual feature includes a shape having a size defined by the execution times of the instructions in the chunk.
Example 7 includes the subject matter of Example 6, and wherein the size of the shape is further defined by the number of instructions in the chunk.
Example 8 includes the subject matter of any of Examples 1-7, and wherein the graphical modeler normalizes the size of the animated graphical representation based on the total execution time of the program.
Example 9 includes the subject matter of any of Examples 1-8, and wherein the animated graphical representation highlights a shared memory dependency conflict that occurred during the execution of the multi-threaded software program.
Example 10 includes the subject matter of any of Examples 1-9, and wherein the graphical modeler stores data relating to the animated graphical representation for offline replay of the animated graphical representation.
Example 11 includes the subject matter of Example 10, and wherein the controller module controls the offline replay of the animated graphical representation.
Example 12 includes the subject matter of any of Examples 1-11, and wherein the controller module receives input from a viewer of the animated graphical representation and adjusts the display of the animated graphical representation in response to the input during the display of the animated graphical representation.
Example 13 includes the subject matter of Example 12, and wherein the controller module increases and decreases the speed at which the animated graphical representation is displayed in response to the viewer input during the display of the animated graphical representation.
Example 14 includes the subject matter of Example 12 or Example 13, wherein the controller module changes the magnification of the display of the animated graphical representation in response to the viewer input during the display of the animated graphical representation.
Example 15 includes the subject matter of any of Examples 12-14, wherein the controller module rotates the display of the animated graphical representation in response to the viewer input during the display of the animated graphical representation.
Example 16 includes a method for graphically visualizing performance and correctness features of an execution of a multi-threaded software program on a computing device. The method includes reading program execution data recorded by a chunk-based memory race recording system during the execution of the multi-threaded software program; preparing the program execution data for graphical visualization; displaying an animated graphical representation of the program execution data, the animated graphical representation highlighting one or more of the performance and correctness features; and controlling the display of the animated graphical representation in response to one or more visualization parameters.
Example 17 includes the subject matter of Example 16, and includes arranging the data according to chunks, wherein each chunk represents a plurality of instructions executed by the same thread without interleaving with a conflicting memory access.
Example 18 includes the subject matter of Example 17, and includes displaying a plurality of visual features relating to the chunks, wherein each visual feature comprises a color representing each chunk such that chunks associated with the same thread are displayed using the same color.
Example 19 includes the subject matter of Example 18, wherein each instruction in each chunk has an execution time and each chunk is associated with a number of instructions, and the method includes defining each visual feature to include a shape having a size defined by the execution times of the instructions in the chunk.
Example 20 includes the subject matter of any of Examples 16-19, and includes configuring the size of the animated graphical representation based on the size of the program execution.
Example 21 includes the subject matter of any of Examples 16-20, and includes highlighting in the animated graphical representation a shared memory dependency conflict that occurred during the execution of the multi-threaded software program.
Example 22 includes the subject matter of any of Examples 16-21, and includes receiving input from a viewer of the animated graphical representation and adjusting the display of the animated graphical representation in response to the input during the display of the animated graphical representation.
Example 23 includes a computing device including a processor; and a memory having stored therein a plurality of instructions that when executed by the processor cause the computing device to perform the method of any of Examples 16-22.
Example 24 includes one or more machine readable storage media including a plurality of instructions stored thereon that in response to being executed result in a computing device performing the method of any of Examples 16-22.
Example 25 includes a system for graphically visualizing performance and correctness features of an execution of a multi-threaded software program on a computing device. The system includes means for reading program execution data recorded by a chunk-based memory race recording system during the execution of the multi-threaded software program; means for preparing the program execution data for graphical visualization; means for displaying an animated graphical representation of the program execution data, the animated graphical representation highlighting one or more of the performance and correctness features; and means for controlling the display of the animated graphical representation in response to one or more visualization parameters.
Example 26 includes a dynamic replay module for a visualization system to graphically visualize an original execution of a multi-threaded software program. The dynamic replay module controls the display of a graphical representation of program execution data recorded during the original execution of the multi-threaded software program. The dynamic replay module includes a graphical modeler to display a plurality of visual features associated with the program execution data on a display according to visualization parameters to simulate the speed of the original execution of the multi-threaded software program. The visual features include a plurality of colors, where each color is associated with a different thread on which instructions of the multi-threaded software program were executed during the original execution. The dynamic replay module also includes a controller module to, during the display of the visual features: receive a requested change to a visualization parameter from a viewer of the display; in response to the requested change, update the visualization parameter in accordance with the change; and communicate with the graphical modeler to update the display of the visual features in accordance with the updated visualization parameter.
Example 27 includes the subject matter of Example 26, and wherein the visual features are associated with chunks, and each chunk represents a plurality of instructions executed by the same thread without interleaving with a conflicting memory access.
Example 28 includes the subject matter of Example 27, and wherein each instruction in each chunk has an execution time, and each visual feature comprises a shape having a size defined by the execution times of the instructions in the chunk.
Example 29 includes the subject matter of Example 28, and wherein the size of the shape is further defined by the number of instructions in the chunk.
Example 30 includes the subject matter of any of Examples 26-29, and wherein the visual features indicate a shared memory dependency conflict that occurred during the original execution of the multi-threaded software program.
Example 31 includes the subject matter of any of Examples 26-30, and wherein the controller module increases and decreases the speed at which the visual features are displayed in response to the requested change.
Example 32 includes the subject matter of Example 31, and wherein the controller module changes the magnification of the display of the visual features in response to the requested change.
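Examples 31 and 32 change the replay speed and the magnification while the display is running. A hypothetical `PlaybackClock` can keep the replay position continuous by re-anchoring at the moment the speed changes, so the display neither jumps ahead nor rewinds; magnification then scales only the screen coordinates. The class, the `visible_chunks` helper, and the `pixels_per_unit` factor are assumptions made for illustration.

```python
from typing import List


class PlaybackClock:
    """Maps wall-clock time to a position in the recorded execution (hypothetical design)."""

    def __init__(self, speed: float = 1.0):
        self.speed = speed
        self._anchor_wall = 0.0  # wall-clock time of the last speed change
        self._anchor_pos = 0.0   # recorded-execution position at that moment

    def position(self, wall: float) -> float:
        return self._anchor_pos + (wall - self._anchor_wall) * self.speed

    def set_speed(self, wall: float, speed: float) -> None:
        if speed <= 0:
            raise ValueError("speed must be positive")
        # Re-anchor so the replay position is continuous across the change.
        self._anchor_pos = self.position(wall)
        self._anchor_wall = wall
        self.speed = speed


def visible_chunks(chunk_starts: List[float], clock: PlaybackClock, wall: float) -> List[float]:
    """Chunks whose recorded start time has been reached by the replay."""
    now = clock.position(wall)
    return [s for s in chunk_starts if s <= now]


def to_screen_x(position: float, magnification: float, pixels_per_unit: float = 1.0) -> float:
    """Magnification scales drawn geometry without changing the replay position."""
    return position * pixels_per_unit * magnification
```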
Example 33 includes a method for controlling the display of a graphical representation of program execution data recorded during an original execution of a multi-threaded software program. The method includes displaying a plurality of visual features of the program execution data on a display according to visualization parameters to simulate the speed of the original execution of the software program, where the visual features include a plurality of colors, and each color is associated with a different thread on which instructions of the multi-threaded software program were executed during the original execution. The method also includes, during the displaying of the visual features, receiving a requested change to a visualization parameter and, in response to the requested change, updating the visualization parameter in accordance with the change and updating the displaying of the visual features in accordance with the updated visualization parameter.
Example 34 includes the subject matter of Example 33, and includes associating each visual feature with a chunk, wherein each chunk represents a plurality of instructions executed by the same thread without interleaving with a conflicting memory access.
Example 35 includes the subject matter of Example 34, and wherein each instruction in each chunk has an execution time and each visual feature comprises a shape, and the method includes defining the size of the shape based on the execution times of the instructions in the chunk.
Example 36 includes the subject matter of Example 35, and includes defining the size of the shape based on the number of instructions in the chunk.
Example 37 includes the subject matter of any of Examples 33-36, and includes increasing and decreasing the speed at which the visual features are displayed in response to the requested change.
Example 38 includes the subject matter of any of Examples 33-37, and includes changing the magnification of the display of the visual features in response to the requested change.
Example 39 includes a computing device including: a processor; and a memory having stored therein a plurality of instructions that when executed by the processor cause the computing device to perform the method of any of Examples 33-38.
Example 40 includes one or more machine readable storage media including a plurality of instructions stored thereon that in response to being executed result in a computing device performing the method of any of Examples 33-38.
Example 41 includes a system for controlling the display of a graphical representation of program execution data recorded during an original execution of a multi-threaded software program. The system includes means for displaying a plurality of visual features of the program execution data on a display according to visualization parameters to simulate the speed of the original execution of the software program, where the visual features include a plurality of colors, and each color is associated with a different thread on which instructions of the multi-threaded software program were executed during the original execution. The system also includes means for receiving a requested change to a visualization parameter during the displaying of the visual features; means for updating the visualization parameter in response to the requested change; and means for updating the displaying of the visual features in accordance with the updated visualization parameter.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US13/30745 | 3/13/2013 | WO | 00 | 6/25/2013 |