Embodiments of the invention relate to software programming, and more specifically to identifying serial code regions in applications for parallelization.
Many applications are not threaded to take advantage of processors with multiple cores. Threading allows code to run in parallel to increase efficiency and performance. However, it is often difficult for a programmer to know which region of the code to thread in order to yield the best performance results. Call graphing tools may be used to show the structure of the application at a function level, including the functions that are executed and their parent-child relationships. However, a programmer still has to do a significant amount of work to analyze the code and to determine why a large amount of time is spent in a particular function. The best region to thread may lie outside the function that is executed most often or for the most total time, since that function may be called from elsewhere in the code. Therefore, determining the best portion of the code to thread is often a trial-and-error process that takes a significant amount of time.
The invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.
Embodiments of a system and method to identify serial code regions are described. In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
As will be appreciated by those skilled in the art, the content for implementing an embodiment of the method of the invention, for example, computer program instructions, may be provided by any machine-readable media which can store data that is accessible by system 100, as part of or in addition to memory, including but not limited to cartridges, magnetic cassettes, flash memory cards, digital video disks, random access memories (RAMs), read-only memories (ROMs), and the like. In this regard, the system 100 is equipped to communicate with such machine-readable media in a manner well-known in the art.
It will be further appreciated by those skilled in the art that the content for implementing an embodiment of the method of the invention may be provided to the system 100 from any external device capable of storing the content and communicating the content to the system 100. For example, in one embodiment of the invention, the system 100 may be connected to a network, and the content may be stored on any device in the network.
At 206, directives are inserted into the application code to thread one or more of the plurality of loops. In one embodiment, compiler-recognizable directives, such as pragmas, are inserted around the one or more loops selected for threading. In one embodiment, the pragmas are OpenMP pragmas. After the directives are inserted, a compiler may be invoked to recompile the application code. A thread checker, which is a software-based tool to check for threading errors, may then be launched to run a simulation of the threaded loops. After the simulation, threading errors may be captured and displayed. The loops may also be prioritized based on the severity of the simulation errors. A list of loops may then be displayed, ordered by feasibility of parallelization.
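As a minimal sketch of what an inserted directive may look like, the following C fragment places an OpenMP pragma on a loop. The function name sum_of_squares and the loop body are hypothetical and are not taken from the described embodiments; only the use of an OpenMP pragma follows the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop selected for threading: sums the squares of n values.
 * If the compiler is invoked without OpenMP support, the pragma is ignored
 * and the loop simply runs serially. */
double sum_of_squares(const double *values, int n)
{
    double total = 0.0;
    int i;

    assert(values != NULL || n <= 0);

    /* Inserted directive: divide the iterations among threads and combine
     * each thread's partial sum into total when the loop finishes. */
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += values[i] * values[i];
    }
    return total;
}
```

In this sketch the reduction clause is what keeps the shared accumulator free of a data race. Omitting it is the kind of threading error a thread checker would be expected to report.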
In one embodiment, an automated process may be used to insert directives into the application code to thread one or more loops, invoke a compiler to recompile the application code, launch a thread checker to run a simulation of multiple threads, capture any threading errors, and report a prioritization of loops to thread based on the simulation errors.
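The prioritization step of such an automated process may be sketched as follows. The LoopReport structure, the prioritize_loops function, and the use of a raw error count as a stand-in for error severity are all assumptions made for illustration. The description does not specify how severity is measured or how the thread checker reports its results.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record of one thread-checker simulation result. */
typedef struct {
    int loop_id;      /* identifier assigned to the loop */
    int error_count;  /* threading errors captured for that loop (assumed metric) */
} LoopReport;

/* Order by ascending error count; break ties by loop id so the
 * resulting order is deterministic. */
static int compare_reports(const void *a, const void *b)
{
    const LoopReport *x = a;
    const LoopReport *y = b;

    if (x->error_count != y->error_count)
        return (x->error_count > y->error_count) - (x->error_count < y->error_count);
    return (x->loop_id > y->loop_id) - (x->loop_id < y->loop_id);
}

/* Sort the reports in place so that the loops with the fewest captured
 * errors, taken here as the most feasible to thread, come first. */
void prioritize_loops(LoopReport *reports, size_t count)
{
    if (count < 2)
        return;
    assert(reports != NULL);
    qsort(reports, count, sizeof reports[0], compare_reports);
}
```

A loop that simulates with no captured errors would appear at the head of the resulting list, while loops with many errors would fall toward the end as less feasible candidates.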
The following is an illustrative example of inserting instrumentation around loops in the application code.
In the above example, each loop is identified and an InstrumentationProlog( ) and InstrumentationEpilog( ) pair is added around the loop. The InstrumentationProlog( ) uniquely identifies the loop, and the loop times are recorded by the InstrumentationEpilog( ) section of the instrumentation code. After the instrumented application is run, a list of loops may be displayed according to the recorded data, and one or more of the loops may be selected for threading.
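The prolog and epilog pattern described above may be sketched in C as follows. This is not the listing of the described embodiment. The integer loop identifier argument, the fixed-size tables, the MAX_LOOPS limit, and the sum_to function are assumptions introduced for illustration, and clock( ) is used here only as a simple, portable source of processor time.

```c
#include <assert.h>
#include <time.h>

#define MAX_LOOPS 64  /* assumed upper bound on instrumented loops */

/* Per-loop bookkeeping, indexed by loop identifier (not thread-safe). */
static clock_t loop_start[MAX_LOOPS];
static double loop_seconds[MAX_LOOPS];
static unsigned long loop_entries[MAX_LOOPS];

/* Identify the loop being entered and note when it started. */
void InstrumentationProlog(int loop_id)
{
    assert(loop_id >= 0 && loop_id < MAX_LOOPS);
    loop_entries[loop_id]++;
    loop_start[loop_id] = clock();
}

/* Record the processor time spent in the loop since the matching prolog. */
void InstrumentationEpilog(int loop_id)
{
    assert(loop_id >= 0 && loop_id < MAX_LOOPS);
    loop_seconds[loop_id] +=
        (double)(clock() - loop_start[loop_id]) / CLOCKS_PER_SEC;
}

/* Hypothetical application function containing one instrumented loop. */
long sum_to(long n)
{
    long total = 0;
    long i;

    InstrumentationProlog(0);
    for (i = 1; i <= n; i++) {
        total += i;
    }
    InstrumentationEpilog(0);
    return total;
}
```

After the instrumented application runs, the accumulated values in loop_seconds and loop_entries could be sorted and displayed so that the most time-consuming loops are considered first for threading.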
Thus, embodiments of a system and method to identify serial code regions have been described. While the invention has been described in terms of several embodiments, those of ordinary skill in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.