1. Field of the Invention
The present invention generally relates to multi-threaded processors and, more particularly, to detecting and managing synchronization between programs in multithreaded systems.
2. Background Description
Semiconductor technology and chip manufacturing advances have resulted in a steady increase of on-chip clock frequencies and the number of transistors on a single chip, e.g., in a state of the art processor or microprocessor. A scalar processor fetches and issues/executes one instruction at a time. Each such instruction operates on scalar data operands. Each such operand is a single or atomic data value or number. Generally, unit performance for a given clocked unit increases linearly with the frequency of switching within it, provided the clocked unit is operating at full capacity. Pipelining within a scalar processor introduces what is known as concurrency, i.e., processing multiple instructions at different pipeline stages in a given clock cycle, while preserving the single-issue paradigm. A superscalar processor can fetch, issue and execute multiple instructions in a given machine cycle, each in a different execution path or thread. Each instruction fetch, issue and execute path is usually pipelined for further, parallel concurrency. State of the art commercial microprocessors (e.g., Intel's NetBurst™ Pentium™ IV or IBM's POWER5™) use a mode of multithreading that is commonly referred to as Simultaneous MultiThreading (SMT). In each processor cycle, an SMT processor simultaneously fetches instructions for different threads that populate the back-end execution resources.
Frequently, multiple programming tasks or threads are concurrently dispatched that require a common resource. Inevitably, some threads compete for the same resource and so, collide. The simplest example of such a collision occurs when multiple threads concurrently attempt to modify the value of the same field. Occasionally, such a collision can result in what is known in the art as a race condition, where the collision causes a program failure or even a system failure. The simplest way to avoid collisions and eliminate race conditions is through synchronization.
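The field collision described above can be illustrated with a minimal Java sketch. The class and method names are hypothetical and chosen only for illustration. Several threads increment two counters: one without synchronization, where updates may be lost, and one through a synchronized method, where the monitor serializes each update.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are illustrative and not part of the described system.
final class FieldCollisionDemo {
    private int unsafeCount = 0; // modified without synchronization
    private int safeCount = 0;   // modified only while holding this object's monitor

    void unsafeIncrement() { unsafeCount++; }          // read-modify-write, not atomic
    synchronized void safeIncrement() { safeCount++; } // serialized by the monitor

    // Starts `threads` threads that each increment both counters `perThread` times.
    // Returns {unsafeCount, safeCount} after all threads have finished.
    static int[] run(int threads, int perThread) {
        FieldCollisionDemo demo = new FieldCollisionDemo();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    demo.unsafeIncrement();
                    demo.safeIncrement();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes the workers' writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return new int[] { demo.unsafeCount, demo.safeCount };
    }

    public static void main(String[] args) {
        int[] counts = run(4, 100_000);
        System.out.println("unsynchronized count: " + counts[0]);
        System.out.println("synchronized count:   " + counts[1]);
    }
}
```

The synchronized counter always equals the total number of increments, while the unsynchronized counter may fall short by a different amount on each run, which is the kind of nondeterminism that makes race conditions hard to reproduce.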
Synchronization routines are well known in the art for preventing collisions and eliminating potential race conditions. A typical synchronization routine forces threads to execute serially, thus losing much of the advantage of multi-threading. Unfortunately, no state of the art facility is available to programmers for determining which threads will collide and which will not. Although synchronization eliminates race conditions, because it forces serial execution, it eliminates races at the expense of overall program performance and efficiency. Consequently, programmers normally implement synchronization routines only very sparingly. Regardless, however, if the synchronization is necessary because races are inevitable, the programmer must force serial execution and accept the unavoidable efficiency and performance degradation.
For example, very large web based applications perform poorly or fail when unwanted synchronization is present, e.g., due to synchronized calls to database operations and Lightweight Directory Access Protocol (LDAP) servers. Common symptoms of unwanted synchronization include slow, erratic response times and throughput, application hangs and even entire website outages. Further, these symptoms become apparent at the most inopportune moments, e.g., in a production environment under heavy workload. During such periods, minor hardware, network or software component problems can trigger unwanted synchronization and result in more severe problems. Moreover, these problems may be difficult to simulate, e.g., during system test or normal maintenance.
Several tools are available for detecting unnecessary synchronization in moderately complex programs, e.g., javac, javacup, pizza, jlex. These programs are typically on the order of 10^4 to 10^5 lines of source code. Usually, unnecessary synchronization is detected via static analysis. Typically, however, the only way to identify such unwanted synchronizations in more complex functions is with a set of sophisticated runtime analysis tools that are both tedious and difficult to use. Further, since these tools are complicated, finding unwanted synchronizations requires a high skill level and is costly, especially in a production environment. Also, most of these tools focus only on reducing the overhead caused by the synchronization itself, i.e., the cost of performing locking and unlocking operations.
Thus, there is need for a tool for identifying potential occurrences of unwanted synchronization in code, especially prior to deploying the code in a production environment.
It is a purpose of the invention to improve the performance of complex multithreaded systems;
It is another purpose of the invention to identify potential sources of thread synchronization;
It is another purpose of the invention to eliminate unnecessary synchronization from multithreaded code.
The present invention is related to a method, system and program product for minimizing unwanted synchronizations in a multithreading program. Program functions in a multithreading program that should not be synchronized are provided as an input and are called “tails.” All possible entry points to the program are computed, and they are called “heads.” An invocation graph is then constructed for the multithreading program, that includes “head nodes” and “tail nodes,” corresponding to the heads and tails, respectively. Synchronization information is collected for each node of the invocation graph. Sources of synchronization in the invocation graph are represented as source nodes. All paths from head nodes to tail nodes through at least one source node are identified.
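The path identification summarized above can be sketched in Java as a depth-first walk over a call map. This is a minimal sketch under stated assumptions, not the claimed implementation: the invocation graph is reduced to a map from each function name to its callees, the function names are hypothetical, recursion cycles are skipped, and a tail that is itself a source node is treated as synchronized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the invocation graph is simplified to caller -> callees.
final class CriticalPathFinder {
    // Returns every acyclic path from a head node to a tail node
    // that passes through at least one source node.
    static List<List<String>> find(Map<String, List<String>> calls, Set<String> heads,
                                   Set<String> tails, Set<String> sources) {
        List<List<String>> result = new ArrayList<>();
        for (String head : heads) {
            walk(head, calls, tails, sources, new ArrayList<>(), false, result);
        }
        return result;
    }

    private static void walk(String node, Map<String, List<String>> calls,
                             Set<String> tails, Set<String> sources, List<String> path,
                             boolean sawSource, List<List<String>> result) {
        if (path.contains(node)) {
            return; // assumption: do not follow recursion cycles
        }
        path.add(node);
        // Assumption: a node that is itself a source counts as synchronized.
        boolean synced = sawSource || sources.contains(node);
        if (tails.contains(node) && synced) {
            result.add(new ArrayList<>(path)); // record a copy of the current path
        }
        for (String callee : calls.getOrDefault(node, Collections.emptyList())) {
            walk(callee, calls, tails, sources, path, synced, result);
        }
        path.remove(path.size() - 1); // backtrack
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = new HashMap<>();
        calls.put("handleRequest", Arrays.asList("lookupUser", "render"));
        calls.put("lookupUser", Arrays.asList("ldapSearch")); // lookupUser is synchronized
        calls.put("render", Arrays.asList("writeLog"));
        Set<String> heads = Collections.singleton("handleRequest");
        Set<String> tails = new HashSet<>(Arrays.asList("ldapSearch", "writeLog"));
        Set<String> sources = Collections.singleton("lookupUser");
        System.out.println(find(calls, heads, tails, sources));
    }
}
```

In this example only the path handleRequest, lookupUser, ldapSearch is reported. The path to writeLog also ends at a tail but passes through no source node, so it is not reported.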
The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:
FIGS. 3A-B show an example of pseudo-code for identifying synchronized objects for each basic block, each node and each edge of the invocation graph.
Turning now to the drawings, and more particularly,
Thus, unwanted synchronization in multithreaded programs is identified to determine the set of critical synchronization paths for a particular program. The code is examined to identify unwanted synchronization of a prescribed set of tail functions. All possible entry points, or head functions, are found and a determination is made whether any tail functions are in any of the execution paths from the head functions and where in the path threads are executing under the constraints of synchronization.
Beginning in step 104 input tails are manually identified as those functions with potentially time consuming execution, e.g., computationally expensive calls, system calls, or calls to remote resources. Also included as input tails are those functions that may not return from a call because of system or software failure. The input tail functions should not, if possible, be synchronized. One way to identify tail functions is to analyze execution traces of running programs. Typical examples of input tails are functions that create sockets, connect to databases, make Remote Method Invocation (RMI) calls, perform directory lookups, initiate expensive database queries, parse XML documents, or write to local files.
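To make concrete why an input tail should not, if possible, be synchronized, the following Java sketch places a stand-in for a slow directory lookup inside a synchronized method, and then shows one possible variant that keeps the lookup outside the monitor. All names are hypothetical, and the lookup is simulated. The sketch uses the standard Thread.holdsLock method only to record whether the tail ran while the monitor was held.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache in front of a slow lookup; illustrative only.
final class DirectoryCache {
    private final Map<String, String> cache = new HashMap<>();

    // Diagnostic flag for this demo: did the last lookup run under the monitor?
    volatile boolean lockHeldDuringLookup;

    // Stand-in for an input tail such as a remote directory lookup (simulated).
    private String remoteLookup(String key) {
        lockHeldDuringLookup = Thread.holdsLock(this);
        return "entry-for-" + key;
    }

    // The tail executes while the monitor is held, so every other caller of this
    // object waits for the lookup to finish.
    synchronized String getSynchronized(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = remoteLookup(key);
            cache.put(key, value);
        }
        return value;
    }

    // Only the cache accesses hold the monitor; the tail runs outside it.
    String getRestructured(String key) {
        synchronized (this) {
            String cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
        }
        String fetched = remoteLookup(key); // no monitor held here
        synchronized (this) {
            String prior = cache.putIfAbsent(key, fetched);
            return prior != null ? prior : fetched;
        }
    }
}
```

The second variant trades a possible duplicate lookup, when two threads miss on the same key at once, for not blocking other threads during the slow call. Whether that trade is acceptable depends on the application.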
The tails are provided as an input to step 106, where the invocation graph is created as a collection of basic blocks using, for example, a suitable, well known technique. In the invocation graph, an edge from function p to function s means that p calls s. Grove et al., “A framework for call graph construction algorithms,” ACM Transactions on Programming Languages and Systems, 23 (6), November 2001, pp. 685-746, provide a comparison of suitable such techniques. The invocation graph represents both the intraprocedural and the interprocedural operation of the code, and includes a control flow graph for each invoked function. The control flow graph represents the interprocedural invocation of each succeeding function S from a basic block B within a preceding function P over edges (P, B, S).
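One possible in-memory representation of the (P, B, S) edges is sketched below in Java. This is an assumption made for illustration: basic blocks are identified by an integer index within the calling function, and the class and method names are hypothetical. It does not reproduce any particular call graph construction algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an invocation graph that stores (P, B, S) edges.
final class InvocationGraph {
    // One edge (P, B, S): basic block B of function P invokes function S.
    static final class Edge {
        final String caller; // P
        final int block;     // B, index of the basic block inside P (assumed encoding)
        final String callee; // S

        Edge(String caller, int block, String callee) {
            this.caller = caller;
            this.block = block;
            this.callee = callee;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    // Records that basic block `block` of `caller` contains a call to `callee`.
    void addCall(String caller, int block, String callee) {
        edges.add(new Edge(caller, block, callee));
    }

    // Functions invoked from any basic block of `caller`, in insertion order.
    List<String> calleesOf(String caller) {
        List<String> callees = new ArrayList<>();
        for (Edge edge : edges) {
            if (edge.caller.equals(caller)) {
                callees.add(edge.callee);
            }
        }
        return callees;
    }

    // Basic blocks of `caller` that contain a call to `callee`.
    List<Integer> callSites(String caller, String callee) {
        List<Integer> blocks = new ArrayList<>();
        for (Edge edge : edges) {
            if (edge.caller.equals(caller) && edge.callee.equals(callee)) {
                blocks.add(edge.block);
            }
        }
        return blocks;
    }
}
```

Keeping the basic block in each edge allows a later analysis to ask which block of P makes the call, not only whether P calls S, which matters when synchronization is held in some blocks of P and not in others.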
Nodes are categorized as one of three types within the invocation graph: a head node, a source node or a tail node. A tail node represents an input tail and, so, a potentially time consuming function. In addition to tail nodes, the two other types of nodes, head nodes and source nodes, are identified in the invocation graph in step 108. A node function that begins an execution path is designated as a head node. A head node may be a root of the graph, or a head node may initiate synchronization-relevant actions, e.g., the start of a new thread of execution. Although the head nodes are typically identified automatically during invocation graph construction, the user may optionally select a subset of the head nodes. The source nodes are derived through analysis and are nodes where synchronization originates. In many monitor-based languages, source nodes arise either from synchronized functions or synchronized blocks of code.
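As a partial illustration of how source nodes might be located in a monitor-based language, the Java sketch below uses the standard reflection API to list the methods of a class that are declared synchronized. This is only an approximation of the analysis described above. Reflection does not reveal synchronized blocks inside a method body, which would require inspecting the compiled code, and the Account class is a hypothetical example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical scanner: finds candidate source nodes among declared methods.
final class SourceNodeScanner {
    // Names of the methods declared `synchronized` in `type`, sorted alphabetically.
    static Set<String> synchronizedMethods(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method method : type.getDeclaredMethods()) {
            if (Modifier.isSynchronized(method.getModifiers())) {
                names.add(method.getName());
            }
        }
        return names;
    }

    // Hypothetical class under analysis.
    static final class Account {
        private long balance;

        // Synchronized function: reported as a candidate source node.
        synchronized void deposit(long amount) { balance += amount; }

        // Unsynchronized function: not reported.
        long peek() { return balance; }

        // Synchronized block: also a source of synchronization, but NOT visible
        // to this reflection-based scan.
        void audit() {
            synchronized (this) {
                balance += 0;
            }
        }
    }
}
```

For the Account class, the scan reports only deposit. The audit method shows the limitation: its synchronized block is a genuine source of synchronization that this sketch misses.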
In step 108, various identified synchronizations are analyzed, and some may be eliminated. The synchronization for a given source node may be necessary or unnecessary. A synchronization is unnecessary if removing it does not change program behavior. Otherwise, the synchronization is necessary. For example, if eliminating a synchronization causes a race condition, the synchronization is identified as necessary. Also, synchronizations that significantly degrade performance are identified as unwanted. So, unwanted synchronization may be further categorized as necessary or unnecessary. Unnecessary and unwanted synchronizations can simply be removed. The only option for eliminating necessary and unwanted synchronizations, however, is to restructure the code. Thus, it must be determined if the code could be restructured (perhaps automatically) to make identified necessary and unwanted synchronizations unnecessary.
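The categorization above amounts to a small decision table, sketched below in Java. The names are hypothetical. Two choices are assumptions of the sketch, not statements from the text: a necessary synchronization that is not unwanted is left in place, and an unnecessary synchronization is removed even when it is not unwanted, since removing it does not change program behavior.

```java
// Hypothetical decision table for a synchronization found at a source node.
final class SynchronizationTriage {
    enum Action { KEEP, REMOVE, RESTRUCTURE }

    // necessary: removing the synchronization would change program behavior,
    //            e.g., by introducing a race condition.
    // unwanted:  the synchronization significantly degrades performance.
    static Action decide(boolean necessary, boolean unwanted) {
        if (!necessary) {
            return Action.REMOVE; // removal does not change behavior
        }
        // Necessary synchronization cannot simply be dropped; if it is also
        // unwanted, the only option is to restructure the code around it.
        return unwanted ? Action.RESTRUCTURE : Action.KEEP;
    }
}
```

The hard part in practice is computing the two inputs, in particular deciding whether a synchronization is necessary. The sketch only records what follows once that determination has been made.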
FIGS. 3A-B show an example of pseudo-code for identifying synchronized objects for each basic block, each node and each edge of the invocation graph in step 108. So, in
In
Advantageously, computer programs that support synchronization of threads of execution may be managed and modified according to the present invention to minimize unwanted synchronization. Synchronization nodes are identified, and a determination is made whether synchronization is in fact necessary, or if the particular program code may be restructured, even automatically, to make each unnecessary. Synchronization of expensive or dangerous thread operations may be avoided or eliminated. Any synchronization found to be unnecessary may simply be eliminated, and necessary synchronization eliminated where appropriate through code restructuring, either manually, e.g., by a user/developer, or by a transformation program, for a significant performance improvement and a more stable system. Furthermore, as described herein, the present invention may be implemented on any suitable typical computer or personal computer (PC), e.g., 105 in
While the invention has been described in terms of preferred embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims. It is intended that all such variations and modifications fall within the scope of the appended claims. Examples and drawings are, accordingly, to be regarded as illustrative rather than restrictive.