1. Field
The present description relates to pre-processing instruction sequences for parallel execution, and in particular to optimizing the pre-processing to reduce overhead caused by passing messages.
2. Related Art
Many applications that can benefit from parallel hardware, such as multiple processors, multiple cores in one processor, and multiple systems clustered together, are described using a data-flow, or message passing, model. A data-flow model allows the individual stages in the model to execute in parallel. However, processing resources are consumed by the overhead of passing information between the stages of the model.
In systems that are used for developing data-flow applications, the programmer describes the application as a set of actors, each of which works on a separate stage of the application. The stages are connected together through some form of message passing construct, such as a channel or a queue.
Data is sent from one stage of the application to the next through the channels. By breaking the application into multiple stages, the application can be parallelized by allowing each stage to be working on different data concurrently. Each stage can also be duplicated to further increase parallelism.
In such a data flow model using actors, the overhead of passing data or messages between the actors comes in part from the queuing constructs typically used to represent the channels. These queuing constructs are often implemented in memory. As a result, message passing results in extra memory references.
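As an illustration of the model described above, the following is a minimal sketch, not taken from the original description, of two stages connected by a queue-backed channel. The names stage_b, stage_c, run_pipeline, and SENTINEL, and the arithmetic performed by each stage, are hypothetical. Every item crosses the channel through one put and one get, which is the memory traffic that the overhead discussion refers to.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not part of the source


def stage_b(inputs, channel):
    """Source stage: transforms each item and puts it on the channel."""
    for x in inputs:
        channel.put(x * 2)      # one queue write per message
    channel.put(SENTINEL)


def stage_c(channel, results):
    """Sink stage: gets items from the channel until the stream ends."""
    while True:
        d = channel.get()       # one queue read per message
        if d is SENTINEL:
            break
        results.append(d + 1)


def run_pipeline(inputs):
    """Runs both stages concurrently, connected by an in-memory queue."""
    channel = queue.Queue()
    results = []
    b = threading.Thread(target=stage_b, args=(inputs, channel))
    c = threading.Thread(target=stage_c, args=(channel, results))
    b.start()
    c.start()
    b.join()
    c.join()
    return results
```

Because the two stages run in separate threads, stage_b may be working on a later item while stage_c is still processing an earlier one.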
Pre-processing models, such as compilers, attempt to optimize the process flow and reduce the overhead of passing messages between the actors. One proposed compiler optimization is to co-locate actors onto the same processor and eliminate the queuing construct for such co-located actors. However, co-location limits the parallel operations that are possible. Other compiler optimizations try to optimize out the message passing overhead but cannot be applied when the communications are explicit in the source code.
The various advantages of the embodiments of the present invention will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
A compiler may remove some of the message passing overhead without changing the semantics of the program. Specifically, under certain circumstances, some of the message passing constructs may be implemented with function calls rather than with queues. This can allow the execution of certain applications to be optimized on large-scale multiprocessor or chip multiprocessor systems. According to one embodiment of the invention, a compiler may automatically determine when it is safe to replace an active channel with a function call, and how the replacement can be done. In such an embodiment, an active channel may be considered to be a channel explicitly referenced by the channel's consuming actor. While embodiments of the present invention are presented in the context of optimizing a compilation of source code for parallel execution, embodiments of the invention may be applied to any automated form of transforming one work into another work.
With current compilers, it has only been possible to co-locate actors in which the message passing “get”, or retrieve, constructs are implicit in the application. For example, the following paragraph illustrates an implicit “get” operation using pseudo code for the data-flow actor “C” shown in
In the above paragraph, the data, d, arriving from actor B at actor C implicitly appears when the actor C is invoked. By co-locating B and C, no queuing is required and the corresponding overhead and memory accesses are eliminated.
In the paragraph below, the actor C is written with an explicit get operation.
In the above paragraph, C explicitly requests data, d. In this case, the actors B and C cannot be co-located and the overhead from the get and the corresponding put that puts d on the channel cannot be eliminated.
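The difference between the implicit and explicit forms discussed above can be sketched as follows. This is an illustrative assumption and not the pseudo code of the original description; the function names and the arithmetic are hypothetical, and a deque stands in for the channel.

```python
from collections import deque


def actor_b(x):
    """Hypothetical upstream actor B: produces the data d."""
    return x * 2


def actor_c_implicit(d):
    """Implicit get: the data d simply appears when C is invoked."""
    return d + 1


def actor_c_explicit(channel):
    """Explicit get: the body of C itself requests d from the channel."""
    d = channel.popleft()   # stands in for an explicit channel get
    return d + 1


def fused_b_c(x):
    """Co-located B and C in the implicit form: a direct call, no queue."""
    return actor_c_implicit(actor_b(x))
```

In the implicit form, a compiler can fuse B and C because nothing in the body of C refers to the channel. In the explicit form, the get appears inside the body of C, so removing the queue requires rewriting that body.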
However, according to an embodiment of the present invention, even with explicit get operations, the active channel between the two actors may be replaced with a function call. In one example, the code for an actor is explicitly, or actively, seeking out new data to work on from one or more input channels. This may occur for actors that periodically poll their input channels for data.
When two actors communicating via an active channel are co-located for execution, the active channel of the two co-located actors may safely be replaced with a function call without changing the semantics of the original code. The replacement may be done even if the second of the two co-located actors is actively requesting data rather than passively, or implicitly, receiving the data.
For example, according to an embodiment of the invention, a channel between actors E and F as shown in
Example 1, explicit request for active channel data:
Example 2, active channel replaced by temporary variable
In the second example, the compiler's optimizer has replaced the channel put with an assignment to a temporary variable that may be seen by both stages. It has also replaced the channel get with a call to the previous stage's service function, and has assigned the temporary variable to be the variable that was set to the result of the get. As a result of the replacement, a runtime system, or a compiler, is likely to schedule only F's service function and no queue is necessary. With the original compilation of Example 1, E and F would likely be scheduled when the channel is transformed into a queue.
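The replacement described above can be sketched as follows. This is a hedged illustration under stated assumptions, not a reproduction of Example 1 or Example 2: the Pipeline class, the method names, and the arithmetic are hypothetical. The methods e_service and f_service model the original form with a queue, and e_service_opt and f_service_opt model the form after the channel is replaced by a temporary variable.

```python
from collections import deque


class Pipeline:
    """Hypothetical container for actors E and F and their shared state."""

    def __init__(self, data):
        self.source = iter(data)
        self.channel = deque()   # queue used by the original form
        self.tmp = None          # temporary variable visible to both stages

    # Original form: E puts on the channel, F explicitly gets from it.
    def e_service(self):
        self.channel.append(next(self.source) * 2)   # channel put

    def f_service(self):
        d = self.channel.popleft()                   # channel get
        return d + 1

    # Transformed form: the put becomes an assignment to the temporary.
    def e_service_opt(self):
        self.tmp = next(self.source) * 2

    # The get becomes a call to E's service function, followed by
    # assigning the temporary to the variable the get used to set.
    def f_service_opt(self):
        self.e_service_opt()
        d = self.tmp
        return d + 1
```

In the transformed form, only f_service_opt needs to be scheduled, and the queue is never touched.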
In the example of
The data flow of the internal representation is subjected to a set of tests as shown in blocks 27, 31, and 33. More or fewer or different tests may be used depending on the particular implementation. In addition, the tests may be performed in a variety of different orders other than the one shown here. The first test is to determine whether the actor has more than one active input channel or queue at block 27. The actor in the present example is a source actor that may be putting data on an active channel. If there is more than one active channel to which the source actor puts data, then the replacement may not be made and the process returns to determine whether there are other actors to evaluate at block 29. For actors that have more than one active channel, it may be possible for a function call to block a channel put or a channel get. In order to avoid starving any of the channels, none of the channels are replaced with a temporary variable.
If the actor passes the first test at block 27, then it is passed to the next test shown in
If the active channel is used by only one source actor, then the process continues to the next test at block 33. In this test, it is determined whether the actor is already consumed. After an active channel has been replaced with a temporary variable, the actor that puts data on that channel is marked as consumed, as shown at block 49. The test at block 33 determines whether the actor already has a function call to a temporary variable. If so, then adding additional function calls to additional temporary variables may make the scheduling too complicated for the data flow program to handle. Again, actors may be starved, i.e. may not be able to access the data when needed, as a result. If the actor is already consumed, then the process returns to find more actors at block 29.
If the actor is not consumed and all of the tests are passed, then the output channel for that actor may be replaced with a function call to a temporary variable. In the example of
At block 41, a check is made to determine if there are any more channel puts from this source actor. If so, then the process returns to block 37 to find the puts and convert them to variable assignments. When there are no more channel puts from this source actor, then the activity on this channel from sink actors is investigated.
At block 43, a channel get function call from the same active channel by a sink actor is found. At block 45 the channel get is replaced with a function call to the source actor's service function and the source actor's assignment to the temporary variable. At block 47, it is determined whether there are any other sink actor channel gets for this active channel. If there are, then the process returns to block 43 to identify these gets and make a replacement. If there are no more channel gets, then the process proceeds to block 49.
At block 49, the source actor for which the replacement was made is marked as consumed. This marking is used in the initial test indicated at block 33. By marking the actor as consumed, function call conflicts may be avoided, as mentioned above. The sink actors are not marked as consumed. Having marked the source actor, the process returns to operate on the remaining actors at block 29. If there are no more actors, then the optimization of
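The flow through blocks 27 to 49 can be sketched as a single pass over a toy internal representation. This is a simplified illustration, not the described compiler: the Actor class, the tuple encoding of operations, and the tmp_ naming scheme are all hypothetical, and the first test is read here as counting the channels to which the source actor puts data.

```python
from dataclasses import dataclass


@dataclass
class Actor:
    name: str
    ops: list               # e.g. ("put", channel, value), ("get", channel, var)
    consumed: bool = False


def optimize(actors):
    """Replaces eligible channels with temporaries; returns their names."""
    replaced = []
    for src in actors:                                   # block 29: next actor
        chans = {op[1] for op in src.ops if op[0] == "put"}
        if len(chans) != 1:                              # block 27
            continue
        ch = next(iter(chans))
        writers = [a for a in actors
                   if any(op[0] == "put" and op[1] == ch for op in a.ops)]
        if len(writers) != 1:                            # block 31
            continue
        if src.consumed:                                 # block 33
            continue
        tmp = f"tmp_{ch}"
        # Blocks 37 to 41: every put becomes an assignment to the temporary.
        src.ops = [("assign", tmp, op[2]) if op[0] == "put" else op
                   for op in src.ops]
        # Blocks 43 to 47: every get on this channel becomes a call to the
        # source's service function plus an assignment from the temporary.
        for sink in actors:
            new_ops = []
            for op in sink.ops:
                if op[0] == "get" and op[1] == ch:
                    new_ops.append(("call", src.name))
                    new_ops.append(("assign", op[2], tmp))
                else:
                    new_ops.append(op)
            sink.ops = new_ops
        src.consumed = True                              # block 49
        replaced.append(ch)
    return replaced
```

In this single-pass sketch each actor is visited once, so the consumed check only has an effect when an actor was marked before the pass began. A fuller implementation would revisit actors and would apply the tests in whatever order suits the implementation.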
The diagram of
The optimization process of
As can be understood from the description above and from
The MCH may also have an interface, such as a PCI (peripheral component interconnect) Express or AGP (accelerated graphics port) interface, to couple with a graphics controller 341 which, in turn, provides graphics and possibly audio to a display 337. The PCI Express interface may also be used to couple to other high speed devices. In the example of
The ICH 365 offers possible connectivity to a wide range of different devices. Well-established conventions and protocols may be used for these connections. The connections may include a LAN (Local Area Network) port 369, a USB hub 371, and a local BIOS (Basic Input/Output System) flash memory 373. A SIO (Super Input/Output) port 375 may provide connectivity to a keyboard, a mouse, and other I/O devices. The ICH may also provide an IDE (Integrated Device Electronics) bus or SATA (serial advanced technology attachment) bus for connections to disk drives 387, or other large memory devices.
The particular nature of any attached devices may be adapted to the intended use of the device. Any one or more of the devices, buses, or interconnects may be eliminated from this system and others may be added. For example, video may be provided on a PCI bus, on an AGP bus, through the PCI Express bus or through an integrated graphics portion of the host controller.
A lesser or more equipped optimization, process flow, or computer system than the examples described above may be preferred for certain implementations. Therefore, the configuration and ordering of the examples provided above may vary from implementation to implementation depending upon numerous factors, such as the hardware application, price constraints, performance requirements, technological improvements, or other circumstances. Embodiments of the present invention may also be adapted to other types of data flow and software languages than the examples described herein.
Embodiments of the present invention may be provided as a computer program product which may include a machine-readable medium having stored thereon instructions which may be used to program a general purpose computer, mode distribution logic, memory controller or other electronic devices to perform a process. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, flash memory, or other types of media or machine-readable medium suitable for storing electronic instructions. Moreover, embodiments of the present invention may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer or controller to a requesting computer or controller by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
In the description above, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. For example, well-known equivalent components and elements may be substituted in place of those described herein, and similarly, well-known equivalent techniques may be substituted in place of the particular techniques disclosed. In other instances, well-known circuits, structures and techniques have not been shown in detail to avoid obscuring the understanding of this description.
While the embodiments of the invention have been described in terms of several examples, those skilled in the art may recognize that the invention is not limited to the embodiments described, but may be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.