Embodiments are generally related to data-processing systems and methods. Embodiments also relate in general to the field of computers and similar technologies, and in particular to software utilized in this field. In addition, embodiments relate to loop nest structures.
A loop is a repetitive sequence of computations in a computer program, commonly defining a CIV (Controlling Induction Variable). The CIV can be initialized to a lower bound before the loop begins and can then be incremented by a fixed value at each loop iteration, and its current value can be tested against an upper bound as a stopping condition for the loop. A collection of loops contained within a single parent loop is called a loop nest structure.
The loop nest structures can be utilized for computations that involve multidimensional arrays such as vectors, matrices, etc., where the loop's CIVs can be utilized for accessing array members. In such computations it can be preferable to unroll the parent loop by a fixed number of iterations, called the unroll factor, and fuse the child loop nests to form a single perfectly nested loop nest. This form of optimization is known as unroll-and-jam, which improves computation performance by reusing some of the array elements being accessed in subsequent iterations of the parent loop.
Loop unrolling is a well-known program transformation utilized by programmers and program optimizers to improve instruction-level parallelism and register locality and to decrease the branching overhead of program loops. Residues form the portion of the loop that cannot be executed when the loop is unrolled by the unroll factor. That is, since the controlling induction variable of the unrolled outer loop is advanced a fixed number of times in every iteration, if the upper bound is not evenly divisible by the unroll factor (i.e., when the modulus of the upper bound of the outer loop induction variable and the unroll factor is not zero), then code must be generated to handle the residue. The code generated to handle these residues may add overhead and inefficiencies that can result in performance degradation.
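As a minimal sketch of the residue problem described above, the following Python fragment unrolls a single loop by an assumed unroll factor of two and handles the leftover iteration in a separate residual loop. The function name `scale_unrolled` and the scaling computation are hypothetical and chosen only for illustration; they do not appear in the original examples.

```python
UF = 2  # assumed unroll factor


def scale_unrolled(src, factor):
    """Multiply every element of src by factor using a loop unrolled by UF."""
    n = len(src)
    out = [0] * n
    main_end = n - (n % UF)  # last index covered by the unrolled loop

    # Unrolled loop: the induction variable advances by UF per iteration.
    for i in range(0, main_end, UF):
        out[i] = src[i] * factor
        out[i + 1] = src[i + 1] * factor

    # Residual loop: runs only when n is not evenly divisible by UF.
    for i in range(main_end, n):
        out[i] = src[i] * factor

    return out
```

With five input elements, the unrolled loop covers indices 0 through 3 and the residual loop covers index 4.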
An exemplary two dimensional nested loop having an outer loop with an induction variable “i” and an inner loop with an induction variable “j” is illustrated below as Nested Loop Source Code Example 1:
The induction variables “i” and “j” of Example 1 are both unrolled and jammed by an unroll factor of two utilizing a prior art approach as illustrated in TABLE 1. The program code replicates the original loop nest of Example 1 for each dimension of “i” and “j” being unrolled and then alters the bounds of the generated nests to cause them to traverse through the residual iterations of the dimension being handled. The program code illustrated in TABLE 1 includes a separate unroll stage and fuse stage for each dimension of “i” and “j”, which generally reduces compile-time efficiency and causes performance degradation.
Note that only outer loops can be unrolled-and-jammed. The ‘jamming’ effect discussed above refers to taking the copies of their “child” loops and jamming them together to form a single child loop.
Now the ‘jamming’ (or ‘fusing’) effect will convert the two j-loops into a single loop that does both statements, and produce:
Now the j-loop can be unrolled if preferred (e.g. by a factor of 2), which would produce (again, ignoring residue):
As one can see, the j-loop is unrolled, but since it does not contain any child loops, there is no ‘jamming’ for that loop. Thus, the “outer loop” with an induction variable “i” is being unrolled and jammed by an unroll factor of two, and the innermost loop with induction variable “j” is being unrolled by a factor of two utilizing the prior art approach discussed above.
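The sequence just described (unroll the outer loop, jam the copies of the inner loop, then unroll the inner loop) can be sketched on a stand-in computation. The matrix-vector product below is an assumption made for illustration and is not the loop body of Example 1 or the code of TABLE 1; the function names are hypothetical. As in the discussion above, residue is ignored, so both dimensions are assumed to be even.

```python
def matvec(a, x):
    """Reference loop nest: y[i] accumulates a[i][j] * x[j]."""
    n, m = len(a), len(x)
    y = [0] * n
    for i in range(n):
        for j in range(m):
            y[i] += a[i][j] * x[j]
    return y


def matvec_unroll_jam(a, x):
    """Outer loop unrolled-and-jammed by 2, inner loop unrolled by 2.

    Assumes len(a) and len(x) are both even (residue is ignored).
    """
    n, m = len(a), len(x)
    if n % 2 or m % 2:
        raise ValueError("this sketch ignores residue; sizes must be even")
    y = [0] * n
    for i in range(0, n, 2):          # outer loop advances by 2
        for j in range(0, m, 2):      # single jammed inner loop, unrolled by 2
            x0, x1 = x[j], x[j + 1]   # each x element is loaded once, used for two rows
            y[i] += a[i][j] * x0
            y[i + 1] += a[i + 1][j] * x0
            y[i] += a[i][j + 1] * x1
            y[i + 1] += a[i + 1][j + 1] * x1
    return y
```

The reuse of `x0` and `x1` across rows `i` and `i + 1` is the kind of array-element reuse that unroll-and-jam is intended to expose.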
Referring to
The iteration space of the residual nest for “i” dimension 310 overlaps the residual iteration space for “j” dimension 320. The overlapping results in a duplicate traversal of the iteration space 300. Unfortunately, this approach does not provide an easy way to deal with the independence of each replica of the original loop nest and the lack of coordination between the generated residual nests. As a result, bounds of more than one dimension need to be altered for each residual nest, even though only one dimension is being handled.
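The overlap can be made concrete with a small counting experiment. The sketch below is an assumption about how uncoordinated residual nests behave, not a reproduction of TABLE 1: each dimension's residual nest is generated independently with only its own bound altered, and a hypothetical helper `independent_residual_visits` counts how often each index pair is traversed.

```python
from collections import Counter

UF = 2  # assumed unroll factor for both dimensions


def independent_residual_visits(n, m):
    """Count visits per (i, j) when each residue is handled independently."""
    visits = Counter()
    i_main = n - n % UF
    j_main = m - m % UF

    # Unrolled-and-jammed nest over the evenly divisible part.
    for i in range(0, i_main, UF):
        for j in range(0, j_main, UF):
            for di in range(UF):
                for dj in range(UF):
                    visits[(i + di, j + dj)] += 1

    # Residual nest for "i": leftover rows, every column.
    for i in range(i_main, n):
        for j in range(m):
            visits[(i, j)] += 1

    # Residual nest for "j": every row, leftover columns.
    for i in range(n):
        for j in range(j_main, m):
            visits[(i, j)] += 1

    return visits
```

For a 5-by-5 space, the corner index pair (4, 4) belongs to both residual nests, so it is traversed twice unless the bounds of a second dimension are also altered.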
The creation of the residue causes perfect triangular nested loops, i.e., nested loops where the inner loop induction variable “j” is bounded on the upper end by the value of the outer loop induction variable “i”, to no longer be “perfect”. As a result, other optimization techniques which are only applicable to perfect loop nests cannot be additionally applied. The prior art unroll-and-jam approach depicted in
Therefore, a need exists for an improved method and system for performing an extended unroll-and-jam transformation that can handle imperfect loop nests and loop nests that contain loops with bounds that are linear functions of the CIV of the nested loops.
The following summary is provided to facilitate an understanding of some of the innovative features unique to the present invention and is not intended to be a full description. A full appreciation of the various aspects of the embodiments disclosed herein can be gained by taking the entire specification, claims, drawings, and abstract as a whole.
It is, therefore, one aspect of the present invention to provide for an improved data-processing method, system and computer-usable medium.
It is another aspect of the present invention to provide for a method, system and computer-usable medium for performing efficient unrolling of imperfect loop nests.
The aforementioned aspects and other objectives and advantages can now be achieved as described herein. A computer-implemented method, system and computer program product for efficient unrolling of imperfect loop nests are disclosed. A virtual iteration space can be determined based on an unroll factor (UF), and the iteration space for each dimension of a nested loop can be divided into a residual iteration space and a non-residual iteration space utilizing an unroll-and-jam transformation. The non-residual iteration space for one dimension can be utilized for categorizing the residual and non-residual iteration space for the next dimension. This approach can be applied recursively to all dimensions, and the non-residual iteration space from the last dimension can be removed in order to get a clean perfect loop nest. This method can also be applied to triangular loop nests and nested loops having three or more dimensions.
The residual iterations can be either traversed at the beginning of the iteration space as a “head residue” or at the end of the iteration space as a “tail residue”. The child loop and an intervening code of an imperfectly nested loop can be replicated, and the intervening code can be moved to either the beginning or the end of the loop in order to fuse the child loop into a single child loop nest. The method and system disclosed in greater detail herein result in an efficient compile-time direct loop optimization transformation. This method can also handle imperfect loop nests with an improved overall run-time performance for program execution.
The accompanying figures, in which like reference numerals refer to identical or functionally-similar elements throughout the separate views and which are incorporated in and form a part of the specification, further illustrate the present invention and, together with the detailed description of the invention, serve to explain the principles of the present invention.
The particular values and configurations discussed in these non-limiting examples can be varied and are cited merely to illustrate at least one embodiment and are not intended to limit the scope of such embodiments.
As depicted in
Illustrated in
The following description is presented with respect to embodiments of the present invention, which can be embodied in the context of a data-processing system such as data-processing system 100 and computer software system 150 depicted in
Referring to
An exemplary two dimensional nested loop having an outer loop with an induction variable “i” and an inner loop with an induction variable “j” is illustrated as Nested Loop Source Code Example 1. The source code file can be parsed in order to identify nested loops, as illustrated at block 420. An iteration space for a first dimension of the nested loop can be categorized into a residual iteration space and a non-residual or remaining iteration space by applying unroll-and-jam transformation, as depicted at block 430. The residual iterations can be either traversed at the beginning of the iteration space as “head residue” or at the end of the iteration space as “tail residue”. The “head residue” can be defined as a residual nest, which traverses the beginning of the iteration space whereas the “tail residue” can be defined as a residual nest traversing the indices at the end of the iteration space. For example, consider TABLE 2 below, which illustrates software code after categorizing a dimension “i” of a two-dimension loop into a residual iteration space and a non-residual or a remaining iteration space.
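The split of one dimension into a residual and a non-residual iteration space, with the residue placed at either the head or the tail, can be sketched as follows. This is a minimal illustration under stated assumptions and is not the code of TABLE 2: the helper name `split_dimension`, the half-open bounds, the unit stride, and the requirement that `upper >= lower` are all assumptions.

```python
def split_dimension(lower, upper, uf, head=False):
    """Split the half-open index range [lower, upper) for unroll factor uf.

    Returns (residual, non_residual) as Python range objects.
    Assumes upper >= lower, uf >= 1, and a unit-stride induction variable.
    """
    residue = (upper - lower) % uf  # iterations the unrolled loop cannot cover
    if head:
        # Head residue: leftover iterations are traversed first.
        return range(lower, lower + residue), range(lower + residue, upper)
    # Tail residue: leftover iterations are traversed last.
    return range(upper - residue, upper), range(lower, upper - residue)
```

For seven iterations and an unroll factor of two, the tail form leaves index 6 as the residue and indices 0 through 5 for the unrolled loop, while the head form leaves index 0 as the residue and indices 1 through 6 for the unrolled loop.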
Referring to
The iteration space 500 can be divided into a residual iteration space for “i” dimension 410 and a non-residual or remaining iteration space for “i” dimension 420. The virtual iteration space 500 is dependent upon the unroll factor (UF). The unroll factor can be determined by a compiler (not shown), user input, or preferably a combination of the two. The remaining iteration space for “i” dimension 420, which is covered by the unroll-and-jam version of the loop, traverses the set of indices for the next dimension “j”. The virtual iteration space 500 can be determined based on the unroll factor (UF) of two. Bracket 510 represents the left hand-side of the graphical representation of residual iteration space 500 depicted in
A test can then be performed, as depicted at block 440, to determine whether a next dimension has been found in the nested loop. If a next dimension is found, then the next dimension of the nested loop can be received, as depicted at block 450. Next, as described at block 460, the non-residual iteration space of the previous dimension can be utilized in order to categorize the next dimension of the nested loop into a residual iteration space and a non-residual iteration space. For example, the code for categorizing dimension “j” utilizing the non-residual iteration space of dimension “i” is illustrated in TABLE 3.
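The effect of categorizing “j” only within the non-residual iteration space of “i” can be checked with a small counting sketch. It is an illustration under assumptions (tail residue, an unroll factor of two in both dimensions, a rectangular space) and is not the code of TABLE 3; the helper name `coordinated_visits` is hypothetical.

```python
from collections import Counter

UF = 2  # assumed unroll factor for both dimensions


def coordinated_visits(n, m):
    """Count visits per (i, j) when "j" is split inside non-residual "i" only."""
    visits = Counter()
    i_main = n - n % UF  # end of the non-residual space of "i"
    j_main = m - m % UF  # end of the non-residual space of "j"

    # Non-residual "i", non-residual "j": the unrolled-and-jammed nest.
    for i in range(0, i_main, UF):
        for j in range(0, j_main, UF):
            visits[(i, j)] += 1
            visits[(i + 1, j)] += 1
            visits[(i, j + 1)] += 1
            visits[(i + 1, j + 1)] += 1

    # Non-residual "i", residual "j": leftover columns of the covered rows.
    for i in range(0, i_main):
        for j in range(j_main, m):
            visits[(i, j)] += 1

    # Residual "i": leftover rows traverse every column exactly once.
    for i in range(i_main, n):
        for j in range(m):
            visits[(i, j)] += 1

    return visits
```

Because the residual nest for “j” is confined to the rows already assigned to the non-residual space of “i”, the three regions are disjoint and every index pair is traversed exactly once.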
Referring to
The non-residual iteration space of the last dimension of the nested loop can be removed, as illustrated at block 470. The residual portions of the loop can be determined and code can be generated in order to form a perfect loop nest, as shown at block 480. The residual iteration space 550 of
The method 400 can also be applied to triangular loop nests and nested loops having three or more dimensions. For example consider TABLE 4 that includes a two-dimensional triangular loop with “i” and “j” dimensions and the diagrammatic view of the residual iteration space is illustrated in
The residual iteration space for dimension “i” can be calculated as illustrated in TABLE 5. The diagrammatic view of a residual iteration space of dimension “i” for the exemplary two-dimensional triangular nested loop is illustrated at
Referring to
The slicing loop as shown in TABLE 6 can be introduced whenever a dimension triangularly depends on the current dimension being handled. The set of indices covered by dimension “j” can easily be categorized into the required sets such as residual iteration space and remaining iteration space utilizing the slicing loop, as follows:
The method 400 as illustrated in
Referring to
Since the dimension “j” is triangularly dependent on dimension “k”, the remaining iteration space of the dimension “k” can be surrounded by a slicing loop. Thereafter, the dimension “j” can be divided into a first residual iteration space, a second residual iteration space and a remaining iteration space using a k-slicer. In order to prevent duplicate traversal of iterations, the remaining and second residual iteration spaces of the “j” dimension can be removed from the generated residual loop nests to get a clean perfect loop nest. The introduction of the induction variable of the k-slicer can allow separate handling of the two residual spaces for a triangular dimension. This allows processing of triangulated dimensions up to any length without any further complexities. An exemplary transformed code generated for a three-dimensional loop is illustrated in TABLE 9.
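A simpler two-dimensional analogue may help in picturing how a slicing loop separates the iterations of a triangularly dependent dimension. The sketch below is one possible reading of the idea and is not the code of TABLE 6 or TABLE 9: it assumes a triangular nest in which “j” runs from 0 through “i” inclusive, an unroll factor of two on “i” only, and a tail residue. The helper name `triangular_visits` is hypothetical.

```python
from collections import Counter

UF = 2  # assumed unroll factor for the outer dimension "i"


def triangular_visits(n):
    """Count visits per (i, j) for the nest: for i in 0..n-1, for j in 0..i."""
    visits = Counter()
    i_main = n - n % UF

    # Slicing loop: each step selects a slice of UF consecutive rows.
    for s in range(0, i_main, UF):
        # Columns shared by both rows of the slice (j <= s): jammed body.
        for j in range(s + 1):
            visits[(s, j)] += 1
            visits[(s + 1, j)] += 1
        # Column reached only by the longer row of the slice.
        visits[(s + 1, s + 1)] += 1

    # Tail residue for "i": leftover rows keep the original triangular bound.
    for i in range(i_main, n):
        for j in range(i + 1):
            visits[(i, j)] += 1

    return visits
```

Within each slice, the columns common to both rows form a rectangular block that can be jammed, and the extra column of the longer row is handled separately, so no index pair is traversed twice.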
It should be understood that at least some aspects of the present invention may alternatively be implemented in a computer-useable medium that contains a program product. For example, the process depicted in
It should be understood, therefore, that such signal-bearing media when carrying or encoding computer readable instructions that direct method functions in the present invention, represent alternative embodiments of the present invention. Further, it is understood that the present invention may be implemented by a system having means in the form of hardware, software, or a combination of software and hardware as described herein or their equivalent.
Thus, the method 400 described herein, and in particular as shown and described in
While the present invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention. Furthermore, as used in the specification and the appended claims, the term “computer” or “system” or “computer system” or “computing device” includes any data processing system including, but not limited to, personal computers, servers, workstations, network computers, mainframe computers, routers, switches, Personal Digital Assistants (PDAs), telephones, and any other system capable of processing, transmitting, receiving, capturing and/or storing data.
It will be appreciated that variations of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Also, various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art, which are also intended to be encompassed by the following claims.