Method for optimizing array bounds checks in programs

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally relates to computer programming and, more particularly, to a method for a compiler, programmer or run-time system to transform a program so as to reduce the overhead of determining if an invalid reference to an array element is attempted, while strictly preserving the original semantics of the program.

2. Background Description

The present invention is a method for a compiler, program translation system, run-time system or a programmer to transform a program or stream of instructions in some machine language so as to minimize the number of array reference tests that must be performed while maintaining the exact semantics of the program that existed before the transformation.

A single-dimensional array A is defined by a lower bound lo(A) and an upper bound up(A). A[lo(A):up(A)] represents an array with (up(A)−lo(A)+1) elements. An array element reference is denoted by A[σ], where σ is an index into the array, also called a subscript expression. This subscript expression may be a constant value at run-time, or it may be computed by evaluating an expression which has both constant and variable terms. For the reference A[σ] to be valid, A must represent an existing array, and σ must be within the valid range of indices for A: lo(A), lo(A)+1, . . . , up(A). If A does not represent an existing array, we say that A is null. If σ is not within the valid range of indices for A, we say that σ is out of bounds. The purpose of array reference tests is to guarantee that all array references are valid. If an array reference is not valid we say that it produces an array reference violation. We can define lo(A)=0 and up(A)=−1 for null arrays. In this case, the valid range of indices is empty, so out of bounds tests cover all violations. Array references typically (but not exclusively) occur in the body of loops. The loop index variable is often used in subscript expressions within the body of the loop.
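The validity condition above can be sketched in a few lines of Python. This is only an illustration: the `Bounds` type and the function names are hypothetical, and an array is modeled purely by its pair of bounds, with `None` standing for a null array.

```python
from typing import Optional, Tuple

# Hypothetical model: an array is described only by its (lo, up) bounds;
# None stands for a null (non-existent) array.
Bounds = Optional[Tuple[int, int]]


def effective_bounds(a: Bounds) -> Tuple[int, int]:
    """Apply the convention lo(A)=0, up(A)=-1 for null arrays."""
    return (0, -1) if a is None else a


def is_valid_reference(a: Bounds, sigma: int) -> bool:
    """A[sigma] is valid iff lo(A) <= sigma <= up(A)."""
    lo, up = effective_bounds(a)
    return lo <= sigma <= up
```

Because the null convention yields an empty index range, the single bounds comparison rejects every subscript of a null array without a separate null test.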

The goal of our method and its variants is to produce a program, or segment of one, in which all array reference violations are detected by explicit array reference tests. This is achieved in an efficient manner, performing a reduced number of tests. The ability to perform array reference tests in a program is important for at least three reasons:

1. Accesses to array elements outside the range of valid indices for the array have been used in numerous attacks on computer systems. See S. Garfinkel and G. Spafford, Practical Unix and Internet Security, O'Reilly and Associates (1996), and D. Dean, E. Felton and D. Wallach, “Java security: From HotJava to Netscape and beyond”, Proceedings of the 1996 IEEE Symposium on Security and Privacy, May 1996.

2. Accesses to array elements outside the range of valid indices for the array are a rich source of programming errors. Often the error does not exhibit itself until long after the invalid reference, making correction of the error a time-consuming and expensive process. The error may never flagrantly exhibit itself, leading to subtle and dangerous errors in computed results.

3. Detection of array reference violations is mandated by the semantics of some programming languages, such as Java™. (Java is a trademark of Sun Microsystems, Inc.)

Naively checking every array reference that occurs has an adverse effect on the execution time of the program. Thus, programs are often run with tests disabled. Therefore, to fully realize the benefits of array reference tests it is necessary that they be done efficiently.

Prior art for detecting an out of bounds array reference or a memory reference through an invalid memory address can be found in U.S. Pat. No. 5,535,329, U.S. Pat. No. 5,335,344, U.S. Pat. No. 5,613,063, and U.S. Pat. No. 5,644,709. These patents give methods for performing bounds tests in programs, but do not give methods for a programmer, a compiler or translation system for a programming language, or a run-time system to reduce the number of tests.

Prior art exists for the narrow goal of simply reducing the run-time overhead of array bounds testing, while changing the semantics of the program in the case where an error occurs. See P. Cousot and R. Cousot, “Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints”, Conference Record of the 4th ACM Symposium on Principles of Programming Languages, pp. 238-252, January 1977; W. H. Harrison, “Compiler analysis for the value ranges for variables”, IEEE Transactions on Software Engineering, SE3(3):243-250, May 1977; P. Cousot and N. Halbwachs, “Automatic discovery of linear restraints among variables in a program”, Conference Record of the 5th ACM Symposium on Principles of Programming Languages, pp. 84-96, January 1978; P. Cousot and N. Halbwachs, “Automatic proofs of the absence of common runtime errors”, Conference Record of the 5th ACM Symposium on Principles of Programming Languages, pp. 105-118, January 1978; B. Schwarz, W. Kirchgassner and R. Landwehr, “An optimizer for Ada - design, experience and results”, Proceedings of the ACM SIGPLAN '88 Conference on Programming Language Design and Implementation, pp. 175-185, June 1988; V. Markstein, J. Cocke and P. Markstein, “Elimination of redundant array subscript range checks”, Proceedings of the ACM SIGPLAN '82 Conference on Programming Language Design and Implementation, pp. 114-119, June 1982; R. Gupta, “A fresh look at optimizing array bounds checking”, Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation, pp. 272-282, June 1990; P. Kolte and M. Wolfe, “Elimination of redundant array subscript range checks”, Proceedings of the ACM SIGPLAN '95 Conference on Programming Language Design and Implementation, pp. 270-278, June 1995; J. M. Asuru, “Optimization of array subscript range checks”, ACM Letters on Programming Languages and Systems, 1(2):109-118, June 1992; R. Gupta, “Optimizing array bounds checks using flow analysis”, ACM Letters on Programming Languages and Systems, 1-4(2):135-150, March-December 1993; and U.S. Pat. No. 4,642,765.

These approaches fall into two major groups. In the first group, P. Cousot and R. Cousot, W. H. Harrison, P. Cousot and N. Halbwachs (both citations), and B. Schwarz et al., analysis is performed on the program to determine that a reference A_2[σ_2] in a program will lead to an out of bounds array reference only if a previous reference A_1[σ_1] also leads to an out of bounds array reference. Thus, if the program is assumed to terminate at the first out of bounds array reference, only reference A_1[σ_1] needs to be checked, since reference A_2[σ_2] will never actually perform an out of bounds array reference. These techniques are complementary to our method. That is, they can be used in conjunction with our method to provide some benefit, but are not necessary for our method to work, or for our method to provide its benefit.

In the second group, V. Markstein et al., R. Gupta (both citations), P. Kolte et al., J. M. Asuru, and U.S. Pat. No. 4,642,765, bounds tests for array references within loops of a program are optimized. Consider an array reference of the form A[σ] where the subscript expression σ is a linear function in a loop induction variable. Furthermore, the value of σ can be computed prior to the execution of each iteration. A statement is inserted to test this value against the bounds of the array prior to the execution of each iteration. The statement raises an exception if the value of σ in a reference A[σ] is less than lo(A), or is greater than up(A). The transformations in this group can also use techniques from the first group to reduce the complexity of the resulting tests.

The first group has two weaknesses. First, at least one test must remain in the body of any loop whose index variable is used to subscript an array reference. In general, since inner-most loops index arrays, and since the number of iterations greatly exceeds the number of operations within a single iteration, the overhead of bounds testing is linearly proportional to the running time of the program, albeit with a smaller constant factor than in the unoptimized program (which tests all references). Second, the methods as described do not work in loops with constructs such as Java's “try/catch” block. If the methods are extended to work with these constructs, the scope over which redundant tests can be found will be reduced, and in general the effectiveness of the transformation will be reduced.

The second group overcomes these weaknesses, but at the expense of no longer throwing the exception at the precise point in program execution at which the invalid reference occurs. For example, the exception for an out of bounds reference that occurs in an iteration of a loop is thrown, after the transformation, before the iteration begins executing. This can make debugging the cause of an invalid access more difficult. (Diagnostic code within the program set up to give debugging information might not execute after the transformation.) Also, for certain programming languages like Java, the resulting program almost certainly does not preserve the original semantics.
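The loss of precision described above can be made concrete with a small Python sketch. Everything here is hypothetical and illustrative: `original` tests immediately before the reference, `hoisted` moves the test to the start of the iteration in the style of the second group, and the `log` list stands in for any observable side effect (such as diagnostic output) that precedes the reference inside the iteration.

```python
def original(a, lo, up, l, u, sigma, log):
    """Test immediately before each reference (precise exceptions)."""
    for i in range(l, u + 1):
        log.append(i)                      # observable side effect of iteration i
        s = sigma(i)
        if s < lo or s > up:
            raise IndexError(f"A[{s}] out of bounds")
        a[s - lo] = i


def hoisted(a, lo, up, l, u, sigma, log):
    """Test moved to the start of each iteration (second-group style)."""
    for i in range(l, u + 1):
        s = sigma(i)
        if s < lo or s > up:
            raise IndexError(f"A[{s}] out of bounds")
        log.append(i)                      # never runs for the violating iteration
        a[s - lo] = i


def run(version):
    """Run one version on A[0:2] with loop range 0..3 and return the log."""
    a, log = [0] * 3, []
    try:
        version(a, 0, 2, 0, 3, lambda i: i, log)
    except IndexError:
        pass
    return log
```

With these definitions, `run(original)` records iterations 0 through 3, while `run(hoisted)` records only 0 through 2: the side effect of the violating iteration is lost, which is exactly the semantic difference at issue.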

Finally, none of the methods in the two groups are thread safe. The methods of the two groups have no concept of actions occurring in other threads that may be changing the size of data objects in the thread whose code is being transformed. Thus, in programs which have multiple threads, the transformations may not catch some violations, and at the same time erroneously detect some non-existent violations.

The methods we describe overcome all of these objections. When our methods are applied to programs amenable to the techniques of the first group, the number of tests that do not result in detecting a violation is less than linearly proportional to the running time of the program. Our methods also handle “try/catch” constructs. Our methods detect when an out of bounds reference is to occur immediately before the reference occurs in the original program semantics. Thus, the state of the program as reflected in external persistent data structures or observable via a debugger is identical to the state in the original program. Our transformations are also thread safe and efficient to execute. Moreover, they expose safe regions of code, that is, code that is guaranteed not to perform an invalid reference, to more aggressive compiler optimizations.

Next, we introduce some notation and concepts used in describing the preferred embodiment. We discuss the issues of multi-dimensional arrays and arrays of arrays. We then present our notation for loops, loop bodies, sections of straight-line code, and array references. Finally, we discuss some issues related to loops.

Arrays can be multi-dimensional. A d-dimensional array has d axes, and each axis can be treated as a single-dimensional array. Without loss of generality, the indexing operations along each axis can be treated independently. A particular case of multi-dimensional arrays are rectangular arrays. A rectangular array has uncoupled bounds along each axis. Some programming languages allow ragged arrays, in which the bounds for one axis depend on the index variable for another axis. Ragged arrays are usually implemented as arrays of arrays, instead of true multi-dimensional arrays. For arrays of arrays, each indexing operation can also be treated independently.

A body of code is represented by the letter B. We indicate that a body of code B contains expressions on variables i and j with the notation B(i,j). A body of code can be either a section of straight line code, indicated by S, a loop, indicated by L, or a sequence of these components.

Let L(i,l,u,B(i)) be a loop on index variable i. The range of values for i is [l,l+1, . . . ,u]. If l>u, then the loop is empty (zero iterations). The body of the loop is B(i). This corresponds to the following code:

do i=l,u

B(i)

end do,

which we call a do-loop. Let the body B(i) of the loop contain array references of the form A[σ], where A is a single-dimensional array or an axis of a multi-dimensional array. In general, σ is a function of the loop index: σ=σ(i). If the body B(i) contains ρ references of the form A[σ], we label them A_1[σ_1], A_2[σ_2], . . . , A_ρ[σ_ρ].

In the discussion of the preferred embodiment, all loops have a unit stride. Because loops can be normalized, this is not a restriction. Normalization of a loop produces a loop whose iteration space has a stride of “1”. Loops with positive nonunity strides can be normalized by the transformation

\begin{matrix} do i = l_{i}, u_{i}, s_{i} & & do i = 0, \lfloor \frac{u_{i} - l_{i}}{s_{i}} \rfloor \\ B(i) & \Rightarrow & B(l_{i} + i s_{i}) \\ end do & & end do \end{matrix}

A loop with a negative stride can be first transformed into a loop with a positive stride:

\begin{matrix} do i = u_{i}, l_{i}, -s_{i} & & do i = l_{i}, u_{i}, s_{i} \\ B(i) & \Rightarrow & B(u_{i} + l_{i} - i) \\ end do & & end do \end{matrix}
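As a quick check of the two normalizations above, the following Python sketch lists the index values that reach the loop body before and after each transformation. The function names are hypothetical, and the stride s is assumed to be a positive integer in both cases.

```python
def strided(l, u, s):
    """Values visited by 'do i = l, u, s' for a positive stride s."""
    return list(range(l, u + 1, s))


def normalized(l, u, s):
    """Values l + i*s passed to B by the loop 'do i = 0, floor((u-l)/s)'."""
    return [l + i * s for i in range(0, (u - l) // s + 1)]


def negative(u, l, s):
    """Values visited by 'do i = u, l, -s' (s > 0)."""
    return list(range(u, l - 1, -s))


def negative_as_positive(u, l, s):
    """Values u + l - i passed to B by the loop 'do i = l, u, s'."""
    return [u + l - i for i in range(l, u + 1, s)]
```

For example, `strided(1, 10, 3)` and `normalized(1, 10, 3)` both give `[1, 4, 7, 10]`, and `negative(10, 2, 3)` and `negative_as_positive(10, 2, 3)` both give `[10, 7, 4]`, so the body sees the same sequence of values in the same order.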

Loops are often nested within other loops. The nesting can be either perfect, as in

\begin{matrix} do i = l_{i}, u_{i} \\ do j = l_{j}, u_{j} \\ B (i, j) \\ end do \\ end do, \end{matrix}

where all the computation is performed in the body of the inner loop, or not perfect, where multiple bodies can be identified. For example,

\begin{matrix} do i = l_{i}, u_{i} \\ do j = l_{j}, u_{j} \\ B_{1} (i, j) \\ end do \\ do k = l_{k}, u_{k} \\ B_{2} (i, k) \\ end do \\ end do, \end{matrix}

has bodies B_1(i,j) and B_2(i,k) where computation is performed.

Finally, we note that standard control-flow and data-flow techniques (see S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, 1997) can be used to recognize many “for”, “while”, and “do-while” loops, which occur in Java and C, as do-loops. Many “go to” loops, occurring in C and Fortran, can be recognized as do-loops as well.

SUMMARY OF THE INVENTION

The present invention provides a method for reducing the number of array reference tests performed during the execution of a program while detecting all invalid references and maintaining the same semantics as the original program. The invention describes several methods for providing this functionality. The general methodology of all variants is to determine regions of a program execution that do not need any explicit tests, and other regions that do need tests. These regions may be lexically identical, and possibly generated only at run-time. (This is the case of loop iterations, some of which may need tests, and some of which may not.) The different methods and variants then describe how to generate code consisting both of sections with explicit tests and sections with no explicit tests. The regions of the program that are guaranteed not to cause violations execute the sections of code without tests. The regions of the program that can cause violations execute the sections of code with at least enough tests to detect the violation. The methods and variants differ in the number of sections that need to be generated, the types of tests that are created in the sections with tests, and the structure of the program amenable to the particular method.

In the most general form, an inspector examines the run-time instruction stream. If an instruction causes an array reference violation, an appropriate test instruction and the original instruction are sent to an executor for execution. The inspector may translate the instructions to a form more suitable for the executor.

The first major variant on the method works by transforming loops in the program. It essentially implements the inspector through program transformations. The transformation can either be performed at the source level by a programmer or preprocessor, or in an intermediate form of the program by a compiler or other automatic translator. For each loop, up to 5^ρ versions of the loop body are formed, where ρ is the number of array references in the loop that are subscripted by the loop control variable. Each version implements a different combination of array reference tests. A driver loop dynamically selects which version of the loop to use for each iteration. A variant of this method uses compile-time analysis to recognize that some of the versions of the loop will never execute, and therefore these versions of the loop need not be instantiated in the program. Loop nests are handled by recursively applying the transformation to each loop in the nest.

The second major variant on the method also works by transforming loops in the program. The transformation can either be performed at the source level by a programmer or preprocessor, or in an intermediate form of the program by a compiler or other automatic translator. The iteration space of the loop to be transformed is divided into three regions, each having one of the following properties:

1. all array references using a loop control variable are valid;

2. all iterations whose value of the loop control variable is less than those contained in the first region; and

3. all iterations whose value of the loop control variable is greater than those contained in the first region.

We call the first region the safe region, and it is guaranteed not to generate a violation on those array references involving the loop control variable. The other two regions are unsafe as they may generate violations in those references. Appropriate versions (sections) of code are generated to implement each region. Tests are generated only for code sections implementing unsafe regions of the iteration space. The exact definition of the regions and the form of the code sections implementing them depends on many implementation options. In particular, it is possible to implement this method with only two distinct versions of code. Loop nests are handled by recursively applying the transformation to each loop in the nest. In some situations, it is possible to then hoist and coalesce generated code using standard techniques.
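The three-region split can be sketched as follows for the simplest case. This is an illustration under stated assumptions, not the transformation itself: the loop body contains a single reference A[i + c] with a constant offset c, and the names `split_regions` and `run_loop` are hypothetical. Regions are inclusive (start, end) pairs, and a pair with start > end is empty.

```python
def split_regions(l, u, lo, up, c):
    """Split iteration space [l, u] for the subscript sigma(i) = i + c.

    Returns (before, safe, after). Only 'safe' is guaranteed free of
    violations; 'before' and 'after' must execute with tests.
    """
    safe_start = max(l, lo - c)
    safe_end = min(u, up - c)
    if safe_start > safe_end:
        # No safe iterations: the whole loop runs with tests.
        return (l, u), (u + 1, u), (u + 1, u)
    return (l, safe_start - 1), (safe_start, safe_end), (safe_end + 1, u)


def run_loop(a, lo, up, l, u, c):
    """Execute A[i + c] = i over [l, u]; return the number of tests performed."""
    before, safe, after = split_regions(l, u, lo, up, c)
    tests = 0
    for (start, end), tested in ((before, True), (safe, False), (after, True)):
        for i in range(start, end + 1):
            s = i + c
            if tested:
                tests += 1
                if s < lo or s > up:
                    raise IndexError(f"A[{s}] out of bounds")
            a[s - lo] = i
    return tests
```

When the loop range lies entirely inside the array bounds, both unsafe regions are empty and no test executes. When it does not, the first tested iteration raises the exception at the same iteration as a fully tested loop would.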

The third major variant of the method is more selective in applying the transformations of the second variant to loop nests. The transformations are applied always in outer to inner loop order. Within a loop that has been divided into regions, they are applied only to the regions with no tests.

The fourth major variant on the method works by transforming loops in the program. The transformation can either be performed at the source level by a programmer or preprocessor, or in an intermediate form of the program by a compiler or other automatic translator. This method can be applied to loop nests where each loop is comprised of

1. a possibly empty section of straight-line code;

2. a possibly empty loop; and

3. a possibly empty section of straight-line code.

The loop nest is divided into multiple regions, where each region either (1) has no invalid array references that use a loop control variable, or (2) has one or more invalid array references that use a loop control variable. A section of code with no array reference tests is created to implement regions of type (1). Another section of code with all necessary array reference tests is created to implement regions of type (2). A driver loop steps through the regions of the iteration space of the original loop, executing code in the generated sections as appropriate.
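A much-simplified sketch of the driver loop idea, for a single loop rather than a loop nest, is given below. It assumes that the subscript function `sigma` has no side effects and can be evaluated ahead of the loop body; the names `build_regions` and `driver` are hypothetical. Consecutive iterations are grouped into regions of type (1), with no violation, and type (2), with a violation, and the driver runs the untested or the tested code section accordingly.

```python
def build_regions(l, u, sigma, lo, up):
    """Group consecutive iterations into (start, end, needs_tests) regions."""
    regions = []
    for i in range(l, u + 1):
        bad = not (lo <= sigma(i) <= up)
        if regions and regions[-1][2] == bad:
            regions[-1][1] = i             # extend the current region
        else:
            regions.append([i, i, bad])    # start a new region
    return [tuple(r) for r in regions]


def driver(a, l, u, sigma, lo, up):
    """Driver loop: step through the regions and run the matching section."""
    for start, end, needs_tests in build_regions(l, u, sigma, lo, up):
        if needs_tests:
            # Section with all necessary array reference tests.
            for i in range(start, end + 1):
                s = sigma(i)
                if s < lo or s > up:
                    raise IndexError(f"A[{s}] out of bounds")
                a[s - lo] = i
        else:
            # Section with no array reference tests.
            for i in range(start, end + 1):
                a[sigma(i) - lo] = i
```

In this sketch the regions are found by scanning every iteration, which costs as much as testing; it is meant only to show the control structure of the driver, not an efficient way of computing the regions.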

The fifth major variant extends the concept of versions of code to any sequence of instructions. Given a set of array references in a program, two versions of code are generated: (1) one version precedes the execution of each array reference with all the necessary tests to detect any violation, whereas (2) the other version performs no tests before array references. If any violations may occur during the execution of the set of references, then version (1) is executed. Otherwise, version (2) is executed.
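The two-version scheme can be illustrated with the following Python sketch, in which the set of references is A[first], . . . , A[last]. The function name and the summation it performs are hypothetical; the point is only the if-statement that selects between the tested and the untested version.

```python
def sum_window(a, lo, up, first, last):
    """Sum A[first..last], selecting a tested or an untested version up front."""
    if first <= last and (first < lo or last > up):
        # Version (1): a violation may occur, so test before every reference.
        total = 0
        for s in range(first, last + 1):
            if s < lo or s > up:
                raise IndexError(f"A[{s}] out of bounds")
            total += a[s - lo]
        return total
    # Version (2): no violation is possible, so no tests are performed.
    total = 0
    for s in range(first, last + 1):
        total += a[s - lo]
    return total
```

Because version (1) still tests each reference in order, a violation is reported at the same reference as in a fully tested program; only the violation-free case skips the tests.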

Finally, the sixth major variant introduces the use of speculative execution to allow optimizations to be performed on code in which array reference tests are necessary. Given a set of array references in a program, two versions of code are generated: (1) a version which allows optimizations that do not preserve the state of the program when violations occur, and (2) a version which does not allow these optimizations. The first version is executed first, and its results are written to temporary storage. If no violations occur, then the results of the computation are saved to permanent storage. If array reference violations do occur, then the computation is performed using the second version. Therefore, the state of the program at the time of the violation is precisely preserved.
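The speculative scheme can be sketched in Python as follows. All names are hypothetical, and the reversed iteration order in the first version merely stands in for any optimization that would not preserve program state at a violation.

```python
def scale_prefix(a, lo, up, n, k):
    """Multiply A[lo], ..., A[lo+n-1] by k in place, with precise violations."""
    # Version (1): speculative. It writes only to temporary storage and
    # records a violation instead of raising in the middle of the work.
    temp = list(a)
    ok = True
    for s in reversed(range(lo, lo + n)):   # reordered: not state-preserving
        if s < lo or s > up:
            ok = False
            break
        temp[s - lo] *= k
    if ok:
        a[:] = temp                         # commit to permanent storage
        return
    # Version (2): original order, one test immediately before each
    # reference, so the state at the violation matches the original program.
    for s in range(lo, lo + n):
        if s < lo or s > up:
            raise IndexError(f"A[{s}] out of bounds")
        a[s - lo] *= k
```

If the speculative run sees a violation, its temporary results are simply discarded, so the permanent array is only ever modified either by a fully successful speculative run or by the precise second version.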

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:

FIG. 1 is a flow chart illustrating the process for transformation of an instruction stream with array references by an inspector, the inspector adding explicit checks before each array reference in the output stream which is executed by an executor;

FIG. 2 is an enumeration of all the outcomes of an array reference;

FIG. 3 is an enumeration of tests needed to detect an invalid array reference;

FIG. 4 is a schematic diagram showing the relationship between elements in the lattice of primitive tests enumerated in FIG. 3;

FIG. 5 is a table showing the relationship between outcomes (enumerated in FIG. 2) and tests (enumerated in FIG. 3) that test for that outcome;

FIG. 6 is a table which defines the function max(a, b) for two tests a and b;

FIG. 7 is a schematic illustration of the layout of the X array, which contains outcomes of array references;

FIG. 8 is a schematic illustration of the layout of the T array, which contains tests for array references;

FIG. 9 illustrates how to implement a loop with multiple body versions;

FIG. 10 is a schematic illustration of the layout of the vector, which defines the regions for an iteration space;

FIG. 11 is a flow chart of a process to transform loops in a program, the transformation adding the necessary explicit checks before each array reference to detect violations;

FIG. 12 is a flow chart showing a variation of the process in FIG. 11 where iterations of the loop are grouped into regions;

FIG. 13 illustrates the result of applying the transformation of FIG. 12 to loop 1301 (1311 in detailed format), the resulting loop being shown in 1305 (1314 in detailed format);

FIG. 14 illustrates one alternative to implementing the resulting loop 1305;

FIG. 15 illustrates another alternative to implementing the resulting loop 1305;

FIG. 16 shows the application of the transformation of FIG. 12 to both loops in loop nest 1601, the resulting structure being shown in 1609;

FIG. 17 illustrates one alternative to implementing the resulting structure 1609;

FIG. 18 is a flow chart showing a variation of the process in FIG. 12 in which only the versions of the loop body actually used during execution of the loop are generated;

FIG. 19 illustrates the process of transformation of a loop structure in a computer program that generates regions with three different characteristics: (i) regions with no array reference violations, (ii) regions that precede regions with characteristic (i), and (iii) regions that succeed regions with characteristic (i);

FIG. 20 illustrates a variant of the process in FIG. 19 that only generates two types of loop bodies;

FIG. 21 illustrates another variant of the process in FIG. 19 that only generates two types of loop bodies;

FIG. 22 illustrates the implementation of the process in FIG. 19 through the compile-time generation of multiple instances of the loop;

FIG. 23 illustrates the implementation of the processes in FIG. 20 and FIG. 21 through the compile-time generation of multiple instances of the loop;

FIG. 24 illustrates the process of applying the transformation in FIG. 19 to a loop nest;

FIG. 25 illustrates the first step in the process of applying the transformation in FIG. 20 to a loop nest;

FIG. 26 illustrates the second step in the process of applying the transformation in FIG. 20 to a loop nest;

FIG. 27 illustrates the first step in the process of applying the transformation in FIG. 21 to a loop nest;

FIG. 28 illustrates the second step in the process of applying the transformation in FIG. 21 to a loop nest;

FIG. 29 illustrates the process of applying the transformation in FIG. 22 to a loop nest;

FIG. 30 illustrates the process of applying the transformation in FIG. 23 to a loop nest;

FIG. 31 illustrates the first step of the selective application of the method of FIG. 19 to a loop nest, the method being applied according to the implementation in FIG. 14;

FIG. 32 illustrates the second step of the selective application of the method of FIG. 19 to a loop nest, the method being applied according to the implementation in FIG. 14;

FIG. 33 illustrates the first step of the selective application of the method of FIG. 19 to a loop nest, the method being applied according to the implementation in FIG. 15;

FIG. 34 illustrates the second step of the selective application of the method of FIG. 19 to a loop nest, the method being applied according to the implementation in FIG. 15;

FIG. 35 illustrates the first phase of the process of selectively applying the transformation in FIG. 20 to a loop nest, the transformation being applied to the outer loop 3502 of 3501 to generate 3509;

FIG. 36 illustrates the second phase of the process of selectively applying the transformation in FIG. 20 to a loop nest, the transformation being applied to the inner loop 3615 of 3601, and the resulting structure being shown in 3622;

FIG. 37 illustrates the first phase of the process of selectively applying the transformation in FIG. 21 to a loop nest, the transformation being applied to the outer loop 3702 of 3701 to generate 3709;

FIG. 38 illustrates the second phase of the process of selectively applying the transformation in FIG. 21 to a loop nest, the transformation being applied to the inner loop 3812 of 3801, and the resulting structure being shown in 3818;

FIG. 39 illustrates the process of selectively applying the transformation in FIG. 22 to a loop nest 3901, the intermediate result being shown in 3909 and the final result in 3931;

FIG. 40 illustrates the process of selectively applying the transformation in FIG. 23 to a loop nest 4001, the intermediate result being shown in 4009 and the final result in 4031;

FIG. 41 illustrates the process of optimizing array reference tests for a perfect loop nest, wherein the driver loop 4112 dynamically instantiates multiple regions of the original loop nest 4101;

FIG. 42 illustrates the same process of FIG. 41 when applied to a loop nest 4201 with the following structure, the body of each loop in the loop nest comprising: (i) a possibly empty section of straight-line code, (ii) a possibly empty loop, and (iii) a possibly empty section of straight-line code;

FIG. 43 illustrates an implementation of the process of FIG. 41 which uses only two instances of the loop body 4306, the instances being shown in 4317 and 4326;

FIG. 44 illustrates the same process of FIG. 43 when applied to a loop nest 4401, the body of each loop in the loop nest comprising: (i) a possibly empty section of straight-line code, (ii) a possibly empty loop, and (iii) a possibly empty section of straight-line code, this implementation following the pattern of FIG. 15;

FIG. 45 illustrates an implementation of the same process of FIG. 44 which uses an if-statement 4519 to select between two different instances of the original loop 4501, the first instance being 4520 and the second instance being 4536, this implementation following the pattern of FIG. 14;

FIG. 46 illustrates a process to optimize array reference checks in a loop nest 4601 by executing the whole iteration space as a single region, two instances of the original loop nest being created in 4617, the first instance being represented by loop 4520 and the second instance being represented by loop 4536;

FIG. 47 illustrates a process to optimize array reference checks in a loop nest 4701 by executing the whole iteration space as a single region, two instances of the original loop nest being created in 4717, one in 4719 and the other in 4735, the if-statement 4718 selecting the appropriate instance at run time;

FIG. 48 illustrates the process of generating two versions of code containing a set of array references 4801, the first version (4807-4812) containing tests for all array references and the second version (4814-4819) containing no tests, an if-statement 4806 selecting which version is actually executed; and

FIG. 49 illustrates the process of generating two versions of code containing a set of array references, the first version (4907-4912) executing the references and computation on those references speculatively, allowing the user or compiler to optimize the code, and if no bounds violations occur, then the code of lines 4922-4925 copies the result of the execution into permanent storage, the second version (4914-4920) executing if an out-of-bounds access occurs during the execution of the first version, optimization and transformations across access checks being disallowed in this version so the access violation is detected precisely, with the state of the program at the time of the violation being preserved.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION

Referring now to the drawings, and more particularly to FIG. 1, there is shown an embodiment of the invention. A computer program is represented by an input instruction stream 101. The instruction stream contains instructions from an instruction set. Some of these instructions are explicit array references as shown in 102, 103 and 104 of FIG. 1. Element 104 represents a generic array reference. A_j is the array or array axis being indexed, and σ_j is the subscript indexing into A_j. The instructions are processed by the inspector 105, which determines the outcome χ_j of each array reference A_j[σ_j]. Possible values for χ_j are given in FIG. 2. The inspector 105 then generates an output instruction stream 106 to the executor 113, which executes the instructions.

For each array reference A_j[σ_j] in the input instruction stream 101, inspector 105 generates a pair of instructions (τ(χ_j, A_j[σ_j]), A_j[σ_j]) in 106, where τ(χ_j, A_j[σ_j]) is a test to detect outcome χ_j in reference A_j[σ_j]. Therefore, 102, 103 and 104 of FIG. 1 are transformed into pairs (107, 108), (109, 110), and (111, 112), respectively. The test τ(χ_j, A_j[σ_j]) must identify all violations related to outcome χ_j of A_j[σ_j]. We define five primitive tests, which are shown in FIG. 3. These primitive tests are ordered according to the lattice shown in FIG. 4: an arrow a→b in the lattice indicates that test b identifies at least as many violations as test a. In that case we say that test b is greater than or equal to test a, or that test b covers test a. We use the notation b ≥ a to denote that test b is greater than or equal to test a.

Each outcome χ_j has an associated minimum test τ_min(χ_j), which is the smallest test (in the lattice sense) that detects the violation corresponding to this outcome. The values of χ_j and the associated minimum tests are shown in FIG. 5. Note that, in terms of correctness of detecting violations, a test a can always be replaced by a test b as long as b≧a. We define max(a,b), the maximum of two tests a and b, according to the table in FIG. 6. A sequence of n tests (a_1, a_2, . . . , a_n), for the same array reference, is equivalent to the maximum test max(a_1, a_2, . . . , a_n) of the sequence. This makes it legal to replace a single test a by a sequence of tests (b_1, b_2, . . . , b_m) as long as max(b_1, b_2, . . . , b_m)≧a. It is also legal to replace a sequence (a_1, . . . , a_n) by a sequence (b_1, . . . , b_m) whenever max(a_1, a_2, . . . , a_n)≦max(b_1, b_2, . . . , b_m). From now on, we only consider primitive tests. The actual test τ(χ_j, A_j[σ_j]) to be performed before an array reference A_j[σ_j] can be any test satisfying

τ(χ_j, A_j[σ_j])≧τ_min(χ_j). (1)

Note that it is also valid to use an estimated outcome {tilde over (χ)}_j instead of the actual outcome χ_j as long as τ_min({tilde over (χ)}_j)≧τ_min(χ_j).
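The ordering and the maximum of tests can be illustrated with a small sketch. FIG. 4 and FIG. 6 are not reproduced in this text, so the ordering used here is an assumption: no test is below null test, null test is below both lb test and ub test, and all tests covers everything. The names `DETECTS`, `covers` and `max_test` are hypothetical and chosen only for illustration.

```python
# ASSUMPTION: each primitive test is modeled by the set of violations it
# detects. This particular ordering is a guess at the lattice of FIG. 4.
DETECTS = {
    "no test":   frozenset(),
    "null test": frozenset({"null"}),
    "lb test":   frozenset({"null", "lb"}),
    "ub test":   frozenset({"null", "ub"}),
    "all tests": frozenset({"null", "lb", "ub"}),
}


def covers(b, a):
    """Return True if test b >= test a, i.e. b detects everything a detects."""
    return DETECTS[b] >= DETECTS[a]


def max_test(*tests):
    """Return the smallest primitive test that covers every test given."""
    needed = frozenset().union(*(DETECTS[t] for t in tests))
    candidates = [t for t in DETECTS if DETECTS[t] >= needed]
    return min(candidates, key=lambda t: len(DETECTS[t]))


if __name__ == "__main__":
    print(max_test("lb test", "ub test"))
    print(covers("lb test", "ub test"))
```

Under this assumed ordering, lb test and ub test are incomparable, so a sequence containing both is only covered by all tests.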

As inspector 105 processes the instructions from input instruction stream 101 and generates output instruction stream 106, it can perform format conversions. In particular, if τ(χ_j, A_j[σ_j])=no test, then no instruction needs to be generated for this test. Also, the A_j[σ_j] instructions in output instruction stream 106 can be optimized to execute only valid references, since any violations are detected explicitly by τ(χ_j, A_j[σ_j]). When all of the input instruction stream 101 has been processed by inspector 105, it signals executor 113 that the inspection phase has ended.

The executor 113 receives the instructions in the output instruction stream 106, and it may either execute them immediately or store some number of instructions in an accessible storage 114 for execution at a later time. If the executor 113 has not begun executing the instructions in output stream 106 when the inspector 105 signals that it has concluded processing instructions in input instruction stream 101, the executor 113 begins executing the stored instructions when it receives the aforementioned signal.

As an optimization, if the inspector 105 determines that it has examined all of the static instruction stream, and that the dynamic instruction stream will not change with respect to the number and type of tests needed, the inspector 105 may signal the executor 113 to begin executing instructions out of accessible storage 114. This optimization can only be performed when the executor 113 is storing the instructions from the inspector 105.

This method to generate a program execution that explicitly detects invalid array references, in an efficient manner, while preserving the original program semantics constitutes our invention.

Specializing the Method for Loops

Let L(i, l_i, u_i, B(i)) be a loop on control variable i and body B(i). The iteration space for this loop consists of the values i=l_i, l_i+1, . . . , u_i. If l_i>u_i, then the loop is empty (i.e., it has zero iterations). Let the body of the loop contain ρ array references of the form A_j[σ_j], j=1, . . . , ρ, where A_j is a single-dimensional array or an axis of a multi-dimensional array. The subscript σ_j is an index into array A_j, and in general it is a function of i: σ_j=σ_j(i). We order the references A_j[σ_j] so that A_j[σ_j] appears before A_{j+1}[σ_{j+1}], for all j=1, . . . , ρ−1. The structure of the loop is as follows:

concise form              explicit-references form

do i = l_i, u_i           do i = l_i, u_i
  B(i)                      A_1[σ_1]
end do                      A_2[σ_2]
                            . . .
                            A_ρ[σ_ρ]
                          end do

We define χ_j[i] as the outcome of reference A_j[σ_j] in iteration i. An outcome represents the result of executing the reference. The possible values for χ_j[i] are given in FIG. 2. We can express the outcome χ_j[i] as a function of the reference A_j[σ_j] and the value of the loop index i:

χ_j[i]=γ(A_j[σ_j], i). (2)

We represent by χ[i] the vector describing the ρ outcomes for iteration i. Also, we represent by X[l_i:u_i] the matrix of all outcomes for all iterations of loop L(i, l_i, u_i, B(i)), with X[i]=χ[i] and X[i][j]=χ_j[i]. The structure of X is shown in FIG. 7. Again, note that it is valid to use a function γ(A_j[σ_j], i) that computes an estimate {tilde over (χ)}_j[i] of the outcome as long as τ_min({tilde over (χ)}_j[i])≧τ_min(χ_j[i]).

We want to identify any violations (null pointers or out of bounds) that happen during execution of reference A_j[σ_j]. Therefore, before each reference A_j[σ_j] in iteration i, we need to perform a test τ_j[i] that identifies the appropriate violation. The actual test to be performed before an array reference A_j[σ_j] in iteration i can be expressed as a function of the outcomes of array references, the value of j for the particular reference, and the loop index i:

τ_j[i]=ζ(X,j,i) (3)

where ζ(X,j,i) can be any function satisfying

ζ(X,j,i)≧τ_min(χ_j[i]). (4)

In particular, the choice of test for one reference can depend on the outcome of another reference, as in:

\begin{matrix} ζ (X, j, i) = \begin{cases} \text{no test} & \text{if } χ_{k} [i] = OK, \ \forall k = 1, \dots, ρ, \\ \text{all tests} & \text{otherwise.} \end{cases} & (5) \end{matrix}

In this case, for each iteration, either an all tests is performed before every array reference, or no tests are performed at all.

We represent by τ[i] the vector describing the ρ tests that have to be performed for iteration i. We represent by T[l_i:u_i] the matrix of all tests for all iterations of loop L(i, l_i, u_i, B(i)), with T[i]=τ[i] and T[i][j]=τ_j[i]. The structure of T is shown in FIG. 8. The computation of τ_j[i] for all j and all i can be performed by a function Z that operates on matrix X, producing matrix T (the function Z(X) can be seen as a matrix form of ζ(X,j,i)):

T[l_i:u_i]=Z(X[l_i:u_i]). (6)

We define B_τ[i](i) as a version of body B(i) that performs the tests described by τ[i]. B_τ[i](i) can be constructed by the transformation:

\begin{matrix} B (i) & & B_{τ [i]} (i) \\ \begin{matrix} A_{1} [σ_{1}] \\ A_{2} [σ_{2}] \\ ⋮ \\ A_{ρ} [σ_{ρ}] \end{matrix} & ⟹ & \begin{matrix} \text{perform test } τ_{1} [i] \text{ on } A_{1} [σ_{1}] \\ A_{1} [σ_{1}] \\ \text{perform test } τ_{2} [i] \text{ on } A_{2} [σ_{2}] \\ A_{2} [σ_{2}] \\ ⋮ \\ \text{perform test } τ_{ρ} [i] \text{ on } A_{ρ} [σ_{ρ}] \\ A_{ρ} [σ_{ρ}] \end{matrix} \end{matrix}

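The loop L(i, l_i, u_i, B(i)) can be transformed into a loop {overscore (L)}(i, l_i, u_i, B_τ[i](i)) which performs the tests to detect the violations:

\begin{matrix} do i = l_{i}, u_{i} & & do i = l_{i}, u_{i} \\ \quad B (i) & \Rightarrow & \quad B_{τ [i]} (i) \\ end do & & end do \end{matrix}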

The transformed loop {overscore (L)}(i, l_i, u_i, B_τ[i](i)) can be generated dynamically or statically. Using dynamic code generation, the appropriate versions of the loop body B_τ[i](i) can be generated before each iteration. (Versions can be saved and reused.) Using static code generation techniques, the transformed loop 901 in FIG. 9 can be implemented as shown in 905.

For reasons of performance, it is appropriate to group together iterations of the iteration space that have similar characteristics. A region of an iteration space is defined as a sequence of consecutive iterations of a loop that perform the same tests. That is, if iterations i_1 and i_2 belong to the same region, then τ[i_1]=τ[i_2]. The iteration space of a loop can be divided into n regions, represented by a vector ℛ[1:n] of regions, with each region identified by a triple:

ℛ[δ]=(ℛ[δ].l, ℛ[δ].u, ℛ[δ].τ), δ=1, . . . , n (7)

where ℛ[δ].l is the first iteration of region δ, ℛ[δ].u is the last iteration of region δ, and ℛ[δ].τ is the vector of tests for all iterations in region δ. The structure of vector ℛ[1:n] is shown in FIG. 10. Using this partitioning of the iteration space into regions, the loop L(i, l_i, u_i, B(i)) can be transformed as follows:

\begin{matrix} & & do δ = 1, n \\ do i = l_{i}, u_{i} & & \quad do i = ℛ [δ] \cdot l, ℛ [δ] \cdot u \\ \quad B (i) & \Rightarrow & \qquad B_{ℛ [δ] \cdot τ} (i) \\ end do & & \quad end do \\ & & end do \end{matrix}

A partitioning can have any number of empty regions. A region ℛ[δ] is empty if ℛ[δ].l>ℛ[δ].u. Given a partitioning vector with empty regions, we can form an equivalent partitioning vector with only nonempty regions by removing the empty regions from the first vector. From now on, we consider only partitioning vectors for which every region is nonempty.

For a partitioning vector ℛ[1:n] to be valid, the following conditions must hold:

1. The first iteration of region ℛ[δ+1] must be the iteration immediately succeeding the last iteration of region ℛ[δ] in execution order, for δ=1, . . . , n−1. We can express this requirement as:

ℛ[δ+1].l=ℛ[δ].u+1. (8)

2. The first iteration of ℛ[1] must be the first iteration, in execution order, of the iteration space. We can express this requirement as:

ℛ[1].l=l_i. (9)

3. The last iteration of ℛ[n] must be the last iteration, in execution order, of the iteration space. We can express this requirement as:

ℛ[n].u=u_i. (10)

4. The test for array reference A_j[σ_j] in region ℛ[δ], as defined by the test vector ℛ[δ].τ, has to be greater than or equal to the minimum test for each outcome of reference A_j[σ_j] in region ℛ[δ]. We can express this requirement as:

\begin{matrix} ℛ [δ] \cdot τ_{j} \geq \max_{i = ℛ [δ] \cdot l}^{ℛ [δ] \cdot u} (τ_{\min} (χ_{j} [i])) . & (11) \end{matrix}
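The four validity conditions can be checked mechanically. The following is a minimal sketch, assuming that regions are given as (l, u, τ) tuples, that the minimum tests per iteration are supplied in a dictionary, and that the lattice ordering is passed in as a predicate; the names `is_valid_partitioning`, `min_tests`, `covers` and `simple_covers` are hypothetical, and `simple_covers` is a deliberately conservative stand-in for the lattice of FIG. 4.

```python
def is_valid_partitioning(regions, l_i, u_i, min_tests, covers):
    """Check conditions (8) to (11) for a vector of nonempty regions.

    regions   : list of (l, u, tau), tau being a tuple of tests, one per reference
    min_tests : dict mapping iteration i to the tuple of tau_min(chi_j[i])
    covers    : covers(b, a) is True when test b >= test a in the lattice
    """
    if not regions:
        return l_i > u_i  # no regions is only acceptable for an empty loop
    if regions[0][0] != l_i or regions[-1][1] != u_i:  # conditions (9), (10)
        return False
    for (_, u, _), (l_next, _, _) in zip(regions, regions[1:]):  # condition (8)
        if l_next != u + 1:
            return False
    for l, u, tau in regions:  # condition (11)
        for i in range(l, u + 1):
            if not all(covers(t, m) for t, m in zip(tau, min_tests[i])):
                return False
    return True


def simple_covers(b, a):
    # ASSUMPTION: conservative stand-in for the ordering of FIG. 4.
    return b == a or a == "no test" or b == "all tests"


if __name__ == "__main__":
    mins = {0: ("lb test",), 1: ("lb test",), 2: ("no test",), 3: ("no test",)}
    good = [(0, 1, ("lb test",)), (2, 3, ("no test",))]
    print(is_valid_partitioning(good, 0, 3, mins, simple_covers))
```

A single region (0, 3, ("all tests",)) would also pass the check, which reflects the remark that many legal partitionings exist for one iteration space.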

Note that there are many legal partitionings of an iteration space (infinitely many, if empty regions are used). In particular, given a valid partitioning vector ℛ[1:n], it is legal to move the first iteration of a region ℛ[δ] to the preceding region ℛ[δ−1] provided that:

1. Region ℛ[δ−1] exists. That is, δ=2, . . . , n.

2. The test vector ℛ[δ−1].τ is greater than or equal to the test vector ℛ[δ].τ. That is, ℛ[δ−1].τ_j≧ℛ[δ].τ_j, for j=1, . . . , ρ.

Conversely, given a valid partitioning vector ℛ[1:n], it is legal to move the last iteration of a region ℛ[δ] to the succeeding region ℛ[δ+1] provided that:

1. Region ℛ[δ+1] exists. That is, δ=1, . . . , n−1.

2. The test vector ℛ[δ+1].τ is greater than or equal to the test vector ℛ[δ].τ. That is, ℛ[δ+1].τ_j≧ℛ[δ].τ_j, for j=1, . . . , ρ.

These rules can be applied repeatedly to allow multiple iterations to be moved from one region to another.

The partitioning of most interest is what we call the minimum partitioning, which has the minimum number of regions. This partitioning is obtained when all the nonempty regions ℛ[δ] are maximal in size. A region ℛ[δ]=(ℛ[δ].l, ℛ[δ].u, ℛ[δ].τ) is maximal in size if τ[ℛ[δ].l−1]≠ℛ[δ].τ and τ[ℛ[δ].u+1]≠ℛ[δ].τ. That is, if the region cannot be enlarged by adding the preceding or the succeeding iteration, then the region is maximal in size. Note that τ[l_i−1] and τ[u_i+1] are undefined and therefore different from any ℛ[δ].τ. The minimum partitioning may contain regions that could be shown to be empty at run-time or with extensive compile-time analysis, but cannot be shown empty at compile-time or with naive compile-time analysis.

The vector of tests for array references in a loop execution (represented by matrix T) is defined by function ζ(X,j,i). The matrix T defines which versions of the loop body need to be generated. Depending on when code generation occurs, different levels of information are available about the spaces which form the domain and range of the function ζ(X,j,i) for a particular execution of the loop. The ζ(X,j,i) function is a mapping from the space of outcomes (X), array references, and the loop iteration space into the space of tests (T). Of the inputs, or domain, only the space of array references can generally be precisely known from a naive examination of the source code of the program. Without further analysis of the program, the space of the vector of outcomes is assumed to contain all possible vectors of outcomes, and the integer space forming the loop iteration space is bounded only by language limits on the iteration space.

Standard analytic techniques such as constant propagation (S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, 1997) and range analysis (W. Blume and R. Eigenmann, “Symbolic range propagation”, Proceedings of the 9th International Parallel Processing Symposium, April 1995) can be used to more precisely compute the iteration space that partially forms the domain of the ζ(X,j,i) function. Analysis may also show that some vectors of outcomes are not possible. For example, if the loop lower bound is known, the array lower bound is known, and all terms in array references that do not depend on the loop control variable are known, it may be possible to show, by examining the source code, that no lower bounds violations will be in the vector of outcomes. More aggressive symbolic analysis can be used to show that no consistent solution exists for some combinations of outcomes. For example, it may be possible to show that a lower bound violation of A_1 and an upper bound violation of A_2 are inconsistent for the same value of the loop index variable i. When more complete information is available to the algorithms computing regions, more precise region generation (closer to a minimum partitioning) is possible. More information can be made available by delaying the computation of regions until run-time (using the inspector-executor and other techniques described below) or until load-time (using dynamic or just-in-time compilation) (J. Auslander, M. Philipose, C. Chambers, S. Eggers, and B. Bershad, “Fast, effective dynamic compilation”, Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, pp. 149-159, Philadelphia, Pa., 21-24 May 1996).

The following algorithm (procedure inspector) shows the implementation of an inspector that computes the minimum partitioning of an iteration space. The inspector collapses the computation of the outcome vector χ[i], the test vector τ[i], and the actual partitioning. For each reference A_j[σ_j], the inspector verifies the outcome χ_j[i] of the reference and immediately computes τ_j[i]. The particular inspector shown implements ζ(X,j,i)=τ_min(χ_j[i]), but any legal ζ(X,j,i) function could be used. After building the vector τ for iteration i, the inspector checks whether this τ is different from the test vector {overscore (τ)} of the current region. The inspector then either starts a new region or adds iteration i to the current region.

procedure inspector (ℛ, n, l_i, u_i, (A_j[σ_j], j=1, . . . , ρ))
  n = 0
  {overscore (τ)} = undefined
  do i = l_i, u_i
    do j = 1, ρ
      τ_j = no test
      if (A_j = null) τ_j = null test
      elsif (σ_j < lo(A_j)) τ_j = lb test
      elsif (σ_j > up(A_j)) τ_j = ub test
      end if
    end do
    if (τ = {overscore (τ)}) then
      ℛ[n].u = i
    else
      n = n + 1
      ℛ[n] = (i, i, τ)
      {overscore (τ)} = τ
    end if
  end do
end procedure

An inspector to construct the minimum partitioning of an iteration space.
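For illustration, the inspector procedure can be transcribed into a runnable sketch. This is one possible rendering, not the only one: it assumes zero-based Python lists as arrays (so lo(A)=0 and up(A)=len(A)−1), represents a null array by `None`, and represents each subscript σ_j as a function of i. The function name `inspector` follows the procedure above; the data layout is hypothetical.

```python
def inspector(l_i, u_i, refs):
    """Compute the minimum partitioning of the iteration space l_i..u_i.

    refs is a list of (array, sigma) pairs, where array is a list or None
    and sigma is a function mapping the loop index i to a subscript.
    Returns a list of regions [l, u, tau], tau being a tuple of test names.
    """
    regions = []
    current = None  # test vector of the current region (undefined at start)
    for i in range(l_i, u_i + 1):
        tau = []
        for array, sigma in refs:
            if array is None:
                tau.append("null test")
            elif sigma(i) < 0:                # below lo(A) = 0
                tau.append("lb test")
            elif sigma(i) > len(array) - 1:   # above up(A)
                tau.append("ub test")
            else:
                tau.append("no test")
        tau = tuple(tau)
        if regions and tau == current:
            regions[-1][1] = i                # extend the current region
        else:
            regions.append([i, i, tau])       # start a new region
            current = tau
    return regions


if __name__ == "__main__":
    a = [0] * 5
    for region in inspector(0, 9, [(a, lambda i: i - 2)]):
        print(region)
```

With a five-element array and subscript i−2 over iterations 0 to 9, the sketch yields three regions: iterations 0 to 1 need an lb test, iterations 2 to 6 need no test, and iterations 7 to 9 need a ub test.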

An embodiment of the invention to optimize array reference tests for a loop L(i, l_i, u_i, B(i)) is shown in FIG. 11. Outcomes for each array reference of a loop are computed in 1101 of FIG. 11. Tests for each array reference of a loop are computed in 1102 of FIG. 11. The different versions of the loop body B_τ(i) are generated in 1103 of FIG. 11 for all possible values of τ. Execution of loop {overscore (L)}(i, l_i, u_i, B_τ[i](i)) in 1104 of FIG. 11 completes the process.

Based solely on the list of primitive tests of FIG. 3, there are 5^ρ different versions of a loop body with ρ array references A_j[σ_j]. We arrive at this number because for each A_j[σ_j] there are five possible tests (FIG. 3): no test, null test, lb test, ub test, and all tests. However, the choice of a function ζ(X,j,i) naturally limits the test vectors that can be generated. The particular inspector shown never generates an all tests, and therefore only 4^ρ different versions of the loop body are possible.

Summarizing, the process of optimizing the array access checks in a loop involves:

1. Using a function γ(A_j[σ_j], i) to compute matrix X, where X[i][j]=χ_j[i] is the outcome of reference A_j[σ_j] in iteration i (1101 in FIG. 11).

2. Using a function ζ(X,j,i) to compute matrix T, where T[i][j]=τ_j[i] is the test to be performed before reference A_j[σ_j] in iteration i (1102 in FIG. 11).

3. Generating the necessary versions B_τ(i) of body B(i) (1103 in FIG. 11).

4. Executing the appropriate version B_τ[i](i) in each iteration i of loop L(i, l_i, u_i, B(i)) (1104 in FIG. 11).

The process for optimizing array reference tests using regions is shown in FIG. 12. The main differences between this process and that of FIG. 11 are in the computation of regions 1203 and in the execution of loop 1205. The transformation used to execute the loop in 1205 is shown in FIG. 13. The original loop 1301 (shown in more detail in 1311) is transformed into loop 1305 (shown in more detail in 1314). Driver loop 1315 iterates over the regions, and loop 1316 executes the iterations for each region. Execution of 1314 corresponds to performing 1205 in FIG. 12. Again, the loop body versions can be generated either dynamically or statically. Static generation can be performed following the patterns shown in FIG. 14 and FIG. 15, as explained next.

Let there be m distinct values for the test vectors ℛ[1].τ, ℛ[2].τ, . . . , ℛ[n].τ. We label these test vectors τ^1, τ^2, . . . , τ^m. We need to generate m versions of the loop body, B_{τ^1}(i), B_{τ^2}(i), . . . , B_{τ^m}(i), one for each possible test vector τ. In FIG. 14, m different loops of the form

do i = ℛ[δ].l, ℛ[δ].u
  B_{τ^k}(i)
end do

are instantiated, one for each version τ^k, k=1, . . . , m. These loops are shown in 1410, 1415, and 1421. The loops are guarded by their respective if-statements (1409, 1414, and 1420), which select only one loop for execution in each iteration of the driver loop 1408.

An alternative is shown in FIG. 15. Again, m different loops of the form

do i = l_k(δ), u_k(δ)
  B_{τ^k}(i)
end do

are instantiated, one for each version τ^k, k=1, . . . , m. These loops are shown in 1509, 1512, and 1516. To guarantee that only one of the loops executes in each iteration of the driver loop 1508, we define:

\begin{matrix} l_{k} (δ) = \begin{cases} ℛ [δ] \cdot l & \text{if } ℛ [δ] \cdot τ = τ^{k} \\ 0 & \text{if } ℛ [δ] \cdot τ \neq τ^{k} \end{cases} & (12) \\ u_{k} (δ) = \begin{cases} ℛ [δ] \cdot u & \text{if } ℛ [δ] \cdot τ = τ^{k} \\ - 1 & \text{if } ℛ [δ] \cdot τ \neq τ^{k} \end{cases} & (13) \end{matrix}
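The effect of Equations (12) and (13) is that every loop whose version does not match the region receives the empty range 0 to −1. A minimal sketch of such a driver loop follows; the function name `run_regions` and the representation of loop body versions as Python callables are assumptions made only for this illustration.

```python
def run_regions(regions, versions, bodies):
    """Execute a loop region by region in the style of FIG. 15.

    regions  : list of (l, u, tau) triples in execution order
    versions : list of the m distinct test vectors tau^1 .. tau^m
    bodies   : bodies[k](i) runs the loop body version for versions[k]
    """
    for l, u, tau in regions:               # driver loop over regions
        for k, tau_k in enumerate(versions):
            # Equations (12) and (13): real bounds for the matching
            # version, the empty range 0..-1 for all others.
            l_k = l if tau == tau_k else 0
            u_k = u if tau == tau_k else -1
            for i in range(l_k, u_k + 1):
                bodies[k](i)


if __name__ == "__main__":
    trace = []
    regions = [(0, 2, "t1"), (3, 4, "t2"), (5, 6, "t1")]
    bodies = [lambda i: trace.append((0, i)), lambda i: trace.append((1, i))]
    run_regions(regions, ["t1", "t2"], bodies)
    print(trace)
```

No if-statement guards the inner loops here; selection is done purely through the loop bounds, which is the design difference between FIG. 15 and FIG. 14.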

This method of partitioning the iteration space of a loop into regions and implementing each region with code that tests for at least the array violations that occur in that region constitutes our method for optimizing array reference tests during the execution of a loop while preserving the semantics of the loop. It is important to note that this method can be applied to each loop in a program or program fragment independently.

FIG. 16 illustrates the method as applied to both loops 1602 and 1604 of a doubly nested loop structure 1601. The resulting structure is shown in 1609. ℛ_1 is the regions vector for loop 1602, and ℛ_2 is the regions vector for loop 1604. Note that the versions of S_1(i_1) and S_2(i_1) in 1609 (1612 and 1618) are controlled just by ℛ_1, while the versions of the loop body B(i_1, i_2) (1615) are controlled by both ℛ_1 and ℛ_2. The actual implementation of 1609 can use any combination of the methods in FIG. 14 and FIG. 15. FIG. 17 shows an implementation using the method of FIG. 14 for both loops.

In some situations, it may be possible to directly compute χ_j[i] and τ_j[i] for a reference A_j[σ_j] and all values of i. When this is the case, the reference A_j[σ_j] can be dropped from procedure inspector, and we can just make τ_j=τ_j[i] in each iteration i. When τ_j[i] can be directly computed for all references A_j[σ_j] and all values of i, the inspector becomes unnecessary. Stated differently, the role of the inspector is performed by direct computation.

Direct Computation of Outcomes

Consider an array reference of the form A[ƒ(i)] in the body B(i) of loop L(i, l_i, u_i, B(i)). Let the function ƒ(i) be either monotonically increasing or monotonically decreasing in the domain i=l_i, . . . , u_i. (We later relax this constraint for some types of functions.) We use the notation lo(A) and up(A) to denote the lower and upper bounds of A, respectively. For the method to work in all cases, we define lo(A)=0 and up(A)=−1 when A is null. The safe region of the iteration space of i with respect to an indexing operation A[ƒ(i)] is the set of values of i that make ƒ(i) fall within the bounds of A. That is, for all i in the safe region, the outcome χ[i] of reference A[ƒ(i)] is OK. Therefore, in the safe region we want:

lo(A)≦ƒ(i)≦up(A), ∀i ∈ safe region.

If ƒ(i) is monotonically increasing, the safe region is defined by:

⌈ƒ^{−1}(lo(A))⌉≦i≦⌊ƒ^{−1}(up(A))⌋. (14)

Conversely, if ƒ(i) is monotonically decreasing, the safe region is defined by:

⌈ƒ^{−1}(up(A))⌉≦i≦⌊ƒ^{−1}(lo(A))⌋. (15)

Equations (14) and (15) define the range of values of i that generate legal array indices for A. However, i must also lie between the loop bounds l_i and u_i.

In general, we want:

ℒ(ƒ(.),A)≦i≦𝒰(ƒ(.),A)

where

\begin{matrix} ℒ (f (.), A) = \begin{cases} \max (l_{i}, ⌈ f^{- 1} (lo (A)) ⌉) & \text{if } f \text{ monotonically increasing,} \\ \max (l_{i}, ⌈ f^{- 1} (up (A)) ⌉) & \text{if } f \text{ monotonically decreasing} \end{cases} & (16) \\ \text{and} & \\ 𝒰 (f (.), A) = \begin{cases} \min (u_{i}, ⌊ f^{- 1} (up (A)) ⌋) & \text{if } f \text{ monotonically increasing,} \\ \min (u_{i}, ⌊ f^{- 1} (lo (A)) ⌋) & \text{if } f \text{ monotonically decreasing.} \end{cases} & (17) \end{matrix}

ℒ(ƒ(.),A) and 𝒰(ƒ(.),A) are, respectively, the safe lower bound and safe upper bound of the iteration space of i with respect to A[ƒ(i)]. If ℒ(ƒ(.),A)>𝒰(ƒ(.),A), then the safe region of the iteration space of i with respect to A[ƒ(i)] is empty. If ℒ(ƒ(.),A)≦𝒰(ƒ(.),A), then these safe bounds partition the iteration space into three regions: (i) the region from l_i to ℒ(ƒ(.),A)−1, (ii) the safe region from ℒ(ƒ(.),A) to 𝒰(ƒ(.),A), and (iii) the region from 𝒰(ƒ(.),A)+1 to u_i. Any one of regions (i), (ii), and (iii) could be empty. We can compute the outcome χ[i] of array reference A[ƒ(i)] in each of these regions:

if ƒ(i) is monotonically increasing,

\begin{matrix} χ [i] = \begin{cases} < & l_{i} \leq i \leq ℒ (f (.), A) - 1 \\ OK & ℒ (f (.), A) \leq i \leq 𝒰 (f (.), A) \\ > & 𝒰 (f (.), A) + 1 \leq i \leq u_{i} \end{cases} & (18) \end{matrix}

if ƒ(i) is monotonically decreasing,

\begin{matrix} χ [i] = \begin{cases} > & l_{i} \leq i \leq ℒ (f (.), A) - 1 \\ OK & ℒ (f (.), A) \leq i \leq 𝒰 (f (.), A) \\ < & 𝒰 (f (.), A) + 1 \leq i \leq u_{i} \end{cases} & (19) \end{matrix}

If the body B(i) has ρ indexing operations on i, of the form A_1[ƒ_1(i)], A_2[ƒ_2(i)], . . . , A_ρ[ƒ_ρ(i)], we can compute ℒ(ƒ_j(.),A_j) and 𝒰(ƒ_j(.),A_j) (1≦j≦ρ) using Equations (16) and (17). From there, we can compute the outcome χ_j[i] for each reference A_j[ƒ_j(i)] and iteration i using Equations (18) and (19). Next, we discuss how to actually compute ℒ(ƒ(.),A) and 𝒰(ƒ(.),A) in the case of some common subscript functions.

Linear Subscripts

In the particular case of a linear subscript function of the form ƒ(i)=ai+b with a≠0, the inverse function ƒ^{−1} can be easily computed:

\begin{matrix} f^{- 1} (i) = \frac{i - b}{a} . & (20) \end{matrix}

Also, the monotonicity of ƒ(i) is determined from the value of a: if a>0, then ƒ(i) is monotonically increasing, and if a<0, then ƒ(i) is monotonically decreasing. Note that the values of a and b need not be known at compile time, since ℒ(ƒ(.),A) and 𝒰(ƒ(.),A) can be efficiently computed at run-time.
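As a worked illustration of Equations (16), (17) and (20), the safe bounds for a linear subscript can be computed at run-time with integer arithmetic only. The sketch below assumes integer a, b and bounds; the helper names `ceil_div` and `linear_safe_bounds` are hypothetical.

```python
def ceil_div(n, d):
    """Ceiling of n / d for integers, valid for positive or negative d."""
    return -((-n) // d)


def linear_safe_bounds(a, b, lo, up, l_i, u_i):
    """Safe lower and upper loop bounds for the reference A[a*i + b].

    lo and up are the array bounds lo(A) and up(A); l_i and u_i are the
    loop bounds. The safe region is empty when the first value returned
    exceeds the second.
    """
    if a == 0:
        raise ValueError("a must be nonzero; a constant subscript is a separate case")
    if a > 0:   # monotonically increasing
        safe_lo = max(l_i, ceil_div(lo - b, a))
        safe_hi = min(u_i, (up - b) // a)
    else:       # monotonically decreasing
        safe_lo = max(l_i, ceil_div(up - b, a))
        safe_hi = min(u_i, (lo - b) // a)
    return safe_lo, safe_hi


if __name__ == "__main__":
    print(linear_safe_bounds(2, -3, 0, 9, 0, 10))
    print(linear_safe_bounds(-1, 5, 0, 3, 0, 10))
```

For ƒ(i)=2i−3 with array bounds 0 to 9 and loop bounds 0 to 10, the safe region is i=2, . . . , 6, since ƒ(2)=1 and ƒ(6)=9. For a null array (lo=0, up=−1) the returned region is empty, as the text requires.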

Affine Subscripts

Consider the d-dimensional loop nest

do i_1 = l_{i_1}, u_{i_1}
  do i_2 = l_{i_2}, u_{i_2}
    . . .
      do i_d = l_{i_d}, u_{i_d}
        B(i_1, i_2, . . . , i_d)
      end do
    . . .
  end do
end do.

Let there be an array reference A[ƒ(i_1, i_2, . . . , i_d)] in the body B(i_1, i_2, . . . , i_d). Let the subscript be an affine function of the form ƒ(i_1, i_2, . . . , i_d)=a_1 i_1+a_2 i_2+ . . . +a_d i_d+b, where i_1, i_2, . . . , i_d are loop index variables, and a_1, a_2, . . . , a_d, b are loop invariants. At the innermost loop (i_d) the values of i_1, i_2, . . . , i_{d−1} are fixed, and ƒ(.) can be treated as linear on i_d. Determination of safe bounds for the i_1, i_2, . . . , i_{d−1} loops can be done using the inspector/executor method described above. Alternatively, these safe bounds can be approximated. Replacing true safe bounds ℒ(ƒ(.),A) and 𝒰(ƒ(.),A) by approximated safe bounds {overscore (ℒ)} and {overscore (𝒰)} does not introduce any hazards as long as {overscore (ℒ)}≧ℒ(ƒ(.),A) and {overscore (𝒰)}≦𝒰(ƒ(.),A). Techniques for approximating the iteration subspace of a loop that accesses some range of an affinely subscripted array axis are described in S. P. Midkiff, “Computing the local iteration set of block-cyclically distributed reference with affine subscripts”, Sixth Workshop on Compilers for Parallel Computing, 1996, and K. van Reeuwijk, W. Denissen, H. J. Sips, and E. M. R. M. Paalvast, “An implementation framework for HPF distributed arrays on message-passing parallel computer systems”, IEEE Transactions on Parallel and Distributed Systems, 7(9):897-914, September 1996.

Constant Subscripts

For an array reference A[ƒ(i)] where ƒ(i)=k (a constant), ƒ(i) is neither monotonically increasing nor monotonically decreasing. Nevertheless, we can treat this special case by defining

ℒ (k, A) = \begin{cases} l_{i} & \text{if } (lo (A) \leq k \leq up (A)), \\ \max (l_{i}, u_{i} + 1) & \text{otherwise,} \end{cases}

𝒰(k,A)=u_i.

Basically, the safe region for reference A[k] is either the whole iteration space, if k falls within the bounds of A, or empty otherwise.

Modulo-function Subscripts

Another common form of array reference is A[ƒ(i)] where ƒ(i)=g(i) mod m+ai+b. In general, this is not a monotonic function. However, we know that the values of ƒ(i) are within the range described by ai+b+j, for i=l_i, l_i+1, . . . , u_i and j=0, 1, . . . , m−1. We define a function h(i,j)=ai+b+j. Let h_max be the maximum value of h(i,j) in the domain i=l_i, l_i+1, . . . , u_i and j=0, 1, . . . , m−1. Let h_min be the minimum value of h(i,j) in the same domain. These extreme values of h(i,j) can be computed using the techniques described by Utpal Banerjee, in “Dependence Analysis”, Loop Transformations for Restructuring Compilers, Kluwer Academic Publishers, Boston, Mass., 1997. Then we can define

ℒ (f (i), A) = \begin{cases} l_{i} & \text{if } ((lo (A) \leq h_{\min}) ∧ (up (A) \geq h_{\max})), \\ \max (l_{i}, u_{i} + 1) & \text{otherwise,} \end{cases}

𝒰(ƒ(i),A)=u_i.

That is, the safe region is the whole iteration space if we can guarantee that ƒ(i) is always within the bounds of A, or empty otherwise.

Known Subscript Range

All the previous functions are particular cases of subscript functions for which we can compute their range of values. If for an array reference A[ƒ(i)] we know, by some means, that h_min≦ƒ(i)≦h_max, then we can define

ℒ (f (i), A) = \begin{cases} l_{i} & \text{if } ((lo (A) \leq h_{\min}) ∧ (up (A) \geq h_{\max})), \\ \max (l_{i}, u_{i} + 1) & \text{otherwise,} \end{cases}

𝒰(ƒ(i),A)=u_i.

That is, the safe region is the whole iteration space if we can guarantee that ƒ(i) is always within the bounds of A, or empty otherwise.
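The all-or-nothing rule for a known subscript range is simple enough to state as code. The sketch below assumes integer bounds, a nonempty loop (l_i≦u_i), and a modulus m≧1 whose mod operation yields values from 0 to m−1; the function names `range_safe_bounds` and `modulo_subscript_range` are hypothetical, and the corner evaluation of h(i,j)=ai+b+j is a simple substitute for the general techniques cited above.

```python
def range_safe_bounds(h_min, h_max, lo, up, l_i, u_i):
    """Safe bounds when the subscript is only known to lie in h_min..h_max.

    Returns the whole iteration space if that range fits inside the array
    bounds lo..up, and an empty region (lower bound above upper) otherwise.
    """
    if lo <= h_min and up >= h_max:
        return l_i, u_i
    return max(l_i, u_i + 1), u_i


def modulo_subscript_range(a, b, m, l_i, u_i):
    """Range of h(i, j) = a*i + b + j for l_i <= i <= u_i and 0 <= j <= m - 1.

    h is linear in both arguments, so its extremes occur at the corners
    of the domain.
    """
    ends = (a * l_i + b, a * u_i + b)
    return min(ends), max(ends) + (m - 1)


if __name__ == "__main__":
    h_min, h_max = modulo_subscript_range(1, 0, 4, 0, 5)
    print(h_min, h_max)
    print(range_safe_bounds(h_min, h_max, 0, 8, 0, 5))
    print(range_safe_bounds(h_min, h_max, 0, 7, 0, 5))
```

With a=1, b=0, m=4 and loop bounds 0 to 5, the subscript lies in 0 to 8, so an array with up(A)=8 makes the whole loop safe, while up(A)=7 makes the safe region empty even though some individual iterations may in fact be valid.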

If the subscript function is not one of the described cases, then the more general inspector/executor method described above should be used to determine the outcomes of an array reference A[σ]. This method also lifts the restriction on function ƒ(.) being monotonically increasing or decreasing.

Application of Direct Computation of Outcomes

Let there be ρ array references A_j[σ_j] in the body B(i) of a loop L(i, l_i, u_i, B(i)). Consider the case where all ρ references are of the form A_j[ƒ_j(i)], 1≦j≦ρ, and ƒ_j(i) is such that we can compute ℒ(ƒ_j(.),A_j) and 𝒰(ƒ_j(.),A_j) as described above. Therefore, we can directly compute χ_j[i] for any j and i. Moreover, there are only three possible values for an outcome χ_j[i]: (<, OK, >). We number these outcomes 0, 1, and 2, respectively. Note that there are 3^ρ possible values for χ[i]. Two arrays L[1:ρ][0:2] (for lower bounds) and U[1:ρ][0:2] (for upper bounds) are formed. Element L[j][k] contains the lower bound of loop indices that lead to outcome k at reference j. Element U[j][k] contains the upper bound of loop indices that lead to outcome k at reference j. Elements L[j][k] and U[j][k] can be computed using:

\begin{matrix} L [j] [0] = \begin{cases} l_{i} & \text{if } f_{j} (i) \text{ monotonically increasing,} \\ 𝒰 (f_{j} (.), A_{j}) + 1 & \text{if } f_{j} (i) \text{ monotonically decreasing,} \end{cases} & (21) \\ U [j] [0] = \begin{cases} ℒ (f_{j} (.), A_{j}) - 1 & \text{if } f_{j} (i) \text{ monotonically increasing,} \\ u_{i} & \text{if } f_{j} (i) \text{ monotonically decreasing,} \end{cases} & (22) \end{matrix}

L[j][1]=ℒ(ƒ_j(.),A_j), (23)

U[j][1]=𝒰(ƒ_j(.),A_j), (24)

\begin{matrix} L [j] [2] = \begin{cases} 𝒰 (f_{j} (.), A_{j}) + 1 & \text{if } f_{j} (i) \text{ monotonically increasing,} \\ l_{i} & \text{if } f_{j} (i) \text{ monotonically decreasing,} \end{cases} & (25) \\ U [j] [2] = \begin{cases} u_{i} & \text{if } f_{j} (i) \text{ monotonically increasing,} \\ ℒ (f_{j} (.), A_{j}) - 1 & \text{if } f_{j} (i) \text{ monotonically decreasing.} \end{cases} & (26) \end{matrix}

Outcome types on a particular reference divide the iteration space into three disjoint segments (regions). Therefore, outcome types for all ρ references divide the iteration space into 3^ρ disjoint regions (3^ρ is the number of possible values for χ[i]). Let ξ_j(δ) denote the j-th digit of the radix-3 representation of δ. We can form a vector {overscore (ℛ)}[1:3^ρ] of all possible regions, where {overscore (ℛ)}[δ] is the region with outcome ξ_j(δ) for reference A_j[ƒ_j(i)]. The tests performed in this region are the minimum necessary to detect array access violations, according to the outcome. The region {overscore (ℛ)}[δ] can be computed as:

\begin{matrix} \overline{ℛ} [δ] \cdot l = \max_{j = 1}^{ρ} (L [j] [ξ_{j} (δ)]), & (27) \\ \overline{ℛ} [δ] \cdot u = \min_{j = 1}^{ρ} (U [j] [ξ_{j} (δ)]), & (28) \end{matrix}

{overscore (ℛ)}[δ].τ=(τ_min(ξ_1(δ)), τ_min(ξ_2(δ)), . . . , τ_min(ξ_ρ(δ))). (29)

Vector {overscore (ℛ)} cannot be used directly in 1314 of FIG. 13 because the regions are not necessarily ordered by increasing iteration number. Therefore, we compute vector ℛ by sorting {overscore (ℛ)}[δ] on its l field:

ℛ[1:3^ρ]={overscore (ℛ)}[1:3^ρ] sorted in ascending order of l. (30)

This vector ℛ can now be used in 1314 to execute the loop.
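Equations (21) through (30) can be exercised with a short sketch. It assumes that the safe bounds of each reference have already been computed and are passed in together with the monotonicity of the subscript; the name `direct_regions` and the input layout are hypothetical. Two liberties are taken and should be read as assumptions: region bounds are clamped to the loop bounds as a safeguard, and empty regions are dropped before sorting rather than kept in the vector.

```python
from itertools import product

# Minimum test for outcomes 0 (<), 1 (OK) and 2 (>), as in the inspector.
TAU_MIN = {0: "lb test", 1: "no test", 2: "ub test"}


def direct_regions(l_i, u_i, refs):
    """Build the sorted vector of nonempty regions by direct computation.

    refs is a list of (safe_lo, safe_hi, increasing) triples, one per
    array reference, giving its safe bounds and subscript monotonicity.
    Returns a list of (l, u, tau) triples in ascending order of l.
    """
    L, U = [], []
    for safe_lo, safe_hi, increasing in refs:
        before = (l_i, safe_lo - 1)   # iterations preceding the safe region
        after = (safe_hi + 1, u_i)    # iterations following the safe region
        lt, gt = (before, after) if increasing else (after, before)
        L.append([lt[0], safe_lo, gt[0]])   # Equations (21), (23), (25)
        U.append([lt[1], safe_hi, gt[1]])   # Equations (22), (24), (26)

    regions = []
    for digits in product(range(3), repeat=len(refs)):  # radix-3 digits
        l = max([l_i] + [L[j][d] for j, d in enumerate(digits)])  # (27), clamped
        u = min([u_i] + [U[j][d] for j, d in enumerate(digits)])  # (28), clamped
        if l <= u:  # ASSUMPTION: empty regions are discarded here
            regions.append((l, u, tuple(TAU_MIN[d] for d in digits)))  # (29)
    return sorted(regions, key=lambda r: r[0])  # Equation (30)


if __name__ == "__main__":
    for region in direct_regions(0, 9, [(2, 6, True), (0, 4, True)]):
        print(region)
```

For two increasing subscripts with safe regions 2 to 6 and 0 to 4 over iterations 0 to 9, only four of the nine candidate regions are nonempty, and the fully safe region 2 to 4 is the intersection of the two safe regions.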

The next method is a variant of the method, described above, for optimizing array reference tests during the execution of a loop while preserving the semantics of the loop. In this variant, only the versions of the loop body B(i) that might actually be executed are generated. In particular, versions of the loop body B(i) are generated only for those regions described in the previous section for which it cannot be shown by data flow and symbolic analysis that {overscore (ℛ)}[δ].l>{overscore (ℛ)}[δ].u, 1≦δ≦3^ρ. We use the same notation developed above. An embodiment of the method is shown in FIG. 18.

FIG. 18 shows the process by which the necessary versions of the loop are generated. Computation of χ_j[i] (shown in 1801) and τ_j[i] (shown in 1802) for i=l_i, . . . , u_i, j=1, . . . , ρ takes place as in FIG. 12. The computation of the regions vector ℛ[1:n] (shown in 1803) is done as before, except as noted below. In step 1804, the set S_τ of tests for each iteration is computed, and in step 1805, a loop body B_τ(i) is generated for each unique τ in set S_τ. It is the action of steps 1803, 1804 and 1805 that distinguishes this variant from the section entitled “Specializing the Method for Loops” disclosed above. In particular, step 1803 is implemented using data flow and symbolic analysis to more accurately determine non-empty regions. In steps 1804 and 1805, the loop body versions for the regions so identified are generated. Finally, in step 1806, the transformed version of loop L(i, l_i, u_i, B(i)) is executed.

Computing a Safe Region for the Iteration Space

We have shown above how to compute the safe bounds of a loop with respect to an array reference A[ƒ(i)]. If the body B(i) has ρ indexing operations on i, of the form A_1[ƒ_1(i)], A_2[ƒ_2(i)], . . . , A_ρ[ƒ_ρ(i)], we can compute ℒ(ƒ_j(.), A_j) and 𝒰(ƒ_j(.), A_j) (1 ≦ j ≦ ρ) using Equations (16) and (17) (or one of the other variations described above). The safe range for i, with respect to all references, is between the maximum of all ℒ(ƒ_j(.), A_j) and the minimum of all 𝒰(ƒ_j(.), A_j):

\begin{matrix} ℒ^{s} = \max_{j = 1}^{ρ} (ℒ (f_{j} (.), A_{j})), & (31) \\ 𝒰^{s} = \min_{j = 1}^{ρ} (𝒰 (f_{j} (.), A_{j})), & (32) \end{matrix}

where the safe region of the iteration space is defined by i = ℒ^s, ℒ^s+1, . . . , 𝒰^s. For values of i in this region, we are guaranteed that any array reference A_j[ƒ_j(i)] will not cause a violation. If ℒ^s > 𝒰^s, then this region is empty. Note that the safe region, as defined by ℒ^s and 𝒰^s, corresponds to region ℛ̄[11 . . . 1_3] defined above:

ℒ^s = ℛ̄[11 . . . 1_3].l (33)

𝒰^s = ℛ̄[11 . . . 1_3].u (34)

To extract loop bounds that we can use in our methods, we define two operators, Ψ_l and Ψ_u:

(l^s, τ^l) = Ψ_l(l_i, u_i, ℒ^s, 𝒰^s, (ƒ_1(.), . . . , ƒ_ρ(.))),

(u^s, τ^u) = Ψ_u(l_i, u_i, ℒ^s, 𝒰^s, (ƒ_1(.), . . . , ƒ_ρ(.))),

where l^s is the lower safe bound and u^s is the upper safe bound for the entire iteration space of i. We want to generate l^s and u^s such that they partition the iteration space into three regions: (i) a region from l to l^s−1, (ii) a region from l^s to u^s, and (iii) a region from u^s+1 to u. Region (ii) has the property that no run-time checks are necessary. The operators Ψ_l and Ψ_u also produce two vectors, τ^l and τ^u, with ρ elements each. The element τ^l(j) defines the type of run-time check that is necessary for reference A_j[ƒ_j(i)] in region (i).

Correspondingly, the element τ^u(j) defines the type of run-time check necessary for reference A_j[ƒ_j(i)] in region (iii). We consider the two possible cases for ℒ^s and 𝒰^s separately.

ℒ^s > 𝒰^s: In this case, the safe region of the loop is empty. We select l^s = u+1 and u^s = u, thus making region (i) equal to the whole iteration space and the other regions empty. The value of τ^u is irrelevant, and τ^l can be selected, for simplicity, to indicate that all checks need to be performed in all references.

ℒ^s ≦ 𝒰^s: In this case, there is a nonempty safe region, and we simply make l^s = ℒ^s and u^s = 𝒰^s. From the computation of the safe bounds, we derive that for values of i less than l^s, accesses of the form A[ƒ(i)] can cause violations on the lower bound of A, if ƒ(i) is monotonically increasing, and on the upper bound of A, if ƒ(i) is monotonically decreasing. For values of i greater than u^s, accesses of the form A[ƒ(i)] can cause violations on the upper bound of A, if ƒ(i) is monotonically increasing, and on the lower bound of A, if ƒ(i) is monotonically decreasing. The values of τ^l and τ^u have to reflect the necessary run-time checks.

Summarizing: We define operators Ψ_l and Ψ_u to compute:

\begin{matrix} l^{s} = \left\{ \begin{matrix} u + 1 & \text{if } ℒ^{s} > 𝒰^{s}, \\ ℒ^{s} & \text{if } ℒ^{s} \leq 𝒰^{s} . \end{matrix} \right. & (35) \\ u^{s} = \left\{ \begin{matrix} u & \text{if } ℒ^{s} > 𝒰^{s}, \\ 𝒰^{s} & \text{if } ℒ^{s} \leq 𝒰^{s} . \end{matrix} \right. & (36) \\ τ_{j}^{l} = \left\{ \begin{matrix} \text{all tests} & \text{if } ℒ^{s} > 𝒰^{s} \text{ and } (ℒ (f_{j} (.), A_{j}) > l_{i} \text{ or } 𝒰 (f_{j} (.), A_{j}) < u_{i}), \\ \text{no test} & \text{if } ℒ^{s} > 𝒰^{s} \text{ and } (ℒ (f_{j} (.), A_{j}) \leq l_{i} \text{ and } 𝒰 (f_{j} (.), A_{j}) \geq u_{i}), \\ \text{lb test} & \text{if } ℒ^{s} \leq 𝒰^{s} \text{ and } f_{j} (i) \text{ monotonically increasing and } ℒ (f_{j} (.), A_{j}) > l_{i}, \\ \text{no test} & \text{if } ℒ^{s} \leq 𝒰^{s} \text{ and } f_{j} (i) \text{ monotonically increasing and } ℒ (f_{j} (.), A_{j}) \leq l_{i}, \\ \text{ub test} & \text{if } ℒ^{s} \leq 𝒰^{s} \text{ and } f_{j} (i) \text{ monotonically decreasing and } ℒ (f_{j} (.), A_{j}) > l_{i}, \\ \text{no test} & \text{if } ℒ^{s} \leq 𝒰^{s} \text{ and } f_{j} (i) \text{ monotonically decreasing and } ℒ (f_{j} (.), A_{j}) \leq l_{i} . \end{matrix} \right. & (37) \\ τ_{j}^{u} = \left\{ \begin{matrix} \text{all tests} & \text{if } ℒ^{s} > 𝒰^{s} \text{ and } (ℒ (f_{j} (.), A_{j}) > l_{i} \text{ or } 𝒰 (f_{j} (.), A_{j}) < u_{i}), \\ \text{no test} & \text{if } ℒ^{s} > 𝒰^{s} \text{ and } (ℒ (f_{j} (.), A_{j}) \leq l_{i} \text{ and } 𝒰 (f_{j} (.), A_{j}) \geq u_{i}), \\ \text{ub test} & \text{if } ℒ^{s} \leq 𝒰^{s} \text{ and } f_{j} (i) \text{ monotonically increasing and } 𝒰 (f_{j} (.), A_{j}) < u_{i}, \\ \text{no test} & \text{if } ℒ^{s} \leq 𝒰^{s} \text{ and } f_{j} (i) \text{ monotonically increasing and } 𝒰 (f_{j} (.), A_{j}) \geq u_{i}, \\ \text{lb test} & \text{if } ℒ^{s} \leq 𝒰^{s} \text{ and } f_{j} (i) \text{ monotonically decreasing and } 𝒰 (f_{j} (.), A_{j}) < u_{i}, \\ \text{no test} & \text{if } ℒ^{s} \leq 𝒰^{s} \text{ and } f_{j} (i) \text{ monotonically decreasing and } 𝒰 (f_{j} (.), A_{j}) \geq u_{i} . \end{matrix} \right. & (38) \end{matrix}
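The case analysis for the operators Ψ_l and Ψ_u can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `psi`, the tuple layout `(L_j, U_j, increasing)` for each reference, and the string labels for the tests are assumptions of the sketch. A reference is given "no test" in the upper region whenever its own upper safe bound reaches u_i.

```python
def psi(l_i, u_i, refs):
    """Return (l_s, u_s, tau_l, tau_u) for a loop running from l_i to u_i.

    refs: list of (L_j, U_j, increasing), where L_j and U_j are the safe
    bounds of reference j and `increasing` tells whether f_j is
    monotonically increasing (assumed representation).
    """
    Ls = max(L_j for L_j, _, _ in refs)   # Equation (31)
    Us = min(U_j for _, U_j, _ in refs)   # Equation (32)
    if Ls > Us:
        # Empty safe region: region (i) covers the whole iteration space.
        l_s, u_s = u_i + 1, u_i
        tau_l = ["all tests" if (L_j > l_i or U_j < u_i) else "no test"
                 for L_j, U_j, _ in refs]
        tau_u = list(tau_l)
    else:
        l_s, u_s = Ls, Us
        # Below the safe region: an increasing f_j can only violate lo(A_j),
        # a decreasing f_j can only violate up(A_j).
        tau_l = [("lb test" if inc else "ub test") if L_j > l_i else "no test"
                 for L_j, _, inc in refs]
        # Above the safe region the roles are exchanged.
        tau_u = [("ub test" if inc else "lb test") if U_j < u_i else "no test"
                 for _, U_j, inc in refs]
    return l_s, u_s, tau_l, tau_u
```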

If A_j is null in a reference A_j[ƒ_j(i)], then, according to our definition, lo(A_j)=0 and up(A_j)=−1. This, in turn, makes ℒ(ƒ_j(.), A_j) > 𝒰(ƒ_j(.), A_j) and consequently ℒ^s > 𝒰^s. Therefore, if there is a nonempty safe region (i.e., ℒ^s ≦ 𝒰^s), then necessarily none of the pointers A_j is null. This safe region needs neither run-time bounds tests nor run-time null pointer tests. In fact, if ℒ^s ≦ 𝒰^s, then null pointer tests are superfluous in the regions preceding and succeeding the safe region.

The method to optimize array reference tests works by partitioning the iteration space into regions of three different types: (i) regions that do not require any tests, (ii) regions which require tests as defined by τ^l, and (iii) regions which require tests as defined by τ^u. Of particular interest is the partitioning into exactly three regions, as defined by the following vector ℛ[1:3]:

\begin{matrix} \begin{matrix} ℛ [1] \cdot l = l_{i} & ℛ [1] \cdot u = l_{i}^{s} - 1 & ℛ [1] \cdot τ = τ^{l} \\ ℛ [2] \cdot l = l_{i}^{s} & ℛ [2] \cdot u = u_{i}^{s} & ℛ [2] \cdot τ = τ^{false} \\ ℛ [3] \cdot l = u_{i}^{s} + 1 & ℛ [3] \cdot u = u_{i} & ℛ [3] \cdot τ = τ^{u} \end{matrix} & (39) \end{matrix}

where τ^false is a test vector indicating that no tests are performed (on the ρ references of the form A_j[ƒ_j(i)]). Other legal partitioning vectors can be generated by subdividing each of the three regions defined in Equation (39). It is also legal to move the first iteration of ℛ[2] to ℛ[1] or the last iteration of ℛ[2] to ℛ[3]. (The moving can be repeated as many times as desired.) Note also that it is perfectly legal to compute τ^l and τ^u in any manner that results in test vectors greater than those computed in Equations (37) and (38), respectively. Even τ^false can be redefined so as to include as many tests as desired. However, using exactly the three regions as computed in Equation (39) and τ^l and τ^u computed in Equations (37) and (38) has the advantage that the partitioning is minimal for a general loop, and that the tests for each reference are minimal for the corresponding partitioning.
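The three-region vector of Equation (39) can be written down directly once the safe bounds are known. The following Python sketch assumes a dictionary layout for each region and string labels for the tests; both are illustrative choices, not part of the disclosure.

```python
def three_regions(l_i, u_i, l_s, u_s, tau_l, tau_u):
    """Build the partitioning vector R[1:3] of Equation (39).

    l_i, u_i: full loop bounds; l_s, u_s: safe bounds;
    tau_l, tau_u: test vectors for the regions before and after the
    safe region (one entry per array reference).
    """
    tau_false = ["no test"] * len(tau_l)   # no tests inside the safe region
    return [
        {"l": l_i,     "u": l_s - 1, "tau": tau_l},
        {"l": l_s,     "u": u_s,     "tau": tau_false},
        {"l": u_s + 1, "u": u_i,     "tau": tau_u},
    ]
```

With the choice l^s = u+1 and u^s = u for an empty safe region, the first region covers the whole iteration space and the other two come out empty (lower bound greater than upper bound), so no special case is needed.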

An implementation of the method using the generic transformation of FIG. 13 is shown in FIG. 19. Note that only three versions of the loop body are necessary, since ℛ[δ].τ can only assume three values (τ^l, τ^u, and τ^false) as shown in 1911. For compactness, we use the notation B_l, B_u, and B_false to denote the three loop body versions B_{τ^l}, B_{τ^u}, and B_{τ^false}, respectively. The explicit implementation of the transformed loop can follow either of the patterns shown in FIG. 14 and FIG. 15.

We mentioned that τ^l and τ^u can be chosen to be any vectors greater than those defined in Equations (37) and (38). In particular, we can make ℛ[1].τ = ℛ[3].τ = τ^true, where τ^true is a test vector indicating that all tests should be performed before each array reference. This leads to a variation of the method that only requires two versions of the loop body: B_{τ^false} and B_{τ^true}, which, for compactness, we represent by B_false and B_true, respectively. The transformations representing this variant method are illustrated in FIGS. 20 and 21. FIG. 20 follows the pattern of FIG. 14, while FIG. 21 follows the pattern of FIG. 15.

Because our method and its variant always generate three regions with specific test vectors, they can be implemented by explicitly instantiating three loops with the appropriate loop bounds as defined by the ℛ vector. The implementation of the method using three loop body versions (B_l, B_u, and B_false) is shown in FIG. 22, while the implementation using only two versions (B_true and B_false) is shown in FIG. 23. In FIG. 22, loop 2206 implements region ℛ[1] of 2201, loop 2209 implements region ℛ[2] of 2201, and loop 2212 implements region ℛ[3] of 2201. Correspondingly, in FIG. 23, loop 2306 implements region ℛ[1] of 2301, loop 2309 implements region ℛ[2] of 2301, and loop 2312 implements region ℛ[3] of 2301.
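The explicit instantiation of three loops can be illustrated with the following Python sketch, written in the spirit of the two-version variant (a fully tested body before and after the safe region, an untested body inside it). The names `BoundsViolation`, `checked`, and `run_three_loops`, the zero-based array with lo(A)=0, and the callback `body` are assumptions of the sketch.

```python
class BoundsViolation(Exception):
    """Raised when an array reference is invalid (hypothetical name)."""

def checked(A, k):
    # Full test: null test plus lower and upper bound tests.
    if A is None or not (0 <= k < len(A)):
        raise BoundsViolation(k)
    return A[k]

def run_three_loops(A, f, l_i, u_i, l_s, u_s, body):
    """Execute body(i, A[f(i)]) for i = l_i..u_i using three explicit loops."""
    for i in range(l_i, l_s):            # region 1: tested version of the body
        body(i, checked(A, f(i)))
    for i in range(l_s, u_s + 1):        # region 2: safe region, no tests
        body(i, A[f(i)])
    for i in range(u_s + 1, u_i + 1):    # region 3: tested version of the body
        body(i, checked(A, f(i)))
```

Because the regions are executed in increasing iteration order, a violation is raised at the same iteration, and after the same preceding iterations, as in a loop that tests every reference.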

Applying the Optimizations to Loop Nests

The methods described above can be applied to each and every loop in a program. In particular, they can be applied recursively to all loops in a loop nest. The order of application does not affect the final result. In our discussion we will sometimes show the application of the methods from outermost loop to innermost loop order. At other times, we will show the application of the methods from innermost to outermost loop order.

In FIG. 24, a doubly nested original loop is shown in 2401. The body of the outer i_1 loop (2402) consists of a (possibly empty) section of straight-line code S^1(i_1) (2403), followed by an inner i_2 loop (2404), followed by a (possibly empty) section of straight-line code S^2(i_1) (2407). The body of the inner loop B(i_1, i_2) (2405) will, in general, contain references to both loop index variables, i_1 and i_2.

We first apply the transformation to the outer i_1 loop, resulting in the structure shown in 2409. ℛ_1 is the partitioning vector for the i_1 loop. The three test vectors for regions ℛ_1[1], ℛ_1[2], and ℛ_1[3] are τ_1^l, τ_1^false, and τ_1^u, respectively. S^1_{ℛ_1[δ_1].τ}(i_1), S^2_{ℛ_1[δ_1].τ}(i_1), and B_{ℛ_1[δ_1].τ}(i_1, i_2) represent S^1(i_1), S^2(i_1), and B(i_1, i_2), respectively, with the array reference tests indicated by ℛ_1[δ_1].τ.

We then apply the method for optimizing array reference tests within an arbitrary loop structure to the inner loop i_2, resulting in the structure shown in 2419. ℛ_2 is the partitioning vector for the i_2 loop. The three test vectors for regions ℛ_2[1], ℛ_2[2], and ℛ_2[3] are τ_2^l, τ_2^false, and τ_2^u, respectively. B_{ℛ_1[δ_1].τ, ℛ_2[δ_2].τ}(i_1, i_2) represents B(i_1, i_2) with the array reference tests indicated by ℛ_1[δ_1].τ and ℛ_2[δ_2].τ. The actual implementation of 2419 can follow either of the patterns shown in FIG. 14 or FIG. 15.

Note that this recursive application of the method for optimizing array reference tests within an arbitrary loop structure generates three versions of S^1(i_1) and S^2(i_1) (one for each value of ℛ_1[δ_1].τ ∈ (τ_1^l, τ_1^u, τ_1^false)) and nine versions of B(i_1, i_2) (one for each combination of ℛ_1[δ_1].τ ∈ (τ_1^l, τ_1^u, τ_1^false) and ℛ_2[δ_2].τ ∈ (τ_2^l, τ_2^u, τ_2^false)). In general, for a d-dimensional loop nest, 3^d versions of the innermost loop body B(i_1, i_2, . . . , i_d) are generated by the recursive application of the method for optimizing array reference tests within an arbitrary loop structure.

FIG. 25 shows the first step in the application of a variation of the method for optimizing array reference tests within an arbitrary loop structure which only uses two versions of code for the loop being optimized. In FIG. 25, this variation is applied to a doubly nested loop 2501. This doubly nested loop has the same structure as 2401. In this particular case we are using the implementation of the method shown in FIG. 20.

In this first step, the transformation is applied to the outer i_1 loop 2502, resulting in the structure shown in 2509. ℛ_1 is the partitioning vector for the i_1 loop. The three test vectors for regions ℛ_1[1], ℛ_1[2], and ℛ_1[3] are τ_1^true, τ_1^false, and τ_1^true, respectively. S^1_{true_1}(i_1), S^2_{true_1}(i_1), and B_{true_1}(i_1, i_2) represent S^1(i_1), S^2(i_1), and B(i_1, i_2), respectively, with the array reference tests indicated by τ_1^true. Correspondingly, S^1_{false_1}(i_1), S^2_{false_1}(i_1), and B_{false_1}(i_1, i_2) represent S^1(i_1), S^2(i_1), and B(i_1, i_2), respectively, with the array reference tests indicated by τ_1^false.

The second step in the application of this variation of the method to the doubly nested loop is shown in FIG. 26. The result 2509 of the first step is shown again in 2601 for convenience. The resulting structure, after the second step of transformations, is shown in 2622. The inner i_2 loop 2606 is transformed into loop 2627, while the inner i_2 loop 2615 is transformed into loop 2645. For each iteration of the i_1 loop, ℛ_2 is the partitioning vector for the i_2 loop. The three test vectors for regions ℛ_2[1], ℛ_2[2], and ℛ_2[3] are τ_2^true, τ_2^false, and τ_2^true, respectively.

Note that there are four versions of B(i_1, i_2), one for each combination of ℛ_1[δ_1].τ ∈ (τ_1^true, τ_1^false) and ℛ_2[δ_2].τ ∈ (τ_2^true, τ_2^false). In general, for a d-dimensional loop nest, 2^d versions of the innermost loop body B(i_1, i_2, . . . , i_d) are generated by recursive application of this variation of the method for optimizing array reference tests within an arbitrary loop structure.

This same method can be applied recursively using the implementation shown in FIG. 21. The first step of this particular case is shown in FIG. 27. The method is applied to a doubly nested loop 2701, which has the same structure as 2401. In the first step, the outer i_1 loop 2702 is transformed into loop 2710, which iterates over the three regions of the iteration space of 2702. Iterations of 2711 are only executed for ℛ_1[δ_1].τ = true. Iterations of 2718 are only executed for ℛ_1[δ_1].τ = false. This is achieved by setting the loop bounds appropriately:

\begin{matrix} l_{{true}_{1}} (δ_{1}) = \left\{ \begin{matrix} ℛ_{1} [δ_{1}] \cdot l & \text{if } ℛ_{1} [δ_{1}] \cdot τ = true \\ 0 & \text{if } ℛ_{1} [δ_{1}] \cdot τ = false \end{matrix} \right. & (40) \\ l_{{false}_{1}} (δ_{1}) = \left\{ \begin{matrix} ℛ_{1} [δ_{1}] \cdot l & \text{if } ℛ_{1} [δ_{1}] \cdot τ = false \\ 0 & \text{if } ℛ_{1} [δ_{1}] \cdot τ = true \end{matrix} \right. & (41) \\ u_{{true}_{1}} (δ_{1}) = \left\{ \begin{matrix} ℛ_{1} [δ_{1}] \cdot u & \text{if } ℛ_{1} [δ_{1}] \cdot τ = true \\ - 1 & \text{if } ℛ_{1} [δ_{1}] \cdot τ = false \end{matrix} \right. & (42) \\ u_{{false}_{1}} (δ_{1}) = \left\{ \begin{matrix} ℛ_{1} [δ_{1}] \cdot u & \text{if } ℛ_{1} [δ_{1}] \cdot τ = false \\ - 1 & \text{if } ℛ_{1} [δ_{1}] \cdot τ = true \end{matrix} \right. & (43) \end{matrix}
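The effect of Equations (40) through (43) is that, for each region, exactly one of the two inner loops has a nonempty range; the other receives the bounds 0 and −1 and executes no iterations. A minimal Python sketch of this bound selection follows. The names `bounds_for` and `run`, the dictionary layout of a region, and the recording of a trace in place of the loop body are assumptions made for illustration.

```python
def bounds_for(region, version):
    """Loop bounds for the loop that executes the given body version.

    Returns the region's own bounds when the region's test vector matches
    `version`, and the empty range (0, -1) otherwise.
    """
    if region["tau"] == version:
        return region["l"], region["u"]
    return 0, -1

def run(regions):
    """Driver loop over regions; records which body version ran at each i."""
    trace = []
    for region in regions:
        l_true, u_true = bounds_for(region, True)
        l_false, u_false = bounds_for(region, False)
        for i in range(l_true, u_true + 1):      # tested version of the body
            trace.append((i, "true"))
        for i in range(l_false, u_false + 1):    # untested version of the body
            trace.append((i, "false"))
    return trace
```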

S^1_{true_1}(i_1), S^2_{true_1}(i_1), B_{true_1}(i_1, i_2), S^1_{false_1}(i_1), S^2_{false_1}(i_1), and B_{false_1}(i_1, i_2) in FIG. 27 are exactly the same as in FIG. 25.

The result 2709 from the first step of the transformation is replicated in 2801 of FIG. 28 for convenience. FIG. 28 shows the second step of the transformation. The inner i_2 loop 2805 is transformed into loop 2822, while the inner i_2 loop 2812 is transformed into loop 2834. Each of these loops implements one instance of the partitioning vector ℛ_2. Once again, the bounds for the i_2 loops have to be set appropriately:

\begin{matrix} l_{{true}_{2}} (δ_{2}) = \left\{ \begin{matrix} ℛ_{2} [δ_{2}] \cdot l & \text{if } ℛ_{2} [δ_{2}] \cdot τ = true \\ 0 & \text{if } ℛ_{2} [δ_{2}] \cdot τ = false \end{matrix} \right. & (44) \\ l_{{false}_{2}} (δ_{2}) = \left\{ \begin{matrix} ℛ_{2} [δ_{2}] \cdot l & \text{if } ℛ_{2} [δ_{2}] \cdot τ = false \\ 0 & \text{if } ℛ_{2} [δ_{2}] \cdot τ = true \end{matrix} \right. & (45) \\ u_{{true}_{2}} (δ_{2}) = \left\{ \begin{matrix} ℛ_{2} [δ_{2}] \cdot u & \text{if } ℛ_{2} [δ_{2}] \cdot τ = true \\ - 1 & \text{if } ℛ_{2} [δ_{2}] \cdot τ = false \end{matrix} \right. & (46) \\ u_{{false}_{2}} (δ_{2}) = \left\{ \begin{matrix} ℛ_{2} [δ_{2}] \cdot u & \text{if } ℛ_{2} [δ_{2}] \cdot τ = false \\ - 1 & \text{if } ℛ_{2} [δ_{2}] \cdot τ = true \end{matrix} \right. & (47) \end{matrix}

The method for optimizing array reference tests within an arbitrary loop structure and its variations through explicit instantiation of the regions of the loop iteration space can also be applied recursively. Application of the method of FIG. 22 to a doubly nested loop is shown in FIG. 29. The usual original loop is shown in 2901. We can apply the method to the inner i_2 loop 2904 first, which results in the three loops 2912, 2915, and 2918 shown in 2909. Each of these loops implements one region of the iteration space of the original i_2 loop 2904. The method is then applied to the outer i_1 loop 2910, resulting in the three loops 2924, 2937, and 2950. Each of these loops implements one region of the iteration space of the original i_1 loop 2902. Note that, as previously discussed, there are nine versions of loop body B(i_1, i_2), which appear explicitly in 2923.

Application of the method of FIG. 23 to a doubly nested loop is shown in FIG. 30. The usual original loop is shown in 3001. We can apply the method to the inner i_2 loop 3004 first, which results in the three loops 3012, 3015, and 3018 shown in 3009. Each of these loops implements one region of the iteration space of the original i_2 loop 3004. The method is then applied to the outer i_1 loop 3010, resulting in the three loops 3024, 3037, and 3050. Each of these loops implements one region of the iteration space of the original i_1 loop 3002. Note that, as previously discussed, there are nine versions of loop body B(i_1, i_2), which appear explicitly in 3023.

Selective Application of the Methods to Loop Nests

Instead of applying the foregoing methods recursively to all loops in a loop nest, different strategies can be applied. The application of any of these methods to a loop results in a partitioning of the iteration space of the loop into two or three types of regions (depending on the method). One of the types, characterized by the τ^false test vector, does not need any array reference tests on references indexed by the loop control variable. Since the other types, characterized by the τ^l, τ^u, or τ^true test vectors, contain tests on at least some array references, the benefits of applying the transformation recursively are diminished. On the other hand, by continuing to apply the transformation to the τ^false regions we can potentially arrive at regions that are free of all array reference tests. For many programs, these regions with no tests are expected to contain most or all of the iterations of the loop nest.

Therefore, we propose variants to the foregoing methods in which the transformation is recursively applied always from outer loop to inner loop order. The recursion is applied only to those regions without tests on the loop index variable.

Consider first the method for optimizing array reference tests within an arbitrary loop structure as described above. In FIG. 31 we show the transformation applied to the outer i_1 loop 3102 of a doubly nested loop 3101. The resulting structure is shown in 3109, using the implementation of FIG. 14. We show the implementation explicitly in this case because the next step of the transformation will be applied only to loop 3123, which belongs to the region with no tests.

Structure 3109 is replicated in 3201 of FIG. 32 for convenience. This figure shows the recursive application of the method to the i_2 loop 3215. This loop is transformed into loop 3245, which contains three instances of the i_2 loop in 3247, 3252, and 3257. Each instance implements one of the three types of regions. The overall structure resulting from the selective recursive application of the method is shown in 3231.

Note that only five versions of the loop body B(i_1, i_2) are generated: B_{l_1,true_2}(i_1, i_2) (which is the same as B_{l_1}(i_1, i_2)), B_{false_1,l_2}(i_1, i_2), B_{false_1,false_2}(i_1, i_2), B_{false_1,u_2}(i_1, i_2), and B_{u_1,true_2}(i_1, i_2) (which is the same as B_{u_1}(i_1, i_2)). In general, for a d-dimensional loop nest, 2d+1 versions of the innermost loop body B(i_1, i_2, . . . , i_d) are generated by the selective recursive application of the method. (If tests are not specified for a loop with control variable i_j, then all references indexed by i_j must be fully tested.)

The exact same operation can be performed using the implementation of FIG. 15. The first step is shown in FIG. 33. The method is applied to the outer i_1 loop 3302 of the double loop nest 3301. This results in structure 3309, replicated in 3401 of FIG. 34 for convenience. The method is applied again to the inner i_2 loop 3412, resulting in the final structure 3425.

Selective recursion can also be applied to the variation of the method which only uses two versions of code for the loop being optimized. FIG. 35 shows the method applied to the outer i_1 loop 3502 of a doubly nested loop 3501, using the implementation of FIG. 20. FIG. 35 is identical to FIG. 25. Using selective recursion, the method is applied next only to loop 3523. Structure 3509 is replicated in 3601 of FIG. 36 for convenience. Loop 3615 is transformed into loop 3636, which contains two instances, 3638 and 3643, of the inner i_2 loop.

Note that three versions of the loop body B(i_1, i_2) are generated: B_{true_1,true_2}(i_1, i_2) (which is the same as B_{true_1}(i_1, i_2)), B_{false_1,true_2}(i_1, i_2), and B_{false_1,false_2}(i_1, i_2). In general, for a d-dimensional loop nest, d+1 versions of the innermost loop body B(i_1, i_2, . . . , i_d) are generated by the selective recursive application of the method.
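The version counts quoted for full and selective recursion (3^d and 2^d versus 2d+1 and d+1) can be checked by enumerating the labels of the loop body versions. The following Python sketch does this under stated assumptions: a version is represented as a tuple with one label per loop, the label "false" marks a region without tests, and every loop that is not transformed is labelled "true" (fully tested). The function names are hypothetical.

```python
from itertools import product

def full_versions(kinds, d):
    """Body versions when every loop of a d-deep nest is transformed."""
    return set(product(kinds, repeat=d))

def selective_versions(kinds, d):
    """Body versions when recursion continues only into 'false' regions."""
    out = set()

    def go(prefix):
        depth = len(prefix)
        if depth == d:
            out.add(tuple(prefix))
            return
        for kind in kinds:
            if kind == "false":
                go(prefix + [kind])          # keep transforming inner loops
            else:
                # Inner loops are left untransformed, hence fully tested.
                out.add(tuple(prefix + [kind] + ["true"] * (d - depth - 1)))

    go([])
    return out
```

For example, `selective_versions(("l", "false", "u"), 2)` yields the five label pairs listed above for the three-version method, and `selective_versions(("true", "false"), 2)` yields the three pairs of the two-version variant.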

FIGS. 37 and 38 show the selective recursive application of the method using the implementation of FIG. 21. FIG. 37 is identical to FIG. 27. In it, the method is applied to the outer i_1 loop 3702 of a doubly nested loop 3701. This results in structure 3709. Structure 3709 is replicated in 3801 of FIG. 38. The method is then applied to the i_2 loop 3812. Loop 3812 is transformed into loop 3829. Note the same three versions of the loop body B(i_1, i_2) in 3818 as in 3622.

Finally, selective recursion can also be used with the method for optimizing array reference tests within an arbitrary loop structure and its variations through explicit instantiation of the regions of the loop iteration space. Selective recursive application of the method of FIG. 22 to a doubly nested loop 3901 is shown in FIG. 39. The method is first applied to the outer i_1 loop 3902, which results in the three loops 3910, 3917 and 3924 in 3909. Loop 3917 implements the region with no array reference tests on the index variable i_1, as indicated by the S^1_{false_1}(i_1), B_{false_1}(i_1, i_2), and S^2_{false_1}(i_1) versions of code. The method is then applied recursively only to i_2 loop 3919, which results in the three loops 3941, 3944, and 3947 in 3931. Note the same five versions of the loop body B(i_1, i_2) as in FIGS. 32 and 34.

Selective recursive application of the method of FIG. 23 to a doubly nested loop 4001 is shown in FIG. 40. The method is first applied to the outer i_1 loop 4002, which results in the three loops 4010, 4017, and 4024 in 4009. Loop 4017 implements the region with no array reference tests on the index variable i_1, as indicated by the S^1_{false_1}(i_1), B_{false_1}(i_1, i_2), and S^2_{false_1}(i_1) versions of code. The method is then applied recursively only to i_2 loop 4019, which results in the three loops 4041, 4044, and 4047 in 4031. Note the same three versions of the loop body B(i_1, i_2) as in FIGS. 36 and 38.

Methods for Perfect Loop Nests

Consider a d-dimensional perfect loop nest as follows:

do i_1 = l_{i_1}, u_{i_1}
  do i_2 = l_{i_2}, u_{i_2}
    . . .
      do i_d = l_{i_d}, u_{i_d}
        B(i_1, i_2, ..., i_d)
      end do
    . . .
  end do
end do

This loop nest defines a d-dimensional iteration space where i_1, i_2, . . . , i_d are the axes of the iteration space. Each iteration, except for the last, has an immediately succeeding iteration in execution order. Conversely, each iteration, except the first, has an immediately preceding iteration in execution order. This d-dimensional iteration space can be partitioned into a sequence of regions defined by a vector ℛ[1:n], where

ℛ[δ] = ((ℛ[δ].l_1, ℛ[δ].l_2, . . . , ℛ[δ].l_d), (ℛ[δ].u_1, ℛ[δ].u_2, . . . , ℛ[δ].u_d), ℛ[δ].τ),

ℛ[δ].l_j = lower bound of loop i_j in region ℛ[δ],

ℛ[δ].u_j = upper bound of loop i_j in region ℛ[δ],

ℛ[δ].τ = test vector for region ℛ[δ].

A partitioning can have any number of empty regions, since an empty region does not execute any iterations of the iteration space. Given a generic partitioning vector ℛ̄[1:m] with (m−n) empty regions, we can form an equivalent partitioning vector ℛ[1:n] with only nonempty regions by removing the empty regions from ℛ̄[1:m]. From now on, we consider only partitionings where every region ℛ[δ] is nonempty.

For a partitioning vector to be valid, the following conditions must hold:

1. The first iteration of region ℛ[δ+1] must be the iteration immediately succeeding the last iteration of region ℛ[δ] in execution order, for δ=1, . . . ,n−1.

2. The first iteration of ℛ[1] must be the first iteration, in execution order, of the iteration space.

3. The last iteration of ℛ[n] must be the last iteration, in execution order, of the iteration space.

4. The test for array reference A_j[σ_j] in region ℛ[δ], as defined by the test vector element ℛ[δ].τ_j, has to be greater than or equal to the minimum test for each outcome of reference A_j[σ_j] in region ℛ[δ]. We can express this requirement as:

ℛ[δ].τ_j ≧ max_{ℛ[δ]} (τ_min(χ_j[i_1, i_2, . . . , i_d])), (48)

where the max is computed over all iterations of region ℛ[δ] and χ_j[i_1, i_2, . . . , i_d] is the (possibly estimated) outcome of reference A_j[σ_j] in iteration point (i_1, i_2, . . . , i_d).
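Conditions 1 through 3 can be checked mechanically. The Python sketch below is an illustration under stated assumptions: a region is represented as a pair (lower, upper) of d-tuples, `lo` and `hi` are the full bounds of the loop nest, and the function names are hypothetical. Condition 4, which concerns the test vectors, is not checked here.

```python
def successor(point, lo, hi):
    """Iteration immediately following `point` in execution order, or None."""
    p = list(point)
    for k in reversed(range(len(p))):
        if p[k] < hi[k]:
            p[k] += 1            # advance the innermost loop that can advance
            return tuple(p)
        p[k] = lo[k]             # this loop wraps around to its lower bound
    return None                  # `point` was the last iteration

def is_valid_order(regions, lo, hi):
    """Check conditions 1-3 for a list of nonempty (lower, upper) regions."""
    if not regions:
        return False
    if tuple(regions[0][0]) != tuple(lo):        # condition 2
        return False
    if tuple(regions[-1][1]) != tuple(hi):       # condition 3
        return False
    return all(successor(a[1], lo, hi) == tuple(b[0])   # condition 1
               for a, b in zip(regions, regions[1:]))
```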

Using any valid partitioning, a d-dimensional perfect loop nest can be implemented by a driver loop that iterates over the regions. This is shown in FIG. 41, where 4101 is the original perfect loop nest and 4111 is an implementation of this loop nest with a driver loop 4112 iterating over the regions.
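A driver loop of this kind can be sketched in Python as follows. The representation of a region as a triple (lower, upper, tau) of two d-tuples and a boolean, and the two callbacks standing for the tested and untested versions of the loop body, are assumptions of the sketch.

```python
from itertools import product

def run_driver(regions, body_true, body_false):
    """Execute a perfect loop nest region by region.

    regions: list of (lower, upper, tau); tau is True when all tests are
    performed in the region and False when none are.
    """
    for lower, upper, tau in regions:            # driver loop over the regions
        body = body_true if tau else body_false
        # Loop nest restricted to the bounds of this region; product() visits
        # the points with the last index varying fastest, as the nest does.
        for point in product(*(range(l, u + 1) for l, u in zip(lower, upper))):
            body(point)
```

For a valid partitioning, the points are visited in the same order as in the original loop nest, which is what preserves the semantics when a violation occurs.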

The partitioning also works when the body of each loop i_j in a d-dimensional loop nest consists of three sections, each possibly empty: (i) a section of straight line code S_j, followed by (ii) a loop i_{j+1}, followed by (iii) another section of straight line code S′_j. This structure is shown in 4201 of FIG. 42. The implementation with a driver loop 4218 iterating over the regions is shown in 4217. The execution of each section of straight line code S_j and S′_j is guarded with an if-statement to guarantee that they are executed only at the right times. As a result of the partitioning of the iteration space, the same value for the set (i_1, i_2, . . . , i_j) can occur on consecutive iterations of the driver loop. For correct execution, S_j should only execute for the first occurrence of a particular value of (i_1, i_2, . . . , i_j). This first occurrence corresponds to the execution of a region with ℛ[δ].l_{j+1} = l_{i_{j+1}}, ℛ[δ].l_{j+2} = l_{i_{j+2}}, . . . , ℛ[δ].l_d = l_{i_d}. Correspondingly, S′_j should only execute for the last occurrence of a particular value of (i_1, i_2, . . . , i_j). This last occurrence corresponds to the execution of a region with ℛ[δ].u_{j+1} = u_{i_{j+1}}, ℛ[δ].u_{j+2} = u_{i_{j+2}}, . . . , ℛ[δ].u_d = u_{i_d}. Thus, we arrive at the expressions for the guards used in 4217:

\begin{matrix} 𝒢_{j} (ℛ [δ]) = \bigwedge_{k = j + 1}^{d} (ℛ [δ] \cdot l_{k} = l_{i_{k}}) & (49) \\ 𝒢_{j}^{'} (ℛ [δ]) = \bigwedge_{k = j + 1}^{d} (ℛ [δ] \cdot u_{k} = u_{i_{k}}) & (50) \end{matrix}

Finally, the test vector ℛ[δ].τ must be applied to the section of straight line code for each region ℛ[δ]. We denote by S_j^{ℛ[δ].τ} a version of S_j that performs test ℛ[δ].τ_j before reference A_j[σ_j], if this reference occurs in S_j. The same is valid for S′_j^{ℛ[δ].τ} with respect to S′_j.

We are particularly interested in computing a partitioning where the regions can be of two kinds:

1. All array references A_j[σ_j] in the region execute successfully.

2. Some array reference A_j[σ_j] in the region causes a violation.

For a region ℛ[δ] of kind (1), we can define ℛ[δ].τ = false as a test vector that does not test any reference A_j[σ_j]. For a region ℛ[δ] of kind (2), we can define ℛ[δ].τ = true as a test vector that performs all tests on every array reference A_j[σ_j]. In this case, we only need two versions of B(i_1, i_2, . . . , i_d): (i) B_true(i_1, i_2, . . . , i_d) performs all tests for all array references, and (ii) B_false(i_1, i_2, . . . , i_d) does not perform any tests.

If all references are of a form that allows the safe bounds l_{i_j}^s and u_{i_j}^s to be computed for every loop i_j, then the aforementioned partitioning can be computed by the following algorithm:

Procedure for computing the regions of a loop nest.

procedure regions(R, j, (α_1, α_2, ..., α_{j−1}), (ω_1, ω_2, ..., ω_{j−1}), δ, d, B)
S1    if (l_j < l_j^s) then
S2      δ = δ + 1
S3      R[δ] = ((α_1, ..., α_{j−1}, l_j, l_{j+1}, ..., l_d), (ω_1, ..., ω_{j−1}, l_j^s − 1, u_{j+1}, ..., u_d), true)
S4    end if
S5    if (j = d) then
S6      if (l_d^s ≦ u_d^s) then
S7        δ = δ + 1
S8        R[δ] = ((α_1, ..., α_{d−1}, l_d^s), (ω_1, ..., ω_{d−1}, u_d^s), false)
S9      end if
S10   else
S11     do k = l_j^s, u_j^s
S12       regions(R, j+1, (α_1, α_2, ..., α_{j−1}, k), (ω_1, ω_2, ..., ω_{j−1}, k), δ, d, B)
S13     end do
S14   end if
S15   if (u_j > u_j^s) then
S16     δ = δ + 1
S17     R[δ] = ((α_1, ..., α_{j−1}, u_j^s + 1, l_{j+1}, ..., l_d), (ω_1, ..., ω_{j−1}, u_j, u_{j+1}, ..., u_d), true)
S18   end if
end procedure

The algorithm in procedure regions( ) takes seven parameters:

1. The vector ℛ that is being computed.

2. The index j, indicating that the region extents along index variable i_j are being computed.

3. The vector (α_1, α_2, . . . , α_{j−1}), where α_k is the lower bound for loop index i_k in the regions to be computed.

4. The vector (ω_1, ω_2, . . . , ω_{j−1}), where ω_k is the upper bound for loop index i_k in the regions to be computed.

5. The count δ of the number of regions already computed.

6. The dimensionality d of the loop nest.

7. The vector ℬ[1:d], where ℬ[j]=(l_j, u_j, l_j^s, u_j^s) contains the full and safe bounds for loop i_j.

To compute the entire vector of regions, the invocation regions(ℛ, 1, ( ), ( ), δ=0, d, ℬ) should be performed. The value of δ at the end of the computation is the total number of regions in ℛ.
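The recursion can be illustrated with a minimal Python sketch. It is not part of the disclosed figures and rests on several assumptions: the name regions and the tuple layout of each region are chosen for illustration, the vector ℛ is a Python list R whose length plays the role of the counter δ, and ℬ is a list B of (l_j, u_j, l_j^s, u_j^s) tuples.

```python
def regions(R, j, alpha, omega, d, B):
    """Append to R the regions for loops j..d of a d-deep loop nest.

    B[k-1] = (l_k, u_k, ls_k, us_k) holds the full and safe bounds of loop k.
    Each region is (lower_bounds, upper_bounds, tau); tau=True means
    "perform all tests". len(R) stands in for the counter delta.
    """
    l, u, ls, us = B[j - 1]
    lo_rest = tuple(b[0] for b in B[j:d])  # l_{j+1}, ..., l_d
    up_rest = tuple(b[1] for b in B[j:d])  # u_{j+1}, ..., u_d
    if l < ls:  # statements S1-S4: iterations below the safe range
        R.append((alpha + (l,) + lo_rest, omega + (ls - 1,) + up_rest, True))
    if j == d:  # statements S5-S9: innermost loop, safe range needs no tests
        if ls <= us:
            R.append((alpha + (ls,), omega + (us,), False))
    else:  # statements S10-S14: recurse once per safe iteration of loop j
        for k in range(ls, us + 1):
            regions(R, j + 1, alpha + (k,), omega + (k,), d, B)
    if u > us:  # statements S15-S18: iterations above the safe range
        R.append((alpha + (us + 1,) + lo_rest, omega + (u,) + up_rest, True))


# Example: a 2-deep nest, i_1 = 1..4 (safe 2..3) and i_2 = 1..5 (safe 1..4).
R = []
B = [(1, 4, 2, 3), (1, 5, 1, 4)]
regions(R, 1, (), (), 2, B)
for region in R:
    print(region)
```

In this example the 4 by 5 iteration space is split into six regions, and every iteration point falls in exactly one of them.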

An important optimization. If, for a particular value of j, l_j^s=l_j and u_j^s=u_j, then the safe region along axis i_j of the iteration space corresponds to the entire extent of the axis. If l_k^s=l_k and u_k^s=u_k for k=j+1, . . . , d, then axis i_j can be partitioned into only three regions: (i) one region from l_j to l_j^s−1, (ii) one region from l_j^s to u_j^s, and (iii) one region from u_j^s+1 to u_j. Each of these regions spans the entire iteration space along axes i_{j+1}, i_{j+2}, . . . , i_d. Collapsing multiple regions into a single region reduces the total number of iterations in the driver loop 4218 and, consequently, reduces the run-time overhead of the method. To incorporate this optimization in the computation of regions, procedure regions is modified as follows:

Optimized procedure for computing the regions of a loop nest.

procedure regions(ℛ, j, (α_1, α_2, ..., α_{j−1}), (ω_1, ω_2, ..., ω_{j−1}), δ, d, ℬ)
  if (l_j < l_j^s) then
    δ = δ + 1
    ℛ[δ] = ((α_1, ..., α_{j−1}, l_j, l_{j+1}, ..., l_d), (ω_1, ..., ω_{j−1}, l_j^s − 1, u_{j+1}, ..., u_d), true)
  end if
  if (j = d) then
    if (l_d^s ≦ u_d^s) then
      δ = δ + 1
      ℛ[δ] = ((α_1, ..., α_{d−1}, l_d^s), (ω_1, ..., ω_{d−1}, u_d^s), false)
    end if
  elseif (nochecks((l_{j+1}, l_{j+2}, ..., l_d), (l_{j+1}^s, l_{j+2}^s, ..., l_d^s), (u_{j+1}, u_{j+2}, ..., u_d), (u_{j+1}^s, u_{j+2}^s, ..., u_d^s))) then
    δ = δ + 1
    ℛ[δ] = ((α_1, ..., α_{j−1}, l_j^s, l_{j+1}, ..., l_d), (ω_1, ..., ω_{j−1}, u_j^s, u_{j+1}, ..., u_d), false)
  else
    do k = l_j^s, u_j^s
      regions(ℛ, j+1, (α_1, α_2, ..., α_{j−1}, k), (ω_1, ω_2, ..., ω_{j−1}, k), δ, d, ℬ)
    end do
  end if
  if (u_j > u_j^s) then
    δ = δ + 1
    ℛ[δ] = ((α_1, ..., α_{j−1}, u_j^s + 1, l_{j+1}, ..., l_d), (ω_1, ..., ω_{j−1}, u_j, u_{j+1}, ..., u_d), true)
  end if
end procedure

boolean function nochecks((l_1, ..., l_m), (l_1^s, ..., l_m^s), (u_1, ..., u_m), (u_1^s, ..., u_m^s))
  if (((l_i = l_i^s) ∧ (u_i = u_i^s)) ∀ i = 1, ..., m) then
    return true
  else
    return false
  end if
end function
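The effect of the optimization can be sketched in Python as follows. This is an illustrative sketch only, under the same assumptions as any list-based rendering of the procedure: the names regions_opt and nochecks and the tuple layout are hypothetical, the list R stands for ℛ with len(R) in place of δ, and B is a list of (l_j, u_j, l_j^s, u_j^s) tuples. For brevity, nochecks here takes the bound tuples of the inner loops directly instead of four separate vectors.

```python
def nochecks(inner_bounds):
    """True if every inner loop's safe bounds equal its full bounds."""
    return all(l == ls and u == us for (l, u, ls, us) in inner_bounds)


def regions_opt(R, j, alpha, omega, d, B):
    """Like the unoptimized procedure, but collapses the safe range of loop j
    into one region when loops j+1..d need no tests at all."""
    l, u, ls, us = B[j - 1]
    lo_rest = tuple(b[0] for b in B[j:d])  # l_{j+1}, ..., l_d
    up_rest = tuple(b[1] for b in B[j:d])  # u_{j+1}, ..., u_d
    if l < ls:
        R.append((alpha + (l,) + lo_rest, omega + (ls - 1,) + up_rest, True))
    if j == d:
        if ls <= us:
            R.append((alpha + (ls,), omega + (us,), False))
    elif nochecks(B[j:d]):
        # One region spans the whole safe range of loop j and all inner axes.
        R.append((alpha + (ls,) + lo_rest, omega + (us,) + up_rest, False))
    else:
        for k in range(ls, us + 1):
            regions_opt(R, j + 1, alpha + (k,), omega + (k,), d, B)
    if u > us:
        R.append((alpha + (us + 1,) + lo_rest, omega + (u,) + up_rest, True))


# Example: the inner loop is entirely safe, so axis i_1 yields three regions.
R = []
B = [(1, 100, 5, 90), (1, 50, 1, 50)]
regions_opt(R, 1, (), (), 2, B)
for region in R:
    print(region)
```

Without the nochecks branch, the same input would produce one region per safe iteration of the outer loop (86 of them) plus the two tested regions, so the driver loop would run 88 times instead of 3.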

We discuss two alternatives for the implementation of the method for optimizing array reference tests within a loop nest using static code generation. If the vector ℛ is computed as previously described, only two versions of the loop body need to be generated, B^{true}(i_1, i_2, . . . , i_d) and B^{false}(i_1, i_2, . . . , i_d). Also, two versions of S_j, S_j^{true} and S_j^{false}, and two versions of S′_j, S′_j^{true} and S′_j^{false}, need to be generated. Using these two versions of the loop body and two versions of each section of straight line code, a d-dimensional loop nest can be transformed as shown in FIGS. 43 and 44. The loop nest without sections of straight line code 4301 is transformed into 4311. The loop nest with sections of straight line code 4401 is transformed into 4417. The transformed code in 4311 and 4417 has two instances of the original loop nest. The bounds for each loop in these loop nests can be computed by

\begin{matrix}
\left. \begin{matrix} l_{j}^{true}(δ) = ℛ[δ] \cdot l_{j} \\ u_{j}^{true}(δ) = ℛ[δ] \cdot u_{j} \end{matrix} \right\} \text{ if } (ℛ[δ] \cdot τ = true) & (51) \\
\left. \begin{matrix} l_{j}^{true}(δ) = 0 \\ u_{j}^{true}(δ) = -1 \end{matrix} \right\} \text{ if } (ℛ[δ] \cdot τ = false) \text{ and} & (52) \\
\left. \begin{matrix} l_{j}^{false}(δ) = ℛ[δ] \cdot l_{j} \\ u_{j}^{false}(δ) = ℛ[δ] \cdot u_{j} \end{matrix} \right\} \text{ if } (ℛ[δ] \cdot τ = false) & (53) \\
\left. \begin{matrix} l_{j}^{false}(δ) = 0 \\ u_{j}^{false}(δ) = -1 \end{matrix} \right\} \text{ if } (ℛ[δ] \cdot τ = true). & (54)
\end{matrix}

The guard expressions used in the if-statements in 4417 can be expressed in terms of these bounds:

\begin{matrix}
𝒢_{j}(l) = \bigwedge_{k=j}^{d} (l_{k} = l_{i_{k}}) & (55) \\
𝒢_{j}^{'}(u) = \bigwedge_{k=j}^{d} (u_{k} = u_{i_{k}}) & (56)
\end{matrix}

Alternatively, the implementation shown in FIG. 45 can be used. It is semantically equivalent to the method shown in FIG. 44, but it does not require computing the new loop bounds l_j^{true}(δ), u_j^{true}(δ), l_j^{false}(δ), and u_j^{false}(δ) used in FIG. 44. Instead, an if-statement 4519 is used to verify which kind of region is region ℛ[δ]. If the region requires a test on any array reference, then the loop nest 4520-4534 is executed. If the region does not require any tests on array references, then the loop nest 4536-4550 is executed.
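The two alternatives can be contrasted with a small Python sketch for a one-dimensional loop. It is an illustration under stated assumptions, not a rendering of FIGS. 43 through 45: the function names are hypothetical, arrays are 0-based with lo(A)=0 and up(A)=len(A)−1, each region is a (lower, upper, τ) tuple, and, because Python always checks indices internally, the "untested" body only stands in for code compiled without tests.

```python
def body_true(a, i, out):
    """Version of the loop body that tests the reference a[i] explicitly."""
    if not (0 <= i < len(a)):
        raise IndexError(f"index {i} out of bounds")
    out.append(a[i] * 2)


def body_false(a, i, out):
    """Version of the loop body with no explicit test."""
    out.append(a[i] * 2)


def run_with_bounds(a, R, out):
    """First alternative: two loop instances per region; the one that must
    not run is given the empty bounds (0, -1), as in equations (51)-(54)."""
    for lo, hi, tau in R:  # driver loop over the regions
        lt, ut = (lo, hi) if tau else (0, -1)
        lf, uf = (lo, hi) if not tau else (0, -1)
        for i in range(lt, ut + 1):
            body_true(a, i, out)
        for i in range(lf, uf + 1):
            body_false(a, i, out)


def run_with_if(a, R, out):
    """Second alternative: an if-statement selects the loop instance."""
    for lo, hi, tau in R:  # driver loop over the regions
        if tau:
            for i in range(lo, hi + 1):
                body_true(a, i, out)
        else:
            for i in range(lo, hi + 1):
                body_false(a, i, out)
```

Both drivers execute the same iterations in the same order, so a violation in a tested region is raised at the same point in either form.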

Using a Single Region

In the extreme case, we can partition the entire iteration space of a d-dimensional loop nest into a single region ℛ[1]. This region is defined as:

ℛ[1].l_j = l_{i_j}, for j=1, . . . , d, (57)

ℛ[1].u_j = u_{i_j}, for j=1, . . . , d, (58)

\begin{matrix} ℛ[1] \cdot τ_{j} = \left\{ \begin{matrix} \text{all tests} & \text{if any } χ_{k}[i_{1}, \dots, i_{d}] \text{ is a violation, } k = 1, \dots, ρ \\ \text{no test} & \text{otherwise.} \end{matrix} \right. & (59) \end{matrix}

We use ℛ[1].τ=true to denote that all tests must be performed, and ℛ[1].τ=false to denote that no tests are to be performed.

Using only two versions of the loop body and each section of straight line code, we transform the loop nest 4601 into 4617. Note that, since there is only one region, there is no need for a driver loop in 4617. There are two instances of the loop nest in 4617, and the bounds are computed by

\begin{matrix}
\left. \begin{matrix} l_{j}^{true} = ℛ[1] \cdot l_{j} \\ u_{j}^{true} = ℛ[1] \cdot u_{j} \end{matrix} \right\} \text{ if } (ℛ[1] \cdot τ = true) & (60) \\
\left. \begin{matrix} l_{j}^{true} = 0 \\ u_{j}^{true} = -1 \end{matrix} \right\} \text{ if } (ℛ[1] \cdot τ = false) \text{ and} & (61) \\
\left. \begin{matrix} l_{j}^{false} = ℛ[1] \cdot l_{j} \\ u_{j}^{false} = ℛ[1] \cdot u_{j} \end{matrix} \right\} \text{ if } (ℛ[1] \cdot τ = false) & (62) \\
\left. \begin{matrix} l_{j}^{false} = 0 \\ u_{j}^{false} = -1 \end{matrix} \right\} \text{ if } (ℛ[1] \cdot τ = true). & (63)
\end{matrix}

FIG. 47 shows another transformation of the loop nest 4701 into 4717 that implements a partitioning with a single region. A single test 4718 is used in 4717 to verify the kind of region. The appropriate instance of the loop nest is selected based on the test. The value of the flag check is the same as ℛ[1].τ. If the safe bounds l_{i_j}^s and u_{i_j}^s can be computed for every loop i_j, then check can be computed by

\begin{matrix} check = \left\{ \begin{matrix} false & \text{if } (l_{i_{j}}^{s} = l_{i_{j}}) \text{ and } (u_{i_{j}}^{s} = u_{i_{j}}) \text{ for all } j = 1, \dots, d, \\ true & \text{otherwise.} \end{matrix} \right. & (64) \end{matrix}

In general, check=true if the outcome of any array reference A[σ] in the execution of the loop nest is a violation or is unknown, and check=false if all outcomes of array references in the execution of the loop nest are either successful or not executed.
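A minimal Python sketch of the single-region scheme for a one-dimensional loop is given below. It is illustrative only. The names safe_bounds and increment are hypothetical, arrays are assumed 0-based, and the safe bounds are computed for the simple subscript σ=i only (an assumption made for this example, not the general computation of safe bounds).

```python
def safe_bounds(l, u, n):
    """Assumed safe bounds of loop i = l..u for the reference a[i] on a
    0-based array of length n: the iterations for which 0 <= i <= n-1."""
    return max(l, 0), min(u, n - 1)


def increment(a, l, u):
    """Execute 'do i = l, u: a[i] = a[i] + 1' with a single up-front test."""
    ls, us = safe_bounds(l, u, len(a))
    check = not (ls == l and us == u)  # equation (64) with d = 1
    if check:
        # Instance of the loop with all tests.
        for i in range(l, u + 1):
            if not (0 <= i < len(a)):
                raise IndexError(f"index {i} out of bounds")
            a[i] += 1
    else:
        # Instance of the loop with no tests.
        for i in range(l, u + 1):
            a[i] += 1
    return check
```

When check is true, the tested instance still raises the violation at the exact iteration where it occurs, after all earlier iterations have taken effect, so the original semantics are preserved.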

Treating Sequences of Array References with Versions

This approach of dynamically selecting from two versions of a loop (one with all tests and another with no tests) can be extended to any sequence of array references.

Consider the sequence of ρ array references (A_1[σ_1], A_2[σ_2], . . . , A_ρ[σ_ρ]) shown in 4801 of FIG. 48. This sequence of references can be replaced by the code in 4805, which dynamically selects from two versions of the references. The version in lines 4807-4812 performs all tests, while the version in lines 4814-4819 does not perform any tests. The selection is based on the value of check, which can be computed as

\begin{matrix} check = \left\{ \begin{matrix} false & \text{if } lo(A_{j}) \leq σ_{j} \leq up(A_{j}) \text{ for all } j = 1, \dots, ρ, \\ true & \text{otherwise.} \end{matrix} \right. & (65) \end{matrix}

In general, the evaluation of lo(A_j)≦σ_j and σ_j≦up(A_j) requires symbolic computation and/or run-time evaluation. For loops in particular, it is necessary to represent the range of values that σ_j can take. This representation can be in the form of symbolic lower and upper bounds of the range.

This method can be applied to any body of code for which lo(A_j)≦σ_j and σ_j≦up(A_j) can be evaluated. This body of code can be a loop, a section of straight-line code, a whole procedure, or even an entire program or program module. In the worst case, if the value of lo(A_j)≦σ_j or σ_j≦up(A_j) cannot be determined, then a conservative guess has to be made in either comparison, which will cause check to evaluate to true, and the version with all tests to be selected. Note that the selected sequence of references can contain any subset of the actual sequence present in a body of code. Array references left out of the sequence can be treated as a special form of reference that includes tests. These references with tests appear in both versions of code.
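As an illustration of equation (65) applied to references inside a loop, the following Python sketch represents each subscript by the lower and upper bound of its range over the loop. The function name add_shifted, the example computation and the 0-based arrays are assumptions made for this sketch.

```python
def add_shifted(a, b, l, u):
    """Execute 'do i = l, u: a[i] = a[i] + b[i+1]' with versioned references.

    The references are a[i] and b[i+1]; over the loop their subscripts range
    over [l, u] and [l+1, u+1] respectively.
    """
    ranges = [(a, l, u), (b, l + 1, u + 1)]  # (array, min subscript, max subscript)
    # Equation (65): check is false only if every range lies within bounds.
    check = not all(0 <= lo and hi <= len(arr) - 1 for arr, lo, hi in ranges)
    if check:
        # Version with all tests, each performed before its reference.
        for i in range(l, u + 1):
            if not (0 <= i < len(a)):
                raise IndexError(f"a[{i}] out of bounds")
            if not (0 <= i + 1 < len(b)):
                raise IndexError(f"b[{i + 1}] out of bounds")
            a[i] = a[i] + b[i + 1]
    else:
        # Version with no tests.
        for i in range(l, u + 1):
            a[i] = a[i] + b[i + 1]
    return check
```

The range test is conservative: for an empty loop (l greater than u) it may set check to true even though no reference executes, which only selects the version with tests and never changes the result.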

Speculative Execution

Quite often, explicit tests for the validity of array references impose an additional performance penalty by constraining some compiler or programmer optimizations. On some systems, the cost associated with these optimization constraints is actually higher than the cost of performing the tests themselves. The cost of these constraints can be minimized by speculatively executing the loop. In speculative execution, two versions of the code are formed. The first contains tests to determine if an invalid array reference occurs, but does not insist that the violations be detected precisely. This allows many optimizations that depend on code motion to occur in this first version. (See S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, 1997.) The results of the speculative execution are saved in temporary storage until it is determined that the speculative execution finished without any invalid references occurring. At that point, the results are saved to permanent storage. If an invalid array reference occurs or any other exception is detected, the results in temporary storage are discarded, and the computation is performed again in precise order.

Let S be a sequence of code containing one or more array references. The transformation of this code to support speculative execution is shown in FIG. 49, where S is represented by 4901. The transformation proceeds as follows. First, two versions of S are generated, with an imprecisely ordered version S′ in 4907-4912, and a precisely ordered version S̄ in 4914-4920. Tests are placed into the precisely checked version as described in previous methods (4914, 4916, and 4919).

The placement of tests in the imprecisely ordered version is constrained only by the data flow of the program. That is, a test cannot be done before the array is created or the subscript value has been computed. These tests can be optimized using the methods discussed in the Background of the Invention. When the tests show that an invalid array reference occurs, a flag is set to true. (See, for example, 4906.) This flag indicates that a problem occurred during the execution of the imprecise version and that the precise version should be run. Because array reference violations are not necessarily being detected at the time they occur (either the computation or the tests may have been moved by optimizations), the effect of the computation cannot be allowed to be observed outside of S. Thus, each reference to an array A_j possibly written to in S′ is replaced by a reference to a corresponding auxiliary array A′_j (lines 4907, 4909, 4912). An array A′_j is initialized to the value of A_j before execution of S′ begins.

If flag is set to true at the end of execution of S′, then a violation occurred, and the computation represented by S is rerun with precise ordering in S̄ (4914-4920). The A′_j arrays are ignored.

If the flag is set to false, then no array reference violations occurred, and it is necessary to copy the results in the various auxiliary A′_j arrays to their permanent locations. Lines 4922 through 4925 are added to accomplish this. We note that standard techniques can be used to reduce the number of elements that are actually copied between A′_j and A_j.

At the end of execution of 4905, independent of the execution path taken, the A′_j auxiliary arrays associated with the A_j arrays can be explicitly freed or garbage collected, depending on the functionality of the underlying run-time system.
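The overall flow of speculative execution can be sketched in Python as follows. This is a simplified illustration, not the code of FIG. 49. The function name and the example computation are hypothetical, arrays are assumed 0-based, and the imprecise version merely records a violation and continues, standing in for code whose tests have been moved by optimizations.

```python
def speculative_increment(a, idx):
    """Execute 'for each i in idx: a[i] = a[i] + 1' speculatively.

    An invalid subscript must be reported at the exact reference where it
    occurs, with all earlier updates visible in a.
    """
    a_aux = list(a)  # auxiliary array A', initialized to the value of A
    flag = False

    # Imprecise version S': writes go to a_aux; a violation only sets the flag.
    for i in idx:
        if 0 <= i < len(a_aux):
            a_aux[i] += 1
        else:
            flag = True

    if flag:
        # Precise version: rerun on a with a test before each reference.
        # The contents of a_aux are ignored.
        for i in idx:
            if not (0 <= i < len(a)):
                raise IndexError(f"index {i} out of bounds")
            a[i] += 1
    else:
        # No violation: copy the results to their permanent location.
        a[:] = a_aux
    return flag
```

Because a itself is untouched until either the commit or the precise rerun, no effect of the imprecise version can be observed outside the sequence. In Python the auxiliary list is reclaimed by the garbage collector once the function returns.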

While the invention has been described in terms of several preferred embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.

US Referenced Citations

Number	Name	Date	Kind
4642765	Cocke et al.	Feb 1987	A
5293631	Rau et al.	Mar 1994	A
5361354	Greyzck	Nov 1994	A
5586325	MacDonald et al.	Dec 1996	A
5835701	Hastings	Nov 1998	A
5953531	Megiddo et al.	Sep 1999	A
6014723	Tremblay et al.	Jan 2000	A
6076141	Tremblay et al.	Jun 2000	A

Non-Patent Literature Citations

Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, pp. 454-457, Jun. 1997.
Nyhoff and Leestma, Fortran 77 for Scientists and Engineers, Third Edition, "Multidimensional Arrays", pp. 385-391.