The foregoing will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
The present method and apparatus for performing program optimization using an Automaton Loop Construct (ALC) defines an ALC, rewrites key parts of enterprise business applications using finite state automata, and uses the knowledge contained in these automata to allow a compiler to remove the extra reformulations of the data.
Presented is a new automaton loop construct, together with a way to partially deforest a program with it, which can be used to solve the problem. The construct is simple, functional, easy to deforest and partially deforest against, and is also quite familiar and intuitive to computer scientists. While the construct is generally straightforward for computer scientists to reason about and write, and for a compiler to partially evaluate, deforest, and partially deforest, across a wide range of problems, the ALC will be explained with reference to the particular problem of XML serialization as part of an XML processing scenario. Note that the present invention is not intended to be limited to an XML serializer; the XML serializer example is used for explanation purposes only.
Functional XML processing code can be written around an ALC and optimized using classical functional language techniques in order to precompute more of the output of XML processing such as transformation and query. Since the construct is a well-understood abstraction, it can be easy for computer scientists to use, and the increased optimizability can bring substantial speedups to this performance-critical arena.
Computer scientists are quite familiar with the finite state automaton. The present “Automaton Loop Construct” (ALC) is basically a finite state automaton tailored to be easily deforestable and partially deforestable. The ALC works as follows:
1. The automaton starts in an initial state.
2. A sequence of objects is consumed one at a time.
3. A consumed object is matched to the first possible state transition given the current state. The state transition consists of four parts: (1) a state number, (2) a pattern against which to match a potential source object, (3) a target state, and (4) a sequence of output objects. The automaton outputs the output object sequence associated with the state transition and changes to the associated target state. Steps 2-3 are repeated until all objects are consumed from the input.
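For purposes of illustration, one possible realization of this loop, written here in OCaml with illustrative names that are not part of the construct itself, represents each transition by its four parts and folds the transitions over the input:

type ('state, 'obj, 'out) transition = {
  source  : 'state;                 (* (1) state number *)
  pattern : 'obj -> bool;           (* (2) pattern against which to match the object *)
  target  : 'state;                 (* (3) target state *)
  output  : 'obj -> 'out list;      (* (4) output object sequence *)
}

(* Consume the input one object at a time; at each step take the first transition
   whose source state and pattern match, emit its outputs, and move to its target
   state.  Returns the final state together with everything emitted. *)
let run_alc transitions initial input =
  List.fold_left
    (fun (state, emitted) obj ->
       match
         List.find_opt (fun t -> t.source = state && t.pattern obj) transitions
       with
       | Some t -> (t.target, emitted @ t.output obj)
       | None   -> (state, emitted)   (* no matching transition: the object is skipped in this sketch *)
    )
    (initial, []) input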
Automata are very well suited for use in deforestation optimizations. One important property of these limited automata is that every transition is keyed on a known, constant state, so a compiler can make deductions from this information. Consider the following example of a simplified XML serializer:
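For illustration, one possible sketch of the serializer's transition table, written as a step function over assumed event names (Begin_Tag, Add_Attribute, Text, End_Tag), is:

type event =
  | Begin_Tag of string
  | Add_Attribute of string * string
  | Text of string
  | End_Tag of string

(* State 1: the ">" of the last start tag (if any) has already been emitted.
   State 2: the ">" of the current start tag has not yet been emitted. *)
let step state event =
  match state, event with
  | 1, Begin_Tag name       -> 2, [ "<" ^ name ]
  | 2, Begin_Tag name       -> 2, [ ">"; "<" ^ name ]
  | 2, Add_Attribute (k, v) -> 2, [ " " ^ k ^ "=\"" ^ v ^ "\"" ]
  | 2, Text s               -> 1, [ ">"; s ]
  | 1, Text s               -> 1, [ s ]
  | 2, End_Tag _            -> 1, [ "/>" ]               (* empty element *)
  | 1, End_Tag name         -> 1, [ "</" ^ name ^ ">" ]
  | _                       -> state, []                 (* other combinations ignored in this sketch *)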
The automaton has two states: state 2 indicates that the closing “>” on a start tag has not yet been outputted, and state 1 indicates that the closing “>” on the last start tag (if any) has already been outputted.
To produce this fragment: <foo bar="baz">fluff</foo>, the following subroutine could be used:
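One illustrative sketch of such a subroutine, building on the step function above (run and output_foo are assumed names; the subroutine returns the final state together with the output so it can be composed), is:

(* Run the serializer automaton over a sequence of events, threading the state. *)
let run step state events =
  List.fold_left
    (fun (st, emitted) ev -> let st', out = step st ev in (st', emitted @ out))
    (state, []) events

let output_foo state =
  run step state
    [ Begin_Tag "foo";
      Add_Attribute ("bar", "baz");
      Text "fluff";
      End_Tag "foo" ]

(* output_foo 1 yields (1, pieces that concatenate to <foo bar="baz">fluff</foo>);
   output_foo 2 yields the same preceded by the ">" that closes the pending start tag. *)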
The advantage of the automaton representation of the ALC arises when used with partial evaluation and deforestation. For example, suppose the Output_Foo subroutine is called in the middle of a program in which XML is outputted. At any given invocation of Output_Foo, it is unknown whether the XML serializer is in state 1 or 2. However, regardless of whether the serializer is in state 1 or 2 at the beginning of the invocation of Output_Foo, after the first Begin_Tag event is processed, the serializer will deterministically be in state 2. The important part is that the compiler can determine this statically with relative ease by simply looking at the definition of the automaton. Therefore, at runtime, instead of requiring a switch to be executed before every event which is processed by the automaton, only the first event requires an if statement; the rest of the event stream, from the Add_Attribute(“bar”, “baz”) portion on, can be partially evaluated, or precomputed, at compile time. For example, code for the sample subroutine can always be generated as:
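An illustrative sketch of such generated code, assuming an emit procedure that appends text to the serializer output, is:

let output_foo_compiled emit state =
  if state = 2 then emit ">";           (* the only runtime test *)
  emit "<foo bar=\"baz\">fluff</foo>";  (* everything else precomputed at compile time *)
  1                                     (* Output_Foo always ends in state 1 *)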
Building up a more complicated case, consider the following subroutine:
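One illustrative shape for such a subroutine, chosen here as an assumption so that the following optimization steps can be shown concretely, wraps two calls to Output_Foo and one opaque call (passed in as unknown) between two elements:

let output_more unknown state =
  let state, o1 = run step state [ Begin_Tag "one" ] in
  let state, o2 = output_foo state in
  let state, o3 = unknown state in                         (* opaque call *)
  let state, o4 = run step state [ Begin_Tag "two" ] in
  let state, o5 = output_foo state in
  let state, o6 = run step state [ End_Tag "two"; End_Tag "one" ] in
  state, List.concat [ o1; o2; o3; o4; o5; o6 ]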
Using the kind of compilation discussed above, the compiler can generate code like the following for this subroutine:
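An illustrative sketch of such generated code, continuing the assumptions above (in this compiled form the opaque call emits its own output and returns only the resulting state), uses only the deduction that the state after any Begin_Tag is 2:

let output_more_gen emit unknown state =
  if state = 2 then emit ">";                          (* entry state unknown *)
  emit "<one";                                         (* after Begin_Tag "one" the state is 2 *)
  let state = output_foo_compiled emit 2 in
  let state = unknown state in                         (* state unknown after the opaque call *)
  if state = 2 then emit ">";
  emit "<two";                                         (* after Begin_Tag "two" the state is 2 *)
  let state = output_foo_compiled emit 2 in
  (if state = 2 then emit "/>" else emit "</two>");    (* End_Tag "two": Output_Foo's result state not yet used *)
  emit "</one>";                                       (* End_Tag "one" from state 1 *)
  1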
Since Output_Foo, as seen above, always leaves in state 1, this code can be optimized as follows:
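Continuing the illustrative sketch, each Output_Foo call is known to return state 1, so the test before the closing tag of “two” disappears and the opaque call is entered in a known state:

let output_more_opt emit unknown state =
  if state = 2 then emit ">";
  emit "<one";
  ignore (output_foo_compiled emit 2);   (* always returns 1 *)
  let state = unknown 1 in
  if state = 2 then emit ">";
  emit "<two";
  ignore (output_foo_compiled emit 2);   (* always returns 1 *)
  emit "</two></one>";
  1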
By specializing the Output_Foo code to the specific values of the initial state, the results are:
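In the sketch, the only initial state Output_Foo is now called with is 2, so a specialized version containing no runtime test can be generated and used:

(* Output_Foo specialized to initial state 2: the leading ">" is unconditional. *)
let output_foo_from_2 emit =
  emit "><foo bar=\"baz\">fluff</foo>"   (* final state is always 1 *)

let output_more_spec emit unknown state =
  if state = 2 then emit ">";
  emit "<one";
  output_foo_from_2 emit;
  let state = unknown 1 in
  if state = 2 then emit ">";
  emit "<two";
  output_foo_from_2 emit;
  emit "</two></one>";
  1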
If appropriate, the two called subroutines can then be inlined, producing the following code:
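In the sketch, inlining the two specialized calls leaves exactly two runtime tests, one on the entry state and one on the state returned by the opaque call; everything else is a precomputed string:

let output_more_inlined emit unknown state =
  if state = 2 then emit ">";
  emit "<one><foo bar=\"baz\">fluff</foo>";
  let state = unknown 1 in
  if state = 2 then emit ">";
  emit "<two><foo bar=\"baz\">fluff</foo></two></one>";
  1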
The straightforward deforestation and partial deforestation of the ALC enables other well-known functional optimizations, so that exactly those parts of the XML processing output which can be known at compile time are precomputed, while the other parts are left uncalculated until runtime. In the example above, it is not known at compile time what to output for the “one” begin tag, because the entry state is unknown. A large segment of the output is then precomputed. Given the opaque “Unknown” call, it is again unknown what to output for it, or for the “two” begin tag after it, but the rest of the output is then precomputed.
In contrast, if the serializer had been written using a more general construct, such as a fold construct from ML or Lisp, the compiler would have to perform more complex data flow analysis to determine that the second and subsequent events would result in deterministic output. If the serializer were written in an imperative style, with mutable state, this analysis would be even more difficult.
One of the keys to this optimization is partial automaton deforestation, which is accomplished by splitting one ALC, which has a fixed initial state, into repeated executions of a similar automaton that takes its initial state as an input argument and returns its final state. For example, consider any general ALC:
ALC(initial-state=i, transitions=t, input=x)
For partial automaton deforestation, a generic version of this ALC can be written as:
Generic(state-arg, input-arg)=ALC(initial-state=state-arg, transitions=t, input=input-arg), where, after the input sequence is exhausted, the generic version returns the then-current state.
A call to the ALC can be rewritten as a call to the generic version, passing the ALC's initial state to it and ignoring the returned state:
junk=Generic(i, x)
This has exactly the same behavior as the initial ALC, and thus any ALC can be rewritten in this way. The rewrite is then useful for “splitting up” the ALC, as can be seen in the examples above, where as much as possible was precomputed and the “Generic ALC”s were left to compute the rest at runtime. As an illustrative example, consider the sequence case:
ALC(initial-state=i, transitions=t, input=sequence(a,b,c, . . . , z))
can be rewritten as:
Generic(transitions=t) called on i and sequence(a,b,c, . . . , z)
This, using the rewrite for a sequence, becomes:
ia=i
ib=Generic(transitions=t) called on ia and a
ic=Generic(transitions=t) called on ib and b
id=Generic(transitions=t) called on ic and c
. . .
ifinal=Generic(transitions=t) called on iz and z
junk=ifinal
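For illustration, using the serializer step function and run loop sketched earlier, and a three-element sequence standing in for a, b, c, . . . , z, the generic automaton and the split form can be written as follows (generic, whole and split are illustrative names):

(* Same automaton, but the initial state is an argument and the final state is returned. *)
let generic step state input = run step state input

(* ALC(initial-state = i, transitions = t, input = sequence(a, b, c)) *)
let whole i a b c = generic step i [ a; b; c ]

(* The split form: the same outputs and the same final state, but each piece is
   now exposed separately to further rewriting and precomputation. *)
let split i a b c =
  let ib, o1     = generic step i  [ a ] in
  let ic, o2     = generic step ib [ b ] in
  let ifinal, o3 = generic step ic [ c ] in
  ifinal, o1 @ o2 @ o3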
After this rewrite, or any other rewrite or “splitting up”, we can continue this recursive process by repeatedly rewriting each ALC (Generic) in turn, using each input (a, b, c, . . . z, respectively, in the example). Whatever is precomputable at compile time will be precomputed, removing the need for extra objects and conversion passes at runtime. This can dramatically improve the performance of XML processing in particular, and of many kinds of computer processing in general.
These analyses can lead to precomputation of much of the output, based on just the operations, even before any input is seen. However, these optimizations come at a cost: All main parts of the computation must be functional, and must be easy to deforest against. In particular, for XML processing, all data goes through the XML serializer before output, and to get these benefits the serializer must be functional and easy to deforest against.
Most XML processing today is implemented in imperative C or Java programs, and their serializers are imperative and thus not at all amenable to straightforward partial evaluation or deforestation. Other current alternatives use functional languages, which are amenable to classical functional language analysis and optimization in general. Unfortunately, these serializers use constructs such as folds which, while easier to deal with than imperative code for some partial evaluation, are not at all straightforward to deforest against. Even converting them to automata and then deforesting those to produce a single automaton yields a final automaton that is not straightforward to deforest and is not easy to partially deforest. Thus, no current solution can be compiled so as to eliminate all of the recodings for the compile-time-computable parts of the output while providing highly optimized processing for the unknown parts. By contrast, partial deforestation and implementation of key parts of a program with the present limited automata, while easy to understand and program, can dramatically improve the performance of many enterprise applications, with XML processing performance as one particular example.
A flow chart of the presently disclosed method is depicted in
Referring now to
In processing block 14, the match is rewritten by realizing that the automaton applied to the results of the match will yield the same results as computing the automaton on each case of the match. The same rewrite applies to any conditional.
In processing block 16, the sequence is rewritten by realizing that the automaton applied to the sequence will yield the same results as computing the automaton in sequence on each member of the sequence, passing states between the automaton executions using partial automaton deforestation.
In processing block 18, the function call is rewritten by realizing that the automaton applied to the results of the function call will yield the same results as calling a new function which does the same work as the old function, but also calls the ALC at the end on its results before passing them back.
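For illustration, the rewrites of processing blocks 14 and 18 can be sketched with the generic automaton used above (the sequence rewrite of processing block 16 corresponds to the split form already sketched); cond, xs, ys, f and f' are illustrative placeholders:

(* Conditional/match: the ALC applied to the result of a conditional ... *)
let cond_before cond xs ys state = generic step state (if cond then xs else ys)
(* ... yields the same result as the ALC computed in each case: *)
let cond_after cond xs ys state =
  if cond then generic step state xs else generic step state ys

(* Function call: the ALC applied to the result of a call to f ... *)
let call_before f x state = generic step state (f x)
(* ... yields the same result as calling a new function f' that does f's work
   and then runs the ALC on the result itself: *)
let f'         f x state = generic step state (f x)
let call_after f x state = f' f x state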
Referring now to
Processing block 116 recites optimizing the program, the optimizing including pre-computing as much output as possible using the at least one ALC, the optimizing resulting in optimized program code. Processing block 118 discloses that the optimizing comprises checking top-level program constructs and matching at least one of the top-level program constructs to at least one ALC. Processing block 120 states that optimizing the program includes providing the at least one ALC with a sequence of objects which are processed one at a time, wherein the current state starts as the initial state for the at least one ALC, wherein as each object is processed the processed object is matched to a first possible state transition of the ALC such that the state number of the transition matches the current state and the pattern matches the processed object, and wherein the at least one ALC outputs the output object sequence in the matched state transition and changes to the target state in the matched transition. Prior to processing, the object is compiled and optimized based on at least one of the group comprising partial evaluation, deforestation, partial deforestation and language compilation techniques.
Processing block 122 recites executing the optimized program code. In a particular embodiment, as recited in processing block 124, the program comprises an extensible markup language (XML) processing program and the ALC comprises a core XML serializer.
Processing block 126 discloses that the optimized program code is subject to additional processing, the additional processing including at least one of the group comprising storing the optimized program code, compiling the optimized program code to native code, compiling the optimized program code to byte code, and compiling the optimized program code to Virtual Machine (VM) instructions.
By way of the above described ALC and method of performing program optimization using an Automaton Loop Construct (ALC), instead of constructing programs as they are constructed today, mostly functional programs are constructed that include the new ALC constructs in key places. A compiler partially evaluates the entire program where possible and deforests as much as possible, using all of the techniques that have been described, in order to precompute as much of the output as possible. The resulting code, including the ALCs, is executed in any form that code can be executed in. There is less reformulating of data and less processing, and thus the program runs much faster and uses less memory.
Having described preferred embodiments of the invention, it will now become apparent to those of ordinary skill in the art that other embodiments incorporating these concepts may be used. Additionally, the software included as part of the invention may be embodied in a computer program product that includes a computer usable medium. For example, such a computer usable medium can include a readable memory device, such as a hard drive device, a CD-ROM, a DVD-ROM, or a computer diskette, having computer readable program code segments stored thereon. The computer readable medium can also include a communications link, either optical, wired, or wireless, having program code segments carried thereon as digital or analog signals. Accordingly, it is submitted that the invention should not be limited to the described embodiments but rather should be limited only by the spirit and scope of the appended claims.