The present invention relates to a method for translating programs for reconfigurable architectures.
Conventional paralleling compilers normally use special constructs such as semaphores and/or other methods for synchronization. Conventionally, technology-related methods are used. Conventional methods, however, are not suitable for combining functionally specified architectures, with their associated time response, with imperatively specified algorithms. The methods used therefore supply satisfactory solutions only in special cases.
Compilers for reconfigurable architectures conventionally use macros which have been specially generated for the particular reconfigurable hardware, hardware description languages (e.g. Verilog, VHDL, and System-C) being used in most cases for generating the macros. These macros are then called up (instanced) out of the program flow by a normal high-level language (e.g. C, C++).
The present invention relates to a method for automatically mapping functionally or imperatively formulated computing rules onto different target technologies, for example, onto ASICs, reconfigurable chips (FPGAs, DPGAs, VPUs, ChessArray, KressArray, Chameleon, etc.; combined under the term VPU in the text which follows), sequential processors (CISC/RISC CPUs, DSPs, etc.; combined under the term CPU in the text which follows) and parallel processor systems (SMP, MMP, etc.). In this connection, particular reference is made to the following: P 44 16 881.0-53, DE 197 81 412.3, DE 197 81 483.2, DE 196 54 846.2-53, DE 196 54 593.5-53, DE 197 04 044.6-53, DE 198 80 129.7, DE 198 61 088.2-53, DE 199 80 312.9, PCT/DE 00/01869, DE 100 36 627.9-33, DE 100 28 397.7, DE 101 10 530.4, DE 101 11 014.6, PCT/EP 00/10516, EP 01 102 674.7, U.S. Ser. No. 10/009,649 (PACT13), (PACT17), (PACT18), (PACT22), (PACT24), (PACT25), U.S. Ser. No. 60/317,876 (PACT26US), U.S. Pat. No. 6,425,068 B1 (PACT02), U.S. Pat. No. 6,088,795 (PACT04), U.S. Pat. No. 6,081,903 (PACT08), U.S. Ser. No. 09/623,052 (PACT10), each of which is expressly incorporated herein by reference in its entirety.
VPUs basically may include a multidimensional homogeneous or inhomogeneous flat or hierarchical arrangement (PA) of cells (PAEs) which may perform arbitrary functions, particularly logical and/or arithmetic functions and/or storage functions and/or network functions. The PAEs are associated with a loading unit (CT) which determines the operation of the PAEs by configuration and possibly reconfiguration. The method is based on an abstract parallel machine model which, apart from the finite automaton, also integrates imperative problem specifications and provides for an efficient algorithmic derivation of an implementation to different technologies.
FIG. 1a illustrates the structure of a normal finite automaton in which a combinational network is combined with a register.
FIG. 1b illustrates a finite automaton by a reconfigurable architecture.
FIG. 2a illustrates a combinational network with the associated variables.
FIG. 2b illustrates a combinational network with the associated variables, where x1:=x1+1.
FIG. 2c illustrates the behavior of a finite automaton calculating x1:=x1+1 within a configuration.
FIG. 5a illustrates an iterative calculation for i:=1 to 10, x1:=x1*x1.
FIG. 5b illustrates an iterative calculation for i:=1 to 10, x1:=x1*x1.
FIG. 7a illustrates the determination of PAR(p) for each row of a graph.
FIGS. 7b and 7c illustrate the VEC calculations of two functions.
FIG. 8a illustrates the mapping of the graph of FIG. 7a.
FIG. 8b illustrates the graph of FIG. 7a with maximum usable vectorizability.
FIG. 8c illustrates the graph of FIG. 7a without vectorization but with high PAR(p).
FIG. 8d illustrates the graph of FIG. 7a for purely sequential processing.
FIG. 9a illustrates a function in which ((i4 ∨ i5) ∧ i9) and (i3 ∧ (i6 ∨ i7 ∨ i8) ∧ (i10 ∨ i11)) may be executed in parallel.
FIG. 9b illustrates a function in which the paths (i1 ∧ i2 ∧ (i4 ∨ i5) ∧ i9 ∧ i12) and (i3 ∧ (i6 ∨ i7 ∨ i8) ∧ (i10 ∨ i11)) are in parallel.
Conventionally, the basis for working through virtually any method for specifying algorithms is the finite automaton.
The finite automaton makes it possible to map complex algorithms onto any sequential machine, as illustrated in FIG. 1a.
In principle, any sequential program may be interpreted as a finite automaton, but in most cases a very large combinational network is produced. For this reason, the combinational operations in the programming of traditional "von Neumann" architectures (i.e., in all CPUs) are split into a sequence of individual, simple, predetermined operations (OpCodes) on registers in the CPU. This splitting results in states for controlling the sequence into which the combinational operation has been split, states which do not exist, or are not needed, within the original combinational operation. The states of a von Neumann machine to be processed are therefore to be distinguished in principle from the algorithmic states of a combinational network, i.e., from the registers of finite automatons.
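By way of a purely editorial illustration (the following C sketch and its names are hypothetical and form no part of the method described), the splitting of a combinational operation into a sequence of simple OpCodes, and the intermediate states this introduces, may be pictured as follows:

#include <stdio.h>

/* Combinational view: the result is formed "at once" by one network. */
static int combinational(int a, int b, int c, int d)
{
    return (a + b) * (c - d);
}

/* Von Neumann view: the same operation split into simple OpCodes.
 * t0 and t1 model CPU registers; their contents are states needed only
 * to control the split sequence, not by the algorithm itself. */
static int sequenced(int a, int b, int c, int d)
{
    int t0 = a + b;   /* OpCode 1: ADD */
    int t1 = c - d;   /* OpCode 2: SUB */
    return t0 * t1;   /* OpCode 3: MUL */
}

int main(void)
{
    printf("%d %d\n", combinational(1, 2, 7, 4), sequenced(1, 2, 7, 4));
    return 0;
}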
In contrast to the rigid OpCodes of CPUs, the VPU technology provides for the flexible configuration of complex combinational operations (complex instruction) in accordance with the algorithm to be mapped. The VPU technology is described in the following documents, each of which is incorporated herein by reference in its entirety: U.S. Pat. No. 5,943,242 (PACT01), U.S. Pat. No. 6,425,068 B1 (PACT02), U.S. Pat. No. 6,119,181 (PACT03), U.S. Pat. No. 6,088,795 (PACT04), U.S. Pat. No. 6,021,490 (PACT05), U.S. Pat. No. 6,081,903 (PACT08), U.S. Ser. No. 09/623,052 (PACT10), U.S. Ser. No. 10/009,649 (PACT13), German Patent Application No. 100 28 397.7 (PACT17), German Patent Application No. 101 10 530.4 (PACT18), German Patent Application No. 100 50 442.6 (PACT22), European Patent Application No. 01 102 674.7 (PACT24).
Operation of the Compiler
It is furthermore an operation of the compiler to generate the complex instructions in such a manner that they may be executed for as long as possible in the PAE matrix without reconfiguration.
The compiler also generates the finite automaton from the imperative source text in such a manner that it may be executed optimally in the PAE matrix.
The finite automaton may be split into configurations.
The processing (interpreting) of the finite automaton may be done in a VPU in such a manner that the configurations generated may be progressively mapped to the PAE matrix and the operating data and/or states, which may need to be transmitted between the configurations, may be stored in the memory. For this purpose, the method described in U.S. Pat. No. 6,088,795 (PACT04) or, respectively, the corresponding architecture may be used.
In other words, a configuration represents a plurality of instructions; a configuration determines the operation of the PAE matrix for a multiplicity of clock cycles during which a multiplicity of data is processed in the matrix; these originate from a source external to the VPU and/or an internal memory and may be written to an external source and/or to an internal memory. The internal memories replace the set of registers of a conventional CPU in such a manner that, e.g., a register may be represented by a memory and, according to the operating principle of the VPU technology, it is not a data word which is stored per register but an entire data record per memory.
The data and/or states of the processing of a configuration being executed are stored in the memories and are thus available for the next configuration executed.
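For illustration only, the following C sketch (the names and the configured function are hypothetical placeholders) shows the principle that a configuration acts like a complex instruction applied to an entire data record held in a memory, rather than to a single data word held in a register:

#include <stddef.h>

/* One "configuration": set up once, then applied to an entire data
 * record, the internal memory taking the place of a single register. */
static void configuration_step(const int *operands, int *results, size_t n)
{
    for (size_t i = 0; i < n; i++)
        results[i] = operands[i] * operands[i] + 1;  /* configured function */
}

int main(void)
{
    int operand_mem[8] = {1, 2, 3, 4, 5, 6, 7, 8};  /* data record in memory */
    int result_mem[8];
    configuration_step(operand_mem, result_mem, 8); /* many words per "instruction" */
    return 0;
}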
A difference from compilers paralleling on an instruction basis consists in that the method emulates a combinational network on a PAE matrix whereas conventional compilers combine sequences of instructions (OpCodes).
Exemplary WHILE Language
In the text that follows, the operation of the compiler is illustrated by way of an example in accordance with a simple language. The principles of this language are described in the dissertation of Armin Nückel. However, the thesis describes only the mapping of a function to a static combinational network. An aspect of the present invention is the mapping to configurations, which are then mapped to the PAE matrix in a temporal sequence in accordance with the algorithm and the states resulting during the processing.
The “WHILE” programming language is defined as follows:
Syntax: WHILE . . .
Constructs: instruction; sequence of instructions; loop
An instruction or a sequence of instructions may be mapped to a combinational network by the compiler method described.
FIG. 2a illustrates a combinational network with the associated variables. The content of one and the same variable (e.g., x1) may change from one stage (0301) of the network to the next (0302).
This change is illustrated by way of example for the assignment x1:=x1+1 in FIG. 2b.
Addressing of Variables
For the purpose of addressing for reading the operands and for storing the results, address generators may be synchronized with the combinational network of the assignment. With each variable processed, corresponding new addresses may be generated for operands and results (FIG. 2c).
Since, in the present data processing model, typically a plurality of data are processed within a certain configuration of the PAEs, simple FIFO modes are available for most applications, at least for the data memories which, within this description, are used for storing data and states of the data processing (virtually as a replacement for a conventional set of registers of conventional CPUs) (compare U.S. Pat. No. 6,088,795 (PACT04)).
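The address generation described above may be sketched as follows; this is a simplified editorial example in C (the FIFO mode is modeled by simply incrementing addresses, and all names are hypothetical):

#include <stdio.h>

typedef struct { unsigned next; } addr_gen;

static unsigned gen_next(addr_gen *g) { return g->next++; }

int main(void)
{
    addr_gen operands = {0}, results = {0};
    int op_mem[4] = {10, 20, 30, 40}, res_mem[4];
    for (int i = 0; i < 4; i++) {
        unsigned oa = gen_next(&operands);  /* new operand address per variable */
        unsigned ra = gen_next(&results);   /* new result address per variable  */
        res_mem[ra] = op_mem[oa] + 1;       /* x1 := x1 + 1                     */
    }
    for (int i = 0; i < 4; i++)
        printf("%d\n", res_mem[i]);
    return 0;
}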
Sequences of Instructions
A sequence of the exemplary assignment may be generated as follows:
x1:=0;
WHILE TRUE DO
This sequence may now be mapped in accordance with an assignment, as described above, and address generators for operands and results.
Finite Sequences
In the following, a particular example embodiment of sequences from the defined constructs of the WHILE language will be discussed. A finite sequence of the exemplary assignment may be generated as follows:
FOR i:=1 TO 10
Such a sequence may be implemented in two ways:
a) The first type corresponds to generating an adder for calculating i in accordance with the WHILE construct (above) and a further adder for calculating x1. The sequence is mapped as a loop and calculated iteratively (FIG. 5a).
b) The second type corresponds to rolling out the loop, which dispenses with calculating i as a function. The calculation of x1 is instanced i times and built up as a pipeline, producing i concatenated adders (FIG. 5b). Both types are illustrated by the sketch below.
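Both types may be pictured by the following editorial C sketch (the assignment x1:=x1+1 of the exemplary sequence is assumed; the functions are illustrative only):

#include <stdio.h>

/* Type a): iterative mapping; i is calculated explicitly and one adder
 * is reused in every pass of the loop. */
static int iterative(int x1)
{
    for (int i = 1; i <= 10; i++)
        x1 = x1 + 1;
    return x1;
}

/* Type b): rolled-out mapping; the calculation of i disappears and the
 * adder is instanced 10 times, corresponding to 10 concatenated stages
 * of a pipeline. */
static int rolled_out(int x1)
{
    x1 = x1 + 1; x1 = x1 + 1; x1 = x1 + 1; x1 = x1 + 1; x1 = x1 + 1;
    x1 = x1 + 1; x1 = x1 + 1; x1 = x1 + 1; x1 = x1 + 1; x1 = x1 + 1;
    return x1;
}

int main(void)
{
    printf("%d %d\n", iterative(0), rolled_out(0));
    return 0;
}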
Conditions
Conditions may be expressed in accordance with WHILE. For example:
x1:=0;
WHILE x1<10 DO
The mapping generates an additional PAE for processing the comparison. The result of the comparison is represented by a status signal (compare U.S. Pat. No. 6,081,903 (PACT08)) which is evaluated by the PAEs processing the instruction and the address generators.
The resultant mapping is illustrated in FIG. 4b.
Basic Method
According to this basic method, each program may be mapped in a system which is built up as follows:
1. Memory for operands
2. Memory for results
3. Address generator(s)
4. Network of a) assignments and/or b) WHILE instructions.
An illustrative sketch of this model follows.
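The following editorial C sketch models the four components listed above; the types and the configured function are hypothetical and serve only to make the structure concrete:

#include <stdio.h>

typedef struct { unsigned next; } addr_gen;

typedef struct {
    const int *operand_mem;           /* 1. memory for operands          */
    int       *result_mem;            /* 2. memory for results           */
    addr_gen   op_addr, res_addr;     /* 3. address generator(s)         */
    int      (*network)(int);         /* 4. network of assignments/WHILE */
} machine;

static int increment(int x1) { return x1 + 1; }   /* x1 := x1 + 1 */

static void run(machine *m, unsigned n)
{
    while (n--) {                                        /* WHILE construct */
        int v = m->operand_mem[m->op_addr.next++];
        m->result_mem[m->res_addr.next++] = m->network(v);  /* assignment */
    }
}

int main(void)
{
    int ops[4] = {0, 1, 2, 3}, res[4];
    machine m = { ops, res, {0}, {0}, increment };
    run(&m, 4);
    for (int i = 0; i < 4; i++)
        printf("%d\n", res[i]);
    return 0;
}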
Handling States
A distinction is made between algorithmically relevant and irrelevant states. Relevant states are necessary within the algorithm for describing its correct operation; they are essential to the algorithm. Irrelevant states are produced by the hardware used and/or by the selected mapping, or for other secondary reasons; they are essential only for the mapping (i.e., for the hardware).
It is only the relevant states which may need to be obtained with the data. For this reason, they are stored together with the data in the memories since they occurred either as a result of the processing with the data or are necessary as operands with the data for the next processing cycle.
In contrast, irrelevant states are necessary only locally and for a limited time and do not, therefore, need to be stored. For example:
a) The state information of a comparison is relevant for the further processing of the data since it determines the functions to be executed.
b) Assume a sequential divider is produced, for example, by mapping a division instruction to hardware which only supports sequential division. This results in a state which identifies the mathematical step within the division. This state is irrelevant since only the result (i.e., the division carried out) is required for the algorithm. In this case, only the result and the time information (i.e., the availability) are thus needed. A sketch illustrating this distinction follows.
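The distinction may be sketched as follows (an editorial C example; the sequential division by repeated subtraction is merely an assumed implementation):

#include <stdio.h>

/* "step" identifies the mathematical step within the division; it is an
 * irrelevant state, an artifact of this particular implementation. Only
 * the result, and the time at which it becomes available, matter to the
 * algorithm. A comparison flag, as in example a), would be a relevant
 * state, since it steers the further processing of the data. */
static int sequential_divide(int num, int den, int *steps_taken)
{
    int q = 0, step = 0;
    while (num >= den) { num -= den; q++; step++; }
    *steps_taken = step;   /* exposed here only for illustration    */
    return q;              /* the only algorithmically needed value */
}

int main(void)
{
    int steps;
    int q = sequential_divide(17, 5, &steps);
    printf("result %d (internal steps: %d)\n", q, steps);
    return 0;
}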
The time information may be obtained by the RDY/ACK handshake, for example in the VPU technology described in U.S. Pat. No. 5,943,242 (PACT01), U.S. Pat. No. 6,425,068 B1 (PACT02), and U.S. Ser. No. 10/009,649 (PACT13). However, it is noted in this regard that the handshake also does not represent a relevant state since it only signals the validity of the data as a result of which the remaining relevant information, in turn, is reduced to the existence of valid data.
Handling Time
In many programming languages, particularly in sequential ones such as C, a precise temporal order is implicitly predetermined by the language, for example, in sequential programming languages by the order of the individual instructions. If required by the programming language and/or the algorithm, the time information may be mapped to synchronization models such as RDY/ACK and/or REQ/ACK or a time stamp method described in German Patent Application No. 101 10 530.4 (PACT18).
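A RDY/ACK-style synchronization of the kind referred to may be pictured by the following editorial C sketch (a software model with hypothetical names; the actual protocols are those of the documents cited):

#include <stdbool.h>
#include <stdio.h>

/* The handshake carries only time information (the validity of the
 * data), not a relevant algorithmic state. */
typedef struct { int data; bool rdy; bool ack; } channel;

static void produce(channel *ch, int value)
{
    if (!ch->rdy) { ch->data = value; ch->rdy = true; ch->ack = false; }
}

static bool consume(channel *ch, int *out)
{
    if (ch->rdy && !ch->ack) {
        *out = ch->data;          /* data are taken over        */
        ch->ack = true;           /* ... and acknowledged (ACK) */
        ch->rdy = false;
        return true;
    }
    return false;
}

int main(void)
{
    channel ch = {0, false, false};
    int v;
    produce(&ch, 42);             /* producer asserts RDY */
    if (consume(&ch, &v))
        printf("%d\n", v);
    return 0;
}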
Macros
More complex functions of a high-level language, such as loops, are implemented by macros. The macros are predetermined by the compiler and instanced at translation time (compare FIG. 4a).
The macros are built up either of simple language constructs of the high-level language or at assembler level. Macros may be parameterized in order to provide for a simple adaptation to the algorithm described.
Feedback Loops and Registers
Undelayed feedback, which would oscillate in an uncontrolled manner, may arise within the mapping of an algorithm into a combinational network.
In VPU technologies described in U.S. Pat. No. 6,425,068 B1 (PACT02), this is prevented by the structure of the exemplary PAE in that at least one register is permanently defined in the PAEs for the purpose of decoupling.
In general, undelayed feedbacks may be detected by analyzing the graph of the combinational network produced. Registers for decoupling are then inserted collectively into the data paths in which an undelayed feedback exists.
Handshake protocols (e.g., RDY/ACK) ensure the correct operation of the calculation even after registers have been inserted.
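The analysis of the graph for undelayed feedback may be sketched, for example, as a depth-first search for cycles (an editorial C example; the adjacency-matrix encoding and the graph itself are assumed for illustration):

#include <stdio.h>
#define N 4

/* Edges 0->1, 1->2, 2->0 (a feedback), 2->3. */
static int adj[N][N] = { {0,1,0,0}, {0,0,1,0}, {1,0,0,1}, {0,0,0,0} };
static int state[N];   /* 0 = unvisited, 1 = on the DFS stack, 2 = done */

static void dfs(int v)
{
    state[v] = 1;
    for (int w = 0; w < N; w++) {
        if (!adj[v][w])
            continue;
        if (state[w] == 1)   /* edge closes a cycle: undelayed feedback */
            printf("insert decoupling register on edge %d -> %d\n", v, w);
        else if (state[w] == 0)
            dfs(w);
    }
    state[v] = 2;
}

int main(void)
{
    for (int v = 0; v < N; v++)
        if (state[v] == 0)
            dfs(v);
    return 0;
}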
Time Domain Multiplexing (TDM)
In principle, any PAE matrix implemented in practice is only of finite size. For this reason, a partitioning of the algorithm, in accordance with item 4 of the basic method (above), into a plurality of configurations which are successively configured may need to be performed in the subsequent step. The aim is to calculate as many data packets as possible in the network without having to reconfigure.
Between the configurations, a buffer memory is introduced which—similar to a register in the case of CPUs—stores the data between the individual configurations executed sequentially.
In other words, in VPU technology, it is not an OpCode which is sequentially executed but complex configurations. Whereas, in the case of CPUs, an OpCode typically processes a data word, a plurality of data words (a data packet) are processed by a configuration in the VPU technology. As a result, the efficiency of the reconfigurable architecture increases due to a better relationship between reconfiguration effort and data processing.
In the VPU technology, a memory may be used instead of a register since it is not data words but data packets which are processed between the configurations. This memory may be constructed as random access memory, stack, FIFO, or any other memory architecture, a FIFO typically being the best and most easily implemented choice.
Data are then processed by the PAE matrix in accordance with the algorithm configured and stored in one or more memories. The PAE matrix is reconfigured after the processing of a set of data and the new configuration takes the intermediate results from the memory(ies) and continues the execution of the program. In the process, new data may also easily flow additionally into the calculation from external memories and/or to the peripherals, and results may likewise be written to external memories and/or to the peripherals.
In other words, the typical sequence of data processing is the reading out of internal RAMs, the processing of the data in the matrix, and writing the data into the internal memories; arbitrary external sources may be easily used for data processing or destinations used for data transfers in addition to or instead of internal memories.
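The buffer memory between two successively executed configurations may be pictured as a FIFO, as in the following editorial C sketch (depth, contents and operations are hypothetical):

#include <stdio.h>
#define DEPTH 8

typedef struct { int buf[DEPTH]; int head, tail; } fifo;

static void push(fifo *f, int v) { f->buf[f->tail++ % DEPTH] = v; }
static int  pop (fifo *f)        { return f->buf[f->head++ % DEPTH]; }

int main(void)
{
    fifo between = { {0}, 0, 0 };
    for (int i = 0; i < DEPTH; i++)     /* configuration A fills the FIFO  */
        push(&between, i * i);
    /* ... the PAE matrix would be reconfigured at this point ...          */
    for (int i = 0; i < DEPTH; i++)     /* configuration B drains the FIFO */
        printf("%d\n", pop(&between) + 1);
    return 0;
}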
Whereas “sequencing” in CPUs is defined as the reloading of an OpCode, “sequencing” of VPUs is defined as the (re)configuring of configurations.
The information regarding when and/or how sequencing takes place (i.e., which is the next configuration that should be configured) may be represented by various information items which may be used individually or in combination. For example, the following strategies are appropriate for deriving the information:
a) defined by the compiler at translation time,
b) defined by the event network (Trigger, U.S. Pat. No. 6,081,903 (PACT08)),
c) defined by the fill ratio of the memories (Trigger, U.S. Pat. No. 6,081,903 (PACT08), U.S. Pat. No. 6,088,795 (PACT 04)).
Influence of the TDM on the Processor Model
The partitioning of the algorithm decisively determines the relevant states which are stored in the memories between the various configurations. If a state is only relevant within a configuration (locally relevant state), it is not necessary to store it.
Nevertheless, it is useful to store these states for the purpose of debugging the program to be executed, in order to provide the debugger with access to them. This necessity is described in greater detail in the debugging method of U.S. Ser. No. 09/967,497 (PACT21), filed on the same date. Furthermore, states may additionally become relevant if a task switch mechanism is used (e.g., by an operating system or interrupt sources) and currently executed configurations are interrupted, other configurations are loaded, and the aborted configuration is to be continued at a later time. A more detailed description follows.
A simple example follows to illustrate the discriminating feature for locally relevant states:
The possible use of an operating system may have an additional influence on the observation and handling of states. Operating systems may use, for example, task schedulers for administering a number of tasks in order to provide multitasking.
Task schedulers terminate tasks at a particular time, start other tasks and return to the further processing of the aborted task after the other ones have been processed. If it is ensured that a configuration, which corresponds to the processing of a task, terminates only after the complete processing, i.e., when all data and states to be processed within this configuration cycle are stored, locally relevant states may remain unstored.
If, however, the task scheduler terminates configurations before they have been completely processed, local states and/or data may need to be stored. Furthermore, this may be of advantage if the processing time of a configuration cannot be predicted. This also appears useful in conjunction with the halting problem and the risk that a configuration will not terminate (e.g., due to a fault), in order to prevent a deadlock of the entire system.
In other words, taking into consideration task switching, relevant states may also need to be considered to be those which are necessary for task switching and for a new correct start of the data processing.
In the case of a task switch, the memory for results and possibly also the memory for the operands may need to be saved and established again at a later time, that is to say on return to this task. This may be done similarly to the PUSH/POP instructions and conventional methods. Furthermore, the state of the data processing may need to be saved, i.e. the pointer to the last operands completely processed. Reference is made here to German Patent Application No. 101 10 530.4 (PACT18).
Depending on the optimization of the task switch, there are two possibilities, for example:
a) The terminated configuration is reconfigured and only the operands are loaded. Data processing begins again as if the processing of the configuration had not yet begun at all. In other words, all data calculations are simply executed from the beginning, even where some calculations were already performed previously. This possibility is simple but not very efficient.
b) The terminated configuration is reconfigured, and the operands and results already calculated are loaded into the respective memories. The data processing is continued at the operands which have not been completely calculated. This method is more efficient but presupposes that additional states occurring during the processing of the configuration may become relevant; for example, at least one pointer to the last operands completely processed may need to be saved so that it is possible to start again with their successors after the completed reconfiguration. A sketch of this possibility follows.
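Possibility b) may be sketched as follows (an editorial C example; the context layout and the processing function are hypothetical):

#include <stddef.h>

typedef struct {
    int    operands[16];
    int    results[16];
    size_t last_done;   /* relevant state introduced by task switching */
} task_context;

static void process(task_context *t, size_t stop_at)
{
    while (t->last_done < stop_at && t->last_done < 16) {
        t->results[t->last_done] = t->operands[t->last_done] * 2;
        t->last_done++;   /* advanced only once the operand is complete */
    }
}

int main(void)
{
    task_context t = { {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}, {0}, 0 };
    process(&t, 6);    /* interrupted by the task scheduler after 6 operands */
    /* context t is saved (PUSH-like), other tasks run, t is restored ...    */
    process(&t, 16);   /* continues at t.last_done, not from the beginning   */
    return 0;
}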
Algorithmic Optimization
The translation method described separates control structures from algorithmic structures. For example a loop may be split into a body (WHILE) and an algorithmic structure (instructions).
The algorithmic structures may then be optionally optimized by an additional tool following the separation.
For example, subsequent algebra software may optimize and minimize the programmed algorithms. Such tools are conventionally available, e.g., AXIOM, MARBLE, etc. Due to the minimization, a quicker execution of the algorithm and/or a considerably reduced space requirement may be achieved.
The result of the optimization is then conducted back into the compiler and processed further accordingly.
Applicability for Processors
Instead of a PAE matrix, an arrangement of arithmetic logic units (ALUs) such as conventionally used, for example, in VLIW processors, and/or an arrangement of complete processors such as conventionally used, for example, in multiprocessor systems, may also be used. The use of an individual ALU represents a special case, so that the method may also be used for conventional CPUs.
In the dissertation by Armin Nückel, a method is described which provides for the translation of the WHILE language into semantically correct finite automatons. Beyond that, a finite automaton may be used as a “subroutine” and conversely. This provides the possibility of mapping a configuration to different implementation technologies such as, e.g., CPUs, symmetric multiprocessors, FPGAs, ASICs, VPUs.
In particular, it is possible to allocate in each case optimally suited hardware to parts of an application. In other words, a data flow structure, for example, may be transferred to a data flow architecture whereas a sequential structure may be mapped to a sequencer.
The problems arising with resource allocations for the individual algorithms may be solved, e.g. by the job assignment algorithm for administering the allocation.
The following is a discussion of several exemplary embodiments of the compiler.
FIG. 1a illustrates an example embodiment of the structure of a normal finite automaton in which a combinational network (0101) is combined with a register (0102). Data may be conducted directly to 0101 (0103) and 0102 (0104). By feeding back (0105) the register to the combinational network, a state may be processed in dependence on the previous states. The processing results are represented by 0106.
FIG. 1b illustrates an example embodiment of the finite automaton by a reconfigurable architecture according to U.S. Pat. No. 5,943,242 (PACT01) and U.S. Pat. No. 6,088,795 (PACT04).
The operand and result memories (0202, 0203) are physically or virtually coupled to one another in such a manner that, for example, the results of one function may be used as operands of another, and/or both results and operands of one function may be used as operands of another. Such coupling may be established, for example, by bus systems or by a (re)configuration by which the function and the networking of the memories with the PAEs (0201) are reconfigured.
FIG. 2c illustrates the behavior of a finite automaton calculating x1:=x1+1 within a configuration. In the next configuration, 0306 and 0304 may need to be exchanged in order to obtain a complete finite automaton. 0305 represents the address generators for the memories 0304 and 0306.
FIG. 4a illustrates the implementation of a simple loop of the type
WHILE TRUE DO
0402 may be produced as a macro from the loop construct. The macro is instanced by the translation of WHILE. 0405 is either also part of the macro or is inserted precisely when and where an undelayed feedback exists in accordance with an analysis of the graphs.
FIG. 4b illustrates the structure of a genuine loop of the type
WHILE x1<10 DO
In addition, there may be a circuit which checks the validity of the result (0410) and forwards 0404a to the subsequent functions (0411) only when the termination criterion of the loop has been reached. The termination criterion is detected by the comparison x1<10 (0412). As a result of the comparison, the relevant status flag (0413) may be conducted to 0402 for controlling the loop and 0411 for controlling the forwarding of the result. 0413 may be implemented, for example, by triggers described in U.S. Pat. No. 6,081,903 (PACT08). Similarly, 0413 may be sent to a CT which thereupon detects the termination of the loop and performs a reconfiguration.
FIG. 5a illustrates the iterative calculation of
FOR i:=1 TO 10
The basic function generally corresponds to that of FIG. 4b.
FIG. 5b illustrates the rolling out of the calculation of
FOR i:=1 TO 10
As a reconfiguration criterion, the fill ratio of the memories (0606, 0607: memory full/empty) and/or 0601, which indicates the termination of the loop, may be used. In other words, the fill ratio of the memories generates triggers (compare U.S. Pat. No. 5,943,242 (PACT01), U.S. Pat. No. 6,021,490 (PACT05), U.S. Pat. No. 6,081,903 (PACT08), U.S. Ser. No. 09/623,052 (PACT10)) which are sent to the CT and trigger a reconfiguration. The state of the loop (0601) may also be sent to the CT. The CT may then configure the subsequent algorithms when the termination criterion is reached or possibly first process the remaining parts of the loop (0603, 0604, 0605) and then load the subsequent configurations.
Limits of Parallelability
a) If the calculation of the operands is independent of the feedback 0608, the loop may be calculated in blocks, i.e. in each case by filling the memories 0606/0607. This results in a high degree of parallelism.
b) If the calculation of an operand is dependent on the result of the previous calculation, that is to say 0608 is included in the calculation, the method becomes more inefficient since in each case only one operand may be calculated within the loop.
If the usable ILP (Instruction Level Parallelism) within the loop is high and the time for reconfiguration is low (compare U.S. Pat. No. 6,425,068 B1 (PACT02), U.S. Pat. No. 6,088,795 (PACT04), U.S. Ser. No. 10/009,649 (PACT13), German Patent Application No. 100 28 397.7 (PACT17)), a calculation rolled out to PAEs may still be efficient on a VPU.
If this is not the case, it may be useful to map the loop onto a sequential architecture (a separate processor apart from the PA, or an implementation within the PA, as described in U.S. Pat. No. 6,425,068 B1 (PACT02), U.S. Pat. No. 6,088,795 (PACT04), and especially U.S. Ser. No. 10/009,649 (PACT13)).
The calculation times may be analyzed either at translation time in the compiler in accordance with the next section or may be measured empirically at run time and subsequently optimized.
Analysis and Paralleling Method
For the analysis and performance of the paralleling, various conventional methods are available.
In the text which follows, a preferred method will be described.
Functions to be mapped, where an application may be composed of an arbitrary number of different functions, are represented by graphs (compare U.S. Ser. No. 10/009,649 (PACT13)). The graphs are examined for the parallelism contained in them and all methods for optimizing may be used ab initio.
Instruction Level Parallelism (ILP)
ILP expresses which instructions may be executed at the same time. Such an analysis is possible on the basis of dependencies of nodes in a graph. Corresponding methods are conventional in compiler construction and in mathematics; reference is made, for example, to VLIW compilers and synthesis tools.
Attention may need to be paid, e.g., to possibly interleaved conditional executions (IF), since a correct statement of which paths may be executed in parallel frequently can scarcely be made, or not at all, because there is a great dependence on the value space of the individual parameters, which is frequently unknown or known only inadequately. A precise analysis may also consume so much computing time that it can no longer be usefully performed.
In such cases, the analysis may be simplified, for example, by notes from the programmer, and/or it is possible to work with corresponding compiler switches such that, in case of doubt, the starting point is either a high parallelability (possibly at the cost of resources) or a low parallelability (possibly at the cost of performance). As well, an empirical analysis may be performed at run time in these cases. As described in U.S. Ser. No. 09/623,052 (PACT10) and German Patent Application No. 100 28 397.7 (PACT17), methods are available which allow statistics about the program behavior at run time. In this manner, a maximum parallelability may initially be assumed, for example. The individual paths report each pass back to a statistics unit (e.g., implemented in a CT (compare U.S. Ser. No. 09/623,052 (PACT10) and German Patent Application No. 100 28 397.7 (PACT17)); in principle, units according to U.S. Pat. No. 6,088,795 (PACT04) may also be used). It may now be analyzed in accordance with statistical measures which paths are actually passed in parallel. Furthermore, there is the possibility of using the data at run time for evaluating which paths are passed frequently, rarely, or never in parallel.
Accordingly, it is possible to optimize with a next program call. According to German Patent Application No. 100 50 442.6 (PACT22) and European Patent Application No. 01 102 674.7 (PACT24), a number of configurations may either be configured at the same time and then activated by triggers (U.S. Pat. No. 6,081,903 (PACT08)), or only a subset is configured and the remaining configurations are loaded later, when required, by sending the corresponding triggers to a loading unit (CT, U.S. Ser. No. 09/623,052 (PACT10)).
The value PAR(p) used in the text which follows specifies, for the purpose of illustration, how much ILP may be achieved at a certain stage (p) within the data flow graph transformed from the function (FIG. 7a).
Vector Parallelism
Vector parallelism may be useful if relatively large amounts of data are to be processed. In this case, the linear sequences of operations may be vectorized, i.e., all operations may simultaneously process data, each separate operation typically processing a separate data word.
This procedure may in some cases not be possible within loops. For this reason, analyses and optimizations may be necessary.
For example, the graph of a function may be expressed by a Petri network. Petri networks have the property that the forwarding of results from nodes is controlled as a result of which, for example, loops may be modeled.
Feeding the result back in a loop essentially determines the data throughput.
Before loops are analyzed, they may be optimized. For example, all possible instructions may be extracted from the loop and placed in front of or after the loop.
The value VEC used for illustration in the text which follows characterizes the degree of vectorizability of a function. In other words, VEC indicates how many data words may be processed simultaneously within a set of operations. VEC may be calculated, for example, from the number n_nodes of arithmetic logic units needed for a function and the number n_data of data words which may be calculated at the same time within the vector, e.g., by VEC = n_data / n_nodes.
If a function may be mapped, for example, onto 5 arithmetic logic units (n_nodes = 5), and data may be processed at the same time in each of the arithmetic logic units (n_data = 5), then VEC = 1 (FIG. 7b).
VEC may be calculated for an entire function and/or for part-sections of a function.
Evaluation of PAR and VEC
According to FIG. 7a, PAR(p) may be determined for each row of the transformed graph. If PAR(p) corresponds to the number of nodes in the row p, all nodes may be executed in parallel.
If PAR(p) is smaller, certain nodes may be executed only in alternation. The alternative executions are in each case combined in one PAE. A selection device enables the alternative corresponding to the status of the data processing to be activated at run time, as described, for example, in U.S. Pat. No. 6,081,903 (PACT08).
VEC may also be allocated to each row of a graph. If VEC = 1 for one row, the row remains in existence as a pipeline stage. If VEC of a row is less than 1, all subsequent rows whose VEC is also less than 1 are combined, since pipelining is not possible; according to the order of operations, they are combined to form a sequence which is then configured in a PAE and sequentially processed at run time. Corresponding methods are described in, for example, U.S. Pat. No. 6,425,068 B1 (PACT02) and/or U.S. Pat. No. 6,088,795 (PACT04). A sketch of this evaluation follows.
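The evaluation of PAR and VEC per row may be pictured by the following editorial C sketch (the row data are hypothetical; the rules follow the two preceding paragraphs):

#include <stdio.h>

typedef struct { int nodes; int par; double vec; } row;

int main(void)
{
    row rows[] = { {2, 2, 1.0}, {5, 1, 0.5}, {3, 3, 0.25}, {1, 1, 1.0} };
    int n = sizeof rows / sizeof rows[0];
    for (int p = 0; p < n; p++) {
        if (rows[p].vec >= 1.0) {
            /* alternatives are combined if PAR(p) < number of nodes */
            printf("row %d: pipeline stage using %d PAE(s)\n", p, rows[p].par);
        } else {
            int start = p;
            while (p + 1 < n && rows[p + 1].vec < 1.0)
                p++;                     /* combine subsequent rows < 1 */
            printf("rows %d..%d: one sequence on a sequencer\n", start, p);
        }
    }
    return 0;
}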
Parallel Processor Models and Reentrant Code
Using the method described, parallel processor models of any complexity may be built up by grouping sequencers. In particular, sequencer structures for mapping reentrant code may be generated.
The synchronizations necessary in each case for this purpose may be performed, for example, by the time stamp method described in German Patent Application No. 101 10 530.4 (PACT18).
Influence on Clocking
If a number of sequencers or sequential parts are mapped onto a PA, it may be useful to match the performance of the individual sequencers to one another for reasons of power consumption. This may be done by adapting the operating frequencies of the sequencers to one another. For example, methods according to German Patent Application Nos. 101 35 210.7-53 (PACT25) and 101 10 530.4 (PACT18) allow individual clocking of individual PAEs or PAE groups.
The frequency of a sequencer may be determined in accordance with the number of cycles which it typically needs for processing its assigned function.
If, for example, a sequencer needs 5 clock cycles for processing its function, its clocking should be 5 times higher than the clocking of the remaining system.
Partitioning and Scheduling
Functions may be partitioned in accordance with the aforementioned method. During partitioning, memories for data and relevant status may be correspondingly inserted. Other alternative and/or more extensive methods are described in U.S. Ser. No. 10/009,649 (PACT13) and German Patent Application No. 101 10 530.4 (PACT18).
Some VPUs offer the possibility of differential reconfiguration according to U.S. Pat. No. 5,943,242 (PACT01), U.S. Ser. No. 09/623,052 (PACT10), U.S. Ser. No. 10/009,649 (PACT13), German Patent Application No. 100 28 397.7 (PACT17), German Patent Application No. 100 50 442.6 (PACT22), European Patent Application No. 01 102 674.7 (PACT24). This may be applied if only relatively few changes become necessary within the arrangement of PAEs during a reconfiguration. In other words, only the changes of a configuration compared with the current configuration are reconfigured. In this case, the partitioning may be such that the (differential) configuration following a configuration only contains the necessary reconfiguration data and does not represent a complete configuration.
The reconfiguration may be scheduled by the status which the function(s) report to a loading unit (CT), which selects and configures the next configuration or part-configuration on the basis of the incoming status. In detail, such methods are described in U.S. Pat. No. 5,943,242 (PACT01), U.S. Pat. No. 6,021,490 (PACT05), U.S. Ser. No. 09/623,052 (PACT10), U.S. Ser. No. 10/009,649 (PACT13), and German Patent Application No. 100 28 397.7 (PACT17).
Furthermore, the scheduling may support the possibility of preloading configurations during the run time of another configuration. In this arrangement, a number of configurations also may be preloaded speculatively, i.e. without ensuring that the configurations are needed at all. The configurations to be used may be then selected at run time in accordance with selection mechanisms according to U.S. Pat. No. 6,081,903 (PACT08) (see also example NLS in (PACT22/24)).
The local sequences may also be controlled by the status of their data processing as is described in U.S. Pat. No. 6,425,068 B1 (PACT02), U.S. Pat. No. 6,088,795 (PACT04), and U.S. Ser. No. 10/009,649 (PACT13). To carry out their reconfiguration, a further dependent or independent status may be reported to the CT (see, for example, U.S. Pat. No. 6,088,795 (PACT04), LLBACK).
In the text which follows, the following symbols are used for simplifying the notation: ∨, which stands for or, and ∧, which stands for and.
FIG. 8a illustrates the mapping of the graph of FIG. 7a.
FIG. 8b illustrates the same graph, for example with maximum usable vectorizability. However, the sets of operations V2=(i1, i3), V3=(i4, i5, i6, i7, i8), and V4=(i9, i10, i11) are not parallel (PAR({2,3,4})=1). This allows resources to be saved by allocating one set of operations, P2, P3, P4, to one PAE each. The operations to be executed in the respective PAE may be selected by a status signal for each data word in each stage. The PAEs may be networked as a pipeline (vector), and each PAE performs one operation per clock cycle over, in each case, different data words.
Sequence:
PAE1 calculates data and forwards them to PAE2. Together with the data, it forwards a status signal which indicates whether i1 or i2 is to be executed.
PAE2 further calculates the data of PAE1. The operation to be executed (i1, i2) may be selected and calculated in accordance with the incoming status signal. In accordance with the calculation, PAE2 forwards a status signal to PAE3 which indicates whether (i4 ∨ i5) ∨ (i6 ∨ i7 ∨ i8) is to be executed.
PAE3 further calculates the data of PAE2. The operation to be executed (i4 ∨ i5) ∨ (i6 ∨ i7 ∨ i8) may be selected and calculated in accordance with the incoming status signal. In accordance with the calculation, PAE3 forwards a status signal to PAE4 which indicates whether i9 ∨ i10 ∨ i11 is to be executed.
PAE4 further calculates the data of PAE3. The operation to be executed i9 ∨ i10 ∨ i11 may be selected and calculated in accordance with the incoming status signal.
PAE5 further calculates the data of PAE4.
A possible corresponding method is described in U.S. Pat. No. 6,081,903 (PACT08). A sketch of such a status-controlled pipeline stage follows.
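Such a stage may be pictured by the following editorial C sketch (the concrete operations standing in for i4 through i8 are arbitrary placeholders; only the selection by the incoming status signal is of interest):

#include <stdio.h>

static int stage_pae3(int data, int status)
{
    switch (status) {             /* status from PAE2 selects the operation */
    case 4:  return data + 4;     /* stands in for i4 */
    case 5:  return data + 5;     /* stands in for i5 */
    case 6:  return data * 6;     /* stands in for i6 */
    case 7:  return data * 7;     /* stands in for i7 */
    default: return data * 8;     /* stands in for i8 */
    }
}

int main(void)
{
    int words[3]  = {10, 20, 30};
    int status[3] = {4, 6, 8};    /* one selection per data word */
    for (int k = 0; k < 3; k++)
        printf("%d\n", stage_pae3(words[k], status[k]));
    return 0;
}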
FIG. 8c again illustrates the same graph. In this example, vectorization is not possible, but PAR(p) is high, which means that in each case a multiplicity of operations may be executed simultaneously within one row. The operations which may be performed in parallel are P2={i1 ∧ i2}, P3={i4 ∧ i5 ∧ i6 ∧ i7 ∧ i8}, and P4={i9 ∧ i10 ∧ i11}. The PAEs may be networked in such a manner that they may arbitrarily exchange any data with one another. The individual PAEs only perform operations if there is an ILP in the corresponding cycle; otherwise they behave neutrally (NOP), and, if necessary, the clock and/or the power may be switched off in order to minimize the power dissipation.
Sequence:
In the first cycle, only PAE2 is operating and forwards the data to PAE2 and PAE3.
In the second cycle, PAE2 and PAE3 are operating in parallel and forward their data to PAE1, PAE2, PAE3, PAE4, PAE5.
In the third cycle, PAE1, PAE2, PAE3, PAE4, PAE5 are operating and forward the data to PAE2, PAE3, PAE5.
In the fourth cycle, PAE2, PAE3, PAE5 are operating and forward the data to PAE2.
In the fifth cycle only PAE2 is operating.
The function thus needs 5 cycles for the calculation. The corresponding sequencer thus may operate at 5 times the clock rate of its environment in order to achieve a corresponding performance.
A possible corresponding method is described in U.S. Pat. No. 6,425,068 B1 (PACT02).
FIG. 8d illustrates the graph of FIG. 7a for the case of purely sequential processing.
The function also needs 5 cycles for the calculation: cy1=(i1), cy2=(i2 ∧ i3), cy3=(i4 ∧ i5 ∧ i6 ∧ i7 ∧ i8), cy4=(i9 ∧ i10 ∧ i11), cy5=(i12). The corresponding sequencer should thus operate at 5 times the clock rate of its environment in order to achieve a corresponding performance.
Such a function may be mapped, for example, similarly to FIG. 8c.
The mappings illustrated in FIGS. 8a through 8d may also be combined as required.
In FIG. 9a, ((i4 ∨ i5) ∧ i9) and (i3 ∧ (i6 ∨ i7 ∨ i8) ∧ (i10 ∨ i11)) may be executed in parallel. (i4 ∨ i5), (i6 ∨ i7 ∨ i8), and (i10 ∨ i11) are in each case alternating. The function may also be vectorized. It thus makes it possible to build up a pipeline in which 3 PAEs (PAE4, PAE5, PAE7) in each case determine the function to be executed by them in accordance with status signals.
FIG. 9b illustrates a similar example in which no vectorization is possible. However, the paths (i1 ∧ i2 ∧ (i4 ∨ i5) ∧ i9 ∧ i12) and (i3 ∧ (i6 ∨ i7 ∨ i8) ∧ (i10 ∨ i11)) are in parallel.
This makes it possible to achieve the optimum performance by using two PAEs which also process the parallel paths in parallel. The PAEs may be synchronized to one another in accordance with status signals which are generated by PAE1 since it calculates the beginning (i1) and the end (i12) of the function.
It should be noted that a multiple arrangement of sequencers may result in a symmetric parallel processor model (SMP) or similar multiprocessor models currently used.
Furthermore, it should be pointed out that all configuration registers for the scheduling may also be loaded with new configurations in the background and during the data processing. For example:
In the method described in U.S. Pat. No. 6,425,068 B1 (PACT02), independent storage areas or registers are available which may be executed independently. Certain places are jumped to by incoming triggers and jumping is also possible in accordance with jump instructions (JMP, CALL/RET) which may also be conditionally executable.
In the method described in U.S. Pat. No. 6,088,795 (PACT04), write and read pointers are independently available as a result of which, in principle, an independence and thus the possibility of access in the background, are given. In particular, it is possible to segment the memories as a result of which additional independence is given. Jumping is possible in accordance with jump instructions (JMP, CALL/RET) which may also be conditionally executable.
In the method according to U.S. Pat. No. 6,081,903 (PACT08), the individual registers which may be selected by the triggers are basically independent and therefore allow an independent configuration, particularly in the background. Jumps within the registers are not possible and selection takes place exclusively via the trigger vectors.
Number | Date | Country | Kind |
---|---|---|---|
101 39 170 | Aug 2001 | DE | national |
101 42 903 | Sep 2001 | DE | national |
101 44 732 | Sep 2001 | DE | national |
101 45 792 | Sep 2001 | DE | national |
Number | Name | Date | Kind |
---|---|---|---|
2067477 | Cooper | Jan 1937 | A |
3242998 | Gubbins | Mar 1966 | A |
3681578 | Stevens | Aug 1972 | A |
3757608 | Willner | Sep 1973 | A |
3855577 | Vandierendonck | Dec 1974 | A |
4498172 | Bhavsar | Feb 1985 | A |
4566102 | Hefner | Jan 1986 | A |
4591979 | Iwashita | May 1986 | A |
4663706 | Allen et al. | May 1987 | A |
4682284 | Schrofer | Jul 1987 | A |
4706216 | Carter | Nov 1987 | A |
4720780 | Dolecek | Jan 1988 | A |
4739474 | Holsztynski | Apr 1988 | A |
4761755 | Ardini et al. | Aug 1988 | A |
4811214 | Nosenchuck et al. | Mar 1989 | A |
4852043 | Guest | Jul 1989 | A |
4852048 | Morton | Jul 1989 | A |
4860201 | Stolfo et al. | Aug 1989 | A |
4870302 | Freeman | Sep 1989 | A |
4891810 | de Corlieu et al. | Jan 1990 | A |
4901268 | Judd | Feb 1990 | A |
4910665 | Mattheyses et al. | Mar 1990 | A |
4967340 | Dawes | Oct 1990 | A |
5014193 | Garner et al. | May 1991 | A |
5015884 | Agrawal et al. | May 1991 | A |
5021947 | Campbell et al. | Jun 1991 | A |
5023775 | Poret | Jun 1991 | A |
5043978 | Nagler et al. | Aug 1991 | A |
5047924 | Fujioka et al. | Sep 1991 | A |
5065308 | Evans | Nov 1991 | A |
5081375 | Pickett et al. | Jan 1992 | A |
5109503 | Cruickshank et al. | Apr 1992 | A |
5113498 | Evan et al. | May 1992 | A |
5115510 | Okamoto et al. | May 1992 | A |
5123109 | Hillis | Jun 1992 | A |
5125801 | Nabity et al. | Jun 1992 | A |
5128559 | Steele | Jul 1992 | A |
5142469 | Weisenborn | Aug 1992 | A |
5144166 | Camarota et al. | Sep 1992 | A |
5193202 | Lee et al. | Mar 1993 | A |
5203005 | Horst | Apr 1993 | A |
5204935 | Mihara et al. | Apr 1993 | A |
5208491 | Ebeling et al. | May 1993 | A |
5226122 | Thayer et al. | Jul 1993 | A |
RE34363 | Freeman | Aug 1993 | E |
5233539 | Agrawal et al. | Aug 1993 | A |
5247689 | Ewert | Sep 1993 | A |
5274593 | Proebsting | Dec 1993 | A |
5287472 | Horst | Feb 1994 | A |
5294119 | Vincent et al. | Mar 1994 | A |
5301284 | Estes et al. | Apr 1994 | A |
5301344 | Kolchinsky | Apr 1994 | A |
5303172 | Magar et al. | Apr 1994 | A |
5336950 | Popli et al. | Aug 1994 | A |
5347639 | Rechtschaffen et al. | Sep 1994 | A |
5349193 | Mott et al. | Sep 1994 | A |
5353432 | Richek et al. | Oct 1994 | A |
5361373 | Gilson | Nov 1994 | A |
5379444 | Mumme | Jan 1995 | A |
5410723 | Schmidt et al. | Apr 1995 | A |
5418952 | Morley et al. | May 1995 | A |
5421019 | Holsztynski et al. | May 1995 | A |
5422823 | Agrawal et al. | Jun 1995 | A |
5425036 | Liu et al. | Jun 1995 | A |
5426378 | Ong | Jun 1995 | A |
5428526 | Flood et al. | Jun 1995 | A |
5430687 | Hung et al. | Jul 1995 | A |
5440245 | Galbraith et al. | Aug 1995 | A |
5440538 | Olsen et al. | Aug 1995 | A |
5442790 | Nosenchuck | Aug 1995 | A |
5444394 | Watson et al. | Aug 1995 | A |
5448186 | Kawata | Sep 1995 | A |
5455525 | Ho et al. | Oct 1995 | A |
5457644 | McCollum | Oct 1995 | A |
5465375 | Thepaut et al. | Nov 1995 | A |
5473266 | Ahanin et al. | Dec 1995 | A |
5473267 | Stansfield | Dec 1995 | A |
5475583 | Bock et al. | Dec 1995 | A |
5475803 | Stearns et al. | Dec 1995 | A |
5475856 | Kogge | Dec 1995 | A |
5483620 | Pechanek et al. | Jan 1996 | A |
5485103 | Pedersen et al. | Jan 1996 | A |
5485104 | Agrawal et al. | Jan 1996 | A |
5489857 | Agrawal et al. | Feb 1996 | A |
5491353 | Kean | Feb 1996 | A |
5493239 | Zlotnick | Feb 1996 | A |
5497498 | Taylor | Mar 1996 | A |
5506998 | Kato et al. | Apr 1996 | A |
5510730 | El Gamal et al. | Apr 1996 | A |
5511173 | Yamaura et al. | Apr 1996 | A |
5513366 | Agarwal et al. | Apr 1996 | A |
5521837 | Frankle et al. | May 1996 | A |
5522083 | Gove et al. | May 1996 | A |
5530873 | Takano | Jun 1996 | A |
5530946 | Bouvier et al. | Jun 1996 | A |
5532693 | Winters et al. | Jul 1996 | A |
5532957 | Malhi | Jul 1996 | A |
5535406 | Kolchinsky | Jul 1996 | A |
5537057 | Leong et al. | Jul 1996 | A |
5537601 | Kimura et al. | Jul 1996 | A |
5541530 | Cliff et al. | Jul 1996 | A |
5544336 | Kato et al. | Aug 1996 | A |
5548773 | Kemeny et al. | Aug 1996 | A |
5555434 | Carlstedt | Sep 1996 | A |
5559450 | Ngai et al. | Sep 1996 | A |
5561738 | Kinerk et al. | Oct 1996 | A |
5570040 | Lytle et al. | Oct 1996 | A |
5574930 | Halverson, Jr. et al. | Nov 1996 | A |
5583450 | Trimberger et al. | Dec 1996 | A |
5586044 | Agrawal et al. | Dec 1996 | A |
5587921 | Agrawal et al. | Dec 1996 | A |
5588152 | Dapp et al. | Dec 1996 | A |
5590345 | Barker et al. | Dec 1996 | A |
5590348 | Phillips et al. | Dec 1996 | A |
5596742 | Agarwal et al. | Jan 1997 | A |
5600265 | El Gamal et al. | Feb 1997 | A |
5611049 | Pitts | Mar 1997 | A |
5617547 | Feeney et al. | Apr 1997 | A |
5625806 | Kromer | Apr 1997 | A |
5634131 | Matter et al. | May 1997 | A |
5649176 | Selvidge et al. | Jul 1997 | A |
5649179 | Steenstra et al. | Jul 1997 | A |
5652894 | Hu et al. | Jul 1997 | A |
5655069 | Ogawara et al. | Aug 1997 | A |
5655124 | Lin | Aug 1997 | A |
5657330 | Matsumoto | Aug 1997 | A |
5659797 | Zandveld et al. | Aug 1997 | A |
5675743 | Mavity | Oct 1997 | A |
5680583 | Kuijsten | Oct 1997 | A |
5713037 | Wilkinson et al. | Jan 1998 | A |
5717943 | Barker et al. | Feb 1998 | A |
5732209 | Vigil et al. | Mar 1998 | A |
5734921 | Dapp et al. | Mar 1998 | A |
5742180 | Detton et al. | Apr 1998 | A |
5748872 | Norman | May 1998 | A |
5754827 | Barbier et al. | May 1998 | A |
5754871 | Wilkinson et al. | May 1998 | A |
5760602 | Tan | Jun 1998 | A |
5761484 | Agarwal et al. | Jun 1998 | A |
5773994 | Jones | Jun 1998 | A |
5778439 | Timberger et al. | Jul 1998 | A |
5784636 | Rupp | Jul 1998 | A |
5794059 | Barker et al. | Aug 1998 | A |
5794062 | Baxter | Aug 1998 | A |
5801715 | Norman | Sep 1998 | A |
5802290 | Casselman | Sep 1998 | A |
5828229 | Cliff et al. | Oct 1998 | A |
5828858 | Athanas et al. | Oct 1998 | A |
5838165 | Chatter | Nov 1998 | A |
5844888 | Narjjyka | Dec 1998 | A |
5848238 | Shimomura et al. | Dec 1998 | A |
5854918 | Baxter | Dec 1998 | A |
5859544 | Norman | Jan 1999 | A |
5865239 | Carr | Feb 1999 | A |
5867691 | Shiraishi | Feb 1999 | A |
5867723 | Chin et al. | Feb 1999 | A |
5884075 | Hester et al. | Mar 1999 | A |
5887162 | Williams et al. | Mar 1999 | A |
5889982 | Rodgers et al. | Mar 1999 | A |
5892370 | Eaton et al. | Apr 1999 | A |
5892961 | Trimberger | Apr 1999 | A |
5901279 | Davis, III | May 1999 | A |
5915123 | Mirsky et al. | Jun 1999 | A |
5924119 | Sindhu et al. | Jul 1999 | A |
5927423 | Wada et al. | Jul 1999 | A |
5933642 | Greenbaum et al. | Aug 1999 | A |
5936424 | Young et al. | Aug 1999 | A |
5943242 | Vorbach et al. | Aug 1999 | A |
5956518 | DeHon et al. | Sep 1999 | A |
5960200 | Eager et al. | Sep 1999 | A |
5966534 | Cooke et al. | Oct 1999 | A |
5970254 | Cooke et al. | Oct 1999 | A |
6011407 | New | Jan 2000 | A |
6014509 | Furtek et al. | Jan 2000 | A |
6021490 | Vorbach et al. | Feb 2000 | A |
6023564 | Trimberger | Feb 2000 | A |
6023742 | Ebeling et al. | Feb 2000 | A |
6034538 | Abramovici | Mar 2000 | A |
6038650 | Vorbach et al. | Mar 2000 | A |
6038656 | Cummings et al. | Mar 2000 | A |
6047115 | Mohan et al. | Apr 2000 | A |
6049222 | Lawman | Apr 2000 | A |
6052773 | DeHon et al. | Apr 2000 | A |
6054873 | Laramie | Apr 2000 | A |
6081903 | Vorbach et al. | Jun 2000 | A |
6085317 | Smith | Jul 2000 | A |
6088795 | Vorbach et al. | Jul 2000 | A |
6092174 | Roussakov | Jul 2000 | A |
6105105 | Trimberger et al. | Aug 2000 | A |
6108760 | Mirsky et al. | Aug 2000 | A |
6119181 | Vorbach et al. | Sep 2000 | A |
6122719 | Mirsky et al. | Sep 2000 | A |
6125408 | McGee et al. | Sep 2000 | A |
6127908 | Bozler et al. | Oct 2000 | A |
6172520 | Lawman et al. | Jan 2001 | B1 |
6202182 | Abramovici et al. | Mar 2001 | B1 |
6216223 | Revilla et al. | Apr 2001 | B1 |
6219833 | Solomon et al. | Apr 2001 | B1 |
6243808 | Wang | Jun 2001 | B1 |
6260179 | Ohsawa et al. | Jul 2001 | B1 |
6263430 | Trimberger et al. | Jul 2001 | B1 |
6279077 | Nasserbakht et al. | Aug 2001 | B1 |
6282627 | Wong et al. | Aug 2001 | B1 |
6286134 | Click, Jr. et al. | Sep 2001 | B1 |
6288566 | Hanrahan et al. | Sep 2001 | B1 |
6289440 | Casselman | Sep 2001 | B1 |
6298472 | Phillips et al. | Oct 2001 | B1 |
6311200 | Hanrahan et al. | Oct 2001 | B1 |
6321366 | Tseng et al. | Nov 2001 | B1 |
6338106 | Vorbach et al. | Jan 2002 | B1 |
6341318 | Dakhil | Jan 2002 | B1 |
6347346 | Taylor | Feb 2002 | B1 |
6349346 | Hanrahan et al. | Feb 2002 | B1 |
6370596 | Dakhil | Apr 2002 | B1 |
6378068 | Foster et al. | Apr 2002 | B1 |
6389379 | Lin et al. | May 2002 | B1 |
6389579 | Phillips et al. | May 2002 | B1 |
6392912 | Hanrahan et al. | May 2002 | B1 |
6405299 | Vorbach et al. | Jun 2002 | B1 |
6421817 | Mohan et al. | Jul 2002 | B1 |
6425068 | Vorbach et al. | Jul 2002 | B1 |
6457116 | Mirsky et al. | Sep 2002 | B1 |
6477643 | Vorbach et al. | Nov 2002 | B1 |
6480937 | Vorbach et al. | Nov 2002 | B1 |
6480954 | Trimberger et al. | Nov 2002 | B2 |
6513077 | Vorbach et al. | Jan 2003 | B2 |
6519674 | Lam et al. | Feb 2003 | B1 |
6526520 | Vorbach et al. | Feb 2003 | B1 |
6538468 | Moore | Mar 2003 | B1 |
6539477 | Seawright | Mar 2003 | B1 |
6542998 | Vorbach et al. | Apr 2003 | B1 |
6571381 | Vorbach et al. | May 2003 | B1 |
6657457 | Hanrahan et al. | Dec 2003 | B1 |
6687788 | Vorbach et al. | Feb 2004 | B2 |
6697979 | Vorbach et al. | Feb 2004 | B1 |
6782445 | Olgiati et al. | Aug 2004 | B1 |
6836839 | Master et al. | Dec 2004 | B2 |
6883084 | Donohoe | Apr 2005 | B1 |
6895452 | Coleman et al. | May 2005 | B1 |
20020038414 | Taylor et al. | Mar 2002 | A1 |
20020143505 | Drusinsky | Oct 2002 | A1 |
20020144229 | Hanrahan | Oct 2002 | A1 |
20020165886 | Lam | Nov 2002 | A1 |
20030014743 | Cooke et al. | Jan 2003 | A1 |
20030046607 | Vorbach | Mar 2003 | A1 |
20030052711 | Taylor et al. | Mar 2003 | A1 |
20030055861 | Lai et al. | Mar 2003 | A1 |
20030056085 | Vorbach | Mar 2003 | A1 |
20030056091 | Greenberg | Mar 2003 | A1 |
20030056202 | Vorbach | Mar 2003 | A1 |
20030093662 | Vorbach et al. | May 2003 | A1 |
20030097513 | Vorbach et al. | May 2003 | A1 |
20030123579 | Safavi et al. | Jul 2003 | A1 |
20030135686 | Vorbach et al. | Jul 2003 | A1 |
20040015899 | May et al. | Jan 2004 | A1 |
20040025005 | Vorbach et al. | Feb 2004 | A1 |
Number | Date | Country |
---|---|---|
42 21 278 | Jan 1994 | DE |
44 16 881 | Nov 1994 | DE |
196 51 075 | Jun 1998 | DE |
196 54 593 | Jul 1998 | DE |
196 54 595 | Jul 1998 | DE |
196 54 846 | Jul 1998 | DE |
197 04 044 | Aug 1998 | DE |
197 04 728 | Aug 1998 | DE |
197 04 742 | Sep 1998 | DE |
198 07 872 | Aug 1999 | DE |
198 61 088 | Feb 2000 | DE |
199 26 538 | Dec 2000 | DE |
100 28 397 | Dec 2001 | DE |
100 36 627 | Feb 2002 | DE |
101 29 237 | Apr 2002 | DE |
102 04 044 | Aug 2003 | DE |
0 221 360 | May 1987 | EP |
0 428 327 | May 1991 | EP |
0 477 809 | Apr 1992 | EP |
0 539 595 | May 1993 | EP |
0 628 917 | Dec 1994 | EP |
0 678 985 | Oct 1995 | EP |
0 686 915 | Dec 1995 | EP |
0 707 269 | Apr 1996 | EP |
0 735 685 | Oct 1996 | EP |
0 835 685 | Oct 1996 | EP |
0 748 051 | Dec 1996 | EP |
0 726 532 | Jul 1998 | EP |
0 926 594 | Jun 1999 | EP |
1 102 674 | Jul 1999 | EP |
1 146 432 | Oct 2001 | EP |
WO 9004835 | May 1990 | WO |
WO9011648 | Oct 1990 | WO |
WO9311503 | Jun 1993 | WO |
WO9408399 | Apr 1994 | WO |
WO9500161 | Jan 1995 | WO |
WO9526001 | Sep 1995 | WO |
WO9826356 | Jun 1998 | WO |
WO9828697 | Jul 1998 | WO |
WO9829952 | Jul 1998 | WO |
WO9831102 | Jul 1998 | WO |
WO9835299 | Aug 1998 | WO |
WO9932975 | Jul 1999 | WO |
WO9940522 | Aug 1999 | WO |
WO9944120 | Sep 1999 | WO |
WO9944147 | Sep 1999 | WO |
WO0017771 | Mar 2000 | WO |
WO0077652 | Dec 2000 | WO |
WO0213000 | Feb 2002 | WO |
WO0221010 | Mar 2002 | WO |
WO0229600 | Apr 2002 | WO |
WO0271248 | Sep 2002 | WO |
WO0271249 | Sep 2002 | WO |
WO02103532 | Dec 2002 | WO |
WO0317095 | Feb 2003 | WO |
WO0323616 | Mar 2003 | WO |
WO0325781 | Mar 2003 | WO |
WO0332975 | Apr 2003 | WO |
WO0336507 | May 2003 | WO |
Number | Date | Country | |
---|---|---|---|
20030056202 A1 | Mar 2003 | US |