This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2009-049114, filed on Mar. 3, 2009, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to a compiling apparatus, a compiling method, and a program product.
2. Description of the Related Art
Conventionally, a typical method of exchanging data between instruction sequences, each consisting of a plurality of microinstructions, in an arithmetic processing unit (processor) that executes instruction sequences concurrently is to configure the processor so that a register is accessible from any operation unit for both input and output. Hereinafter, such a configuration is called a centralized register system, and such a register is called a centralized register. If a centralized register is employed, the processor incurs a large hardware cost. This is because data reference and data update must be executed simultaneously between the register files and all processing units operating in parallel, which requires data buses proportional to the parallelism, i.e., the number of instructions that can be processed simultaneously, as well as a memory having a plurality of ports.
On the other hand, for example, Japanese Patent Application Laid-open No. 2003-99249 discloses a configuration, called a distributed register system, having a register (distributed register) whose accessible operation units are limited so as to reduce the hardware cost of ports and buses. This technology enables register accesses to a distributed register to be distributed over predetermined data buses. However, if a data path cannot be allocated to a data bus programmably, the distributed register system is inferior to the centralized register system in terms of versatility. Moreover, Japanese Patent Application Laid-open No. 2003-99249 does not describe a method of realizing a data path by a compiling apparatus (compiler).
A pipeline register can also be regarded as one type of distributed register, and there is a method of exchanging data between instruction sequences via the pipeline register. With this method, data exchange between consecutive instructions is realized by bypassing the result value of a preceding instruction, via the pipeline register, to the input of a following instruction based on an analysis of data dependency. However, the dependence analysis and the bypass control are realized by hardware.
To reduce the amount of hardware required for bypass control, Japanese Patent Application Laid-open No. H11-65844 discloses a processing apparatus in which the data to be bypassed can be specified in an instruction operand. With this technology, it is assumed that data dependence analysis is executed by a compiler or the like and that the data to be bypassed is extracted and embedded in the instruction code. However, neither such an extraction method nor the corresponding compiling method is described.
A compiling apparatus according to an embodiment of the present invention comprises:
an instruction-sequence-hierarchy-graph generating unit that generates an instruction sequence hierarchy graph by arraying unit graphs, to each of which a data path realized by a plurality of microinstructions included in one instruction sequence is to be allocated and in each of which function units are a node and a data line between the function units is an edge, to correspond to an execution order of a plurality of instruction sequences and by connecting arrayed unit graphs with an edge corresponding to a hardware path;
a data path allocating unit that allocates a data path realizing a data flow structure of a source program to each of the unit graphs constituting the instruction sequence hierarchy graph; and
an object program output unit that generates an instruction sequence group based on the data path allocated to the instruction sequence hierarchy graph.
A compiling method according to an embodiment of the present invention comprises:
generating an instruction sequence hierarchy graph by arraying unit graphs, to each of which a data path realized by a plurality of microinstructions included in one instruction sequence is to be allocated and in each of which function units are a node and a data line between the function units is an edge, to correspond to an execution order of a plurality of instruction sequences and by connecting arrayed unit graphs with an edge corresponding to the hardware path;
allocating a data path realizing a data flow structure of a source program to each of the unit graphs constituting the instruction sequence hierarchy graph; and
generating an instruction sequence group based on the data path allocated to the instruction sequence hierarchy graph.
A program product according to an embodiment of the present invention causes a computer to execute:
generating an instruction sequence hierarchy graph by arraying unit graphs, to each of which a data path realized by a plurality of microinstructions included in one instruction sequence is to be allocated and in each of which function units are a node and a data line between the function units is an edge, to correspond to an execution order of a plurality of instruction sequences and by connecting arrayed unit graphs with an edge corresponding to the hardware path;
allocating a data path realizing a data flow structure of a source program to each of the unit graphs constituting the instruction sequence hierarchy graph; and
generating an instruction sequence group based on the data path allocated to the instruction sequence hierarchy graph.
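Purely as an illustrative sketch, and not as the actual implementation of the apparatus, method, or program product summarized above, the three steps of generating the instruction sequence hierarchy graph, allocating data paths, and outputting the instruction sequence group can be pictured as the following Python pipeline; all function names and the simplified placeholder bodies are assumptions used only to show how the steps chain together.

```python
# Hypothetical outline of the three steps summarized above; the function names
# and the simplified bodies are assumptions used only to show the control flow.

def generate_ihg(architecture_info, num_sequences):
    # Array one unit graph per instruction sequence and connect them with
    # edges corresponding to hardware paths (distributed registers, bypasses).
    return {"unit_graphs": [dict(architecture_info) for _ in range(num_sequences)]}

def allocate_data_paths(ihg, dfg_operations):
    # Allocate data paths realizing the data flow structure of the source
    # program to the unit graphs (placeholder: round-robin assignment).
    count = len(ihg["unit_graphs"])
    return [(op, i % count) for i, op in enumerate(dfg_operations)]

def output_object_program(allocation):
    # Generate one instruction sequence per unit graph from the allocation.
    sequences = {}
    for op, seq_index in allocation:
        sequences.setdefault(seq_index, []).append(op)
    return sequences

ihg = generate_ihg({"function_units": []}, num_sequences=2)
allocation = allocate_data_paths(ihg, ["add", "mul", "store"])
print(output_object_program(allocation))   # {0: ['add', 'store'], 1: ['mul']}
```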
Exemplary embodiments of a compiling apparatus, a compiling method, and a program product according to the present invention will be explained below in detail with reference to the accompanying drawings. The present invention is not limited to the following embodiments.
A compiling apparatus and a compiling program according to the present embodiment generate an object program for a target processor that includes at least one of the following characteristics.
A processor including a distributed register as a data reception/transmission unit between instruction sequences.
A processor including a bypass as the data reception/transmission unit between instruction sequences.
A processor capable of data reception/transmission even between operation units that are cascade-connected.
A processor that includes at least one of the above characteristics, or a processor that includes at least one of the above characteristics together with a centralized register.
Each function unit (FU) of a target processor, such as an operation unit, a buffer, or a register, must allow its function and its input and output destinations to be specified by a microinstruction. The function units may differ in role; for example, one may have only a calculation function while another has a data path control function. In a target processor, the function unit serving as the input or output destination must be specified by a microinstruction in order to realize a data path within an instruction sequence and between instruction sequences.
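For instance, purely as an assumption about one possible representation (the actual microinstruction format is not specified here), such a microinstruction could be modeled as a record naming the FU, its selected function, and its input and output destinations:

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical model of a microinstruction: the FU executing it, the function
# it performs, and the FUs used as input sources and as the output destination.
# The field names and the FU names ALU0 and TR02 are assumptions for
# illustration; TR01 appears in this description as a distributed register.
@dataclass(frozen=True)
class MicroInstruction:
    fu: str                    # function unit executing the instruction
    function: str              # operation selected on that FU (e.g. "add")
    sources: Tuple[str, ...]   # FUs supplying the input operands
    destination: str           # FU receiving the result

# Example: an addition on a hypothetical operation unit ALU0 whose result is
# written to the distributed register TR01.
example = MicroInstruction(fu="ALU0", function="add",
                           sources=("TR01", "TR02"), destination="TR01")
print(example)
```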
For example, in the processor shown in
When data stored in TR01, which is a distributed register, is handed to the next instruction sequence, a microinstruction "Hold TR01" is used. With this microinstruction, the data stored in TR01 remains in TR01 unchanged, so that data can be passed via TR01 to a microinstruction of the next instruction sequence. The processor shown in
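The effect of this can be illustrated by the following minimal sketch; apart from the "Hold TR01" microinstruction mentioned above, the instruction forms and the tiny interpreter are assumptions made only for illustration.

```python
# Minimal sketch (not the actual instruction set): a distributed register such
# as TR01 keeps its value across instruction sequences when "Hold TR01" is
# issued, so a microinstruction in the next instruction sequence can read it.
registers = {"TR01": None}

def run_sequence(sequence):
    for op, reg, *value in sequence:
        if op == "write":
            registers[reg] = value[0]      # a preceding microinstruction stores data
        elif op == "hold":
            pass                           # "Hold TR01": the data stays in TR01
        elif op == "read":
            print("next sequence receives", registers[reg])

# Instruction sequence 1 produces a value and holds it in TR01;
# instruction sequence 2 receives it via TR01.
run_sequence([("write", "TR01", 42), ("hold", "TR01")])
run_sequence([("read", "TR01")])
```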
The present embodiment of the present invention is mainly characterized in that it generates a graph on which data paths within an instruction sequence or between instruction sequences can easily be allocated, and in that it uses this graph to generate an instruction sequence group (object program) that establishes data paths across instruction sequences in a target processor having the above characteristics.
As shown in
The architecture information input to the input receiving unit 1 can be any information from which the FUs, the data lines, and the direction of the data flow on each data line can be recognized. For example, the architecture information can be an architecture diagram indicating data paths as shown in
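For example, under the assumption that such information is supplied in a simple machine-readable form (the FU names other than TR01 are hypothetical), it could look like this:

```python
# Hypothetical encoding of architecture information: a list of FUs and a list
# of directed data lines (source FU -> destination FU). FU names other than
# TR01 are assumptions chosen only for this sketch.
architecture_info = {
    "function_units": ["ALU0", "ALU1", "TR01"],
    "data_lines": [
        ("ALU0", "ALU1"),   # cascade connection between operation units
        ("ALU1", "TR01"),   # operation result stored into the distributed register
    ],
    "distributed_registers": ["TR01"],
    "bypass_sources": ["ALU1"],
}

# Everything needed later -- the FUs, the data lines, and the flow direction --
# can be recovered from this structure.
for src, dst in architecture_info["data_lines"]:
    print(f"data flows from {src} to {dst}")
```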
The CPU 11 executes a compiling program 16, which is a computer program for compiling a source program. The display unit 14 is a display device such as a liquid crystal monitor, and displays output information for a user, such as an operation screen, based on instructions from the CPU 11. The input unit 15 includes a mouse and a keyboard, through which a user inputs operations for the compiling apparatus 10. The operation information input to the input unit 15 is sent to the CPU 11.
The compiling program 16 is stored in the ROM 12 and is loaded onto the RAM 13 via a bus line. The CPU 11 executes the compiling program 16 loaded on the RAM 13. Specifically, in the compiling apparatus 10, the CPU 11 reads out the compiling program 16 from the ROM 12, loads it onto a program storing area in the RAM 13, and executes various processing in accordance with instructions input from the input unit 15 by a designer. The source program and the architecture information are input from an external storage device or the like. The CPU 11 executes various processing based on the source program and the architecture information input from the external storage device or the like, and temporarily stores data generated in that processing, such as the DFG and the IHG, in a data storing area formed in the RAM 13. The CPU 11 outputs the generated object program to the program storing area in the RAM 13, the external storage device, and the like. The compiling program 16 can also be stored in, and loaded from, a storage device such as a disk.
The compiling program 16 executed in the compiling apparatus 10 includes the above units (the input receiving unit 1, the DFG generating unit 2, the IHG generating unit 3, the data path allocating unit 4, and the object program output unit 5). Each of these units is generated on a main storage device when the compiling program 16 is loaded onto the main storage device.
The compiling program 16 executed in the compiling apparatus 10 can be provided in such a way that it is stored in a computer connected to a network such as the Internet and is downloaded via the network. The compiling program 16 can also be provided or distributed via a network such as the Internet. Alternatively, the compiling program 16 can be incorporated in a ROM or the like in advance and provided to the compiling apparatus 10.
Next, an operation of the compiling apparatus 10 is explained.
In
In the architecture diagram shown in
The DFG generating unit 2 analyzes the source program received by the input receiving unit 1 and generates a DFG (S2).
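As a much simplified illustration of this step (the actual analysis performed by the DFG generating unit 2 is not limited to this), a DFG can be built by treating each operation as a node and each value passed between operations as an edge; the toy three-address input form below is an assumption.

```python
# Toy DFG construction, assuming the source program is already in a simple
# three-address form: (result, operator, operand1, operand2). This only
# illustrates producing nodes (operations) and edges (data dependencies).
def build_dfg(three_address_code):
    producers = {}          # variable name -> index of the node that produced it
    nodes, edges = [], []
    for i, (result, op, lhs, rhs) in enumerate(three_address_code):
        nodes.append(op)
        for operand in (lhs, rhs):
            if operand in producers:
                edges.append((producers[operand], i))   # data dependency edge
        producers[result] = i
    return nodes, edges

# Example source fragment:  t0 = a + b;  t1 = t0 * c
nodes, edges = build_dfg([("t0", "add", "a", "b"),
                          ("t1", "mul", "t0", "c")])
print(nodes)   # ['add', 'mul']
print(edges)   # [(0, 1)]  -- the result of the add feeds the mul
```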
Next, the IHG generating unit 3 generates an IHG based on the architecture information received by the input receiving unit 1 (S3).
In
Next, the IHG generating unit 3 defines specific nodes (S12). A specific node is a node that can serve as an input or an output of a data path across instruction sequences, i.e., an FU that is a transmission source (bypass source) capable of transmitting data via a distributed register or a bypass. In an example shown in
Next, the IHG generating unit 3 prepares a plurality of copies of the node group defined at S11 and S12, determines each copy as an allocation destination of the data paths realized by one instruction sequence, i.e., as a unit graph, and defines output edges for each node belonging to each unit graph (S13). For a node other than a specific node (a cascade-connected node), the IHG generating unit 3 defines an output edge within the unit graph of the same instruction sequence, based on the data lines and the directions of data flow described in the architecture information. For a specific node that is a distributed register, the IHG generating unit 3 defines an output edge with the node of the distributed register as the source and the same node belonging to the unit graph of the next instruction sequence as the destination. For a specific node that is a bypass source, the IHG generating unit 3 defines an output edge with the node of the bypass source as the source and, as the destination, the node of the bypass destination belonging to the unit graph of the instruction sequence next to the one to which the source node belongs.
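Steps S11 to S13 can be pictured, purely as an illustrative sketch under the assumed architecture-information layout used in the earlier example, as follows; the node and edge representation is an assumption, not the disclosed implementation.

```python
# Illustrative IHG construction following S11-S13, assuming the hypothetical
# architecture_info layout sketched earlier. Nodes are (sequence_index, fu);
# edges connect nodes within one unit graph (cascade connections) or between
# consecutive unit graphs (distributed registers and bypass sources).
def build_ihg(architecture_info, num_sequences, bypass_destinations=None):
    bypass_destinations = bypass_destinations or {}
    fus = architecture_info["function_units"]                # S11: define nodes
    dist_regs = set(architecture_info["distributed_registers"])
    bypass_srcs = set(architecture_info["bypass_sources"])   # S12: specific nodes
    edges = []
    for seq in range(num_sequences):                         # S13: one unit graph
        for src, dst in architecture_info["data_lines"]:     # per instruction
            edges.append(((seq, src), (seq, dst)))           # sequence
        if seq + 1 < num_sequences:
            for reg in dist_regs:                            # distributed register:
                edges.append(((seq, reg), (seq + 1, reg)))   # same node, next graph
            for src in bypass_srcs:                          # bypass source: bypass
                for dst in bypass_destinations.get(src, []): # destination in the
                    edges.append(((seq, src), (seq + 1, dst)))  # next unit graph
    nodes = [(seq, fu) for seq in range(num_sequences) for fu in fus]
    return nodes, edges

architecture_info = {
    "function_units": ["ALU0", "ALU1", "TR01"],
    "data_lines": [("ALU0", "ALU1"), ("ALU1", "TR01")],
    "distributed_registers": ["TR01"],
    "bypass_sources": ["ALU1"],
}
nodes, edges = build_ihg(architecture_info, num_sequences=2,
                         bypass_destinations={"ALU1": ["ALU0"]})
print(len(nodes), "nodes,", len(edges), "edges")
```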
With the above operation, an IHG shown in
In the present embodiment, a graph as shown in
Subsequent to S3 in
The object program output unit 5 generates an instruction sequence group, i.e., an object program, based on microinstructions allocated to the IHG, and outputs it to an external storage device or the like. Because one instruction sequence is generated from each unit graph, a plurality of instruction sequences is generated from an IHG including a plurality of unit graphs.
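As a simplified and assumed illustration of how a route allocated on the IHG turns into instruction sequences (a depth-first search, mentioned below as one search method, is used here), consider the following sketch; the node and edge representation continues the hypothetical layout from the earlier examples.

```python
from collections import defaultdict

# Illustrative sketch only: search the IHG for a route between two nodes with a
# depth-first search, then group the nodes used by the route per unit graph
# (i.e. per instruction sequence index) to emit one instruction sequence each.
def find_path(edges, start, goal, visited=None):
    visited = visited or set()
    if start == goal:
        return [start]
    visited.add(start)
    for src, dst in edges:
        if src == start and dst not in visited:
            rest = find_path(edges, dst, goal, visited)
            if rest:
                return [start] + rest
    return None

def emit_sequences(path):
    sequences = defaultdict(list)
    for seq_index, fu in path:           # node = (instruction sequence, FU)
        sequences[seq_index].append(fu)  # hypothetical "use this FU" microinstruction
    return dict(sequences)

# Hypothetical IHG edges from the earlier sketch: a value produced on ALU1 is
# held in the distributed register TR01 and consumed in the next sequence.
edges = [((0, "ALU0"), (0, "ALU1")), ((0, "ALU1"), (0, "TR01")),
         ((0, "TR01"), (1, "TR01")), ((1, "ALU0"), (1, "ALU1"))]
path = find_path(edges, (0, "ALU0"), (1, "TR01"))
print(path)                 # route across two unit graphs via TR01
print(emit_sequences(path)) # {0: ['ALU0', 'ALU1', 'TR01'], 1: ['TR01']}
```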
According to the present embodiment, for a processor that includes a hardware path, such as a distributed register or a bypass, realizing a data path across instruction sequences, a graph is defined in which the destination of a data path leaving an operation unit or a register associated with a distributed register or a bypass can be a node within the same instruction sequence or a node outside that instruction sequence that is capable of receiving the data. A data path is then established between nodes of the defined graph. Thus, it is possible to generate an object program that receives and transmits data across instruction sequences.
In the above explanation, a depth-first search is explained as an example of a method of searching for a data path using an IHG; however, the method is not limited thereto. For example, a binary tree search or a node-number-first search can be employed as the method of searching for a data path.
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.