This application claims the benefit under 35 U.S.C. §119(a) of Korean Patent Application No. 10-2009-0013529, filed Feb. 18, 2009, the disclosure of which is incorporated herein by reference in its entirety for all purposes.
1. Field
The following description relates to a data processing system, and more particularly, to an apparatus and method for processing code into a form suitable for use by a processor.
2. Description of the Related Art
A compiler can process a line of text written in a specified programming language and convert the line of text into a machine language or code that can be used by a computer. When developing a program in a language such as C or Pascal, for example, a programmer writes the lines of text one by one by using an editor. These lines of text are called source code. After writing the source code, the programmer executes a compiler that understands the language of the source code.
The output of this compiling process is called target code or a target module. The target code is machine code that can be processed or executed by a processor on an instruction-by-instruction basis.
According to one aspect, a compiling apparatus includes an analysis unit configured to determine respective data widths of operands and connections in a data flow graph by using type information regarding a type of the operands, and a data processing unit configured to provide instructions, which are to be executed by a processor comprising heterogeneous components that support different data widths, based on the determined data widths of the operands and the connections.
The heterogeneous components may include at least one of a plurality of functional units to process data having different data widths, a plurality of register files to store data having different data widths, and connecting wires suitable for different data widths.
The analysis unit may initialize data widths of input and output nodes in the data flow graph based on the type information of the operands and determine data widths of unknown operands and connections using a fixed-point algorithm.
The data processing unit may select instructions based on the determined data widths of the operands and the connections in the data flow graph. The data processing unit may determine functional units which will execute the selected instructions, respectively. The data processing unit may allocate registers based on the determined data widths of the operands and the connections in the data flow graph.
If the processor is a coarse grained array (CGA) processor, the data processing unit may determine data widths of input and output operands and connections of nodes existing on the CGA processor's routing paths, which are used in executing the selected instructions, based on the determined data widths of the operands and the connections in the data flow graph.
The processor may be a very long instruction word (VLIW) processor or a CGA processor, or a combination of both.
According to another aspect, a compiling method of a compiling apparatus includes determining, by an analysis unit of the compiling apparatus, respective data widths of operands and connections in a data flow graph by using type information of the operands, and providing, by a data processing unit of the compiling apparatus, instructions, which are to be executed by a heterogeneous processor comprising heterogeneous components that support different data widths, based on the determined data widths of the operands and the connections.
The heterogeneous components may include at least one of a plurality of functional units which process data having different data widths, a plurality of register files which store data having different data widths, and connecting wires suitable for different data widths.
The determining of the respective data widths of the operands and the connections in the data flow graph may include initializing data widths of input and output nodes in the data flow graph based on the type information of the operands, and determining data widths of unknown operands and connections using a fixed-point algorithm.
The providing of the instructions may include selecting instructions based on the determined data widths of the operands and the connections in the data flow graph.
The providing of the instructions may further include determining functional units which will respectively execute the selected instructions.
The providing of the instructions may further include allocating registers based on the determined data widths of the operands and the connections in the data flow graph.
The providing of the instructions may include, if the processor is a coarse grained array (CGA) processor, determining data widths of input and output operands and connections of nodes existing on the CGA processor's routing paths, which are used for execution of the selected instructions, based on the determined data widths of the operands and the connections in the data flow graph.
The heterogeneous processor may be a very long instruction word (VLIW) processor or a CGA processor.
Other features and aspects will be apparent from the following description, the drawings, and the claims.
Throughout the drawings and detailed description, unless otherwise described, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The relative size and depiction of these elements may be exaggerated for clarity, illustration, and convenience.
The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. Accordingly, various changes, modifications, and equivalents of the systems, apparatuses and/or methods described herein will be suggested to those of ordinary skill in the art. Descriptions of well-known functions and constructions may be omitted for increased clarity and conciseness.
The processor 100 may include heterogeneous components to process data having multi-width data paths. As long as the processor 100 includes heterogeneous components to process data having different data widths, it can be implemented as a very long instruction word (VLIW) processor, a coarse grained array (CGA) processor, a reduced instruction set computer (RISC), and the like. Here, the heterogeneous components may include at least one of a plurality of functional units which process data having different data widths, a plurality of register files which store data having different data widths, and connecting wires suitable for different data widths.
Referring to
The functional units 130 through 135 and 150 through 155 carry out operations. The functional units 130 through 135 are 64-bit functional units which receive data having a data width of 64 bits, perform an operation on the data, and output the operation result. The functional units 150 through 155 are 32-bit functional units which receive data having a data width of 32 bits, perform an operation on the data, and output the operation result.
The functional units 130 through 135 and 150 through 155 may receive data from different sources and send their data processing results to different destinations. While the functional units 130 through 135 are shown in this arrangement to process data having a data width of 64 bits and the functional units 150 through 155 are shown to process data having a data width of 32 bits are shown in
Each of the register files 110, 120, 140 through 143, 160, and 161 is a collection of registers and temporarily stores data used in the functional units 130 through 135, and 150 through 155. The register file 110 is a 64-bit central register file, and the register file 120 is a 32-bit central register file. The register files 140 through 143 are distributed 64-bit register files, and the register files 160 and 161 are distributed 32-bit register files.
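The arrangement of functional units and register files described above can be pictured as a small data model. The following is a minimal Python sketch for illustration only, not part of the described apparatus. The class names `FunctionalUnit` and `RegisterFile` and the helper `units_of_width` are hypothetical; the reference numerals and data widths are the ones given in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalUnit:
    ident: int  # reference numeral used in the description
    width: int  # data width in bits


@dataclass(frozen=True)
class RegisterFile:
    ident: int     # reference numeral used in the description
    width: int     # data width in bits
    central: bool  # True for a central register file, False for a distributed one


# 64-bit functional units 130 through 135 and 32-bit functional units 150 through 155.
FUNCTIONAL_UNITS = (
    [FunctionalUnit(i, 64) for i in range(130, 136)]
    + [FunctionalUnit(i, 32) for i in range(150, 156)]
)

# Central register files 110 (64-bit) and 120 (32-bit), distributed 64-bit
# register files 140 through 143, and distributed 32-bit register files 160 and 161.
REGISTER_FILES = (
    [RegisterFile(110, 64, True), RegisterFile(120, 32, True)]
    + [RegisterFile(i, 64, False) for i in range(140, 144)]
    + [RegisterFile(i, 32, False) for i in (160, 161)]
)


def units_of_width(width):
    """Return the reference numerals of the functional units with the given width."""
    return [fu.ident for fu in FUNCTIONAL_UNITS if fu.width == width]
```

A compiler back end could consult a table of this kind when it needs to know which components are able to handle an operand of a given width.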
Generally, the components of a conventional processor, such as functional units, register files, and connecting wires, all support the same data width. Thus, when data having data widths of 32 bits or less is processed, 32-bit functional units should ideally be used, but a conventional processor having only 64-bit functional units processes the data using the 64-bit functional units. Using functional units built for large data widths to process data having small data widths is undesirable in terms of semiconductor size and energy efficiency. By contrast, an exemplary multi-width heterogeneous processor that includes heterogeneous components (such as heterogeneous functional units, heterogeneous register files, heterogeneous data connecting wires, and heterogeneous multiplexers) as shown in
Referring to
The front end 210 reads source code and converts the source code into intermediate code. The intermediate code is a representation suitable for optimization, which the compiler produces by reading and parsing the source code. After optimization, the intermediate code is converted into assembly code.
The back end 220 receives the intermediate code, performs various optimizations on the intermediate code, and outputs assembly code or binary machine code. The back end 220 may include an analysis unit 222 and a data processing unit 224.
The analysis unit 222 analyzes the intermediate code of the source code to implement various known or to be known optimization methods.
The analysis unit 222 generates a data flow graph showing operations, which are to be mapped onto a reconfigurable array, and data dependency between the operations. According to an aspect, inputs, operations, and outputs are represented as nodes, and data flows are represented as connections in a data flow graph.
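A data flow graph of this kind can be held in a very simple structure. The following Python sketch is one possible representation under stated assumptions: the class `DataFlowGraph`, its method names, and the example expression are hypothetical, and the node kinds "in", "const", "op", and "out" follow the labels used for the nodes later in this description.

```python
from dataclasses import dataclass, field


@dataclass
class DataFlowGraph:
    # Maps a node name to its kind: "in", "const", "op", or "out".
    nodes: dict = field(default_factory=dict)
    # Each connection is a (producer, consumer) pair of node names.
    connections: list = field(default_factory=list)

    def add_node(self, name, kind):
        if kind not in ("in", "const", "op", "out"):
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = kind

    def connect(self, producer, consumer):
        if producer not in self.nodes or consumer not in self.nodes:
            raise KeyError("both endpoints must be added before connecting")
        self.connections.append((producer, consumer))

    def producers_of(self, name):
        """Return the nodes whose results flow into the named node."""
        return [p for p, c in self.connections if c == name]


# Hypothetical graph for the expression: result = (a + b) * 4
graph = DataFlowGraph()
for name, kind in [("a", "in"), ("b", "in"), ("four", "const"),
                   ("add", "op"), ("mul", "op"), ("result", "out")]:
    graph.add_node(name, kind)
for producer, consumer in [("a", "add"), ("b", "add"), ("add", "mul"),
                           ("four", "mul"), ("mul", "result")]:
    graph.connect(producer, consumer)
```

Here the data dependency between the two operations is visible as the connection from "add" to "mul".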
The analysis unit 222 determines data widths of input and output operands and connections of each node by using information regarding the type of operands (also referred to as “type information”) that are generated when the source code is converted into the intermediate code. The type information of an operand (for example, a variable value, a constant value, a character, and the like) denotes a value indicating the number of bits of the operand.
The analysis unit 222 initializes data widths of input and output nodes in a data flow graph based on type information of operands. Then, the analysis unit 222 repeatedly performs a fixed-point algorithm (or a fixed-point iteration) on the initially set values until the values resulting from the fixed-point algorithm no longer change, that is, until the data widths of the unknown operands and connections no longer change. For each operand or connection whose data width is unknown, the minimal bit width among the available data widths is determined. Through the above process, data widths of all operands and connections in the data flow graph are determined.
Information about the determined data width of each operand or connection is added to each node or connection in the data flow graph. This information is later used in the process of selecting and scheduling instructions and allocating registers.
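The width determination can be illustrated with a short fixed-point iteration. The following Python sketch rests on assumptions that the description does not state: the available widths are taken to be 32 and 64 bits, an operation node is given the smallest available width that holds its widest incoming operand, and a connection carries the width of its producer. The function names and the example graph are hypothetical.

```python
AVAILABLE_WIDTHS = (32, 64)  # assumed widths supported by the processor


def smallest_fitting(bits):
    """Return the minimal available width that can hold `bits` bits."""
    for width in AVAILABLE_WIDTHS:
        if bits <= width:
            return width
    raise ValueError(f"no available width holds {bits} bits")


def infer_widths(connections, known):
    """Iterate until the widths of the unknown nodes stop changing.

    connections: list of (producer, consumer) node names
    known: widths of input and output nodes, initialized from type information
    """
    widths = dict(known)
    nodes = {name for connection in connections for name in connection}
    changed = True
    while changed:
        changed = False
        for node in sorted(nodes):
            if node in known:
                continue  # initialized widths are never overwritten
            incoming = [widths[p] for p, c in connections
                        if c == node and p in widths]
            if not incoming:
                continue  # nothing is known about this node yet
            new_width = smallest_fitting(max(incoming))
            if widths.get(node) != new_width:
                widths[node] = new_width
                changed = True
    return widths


def connection_widths(connections, widths):
    """Assumed rule: a connection is as wide as the value its producer emits."""
    return {(p, c): widths[p] for p, c in connections}


# Hypothetical graph: add = a + b, mul = add * c, out = mul
connections = [("a", "add"), ("b", "add"), ("add", "mul"),
               ("c", "mul"), ("mul", "out")]
known = {"a": 32, "b": 32, "c": 64, "out": 64}
widths = infer_widths(connections, known)
```

In this example the iteration settles with "add" at 32 bits and "mul" at 64 bits, because "mul" receives a 64-bit operand from "c". The loop terminates because a node's width can only grow and is bounded by the largest available width.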
The data processing unit 224 provides instructions to be executed on the heterogeneous processor 100 by using the determined data widths of the input and output operands and the connection of each node. The data processing unit 224 selects instructions, which are to be executed by the heterogeneous processor 100, and places or maps operations to functional units. Referring to
The instruction selection unit 232 selects instructions based on the determined data widths of the operands and the connections in the data flow graph. Instructions to be selected may be stored in advance, in the form of instruction sets, in a predetermined storage space. For example, if an input operand of a node which performs an addition operation has a data width of 30 bits and if an output operand thereof has a data width of 32 bits, a 32-bit add instruction is selected.
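The selection rule in the example above can be sketched as a lookup into a pre-stored instruction set. This Python sketch is illustrative only: the mnemonics such as `add32`, the table `INSTRUCTION_SET`, and the function `select_instruction` are hypothetical names, and the rule of choosing the narrowest instruction that covers every operand is an assumption consistent with the 30-bit and 32-bit example.

```python
# Hypothetical instruction set, stored in advance and keyed by (operation, width).
INSTRUCTION_SET = {
    ("add", 32): "add32",
    ("add", 64): "add64",
    ("mul", 32): "mul32",
    ("mul", 64): "mul64",
}


def select_instruction(operation, input_widths, output_width):
    """Pick the narrowest stored instruction that covers all operand widths."""
    required = max(list(input_widths) + [output_width])
    candidates = sorted(
        width for (op, width) in INSTRUCTION_SET
        if op == operation and width >= required
    )
    if not candidates:
        raise LookupError(f"no {operation} instruction for {required} bits")
    return INSTRUCTION_SET[(operation, candidates[0])]
```

With this rule, `select_instruction("add", [30], 32)` yields the 32-bit add instruction, matching the example in the preceding paragraph.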
The instruction scheduling unit 234 determines which functional unit will execute each selected instruction. The register allocation unit 236 allocates registers based on the determined data widths of the operands and the connections in the data flow graph.
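A width-aware placement and register choice might look like the following Python sketch. It is deliberately simplistic and is not the scheduling method of the description: it ignores data dependencies and timing, hands out functional units of the matching width in round-robin order, and always picks the central register file of the matching width. The function names and operation names are hypothetical; the reference numerals are those of the functional units and central register files described earlier.

```python
from itertools import cycle

# Functional units grouped by the data width they process.
UNITS_BY_WIDTH = {64: list(range(130, 136)), 32: list(range(150, 156))}
# Central register files, keyed by data width.
CENTRAL_REGISTER_FILE = {64: 110, 32: 120}


def place(instructions):
    """Assign each (name, width) instruction to a functional unit of that width.

    Round-robin placement is an assumption made for brevity; a real scheduler
    would also respect dependencies, latencies, and routing constraints.
    """
    pools = {width: cycle(units) for width, units in UNITS_BY_WIDTH.items()}
    return {name: next(pools[width]) for name, width in instructions}


def register_file_for(width):
    """Choose the central register file whose width matches the operand."""
    return CENTRAL_REGISTER_FILE[width]


placement = place([("op_a", 64), ("op_b", 64), ("op_c", 32), ("op_d", 64)])
```

The point of the sketch is only that a 32-bit instruction lands on a 32-bit functional unit and a 32-bit value lands in a 32-bit register file, so that wider components are not spent on narrow data.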
If the heterogeneous processor 100 is a CGA processor, even when two nodes, which correspond to respective operations, are connected by a single connection in a data flow graph, a functional unit to which one of the two nodes is mapped and another functional unit to which the other one is mapped may be separated from each other. In this case, the instruction scheduling unit 234 determines data widths of input and output operands and connections on the CGA processor's routing paths, which are used in executing the instructions selected, based on the determined data widths of the operands and the connections in the data flow graph. That is, for data delivery, the instruction scheduling unit 234 selects nodes (for example, functional units or register files) on routing paths between nodes and determines data widths of input and output operands and connections (for example, data connecting wires) of the selected nodes.
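The routing step can be illustrated as annotating each resource on a routing path with the width of the value it forwards. The Python sketch below assumes that a routing resource forwards a value unchanged, so its input and output widths equal the width of the original connection, and that a resource cannot carry a value wider than its own native width. The table `RESOURCE_WIDTH` lists only a few of the components described earlier, and the function name `annotate_route` is hypothetical.

```python
# Native widths of a few functional units and register files from the description.
RESOURCE_WIDTH = {130: 64, 131: 64, 140: 64, 150: 32, 151: 32, 160: 32}


def annotate_route(connection_width, route):
    """Give every resource on the route the width of the value it passes along.

    connection_width: width determined for the connection in the data flow graph
    route: reference numerals of the functional units or register files used
    """
    annotated = []
    for resource in route:
        native = RESOURCE_WIDTH[resource]
        if native < connection_width:
            raise ValueError(
                f"resource {resource} is {native}-bit and cannot carry "
                f"{connection_width}-bit data"
            )
        annotated.append({
            "resource": resource,
            "in_width": connection_width,
            "out_width": connection_width,
        })
    return annotated


# A 32-bit connection routed through register file 160 and functional unit 151.
route = annotate_route(32, [160, 151])
```

Checking the native width during annotation lets a scheduler reject a candidate path early, for example a 64-bit value routed through a 32-bit register file.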
In each of the nodes 301 through 309, a number (or numbers) shown above characters (for example, “in” (input), “const” (input constant value), “op” (operation), and “out” (output)) indicates a data width of an operand input to the node, and a number shown under the characters indicates a data width of an operand output from the node. When pre-processed, source code gives information regarding the type of input and output nodes. That is, data widths of operands of the input nodes 301, 302, 303, and 306 and the output node 309 are determined based on the pre-processed source code.
In addition, data widths of input and output operands of the operation nodes 304, 305, 307, and 308 between the input nodes 301, 302, 303, and 306 and the output node 309 are determined using the fixed-point algorithm. Once the data widths of the input and output operands of the operation nodes 304, 305, 307, and 308 are determined, an instruction for each of the operation nodes 304, 305, 307 and 308 can be determined.
For example, since 64-bit data is input to and output from each of the operation nodes 304, 305, and 308, a 64-bit instruction is selected for each of the operation nodes 304, 305, and 308. Also, a 32-bit instruction is selected for the operation node 307 because 32-bit data is input to and output from the operation node 307.
Once instructions are selected, they are mapped to respective functional units that will execute the selected instructions. Referring to
The exemplary data flow graph of
That is, if the data flow graph of
Generally, a CGA processor performs repetitive operations (such as loop operations, for example) that involve large data throughput, under the control of a core such as a VLIW core. The CGA processor typically includes a plurality of functional units. In addition, the CGA processor makes the most of instruction-level parallelism between operations existing in an application to enhance its performance. That is, the CGA processor distributes operations, which may be carried out simultaneously, to a plurality of the functional units therein in order to perform the operations at the same time, thereby reducing the time required to execute an application. Since the functional units in the CGA processor are sparsely connected, operand routing between operations as well as operation placement should be taken into consideration when scheduling instructions.
Thus, referring to
The nodes 301 through 309 in the exemplary data flow graph of
In addition, if functional units, which respectively execute the operation node 305 and the operation node 307, are separated from each other in the CGA processor, a routing path should also be created to deliver the result of executing the operation node 305 to the operation node 307. In this case, the nodes 501 and 502 and a node 504 correspond to functional units or register files on the routing path for delivering the result of executing the operation node 305 to the operation node 307.
In operation 610, respective data widths of operands and connections in a data flow graph are determined based on type information of the operands. Data widths of input and output nodes in the data flow graph may be initialized based on the type information of the operands, and respective data widths of unknown operands and connections may be determined using a fixed-point algorithm.
In operation 620, instructions to be executed by a heterogeneous processor, which includes heterogeneous components supporting different data widths, are provided to the heterogeneous processor based on the determined data widths of the operands and the connections. As described above, the heterogeneous components may include at least one of a plurality of functional units which process data having different data widths, a plurality of register files, and connecting wires suitable for different data widths.
Instructions may be selected based on the determined data widths of the operands and the connections in the data flow graph, and functional units, which can respectively process the selected instructions, may be determined. In addition, registers may be allocated based on the determined data widths of the operands and the connections in the data flow graph.
If the heterogeneous processor is a CGA processor, data widths of input and output operands and connections of nodes on the CGA processor's routing paths, which are used for execution of instructions selected, may be determined based on the determined data widths of the operands and the connections in the data flow graph.
According to the example(s) described above, source code may be compiled more efficiently in consideration of data widths, so that the compiled code can be used in a processor capable of performing heterogeneous operations while requiring a smaller chip area and consuming less power.
The subject matter disclosed herein including the methods described above may be recorded, stored, or fixed in one or more computer-readable storage media that include program instructions to be implemented by a computer to cause a processor to execute or perform the program instructions. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. Examples of computer-readable media include magnetic media, such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks and DVDs; magneto-optical media, such as optical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Examples of program instructions include machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules in order to perform the operations and methods described above, or vice versa. In addition, a computer-readable storage medium may be distributed among computer systems connected through a network and computer-readable codes or program instructions may be stored and executed in a decentralized manner.
Also, codes and code segments for accomplishing the disclosed subject matter can be construed by programmers skilled in the art to which the present subject matter pertains.
A number of exemplary embodiments are described above. Nevertheless, it will be understood that various modifications may be made. For example, suitable results may be achieved if the described techniques are performed in a different order and/or if components in a described system, architecture, device, or circuit are combined in a different manner and/or replaced or supplemented by other components or their equivalents. Accordingly, other implementations are within the scope of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2009-0013529 | Feb 2009 | KR | national |
Number | Date | Country | |
---|---|---|---|
20100211760 A1 | Aug 2010 | US |