This application claims the benefit under 35 U.S.C. §119(a) of Korean Patent Application No. 10-2010-0025915, filed on Mar. 23, 2010, the entire disclosure of which is incorporated herein by reference for all purposes.
1. Field
The following description relates to a reconfigurable processor, and more particularly, to a reconfigurable array that may switch between multiple processing modes.
2. Description of the Related Art
In general, a reconfigurable processor has hardware that may be tailored to perform a specific task. If a task is processed only in hardware, even a slight change to the task may make processing difficult because the functionality of the hardware is fixed. On the other hand, if a task is processed only in software, changes to the task may be accommodated by modifying the software, but the task is processed more slowly than with hardware-based processing.
Therefore, there is a desire for a reconfigurable processor that combines the advantages of hardware-based and software-based processing.
In one general aspect, there is provided a reconfigurable array comprising a processing core including a plurality of processing elements that each have a respective local register file, the processing core executing instructions in a first execution mode and a second execution mode, a central register file to store data related to the instructions, and a controller to control conversion between the first execution mode and the second execution mode of the processing core, and to distribute data to be used in the first execution mode from among the data stored in the central register file to the local register files, in response to a request for conversion from the second execution mode to the first execution mode.
In the first execution mode, the processing core may execute the instructions using data from the local register files, and in the second execution mode, the processing core may execute the instructions using data from the central register file.
In the first execution mode, the processing core may execute an instruction related to a loop operation, and in the second execution mode, the processing core may execute an instruction related to another operation that is not a loop operation.
In the first execution mode, the processing core may execute the instruction related to the loop operation using all of the plurality of processing elements, and in the second execution mode, the processing core may execute the instruction related to the other operation that is not a loop operation using some of the plurality of processing elements.
The controller may analyze a data flow graph representing an execution sequence of the instructions, and may calculate scheduling priorities of data to be used in the first execution mode based on the number of edges of nodes on the data flow graph.
The controller may analyze an interconnection status of the local register files, and may calculate mapping priorities of the local register files using position scores based on the interconnection status of the local register files.
The controller may copy data to be used in the first execution mode to the local register files, based on the scheduling priorities and the mapping priorities.
In another aspect, there is provided a method of controlling a reconfigurable array having at least a first execution mode and a second execution mode, the method comprising detecting a mode conversion request, and distributing, to local register files, data that is to be used in the first execution mode from among data stored in a central register file, in response to the mode conversion request, wherein the central register file stores data related to instructions, and the local register files are respectively formed in a plurality of processing elements included in the reconfigurable array.
The first execution mode may be for executing a loop operation and the second execution mode may be for executing another operation that is not a loop operation.
The detecting of the mode conversion request may comprise determining whether a request for conversion from the second execution mode to the first execution mode is received.
The distributing of the data to be used in the first execution mode may comprise analyzing a data flow graph representing an execution sequence of the instructions and calculating scheduling priorities of data to be used in the first execution mode based on the number of edges of nodes on the data flow graph; analyzing an interconnection status of the local register files and calculating mapping priorities of the local register files based on position scores of the local register files that are based on the interconnection status; and copying the data to be used in the first execution mode to the local register files, based on the scheduling priorities and the mapping priorities.
In another aspect, there is provided a reconfigurable processor comprising a processing core comprising a plurality of processing elements for executing instructions, wherein each processing element has a respective local register file for storing data, and the processing core operates in at least a first execution mode and a second execution mode, a central register file for storing data used to execute the instructions and the execution results of the instructions, and a controller for controlling mode-conversion of the processing core such that when the controller receives a request to switch the processing core from the first execution mode to the second execution mode, the controller distributes data from the central register file to the respective local register files based on detected data to be used in the second execution mode.
The first execution mode may be a very long instruction word (VLIW) mode for executing instructions that do not include loop operations and the second execution mode may be a coarse-grained array (CGA) mode for executing instructions including loop operations.
In the first execution mode the plurality of processing elements may access the central register file to read data therefrom to execute the instructions or to store data thereto, and in the second execution mode the plurality of processing elements may access their respective local register files to read data therefrom to execute the instructions or to store data thereto.
While in the first execution mode, the controller may analyze a data flow graph representing an execution sequence of the instructions to detect data to be used in the second execution mode.
The controller may analyze the data flow graph to determine priorities for scheduling data based on the number of edges of nodes of the data flow graph.
The controller may analyze the interconnections of each respective local register file with respect to other local register files to determine mapping priorities of the local register files, and the controller may copy data to the local register files based on the respective mapping priorities of each of the local register files.
The controller may analyze the data flow graph to determine priorities for scheduling data based on the number of edges of nodes of the data flow graph, and the controller may copy data to the local register files based on the mapping priorities of each of the local register files and the scheduling priorities of the data.
Other features and aspects may be apparent from the following description, the drawings, and the claims.
Throughout the drawings and the description, unless otherwise described, the same drawing reference numerals should be understood to refer to the same elements, features, and structures. The relative size and depiction of these elements may be exaggerated for clarity, illustration, and convenience.
The following description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. Accordingly, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein may be suggested to those of ordinary skill in the art. Also, descriptions of well-known functions and constructions may be omitted for increased clarity and conciseness.
Referring to
The processing core 101 may be composed of a plurality of processing elements PE#0 through PE#15. In this example, the processing core 101 includes sixteen processing elements. However, the processing core is not limited thereto. For example, the processing core may include 2 processing elements, 8 processing elements, 24 processing elements, and the like. The reconfigurable array may be a processor or may be included in a processor.
Each of the processing elements PE#0 through PE#15 has a local register file (LRF). The LRF stores data that may be used to execute instructions and also stores the execution results of the instructions.
The processing elements PE#0 through PE#15 may process instructions in parallel. For example, each of the processing elements PE#0 through PE#15 may independently process a part of an instruction that is not dependent on other parts of an instruction.
The processing elements PE#0 to PE#15 are connected to each other. For example, the output of a certain processing element PE#6 may be connected to an input of another processing element PE#11. The interconnections between the processing elements PE#0 to PE#15 may generate various combinations of processing elements. An interconnection status between the processing elements PE#0 to PE#15 is referred to as configuration information of the processing core 101. The configuration information of the processing core 101 may be stored in a configuration memory 111. The interconnection status between the processing elements PE#0 to PE#15 may vary based on the configuration information stored in the configuration memory 111. The configuration of the processing core 101 may be optimized for a specific process.
The processing core 101 executes instructions using multiple instruction execution modes. For example, the instruction execution modes may include a Coarse-Grained Array (CGA) mode for executing instructions associated with loop operations, and a Very Long Instruction Word (VLIW) mode for executing instructions associated with other operations that are not loop operations. In the CGA mode, the processing core 101 may execute loop operations using one or more of the processing elements PE#0 to PE#15, for example, using all of the processing elements PE#0 to PE#15. In the VLIW mode, the processing core 101 may execute other operations that are not loop operations using one or more of the processing elements PE#0 to PE#15. For example, in the VLIW mode, a VLIW instruction word may be fetched from a data memory 113 to a VLIW instruction memory 112. The VLIW instruction word includes a plurality of instructions that are to be processed in parallel. For example, four processing elements PE#0 to PE#3 may process instructions in parallel in the VLIW mode.
The central register file 102 stores various data that may be used to execute the instructions and also stores the execution results of the instructions. For example, in the VLIW mode, the processing elements PE#0 to PE#3 may access the central register file 102 to read data therefrom or store the execution results therein. As another example, in the CGA mode, the processing elements PE#0 to PE#15 may access their respective local register files LRF to read data therefrom or to store the execution results therein.
In order for the individual processing elements PE#0 to PE#15 to use their local register files LRF, in the CGA mode, data to be used in the CGA mode should be copied from the central register file 102 to the local register files LRF in advance or as necessary.
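As a non-limiting illustration, the mode-dependent register access described above may be modeled in software as follows. This is only a sketch under stated assumptions: the names `Mode`, `read_operand`, and `copy_to_lrf`, and the use of dictionaries to stand in for the central register file 102 and the local register files LRF, are hypothetical and are not part of the described hardware.

```python
from enum import Enum


class Mode(Enum):
    VLIW = "VLIW"
    CGA = "CGA"


def read_operand(mode, pe_index, name, central_rf, local_rfs):
    """Read `name` from the register file that the given mode uses."""
    if mode is Mode.VLIW:
        # VLIW mode: processing elements access the central register file.
        return central_rf[name]
    # CGA mode: each processing element accesses its own local register file.
    return local_rfs[pe_index][name]


def copy_to_lrf(name, pe_index, central_rf, local_rfs):
    """Copy one value from the central register file to one LRF (hypothetical helper)."""
    local_rfs[pe_index][name] = central_rf[name]


central_rf = {"a": 7}
local_rfs = [{} for _ in range(16)]          # one LRF per PE#0 to PE#15
copy_to_lrf("a", 6, central_rf, local_rfs)   # must happen before PE#6 reads "a" in CGA mode

print(read_operand(Mode.VLIW, 0, "a", central_rf, local_rfs))
print(read_operand(Mode.CGA, 6, "a", central_rf, local_rfs))
```

In this model, a processing element whose local register file was not populated in advance has no value to read in the CGA mode, which is why the copy is performed beforehand or as necessary.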
The controller 103 controls mode-conversion of the processing core 101 and may distribute data to be used in the CGA mode to the local register files LRF, for example, when the processing core 101 is mode-converted from the VLIW mode to the CGA mode.
As an example, the controller 103 may detect data (hereinafter, referred to as live data) to be used in the CGA mode by analyzing a data flow graph (DFG) that identifies an instruction execution sequence. For example, the controller 103 may copy the live data to a specific register file LRF based on the interconnection status between the local register files LRF. The scheduling of live data or mapping of the local register files LRF is further described later.
For distribution of live data, the controller 103 may copy all of the live data to the local register files LRF before conversion into the CGA mode, in response to an execution mode conversion request such as an interrupt. Also, the controller 103 may predict a time at which live data will be used in the CGA mode and the controller 103 may copy the live data to the local register files LRF based on the predicted time of demand such that the data is available at the time the live data is needed.
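The two distribution strategies described above, copying all live data before the conversion and copying based on a predicted time of demand, may be contrasted with the following sketch. The notion of a cycle number, the `lead_time` parameter, and the function names `eager_copy` and `plan_copies` are assumptions made for illustration only; the description does not specify how the time of demand is predicted.

```python
def eager_copy(placement, central_rf, local_rfs):
    """Copy every live value to its assigned LRF before entering the CGA mode."""
    for name, pe_index in placement.items():
        local_rfs[pe_index][name] = central_rf[name]


def plan_copies(predicted_use, lead_time=1):
    """Return sorted (copy_cycle, name) pairs so that each live value is
    copied `lead_time` cycles before its predicted first use."""
    plan = [(max(0, use_cycle - lead_time), name)
            for name, use_cycle in predicted_use.items()]
    return sorted(plan)


central_rf = {"a": 1, "b": 2, "c": 3}
local_rfs = [{} for _ in range(4)]

# Strategy 1: everything is in place before the mode conversion.
eager_copy({"a": 0, "b": 1, "c": 2}, central_rf, local_rfs)

# Strategy 2: copies are deferred until shortly before the predicted use.
predicted_use = {"a": 0, "b": 3, "c": 5}   # hypothetical predicted cycles
print(plan_copies(predicted_use))
```

The eager strategy keeps the conversion simple at the cost of a longer conversion time, whereas the deferred strategy spreads the copies over the CGA execution.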
In addition, the controller 103 may copy live data stored in the local register files LRF to the central register file 102, when the CGA mode is converted to the VLIW mode.
In the reconfigurable array 100, the central register file 102 is separated between the CGA mode and the VLIW mode, and the central register file 102 is not shared therebetween. Also, in the CGA mode, because the individual processing elements PE#0 to PE#15 execute instructions using the local register files LRF, flexible CGA scheduling may be performed regardless of the locations and the number of processing elements (for example, PE#0 to PE#3).
Referring to the example shown in
Referring to
For example, all live data may be copied before conversion from the VLIW mode to the CGA mode. However, the live data may also be copied after the CGA mode is entered.
Referring to
When the CGA mode is converted to the VLIW mode again, the live data in the local register files LRF may be copied to the central register file 102, and the VLIW architecture 201 illustrated in
In
Referring to
The controller 103 may analyze the interconnection relationship between the local register files LRF. For example, the controller 103 may calculate the number of connections of each local register file LRF with respect to the other local register files LRF. The controller 103 may determine position scores of the individual local register files LRF in proportion to the number of the connections of the local register files LRF. For example, the controller 103 may assign a higher position score to a local register file LRF that has a greater number of connections to other local register files LRF.
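The position-score calculation may be sketched as follows. The adjacency-list representation `links` and the function name `position_scores` are hypothetical, and the score is simply taken to be the connection count, which is one possible choice of a score proportional to the number of connections.

```python
def position_scores(links):
    """Score each LRF by the number of distinct other LRFs it is connected to."""
    return {lrf: len(set(neighbors) - {lrf}) for lrf, neighbors in links.items()}


# Hypothetical interconnection of four local register files.
links = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1],
    3: [1],
}

print(position_scores(links))
```

In this example, the local register file with index 1 is connected to three others and therefore receives the highest position score.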
Accordingly, the controller 103 may copy live data that has the highest priority based on the number of edges to the local register file LRF that has the highest position score. The controller 103 may copy live data that has the second-highest priority to the local register file LRF that has the second-highest position score.
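The pairing of live data with local register files may be sketched as two sorted lists that are matched rank by rank. The inputs `edge_counts` and `scores`, the tie-breaking by name or index, and the restriction to one live value per local register file are assumptions for illustration; the description does not state how ties or surplus live data are handled.

```python
def map_live_data(edge_counts, scores):
    """Pair the i-th highest-priority live value with the i-th highest-scoring LRF."""
    # More edges means higher scheduling priority; ties are broken by name (assumption).
    by_priority = sorted(edge_counts, key=lambda name: (-edge_counts[name], name))
    # Higher position score means higher mapping priority; ties by index (assumption).
    by_score = sorted(scores, key=lambda lrf: (-scores[lrf], lrf))
    # zip stops at the shorter list, so surplus live values stay unmapped here.
    return dict(zip(by_priority, by_score))


edge_counts = {"x": 3, "y": 1, "z": 2}   # hypothetical edge counts of live data
scores = {0: 2, 1: 3, 2: 2, 3: 1}        # hypothetical position scores of LRFs

print(map_live_data(edge_counts, scores))
```

Here the value `x`, which has the most edges, is placed in the local register file with index 1, which has the highest position score.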
Referring to
When the mode conversion request is a request for conversion from the VLIW mode to the CGA mode, in 402 the reconfigurable array distributes data to be used in the CGA mode from among data stored in the central register file 102 to the local register files LRF, in response to the mode conversion request. For example, the controller 103 may copy live data to the local register files LRF before conversion from the VLIW mode to the CGA mode. The controller 103 may also copy live data to the local register files LRF based on a predicted time of demand at which the live data will be used in the CGA mode.
For example, the reconfigurable array control method may further include an operation of copying data in the local register files LRF to the central register file 102 and then entering the VLIW mode, when a request for conversion from the CGA mode to the VLIW mode is received.
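The control method in both directions may be modeled with a small state holder, as in the following sketch. The class `ArrayModel`, its attributes, and the `placement` argument are hypothetical, and the sketch assumes that each live value resides in exactly one local register file when it is copied back.

```python
class ArrayModel:
    """Toy model of the mode conversion; not the described hardware itself."""

    def __init__(self, num_pes):
        self.mode = "VLIW"
        self.central_rf = {}
        self.local_rfs = [{} for _ in range(num_pes)]

    def request_conversion(self, target, placement=None):
        if target == self.mode:
            return  # nothing to convert
        if target == "CGA":
            # Distribute the live data from the central register file to the LRFs.
            for name, pe_index in (placement or {}).items():
                self.local_rfs[pe_index][name] = self.central_rf[name]
        elif target == "VLIW":
            # Copy the data in the LRFs back to the central register file.
            for lrf in self.local_rfs:
                self.central_rf.update(lrf)
        else:
            raise ValueError(f"unknown mode: {target}")
        self.mode = target


array = ArrayModel(num_pes=4)
array.central_rf = {"a": 1, "b": 2}
array.request_conversion("CGA", placement={"a": 0, "b": 3})
array.local_rfs[3]["b"] = 20          # a result produced during the CGA mode
array.request_conversion("VLIW")
print(array.mode, array.central_rf)
```

After the round trip, the central register file holds the value updated during the CGA mode, and the VLIW mode is entered only after the copy-back completes.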
Referring to
In 502, the interconnection status of the local register files LRF may be analyzed and mapping priorities of the local register files LRF may be calculated using position scores of the local register files LRF based on the interconnection status. For example, the controller 103 may assign relatively higher mapping priorities to local register files LRF that have more interconnections to other register files LRF.
In 503, live data is copied to the local register files LRF based on the calculated scheduling priorities and mapping priorities. For example, the controller 103 may sequentially copy live data to the local register files LRF according to the scheduling priorities.
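The scheduling priorities used in 503 are based on the number of edges of nodes on the data flow graph. One way to compute them is sketched below. The edge-list representation `dfg_edges`, the function name `scheduling_priorities`, and the decision to count both incoming and outgoing edges are assumptions; the description only states that the priorities are based on the number of edges.

```python
from collections import Counter


def scheduling_priorities(dfg_edges, live_data):
    """Order live data so that nodes touched by more DFG edges come first."""
    counts = Counter()
    for src, dst in dfg_edges:
        counts[src] += 1   # outgoing edge of src
        counts[dst] += 1   # incoming edge of dst
    # Ties are broken alphabetically (assumption).
    return sorted(live_data, key=lambda name: (-counts[name], name))


# Hypothetical data flow graph: live values a, b, c feed three operations.
dfg_edges = [
    ("a", "add"), ("b", "add"),
    ("a", "mul"), ("c", "mul"),
    ("a", "sub"), ("c", "sub"),
]

print(scheduling_priorities(dfg_edges, ["b", "a", "c"]))
```

The resulting order can then be walked sequentially, copying each live value in turn to the local register file with the next-highest mapping priority.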
According to the above-described embodiments, because the central register file 102 is separated between the CGA mode and the VLIW mode and is not shared between them, and in the CGA mode, instructions are executed using local register files LRF, operation efficiency in the CGA mode is enhanced and flexible CGA scheduling may be performed.
The processes, functions, methods and/or software described above may be recorded, stored, or fixed in one or more computer-readable storage media that includes program instructions to be implemented by a computer to cause a processor to execute or perform the program instructions. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. Examples of computer-readable storage media include magnetic media, such as hard disks, floppy disks, and magnetic tape; optical media such as CD ROM disks and DVDs; magneto-optical media, such as optical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Examples of program instructions include machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules in order to perform the operations and methods described above, or vice versa. In addition, a computer-readable storage medium may be distributed among computer systems connected through a network and computer-readable codes or program instructions may be stored and executed in a decentralized manner.
A number of examples have been described above. Nevertheless, it should be understood that various modifications may be made. For example, suitable results may be achieved if the described techniques are performed in a different order and/or if components in a described system, architecture, device, or circuit are combined in a different manner and/or replaced or supplemented by other components or their equivalents. Accordingly, other implementations are within the scope of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2010-0025915 | Mar 2010 | KR | national |