This application is based upon and claims the benefits of priority from the prior Japanese Patent Application No. 2004-186398, filed on Jun. 24, 2004, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
This invention relates to a processor and a semiconductor device, and more particularly to a processor and a semiconductor device that include reconfigurable processing circuits for performing predetermined processing.
2. Description of the Related Art
Conventionally, there has been proposed a processor comprising a CPU (Central Processing Unit) and a reconfigurable composite unit of multiple functional units. This processor analyzes a program described e.g. in the C language and divides the program into portions to be processed by the CPU and portions to be processed by the composite unit of multiple functional units, to thereby execute the program at high speed.
VLIW (Very Long Instruction Word) and superscalar processors incorporate a plurality of functional units and process a single data flow using those units; the operational connections among the functional units of these processors are therefore very tight. In contrast, reconfigurable processors have a group of functional units connected as in a simple pipeline or connected by a dedicated bus with a certain degree of freedom secured therefor, so as to enable a plurality of data flows to be processed. In reconfigurable processors, it is of key importance how configuration data for determining the configuration of the functional unit group is transferred for operations of the functional units.
A condition for switching the configuration of the composite unit of multiple functional units is generated e.g. when the functional units of the composite unit perform a certain computation and the result of the computation matches a predetermined condition. The switching of the configuration of the composite unit of multiple functional units is controlled by the CPU of the processor. The processor has a plurality of banks (caches) for storing configuration data, and achieves instantaneous switching of the configuration of the composite unit by switching between the caches (see e.g. International Publication No. WO01/016711 (Japanese Patent Application No. 2001-520598)).
It should be noted that there has also been proposed a processor which is capable of measuring the performance of modules for executing various processes and that of the processor itself, and of changing the configuration of the modules or the processor based on the results of the measurement, to thereby set a configuration suitable for a program whose execution is instructed by a user (see e.g. Japanese Unexamined Patent Publication (Kokai) No. 2002-163150).
However, in the above-described conventional processor, the caches are controlled by middleware for the CPU (i.e. by a function of the CPU), and there is therefore a problem that the user must specify, in the program in advance, how configuration data is to be stored in the caches.
In a first aspect of the present invention, there is provided a processor that includes reconfigurable processing circuits for performing predetermined processing. This processor is characterized by comprising a cache operation information acquisition section that acquires cache operation information from configuration data that is currently selected, the configuration data defining a configuration of the processing circuits, the cache operation information defining an operation of a cache, and a cache control section that controls the operation of the cache storing the configuration data, based on the cache operation information.
In a second aspect of the present invention, there is provided a semiconductor device that includes reconfigurable processing circuits for performing predetermined processing. This semiconductor device is characterized by comprising a cache operation information acquisition section that acquires cache operation information from configuration data that is currently selected, the configuration data defining a configuration of the processing circuits, the cache operation information defining an operation of a cache, and a cache control section that controls the operation of the cache storing the configuration data, based on the cache operation information.
The above and other features and advantages of the present invention will become apparent from the following description when taken in conjunction with the accompanying drawings which illustrate preferred embodiments of the present invention by way of example.
An object of the present invention is to provide a processor and a semiconductor device in which a compiler is capable of determining storage of configuration data in caches.
Hereafter, the principles of the present invention will be described in detail with reference to
The processor shown in
The configuration data 1 contains circuit configuration information defining the configuration of the reconfigurable processing circuits 2a, 2b, 2c, 2d, . . . , and cache operation information defining the operation of the cache 5.
The cache operation information acquisition section 3 acquires the cache operation information from the configuration data 1 to be executed.
The cache control section 4 controls the operation of the cache 5 storing the configuration data 1, based on the cache operation information acquired by the cache operation information acquisition section 3. Since the storage device 6 stores the configuration data 1, the cache control section 4 controls, for example, whether the configuration data 1 is to be read out from the cache 5 or from the storage device 6. Further, the cache control section 4 controls the cache 5 such that the configuration data 1 read out from the storage device 6 is stored in the cache 5.
As described above, according to the present invention, the configuration data is configured to contain the cache operation information, and the operation of the cache is controlled based on the cache operation information contained in the configuration data. With this configuration, a compiler is capable of causing the cache operation information to be contained in the configuration data, based on a prediction on the operation of the program, and determining storage of the configuration data in the cache.
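The principle described above, in which the cache control section decides whether configuration data is served from the cache or from the storage device, can be sketched minimally in Python as follows (all class and method names are assumptions introduced for illustration, not terms of the specification):

```python
class CacheControl:
    """Minimal model of the cache control section 4: prefer the cache 5,
    fall back to the storage device 6, and cache what was read."""

    def __init__(self, storage):
        self.storage = storage  # models the storage device 6 (state number -> data)
        self.cache = {}         # models the cache 5

    def fetch(self, state_number):
        """Return (configuration data, source it was read from)."""
        if state_number in self.cache:
            return self.cache[state_number], "cache"
        data = self.storage[state_number]
        self.cache[state_number] = data  # store read-out data in the cache
        return data, "storage"
```

On the first access the data comes from the storage device and is cached; subsequent accesses to the same state are served from the cache.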
Next, a preferred embodiment of the present invention will be described in detail with reference to drawings.
As shown in
As shown in the memory map 50, the program is divided into areas for commands and data to be executed by the CPU 40, and an area for configuration data, i.e. data of configurations to be executed by the sequence section 20 on the processing circuit group 30. The CPU 40 executes a program formed by the commands and data shown in the memory map 50, and the sequence section 20 configures the processing circuits of the processing circuit group 30 in a predetermined manner based on the configuration data shown in the memory map 50, for execution of the program.
The processing circuit group 30 will now be described in detail.
As shown in
The sequence section 20 outputs configuration data defining the configuration of the processing circuit group 30 to the processing circuit group 30, in a predetermined sequence. The processing circuits of the processing circuit group 30 change their operations and connections based on the configuration data output from the sequence section 20, to thereby change the configuration thereof and fix the same.
For example, the functional units 31a, 31b . . . , the counters 32a, 32b . . . , the external interface 33, and the connection switch 34 of the processing circuit group 30 change their operations based on the configuration data. Further, the connection switch 34 changes connections between the functional units 31a, 31b . . . , the counters 32a, 32b . . . , and the external interface 33 based on the configuration data.
The processing circuit group 30 executes computations of a program, and then outputs a switching condition signal to the sequence section 20 when the result of the computations matches a predetermined condition. Let it be assumed that the processing circuit group 30 repeatedly performs a computation N times on data input via the external interface 33. The functional units 31a, 31b . . . repeatedly calculate the input data, and the counter 32a counts up the number of times of the operation. When the count of the counter 32a reaches N, the counter 32a outputs the switching condition signal to the sequence section 20.
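The counting behavior described above may be modeled, purely as an illustrative sketch, by the following Python fragment (the class name is an assumption):

```python
class SwitchCounter:
    """Models the counter 32a: counts computation passes and indicates the
    switching condition when the count reaches N."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def step(self):
        """Record one computation pass; True means the switching condition
        signal would be output to the sequence section 20."""
        self.count += 1
        return self.count == self.n
```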
When receiving the switching condition signal, the sequence section 20 outputs configuration data to be executed next to the processing circuit group 30, and the processing circuit group 30 reconfigures the processing circuits based on the configuration data. Thus, the processing circuits for executing a user program are configured in the processing circuit group 30 for high-speed execution of the program.
Next, the sequence section 20 will be described in detail.
As shown in
The next state-determining section 21 stores numbers (state numbers) indicative of configuration data (including a plurality of candidates) to be executed next. These state numbers are contained in configuration data, and the state number of configuration data to be executed next can be known by referring to configuration data currently being executed. Further, the next state-determining section 21 receives the switching condition signal from the processing circuit group 30 appearing in
The operation-determining section 22 stores an operation mode of configuration data currently being executed. The operation-determining section 22 controls operations of the cache section 25 according to the operation mode. The operation mode includes e.g. a simple cache mode in which configuration data previously cached in the cache section 25 is used, and a look-ahead mode in which configuration data of a next state number to be executed next is pre-read and stored in the cache section 25.
For example, in the simple cache mode, when the state number of configuration data to be executed is determined in response to the switching condition signal, the operation-determining section 22 determines whether the configuration data associated with the state number is stored in the cache section 25 (i.e. whether a cache hit occurs). If a cache hit occurs, the operation-determining section 22 controls the cache section 25 such that the configuration data is output from the cache section 25, whereas if no cache hit occurs, the operation-determining section 22 controls the address-generating section 23 such that the configuration data is output from the RAM 24. The configuration data output from the RAM 24 is delivered to the processing circuit group 30 via the cache section 25.
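The simple cache mode decision described above reduces to a hit/miss check, sketched below in Python (the function and variable names are illustrative assumptions):

```python
def simple_cache_fetch(state_number, cache, ram):
    """Simple cache mode: on a hit, output from the cache section;
    on a miss, read the RAM (delivered via the cache section).
    Returns (configuration data, hit_flag)."""
    if state_number in cache:        # cache hit
        return cache[state_number], True
    return ram[state_number], False  # no hit: read from the RAM 24
```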
In the look-ahead mode, the operation-determining section 22 reads out a next state number stored in the next state-determining section 21, and determines whether a cache hit occurs as to configuration data associated with the next state number. If no cache hit occurs, the operation-determining section 22 reads out the configuration data from the RAM 24, and stores the same in the cache section 25 in advance, whereas if a cache hit occurs, the operation-determining section 22 controls the cache section 25 such that the configuration data is output therefrom. In the look-ahead mode, when processing of a program based on configuration data currently being executed takes a long time, candidate configuration data to be executed next is stored in the cache section 25 in advance during execution of the current program processing to thereby speed up program processing.
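The look-ahead operation described above, in which candidate configuration data is pre-read while the current processing is still running, may be sketched as follows (an illustrative assumption, not the actual hardware behavior):

```python
def prefetch_next(next_state, cache, ram):
    """Look-ahead mode: if the configuration data of the next state number is
    not cached, pre-read it from RAM into the cache so that the later switch
    does not have to wait for a RAM access."""
    if next_state not in cache:          # no cache hit for the next state
        cache[next_state] = ram[next_state]
```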
The address-generating section 23 receives a state number output from the operation-determining section 22 and a ready signal output from the cache section 25. The address-generating section 23 outputs the address of configuration data associated with the state number to the RAM 24 in response to the ready signal from the cache section 25.
The RAM 24 stores configuration data defining the configuration of the processing circuit group 30 in
The cache section 25 stores configuration data output from the RAM 24, under the control of the operation-determining section 22. Further, when the operation-determining section 22 determines that a cache hit occurs, the cache section 25 outputs cached configuration data associated with the cache hit to the processing circuit group 30. When a cache becomes free, the cache section 25 delivers to the address-generating section 23 a ready signal indicating that configuration data output from the RAM 24 can be written therein.
Next, the simple cache mode and the look-ahead mode will be described in detail. First, a description will be given of the simple cache mode.
In
The tag section 22a of the operation-determining section 22 stores state numbers associated with configuration data stored in the caches 25aa to 25ac of the cache section 25. When configuration data output from the RAM 24 is stored in one of the caches 25aa to 25ac, the state number of the configuration data is stored in the tag section 22a.
The judgment section 22b compares a state number associated with configuration data to be executed, which is determined in response to a switching condition signal, with each of the state numbers stored in the tag section 22a. When the state numbers match (i.e. when a cache hit occurs), the judgment section 22b controls the selector 25c such that the configuration data stored in one of the caches 25aa to 25ac in association with the state number is output. When the state numbers do not match, the judgment section 22b controls the address-generating section 23 to generate the address of the configuration data associated with the state number, and controls the selector 25c such that the configuration data is output from the RAM 24. More specifically, the judgment section 22b determines whether or not a cache hit occurs as to the configuration data to be executed; when a cache hit occurs, the selector 25c is controlled such that the configuration data is output from the one of the caches 25aa to 25ac storing the data, whereas when no cache hit occurs, the selector 25c is controlled such that the configuration data is output from the RAM 24.
Each of the caches 25aa to 25ac of the cache section 25 is a register that has the same bit width as that of configuration data and is implemented by flip-flops. For example, the caches 25aa to 25ac are formed by n (bit width of configuration data)×3 (number of caches) flip-flops.
The output section 25b delivers configuration data output from the RAM 24 to one of the caches 25aa to 25ac and the selector 25c.
The simple cache mode is further divided into two modes. In one of the two modes, when a cache hit does not occur, configuration data output from the RAM 24 is stored in one of the caches 25aa to 25ac. In the other mode, when no cache hit occurs, configuration data output from the RAM 24 is not stored in any of the caches 25aa to 25ac.
In the one mode, the output section 25b stores configuration data output from the RAM 24 in one of the caches 25aa to 25ac, and outputs the same to the selector 25c. In the other mode, the output section 25b outputs the configuration data output from the RAM 24 to the selector 25c, without storing the same in any one of the caches 25aa to 25ac. By thus dividing the simple cache mode into two, it is possible to prevent rewriting of data in the caches 25aa to 25ac from being performed frequently, when no cache hit occurs.
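The difference between the two sub-modes of the simple cache mode can be expressed by a single flag, as in the following sketch (names are assumptions for illustration):

```python
def fetch_with_mode(state_number, cache, ram, store_on_miss):
    """On a miss, the first sub-mode stores the RAM data in a cache,
    while the second sub-mode bypasses the caches, leaving them intact."""
    if state_number in cache:
        return cache[state_number]
    data = ram[state_number]
    if store_on_miss:                 # first sub-mode
        cache[state_number] = data
    return data                       # second sub-mode: no cache update
```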
It should be noted that new configuration data is stored in one of the caches 25aa to 25ac which stores the oldest configuration data or configuration data with a low cache hit rate.
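The replacement rule stated above, replacing the oldest configuration data or configuration data with a low cache hit rate, might be sketched as follows (the tie-breaking order is an assumption; the specification states only the two criteria):

```python
def choose_victim(entries):
    """Pick the cache slot to overwrite with new configuration data.
    `entries` maps a slot name to (age, hit_count); the slot with the
    lowest hit count is chosen, ties going to the oldest entry."""
    return min(entries, key=lambda slot: (entries[slot][1], -entries[slot][0]))
```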
The selector 25c selectively outputs configuration data from the caches 25aa to 25ac or configuration data output from the RAM 24 via the output section 25b, under the control of the judgment section 22b. The caches 25aa to 25ac are registers, as described hereinabove, and therefore constantly output their configuration data to the selector 25c; the selector 25c selects one of these constantly output items of configuration data or the configuration data output from the output section 25b. Since the selector 25c outputs configuration data without designating the address of a cache, configuration data can be delivered at high speed.
In
The judgment section 22b of the operation-determining section 22 compares the state numbers stored in the tag section 22a with the state number determined by the next state-determining section 21. If one of the stored state numbers matches the determined state number (i.e. if a cache hit occurs), the selector 25c is controlled to output the configuration data of the matching state number from the one of the caches 25aa to 25ac storing the data. If none of the state numbers stored in the tag section 22a matches the determined state number, the address-generating section 23 is controlled to output the address of the configuration data of the determined state number.
The RAM 24 delivers the configuration data associated with the address output from the address-generating section 23 to the output section 25b of the cache section 25. When the current simple cache mode is the aforementioned one mode, the output section 25b delivers the configuration data to both of one of the caches 25aa to 25ac and the selector 25c, whereas when the current simple cache mode is the other mode, the output section 25b delivers the configuration data to the selector 25c alone. The selector 25c delivers the configuration data output from the output section 25b to the processing circuit group 30 shown in
Next, a description will be given of the look-ahead mode.
In performing cache operation in the look-ahead mode, the operation-determining section 22 is configured to have functional blocks shown in
When the operation mode of configuration data currently being executed is the look-ahead mode, the operation mode-setting section 22c outputs a prefetch request signal to the next state-determining section 21 so as to request the next state-determining section 21 to deliver a next state number stored therein, for next processing, to the judgment section 22b. Further, the operation mode-setting section 22c instructs the judgment section 22b to perform a prefetch operation. Then, when the look-ahead operation is completed, the operation mode-setting section 22c outputs a next state output completion signal to the judgment section 22b.
The judgment section 22b compares the state numbers stored in the tag section 22a with a next state number for look-ahead, to thereby determine whether configuration data for look-ahead is stored in any of the caches 25aa to 25ac. If one of the state numbers stored in the tag section 22a matches the next state number for look-ahead, it can be judged that the configuration data for look-ahead is already stored in the corresponding one of the caches 25aa to 25ac, and therefore the operation mode-setting section 22c does nothing.
If no state number stored in the tag section 22a matches the next state number for look-ahead, it can be judged that the configuration data for look-ahead is not stored in any of the caches 25aa to 25ac. Therefore, the operation mode-setting section 22c acquires a free cache number, and outputs the cache number acquired by the prefetch operation to the output section 25b. The judgment section 22b outputs the next state number to the address-generating section 23, and the RAM 24 outputs configuration data associated with the next state number to the output section 25b. The output section 25b stores the configuration data received from the RAM 24 in one of the caches 25aa to 25ac associated with the cache number received from the operation mode-setting section 22c. The judgment section 22b stores the next state number associated with the pre-read configuration data in the tag section 22a.
It should be noted that when configuration data for look-ahead can be stored in one of the caches 25aa to 25ac, the output section 25b outputs the ready signal to the address-generating section 23, and in response to the ready signal, the address-generating section 23 outputs an address associated with a state number of configuration data to be prefetched, to the RAM 24.
When the next state number associated with configuration data to be executed next is determined in response to the switching condition signal, the judgment section 22b determines whether the configuration data associated with the next state number is stored in any of the caches 25aa to 25ac. If the configuration data is stored in one of the caches 25aa to 25ac, a cache number is output to the selector 25c. The selector 25c delivers the configuration data output from one of the caches 25aa to 25ac associated with the cache number to the processing circuit group 30.
In
The judgment section 22b compares the state numbers stored in the tag section 22a with the next state number for look-ahead to determine whether configuration data associated with the next state number for look-ahead is stored in any of the caches 25aa to 25ac. The judgment section 22b outputs the result of the determination to the operation mode-setting section 22c.
When no cache hit occurs, the operation mode-setting section 22c operates such that the configuration data as to which no cache hit occurs is pre-read into one of the caches 25aa to 25ac. The operation for caching configuration data in the look-ahead mode is thus executed.
Next, a description will be given of configuration data and the operation modes.
The program shown in
As shown in the flowchart shown in
As shown in
The mode bit area stores information indicative of an operation mode. For example, each operation mode is represented by two bits as shown in
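Since the specification states only that each operation mode is represented by two bits, one possible bit assignment is sketched below purely for illustration (the encodings themselves are assumptions):

```python
# Assumed two-bit encodings of the operation modes (illustrative only).
MODE_BITS = {
    0b00: "simple cache (store on miss)",
    0b01: "simple cache (no store on miss)",
    0b10: "look-ahead",
}

def decode_mode(bits):
    """Map the two-bit mode field of configuration data to an operation mode."""
    return MODE_BITS.get(bits & 0b11, "reserved")
```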
The circuit configuration information area stores information defining the configuration of the processing circuits of the processing circuit group 30 shown in FIG. 3. In other words, the circuit configuration of the processing circuit group 30 is determined by the circuit configuration information of the configuration data 61.
When the configuration data 61 is executed, the next state number area stores a next state number associated with configuration data to be executed next. For example, from the flow of processing shown in
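The areas of the configuration data 61 described above can be modeled as a simple record; the next state number area may hold a plurality of candidates, one of which is selected when the switching condition is determined (field names are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class ConfigData:
    mode_bits: int         # mode bit area: operation mode of the cache
    circuit_config: bytes  # circuit configuration information area
    next_states: tuple     # next state number area (candidate state numbers)

def next_state(cfg, branch):
    """Select the next state number once the switching condition determines
    which candidate branch is taken."""
    return cfg.next_states[branch]
```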
From the flow of processing shown in
Then, the computation 1 or 2 is carried out in response to the switching condition signal. In this case, since the configuration data associated with the computations 1 and 2 has been pre-read into associated ones of the caches 25aa to 25ac, the processing circuit group 30 can be configured at high speed whichever of the computations 1 and 2 is executed, without accessing the RAM 24, irrespective of the result of the condition 2.
As described above, configuration data is configured to store information on an operation mode of a cache, and cache operation is controlled according to the operation mode. This enables a compiler to determine storage of the configuration data into a cache within a range of prediction on operations of a program that can be analyzed by the compiler.
More specifically, the compiler is capable of grasping, through analysis of the program, what process is to be executed, and hence is capable of automatically performing cache judgment on a predetermined process repeatedly carried out e.g. by a loop description, to thereby add an operation mode thereto. Therefore, the user can obtain optimal performance without consciously designating the operation mode.
A portion which is not subjected to cache judgment by the compiler can be controlled by the user. This is achieved e.g. by operating the mode bit of compiled configuration data 61.
It should be noted that cache operation can be forcibly locked and unlocked by control of the CPU 40. Further, continuous execution of cache operations can be stopped by control of the CPU 40. It is also possible to lock and unlock configuration data stored in all or only a part of the caches 25aa to 25ac. Furthermore, configuration data can be forcibly stored in the caches 25aa to 25ac.
For example, a control area for the above-mentioned settings by the CPU 40 is provided in a part of the configuration data area of the memory map 50 shown in
According to the processor of the present invention, configuration data is configured to contain cache operation information, and cache operation is controlled based on the cache operation information contained in configuration data. This enables the compiler to store cache operation information in configuration data based on a prediction on operations of a program, and determine storage of the configuration data in a cache.
The foregoing is considered as illustrative only of the principles of the present invention. Further, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and applications shown and described, and accordingly, all suitable modifications and equivalents may be regarded as falling within the scope of the invention in the appended claims and their equivalents.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2004-186398 | Jun 2004 | JP | national |