The present application claims priority from Japanese application JP 2004-379598 filed on Dec. 28, 2004, the content of which is hereby incorporated by reference into this application.
The present invention relates to a data processor such as a microprocessor, and more particularly to a system for controlling and managing, by software, an associative memory that carries out an associative operation, for example, a cache memory or a TLB (Translation Look-aside Buffer).
Conventionally, a processor system mounts a cache memory as means for enhancing memory access performance: a part of the instructions or data in a main memory is copied onto a small-capacity, high-speed memory and operated on there. Since the cache memory has a smaller capacity than the main memory, it cannot hold all of the data in the main memory. However, a transfer to and from the main memory is automatically carried out by hardware when necessary. Therefore, an ordinary program can operate without being conscious of the presence of the cache memory.
The cache memory transfers data to and from the main memory in a unit, referred to as a line, which is larger than the data unit handled by the data processor. In a typical cache method, a line takes one of three states referred to as “invalid”, “clean” and “dirty”. The “invalid” state indicates that no data of the main memory are allocated to the cache line, the “clean” state indicates that data are allocated to the cache line and coincide with the data of the main memory, and the “dirty” state indicates that the data allocated to the cache line have been rewritten by the processor while the old data remain in the main memory.
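The three line states above can be modeled as a small state machine. The following sketch (the class and method names are illustrative, not from the specification) shows how a fill and a processor write move a line between the states:

```python
from enum import Enum

class LineState(Enum):
    INVALID = 0   # no main-memory data allocated to the line
    CLEAN = 1     # line data coincide with main memory
    DIRTY = 2     # line rewritten by the processor; main memory holds old data

class CacheLine:
    def __init__(self):
        self.state = LineState.INVALID
        self.data = None

    def fill(self, data):
        # allocating main-memory data to the line makes it clean
        self.data = data
        self.state = LineState.CLEAN

    def write(self, data):
        # a processor write to the line makes it dirty
        self.data = data
        self.state = LineState.DIRTY
```

A line thus starts invalid, becomes clean when data are allocated to it, and becomes dirty once the processor rewrites it.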
Although an ordinary program need not be conscious of the presence of the cache memory as described above, when an external device accesses the main memory directly without passing through the cache memory, software must invalidate the contents of the cache memory and forcibly write contents written to the cache memory back into the main memory.
This is referred to as cache coherency control. In order to carry out the cache coherency control, means for operating the cache memory is generally provided by the processor.
More specifically, the cache coherency control can be defined as a plurality of operations referred to as “purge”, “invalidate” and “write-back”. The “purge” transitions a line in the dirty or clean state to the invalid state, writing the data on the line back into the main memory if the original state is dirty; the “invalidate” transitions the line to the invalid state in the same manner as the “purge” but performs no write-back even if the original state is dirty; and the “write-back” transitions a line from “dirty” to “clean” and performs the write-back.
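The three coherency operations can be written down as pure state-transition functions; the sketch below (function names are illustrative) follows the definitions above, with a callback standing in for the hardware write-back of the line data:

```python
from enum import Enum

class S(Enum):
    INVALID = 0
    CLEAN = 1
    DIRTY = 2

def purge(state, write_back):
    # dirty or clean -> invalid; the line is written back first if dirty
    if state is S.DIRTY:
        write_back()
    return S.INVALID

def invalidate(state, write_back):
    # dirty or clean -> invalid; no write-back even if the line is dirty
    return S.INVALID

def write_back_op(state, write_back):
    # dirty -> clean with a write-back; a clean line is left unchanged
    if state is S.DIRTY:
        write_back()
        return S.CLEAN
    return state
```

The difference between purge and invalidate is thus only whether a dirty line's data reach the main memory before the line is invalidated.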
In the cache coherency operation, a specific line is designated by software, and a plurality of line designating methods is provided. One method designates a line directly; another makes a hit decision (associative operation) in the cache memory and designates the line as the operating object when a hit is obtained. The former method will be referred to as “non-associative” and the latter as “associative”. In other words, six combinations of associative/non-associative × purge/invalidate/write-back can be proposed as the coherency operations described above. Between non-associative and associative, processing efficiency depends on the size (the number of lines) of the region to be operated on. Software therefore uses them selectively, for example, “non-associative” if the region is large and “associative” if the region is small.
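The six combinations and the region-size heuristic can be enumerated as follows; note that the crossover point in `pick_mode` is an assumption for illustration, since the text only says “large” and “small”:

```python
def coherency_variations():
    # the six combinations named in the text:
    # (associative | non-associative) x (purge | invalidate | write-back)
    modes = ("associative", "non-associative")
    ops = ("purge", "invalidate", "write-back")
    return [(m, o) for m in modes for o in ops]

def pick_mode(region_lines, cache_lines):
    # heuristic from the text: sweeping lines non-associatively pays off
    # for a large region, per-line associative probes for a small one
    # (the exact threshold here is an assumption, not from the text)
    return "non-associative" if region_lines >= cache_lines else "associative"
```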
The method by which software designates the coherency control varies from processor to processor, and includes a method of carrying out the designation through an instruction and a method of writing specific data to a special address. In the former method, an instruction code is allocated in one-to-one correspondence with every operation type. In the latter method, a data transfer instruction is utilized to designate the contents of an operation by a combination of an address and data. The latter method has been described in Patent Document 1.
While the description above has been directed to the coherency operation for the cache memory, a page attribute operation for a TLB, which also uses an associative memory, is similar to the cache coherency control operation. The page attribute operation is an operation for changing the address translation map held by the TLB.
[Patent Document 1] JP-A-8-320829 Publication
As described above, the operations of the cache memory and the TLB have a plurality of variations. First, the method by which software designates an operation will be examined. In a method of giving a one-to-one instruction code for each operation type, instruction codes are consumed corresponding to the number of variations. It is hard to apply this method when the instruction code space is limited, as in an architecture with an 8-bit or 16-bit fixed-length instruction code. On the other hand, although a method of designating the contents of an operation by a combination of an address and data utilizing a data transfer instruction consumes no new instruction code, it cannot distinguish whether the processing is a normal data transfer or a cache operation in the instruction decoding stage carried out early in the processor pipeline. Whether the processing is a cache operation or not cannot be determined until execution of the instruction reaches the memory access stage of the pipeline. The normal data transfer is a high-priority processing which greatly influences the performance of the processor. For this reason, the data transfer is carried out preferentially without deciding whether the contents are a cache operation or not. As a result, the cache memory carries out a useless associative operation, so that power consumption is increased. Moreover, in a method of determining the contents of the cache operation by discriminating data which are settled only in a late stage of the pipeline, there is a problem in that the processing performance of the cache operation is deteriorated.
It is an object of the invention to suppress the consumption of instruction codes, wasteful power consumption and deterioration in processing performance in an operation on a specific logical block, such as a cache coherency operation or a TLB page attribute operation.
The above and other objects and novel features of the invention will be apparent from the description of the specification and the accompanying drawings.
A brief description will be given of the summary of the typical inventions disclosed in the application.
[1] A data processor has a central processing unit and a plurality of logical blocks connected to the central processing unit. The central processing unit sets a predetermined logical block as a control object based on a result of decoding a predetermined instruction code, and a function of the predetermined logical block is selected based on the result of decoding the predetermined instruction code and a part of address information which is incidental to the predetermined instruction code.
As described above, it is not necessary to allocate instruction codes in one-to-one correspondence to the operations of the predetermined logical block, and the number of allocated instruction codes can be kept small. In particular, since both the result of decoding the instruction code and the address information incidental to the instruction code are used for selecting the function of the logical block, as few as two instruction codes can be allocated to the operations of the predetermined logical block. Furthermore, it is possible to decide the operating object at an early stage, before reaching the memory access stage of the pipeline, to suppress the operating power of a uselessly activated logical block, and to prevent the number of cycles required for the operation from increasing.
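The division of labor described above, in which the instruction code names the block and the operation while a part of the incidental address information refines the mode, can be sketched as follows. The opcode values are hypothetical (the specification defines no concrete encodings here); the H'F4 high-order address pattern follows the CBP example described later:

```python
def select_function(opcode, address):
    # Hypothetical opcode values, for illustration only: the opcode alone
    # names the target block and the operation, so this decision is already
    # available at the instruction decoding stage of the pipeline.
    OPS = {
        0x01: ("cache", "purge"),
        0x02: ("cache", "write-back"),
        0x03: ("cache", "invalidate"),
    }
    block, op = OPS[opcode]
    # a part of the incidental address information then selects the mode:
    # high-order bits H'F4 designate the non-associative mode
    mode = "non-associative" if (address >> 24) & 0xFF == 0xF4 else "associative"
    return block, op, mode
```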
As a typical configuration of the invention, the predetermined logical block is a cache memory, and the functions to be selected are an associative mode using an associative retrieval for the cache coherency control or a non-associative mode which does not use the associative retrieval, as well as the contents of the cache coherency control. The contents of the cache coherency control are, for example, purge, write-back and invalidate.
As another typical configuration of the invention, the predetermined logical block is a TLB, and the functions to be selected are an associative mode using an associative retrieval in a page attribute operation control of the TLB or a non-associative mode which does not use the associative retrieval, as well as the contents of the page attribute operation control. The contents of the page attribute operation control are, for example, making dirty, making clean and invalidating.
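The TLB page attribute operations mirror the cache case; a minimal sketch, modeling a TLB entry as a pair of valid and dirty flags (the operation names follow the text, the entry representation is an assumption):

```python
def page_attr_op(entry, op):
    # entry is a minimal TLB-entry model: {"valid": bool, "dirty": bool}
    if op == "make-dirty":
        entry["dirty"] = True
    elif op == "make-clean":
        entry["dirty"] = False
    elif op == "invalidate":
        entry["valid"] = False
    else:
        raise ValueError(op)
    return entry
```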
[2] A data processor has a central processing unit and a plurality of logical blocks connected to the central processing unit. The central processing unit sets a predetermined logical block as a control object based on a result of decoding a predetermined instruction code, and a function of the predetermined logical block is selected based on a part of address information which is incidental to the predetermined instruction code. In particular, since only the incidental address information is used for selecting the function of the logical block, it suffices to allocate at least one instruction code to the operation of the predetermined logical block. In this respect, the number of instruction codes allocated to the operation of the predetermined logical block can be minimized. In the same manner as described above, furthermore, it is possible to decide the operating object at an early stage before reaching the memory access stage of the pipeline, to suppress the operating power of a useless logical block, and to prevent the number of cycles required for the operation from increasing.
As a typical configuration of the invention, the predetermined logical block is a cache memory, and the functions to be selected are an associative mode using an associative retrieval for the cache coherency control or a non-associative mode which does not use the associative retrieval, and the contents of the cache coherency control. The contents of the cache coherency control are, for example, purge, write-back and invalidate.
As another typical configuration of the invention, the predetermined logical block is a TLB, and the functions to be selected are an associative mode using an associative retrieval in a page attribute operation control of the TLB or a non-associative mode which does not use the associative retrieval, and the contents of the page attribute operation control. The contents of the page attribute operation control are, for example, making dirty, making clean and invalidating.
[3] A data processor according to yet another aspect of the invention has a logical block which is activated by using a predetermined instruction code, and a function of the activated logical block is selected by using the instruction code and a part of an address which is incidental to the instruction code.
A data processor according to a further aspect of the invention has a logical block which is activated by using a predetermined instruction code, and a function of the activated logical block is selected by using a part of an address which is incidental to the instruction code.
Next, a description will be given of a first example of a cache operating method which can be applied to the data processor 1101.
As an example, a description will be given of the operation in the case in which a “CBP@Rn” instruction is executed. First, the instruction code (OPCODE) 105 executed in the ID stage is identified by the instruction decoder (OPDEC) 106, and the coherency control portion (COHERENT CTRL) 108 is notified of an operation (OP) 107 indicating that the contents of the processing are the purge. Next, the address decoder (ADRDEC) 109 decodes whether bits 31 to 24 of the address designated by Rn, determined in the EX stage, are H′F4; it is thereby decided whether the associative mode or the non-associative mode is set, and a result of the decision (ASC) 110 is output to the selector 117. In the case of the non-associative mode, the statuses (dirty/clean) corresponding to the four ways are read from the status array 102 in order to know the state of the line indexed by bits 12 to 5 of the address. The way in the non-associative mode is designated by way designating information (WAY-NA) 111 corresponding to bits 14 to 13 of the address and is selected by the selector 117, and a further selection is carried out by the selector 118 in response to the output thereof. Consequently, the coherency control portion 108 is notified of the way (WAY) 112 to be the operating object and the status (STAT) 113 of the object way. The coherency control portion 108 decides the contents of the cache operation from the information of the OP 107, the WAY 112 and the STAT 113, updates the status of the object line, and writes data back if necessary.
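The address fields used in this example can be extracted as follows; the bit positions are those described above for the CBP@Rn case, while the function name and return shape are illustrative:

```python
def decode_cbp_address(addr):
    # Bit fields as described for the CBP@Rn example:
    hi = (addr >> 24) & 0xFF     # bits 31..24: H'F4 selects the non-associative mode
    index = (addr >> 5) & 0xFF   # bits 12..5: line index into the status/tag arrays
    way = (addr >> 13) & 0x3     # bits 14..13: way designation (WAY-NA)
    return {"non_associative": hi == 0xF4, "index": index, "way": way}
```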
In the case in which bits 31 to 24 of the address are not H′F4, the operation is carried out as an associative purge, and the address is first translated into a physical address by the TLB 1105. A tag and a valid bit are read from the tag and valid bit array 101 in accordance with the index designated by bits 12 to 5 of the address, and a comparison with the physical address PADR is carried out by the hit decision logic (CMP) 115. Furthermore, the statuses corresponding to the four ways are read from the status array (STA) 102, and the coherency control portion 108 is notified of the hit way (WAY-A) 116 and the hit way status. The coherency control portion 108 carries out the operation on the object line based on the OP 107, the WAY 112 and the STAT 113, which are obtained in the same manner as in the non-associative mode.
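The hit decision in the associative mode can be sketched as a comparison of the translated physical-address tag against the four ways at the designated index (the function name and the list-of-lists representation of the arrays are illustrative):

```python
def hit_way(tags, valids, index, padr_tag):
    # compare the physical-address tag with each of the four ways at `index`;
    # a way hits only when its valid bit is set and its tag matches
    for way in range(4):
        if valids[index][way] and tags[index][way] == padr_tag:
            return way
    return None   # miss: no way becomes the operating object
```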
The CBWB and CBI instructions are executed in the same procedure, differing only in that the contents of the operation of the coherency control portion 108 are the write-back and the invalidate, respectively, based on the result of decoding the instruction in the OPDEC 106.
Although a second example shown in
Next, a description will be given of an example of a page attribute operating method of a TLB which can be applied to the data processor 1101.
With reference to
Referring to the page attribute operation of the TLB, similarly, many TLB operations can be carried out by addressing while assigning a plurality of TLB operations to a small number of instruction codes, thereby reducing the consumption of the instruction space. Accordingly, as compared with the case in which the TLB operation is carried out by using a data transfer instruction, a lower-power operation can be implemented. Moreover, since the store data are not used, the TLB operation can be started at an early stage of the pipeline, which contributes to an enhancement in processing performance.
According to various embodiments described above, it is possible to obtain the following functions and advantages.
[1] It is possible to reduce the number of instruction codes required for the operations of the cache memory 1104 and the TLB 1105, to utilize the instruction code space effectively, and to enhance instruction code efficiency in a data processor whose instruction set consists of fixed-length instructions in which the number of bits of a basic instruction is small, for example, 8 bits or 16 bits.
[2] As compared with a method of designating the operations of the cache memory 1104 and the TLB 1105 by a combination of a transfer instruction, a special address and data, whether the contents of a processing are a normal data transfer or a cache or TLB operation can be determined at an earlier stage. Consequently, an unnecessary logical operation can be stopped, contributing to a reduction in power.
[3] As compared with a conventional technique of determining the contents of the operations of the cache memory 1104 and the TLB 1105 by using store data designated by a transfer instruction, the operation processings of the cache memory and the TLB can be started at an earlier stage. Consequently, an enhancement in processing performance can be expected.
While the invention made by the inventor has been specifically described above based on the embodiments, the invention is obviously not restricted thereto, and various changes can be made without departing from the scope of the invention.
For example, the cache memory is not restricted to a set associative configuration but may have a direct-mapped or fully associative configuration. The data processor may have such a structure as to include only one of the cache memory and the TLB. The object of the invention is not restricted to the cache memory and the TLB but may be another logical block which is activated by using a predetermined instruction code. The invention can be widely applied on the condition that the function of the activated logical block is selected by using an instruction code together with a part of an address incidental to the instruction code, or by using a part of the address alone.
Number | Date | Country | Kind
---|---|---|---
2004-379598 | Dec 2004 | JP | national