This invention relates to processing systems and more particularly to processing systems that use address translation.
Processing systems commonly use a virtual addressing scheme in order to provide protection and flexibility in the use of main memory. A memory management unit (MMU) controls the translation from the virtual address to the physical (also called real) address used to access main memory (also called system memory). The particular way in which the virtual address is converted to a physical address varies with the application. One common approach is to have a page table entry (PTE) for each translation, so that for any given virtual address there is a corresponding PTE. Some PTEs are held in a cache portion of the MMU for quick identification of the PTE that goes with a particular virtual address. If the PTE is not present in the MMU cache, the PTE is identified through a tablewalk operation. This is achieved by obtaining from main memory a page table entry group (PTEG), which is a group, commonly of 8 or 16, of PTEs. The PTEGs may be in a data cache, but that is not typically the case. The address of the PTEG is identified by an operation on the virtual address called “hashing.” Thus, the virtual address is hashed and the result is used to obtain the physical address of the PTEG. Each PTE in the PTEG is tested against the virtual address to determine if the PTE for that address is present. If there is no match to any of the PTEs in the PTEG, either an exception is initiated or a secondary PTEG is obtained from main memory and its PTEs are compared to the virtual address.
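The hashed lookup described above can be sketched as follows. This is a minimal illustration, not the claimed hardware: the table size, PTEG size, hash function, and dictionary-based PTE layout are all assumptions made for the example.

```python
PAGE_SHIFT = 12          # assume 4 KiB pages
PTEG_SIZE = 8            # a PTEG commonly holds 8 or 16 PTEs
NUM_PTEGS = 1024         # assumed number of groups in the page table

def hash_va(vpn):
    """Hash the virtual page number to select a primary PTEG index (assumed hash)."""
    return vpn % NUM_PTEGS

def lookup(page_table, va):
    """Test each PTE in the hashed-to PTEG against the virtual address.

    Returns the translated physical address, or None when no PTE matches,
    which would trigger a secondary-PTEG search or an exception.
    """
    vpn = va >> PAGE_SHIFT
    pteg = page_table[hash_va(vpn)]
    for pte in pteg:                      # test each PTE in the group
        if pte is not None and pte["vpn"] == vpn:
            return (pte["ppn"] << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))
    return None
```

A match combines the PTE's physical page number with the page offset of the original virtual address; a miss in the primary group is reported to the caller rather than handled here.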
The MMU cache is generally in two portions, L1 and L2, and is intentionally small in order to provide fast access. A hit in the L1 MMU cache typically takes on the order of 3 cycles, while a hit in the L2 MMU cache, which is larger than L1, takes on the order of 12 cycles. When there is a miss in the MMU cache for the virtual address, there follows a comparatively lengthy process of obtaining the PTEGs and performing the tablewalk, which can easily take 100 cycles. One approach has been to begin executing the tablewalk immediately after determining that there is a miss in the MMU cache. One difficulty with this approach is that the tablewalk is performed, causing a portion of the MMU cache to be overwritten, even if the request for the data at the virtual address turns out to be in error. Overwriting any portion of the MMU cache with a translation that is not going to be used increases the risk of a subsequent miss in the MMU cache, which carries a penalty of over 100 cycles.
Thus there is a need for address translation that overcomes or reduces one or more of the issues raised above.
The foregoing and further and more specific objects and advantages of the instant invention will become readily apparent to those skilled in the art from the following detailed description of a preferred embodiment thereof taken in conjunction with the following drawings:
In one aspect, a processing system has a memory management unit (MMU) that has a cache for storing address translation entries corresponding to virtual addresses. If an address translation entry is present for a requested virtual address, then the virtual address is translated to the physical address, which is sent to memory to obtain the data at that physical address. If there is a miss in the MMU cache, the virtual address is hashed to obtain the physical address of a group of address translation entries. After obtaining this hashed address, a decision is made as to whether the group of address translation entries is to be prefetched. If so, the group is loaded into the data cache. Another determination is then made as to whether to continue. If the request for data is no longer valid, the process is terminated. If the request for data is still valid, then a tablewalk is performed on the group of address translation entries stored in the data cache until the matching entry is found. The matching entry is loaded into the MMU cache, the virtual address is translated to obtain the physical address, and that physical address is sent to main memory to obtain the data at that address. This is better understood with reference to the drawings and the following description.
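The overall sequence of this aspect can be sketched as a single control flow. The helper names passed in here (hash_to_pteg_addr, should_prefetch, fetch_pteg, request_valid, tablewalk) are illustrative placeholders for the hardware steps described above, not names from the original text.

```python
def translate(va, mmu_cache, helpers):
    """Sketch of the translation flow: MMU-cache check, hash, prefetch
    decision, validity check, tablewalk, and MMU-cache install."""
    pte = mmu_cache.get(va >> 12)                 # hit in the MMU cache?
    if pte is None:
        pteg_addr = helpers["hash_to_pteg_addr"](va)
        if not helpers["should_prefetch"]():      # filter decision: wait
            return None
        pteg = helpers["fetch_pteg"](pteg_addr)   # group loaded via data cache
        if not helpers["request_valid"]():        # request may no longer be valid
            return None                           # terminate without writing MMU
        pte = helpers["tablewalk"](pteg, va)      # find the matching entry
        mmu_cache[va >> 12] = pte                 # install in the MMU cache
    return (pte["ppn"] << 12) | (va & 0xFFF)      # translated physical address
```

On a repeat access to the same page, the function returns from the MMU-cache hit path without any of the prefetch or tablewalk steps.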
Shown in
Shown in
In operation, processor 14 functions according to instructions from instruction cache 26 under the control of execution units 30. As is known for processor systems, front-end pipeline 28 works in conjunction with execution units 30 in preparation for operations, and back-end pipeline 34 similarly works in conjunction with execution units 30 for handling results from the operations. The combination of front-end pipeline 28, execution units 30, back-end pipeline 34, and the memory access sub-pipeline can be considered an instruction pipeline that buffers and executes data processing instructions.
A method 100, which is comprised of steps 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, and 126, of operating processor 14 is shown in
For the case in which the MMU cache does not have the corresponding PTE, which in this example means that the corresponding PTE is present in neither L1 MMU 38 nor L2 MMU 40, the virtual address is hashed to obtain the physical address of a group of PTEs among which the corresponding PTE may be found. The group may itself comprise groups. A group of PTEs is called a page table entry group (PTEG). Rather than automatically proceeding with prefetching the PTEG from the physical address that was obtained by hashing, a decision is made whether or not to proceed, which corresponds to step 112. This decision is made by the filter limiter and is based on factors such as how speculative the prefetch is and how many PTEG fetches are pending. A prefetch of a PTEG will result in data cache 16 being loaded, and altering the cache in this way may be undesirable if the prefetch is highly speculative.
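The filter decision of step 112 can be sketched as below. The two thresholds and the numeric speculation depth are assumptions made for illustration; the original text names only the two factors, not their limits.

```python
MAX_PENDING_PTEG_FETCHES = 4    # assumed limit on in-flight PTEG fetches
MAX_SPECULATION_DEPTH = 2       # assumed depth beyond which the filter waits

def should_prefetch_pteg(speculation_depth, pending_fetches):
    """Return True to proceed with the PTEG prefetch, False to wait.

    A highly speculative request, or too many pending PTEG fetches,
    argues against loading (and thereby polluting) the data cache.
    """
    if speculation_depth > MAX_SPECULATION_DEPTH:
        return False
    if pending_fetches >= MAX_PENDING_PTEG_FETCHES:
        return False
    return True
```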
If the decision is to wait, then other operations will continue without prefetching the PTEG. If the decision is to move forward with the prefetching of the PTEG, then that request is loaded into prefetch queue 44. This decision is made prior to the opportunity to load the prefetch queue so that there is no delay in loading prefetch queue 44 if the decision is to do so. Upon a miss in the MMU cache, memory access sub-pipeline 36 will be flushed. The loading of prefetch queue 44 can occur before this flushing is completed. Prefetch queue 44 is used for storing prefetch requests of data and instructions from execution units 30, as is known to one of ordinary skill in the art. The additional use of prefetch queue 44 for PTEG prefetches is, however, beneficial because it does not automatically result in the overwriting of data cache 16, L1 MMU 38, and L2 MMU 40. Under the control of prefetch queue 44, the PTEG is obtained by putting the physical address thereof out on interface bus 22, which corresponds to step 114.
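Sharing one queue between data, instruction, and PTEG prefetches might look like the following sketch. The request tuple layout, the queue depth, and the drop-oldest-when-full policy are assumptions for the example.

```python
from collections import deque

class PrefetchQueue:
    """Bounded queue shared by data, instruction, and PTEG prefetch requests."""

    def __init__(self, depth=8):        # assumed queue depth
        # deque with maxlen silently drops the oldest entry when full
        # (assumed policy for this sketch)
        self.q = deque(maxlen=depth)

    def enqueue(self, kind, phys_addr):
        """Queue a request; kind is 'data', 'inst', or 'pteg'."""
        self.q.append((kind, phys_addr))

    def issue(self):
        """Issue the oldest request's physical address to the bus (step 114)."""
        return self.q.popleft() if self.q else None
```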
After receiving the PTEG, a determination of the validity of the request for the virtual address is made, which corresponds to step 116. This decision point is also advantageous because, if the data request is not valid, the writing of L1 MMU 38 and L2 MMU 40 can be avoided. If the data request is no longer valid, the operation is ended, which corresponds to step 118.
If the data request is still valid, then the tablewalk of the PTEG is performed, which corresponds to step 120, to obtain the corresponding PTE. This may involve tablewalking through more than one group. Also, the acquisition of the PTEG has been characterized as requiring a single physical address, but one or more additional physical addresses may be required to obtain the complete PTEG. This possibility of more than one group of PTEs is known to one of ordinary skill in the art. The tablewalking is performed by tablewalk state machine 46.
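A tablewalk that falls back to a secondary group when the primary PTEG has no match, as step 120 may require, can be sketched as follows. The function name and the two-group signature are illustrative assumptions.

```python
def tablewalk(primary_pteg, secondary_pteg, vpn):
    """Scan the primary PTEG, then the secondary, for a matching PTE.

    Returns the matching PTE, or None, which in the hardware would
    initiate an exception.
    """
    for pteg in (primary_pteg, secondary_pteg):
        for pte in pteg:
            if pte is not None and pte["vpn"] == vpn:
                return pte
    return None
```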
After the corresponding PTE has been found, it is loaded into the MMU cache, which in this case is both L1 MMU 38 and L2 MMU 40. This corresponds to step 122. The corresponding PTE is then used by the load/store control to convert the virtual address to the physical address, which corresponds to step 124. The physical address is then put onto interface bus 22 via memory access sub-pipeline 36 to obtain the requested data from memory, either main memory 18 or cache 16.
Various changes and modifications to the embodiments herein chosen for purposes of illustration will readily occur to those skilled in the art. For example, other MMU arrangements for the MMU cache could be used. Prefetching PTEGs could be performed for misses in the instruction MMU as well as the data MMU. Different filtering criteria could be used to decide whether or not to proceed with a prefetch of the PTEG. The arrangement of the PTEs within PTEGs could be altered. The tablewalk could be performed by software instead of hardware. To the extent that such modifications and variations do not depart from the spirit of the invention, they are intended to be included within the scope thereof which is assessed only by a fair interpretation of the following claims.