The present application is based on, and claims priority from India Application Number IN2873/CHE/2005, filed Oct. 27, 2005, the disclosure of which is hereby incorporated by reference herein in its entirety.
A virtually indexed cache based system 1 is shown in
For simple clarification of cache operations, we discuss below an example in which the cache 3 is a direct mapped cache. Direct mapped caches have a one-to-one correspondence between the cache index and cached data, whereas n-way set associative caches can have a 1-to-n relationship between the cache index and cached data: for example, 1 to 2 for 2-way set associative caches, 1 to 4 for 4-way set associative caches, and so on.
To make cache searching faster, the cache 3 is divided into a number of lines of defined equal size. For example, for a 32 bit system with a 16 KB cache, the cache 3 can be divided into 256 lines of size 64 bytes. Such an organization can be compared with an array of fixed size data elements. The line numbers 0 to 255 are the cache index and the size 64 bytes is the cache line size. When the CPU 2 wishes to read from or write to memory, it generates a virtual address 20 with the format illustrated in
Bits 0 to K−1 of the hashed address 20′ comprise an index 21, and bits K to N comprise a tag 22. P and K may have the same value or different values. In this case the number of cache lines is 256 so K has a value of 8, and the system is a 32 bit system so N has a value of 32. Referring now to
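The index and tag split described above can be sketched as follows. This is a minimal illustration only: the hash function applied to the virtual address 20 is not specified in this passage, so `hash_address` below is a hypothetical stand-in that simply drops the byte-within-line bits, and the names `split_hashed`, `K` and `LINE_OFFSET_BITS` are illustrative.

```python
CACHE_SIZE = 16 * 1024                       # 16 KB cache, as in the example
LINE_SIZE = 64                               # 64-byte cache lines
NUM_LINES = CACHE_SIZE // LINE_SIZE          # number of cache lines
K = NUM_LINES.bit_length() - 1               # number of index bits
LINE_OFFSET_BITS = LINE_SIZE.bit_length() - 1


def hash_address(va):
    """Hypothetical stand-in hash: drop the byte-within-line bits."""
    return va >> LINE_OFFSET_BITS


def split_hashed(hashed):
    """Bits 0..K-1 form the index; the remaining upper bits form the tag."""
    index = hashed & ((1 << K) - 1)
    tag = hashed >> K
    return index, tag


hashed = hash_address(0x12345678)
index, tag = split_hashed(hashed)
```

With 256 lines the index always falls in the range 0 to 255, which is why K is 8 in this example.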
The data structure of the MMU 4 is shown in
The VPN of the virtual address is first compared with the VPNs stored in the TLB 30. If the TLB contains the VPN, then the associated physical address is calculated from the tuple <PPN, Page Offset 36> and this physical address is sent to the main memory 5. If the TLB does not contain the VPN, then the VPN is looked up in the Page Table 31, and the associated physical address is calculated from the tuple <PPN, Page Offset 36> and this physical address is sent to the main memory 5. On receipt of the tuple <PPN, Page Offset 36>, the main memory 5 returns the data stored at that physical address, and that data is recorded in the cache 3 so that the CPU 2 can read the data from the cache 3.
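The lookup order described above (TLB first, then Page Table) can be sketched as below. This is a simplified model under stated assumptions: 4 KB pages, the TLB and Page Table represented as plain dictionaries from VPN to PPN, and the TLB refilled after a Page Table hit. None of these details are stated in the text, and `translate` is a hypothetical name.

```python
PAGE_OFFSET_BITS = 12  # assumed 4 KB pages; the page size is not given in the text


def translate(va, tlb, page_table):
    """Return the physical address formed from the tuple <PPN, Page Offset>."""
    vpn = va >> PAGE_OFFSET_BITS
    page_offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn in tlb:
        ppn = tlb[vpn]              # TLB contains the VPN
    else:
        ppn = page_table[vpn]       # fall back to the Page Table (page faults not modelled)
        tlb[vpn] = ppn              # assumed TLB refill, not stated in the text
    return (ppn << PAGE_OFFSET_BITS) | page_offset


tlb = {}
page_table = {0x10: 0x7}
pa_first = translate(0x10ABC, tlb, page_table)    # served by the Page Table
pa_second = translate(0x10ABC, tlb, page_table)   # served by the TLB
```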
The process of ensuring that the contents of a cache location are the same as those of its corresponding main memory location is known as “validation”. The process of removing the mapping between a cache location (or consecutive cache locations) and the corresponding main memory location (or locations) is known as “invalidation”.
When two or more virtual addresses translate to the same location in main memory 5, the two virtual addresses are known as aliases. Aliases are used when applications need to share memory.
The following are the possible cache scenarios if aliases are used.
Case 1 does not create any cache coherence issues, as both addresses will point to the same cache line.
Case 2 also creates no cache coherency issues, as illustrated by the following example. Virtual addresses VPN1 and VPN2 are aliases, as follows:
The cache 3 contains a line corresponding with VPN1, as follows:
If VPN2 is then used to read from or write to the memory location associated with VPN1 and VPN2, then the cache 3 will be updated as follows:
Thus it can be seen that the cache line with index XXX alternates between VPN1 and VPN2. This is known as a “ping-pong” situation. This creates no cache coherency issues, but does create performance issues since only one alias can occupy cache at a time.
Case 3 and Case 4 create cache coherency problems, as demonstrated through the following example. Taking Case 3 first: virtual addresses VPN1 and VPN2 are aliases, as follows:
The cache 3 contains a line corresponding with VPN1, as follows:
If VPN2 is then used to access the memory location associated with VPN1 and VPN2, then the cache 3 will be updated as follows:
At this point the cache contains two different entries, each associated with the same main memory location. When accessing the same memory location through VPN1, the CPU will not see any changes made through a previous access by the alias VPN2 (and vice versa). This is an example of a cache coherency problem.
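The stale read described above can be reproduced with a toy model. The sketch below assumes a direct-mapped, write-back cache holding one value per line, indexed by the low bits of the virtual address. The class `ToyVirtualCache`, the addresses, and the per-address translation table are all illustrative assumptions rather than the structures of the system 1.

```python
class ToyVirtualCache:
    """Toy direct-mapped, virtually indexed, write-back cache (one value per line)."""

    def __init__(self, num_lines, memory, va_to_pa):
        self.num_lines = num_lines
        self.memory = memory        # physical address -> value
        self.va_to_pa = va_to_pa    # virtual address -> physical address
        self.lines = {}             # cache index -> [tag, value]

    def _line(self, va):
        index, tag = va % self.num_lines, va // self.num_lines
        line = self.lines.get(index)
        if line is None or line[0] != tag:
            # Miss: fill from main memory (eviction write-back is not modelled).
            line = [tag, self.memory[self.va_to_pa[va]]]
            self.lines[index] = line
        return line

    def read(self, va):
        return self._line(va)[1]

    def write(self, va, value):
        self._line(va)[1] = value   # write-back: only the cache line changes


memory = {0x100: "old"}
aliases = {0x21: 0x100, 0x42: 0x100}   # two aliases that index different lines
cache = ToyVirtualCache(16, memory, aliases)

cache.read(0x21)                  # caches "old" under the first alias
cache.write(0x42, "new")          # update made through the second alias
stale = cache.read(0x21)          # hits the first alias's line: still "old"
fresh = cache.read(0x42)          # the second alias sees "new"
```

After these accesses the cache holds two lines for one physical location, and the two aliases disagree about its contents.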
Another problem that is observed on virtually indexed cache systems is that of supporting private mapping of shared memory areas and files. Generally sharing of memory between processes is done through global virtual memory. This global virtual memory is accessible through virtual addresses, which are the same for all processes. This means that all processes will use the same address to access the shared area.
Suppose one process needs to map an area of memory or file that is already mapped in the shared region. This process needs to map a whole or part of this shared area or file into its private area. This would result in a case similar to an alias. The Unix system call mmap with option MAP_PRIVATE needs alias support to provide its intended functionality. In this case, a virtually indexed cache system will run into the same cache coherency problem that is associated with aliases.
The root cause behind the cache coherency problem is that aliases can occupy two different cache lines. If this situation can be avoided, cache coherency problems can be ruled out and hence true support for aliases can be provided. One advantage of a virtually indexed cache is that it can provide data faster, either by avoiding address translation or by overlapping cache access with address translation, and so has lower latency than physical caches.
Operating systems written for virtually indexed caches are responsible for addressing cache coherency problems such as the one described above. One conventional approach is to perform a ping-pong operation. In a ping-pong operation, a check is first made whether a virtual address has any aliases. If so, a check is made of the cache to determine whether the cache contains a line corresponding with the alias(es). If so, then the cache entry for each one of the aliases is removed. An example of a ping-pong operation can be illustrated with reference to the example given above. A memory access using VPN1 first checks whether VPN1 has any aliases. This returns a single alias VPN2. A check is made of the cache to determine whether the cache contains a line corresponding with VPN2. The cache entry for VPN2 is then removed. Similarly, if VPN2 is accessed, then the cache entry for VPN1 is removed. This ping-pong operation ensures that only a single alias is cached (although, in contrast with Case 2, the cache line index will vary depending on the last alias that was used to access the memory location).
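A ping-pong operation of this kind can be sketched on a toy model. The code below assumes a direct-mapped, write-back cache with one value per line, and it writes a removed alias's line back to main memory before discarding it. That write-back step is an assumption made so that the next fill sees current data; the text only says the entry is removed. `PingPongCache` and the addresses are hypothetical.

```python
class PingPongCache:
    """Toy direct-mapped, virtually indexed, write-back cache (one value per line)."""

    def __init__(self, num_lines, memory, va_to_pa):
        self.num_lines = num_lines
        self.memory = memory        # physical address -> value
        self.va_to_pa = va_to_pa    # virtual address -> physical address
        self.lines = {}             # cache index -> [tag, value]

    def _slot(self, va):
        return va % self.num_lines, va // self.num_lines

    def _remove_aliases(self, va):
        pa = self.va_to_pa[va]
        for alias, alias_pa in self.va_to_pa.items():
            if alias == va or alias_pa != pa:
                continue                      # not an alias of va
            index, tag = self._slot(alias)
            line = self.lines.get(index)
            if line is not None and line[0] == tag:
                self.memory[pa] = line[1]     # assumed write-back before removal
                del self.lines[index]

    def _line(self, va):
        self._remove_aliases(va)              # the check made on every access
        index, tag = self._slot(va)
        line = self.lines.get(index)
        if line is None or line[0] != tag:
            line = [tag, self.memory[self.va_to_pa[va]]]
            self.lines[index] = line
        return line

    def read(self, va):
        return self._line(va)[1]

    def write(self, va, value):
        self._line(va)[1] = value


memory = {0x100: "old"}
cache = PingPongCache(16, memory, {0x21: 0x100, 0x42: 0x100})
cache.read(0x21)
cache.write(0x42, "new")      # removes the line cached under 0x21
seen = cache.read(0x21)       # removes the line cached under 0x42, refills
```

Every access pays for an alias search and possibly a removal and refill, which is the performance cost of this approach.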
The ping-pong operation described above creates performance issues. As a result, the use of aliases in virtually indexed cache systems is generally restricted to situations such as Case 1 and Case 2. As the chances for Case 1 and Case 2 are very limited, conventional virtual cache systems are mediocre in terms of alias support capability.
A second conventional solution is described in EP-A-0729102, in which cache coherency issues are avoided by disabling caching when aliases are used. A CV (cachable-in-virtual-cache) entry is added to the Page Table and TLB entries so that virtual addresses that have aliases are not cached, or are cached only when they are accessed for a read operation.
This solution does not provide full support for aliases on virtually indexed cache systems.
A third conventional solution is described in “Consistency Management for Virtually Indexed Caches”, Bob Wheeler and Brian N. Bershad, Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Boston, Mass., United States, pages 124-136 (1992). This ACM paper describes a way to ensure cache coherency by reverse translation. Since all aliases get translated to the same physical address, the reverse translation of all aliases will point to the same physical page. A software cache table is indexed by physical page number. This table contains the cache state (dirty or clean) and the virtual address that owns the cache entry. With the help of this table it is possible to detect coherency issues arising from concurrent access via aliases, and to resolve them by invalidating, or validating and then invalidating, the affected cache entries.
Every memory transaction (read, write or DMA) needs to go through this algorithm in order to achieve cache coherency. It needs memory management hardware support to raise exceptions that run the algorithm when simultaneous accesses occur through aliases. The performance penalty of this approach is very heavy because of the traps generated during memory access.
Embodiments of the invention will now be described by way of example with reference to the accompanying drawings in which:
A first method constituting an embodiment of the present technique provides a modified TLB/Page Table which is updated according to the method illustrated in
In a first step 50, a virtual address is generated by the CPU 2. The format of the virtual address is illustrated at 51, and corresponds with the format for virtual address 20 shown in
If the virtual address is determined to be an alias at step 54, then the PTE/TLB are updated in step 63 to create an entry with the format shown at 64. In this case, the VPN field is filled with the VPN of the alias, designated in
Thus the method of
The CPU 2 and MMU 4 are configured to handle a READ process as illustrated in
If there is no cache hit, then the VA is translated by the MMU 4 in step 74. If the V bit in the PTE/TLB entry is not set (step 75), then the PTE/TLB entry must be associated with a FRVA. In this case, the PPN and Page Offset are used to access the main memory 5 in step 76. The cache is synchronized in step 77 by writing the data accessed in step 76 into the cache line associated with FRVA. The data is then sent to the CPU in step 73.
If the V bit in the PTE/TLB entry is set (step 75) then the PTE/TLB entry must be associated with an alias which is not an FRVA. Therefore in this case, the FRVP (which is stored in the PPN/FRVP field of the PTE/TLB entry), and the Page Offset (from the virtual address of the alias) are hashed in step 79, and the hashed address is input to the cache in step 71.
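The READ flow of steps 71 to 79 can be sketched as a loop. The model below is illustrative only: it assumes 4 KB pages and 64-byte lines, uses the line-aligned virtual address as a stand-in for the hashed address, and represents each PTE/TLB entry as a hypothetical pair `(v_bit, ppn_or_frvp)`. Memory is keyed by physical line number for brevity.

```python
PAGE_BITS = 12   # assumed 4 KB pages
LINE_BITS = 6    # assumed 64-byte cache lines


def read(va, cache, ptes, memory):
    """Follow the READ flow: cache lookup, translation, V-bit check, retry."""
    while True:
        line_va = va >> LINE_BITS               # stand-in for the hashed address (step 71)
        if line_va in cache:
            return cache[line_va]               # cache hit: data goes to the CPU (step 73)
        vpn = va >> PAGE_BITS
        page_offset = va & ((1 << PAGE_BITS) - 1)
        v_bit, field = ptes[vpn]                # translation by the MMU (step 74)
        if not v_bit:                           # step 75: entry belongs to the FRVA
            pa = (field << PAGE_BITS) | page_offset
            data = memory[pa >> LINE_BITS]      # access main memory (step 76)
            cache[line_va] = data               # synchronize the cache (step 77)
            return data                         # data goes to the CPU (step 73)
        # V bit set: field holds the FRVP, so rebuild the address from
        # <FRVP, Page Offset> (step 79) and retry the cache lookup (step 71).
        va = (field << PAGE_BITS) | page_offset


ptes = {0x10: (False, 0x7),    # FRVA page: field is the PPN
        0x20: (True, 0x10)}    # alias page: field is the FRVP
memory = {0x7040 >> LINE_BITS: "payload"}
cache = {}

via_alias = read(0x20040, cache, ptes, memory)   # miss, redirected to the FRVA
via_frva = read(0x10040, cache, ptes, memory)    # hit on the line filled above
```

In this model an access through the alias fills only the line associated with the FRVA, so both addresses end up sharing a single cache line.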
PTE/TLB granularity is determined by the page size, and cache entry granularity is determined by the cache line size. Therefore, there will be only one PTE/TLB entry for a set of addresses if their VPN is the same. Similarly, cache entries can be shared by a set of addresses if they are contiguous and fall within the cache line size boundary. Hence the V bit is set at page granularity, as the PTE/TLB works at page level.
A second method of updating the PTE/TLB is to retain the physical page number in the PTE/TLB and add an FRVP field such as shown below.
A flow diagram for the second method is shown in
It can be seen that this second method helps to avoid the overhead of additional translation, as translation step 74 will only be performed once.
A third method of updating the PTE/TLB (similar to the method of
This algorithm is illustrated in
An algorithm for handling the access trap when an alias (VA) is accessed is shown below. This algorithm does not try to replace the FRVP very often. It assumes that the FRVP is the master alias, which is referenced more often than the other aliases. There will not be any access traps while accessing the memory using the virtual page FRVP. By contrast, every time memory is accessed through any of the other aliases, an access trap is generated. This algorithm requires a supplementary algorithm to promote any of the aliases to FRVA. Examples of both algorithms are given below.
Suppose we have two aliases V1 and V2 that access the same physical page P. We designated V1 as FRVP as it was the first one to be accessed. As a result, the cache would contain the data corresponding to V1. Suppose the program accessed the address V1+16 and got data loaded into cache. Now the same program is trying to access the same memory through V2+16. It will experience a trap and as a result it will enter into the trap routine given above. It will find FRVP for the page V2 (in this case, the translation for V2 is V1). It will compute a new address as V1+16 (that is, V2+16 is translated to V1+16).
This mechanism ensures that only FRVAs are cached and can be accessed directly. Each slave alias needs to be translated to the FRVA for access, using the formula {Vk (k=1 . . . n)+<offset>}=>{V1+<offset>}.
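The address rewrite performed in the trap routine can be sketched as follows. This is a minimal illustration assuming 4 KB pages and a hypothetical table `frvp_of` that maps the base address of each slave alias page to the base address of its FRVP; neither the page size nor the table layout is given in the text.

```python
PAGE_SIZE = 4096  # assumed page size


def to_frva(va, frvp_of):
    """Rewrite {Vk + offset} as {V1 + offset}, where V1 is the FRVP of page Vk."""
    offset = va % PAGE_SIZE
    page_base = va - offset
    # Pages with no entry are treated as already being the FRVP.
    frvp_base = frvp_of.get(page_base, page_base)
    return frvp_base + offset


V1, V2 = 0x10000, 0x20000
frvp_of = {V2: V1}                       # V1 was accessed first, so it is the FRVP

translated = to_frva(V2 + 16, frvp_of)   # access through the slave alias
unchanged = to_frva(V1 + 16, frvp_of)    # access through the FRVP itself
```

An access to V2+16 is redirected to V1+16, while an access through V1 is left as it is, matching the example of aliases V1 and V2 above.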
If the current FRVP is no longer the most frequently referenced alias, it can be replaced with an alias that is being referenced more frequently. This requirement also arises when FRVP gets retired (either due to an owning process expiring or the owning process needing to release the memory).
The essence of this solution is similar to the solution of
The three methods according to the embodiments described above provide the following advantages:
Although the technique has been described by way of example and with reference to particular embodiments it is to be understood that modification and/or improvements may be made without departing from the scope of the appended claims.
Where in the foregoing description reference has been made to integers or elements having known equivalents, then such equivalents are herein incorporated as if individually set forth.
Number | Date | Country | Kind |
---|---|---|---|
IN2873/CHE/2005 | Oct 2005 | IN | national |