Cache reconfiguration based on run-time performance data or software hint

Description

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow diagram illustrating a method of the present disclosure in one embodiment of the present disclosure.

FIG. 2 is a block diagram illustrating cache memory that may be reconfigured in one embodiment of the present disclosure.

FIG. 3 illustrates an example of a cache line with associated granularity flags in one embodiment of the present disclosure.

FIG. 4 illustrates an example of a cache line with associated access flags in one embodiment of the present disclosure.

DETAILED DESCRIPTION

FIG. 1 is a flow diagram illustrating a method of the present disclosure in one embodiment. At 102, an application is analyzed for its cache memory access patterns and behavior. This analysis may be performed off-line in one embodiment. In another embodiment, the analysis is performed on-line while the application is running. In one embodiment, software such as an operating system may perform the analysis. The analyzed application's characteristics are evaluated and used to determine the type of reconfiguration that would optimize cache usage during the application's execution. The characteristics, for example, may include but are not limited to the data structures that the application is using, the expected reference pattern of the cache memory, whether the application references sparse or clustered chunks of code and data, the type of application, heat and power consumption of the application, etc.

For instance, long cache lines typically perform better with large data structures. Thus, at 104, if it is determined that the application uses large data structures or a large region of allocated memory, a larger cache line configuration is selected for this application. Data structure layout and sharing pattern may be analyzed on a multiprocessor to determine the optimal coherence granularity if, for example, a programmer has not performed cache alignment. By varying the coherence granularity, the application behavior can be significantly improved. Many operating system data structures are small. Thus, when an operating system is executing, or any other application that uses smaller data structures is executing, the cache may be reconfigured to have smaller cache lines or smaller coherence granularity at 106.
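The selection made at 104 and 106 can be sketched as follows. This is a minimal illustration only: the disclosure gives no numeric values, so the line sizes, the threshold, and the names `AppProfile` and `choose_line_size` are assumptions introduced here.

```python
from dataclasses import dataclass

# Hypothetical values; the disclosure does not specify sizes or thresholds.
SMALL_LINE_BYTES = 32
LARGE_LINE_BYTES = 128
LARGE_STRUCT_THRESHOLD = 64  # bytes


@dataclass
class AppProfile:
    """Result of the analysis at step 102 (hypothetical container)."""
    typical_struct_bytes: int


def choose_line_size(profile: AppProfile) -> int:
    """Pick a larger line for large data structures (104), a smaller one otherwise (106)."""
    if profile.typical_struct_bytes >= LARGE_STRUCT_THRESHOLD:
        return LARGE_LINE_BYTES
    return SMALL_LINE_BYTES
```

For example, a profile with 4096-byte structures would select the larger line, while one with 16-byte structures, as is typical of many operating system data structures, would select the smaller line.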

In addition, if it is determined that the application is consuming high power or generating high heat, a part of the cache may be disabled at 108, so that, for example, the chip will not get too hot. In some applications the critical working set does not occupy the entire cache, so performance need not be sacrificed to reduce power usage or temperature. Temperature sensors placed on or near processor cores may be used to determine how much heat an application is generating. A part of the cache may be disabled, for example, by setting one or more enable/disable bits associated with cache memory locations. The hardware or software accessing the cache may then read those bits to determine whether that part of the cache may be used.
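One possible form of the enable/disable bits used at 108 is sketched below. It assumes one enable bit per cache way and a single temperature limit; both the granularity and the 85 degree limit are hypothetical choices, as are the function names.

```python
def update_enable_bits(enable_bits: int, num_ways: int, temp_c: float,
                       limit_c: float = 85.0) -> int:
    """Return a new way-enable mask based on a temperature reading.

    Above the (hypothetical) limit, only the lower half of the ways stays
    enabled; at or below it, every way is re-enabled.
    """
    if temp_c > limit_c:
        keep = max(1, num_ways // 2)           # always keep at least one way
        return enable_bits & ((1 << keep) - 1)
    return (1 << num_ways) - 1


def way_usable(enable_bits: int, way: int) -> bool:
    """Check the enable bit before using a way, as the accessing hardware or software would."""
    return bool((enable_bits >> way) & 1)
```

With four ways, a reading of 95 degrees would leave ways 0 and 1 usable and ways 2 and 3 disabled; a later reading of 60 degrees would re-enable all four.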

At 110, cache associativity may be reconfigured based on the type of execution entity. For instance, different types of applications may perform better with certain associativity. Associativity may be reconfigured, for example, by modifying a hashing algorithm or by masking off a greater or smaller number of bits in virtual or physical addresses when determining the index and tag portions of associative cache memory. Although a higher associativity may carry a power cost and potentially a cycle cost, applications that benefit from a higher degree of associativity may see a considerable performance advantage that outweighs those costs. Other applications still achieve good performance with a lower associativity; for those applications, the software or operating system can reduce the associativity and save power.
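The bit-masking approach to associativity can be illustrated with a short sketch. It assumes a cache of fixed capacity whose line size, capacity, and number of ways are all powers of two, and it uses plain bit masking rather than a hashing algorithm; the function name `split_address` is hypothetical.

```python
def split_address(addr: int, line_bytes: int, cache_bytes: int, ways: int):
    """Return (tag, set_index, offset) for the given cache geometry.

    With capacity fixed, doubling the number of ways halves the number of
    sets, so one fewer address bit is masked into the index and one more
    bit moves into the tag.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (line_bytes - 1)
    set_index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset
```

For a 32 KiB cache with 64-byte lines, address 0x12345 maps to tag 4, set 141 when the cache is 2-way, and to tag 9, set 13 when it is reconfigured as 4-way.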

Reconfigurations with respect to other characteristics of cache memory are possible. At 112, cache memory is reconfigured, for instance, based on the determinations made above. The reconfiguration, in one embodiment, may be done by the hardware on the processor. The hardware, for instance, takes the information determined as above from the software and performs the modifications. A register may be set up per cache where the software may provide the hints for reconfiguring the cache memory.
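A per-cache hint register of the kind mentioned above might be encoded as in the following sketch. The disclosure does not define a register format, so the bit layout, field widths, and function names here are purely illustrative assumptions.

```python
# Hypothetical register layout (not taken from the disclosure):
#   bits 0-1  : log2(line size / 32 bytes)   -> 32, 64, 128 or 256 bytes
#   bits 2-4  : log2(associativity)          -> 1 to 128 ways
#   bits 5-12 : way-enable mask for up to 8 ways


def pack_hint(line_bytes: int, ways: int, enable_mask: int) -> int:
    """Software side: encode reconfiguration hints into one register value."""
    line_code = (line_bytes // 32).bit_length() - 1
    ways_code = ways.bit_length() - 1
    return (line_code & 0x3) | ((ways_code & 0x7) << 2) | ((enable_mask & 0xFF) << 5)


def unpack_hint(reg: int):
    """Hardware side: decode (line_bytes, ways, enable_mask) from the register."""
    line_bytes = 32 << (reg & 0x3)
    ways = 1 << ((reg >> 2) & 0x7)
    enable_mask = (reg >> 5) & 0xFF
    return line_bytes, ways, enable_mask
```

In this layout, software requesting 128-byte lines, 4-way associativity, and the four lowest ways enabled would write the value 490 to the register, and the hardware would decode the same three settings from it.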

FIG. 2 is a block diagram illustrating cache memory that may be reconfigured. Briefly, a cache line refers to a unit of data that can be transferred to and from cache memory. Thus, cache line size determines the coherence granularity and what is fetched from memory. Different applications may perform better with different cache line sizes or coherence granularities. For instance, applications that use small data structures may only need to access small portions of a cache line and need not perform coherence on the entire line, while those that have larger data structures may perform better accessing the entire line. In an exemplary embodiment of the present disclosure, a cache line may be further divided into a plurality of sectors. In this embodiment, cache accesses and cache coherence are performed at sector granularity.

Referring to FIG. 2, a computer system may comprise one or more processors and each processor 200 may comprise a central processing unit (CPU) 202 or the like, and a multi-level memory such as L1 cache 204 and L2 cache 206. In the example shown in FIG. 2, each cache line 210 in L2 cache 206 comprises 4 sectors, namely sector 0 (212), sector 1 (214), sector 2 (216) and sector 3 (218). While this example is shown with 4 sectors, any other number of sectors in a cache line is possible in the present disclosure. That is, a cache line may have any number of sectors greater than or equal to two.

In one embodiment, software may provide appropriate granularity information indicating that requested data in a memory region should be fetched with a suggested granularity. For example, software may indicate that, in case of an L1 cache miss on any address in a memory region, only the requested sector should be fetched from the corresponding L2 cache 206. As another example, software may also indicate that, in case of an L1 cache miss on any address in another memory region, the whole cache line, that is, all four sectors, should be fetched from the corresponding L2 cache 206. In one embodiment, the granularity information may be maintained in tables such as a TLB (Translation Lookaside Buffer), the page table or the like, for instance, if a memory region is defined as one or more memory pages.
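The per-region granularity hint can be sketched as a lookup keyed by page number. The dictionary below merely stands in for a TLB or page-table field; the page size, line size, table contents, and the whole-line default for pages without a hint are all assumptions made for illustration.

```python
PAGE_BYTES = 4096        # hypothetical page size
SECTORS_PER_LINE = 4     # matches the four-sector example of FIG. 2

# Hypothetical hint table: page number -> True if the whole line should be fetched.
page_fetch_whole_line = {0x10: False, 0x20: True}


def sectors_to_fetch(addr: int, line_bytes: int = 128):
    """On an L1 miss at `addr`, return the sector indices to fetch from L2."""
    sector_bytes = line_bytes // SECTORS_PER_LINE
    requested = (addr % line_bytes) // sector_bytes
    page = addr // PAGE_BYTES
    # Assumption: pages without a hint default to whole-line fetches.
    if page_fetch_whole_line.get(page, True):
        return list(range(SECTORS_PER_LINE))
    return [requested]
```

A miss at byte 70 of page 0x10 would fetch only sector 2, whereas the same offset in page 0x20 would fetch all four sectors.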

In another embodiment, each L2 cache line maintains a granularity flag (GF) that indicates which one or more sectors of the requested cache line should be supplied to the L1 cache when the L2 cache 206 receives a cache request from the corresponding L1 cache 204. FIG. 3 illustrates an example of a cache line with associated granularity flags in one embodiment. For example, each L2 cache line 300 may maintain a GF bit per sector. FIG. 3 shows 4 GF bits 302, 304, 306, 308. Each GF bit (302, 304, 306, 308) corresponds to one sector (310, 312, 314, 316 respectively), indicating whether data of that sector should be supplied if data of another sector in the same cache line is requested. For instance, if GF bit 302 associated with sector 0 (310) is set (for example, set to 1), when data in any one of sectors 1-3 (312, 314, 316) is requested, data in sector 0 is also supplied. Conversely, if GF bit 304 associated with sector 1 (312) is not set (for example, set to 0), sector 1 (312) would not be supplied when one or more of the other sectors 310, 314, 316 are requested and supplied.
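The GF rule described for FIG. 3 reduces to a simple selection, sketched below. Representing the GF bits as a Python list and the name `sectors_supplied` are choices made here for illustration only.

```python
def sectors_supplied(requested_sector: int, gf_bits):
    """Return the sector indices the L2 cache supplies for one L1 request.

    The requested sector is always supplied; any other sector in the same
    line is supplied only if its GF bit is set.
    """
    return [i for i, gf in enumerate(gf_bits) if i == requested_sector or gf]
```

With GF bits [1, 0, 0, 0], a request for sector 2 would return sectors 0 and 2, since sector 0 has its GF bit set; sector 1, whose GF bit is clear, is not supplied.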

At the L1 cache side, each L1 cache line maintains an access flag (AF) for each sector, indicating whether the corresponding cache sector has been accessed by the corresponding CPU since the time the data was cached. FIG. 4 illustrates an example of a cache line with associated access flags in one embodiment. Each sector 410, 412, 414, 416 may have a correspondingly associated AF bit 402, 404, 406, 408. When data of a sector is brought into L1 cache, for example, from the corresponding L2 cache, the AF associated with that sector is set to 0. When the CPU accesses the data of a sector, the AF associated with that sector is set to 1. For instance, if sector 0 (410) is brought into the cache line 400, the AF bit 402 associated with sector 0 (410) is reset, for example, set to 0. The value of 0 in an AF bit represents that the data of this sector has not been used yet. When the CPU or the like accesses the data of sector 0 (410), the AF bit 402 associated with sector 0 (410) is set to 1. The AF bit being set to 1 represents that the data of the sector associated with that AF bit was used.
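The AF behavior of FIG. 4 can be modeled with a small class. This is a sketch only: the class name `L1Line`, the `present` list tracking which sectors are resident, and the use of an exception to signal a miss are assumptions introduced here.

```python
class L1Line:
    """One L1 cache line with an access flag (AF) per sector."""

    def __init__(self, num_sectors: int = 4):
        self.present = [False] * num_sectors  # which sectors are cached (assumed bookkeeping)
        self.af = [0] * num_sectors           # AF bits, one per sector

    def fill(self, sector: int) -> None:
        """Sector brought in from the next cache level: AF is reset to 0."""
        self.present[sector] = True
        self.af[sector] = 0

    def access(self, sector: int) -> None:
        """CPU uses the sector's data: AF is set to 1."""
        if not self.present[sector]:
            raise LookupError("sector not cached (miss)")
        self.af[sector] = 1
```

After sectors 0 and 1 are filled and the CPU touches only sector 0, the AF bits would read [1, 0, 0, 0], recording that sector 1 was brought in but never used.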

Further, when an L1 cache line is replaced, the AF flags can be used to update the corresponding GF flags in the L2 cache. Take, for example, the cache line 400 of FIG. 4. If AF bit 404 associated with sector 1 (412) is set to 1, representing that the data of sector 1 (412) was used, when the cache line 400 is replaced, the GF bit value in the L2 cache for the corresponding sector may be updated to 1. With this simple adaptive granularity scheme, when an L2 cache receives a cache miss request from the corresponding L1 cache, the L2 cache can supply not only data of the requested cache sector, but also data of other sectors in the same cache line, provided that those sectors were accessed by the CPU the last time they were supplied to the corresponding L1 cache. In one embodiment of the present disclosure, the addressing into the individual sectors in a cache line may be performed using an address of the cache line and offsetting the number of address bytes from the cache line address.
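The AF-to-GF update and the sector addressing can be sketched together. The text states that a GF bit may be set to 1 when the matching AF bit is 1; clearing the GF bit of a sector that was supplied but never accessed is an assumption made here, inferred from the "last time they were supplied" condition. The `resident` list, the 32-byte sector size, and both function names are likewise hypothetical.

```python
def update_gf_on_eviction(gf_bits, af_bits, resident):
    """Return new L2 GF bits when the matching L1 line is replaced.

    For sectors that were resident in L1, the GF bit takes the AF value
    (assumption: an unused sector has its GF bit cleared). GF bits of
    sectors that were never supplied are left unchanged.
    """
    return [af if res else gf for gf, af, res in zip(gf_bits, af_bits, resident)]


def sector_address(line_addr: int, sector: int, sector_bytes: int = 32) -> int:
    """Address of a sector: the cache line address plus a byte offset."""
    return line_addr + sector * sector_bytes
```

For instance, with GF bits [1, 0, 0, 1], AF bits [0, 1, 0, 0], and only sectors 0 and 1 resident, the updated GF bits would be [0, 1, 0, 1]: sector 0 was supplied but unused, sector 1 was used, and sectors 2 and 3 keep their previous values.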

Although the above examples were described with respect to L1 cache as a requester and L2 cache as a supplier of data, it should be understood that the scheme explained above may be used between any levels of caches, for instance, between L2 and L3 caches, L3 cache and main memory, L2 cache and main memory, etc.

Splitting a cache line into a plurality of sectors helps reduce the number of cache misses as well as the number of operations required to maintain cache coherence. For instance, two processors that access the same cache line, but different sectors in the line, may update their respective sectors independently of one another without having to invalidate the other's cache line.
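The coherence benefit can be illustrated with a toy model that counts the misses caused by invalidations for a single shared cache line. It is a deliberate simplification and not a full coherence protocol: every processor is assumed to start with a valid copy of every sector, and the function name and trace format are hypothetical.

```python
def count_coherence_misses(writes, per_sector: bool,
                           num_cpus: int = 2, num_sectors: int = 4) -> int:
    """Count misses caused by invalidations for one shared cache line.

    `writes` is a sequence of (cpu, sector) pairs. A write invalidates
    either just that sector (per_sector=True) or the whole line
    (per_sector=False) in every other processor's cache.
    """
    valid = [[True] * num_sectors for _ in range(num_cpus)]
    misses = 0
    for cpu, sector in writes:
        if not valid[cpu][sector]:
            misses += 1
            if per_sector:
                valid[cpu][sector] = True             # refetch one sector
            else:
                valid[cpu] = [True] * num_sectors     # refetch the whole line
        for other in range(num_cpus):
            if other != cpu:
                if per_sector:
                    valid[other][sector] = False
                else:
                    valid[other] = [False] * num_sectors
    return misses
```

In this model, two processors alternately writing sectors 0 and 1 of the same line for six writes incur five misses with whole-line invalidation and none with per-sector invalidation.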

The embodiments described above are illustrative examples and it should not be construed that the present invention is limited to these particular embodiments. Thus, various changes and modifications may be effected by one skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.

Claims

1. A method for reconfiguring cache memory, comprising: analyzing one or more characteristics of an execution entity accessing a cache memory; and reconfiguring the cache dynamically based on the one or more characteristics analyzed, the step of reconfiguring including modifying associativity of the cache memory, modifying amount of the cache memory available to store data, changing coherence granularity of the cache memory, or modifying line size of the cache memory, or combination thereof.
2. The method of claim 1, wherein the one or more characteristics of an execution entity include size of data structure used by the execution entity, expected reference pattern of the execution entity, heat generated by the execution entity, or combination thereof.
3. The method of claim 1, wherein the step of analyzing includes reading temperature data associated with the execution entity to determine amount of heat generated by the execution entity.
4. The method of claim 1, wherein the step of reconfiguring includes changing number of masked bits for mapping into cache memory to modify cache memory associativity.
5. The method of claim 1, wherein the step of reconfiguring includes: dividing a cache line in the cache memory into a plurality of sectors; and accessing data of the cache line by one or more sectors.
6. The method of claim 5, further including: instructing hardware as to which memory region should be cached by sectors and which memory region should be cached by entire cache lines.
7. The method of claim 5, further including: associating an access bit with each sector of a cache line; and setting an access bit to true if a processing element uses data of a sector associated with the access bit.
8. The method of claim 7, wherein the step of associating includes: associating an access bit with each sector of a cache line in level-1 cache.
9. The method of claim 5, further including: associating a granularity bit with each sector of a cache line, the granularity bit for indicating whether a sector should be cached when one or more other sectors in the cache line are cached.
10. The method of claim 9, wherein the step of associating includes: associating a granularity bit with each sector of a cache line in level-2 cache, the granularity bit for indicating whether a sector should be cached when one or more other sectors in the cache line are cached.
11. The method of claim 1, wherein the step of analyzing is performed on-line while the execution entity is being run.
12. The method of claim 1, wherein the step of analyzing is performed off-line.
13. The method of claim 1, wherein the step of analyzing is performed by software.
14. The method of claim 1, wherein the step of analyzing is performed by an operating system.
15. A system for reconfiguring cache memory, comprising: a means for analyzing one or more characteristics of an execution entity accessing a cache memory; and a means for reconfiguring the cache dynamically based on the one or more characteristics analyzed, the means for reconfiguring including a means for modifying associativity of the cache memory, modifying amount of the cache memory available to store data, changing coherence granularity of the cache memory, or modifying line size of the cache memory, or combination thereof.
16. The system of claim 15, wherein the one or more characteristics of an execution entity include size of data structure used by the execution entity, expected reference pattern of the execution entity, heat generated by the execution entity, or combination thereof.
17. The system of claim 15, wherein the means for analyzing includes a means for reading temperature data associated with the execution entity to determine amount of heat generated by the execution entity.
18. A system for reconfiguring cache memory, comprising: lower-level cache memory comprising at least a plurality of cache lines, at least one of the cache lines divided into a plurality of sectors; an access bit associated with each of the plurality of sectors of the lower-level cache memory, the access bit representing whether data of a sector associated with the access bit was used; higher-level cache memory comprising at least a plurality of cache lines, at least one of the cache lines divided into a plurality of sectors; a granularity bit associated with each of the plurality of sectors of the higher-level cache memory, the granularity bit representing whether data of a sector associated with the granularity bit should be cached when one or more of other sectors in the same cache line are cached into the lower-level cache memory; a processor operable to use data of one or more sectors of the lower-level cache memory, the processor further operable to update one or more access bits respectively associated with the one or more sectors; and means operable to update one or more granularity bits.
19. The system of claim 18, wherein the means operable to update one or more granularity bits includes software.
20. The system of claim 18, wherein the means operable to update one or more granularity bits is operable to update one or more granularity bits based on an analysis performed on an execution entity.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with Government support under Contract No.: NBCH020056 (DARPA) awarded by the Defense Advanced Research Projects Agency. The Government has certain rights in this invention.
