The novel features believed characteristic of the illustrative embodiments are set forth in the appended claims. The illustrative embodiments themselves, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of the illustrative embodiments when read in conjunction with the accompanying drawings, wherein:
With reference now to
Graphics processor 110 may be connected to the MCH through an accelerated graphics port (AGP), for example. Processor unit 106 contains a set of one or more processors. When more than one processor is present, these processors may be separate processors in separate packages. Alternatively, the processors may be multiple cores in a package. Further, the processors may be multiple multi-core units.
An example of this type of processor is a Cell Broadband Engine™ processor, which is a heterogeneous processor. This process has a processor architecture that is directed toward distributed processing. This structure enables implementation of a wide range of single or multiple processor and memory configurations, in order to optimally address many different systems and application requirements. This type of processor can consist of a single chip, a multi-chip module (or modules), or multiple single-chip modules on a motherboard or other second-level package, depending on the technology used and the cost/performance characteristics of the intended implementation. A Cell Broadband Engine™ has a PowerPC Processor Element (PPE) and a number of Synergistic Processor Units (SPU). The PPE is a general purpose processing unit that can perform system management functions, like addressing memory-protection tables. SPUs are less complex computation units that do not have the system management functions. Instead, the SPUs provide computational processing to applications and are managed by the PPE.
In the depicted example, local area network (LAN) adapter 112 connects to south bridge and I/O controller hub 104 and audio adapter 116, keyboard and mouse adapter 120, modem 122, read only memory (ROM) 124, hard disk drive (HDD) 126, CD-ROM drive 130, universal serial bus (USB) ports and other communications ports 132, and PCI/PCIe devices 134 connect to south bridge and I/O controller hub 104 through bus 138 and bus 140. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 124 may be, for example, a flash binary input/output system (BIOS). Hard disk drive 126 and CD-ROM drive 130 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. A super I/O (SIO) device 136 may be connected to south bridge and I/O controller hub 104.
An operating system runs on processor unit 106 and coordinates and provides control of various components within data processing system 100 in
Instructions for the operating system, the object-oriented programming system, and applications or programs are located on storage devices, such as hard disk drive 126, and may be loaded into main memory 108 for execution by processor unit 106. The processes of the illustrative embodiments are performed by processor unit 106 using computer implemented instructions, which may be located in a memory such as, for example, main memory 108, read only memory 124, or in one or more peripheral devices.
Those of ordinary skill in the art will appreciate that the hardware may vary depending on the implementation. Other internal hardware or peripheral devices, such as flash memory, equivalent non-volatile memory, or optical disk drives and the like, may be used in addition to or in place of the hardware. Also, the processes of the illustrative embodiments may be applied to a multiprocessor data processing system.
In some illustrative examples, data processing system 100 may be a personal digital assistant (PDA), which is configured with flash memory to provide non-volatile memory for storing operating system files and/or user-generated data. A bus system may be comprised of one or more buses, such as a system bus, an I/O bus and a PCI bus. Of course the bus system may be implemented using any type of communications fabric or architecture that provides for a transfer of data between different components or devices attached to the fabric or architecture. A communications unit may include one or more devices used to transmit and receive data, such as a modem or a network adapter. A memory may be, for example, main memory 108 or a cache such as found in north bridge and memory controller hub 102. A processing unit may include one or more processors or CPUs. The depicted examples in
The illustrative embodiments provide a computer implemented method, apparatus, and computer usable program code for managing data in a cache. A priority level is established for critical data structures within an application. The priority level prolongs the time that the data of the critical data structures remain in the cache. Critical data structures are data structures that are critical for the performance of the application. Critical data structures may include data that is frequently used or data that needs to be accessed efficiently at any given time. By keeping the data from the critical data structures in cache for prolonged amounts of time, the application is able to achieve optimal performance.
Turning now to
In these examples, fetch unit 202 fetches instructions from memory subsystem 214 or main memory unit 230 to speed up execution of a program. Fetch unit 202 retrieves an instruction from memory before that instruction is needed to avoid the processor having to wait for the memory, such as memory subsystem 214 or main memory unit 230 to answer a request for the instruction. Decode unit 204 decodes an instruction for execution. In other words, decode unit 204 identifies the command to be performed, as well as operands on which the command is to be applied. Issue unit 206 sends the decoded instruction to a unit for execution such as, for example, execution unit 210.
Execution unit 210 is an example of a unit that executes the instruction received from issue unit 206. Execution unit 210 performs operations and calculations called for by the instruction. For example, execution unit 210 may include internal units, such as a floating point unit, an arithmetic logic unit (ALU), or some other unit. Completion unit 212 validates the operations in the program order for instructions that may be executed out of order by execution unit 210. Branch unit 208 handles branches in instructions.
Cache array 216 contains sets for data needed by processor system 200. These sets are also called ways and are also like columns in the array. In these examples, cache array 216 is an L2 cache. LRU array 218 holds bits for an N-way set associative cache. Set associative cache is a cache that has different data in a secondary memory that can map to the same cache entry. In an 8-way set associative cache, there are 8 different ways or sets per entry. Therefore, there can be 8 different data that map to the same entry. Each bit in LRU array 218 represents one interior node of a binary tree with leaves that represent the least recently used information for each way or set for the corresponding cache entry.
The illustrative embodiments are used to prolong the amount of time critical data structures and the corresponding data is retained in cache array 216. LRU control 220 controls aspects of the illustrative embodiments used to manage the data stored in cache array 216. Critical structure logic 226 contains the cache priority table which lists the critical data structures address, size, and starting priority. LRU array 218 includes a priority level or value, which starts at zero for non-critical data structures and uses the starting value from the cache priority table for critical data structures. For the addresses identified as critical by an application, LRU control 220 ages the data more slowly than normal. As a result, the critical data remains in cache array 216 longer than if the age of the data was increased at the normal rate.
In the illustrative embodiments, the information used to age critical data structures, such as address, size, and starting priority level of critical structures, may be specified by a cache priority subroutine and cache priority table as described in
Directory array 224 stores the cache coherence information, real address, and valid bit for the data in the corresponding cache entry in cache array 216. This array also has the same set-associative structure as cache array 216. For example, in an 8-way set associative cache, directory array 224 also has 8 ways. A way is also referred to as a set. This directory has a one-to-one match. Each time cache array 216 is accessed, directory array 224 will be accessed at the same time to determine if a cache hit or miss occurs and if the entry is valid.
Main memory unit 230 contains instructions and data that may be fetched or retrieved by processor system 200 for execution. In a case in which the data has not been fetched to cache array 216, bus control unit 232 performs as the traffic controller for the bus to arbiter requests and responses from the devices attached to the bus. For example, execution unit 210 may send a request and an address to memory subsystem 214 when a miss occurs in a L1 data cache (not shown) in execution unit 210. As a result, execution unit 210 causes L2 load and store queue control 222 to access LRU array 218, directory array 224 and cache array 216. The data in directory array 224 can be brought in by a cache miss in the L1 cache. Directory array 224 returns the data to indicate whether the data requested in the miss in the L1 cache is located in cache array 216, which serves as an L2 cache in this example. The data returned from directory array 224 includes a hit or miss; the data in the way of the cache entry is valid or invalid; and what memory coherence state of the entry, such as share, exclusive, modify. LRU array 218 returns LRU data to LRU control 220.
If a request for data results in a hit in directory array 224, LRU control 220 updates the LRU data stored in LRU array 218. In this case, cache array 216 contains the data and has no other information. Directory array 224 can be viewed as the array holding all other information in the cache array, such as address, validity, and cache coherence state. When there is an L1 cache miss request with address to access the directory and cache array, if the address matches with the address that is stored in the corresponding entry in directory array 224, that means a hit is present in the L2 cache array. Otherwise, a miss occurs. This update to the LRU data is the most and least recently used set in the L2 cache, cache array 216. LRU control 220 updates the LRU data from a binary tree scheme, described herein, by writing back to LRU array 218. Cache array 216 returns data to execution unit 210 in response to the hit on directory array 224.
A miss in directory array 224 results in execution unit 210 placing the request into L2 load and store control 222. Requests remain in this component until L2 load and store queue control 222 retrieves data from host bus 228. In response to this miss, LRU control 220 updates the LRU data from the binary tree scheme by writing back to LRU array 218. This update of LRU data contains the most and least recently used cache set in cache array 216. Once miss data returns to the L2 cache from host bus 228, LRU control 220 also forwards this data back to the L1 cache and execution unit 210.
In another illustrative embodiment, LRU control 220 and critical structure logic 226 may be implemented in a single LRU control element.
Turning to
Application programming interface (API) 306 allows the user of the system, an individual, or a software routine to invoke system capabilities using a standard consistent interface without concern for how the particular functionality is implemented. Network access software 308 represents any software available for allowing the system to access a network. This access may be to a network, such as a local area network (LAN), wide area network (WAN), or the Internet. With the Internet, this software may include programs, such as Web browsers.
Application software 310 represents any number of software applications designed to react to data through the communications port to provide the desired functionality the user seeks. Applications at this level may include those necessary to handle data, video, graphics, photos, or text which can be accessed by users of the Internet. The mechanism of the illustrative embodiments may be implemented within communication software 304 in these examples.
Application software 310 includes data structures 312. Some of data structures 312 are critical data structures 314. Critical data structures 314 are data structures that are critical for the performance of application software 310. As a result, critical data structures 314 need to stay in a cache, such as cache array 216 of
In one illustrative embodiment, a software application developer may specify critical data structures 314 within data structures 312 of application software 310. Information regarding the address, size, and priority level or critical rating of each of critical data structures 314 is stored in cache priority table 316. Application software 310 also includes a code or call to initiate cache priority subroutine 318 when application software 310 is started so that the values of cache priority table 316 may be stored in a hardware cache priority table. The hardware cache priority table may be part of LRU control 220 or critical structure logic 226 of
Operating system 302 includes cache priority subroutine 320 for calling the new cache priority hardware instruction. Syntax for cache priority subroutines 318 and 320 may be specified by:
The parameters of cache priority subroutines 318 and 320 may include address, size, and starting_priority, information which may be stored in cache priority table 316, which is further described in
Data structure address 402 is the starting address of a critical data structure, such as critical data structures 314 of
In one example, if the data is assigned starting priority 406 of two, the cache would age the data at half the rate as non-critical data. If the data were given a starting priority 406 or critical rating of ten, the cache would age the data at 1/10th the rate of non-critical data. If the data is assigned a starting priority 406 of one, the data may be aged like all other data in the cache without any preferential aging treatment. Correspondingly, cache priority level of zero may be used to indicate that the cache will be aged according to normal or default settings.
Next, the process determines whether any of the data is in an application cache priority table (step 504). Step 504 may be implemented by application software 310 using cache priority table 316, both of
If any of the data is in the application cache priority table, the process sets i=1 (step 508). Data in the application cache priority table indicates that the data has a cache priority level and is designated to be aged at a rate different from data that is not in the application priority table. Next, the process calls a “set_cache_priority” subroutine for row i (step 510). The cache priority subroutine sets the cache priority level for all critical data structures, such as critical data structures 314 of
Step 510 may be implemented by application software 310 using cache priority subroutine 318, both of
The illustrative embodiments provide a computer implemented method, apparatus, and computer usable program code for managing data in a cache. A priority level is established for critical data structures within an application. If the data has a priority level the priority level is used to prolong the time that the data of the critical data structures remain in the cache. By keeping the data from the critical data structures in cache for prolonged amounts of time, the application is able to achieve optimal performance by having more lasting access to critical data.
The illustrative embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. The illustrative embodiments are implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
Furthermore, the illustrative embodiments can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
The description of the illustrative embodiments have been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the illustrative embodiments in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the illustrative embodiments, the practical application, and to enable others of ordinary skill in the art to understand the illustrative embodiments for various embodiments with various modifications as are suited to the particular use contemplated.