A recent trend for packing more functions into a small form factor is so-called system-in-package (SiP) technology, which encloses a number of integrated circuit (IC) dies in a single package or module. The dies may be stacked vertically or placed side by side inside the package. They are interconnected internally by fine wires buried in the package, or joined by solder bumps using flip-chip technology.
Referring to
These SiPs can greatly extend cache capacity in a computer system. But with added levels of caches, memory management becomes more complicated.
Cache memories serve as temporary storage. When the processing unit 246 wishes to read or write to a location in the main memory 220, it first checks whether that memory location is in the Level 1 cache 242. This is accomplished by comparing the address of the memory location to all tags stored in the Level 1 cache 242 that might contain that address. If the processing unit 246 finds that the memory location is in the cache, then the data corresponding to the address is accessed directly from the Level 1 cache 242, and a cache hit has occurred. Otherwise, the data is not in the Level 1 cache 242, and a cache miss has occurred.
SiP extends computer cache capacity; however, with the aforementioned hierarchical memory management approach, the Level 2 cache 230 cannot be checked simultaneously with the Level 1 cache 242. The processing unit 246 can only check the Level 1 cache 242 directly. For data to be accessed, it must first be transferred to the lower-level memories in the hierarchy. This lowers memory management efficiency.
As such, what is desired is a memory management system and method that can simultaneously check multiple memories, either in the same level or in different levels, and hence directly access data stored in those memories.
A memory system for use in a system-in-package (SiP) device is disclosed. The memory system includes a first cache memory and a second cache memory, each of which includes a tag random access memory (RAM) corresponding to the data stored in that cache memory. The second cache memory may be of a different cache level from the first cache memory. In one arrangement, the first cache memory is on a first die of the SiP and the second cache memory is on a second die of the SiP. In another arrangement, the first cache memory is on the first die, and the second cache memory includes a first portion on the first die and a second portion on the second die. Both cache memories can be checked concurrently for data availability by a single physical address.
The construction and method of operation of the memory system, however, together with additional objectives and advantages thereof will be best understood from the following description of specific embodiments when read in connection with the accompanying drawings.
The drawings accompanying and forming part of this specification are included to depict certain aspects of the invention. A clearer conception of the invention, and of the components and operation of systems provided with the invention, will become more readily apparent by referring to the exemplary, and therefore non-limiting, embodiments illustrated in the drawings, wherein like reference numbers (if they occur in more than one view) designate the same elements. The invention may be better understood by reference to one or more of these drawings in combination with the description presented herein. It should be noted that the features illustrated in the drawings are not necessarily drawn to scale.
The present disclosure describes a memory management system and method that can simultaneously check multiple caches, either in the same level or in different levels, and hence directly access data stored in the caches.
When the physical address 302 is checked against the cache 308, the 9 index bits 304 are used to select a tag line 322 in the tag RAM 310. A block 330, which may be implemented as a multi-bit comparator circuit, first checks the attribute bits 324 of the selected tag line. The modified bit may indicate whether the line of data has been modified, and therefore whether the line must be updated when it is swapped back to a hard disk. Any match result may be ignored if the invalidate bit is set. After all the attribute bits are checked, the block 330 may compare the tag portion of the selected tag line with the tag bits 303 of the physical address 302. If the comparison produces a match, then the chunk of data that the physical address 302 addresses is stored in the cache 308 and can be fetched directly, i.e., a cache hit has occurred.
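The tag check described above can be sketched in software as follows. This is a minimal illustration only, not the disclosed circuit: the 9 index bits come from the description, while the 6-bit line offset, the `TagLine` class, and the helper names `split_address`, `fill`, and `lookup` are assumptions introduced for the example.

```python
from dataclasses import dataclass

OFFSET_BITS = 6              # assumption: 64-byte lines (not stated in the text)
INDEX_BITS = 9               # from the text: 9 index bits select a tag line
NUM_LINES = 1 << INDEX_BITS  # 512 tag lines


@dataclass
class TagLine:
    """One line of the tag RAM: a tag portion plus two attribute bits."""
    tag: int = 0
    invalid: bool = True     # "invalidate" attribute bit
    modified: bool = False   # "modified" attribute bit


def split_address(addr):
    """Return (tag, index) for a physical address."""
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index


def fill(tag_ram, addr):
    """Record that the line containing addr is now cached."""
    tag, index = split_address(addr)
    tag_ram[index] = TagLine(tag=tag, invalid=False)


def lookup(tag_ram, addr):
    """Return True on a cache hit, False on a miss."""
    tag, index = split_address(addr)
    line = tag_ram[index]
    if line.invalid:         # attribute check first: ignore any tag match
        return False
    return line.tag == tag   # then compare stored tag with address tag bits


tag_ram = [TagLine() for _ in range(NUM_LINES)]
fill(tag_ram, 0x12345678)
```

With these assumed widths, an address that differs from a cached one only in bit 15 or above selects the same tag line but carries a different tag, so it misses.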
In fact, the cache 308 illustrated in
Because the caches 410 and 420 use different bit fields of the physical address 402, the same physical address can select completely different lines of the tag RAMs, holding entirely different tags. In this way, the two caches 410 and 420 can be checked concurrently for data availability by the single physical address 402.
Because both the first and second caches 410 and 420 are two-way set associative, two pairs of hit signals, Hit0[1:2] and Hit1[1:2], may be produced between them and sent to a control logic circuit 430, which controls a multiplexer 440. If either of the signals Hit0[1] and Hit1[1] indicates a hit, the multiplexer 440 outputs a chunk of line[1] data from the first cache 410. Similarly, if either of the signals Hit0[2] and Hit1[2] indicates a hit, the multiplexer 440 outputs a chunk of line[2] data from the second cache 420.
Although only two-way set associativity is described here, one having skill in the art would recognize that any other degree of set associativity may work with the present invention.
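As an illustration of how one address can index two tag RAMs differently, the sketch below splits the same physical address under two layouts. The field widths are assumptions (the description does not give them for caches 410 and 420), and the names `FIRST_LAYOUT`, `SECOND_LAYOUT`, `index_and_tag`, and `check_both` are hypothetical. In hardware the two probes would run in parallel; here they are simply evaluated from the same input.

```python
# Hypothetical address layouts; these widths are assumptions for illustration.
FIRST_LAYOUT = {"offset_bits": 6, "index_bits": 9}    # smaller cache
SECOND_LAYOUT = {"offset_bits": 6, "index_bits": 12}  # larger cache


def index_and_tag(addr, layout):
    """Split a physical address into (index, tag) for one cache layout."""
    index = (addr >> layout["offset_bits"]) & ((1 << layout["index_bits"]) - 1)
    tag = addr >> (layout["offset_bits"] + layout["index_bits"])
    return index, tag


def check_both(addr, first_tags, second_tags):
    """Probe both tag RAMs with one address; return (hit_first, hit_second).

    Each tag RAM is modeled as a dict mapping a line index to its stored tag.
    """
    i1, t1 = index_and_tag(addr, FIRST_LAYOUT)
    i2, t2 = index_and_tag(addr, SECOND_LAYOUT)
    return first_tags.get(i1) == t1, second_tags.get(i2) == t2
```

For example, under these assumed layouts the address `0x3FFC0` selects line 511 with tag 7 in the first tag RAM, but line 4095 with tag 0 in the second.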
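The selection behavior of the control logic and multiplexer can be modeled as below. This is a behavioral sketch under stated assumptions: Hit0 and Hit1 are taken to be the hit signals of the two ways, each indexed by cache number (1 or 2), and giving the first cache priority when both report a hit is an arbitrary choice made here, since the description does not address that case. The function names are hypothetical.

```python
def control_logic(hit0, hit1):
    """Return which cache (1 or 2) should drive the output, or None on a miss.

    hit0 and hit1 model Hit0[1:2] and Hit1[1:2]: each maps a cache number
    (1 or 2) to a boolean hit signal.
    """
    if hit0[1] or hit1[1]:   # either signal for the first cache is a hit
        return 1
    if hit0[2] or hit1[2]:   # either signal for the second cache is a hit
        return 2
    return None


def multiplexer(select, line1_data, line2_data):
    """Forward line[1] or line[2] data according to the select value."""
    if select == 1:
        return line1_data
    if select == 2:
        return line2_data
    return None              # miss: nothing to output


# Usage: way 1 of the second cache reports a hit.
select = control_logic({1: False, 2: False}, {1: False, 2: True})
output = multiplexer(select, b"line1", b"line2")
```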
Referring to
Internal/external cache placement algorithms should be provided to prevent the caches 410 and 420 from both storing the same line. One embodiment uses random replacement: the physical address is randomized through a linear feedback shift register (LFSR) algorithm to generate a bit. The internal cache is selected when this bit is set, and the external cache is selected when it is not. Another embodiment uses a portion of the physical address to determine whether the internal or the external cache is accessed. For example, according to the physical address, the lowest 8 KB in a page are assigned to the internal cache, and the rest of the page is assigned to the external cache.
Since off-chip memories have longer interconnects to a mother die, a stacked cache may be slower than an on-die cache. Therefore, the stacked cache may have a longer access latency than the on-die cache.
The control logic of stacked caches is better kept on die, while the stacked memory only provides additional data storage. The tag for the stacked memory may or may not be on die, though it makes more sense to keep it on die because of the amount of logic involved in cache operations. With this concurrent accessing method, there is more freedom in how a SiP chip is built.
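Both placement policies can be sketched as follows. The description does not specify the LFSR width, taps, seeding, or the page size, so everything numeric here other than the 8 KB boundary is an assumption: a common 16-bit Fibonacci LFSR (taps 16, 14, 13, 11), a seed folded from the address, 16 shift steps, and a hypothetical 64 KB page. The function names are likewise illustrative.

```python
PAGE_SIZE = 64 * 1024        # assumption: page size is not given in the text
INTERNAL_REGION = 8 * 1024   # from the text: lowest 8 KB of a page


def lfsr_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def place_by_lfsr(addr, steps=16):
    """Pick a cache from a pseudo-random bit derived from the address."""
    # Fold the address into 16 bits; fall back to a nonzero seed because
    # an all-zero LFSR state never changes.
    state = (addr ^ (addr >> 16)) & 0xFFFF or 0xACE1
    for _ in range(steps):
        state = lfsr_step(state)
    return "internal" if state & 1 else "external"


def place_by_address(addr):
    """Pick a cache from the address's position within its page."""
    if addr % PAGE_SIZE < INTERNAL_REGION:
        return "internal"
    return "external"
```

Because both functions depend only on the address, a given line always maps to the same cache, which is one way to keep the two caches from holding the same line.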
Referring to
Referring to
Although the present disclosure uses cache memories as an embodiment of the present invention, one having skill in the art would appreciate that the present invention can be applied to memory systems where multiple modules exist and tags are used for keeping track of the data stored in the modules.
The above illustration provides many different embodiments for implementing different features of the invention. Specific embodiments of components and processes are described to help clarify the invention. These are, of course, merely embodiments and are not intended to limit the invention from that described in the claims.
Although the invention is illustrated and described herein as embodied in one or more specific examples, it is nevertheless not intended to be limited to the details shown, since various modifications and structural changes may be made therein without departing from the spirit of the invention and within the scope and range of equivalents of the claims. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the invention, as set forth in the following claims.
This is a continuation of U.S. Ser. No. 11/724,568 filed Mar. 15, 2007, the entire disclosure of which is hereby incorporated by reference. This invention relates generally to computer memory architectures, and, more particularly, to a system and method for extending memories in stacked chips with multicore microprocessors.
Number | Date | Country
---|---|---
20150363314 A1 | Dec 2015 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 11724568 | Mar 2007 | US
Child | 14835988 | | US