The field of the invention pertains generally to the computing sciences and, more specifically, to an Input/Output layout footprint for multiple 1Level Memory/2Level Memory configurations.
System memory, also referred to as main memory, is a pertinent part of a computing system as it holds the program code instructions and data of the software that is actively being executed by the computing system's Central Processing Units and may also be used to store other pertinent data that is actively being used or may soon be actively used by another component within the computing system. As such, system designers are highly motivated to improve system memory raw performance, cost performance and/or power consumption performance.
A better understanding of the present invention can be obtained from the following detailed description in conjunction with the following drawings, in which:
1.0 Multi-Level System Memory
One of the ways to improve system memory performance is to have a multi-level system memory.
Here, architectural closeness or farness may be implemented in various ways. In one instance, the near memory devices 113 are allocated their own unique system memory address space which is understood (e.g., by lower level software such as a basic input/output system (BIOS), firmware, a virtual machine monitor, one or more virtual machines, one or more operating system instances, etc.) to be higher priority than other system memory addresses.
Here, the lower access times of the near memory 113 causes higher priority and/or higher performance program code to be allocated system memory address space in near memory 113 so that the program code executes from the faster memory device. By contrast, the slower far memory 114 is allocated system memory address space that is not higher priority and therefore lesser priority and/or lower performance program code is allocated far memory system memory address space so that such program code executes from the slower memory devices.
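The address space approach described above can be illustrated with a minimal sketch. It assumes a simple bump allocator per region and a fallback to far memory when near memory is full; the `Region` and `place` names, the base addresses, the region sizes and the fallback policy are all hypothetical and are not taken from the embodiments.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A contiguous range of system memory address space (hypothetical model)."""
    name: str
    base: int      # first system memory address of the region
    size: int      # capacity in bytes
    used: int = 0  # bytes already handed out

    def allocate(self, nbytes):
        """Return the start address of a new allocation, or None if it does not fit."""
        if self.used + nbytes > self.size:
            return None
        addr = self.base + self.used
        self.used += nbytes
        return addr


def place(nbytes, high_priority, near, far):
    """Try near memory first for high priority requests; otherwise use far memory.

    Falling back to far memory when near memory is full is an assumption of
    this sketch, not a requirement stated in the text.
    """
    if high_priority:
        addr = near.allocate(nbytes)
        if addr is not None:
            return near.name, addr
    addr = far.allocate(nbytes)
    return (far.name, addr) if addr is not None else (None, None)


near = Region("near", base=0x0000_0000, size=4096)
far = Region("far", base=0x1000_0000, size=65536)

hot = place(1024, True, near, far)    # high priority code lands in near memory
cold = place(1024, False, near, far)  # lower priority code lands in far memory
spill = place(4096, True, near, far)  # too large for what is left of near memory
```

In this sketch the third request is high priority but no longer fits in near memory, so it is placed in far memory instead of being rejected.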
Another example of architectural closeness or farness is realized if near memory 113 is implemented as a cache. According to various embodiments, a smaller near memory 113 may be utilized as a cache for a larger far memory 114. Here, near memory 113 is used to keep, e.g., an additional copy of those data items stored in far memory 114 that are expected to be more frequently called upon by the computing system. With the near memory cache 113 having lower access times than the slower far memory 114 region, the multi-level system memory 112 will be observed as faster because the system will often read items or write to items that are being stored in the faster near memory cache 113.
In the case where near memory 113 acts as a “memory side” cache for far memory 114, the near memory 113 may be used slightly differently than a CPU level cache in that the entries (e.g., cache lines) of data that are more frequently written to and/or read from system memory 112 are present in near memory 113, whereas, in the case of a CPU level cache, entries of data that are more frequently written to and/or read by the processing cores/CPU 117 are present in a CPU level cache (computing systems often have multiple levels of CPU caches where cache lines that are most frequently accessed by the cores/CPU 117 are kept in the highest level CPU caches while lesser accessed cache lines are kept in lower level CPU caches). The distinction between a memory side cache and a CPU level cache is particularly noticeable if the computing system includes components other than the cores/CPU 117 that heavily use system memory (e.g., a graphics processor). Conceivably, some or all of near memory 113 could be used to implement a last level CPU cache. Different types of near memory caching architectures are possible (e.g., direct mapped, set associative, etc.).
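As a rough illustration of one of the caching architectures mentioned above, the following sketch models near memory as a direct mapped, write-back, memory side cache in front of far memory. The 64 byte line size, the write-allocate policy, the class name and the use of a dictionary to stand in for far memory are assumptions made for the example only.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)


class DirectMappedNearMemory:
    """Hypothetical direct mapped, write-back cache in front of a far memory."""

    def __init__(self, num_slots, far_memory):
        self.num_slots = num_slots
        self.far = far_memory              # dict: line number -> line contents
        self.tags = [None] * num_slots     # tag of the line held in each slot
        self.data = [None] * num_slots
        self.dirty = [False] * num_slots
        self.hits = 0
        self.misses = 0

    def _locate(self, addr):
        line = addr // LINE_SIZE
        return line % self.num_slots, line // self.num_slots, line

    def _fill(self, slot, tag, line):
        # Write a modified victim line back to far memory before replacing it.
        if self.tags[slot] is not None and self.dirty[slot]:
            victim_line = self.tags[slot] * self.num_slots + slot
            self.far[victim_line] = self.data[slot]
        self.tags[slot] = tag
        self.data[slot] = self.far.get(line)
        self.dirty[slot] = False

    def _access(self, addr):
        slot, tag, line = self._locate(addr)
        if self.tags[slot] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self._fill(slot, tag, line)
        return slot

    def read(self, addr):
        return self.data[self._access(addr)]

    def write(self, addr, value):
        slot = self._access(addr)
        self.data[slot] = value   # the whole line is replaced in this sketch
        self.dirty[slot] = True


far = {0: "A", 4: "B"}                 # lines 0 and 4 map to the same slot
cache = DirectMappedNearMemory(4, far)
first = cache.read(0)                  # miss: line 0 is fetched from far memory
second = cache.read(0)                 # hit: served from near memory
cache.write(0, "A2")                   # hit: line 0 becomes dirty
third = cache.read(4 * LINE_SIZE)      # miss: evicts line 0 and writes it back
```

Because the cache sits on the memory side, any requester of system memory (not only the CPU cores) would see the hits and misses modeled here.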
In yet other approaches, the multi-level system memory 112 embraces both techniques above in which some of the storage space of near memory 113 is allocated unique (e.g., higher priority) system memory address space whereas other storage space of near memory 113 is used to implement a memory side cache for far memory 114.
According to various embodiments, near memory 113 exhibits reduced access times by having a faster clock speed than far memory 114. Here, near memory 113 may be implemented with faster (e.g., lower access time), volatile system memory technology (e.g., high performance dynamic random access memory (DRAM) and/or static random access memory (SRAM)). By contrast, far memory 114 may be implemented with a volatile memory technology having a slower clock speed (e.g., a DRAM component that receives a slower clock) or, e.g., a non-volatile memory technology that may be slower (e.g., longer access time) than volatile/DRAM memory (or whatever technology is used for near memory).
For example, far memory 114 may be comprised of an emerging non-volatile random access memory technology such as, to name a few possibilities, a phase change based memory, a three dimensional crosspoint memory technology, or other byte addressable nonvolatile memory devices such as memory devices that use chalcogenide glass, multi-threshold level NAND flash memory, NOR flash memory, single or multi-level Phase Change Memory (PCM), a resistive memory, nanowire memory, ferroelectric transistor random access memory (FeTRAM), anti-ferroelectric memory, magnetoresistive random access memory (MRAM) memory that incorporates memristor technology, resistive memory including the metal oxide base, the oxygen vacancy base and the conductive bridge Random Access Memory (CB-RAM), or spin transfer torque (STT)-MRAM, a spintronic magnetic junction memory based device, a magnetic tunneling junction (MTJ) based device, a DW (Domain Wall) and SOT (Spin Orbit Transfer) based device, a thyristor based memory device, or a combination of any of the above, or other memory. The memory device may refer to the die itself and/or to a packaged memory product.
In some embodiments, three dimensional crosspoint memory may comprise a transistor-less stackable cross point architecture in which memory cells sit at the intersection of word lines and bit lines and are individually addressable and in which bit storage is based on a change in bulk resistance.
Such emerging non-volatile random access memory technologies typically have some combination of the following: 1) higher storage densities than DRAM (e.g., by being constructed in three-dimensional (3D) circuit structures (e.g., a crosspoint 3D circuit structure)); 2) lower power consumption densities than DRAM (e.g., because they do not need refreshing); and/or, 3) access latency that is slower than DRAM yet still faster than traditional non-volatile memory technologies such as FLASH. The latter characteristic in particular permits various emerging byte addressable write-in-place non-volatile memory technologies to be used in a main system memory role rather than a traditional mass storage role (which is the traditional architectural location of non-volatile storage). Being system memory devices rather than traditional mass storage devices, such emerging non-volatile random access memory devices are also byte addressable (e.g., a cache line of data can be updated/written to with just a byte of information) instead of being limited to block based accesses. In various embodiments, in the case where far memory 114 is non-volatile, battery backed up DRAM may also be used (but may have, e.g., a slower clock than near memory DRAM devices).
Regardless of whether far memory 114 is composed of a volatile or non-volatile memory technology, in various embodiments far memory 114 acts as a true system memory in that it supports finer grained data accesses (e.g., cache lines) rather than larger based accesses associated with traditional, non-volatile mass storage (e.g., solid state drive (SSD), hard disk drive (HDD)), and/or, otherwise acts as an (e.g., byte) addressable memory that the program code being executed by processor(s) of the CPU operate out of.
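The difference between fine grained and block based accesses can be made concrete with a small calculation. The sketch below assumes a 64 byte cache line and a 4096 byte storage block, and assumes the update does not straddle a granule boundary; these sizes and the function name are illustrative only.

```python
def bytes_transferred(update_bytes, granule_bytes):
    """Bytes moved when an update must be performed in whole granules.

    Assumes the update is aligned so that it touches the minimum number
    of granules.
    """
    granules = -(-update_bytes // granule_bytes)  # ceiling division
    return granules * granule_bytes


CACHE_LINE = 64   # bytes, assumed system memory access granularity
BLOCK = 4096      # bytes, assumed mass storage block size

line_traffic = bytes_transferred(8, CACHE_LINE)  # 8 byte update via cache lines
block_traffic = bytes_transferred(8, BLOCK)      # same update via block access
```

Under these assumed sizes the same 8 byte update moves 64 bytes with cache line accesses and 4096 bytes with block accesses.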
Far memory devices may be coupled to their own unique memory channel that emanates from the main memory controller 116, or, both near memory devices and far memory devices may be coupled to a same memory channel that emanates from the main memory controller 116. In either of these approaches, the memory channel may be an industry standard system memory channel such as a double data rate (DDR) memory channel published by an industry standards organization such as the Joint Electron Device Engineering Council (JEDEC), e.g., JESD79F for DDR SDRAM, JESD79-2F for DDR2 SDRAM, JESD79-3F for DDR3 SDRAM, JESD79-4A for DDR4 SDRAM, JESD209 for Low Power DDR (LPDDR), JESD209-2 for LPDDR2, JESD209-3 for LPDDR3, and JESD209-4 for LPDDR4 (these standards are available at www.jedec.org). Such standards (and similar standards) may be referred to as DDR-based standards and communication interfaces of the storage devices that implement such standards may be referred to as DDR-based interfaces. The near memory and/or far memory devices may be integrated in a same multi-chip module as the processing cores/CPU 117 and/or main memory controller 116, or, may be integrated outside such a module or other package that contains the cores/CPU 117 and/or main memory controller 116.
The far memory devices 114 may be coupled directly to a far memory controller (not shown in
The host side memory controller may be able to communicate to multiple far memory controllers and corresponding far memory devices as memory expansion “plug-ins”. In various embodiments, the memory expansion plug-in solutions may be implemented with point-to-point links (e.g., one PCIe link per plug-in) or multi-drop bus (e.g., industry standard DDR memory channel). Non expanded far memory (provided as part of the basic original system) may or may not be implemented with point-to-point links.
The same types of non-volatile memory devices that are used to implement far memory 114 may also be used to implement block accessible mass storage, e.g., as a form of solid state drive (SSD) within the larger computing system.
2.0 I/O Layout Footprint for Multiple 1LM/2LM Configurations
Although there exist various motivations for a multi-tiered system memory, as the technology is currently emerging, there remains substantial expected demand for traditional single level main memory (1LM) systems. As such, system designers may desire to develop underlying hardware platforms that support either 1LM or 2LM configurations. For example, a computing system manufacturer or computing system component manufacturer may desire to have a common 1LM/2LM platform that can be retrofitted to implement either a 1LM or 2LM system. Here, for instance, customers may order 1LM or 2LM systems and the manufacturer need only populate 1LM components or 2LM components on the same common platform to serve the customer orders.
An upper package 302 is mounted on the bottom package 301. The upper package 302 is composed of an upper substrate 302_1 and semiconductor chip 302_2 that is hermetically sealed in an upper encapsulant 302_3. Typically, pads on the upper surface of the upper substrate 302_1 align and are soldered to pads or micro-balls on the underside of the upper semiconductor chip 302_2 to form electrical I/O connections between the upper substrate 302_1 and the upper semiconductor chip 302_2. As such, both the upper semiconductor chip 302_2 and the upper encapsulant 302_3 are fixed to the upper substrate 302_1.
The substrates for both the upper and lower packages 301, 302 are typically composed of PC board material (e.g., FR4, ceramic, phenolic, etc.). The encapsulants for both packages 301, 302 are typically composed of a plastic or epoxy.
I/O solder balls 303 on the underside of the upper substrate 302_1 bond to pads that are formed on the top side of the lower substrate 301_1. Solder balls 304 to implement I/O connections to a lower system substrate 305 for both the upper and lower packages 301, 302 are mounted on the underside of the lower substrate 301_1. The system substrate 305 may correspond, e.g., to a computing system motherboard, riser plane for a CPU and/or system memory complex, substrate of a multi-chip module, etc. The system substrate 305 is typically composed of PC board material.
The wiring within the lower substrate 301_1 of the POP structure 300 is designed to carry signals not only between the system substrate 305 and the lower semiconductor die 301_2 but also between the system substrate 305 and the upper semiconductor die 302_2. Importantly, the wiring is viewed from the perspective that the entire POP structure 300 is a single component. As such, it is common to find I/O connections amongst the lower solder balls 304 for both the upper and lower packages 301, 302 to be uniformly dispersed on the underside of the lower substrate 301_1.
That is, for example, lower I/O solder balls 304 that connect to the system substrate 305 for establishing an electrical connection to the upper die 302_2 may reside directly beneath the lower die 301_2 and/or, lower I/O solder balls 304 that connect to the system substrate 305 for establishing an electrical connection to the lower die 301_2 may reside outside the periphery of the lower die 301_2 (e.g., around the outer periphery of the lower substrate 301_1). For example, lower solder ball 304_1 may be an I/O connection for the upper die 302_2 and lower solder ball 304_2 may be an I/O connection for the lower die 301_2.
By contrast,
For ease of drawing
The system substrate 405 may correspond, e.g., to a computing system motherboard, riser plane for a CPU and/or system memory complex, substrate of a multi-chip module, etc. The system substrate 405 may also correspond to the card/board material for a dual in line memory module (DIMM) that, e.g., plugs into a computing system main memory channel. The system substrate 405 is composed of PC board material in various embodiments.
According to the particular embodiment of
Regardless of the particular physical layout approach that is undertaken, it is pertinent to point out that lower packages and/or upper packages can be fitted to the layout of
Referring to
However, the overall layout and design of
Here, in the case of
The same principles as described above for the configuration of
Systems having different ratios of near memory space to far memory space can be implemented by populating more or fewer of the available memory channels of the system substrate with DRAM devices or non-volatile memory devices. For example, a system substrate having five unpopulated memory channels can populate one of the channels with DRAM devices and four of the channels with non-volatile memory devices, whereas another instance of the same system substrate can populate two of the channels with DRAM devices and three of the channels with non-volatile memory devices to effect a different ratio of near memory space to far memory space.
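The five channel example above can be worked through numerically. The following sketch assumes, purely for illustration, 16 GB of DRAM per populated DRAM channel and 128 GB of non-volatile memory per populated non-volatile channel; the per-channel capacities and the function name are hypothetical and do not come from the embodiments.

```python
from fractions import Fraction

DRAM_GB_PER_CHANNEL = 16   # assumed capacity of a DRAM-populated channel
NVM_GB_PER_CHANNEL = 128   # assumed capacity of a non-volatile-populated channel


def near_to_far_ratio(dram_channels, nvm_channels):
    """Ratio of near memory space to far memory space for a channel population.

    Returns None when no channel is populated with non-volatile memory,
    which this sketch treats as a 1LM configuration.
    """
    near_gb = dram_channels * DRAM_GB_PER_CHANNEL
    far_gb = nvm_channels * NVM_GB_PER_CHANNEL
    if far_gb == 0:
        return None
    return Fraction(near_gb, far_gb)


config_a = near_to_far_ratio(1, 4)  # one DRAM channel, four non-volatile channels
config_b = near_to_far_ratio(2, 3)  # two DRAM channels, three non-volatile channels
config_1lm = near_to_far_ratio(5, 0)  # all five channels populated with DRAM
```

With these assumed capacities the first population yields a near to far ratio of 1:32 and the second yields 1:12, both on the same system substrate.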
In yet another approach, such different ratios may be effected by populating more or less DRAM devices on a same memory channel. For example, a first footprint on a memory channel may be populated with a full combined POP structure as depicted in
In some implementations, for thermal constraint reasons, fewer stacked non-volatile memory chips and/or stacked DRAM memory chips may be achievable in the upper and/or lower package structures, respectively, of the POP structure than in either package structure alone. Nevertheless, the I/O layouts for the system substrate and package substrates need not be affected. Also, although in embodiments above the DRAM devices were integrated within the lower package structure and the non-volatile memory devices were integrated within the upper package structure, in alternative implementations the memory technologies may be swapped such that non-volatile memory devices reside in the lower package structure and DRAM devices reside in the upper package structure.
As observed in
An applications processor or multi-core processor 550 may include one or more general purpose processing cores 515 within its CPU 501, one or more graphics processing units 516, a memory management function 517 (e.g., a memory controller) and an I/O control function 518. The general purpose processing cores 515 typically execute the operating system and application software of the computing system. The graphics processing units 516 typically execute graphics intensive functions to, e.g., generate graphics information that is presented on the display 503. The memory management function 517, which may be referred to as a main memory controller or system memory controller, interfaces with the system memory 502. The system memory 502 may be a multi-level system memory which, further, uses a common system substrate footprint and corresponding POP package structure as described at length above to implement various 2LM configurations and even 1LM configurations.
The touchscreen display 503, the communication interfaces 504-507, the GPS interface 508, the sensors 509, the camera 510, and the speaker/microphone codec 513, 514 can each be viewed as a form of I/O (input and/or output) relative to the overall computing system including, where appropriate, an integrated peripheral device as well (e.g., the camera 510). Depending on implementation, various ones of these I/O components may be integrated on the applications processor/multi-core processor 550 or may be located off the die or outside the package of the applications processor/multi-core processor 550. Non-volatile storage 520 may hold the BIOS and/or firmware of the computing system.
One or more various signal wires within the computing system, e.g., a data or address wire of a memory bus that couples the main memory controller to the system memory, may include a receiver that is implemented as a decision feedback equalizer circuit that internally compensates for changes in electron mobility as described above.
Embodiments of the invention may include various processes as set forth above. The processes may be embodied in machine-executable instructions. The instructions can be used to cause a general-purpose or special-purpose processor to perform certain processes. Alternatively, these processes may be performed by specific hardware components that contain hardwired logic for performing the processes, or by any combination of programmed computer components and custom hardware components.
Elements of the present invention may also be provided as a machine-readable medium for storing the machine-executable instructions. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, and magneto-optical disks, FLASH memory, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media or other type of media/machine-readable medium suitable for storing electronic instructions. For example, the present invention may be downloaded as a computer program which may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
Number | Date | Country |
---|---|---|
20190006340 A1 | Jan 2019 | US |