Embodiments of the invention generally relate to stacked memory dies having volatile and non-volatile based memory dies, and chip packages containing the same.
The memory wall (i.e., bandwidth limitations) has been referred to as one of the key limiters in pushing the bounds of computation in modern systems. High bandwidth memory (HBM) and other stacked dynamic random-access memory (DRAM) memories have been proposed/enabled to alleviate off-chip memory access latency as well as increase memory density. In addition to the traditional DRAM roadmap, several other memories that have not yet reached maturity for large scale manufacturing are being explored, e.g., ferro-electric random-access memory (FeRAM), magneto resistive random-access memory (MRAM), phase-change memory (PCM), etc. During this technology enablement phase, it is crucial to examine not only how new technologies would “replace” the classic roadmap, but also whether they can aid, complement, or address the limitations of existing DRAM without adding much complexity, and how they can be used together with existing technologies to enhance specific properties to achieve superior system on a chip (SoC) performance and power efficiency.
Memory wall problems are currently being tackled by the industry with HBM-like solutions. Stacked DRAM and HBM as described in the JEDEC Solid State Technology Association (i.e., JEDEC) specifications address memory bandwidth and latency issues by replacing long off-chip connections with stacked (e.g., connected through silicon interposers) memory closer to the logic die. However, there exist yield challenges and overhead due to the non-linear power increase with memory capacity increase. Additionally, 3D stacking on logic brings new thermal challenges that can negatively impact retention in DRAM. On the other hand, other non-volatile memory (NVM) technologies like logic-compatible FeRAM do not have refresh requirements and can tolerate high temperature, but suffer from limited scalability to large capacities and from wearout, while static random-access memory (SRAM) is a faster but leaky memory system. Current solutions do not fully utilize hybrid memory systems as disclosed by the inventors herein to take advantage of the unique properties of each memory type, and hence do not maximize performance/power-efficiency potential.
Non-volatile main memories such as FeRAMs and MRAMs, and volatile memories such as DRAMs (including HBM and other stacked variants of DRAM), are being considered and traded off for achieving higher memory density, higher performance, and lower power.
DRAMs have been the most popular off-chip memory; however, even Double Data Rate 5 Synchronous DRAM (DDR5) has certain Performance-Power-Area (PPA) limitations from having to go off-chip to access data. The typical DRAM bitcell consists of a one transistor and one capacitor (1T-1C) structure where the capacitor is formed by a dielectric layer sandwiched in between conductor plates. System interprocess communication (IPC) is often limited by DRAM bandwidth and latency, especially in memory-heavy workloads. HBM has been introduced to provide increased bandwidth and memory density, allowing up to 8-12 layers of DRAM dies to be stacked on top of each other with an optional logic/memory interface die. This memory stack can either be connected to the CPU/GPU through silicon interposers (
FeRAM is like 1T-1C DRAM, except that the capacitor is made of a ferroelectric material rather than the (linear) dielectric used in DRAM. Bits ‘0’ and ‘1’ are written as electric polarization orientations of the ferroelectric material in the capacitor. The benefit of this technology is refresh-free storage, which has the potential to offer more density and performance than DRAM.
MRAM on the other hand has a one transistor and one resistor (1T-1R) bitcell. Unlike DRAM and FeRAM, MRAM does not have a destructive read. However, MRAM is less reliable compared to FeRAM and has lower endurance and retention.
Typically, the memory technology is developed and “optimized” as an independent macro or for specific applications like deep neural networks (DNN), in the HBM case. Some advancements, such as graphics double data rate (GDDR) versus double data rate (DDR), have been developed to support high-bandwidth memory for graphics applications. More fine-grained optimizations of memory technology with logic technology and architecture are not deeply explored, and there is much to do to achieve superior performance and lower power products. Non-linear power increase and diminishing improvement in performance and memory density from generation to generation require more design and co-optimization to alleviate the memory bottleneck.
Disclosed herein are stacked memory dies that utilize a mix of high and low operational temperature memory and non-volatile based memory dies, and chip packages containing the same. High temperature memory dies, such as those using non-volatile memory (NVM) technologies, are in a memory stack with low temperature memory dies, such as those having volatile memory technologies. In some cases, the high temperature memory technologies could be used together, in some cases, on the same IC die as logic circuitry. In one example, a memory stack is provided that includes a first memory IC die having high temperature memory circuitry, such as non-volatile memory, stacked below a second memory IC die. The second memory IC die has low temperature memory circuitry, such as volatile memory circuitry.
In another example, a memory stack is provided that includes a first memory IC die stacked on a second memory IC die. The first memory IC die includes memory circuitry that requires more frequent refresh rates as compared to the second memory IC die. In some other examples, the first memory IC die includes memory circuitry operational at temperatures above 110 degrees Celsius without increased refresh rates as compared to operation at 95 degrees Celsius. The second memory IC die includes memory circuitry requiring increased refresh rates at temperatures above 110 degrees Celsius as compared to operation at 95 degrees Celsius.
In another example, a memory stack is provided that includes a first memory IC die stacked on a second memory IC die. The first memory IC die includes ferro-electric random-access memory (FeRAM). The second memory IC die includes dynamic random-access memory (DRAM) circuitry.
In yet another example, a chip package having a memory stack mounted to a package substrate is provided. The memory stack includes a plurality of first memory IC dies stacked on a second memory IC die. The second memory IC die includes ferro-electric random-access memory (FeRAM) circuitry and optionally controller circuitry. The second memory IC die is stacked on the package substrate. The plurality of first memory IC dies includes DRAM circuitry.
Also disclosed herein are non-volatile memory (NVM) technologies, that may be utilized in a memory stack with volatile memory technologies. In some cases, the NVM technologies could be used together, in some cases on the same IC die as logic circuitry. Exploitation of specific properties of each of the technologies, in the stacked memory subsystem, can beneficially result in differentiated SoC performance.
In one example, a memory stack is provided that includes a first memory IC die having non-volatile memory (NVM) circuitry stacked below a second memory IC die. The second memory IC die has volatile memory circuitry.
In another example of a memory stack, one IC memory die of the memory stack includes ferro-electric random-access memory (FeRAM) or static random-access memory (SRAM) circuitry, while another IC memory die of the memory stack includes volatile memory circuitry.
In another example of a memory stack, one IC memory die of the memory stack includes ferro-electric random-access memory (FeRAM) or static random-access memory (SRAM) circuitry, while another IC memory die of the memory stack includes dynamic random-access memory (DRAM) circuitry.
In another example of a memory stack, first processing in memory (PIM) circuitry is disposed in a second memory IC die of the memory stack while a second PIM circuitry is disposed in the third memory IC die of the memory stack.
In another example of a memory stack, a first buffer IC die is disposed between one pair of memory IC dies, and a second buffer IC die is disposed between another pair of memory IC dies.
In another example, a memory stack includes a first memory IC die comprising first ferro-electric random-access memory (FeRAM) circuitry and first processing in memory (PIM) circuitry, a second memory IC die stacked on the first memory IC die, and a third memory IC die stacked on the first memory IC die. The second memory IC die includes second FeRAM circuitry and second PIM circuitry. The third memory IC die includes third FeRAM circuitry and third PIM circuitry.
In yet another example, a chip package is provided that includes a hybrid memory stack mounted on a substrate. The hybrid memory stack includes both volatile and non-volatile memory IC dies.
The disclosure herein addresses specific challenges of stacked DRAM subsystems with hybrid memory 3D organization and logic codesign. The disclosed technology defines various methods, systems and devices to design compute- and application-aware advanced memory-based systems. Generally, disclosed are memory die stacks that utilize a mix of high and low operational temperature memory dies such that the high temperature die may function as a temperature buffer with adjacent heat generating logic dies. In particular, disclosed are memory die stacks that utilize a mix of volatile and non-volatile based memory dies, one example of which is a stack of DRAM and FeRAM based memory dies.
The memory stack 102 generally includes at least one low temperature memory integrated circuit (LTMIC) die 104 stacked with at least one high temperature memory IC (HTMIC) die 106. The space shown in
In still other examples, the HTMIC die 106 has an operational temperature that is greater than the operational temperature of the LTMIC die 104. Defined differently, the HTMIC die 106 is a memory die that can operate at temperatures above 110 degrees Celsius without having to increase the refresh rate (as compared to operation at 95 degrees Celsius). An example of a HTMIC die 106 is a ferroelectric random-access memory (FeRAM) die. Other examples of the HTMIC die 106 include non-volatile memory dies such as magnetoresistive random-access memory (MRAM), phase-change memory (PCM), flash memory, and resistive random-access memory (RRAM), among others.
The memory stack 102 may optionally include at least one controller IC die 120 stacked with the LTMIC die 104 and the HTMIC die 106. The IC dies 104, 106, 120 may be electrically and mechanically connected by solder balls and/or hybrid bonding techniques, such that the functional circuitries within the IC dies 104, 106, 120 can communicate with each other and/or transmit data signals, power and/or ground therethrough.
The functional circuitries within the IC memory dies 104, 106 are arranged into multiple memory banks. Each bank has multiple rows and each row has multiple columns. Residing at each unique memory location within a bank is a memory cell. The memory cell may be addressed using its unique identifying row and column location within a particular bank of the memory dies 104, 106.
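The bank, row, and column organization described above can be illustrated with a minimal address-decoding sketch. The geometry constants and the names `CellAddress`, `decode`, and `encode` are hypothetical and chosen only for illustration; an actual memory die may use a different geometry and address mapping.

```python
from dataclasses import dataclass

# Hypothetical geometry, assumed only for this illustration.
NUM_BANKS = 16
ROWS_PER_BANK = 1024
COLUMNS_PER_ROW = 64


@dataclass(frozen=True)
class CellAddress:
    """Location of one memory cell: bank, then row, then column."""
    bank: int
    row: int
    column: int


def decode(linear_address: int) -> CellAddress:
    """Split a flat cell index into (bank, row, column)."""
    cells_per_bank = ROWS_PER_BANK * COLUMNS_PER_ROW
    if not 0 <= linear_address < NUM_BANKS * cells_per_bank:
        raise ValueError("address out of range")
    bank, offset = divmod(linear_address, cells_per_bank)
    row, column = divmod(offset, COLUMNS_PER_ROW)
    return CellAddress(bank, row, column)


def encode(addr: CellAddress) -> int:
    """Inverse of decode: rebuild the flat cell index."""
    return (addr.bank * ROWS_PER_BANK + addr.row) * COLUMNS_PER_ROW + addr.column
```

Under this assumed mapping, consecutive flat addresses walk along the columns of one row before moving to the next row, and a bank is exhausted before the next bank is used.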
The functional circuitries within the IC dies 104, 106, 120 are coupled to the functional circuitry of the compute/processor IC die 108 via routings 112 formed in the interposer 110. The routings 112 of the interposer 110 are connected to the functional circuitries within the IC dies 104, 106, 120 via solder connections 118. The routings 112 of the interposer 110 also connect the functional circuitries of the IC dies 108, 120 to routing 122 formed in the package substrate 114 via the solder connections 118. Solder balls 116 are utilized to connect the routings 122 of the package substrate 114 with routing 124 formed in the PCB 136.
In other embodiments where an interposer is not present, the chip stack 102 and IC die 108 may be mounted directly to the package substrate 114.
The IC die 120 is generally a heat generating device. That is, the IC die 120 generates heat when in use. As the performance of the LTMIC die 104 may be diminished due to the heat generated by the IC die 120, performance of the chip package 100 is enhanced by separating the LTMIC die 104 from the heat generating IC die 120 by one or more HTMIC dies 106. Since the HTMIC die 106 is generally more heat resistant than the LTMIC dies 104, the HTMIC die 106 can be located adjacent the heat generating IC die 120 without significant reduction in performance while enabling the LTMIC dies 104 that are significantly spaced from the heat generating IC die 120 to also maintain robust levels of performance.
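The placement rule described above, in which HTMIC dies separate the LTMIC dies from the heat generating die, can be sketched as a simple ordering of dies from the bottom of the stack upward. The `Die` type, the `kind` labels, and both helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    name: str
    kind: str  # one of "logic", "HTMIC", "LTMIC" (labels assumed for this sketch)


def order_stack(dies):
    """Return the dies bottom-to-top: heat generating logic first,
    then high temperature memory, then low temperature memory."""
    rank = {"logic": 0, "HTMIC": 1, "LTMIC": 2}
    # sorted() is stable, so dies of the same kind keep their given order.
    return sorted(dies, key=lambda d: rank[d.kind])


def ltmic_touches_logic(stack):
    """True if any LTMIC die sits directly next to a logic die."""
    return any({a.kind, b.kind} == {"logic", "LTMIC"}
               for a, b in zip(stack, stack[1:]))


dies = [Die("dram0", "LTMIC"), Die("ctrl", "logic"),
        Die("feram0", "HTMIC"), Die("dram1", "LTMIC")]
stack = order_stack(dies)
print([d.name for d in stack])
```

With at least one HTMIC die present, the ordered stack never places an LTMIC die directly against the logic die, which is the thermal buffering effect the paragraph describes.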
In the example depicted in
The controller IC die 120 includes functional logic circuitry that provides commands that enable the row and column within each bank of the memory dies 104, 106 to be addressed. The controller IC die 120 controls the write/read operations of each memory bank.
The HBM memory can be put in low power modes by row address bus to save power on the I/O drivers. To further reduce power consumption, clocks can be gated when in power-down or self-refresh modes.
In the example depicted in
Sandwiched between the LTMIC dies 104 and the controller IC die 120 in the HBM cube 204 is one or more HTMIC dies 106. In
The memory circuitry 222 of the HTMIC die 106 has an operational temperature that is greater than the operational temperature of the memory circuitry 220 of the LTMIC die 104. Defined differently, the memory circuitry 222 of the HTMIC die 106 is memory circuitry that can operate at temperatures above 110 degrees Celsius without having to increase the refresh rate (as compared to operation at 95 degrees Celsius), while the memory circuitry 220 of the LTMIC die 104 is memory circuitry that cannot operate at temperatures above 110 degrees Celsius without having to increase the refresh rate (as compared to operation at 95 degrees Celsius).
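The refresh-rate based definition above can be expressed as a small predicate. This is only a sketch: the function name and the refresh-rate figures in the usage lines are hypothetical placeholders, not measured or specified values.

```python
def is_high_temperature_memory(refresh_hz_at_95c: float,
                               refresh_hz_at_110c: float) -> bool:
    """Classify memory circuitry per the definition in the text: it is
    'high temperature' if operating above 110 degrees Celsius does not
    require a higher refresh rate than operating at 95 degrees Celsius.
    A refresh rate of 0.0 models circuitry that needs no refresh."""
    return refresh_hz_at_110c <= refresh_hz_at_95c


# Hypothetical figures for illustration only.
feram_like = is_high_temperature_memory(0.0, 0.0)          # no refresh at either temperature
dram_like = is_high_temperature_memory(32_000.0, 64_000.0)  # refresh rate must rise when hotter
print(feram_like, dram_like)
```

Circuitry for which the predicate is false corresponds to the LTMIC die 104 in the terminology used here.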
In one example, the HTMIC die 106 is a non-volatile random-access memory IC die that has faster refresh speed as compared to the LTMIC dies 104 that have volatile random-access memory. Thus, in addition to the HTMIC die 106 performing better than the LTMIC dies 104 when placed closer to the controller IC die 120, the faster refresh speed enables faster communication with the controller IC die 120, which beneficially reduces latency within the IC die stack 202, and ultimately, the chip package 100 and the electronic system 180.
In the example depicted in
The advanced memory technology roadmap targets increased memory density and bandwidth, with minimal impact to power and performance, to alleviate the memory bottleneck to system performance. With advances in memory technology, memory stacking, and novel non-volatile memories like FeRAMs, it is imperative that circuitry, architecture, and memory interfacing principles keep pace with the memory technology itself. Improvements to memory technology are described below that leverage enhancements specific to HBM/other forms of stacked low temperature memory by integrating high temperature memory technology to create hybrid memory stacks. In one example, stacked DRAM memory may integrate FeRAM based memory to form a hybrid memory stack or hybrid memory-logic assembly.
Hybrid memory and hybrid memory-logic assemblies are disclosed that utilize a mix of memory technologies that can be stacked, for example on top of a logic die. For example, a hybrid memory cube with a non-volatile memory (such as FeRAM and the like) IC die and a volatile memory (such as DRAM and the like) IC die has the HTMIC die 106 beneficially disposed closest to the logic IC die 120, as the HTMIC die 106 can tolerate higher heat dissipated from the logic die, while the LTMIC dies 104 disposed on top of the HTMIC die 106 could be placed closer to the heat spreader in the chip package (such as the chip package 100 depicted in
Alternatively, the IO/SA logic on each memory die 104 could be separated into a buffer IC die 422, to achieve higher performance and yield, and could also include FeRAM memory blocks that are logic compatible. Such an example is illustrated in
The memory stack and technology may be selected and designed in a “hierarchical” manner (hardware managed cache), to use the faster memory or memories not requiring refresh closer to the logic IC die 120, to act as an “intermediate” layer that transfers data to denser, slower memories on the upper tiers of LTMIC dies 104, away from the logic IC die 120. This could help hide latency and overhead due to the refresh needed for LTMIC dies 104, such as DRAM IC dies, on the top of the memory die stack 420.
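The hierarchical arrangement described above can be modeled in software as a small near tier that fronts a larger far tier. The sketch below is an illustrative behavioral model only, not the disclosed hardware: the class name `TieredMemory`, the least-recently-used replacement policy, and the write-back of every evicted entry are assumptions made for this example.

```python
from collections import OrderedDict


class TieredMemory:
    """Near tier (e.g., refresh-free memory next to the logic die) acting
    as a hardware-managed cache for a far tier (e.g., DRAM on upper tiers)."""

    def __init__(self, near_capacity: int):
        self.near = OrderedDict()  # address -> value, kept in LRU order
        self.far = {}              # address -> value
        self.near_capacity = near_capacity
        self.far_reads = 0
        self.far_writes = 0

    def write(self, addr, value):
        # Writes land in the near tier first.
        self.near[addr] = value
        self.near.move_to_end(addr)
        self._evict_if_needed()

    def read(self, addr):
        if addr in self.near:          # near-tier hit: no far-tier access
            self.near.move_to_end(addr)
            return self.near[addr]
        self.far_reads += 1            # miss: fetch from the far tier
        value = self.far[addr]         # raises KeyError if never written
        self.near[addr] = value
        self._evict_if_needed()
        return value

    def _evict_if_needed(self):
        # Simplification: every evicted entry is written back, even if clean.
        while len(self.near) > self.near_capacity:
            old_addr, old_value = self.near.popitem(last=False)
            self.far[old_addr] = old_value
            self.far_writes += 1


m = TieredMemory(near_capacity=2)
m.write(0, "a")
m.write(1, "b")
m.write(2, "c")      # evicts address 0 to the far tier
hit = m.read(1)      # served by the near tier
miss = m.read(0)     # served by the far tier, then cached
```

Accesses that hit in the near tier never touch the far tier, which is how such an intermediate layer could hide far-tier latency and refresh overhead in this model.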
In cases where FeRAM or other non-volatile memory dies are used with multi-bit cell storage, (e.g., NAND flash stores multiple bits in one cell), wear out can become a concern, since each cell will be accessed “n” times, where “n” is the number of bits in a single cell, compared to a single bit cell scenario. Hence, DRAM/SRAM and other volatile memories which have enhanced endurance can be used as a “standby” or hardware managed cache for the non-volatile memory dies. This allows multiple writes (termed write levelling) to be combined to a single write into the non-volatile memory multi-bit cell, which beneficially reduces the number of writes to a single cell in the non-volatile memory IC die and increases the lifetime of the non-volatile memory circuitry. Reads may be combined in substantially the same manner.
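The write-combining behavior described above can be sketched as a volatile buffer that absorbs repeated writes to the same address and commits only the final value to the non-volatile cells. The class `WriteCombiningBuffer`, its explicit `flush` step, and the per-address write counter are hypothetical constructs for illustration; the disclosure does not specify a flush policy.

```python
class WriteCombiningBuffer:
    """Volatile buffer (e.g., DRAM/SRAM) in front of non-volatile memory.
    Multiple writes to one address are combined into a single NVM write."""

    def __init__(self):
        self.pending = {}           # volatile: address -> latest value
        self.nvm = {}               # non-volatile cells: address -> value
        self.nvm_write_counts = {}  # address -> number of NVM writes (wear)

    def write(self, addr, value):
        # Overwrites any earlier pending value; the NVM is not touched yet.
        self.pending[addr] = value

    def read(self, addr):
        # The newest data may still be in the volatile buffer.
        if addr in self.pending:
            return self.pending[addr]
        return self.nvm[addr]

    def flush(self):
        # Commit each pending address exactly once.
        for addr, value in self.pending.items():
            self.nvm[addr] = value
            self.nvm_write_counts[addr] = self.nvm_write_counts.get(addr, 0) + 1
        self.pending.clear()


buf = WriteCombiningBuffer()
for v in range(5):
    buf.write(7, v)   # five writes to the same address
buf.flush()
print(buf.nvm_write_counts[7], buf.read(7))
```

In this model, five buffered writes to one address cost the non-volatile cell a single write, which is the lifetime benefit the paragraph attributes to combining writes.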
The local storage circuitry 506 of the PIM circuitry 502 may be FeRAM and/or embedded DRAM (eDRAM). Advantageously, the FeRAM and/or eDRAM based local storage circuitry 506 generally are low leakage logic-compatible storage as compared to local logic-based high leakage registers. This allows area scaling of PIM circuitry 502 and reduced leakage compared to conventional PIM using registers in the logic-based logical storage. Thus, the amount of processing in memory may be increased within the same IC die area allocated for the PIM circuitry 502.
This can also be used towards more fine-grained PIM (e.g., at a sub-bank level; currently PIM is performed at a bank level) or increased memory density due to reduced PIM area.
Referring first to
In
In
In
This application claims benefit from U.S. Provisional Patent Application No. 63/405,347, filed Sep. 9, 2022, which is incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
63405347 | Sep 2022 | US |