Implementations described herein generally relate to integrated circuit (IC) memory devices, such as chip packages among others, having stacked IC memory dies, and in particular, stacked memory having operational logic located remotely from the stacked IC memory dies.
Large dynamic random access memory (DRAM) rows require increased activation energy, thus limiting the number of in-flight row activation commands and reducing irregular bandwidth. Conventional solutions have introduced additional circuitry in the DRAM die to reduce DRAM row size, thus decreasing capacity and increasing area overhead.
Traditional DRAM banks activate each row by sending a signal through the wide master word line (MWL) to four local word lines (LWLs) connected via local word line drivers (LWD). Each LWL is selected by a distinct LWLSel signal, routed underneath each bank. To select one of four rows under a MWL, a pre-decoded signal is sent via the LWLSel to drive one of the four LWLs, activating a wide DRAM row.
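By way of illustration only, the one-of-four selection described above can be modeled in a few lines of Python. The function name `active_lwls` and the tuple encoding of the pre-decoded LWLSel signals are assumptions made for this sketch and are not part of the described circuitry.

```python
def active_lwls(mwl_asserted, lwl_sel):
    """Return the indices of the local word lines (LWLs) driven under one MWL.

    mwl_asserted: True when the master word line is driven.
    lwl_sel: tuple of four 0/1 values, the pre-decoded LWLSel signals
             (assumed one-hot for this sketch).
    """
    if len(lwl_sel) != 4:
        raise ValueError("expected four LWLSel signals per MWL")
    if sum(lwl_sel) > 1:
        raise ValueError("LWLSel is assumed to be one-hot")
    if not mwl_asserted:
        return []
    # A local word line driver passes the MWL level only when its LWLSel is high.
    return [i for i, sel in enumerate(lwl_sel) if sel]


print(active_lwls(True, (0, 0, 1, 0)))   # [2]
print(active_lwls(False, (0, 0, 1, 0)))  # []
```

In this model, exactly one LWL is driven when the MWL is asserted and one LWLSel signal is high, which corresponds to activating one full-width row.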
Fine-grained DRAM divides a bank into two grains and routes a grain select (GrSel) signal under the bank. Each GrSel signal governs half of the LWDs, selected by a circuit that performs a logical AND operation between LWLSel and GrSel. This approach reduces the row size compared to traditional DRAM. Half-DRAM divides each wordline into two halves and activates one half of memory cells in odd mats and the other half in even mats. This implementation introduces circuitry in the DRAM die and changes the wiring inside a DRAM mat, which increases area overhead. Staggered LWL wiring, used in modern DRAM bank implementations, requires undesirably doubling the number of LWDs, further increasing area overhead.
Half Page Row introduces circuitry within the LWD for row segmentation logic, which reduces the number of LWDs by a factor of two and spans the LWLs to two DRAM mats instead of one. However, this approach also incurs additional area overhead due to increased LWD area. Additionally, each column select line (CSL) connects to twice as many bitlines within a mat. Similarly, the number of local data lines (LDLs) also doubles to maintain burst length. These changes increase the capacitance of wires, thus increasing the value of timing parameters such as tRC, tRP, tRAS, and tRCD, all of which reduce the irregular bandwidth.
Therefore, there is a need for an improved stacked memory device.
Integrated circuit (IC) memory devices, such as chip packages among others, having stacked IC memory dies, and in particular, stacked memory having operational logic located remotely from the stacked IC memory dies, along with methods for fabricating the same are provided herein. In one example, an IC memory device is provided that includes a memory die stack coupled with a non-memory IC die. The memory die stack includes two or more stacked memory IC dies. The non-memory IC die contains in-die logic circuitry that has an output routed to circuitry of the memory IC dies through vertical wiring passing through the memory die stack.
In another example, an integrated circuit (IC) memory device is provided that includes a substrate, two or more memory IC dies stacked on the substrate to form a memory die stack, and a non-memory IC die. The non-memory IC die contains row segmentation logic circuitry having an output routed to corresponding wordline drivers of the memory IC dies through vertical wiring passing through the memory die stack.
In yet another example, a method for operating a memory device is provided. The method includes transmitting, from row decoder circuitry located in a non-memory IC die to a memory IC die stacked therewith, a signal causing a portion of memory cells coupled to a common word line to be selected; and reading bits from the selected memory cells.
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements of one embodiment may be beneficially incorporated in other embodiments.
Examples described herein can be beneficially utilized in high bandwidth memory (HBM) based 3D-stacked memory. However, the disclosed technology can also be adapted to other 3D-stacked memory with sufficient vertical connectivity. Non-3D-stacked memory can also implement some of the logic in a separate chip and route the additional wires. Some examples of the disclosed technology free space within the memory IC die by moving at least a portion of the memory operational circuitry to a non-memory IC die within the chip package (e.g., memory device). Other examples additionally reduce the amount of row activation energy required for DRAM devices by using row segmentation logic. The row segmentation logic may also be located remotely from the memory IC die, thus making more space available within the memory IC die for memory, in-memory processing circuitry, or other types of circuitry.
The activation energy required for DRAM devices is determined by the size of their rows, which also limits the number of row activation commands that can be processed simultaneously. Previous methods of reducing row size have introduced additional circuitry and wiring, resulting in reduced capacity and increased area overhead on the DRAM die.
To address this issue, DRAM row segmentation logic is implemented in a newly introduced logic die (or by adding circuitry to the existing buffer die) of a 3D stacked memory device. In other words, the DRAM row segmentation logic is present in another integrated circuit (IC) die that is stacked with, or within the same chip package as, the memory IC dies comprising the stacked memory device. The output of the row segmentation logic is routed to the corresponding wordline drivers in the DRAM die through additional vertical wiring in the TSV strip. The cost of additional TSVs is offset by hybrid bonding, and the cost of the logic die is amortized by new functionality in the base die, such as processing-in-memory (PIM) circuitry, for example arithmetic logic units (ALUs). This results in lower energy consumption and higher irregular bandwidth on the 3D stacked memory device.
Locating circuitry responsible for selecting word lines for DRAM row segmentation in a newly introduced logic die (a non-memory die or the pre-existing buffer die) in the base layer of 3D stacked memory makes more space available within the memory IC die for memory cells without loss of performance. The output of the row segmentation logic in the base die is routed vertically via additional TSVs and subsequently delivered to the target DRAM bank. The TSVs are typically per-bank and physically local to the bank, which is achieved via high-density TSVs from hybrid bonding.
The grain architecture approach allows for smaller rows to be activated by dividing a DRAM bank into two grains. Each grain contains half the number of mats and data pins (i.e., contact pads), and also divides the LSA and GSA stripes in half, creating two independent datapaths within the bank, as shown in
In some high bandwidth memory applications, each master wordline (MWL) is connected to four wordline segments (LWLs) via the local word line drivers (LWDs). Seventeen such LWDs drive the LWLs in a subarray. Each LWD drives LWL arms to its left and right, each arm enabling the 256 access transistors connected to it in the adjacent memory mat. The mats at the left and right end of subarrays have arms extending only to the right and left, respectively. To select only one of the four LWLs connected to the MWL, an LWLSel signal is routed below a bank. This signal acts as an enable signal into the LWD's transistors.
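As a rough worked example only, the number of access transistors enabled by one LWL selection across a subarray can be estimated from the figures given above. The sketch below assumes that the two LWDs at the subarray edges each drive a single arm while every other LWD drives two arms; the function name `cells_per_lwl` is hypothetical.

```python
def cells_per_lwl(num_lwds=17, cells_per_arm=256):
    """Estimate access transistors enabled by one LWL selection in a subarray.

    Assumption for this sketch: the two edge LWDs drive one arm each, and
    every interior LWD drives two arms (one to the left, one to the right).
    """
    if num_lwds < 2:
        raise ValueError("need at least the two edge LWDs")
    interior = num_lwds - 2
    arms = 2 * interior + 2
    return arms * cells_per_arm


print(cells_per_lwl())  # 8192
```

Under these assumptions, 17 LWDs yield 32 arms and 8192 enabled access transistors per activation, which illustrates why a full-width row activation is costly.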
To accommodate independent row activations across two grains within a bank, an additional LWD with only one LWL arm is added at the boundaries of the grain, as indicated in
To achieve smaller rows in the 3D bank design, row segmentation logic is implemented in a non-memory IC die, such as the base IC die. In a grain design having folded banks, a logical AND of GrSel and LWLSel is performed in the logic die to obtain an LWD enable signal (LWDEn). The LWDEn is then routed vertically via the TSVs to the target DRAM (or other memory) die, as illustrated in
The exemplary fine-grained row configuration reduces DRAM row size without increasing the DRAM die area. Compared to conventional designs, less area overhead is needed to achieve the same reduction in activation energy and improvement in irregular bandwidth. Furthermore, the free space in the base layer can be re-purposed for additional functionality, such as implementing PIM ALUs, providing for a more flexible design.
Implementing the DRAM row segmentation logic in the non-memory or base IC die by performing a logical AND of a grain (or sub bank) selection wire and the LWLSel wire in the base die and then routing the output of the AND operation vertically eliminates the need for adding circuitry to the DRAM die and helps save area.
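The logical AND described above may be sketched as follows. This Python model is illustrative only; the function name `lwd_enable` and the list encoding of the grain select and LWLSel wires are assumptions, and the sketch does not model the vertical routing itself.

```python
def lwd_enable(gr_sel, lwl_sel):
    """Compute LWD enable signals (LWDEn) as grain select AND LWLSel.

    gr_sel: list of 0/1 grain select bits, one per grain.
    lwl_sel: list of 0/1 pre-decoded LWLSel bits, one per LWL under an MWL.
    Returns a nested list indexed as enables[grain][lwl].
    """
    return [[g & s for s in lwl_sel] for g in gr_sel]


# Two grains, four LWLs per MWL; select grain 1 and LWL 2.
enables = lwd_enable([0, 1], [0, 0, 1, 0])
print(enables)  # [[0, 0, 0, 0], [0, 0, 1, 0]]
```

In this model, the AND is evaluated entirely outside the memory die, and only the resulting enable values would need to be delivered to the wordline drivers.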
While hybrid bonding incurs negligible additional costs for TSVs, the disclosed technique uses extra signaling tracks in the DRAM die, as illustrated in
Partitioning the banks vertically into grains either decreases the atom size of each access or increases the tBURST value for the baseline DRAM atom size. Furthermore, the timing parameters would include a new grain-to-grain delay for row and column accesses. Other timing parameters such as tCCD_L would also see a marked decrease from the 3D bank design. The new timing parameters would need to be documented in JEDEC specifications. The updated timing parameter values would be made available for use by the memory controller.
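The trade-off between atom size and tBURST can be illustrated with a short calculation. The sketch below assumes that the data pins of a bank are divided equally among the grains; the function name `grain_access_options` and the example values (a 32-byte atom and a tBURST of 2 cycles) are hypothetical and are not taken from any specification.

```python
def grain_access_options(base_atom_bytes, base_tburst, num_grains=2):
    """Return the two access options for a bank partitioned into grains.

    Assumption for this sketch: each grain receives an equal share of the
    data pins, so it moves 1/num_grains of the baseline data per unit of
    burst time.
    """
    if base_atom_bytes % num_grains:
        raise ValueError("atom size must divide evenly among grains")
    # Option 1: keep tBURST and accept a smaller atom per access.
    smaller_atom = {"atom_bytes": base_atom_bytes // num_grains,
                    "tburst": base_tburst}
    # Option 2: keep the baseline atom and lengthen the burst.
    longer_burst = {"atom_bytes": base_atom_bytes,
                    "tburst": base_tburst * num_grains}
    return smaller_atom, longer_burst


print(grain_access_options(32, 2))
# ({'atom_bytes': 16, 'tburst': 2}, {'atom_bytes': 32, 'tburst': 4})
```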
Turning now to
The bottom surface 424 is coupled to a top surface 426 of a substrate 402 via solder interconnects 436. The substrate 402 may be a package substrate or an interposer used in combination with a package substrate. The first chip complex 404 is also coupled to the substrate 402 via solder interconnects 436. The solder interconnects 436 may be solder microbumps or other electrical connections suitable for transferring ground, signal, and power transmissions between the routing circuitry of the substrate 402 and the functional circuitry of the IC dies within the chip complex 404.
A second chip complex 406 is also coupled to the top surface 426 of the substrate 402 via solder interconnects 436. The second chip complex 406 generally includes at least one compute die 442. The compute die 442 includes functional circuitry 444. The functional circuitry 444 may include CPU cores and/or GPU cores. The functional circuitry 444 of the compute dies 442 may also include System Management Unit (SMU) circuitry. The SMU circuitry is configured to monitor thermal and power conditions and adjust power and cooling to keep the compute dies 442 functioning within specifications. The functional circuitry 444 of the compute die 442 may also include DFX Controller IP circuitry. The DFX circuitry provides management of hardware or software trigger events. For example, the DFX circuitry may pull partial bitstreams from memory and deliver them to an ICAP. The DFX circuitry also assists with logical decoupling and startup events, customizable per Reconfigurable Partition. GPU cores, when contained in the functional circuitry 444 of the compute die 442, generally include math engine circuitry. The math engine circuitry is generally designed for task-specific computing, such as that used in data center computing, high performance computing, and AI/ML computing. Along with the accelerated compute cores, the functional circuitry 444 of the compute die 442 may also include SMU circuitry and DFX circuitry.
The functional circuitry 444 of the compute dies 442 and the functional circuitry of the IC dies comprising the first chip complex 404 are connected via routing 440 formed in, on, and/or through the substrate 402. The bottom surface 428 of the substrate 402 is mounted on a top surface 434 of a printed circuit board (PCB) 430. The routing 440 of the substrate 402 is coupled by solder balls 432 to circuitry 446 formed in the PCB 430.
The chip package 400 mounted to the PCB 430 forms an electronic device 450. The electronic device 450 may be a tablet, computer, server, data center, call center, automobile on-board electronics system, copier, digital camera, smart phone, control system, automated teller machine, computing system, gaming system, artificial intelligence system, or a machine learning system, among others.
Referring back to the memory stack 408 of the first chip complex 404, each memory IC die 410 of the memory stack 408 includes functional circuitry 412. The functional circuitry 412 of each memory IC die 410 may be configured as volatile memory or non-volatile memory. For example, the functional circuitry 412, when configured as volatile memory, may be static random-access memory (SRAM), dynamic random-access memory (DRAM), or other suitable volatile memory type. Alternatively, the functional circuitry 412 of the memory IC die 410, when configured as non-volatile memory, may be ferroelectric random-access memory (FeRAM), magnetoresistive random-access memory (MRAM), or other suitable non-volatile memory type.
Adjacent surfaces 438 of the memory IC dies 410 are mechanically and electrically coupled via hybrid bonding. Hybrid bonding uses layers of dielectric and patterned metal, such as copper, formed on the adjacent surfaces 438 of the memory IC dies 410. The patterned metal forms routings that include patterned lines and vias. The routings terminate at bond pads. The patterned lines and vias of the routings are electrically isolated from one another by a plurality of dielectric layers. The dielectric layers are formed from a material suitable for hybrid bonding, such as polybenzoxazole (PBO), polyimide (PI), benzocyclobutene (BCB), a combination thereof, or the like.
The hybrid bond is made by contacting the adjacent surfaces 438 of the memory IC dies 410. The exposed dielectric material on one of the memory IC dies 410 fusion bonds to the exposed dielectric material of the adjacent memory IC die 410 to bond the structures (e.g., adjacent memory IC dies 410) together. Subsequently, metal-to-metal bonds may be formed using pressure and heat to form eutectic metal bonds between the exposed bond pads now in contact with each other. The interfusion of the metal materials of the bond pads creates the electrical interconnect between the functional circuitry 412 of the memory IC dies 410 being bonded together.
The memory stack 408 generally includes a stack of one or more memory IC dies 410. Although four memory IC dies 410 are shown in the memory stack 408 illustrated in
Adjacent sides 458 of the non-memory IC die 414 and the adjacent memory IC die 410 are mechanically and electrically coupled via hybrid bonding. Hybrid bonding, as discussed above, connects the functional circuitry 412 of the memory IC dies 410 to the operational logic 416 of the non-memory IC die 414. The operational logic 416 of the non-memory IC die 414 generally controls the read and write functions, row activation, precharge, and bank refreshes of the memory arrays and banks comprising the functional circuitry 412 of the memory IC dies 410. The operational logic 416 is later described in greater detail with respect to
Continuing to refer to
The operational circuitry 416 of the non-memory IC die 414 is coupled to the functional circuitry 412 of the memory IC die 410 by vertical wiring 460. The vertical wiring 460 includes vias formed within the memory and non-memory IC dies 410, 414, and the connections across the routing comprising the hybrid bond connection between adjacent dies. The vertical wiring 460 may also couple the functional circuitry 420 of the base IC die 418 to the functional and/or operational circuitry 412, 416 of the memory and non-memory IC dies 410, 414.
The chip package 500 is generally the same as the chip package 400 described above, except that chip package 500 has a chip complex 504 that replaces the chip complex 404 of chip package 400. Similar to the chip complex 404, the chip complex 504 includes a memory stack 408, a non-memory IC die 414, and a base IC die 418. The base IC die 418 of the chip complex 504 is coupled to the substrate 402 via solder interconnects 436. The memory stack 408 of the chip complex 504 is different than the memory stack 408 of the chip complex 404 in that the non-memory IC die 414 is located within the memory stack 408. Stated differently, the non-memory IC die 414 is hybrid bonded on its top and bottom surfaces to adjacent memory IC dies 410, thus locating the non-memory IC die 414 within the memory stack 408.
As with the chip package 400, the operational circuitry 416 of the non-memory IC die 414 is coupled to the functional circuitry 412 of the memory IC die 410 by vertical wiring 460 within the chip package 500.
The chip package 600 is generally the same as the chip package 400 described above, except that chip package 600 has a chip complex 604 that replaces the chip complex 404 of chip package 400. Similar to the chip complex 404, the chip complex 604 includes a memory stack 408, a non-memory IC die 414, and a base IC die 418. However, the functional circuitry 420 of the base IC die 418 is now present, along with the operational circuitry 416, in the non-memory IC die 414. Thus, the chip complex 604 does not have separate base and non-memory IC dies 418, 414, but rather a singular non-memory IC die 414 that includes both the functional circuitry 420 and the operational circuitry 416. The non-memory IC die 414 of the chip complex 604 is coupled to the substrate 402 via solder interconnects 436.
As with the chip package 400, the operational circuitry 416 of the non-memory IC die 414 is coupled to the functional circuitry 412 of the memory IC die 410 by vertical wiring 460 within the chip package 600.
The operational circuitry 416 of the non-memory IC die 414 includes one or more circuits selected from the group comprising memory controller circuitry 742, row decoder circuitry 732, column decoder circuitry 734, sense amplifiers 736, and sub-wordline grain select logic circuitry 738. In
The functional circuitry 412 of memory IC die 410 includes wordline driver circuitry 740 and an array 700 of memory mats 706. The array 700 may be arranged in banks, sub-banks, and the like, or other suitable arrangement. Each of the memory mats 706 includes a plurality of memory cells. Memory cells may be configured as volatile or non-volatile memory. Memory cells may be arranged in a NAND or NOR structure. In the example depicted in
The memory mats 706 are arranged in an X-Y matrix. In
Each memory mat 706 includes at least one sub-wordline driver 802. Each sub-wordline driver 802 has an input connected to the wordline 708 and an input connected by wordline segment enable routing 812 to the sub-wordline grain select logic circuitry 738. The wordline 708 is shown as a dashed line because the wordline 708 resides on a different metal layer relative to the sub-wordline drivers 802.
An enable signal provided by the sub-wordline grain select logic circuitry 738 via the wordline segment enable routing 812 to selected ones of the sub-wordline drivers 802 causes the selected sub-wordline drivers 802 to drive a voltage from the wordline 708 into the mats 706 of one or more selected grains 810 that comprise a single row 704. For example, when the sub-wordline grain select logic circuitry 738 sends an enable signal via the wordline segment enable routing 812₁ to the sub-wordline drivers 802₁, the sub-wordline drivers 802₁ drive a voltage from the wordline 708 into the mats 706 of the selected grain 810₁. Similarly, when the sub-wordline grain select logic circuitry 738 sends an enable signal via the wordline segment enable routing 812₂ to the sub-wordline drivers 802₂, the sub-wordline drivers 802₂ drive a voltage from the wordline 708 into the mats 706 of the selected grain 810₂. Similarly, when the sub-wordline grain select logic circuitry 738 sends an enable signal via the wordline segment enable routing 812₃ to the sub-wordline drivers 802₃, the sub-wordline drivers 802₃ drive a voltage from the wordline 708 into the mats 706 of the selected grain 810₃. And again, when the sub-wordline grain select logic circuitry 738 sends an enable signal via the wordline segment enable routing 812₄ to the sub-wordline drivers 802₄, the sub-wordline drivers 802₄ drive a voltage from the wordline 708 into the mats 706 of the selected grain 810₄. In this manner, only selected grains 810 are activated in a given row 704.
The wordline driver circuitry 740 is coupled to the sub-wordline grain select logic circuitry 738 and/or the row decoder circuitry 732 by individual vertical routings 460. The individual vertical routings 460 extend across the hybrid bond coupling the adjacent sides 458 of the non-memory IC die 414 and the adjacent memory IC die 410. Similarly, a portion of the wordline segment enable routing 812 includes individual vertical routings 460 that extend across the hybrid bond coupling the adjacent sides 458 of the non-memory IC die 414 and the adjacent memory IC die 410.
In one alternative example, the sub-wordline grain select logic circuitry 738 may be disposed in the memory IC die 410, and the sub-wordline grain select logic circuitry 738 may be coupled to the row decoder circuitry 732 and/or memory controller 742 by vertical routing 460 across the hybrid bond coupling the adjacent sides 458 of the non-memory IC die 414 and the adjacent memory IC die 410.
The memory controller 742 provides a grain select signal that is used by the sub-wordline grain select logic circuitry 738 to couple the selected one of the grains 810₁, 810₂, 810₃, 810₄ of the row 704₃ of the array 700 of memory mats 706 to its associated sub-wordline drivers 802₁, 802₂, 802₃, 802₄. Thus, the sub-wordline drivers 802 do not need to drive the voltage out across all the mats 706 of the selected row 704₃, but rather only to the selected one of the grains 810₁, 810₂, 810₃, 810₄ at a single instance. This reduces the circuit size, power, and time needed to drive the selected wordline segment (i.e., the segment coupled to the selected driver 802₁₋₄) as compared to the power and time needed to simultaneously drive voltage across a wordline connected to all the mats 706 of a common row 704.
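The effect of grain selection on the number of mats driven in a row can be sketched as follows. This Python model is illustrative only: the function name `driven_mats`, the choice of 16 mats per row, and the assumption that mats are divided evenly and contiguously among four grains are hypothetical and are not taken from the figures.

```python
def driven_mats(mats_per_row, num_grains, selected_grains):
    """Return the mat indices that receive the wordline voltage in one row.

    Grains are numbered from 1. Assumption for this sketch: mats are split
    evenly and contiguously among the grains.
    """
    if mats_per_row % num_grains:
        raise ValueError("mats must divide evenly among grains")
    per_grain = mats_per_row // num_grains
    mats = []
    for g in sorted(set(selected_grains)):
        if not 1 <= g <= num_grains:
            raise ValueError(f"grain {g} out of range")
        start = (g - 1) * per_grain
        mats.extend(range(start, start + per_grain))
    return mats


full_row = driven_mats(16, 4, [1, 2, 3, 4])
one_grain = driven_mats(16, 4, [3])
print(one_grain)                       # [8, 9, 10, 11]
print(len(one_grain) / len(full_row))  # 0.25
```

In this toy model, enabling a single grain drives one quarter of the mats that a full-row activation would drive, which is the source of the power and time savings described above.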
Referring back to
Each bitline 710 is coupled to a respective one of the sense amplifiers 736 residing in the operational circuitry 416 of the non-memory IC die 414. The sense amplifiers 736 are coupled by the column decoder circuitry 734 to an outlet pad 718 residing on the surface of the non-memory IC die 414. The outlet pad 718 is connected through the base IC die 418 to the routing 440 of the substrate 402 in the chip package 400 (illustrated in
The column decoder circuitry 734 is coupled to the memory controller 742. The memory controller 742 provides memory cell address information to the column decoder circuitry 734. The column decoder circuitry 734 decodes the column (702) address and couples the sense amplifier 736 of the selected column 702, which allows the output of the selected sense amplifier 736 to be coupled to the output contact pad 718.
Similarly, the row decoder circuitry 732 is coupled to the memory controller 742. The memory controller 742 provides memory cell address information to the row decoder circuitry 732. The row decoder circuitry 732 decodes the row (704) address and couples the wordline driver circuitry 740 and/or the sub-wordline grain select logic circuitry 738 to the selected wordline 708 of the selected row 704.
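The row and column decoding described above may be sketched as follows. The address layout (row field in the high bits, column field in the low bits) and the names `split_address` and `one_hot` are assumptions made for this example; the description does not specify an address format.

```python
def split_address(addr, row_bits, col_bits):
    """Split a flat cell address into (row, column) fields.

    Assumption for this sketch: the row field occupies the high bits and
    the column field occupies the low bits.
    """
    if addr < 0 or addr >= 1 << (row_bits + col_bits):
        raise ValueError("address out of range")
    col = addr & ((1 << col_bits) - 1)
    row = addr >> col_bits
    return row, col


def one_hot(index, width):
    """Decode an index into a one-hot list of select lines."""
    return [1 if i == index else 0 for i in range(width)]


row, col = split_address(0b1011, row_bits=2, col_bits=2)
print(row, col)         # 2 3
print(one_hot(row, 4))  # [0, 0, 1, 0]
print(one_hot(col, 4))  # [0, 0, 0, 1]
```

In this model, the one-hot row output stands in for the wordline selection and the one-hot column output stands in for the selection of a sense amplifier output.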
The non-memory IC die 414 additionally includes contact pads 716 for providing the operational circuitry 416 with power, ground and signal connections.
At operation 906, bits from the selected memory cells are read. In one example, operation 906 includes a sub-operation 908 in which the bits are read from the selected memory cells by sense amplifiers located in the non-memory IC die. In one example, operation 906 includes a sub-operation 910 in which each of the selected memory cells is selected by column select logic circuitry located in the non-memory IC die.
The method 900 may also include writing to selected memory cells. Writing to selected memory cells can include selecting a memory address using row and column select circuitry located remotely from the memory IC die, and transferring a bit from a sense amplifier located remote from the memory IC die to the memory cell residing at the memory address.
Thus, the disclosed technology that uses fine-grained rows reduces DRAM row size without increasing the DRAM die area. Compared to previous designs, less area overhead is needed to achieve the same reduction in activation energy. Furthermore, the free space in the memory die can be re-purposed for additional functionality, such as processing-in-memory (PIM) circuitry (e.g., arithmetic logic units (ALUs)), providing for a more flexible design. Implementing the DRAM row segmentation logic in the non-memory IC die by performing a logical AND of a grain (or sub bank) selection wire and the LWLSel wire in the non-memory IC die, and then routing the output of the AND operation vertically, eliminates the need for adding circuitry to the DRAM die, which helps save area. Additionally, utilizing grain select instead of energizing the entire wordline results in lower energy consumption.
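The read and write operations of the method can be sketched with a toy software model. In the Python sketch below, the cell storage stands in for the memory IC die, and the methods stand in for the select and sense logic located in the non-memory IC die. The class name `ToyStackedMemory` and its parameters are hypothetical, and the model ignores timing, sensing, and electrical behavior.

```python
class ToyStackedMemory:
    """Toy model of grain-selected reads and writes on a common word line."""

    def __init__(self, rows, grains, cols_per_grain):
        self.grains = grains
        self.cols_per_grain = cols_per_grain
        # One list of bits per row; all cells start at 0.
        self.cells = [[0] * (grains * cols_per_grain) for _ in range(rows)]

    def _grain_columns(self, grain):
        if not 0 <= grain < self.grains:
            raise ValueError("grain out of range")
        start = grain * self.cols_per_grain
        return range(start, start + self.cols_per_grain)

    def read(self, row, grain):
        # Select only the cells of one grain on the word line, then read them.
        return [self.cells[row][c] for c in self._grain_columns(grain)]

    def write(self, row, grain, bits):
        cols = self._grain_columns(grain)
        if len(bits) != len(cols):
            raise ValueError("bit count must match grain width")
        for c, b in zip(cols, bits):
            self.cells[row][c] = b


mem = ToyStackedMemory(rows=4, grains=2, cols_per_grain=4)
mem.write(1, 1, [1, 0, 1, 1])
print(mem.read(1, 1))  # [1, 0, 1, 1]
print(mem.read(1, 0))  # [0, 0, 0, 0]
```

The second read shows that the cells of the unselected grain on the same row are not touched by the write to the selected grain.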
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application claims priority to the U.S. Provisional Patent Application Ser. No. 63/468,767 filed May 24, 2023, which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
63468767 | May 2023 | US