Various embodiments described herein relate to apparatus, systems, and methods associated with semiconductor memories, including stacked-die memory architectures.
Microprocessor technology has evolved at a faster rate than semiconductor memory technology. As a result, a mismatch in performance often exists between a modern host processor and the semiconductor memory subsystem from which the processor receives instructions and data. For example, it is estimated that some high-end servers sit idle for three out of every four clock cycles while waiting for responses to memory requests.
In addition, the evolution of software application and operating system technology has increased demand for higher-density memory subsystems as the number of processor cores and threads continues to increase. However, current-technology memory subsystems often represent a compromise between performance and density. Higher bandwidths may limit the number of memory cards or modules that may be connected in a system without exceeding Joint Electron Device Engineering Council (JEDEC) electrical specifications.
Extensions to JEDEC interface standards such as double data rate (DDR) synchronous dynamic random access memory (SDRAM) have been proposed, but they may generally fall short of anticipated future memory bandwidth and density requirements. Their weaknesses include a lack of memory power optimization and the uniqueness of the interface between the host processor and the memory subsystem. The latter weakness may require the interface to be redesigned as processor and/or memory technologies change.
Multi-die memory array embodiments herein aggregate control logic that is normally located on each individual memory array die in previous designs. Subsections of a stacked group of dies, referred to herein as a “memory vault,” share common control logic. The memory vault architecture strategically partitions memory control logic to increase energy efficiency while providing a finer granularity of powered-on memory banks. Embodiments herein also enable a standardized host processor to memory system interface. The standardized interface may reduce re-design cycle times as memory technology evolves.
Each of the stacked dies is divided into multiple "tiles" (e.g., the tiles 205A, 205B, and 205C associated with the stacked die 204). Each tile (e.g., the tile 205C) may include one or more memory arrays 203. In some embodiments, each memory array 203 may be configured as one or more independent memory banks in the memory system 100. The memory arrays 203 are not limited to any particular memory technology and may include dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, etc.
A stacked set of memory array tiles 208 may include a single tile from each of the stacked dies (e.g., the tiles 212B, 212C and 212D, with the base tile hidden from view in
The stacked-die 3D memory array 200 is thus partitioned into a set of memory “vaults” (e.g., the memory vault 230). Each memory vault includes a stacked set of tiles (e.g., the set of tiles 208), one tile from each of a plurality of stacked dies, together with a set of TWIs to electrically interconnect the set of tiles 208. Each tile of the vault includes one or more memory arrays (e.g., the memory array 240).
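The vault partitioning can be pictured as a simple data structure. The following is a minimal Python sketch under stated assumptions, not part of the disclosure: the die, tile, and array counts and the names `Tile`, `Vault`, and `partition_into_vaults` are hypothetical. It groups the tile at the same position on each stacked die into one vault. The TWIs and the logic die are not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tile:
    die: int        # position of the die in the stack (0 = lowest memory die)
    index: int      # position of the tile within its die
    arrays: tuple   # hypothetical identifiers of the memory arrays on this tile


@dataclass
class Vault:
    index: int
    tiles: list = field(default_factory=list)  # exactly one tile per stacked die


def partition_into_vaults(num_dies, tiles_per_die, arrays_per_tile):
    """Group the tile at the same position on every die into one vault."""
    vaults = [Vault(i) for i in range(tiles_per_die)]
    for die in range(num_dies):
        for t in range(tiles_per_die):
            arrays = tuple((die, t, a) for a in range(arrays_per_tile))
            vaults[t].tiles.append(Tile(die, t, arrays))
    return vaults


vaults = partition_into_vaults(num_dies=4, tiles_per_die=16, arrays_per_tile=2)
print(len(vaults), len(vaults[0].tiles))  # 16 vaults, 4 tiles each
```

In this model, the number of vaults equals the number of tiles per die, and the vault depth equals the number of stacked dies.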
The resulting set of memory vaults 102 is shown in
The memory system 100 also includes a plurality of configurable serialized communication link interfaces (SCLIs) 112. The SCLIs 112 are divided into an outbound group of SCLIs 113 (e.g., the outbound SCLI 114) and an inbound group of SCLIs 115. Each of the plurality of SCLIs 112 is capable of concurrent operation with the other SCLIs 112. Together the SCLIs 112 communicatively couple the plurality of MVCs 104 to one or more host processor(s) 114. The memory system 100 presents a highly abstracted, multi-link, high-throughput interface to the host processor(s) 114.
The memory system 100 may also include a switch 116. In some embodiments, the switch 116 may comprise a matrix or cross-connect switch. The switch 116 is communicatively coupled to the plurality of SCLIs 112 and to the plurality of MVCs 104. The switch 116 is capable of cross-connecting each SCLI to a selected MVC. The host processor(s) 114 may thus access the plurality of memory vaults 102 across the plurality of SCLIs 112 in a substantially simultaneous fashion. This architecture can provide the processor-to-memory bandwidth needed by modern processor technologies, including multi-core technologies.
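The text does not say how the switch 116 resolves two SCLIs that target the same MVC in the same interval. The minimal sketch below assumes a fixed-priority policy in which the lowest-numbered SCLI wins. The function name `cross_connect` and the integer SCLI/MVC identifiers are hypothetical. Requests for distinct MVCs are all granted, which is what lets the links proceed substantially simultaneously.

```python
def cross_connect(requests):
    """Grant SCLI-to-MVC connections for one switching interval.

    `requests` maps an SCLI id to the MVC id that its current packet targets.
    Requests for distinct MVCs are all granted and proceed concurrently.
    If several SCLIs target the same MVC, only the lowest-numbered SCLI is
    granted (an assumed arbitration policy) and the others wait.
    """
    granted = {}   # SCLI id -> MVC id
    busy = set()   # MVCs already connected in this interval
    for scli in sorted(requests):
        mvc = requests[scli]
        if mvc not in busy:
            granted[scli] = mvc
            busy.add(mvc)
    waiting = [s for s in sorted(requests) if s not in granted]
    return granted, waiting


granted, waiting = cross_connect({0: 5, 1: 2, 2: 5, 3: 7})
print(granted, waiting)  # {0: 5, 1: 2, 3: 7} [2]
```

A real crossbar would more likely use round-robin or age-based arbitration to avoid starving high-numbered links. Fixed priority is used here only to keep the sketch short.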
The memory system 100 may also include a memory fabric control register 117 coupled to the switch 116. The memory fabric control register 117 accepts memory fabric configuration parameters from a configuration source and configures one or more components of the memory system 100 to operate according to a selectable mode. For example, the switch 116 and each of the plurality of memory vaults 102 and the plurality of MVCs 104 may normally be configured to operate independently of each other in response to separate memory requests. Such a configuration may enhance memory system bandwidth as a result of the parallelism between the SCLIs 112 and the memory vaults 102.
Alternatively, the memory system 100 may be reconfigured via the memory fabric control register 117 to cause a subset of two or more of the plurality of memory vaults 102 and a corresponding subset of MVCs to operate synchronously in response to a single request. The latter configuration may be used to access a data word that is wider than the width of a data word associated with a single vault. This technique may decrease latency, as further described below. Other configurations may be enabled by loading a selected bit pattern into the memory fabric control register 117.
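The register bit layout is not specified. The following is a minimal sketch that assumes one bit per vault: a set bit places that vault (and its MVC) in the synchronous subset, and a clear bit leaves it independent. `NUM_VAULTS`, `VAULT_WORD_BITS`, and `decode_fabric_register` are hypothetical. Consistent with the "two or more" wording above, a single set bit falls back to independent operation.

```python
NUM_VAULTS = 16        # assumed number of memory vaults
VAULT_WORD_BITS = 32   # assumed data-word width of a single vault


def decode_fabric_register(value):
    """Decode a hypothetical memory fabric control register value.

    Bit i set: vault i and its MVC join the synchronous subset.
    Bit i clear: vault i keeps operating independently.
    Returns (synchronous vaults, independent vaults, access width in bits).
    """
    if not 0 <= value < 1 << NUM_VAULTS:
        raise ValueError("register value out of range")
    sync = [i for i in range(NUM_VAULTS) if (value >> i) & 1]
    independent = [i for i in range(NUM_VAULTS) if not (value >> i) & 1]
    if len(sync) < 2:
        # A synchronous subset needs two or more vaults, so a single set bit
        # is treated the same as all-independent operation.
        independent, sync = list(range(NUM_VAULTS)), []
    width = len(sync) * VAULT_WORD_BITS if sync else VAULT_WORD_BITS
    return sync, independent, width


print(decode_fabric_register(0b1111)[2])  # 128: four 32-bit vaults accessed together
```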
The outbound SCLI 114 may include a plurality of outbound differential pair serial paths (DPSPs) 128. The DPSPs 128 are communicatively coupled to the host processor(s) 114 and may collectively transport the outbound packet 122. That is, each DPSP of the plurality of outbound DPSPs 128 may transport a first data rate outbound sub-packet portion of the outbound packet 122 at a first data rate.
The outbound SCLI 114 may also include a deserializer 130 coupled to the plurality of outbound DPSPs 128. The deserializer 130 converts each first data rate outbound sub-packet portion of the outbound packet 122 to a plurality of second data rate outbound sub-packets. The plurality of second data rate outbound sub-packets is sent across a first plurality of outbound single-ended data paths (SEDPs) 134 at a second data rate. The second data rate is slower than the first data rate.
The outbound SCLI 114 may also include a demultiplexer 138 communicatively coupled to the deserializer 130. The demultiplexer 138 converts each of the plurality of second data rate outbound sub-packets to a plurality of third data rate outbound sub-packets. The plurality of third data rate outbound sub-packets is sent across a second plurality of outbound SEDPs 142 to the packet decoder 120 at a third data rate. The third data rate is slower than the second data rate.
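The rate step-down can be modeled as repeated lane splitting. The minimal sketch below assumes a round-robin bit distribution and illustrative ratios (a 1:4 deserializer and a 1:2 demultiplexer on a 10 Gb/s DPSP). None of these values come from the text, which states only that each stage is slower than the one before it. `split_lanes` and `outbound_path` are hypothetical names.

```python
def split_lanes(bits, lanes):
    """Distribute a sub-packet round-robin over `lanes` slower paths.

    Element k goes to lane k % lanes, so each lane carries 1/lanes of the
    data and can run at 1/lanes of the input rate for the same throughput.
    """
    if len(bits) % lanes:
        raise ValueError("sub-packet length must be a multiple of the lane count")
    return [bits[i::lanes] for i in range(lanes)]


def outbound_path(sub_packet, first_rate_gbps, deser_ratio, demux_ratio):
    """Model one DPSP sub-packet passing through the deserializer and demultiplexer."""
    second = split_lanes(sub_packet, deser_ratio)  # deserializer: SEDPs at the second rate
    third = [lane for s in second for lane in split_lanes(s, demux_ratio)]  # demultiplexer
    second_rate = first_rate_gbps / deser_ratio
    third_rate = second_rate / demux_ratio
    return third, second_rate, third_rate


third, r2, r3 = outbound_path(list(range(16)), 10.0, deser_ratio=4, demux_ratio=2)
print(len(third), r2, r3)  # 8 2.5 1.25
```

Total throughput is conserved at each stage: the path count grows by the same factor that the per-path rate falls.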
The packet decoder 120 receives the outbound packet 122 and extracts the command field 310 (e.g., of the example packet 300), the address field 320 (e.g., of the example packet 300), and/or the data field 410 (e.g., of the example packet 400). In some embodiments, the packet decoder 120 decodes the address field 320 to determine a corresponding set of memory vault select signals. The packet decoder 120 presents the set of memory vault select signals to the switch 116 on an interface 146. The vault select signals cause the input data paths 148 to be switched to the MVC 106 corresponding to the outbound packet 122.
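The address-to-vault mapping is not given in the text. The minimal sketch below assumes that four address bits just above a 64-byte block offset select one of 16 vaults, and that the vault select signals are one-hot. `VAULT_SHIFT`, `VAULT_BITS`, and `vault_select` are hypothetical.

```python
VAULT_SHIFT = 6  # assumed: address bits [5:0] select a byte within a 64-byte block
VAULT_BITS = 4   # assumed: address bits [9:6] select one of 16 vaults


def vault_select(address):
    """Return (vault index, one-hot vault select signals) for an address field."""
    vault = (address >> VAULT_SHIFT) & ((1 << VAULT_BITS) - 1)
    select = 1 << vault  # one signal per vault; exactly one is asserted
    return vault, select


print(vault_select(0x1C0))  # (7, 128)
```

Placing the vault bits just above the block offset spreads consecutive blocks across vaults, which favors the parallelism described above. Other mappings are equally plausible.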
Turning now to a discussion of the inbound data paths, the memory system 100 may include a plurality of packet encoders 154 (e.g., the packet encoder 158) coupled to the switch 116. The packet encoder 158 may receive an inbound memory command, an inbound memory address, and/or inbound memory data from one of the plurality of MVCs 104 via the switch 116. The packet encoder 158 encodes the inbound memory command, address, and/or data into an inbound packet 160 for transmission across an inbound SCLI 164 to the host processor(s) 114.
In some embodiments, the packet encoder 158 may segment the inbound packet 160 into a plurality of third data rate inbound sub-packets. The packet encoder 158 may send the plurality of third data rate inbound sub-packets across a first plurality of inbound single-ended data paths (SEDPs) 166 at a third data rate. The memory system 100 may also include a multiplexer 168 communicatively coupled to the packet encoder 158. The multiplexer 168 may multiplex each of a plurality of subsets of the third data rate inbound sub-packets into a second data rate inbound sub-packet. The multiplexer 168 sends the second data rate inbound sub-packets across a second plurality of inbound SEDPs 170 at a second data rate that is faster than the third data rate.
The memory system 100 may further include a serializer 172 communicatively coupled to the multiplexer 168. The serializer 172 aggregates each of a plurality of subsets of the second data rate inbound sub-packets into a first data rate inbound sub-packet. The first data rate inbound sub-packets are sent to the host processor(s) 114 across a plurality of inbound differential pair serial paths (DPSPs) 174 at a first data rate that is faster than the second data rate. Command, address, and data information is thus transferred back and forth between the host processor(s) 114 and the MVCs 104 across the SCLIs 112 via the switch 116. The MVCs 104, the SCLIs 112, and the switch 116 are fabricated on the logic die 202.
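The inbound path is the mirror image of the outbound path. In the minimal sketch below, each stage interleaves a group of slower sub-packets into one faster sub-packet, and the rate multiplies by the group size. The 2:1 ratios, the 1.25 Gb/s starting rate, the round-robin interleaving, and the names `merge_lanes` and `inbound_path` are all assumptions made for illustration.

```python
def merge_lanes(lanes):
    """Interleave equal-length slower lanes into one faster stream.

    Element j of lane i lands at position j * len(lanes) + i of the output,
    the inverse of a round-robin split.
    """
    if len({len(lane) for lane in lanes}) != 1:
        raise ValueError("lanes must carry equal-length sub-packets")
    return [bit for group in zip(*lanes) for bit in group]


def inbound_path(third_lanes, third_rate_gbps, mux_ratio, ser_ratio):
    """Multiplex groups of third-rate lanes, then serialize groups of second-rate lanes."""
    second = [merge_lanes(third_lanes[i:i + mux_ratio])
              for i in range(0, len(third_lanes), mux_ratio)]  # multiplexer: SEDPs 170
    first = [merge_lanes(second[i:i + ser_ratio])
             for i in range(0, len(second), ser_ratio)]        # serializer: DPSPs 174
    second_rate = third_rate_gbps * mux_ratio
    first_rate = second_rate * ser_ratio
    return first, second_rate, first_rate


first, r2, r1 = inbound_path([[0, 4], [2, 6], [1, 5], [3, 7]], 1.25, mux_ratio=2, ser_ratio=2)
print(first, r2, r1)  # [[0, 1, 2, 3, 4, 5, 6, 7]] 2.5 5.0
```

The lane order passed in determines the order in which bits emerge on the fast link. A real encoder would fix that mapping so that the host can reassemble the inbound packet.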
The programmable vault control logic (PVCL) 510 may be configured to adapt the MVC 106 to a memory vault 110 of a selected configuration or a selected technology. Thus, for example, the memory system 100 may initially be configured using currently available DDR2 DRAMs. The memory system 100 may subsequently be adapted to accommodate DDR3-based memory vault technology by reconfiguring the PVCL 510 to include DDR3 bank control and timing logic.
The MVC 106 may also include a memory sequencer 514 communicatively coupled to the PVCL 510. The memory sequencer 514 performs a memory technology dependent set of operations based upon the technology used to implement the associated memory vault 110. The memory sequencer 514 may, for example, perform command decode operations, memory address multiplexing operations, memory address demultiplexing operations, memory refresh operations, memory vault training operations, and/or memory vault prefetch operations associated with the corresponding memory vault 110. In some embodiments, the memory sequencer 514 may comprise a DRAM sequencer. In some embodiments, memory refresh operations may originate in a refresh controller 515.
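As an illustration of the address multiplexing that a DRAM sequencer performs, the minimal sketch below splits a bank-local address into a row half and a column half. The two halves are driven one after the other over shared address pins. The 14-bit row and 10-bit column widths and the names `multiplex_address` and `demultiplex_address` are assumptions. The command labels follow the conventional DRAM ACTIVATE and READ/WRITE sequence rather than anything specified in the text.

```python
ROW_BITS = 14  # assumed row-address width of one bank
COL_BITS = 10  # assumed column-address width


def multiplex_address(bank_address):
    """Split a bank-local address into row and column halves that share pins.

    The row is sent with ACTIVATE, then the column with READ or WRITE.
    """
    if not 0 <= bank_address < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address outside the bank")
    row = bank_address >> COL_BITS
    col = bank_address & ((1 << COL_BITS) - 1)
    return [("ACTIVATE", row), ("READ_OR_WRITE", col)]


def demultiplex_address(row, col):
    """Inverse operation: rebuild the bank-local address from row and column."""
    return (row << COL_BITS) | col


print(multiplex_address(0x12345))  # [('ACTIVATE', 72), ('READ_OR_WRITE', 837)]
```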
The memory sequencer 514 may be configured to adapt the memory system 100 to a memory vault 110 of a selected configuration or technology. For example, the memory sequencer 514 may be configured to operate synchronously with other memory sequencers associated with the memory system 100. Such a configuration may be used to deliver a wide data word from multiple memory vaults to a cache line (not shown) associated with the host processor(s) 114 in response to a single cache line request.
The MVC 106 may also include a write buffer 516. The write buffer 516 may be coupled to the PVCL 510 to buffer data arriving at the MVC 106 from the host processor(s) 114. The MVC 106 may further include a read buffer 517. The read buffer 517 may be coupled to the PVCL 510 to buffer data arriving at the MVC 106 from the corresponding memory vault 110.
The MVC 106 may also include an out-of-order request queue 518. The out-of-order request queue 518 establishes an ordered sequence of read and/or write operations to the plurality of memory banks included in the memory vault 110. The ordered sequence is chosen to avoid sequential operations to any single memory bank in order to reduce bank conflicts and to decrease read-to-write turnaround time.
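One simple way to build such an ordering is a greedy scan, sketched below under stated assumptions. At each step, the oldest pending request whose bank differs from the bank just used is issued. If every pending request targets that bank, the oldest request is issued anyway. This policy and the names `Request` and `order_requests` are hypothetical. The sketch addresses only bank conflicts and does not model read-to-write turnaround.

```python
from collections import namedtuple

Request = namedtuple("Request", "tag op bank")  # hypothetical request record


def order_requests(queue):
    """Reorder pending requests so that consecutive operations avoid the same bank."""
    pending = list(queue)
    issued = []
    last_bank = None
    while pending:
        # Oldest request to a different bank; fall back to the oldest request.
        pick = next((r for r in pending if r.bank != last_bank), pending[0])
        pending.remove(pick)
        issued.append(pick)
        last_bank = pick.bank
    return issued


queue = [Request(0, "R", 1), Request(1, "W", 1), Request(2, "R", 2),
         Request(3, "R", 2), Request(4, "W", 3)]
print([r.tag for r in order_requests(queue)])  # [0, 2, 1, 3, 4]
```

In the example, the bank sequence becomes 1, 2, 1, 2, 3 instead of 1, 1, 2, 2, 3, so no bank is accessed twice in a row.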
The MVC 106 may also include a memory vault repair logic (MVRL) component 524. The MVRL 524 may perform defective memory array address remapping operations using array repair logic 526. The array repair logic 526 may remap requests to redundant cells or arrays of cells located on memory vault dies (e.g., on the stacked die 204 of
Any of the components previously described may be implemented in a number of ways, including embodiments in hardware, software, firmware, or combinations thereof. It is noted that “software” in this context refers to statutory software structures and not to mere software listings.
Thus, the memory system 100; the memory arrays 200, 203, 240, 527; the die 202, 204; the tiles 205A, 205B, 205C, 208, 212B, 212C, 212D; the “Z” dimension 220; the paths 224, 148; the memory vaults 230, 102, 110; the MVCs 104, 106; the SCLIs 112, 113, 114, 115, 164; the processor(s) 114; the switch 116; the register 117; the packets 300, 400, 122, 160; the packet decoders 118, 120; the fields 310, 320, 410; the DPSPs 128, 174; the deserializer 130; the SEDPs 134, 142, 166, 170; the demultiplexer 138; the interface 146; the packet encoders 154, 158; the multiplexer 168; the serializer 172; the PVCL 510; the memory sequencer 514; the refresh controller 515; the buffers 516, 517; the out-of-order request queue 518; the MVRL 524; the array repair logic 526; and the TWI repair logic 528 may all be characterized as “modules” herein.
The modules may include hardware circuitry, optical components, single or multi-processor circuits, memory circuits, software program modules and objects, firmware, and combinations thereof, as desired by the architect of the memory system 100 and as appropriate for particular implementations of various embodiments.
The apparatus and systems of various embodiments may be useful in applications other than a high-density, multi-link, high-throughput semiconductor memory subsystem. Thus, various embodiments of the invention are not to be so limited. The illustrations of the memory system 100 are intended to provide a general understanding of the structure of various embodiments. They are not intended to serve as a complete description of all the elements and features of apparatus and systems that might make use of the structures described herein.
The novel apparatus and systems of various embodiments may comprise or be incorporated into electronic circuitry used in computers, communication and signal processing circuitry, single-processor or multi-processor modules, single or multiple embedded processors, multi-core processors, data switches, and application-specific modules including multilayer, multi-chip modules. Such apparatus and systems may further be included as sub-components within a variety of electronic systems, such as televisions, cellular telephones, personal computers (e.g., laptop computers, desktop computers, handheld computers, tablet computers, etc.), workstations, radios, video players, audio players (e.g., MP3 (Moving Picture Experts Group, Audio Layer 3) players), vehicles, medical devices (e.g., heart monitor, blood pressure monitor, etc.), set top boxes, and others. Some embodiments may include a number of methods.
The method 600 may commence at block 606 with segmenting an outbound packet into a set of first data rate sub-packet portions at the originating device. In some embodiments, the originating device may include one or more processors. In some embodiments, the originating device may include a category of devices capable of direct memory access (DMA) such as a graphics controller. The packet may carry one or more outbound memory subsystem commands, addresses, or data fields to be written to one or more memory subsystem locations.
The method 600 may continue at block 610 with sending each of the first data rate sub-packets from the originating device (e.g., from a selected processor) to a deserializer (e.g., the deserializer 130 of
The method 600 may further include sending each of the second data rate sub-packets from the deserializer to a demultiplexer (e.g., the demultiplexer 138 of
The method 600 may continue at block 622 with receiving the third data rate sub-packets at the packet decoder from the selected SCLI. The method 600 may include assembling the set of third data rate sub-packets into the outbound packet, at block 626. The method 600 may also include extracting at least one of the outbound command, the outbound address, or the outbound data from the packet, at block 628.
The method 600 may also include presenting the outbound command, address, or data to the switch, at block 632. The method 600 may further include concurrently switching an outbound command, address, and/or data associated with each stream at the switch, at block 636. The outbound command, address, and/or data associated with each stream is switched to a destination MVC (e.g., the MVC 106 of
The method 600 may continue at block 640 with buffering the outbound command, address, and/or data at a write buffer component of the MVC (e.g., the write buffer 516 of
In some embodiments, the method 600 may optionally include determining whether the memory subsystem has been configured to operate in a synchronous parallel mode, at block 645. If so, the method 600 may include operating a synchronous subset of the memory vaults in response to a single memory request, at block 646. Such operation may be used to decrease access latency by synchronously transferring a wide data word of a width that is a multiple of a single memory vault word length. The resulting wide data word width corresponds to the number of memory vaults in the synchronous subset of vaults.
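The wide-word arithmetic can be made concrete. With N vaults in the synchronous subset and a single-vault word of w bits, one request transfers N·w bits. The minimal sketch below assumes that slice i of the wide word maps to vault i, with the least significant slice first. That mapping and the names `split_wide_word` and `join_wide_word` are hypothetical.

```python
def split_wide_word(word, num_vaults, vault_word_bits):
    """Slice a wide word into per-vault words; slice i goes to vault i.

    The wide word is num_vaults * vault_word_bits bits wide.
    """
    width = num_vaults * vault_word_bits
    if not 0 <= word < 1 << width:
        raise ValueError("word wider than the synchronous vault subset")
    mask = (1 << vault_word_bits) - 1
    return [(word >> (i * vault_word_bits)) & mask for i in range(num_vaults)]


def join_wide_word(slices, vault_word_bits):
    """Reassemble per-vault words read in the same access into one wide word."""
    word = 0
    for i, s in enumerate(slices):
        word |= s << (i * vault_word_bits)
    return word


slices = split_wide_word(0x1122334455667788, num_vaults=4, vault_word_bits=16)
print([hex(s) for s in slices])  # ['0x7788', '0x5566', '0x3344', '0x1122']
```

Because all four slices are accessed in the same operation, a 64-bit line is delivered in one vault access time rather than four.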
The method 600 may optionally include ordering read and/or write operations to a plurality of memory banks associated with a corresponding memory vault at an out-of-order request queue component of the memory sequencer (e.g., the out-of-order request queue 518 of
The method 600 may conclude at block 650 with performing data write operations to write the outbound data to the corresponding memory vault, data read operations to read data from the corresponding memory vault, and/or memory vault housekeeping operations. The data write operations, data read operations, and/or housekeeping operations may be performed independently from concurrent operations associated with other MVCs coupled to other memory vaults.
The method 700 may commence at block 706 with receiving a read command from a processor at an MVC (e.g., the MVC 106 of
The method 700 may also include switching the inbound data word to a packet encoder (e.g., the packet encoder 158 of
The method 700 may continue at block 726 with segmenting the inbound packet into a plurality of third data rate inbound sub-packets. The method 700 may include sending the plurality of third data rate inbound sub-packets to a multiplexer (e.g., the multiplexer 168 of
The method 700 may continue at block 746 with aggregating each of a plurality of subsets of the second data rate inbound sub-packets into a first data rate inbound sub-packet using the serializer. The method 700 may include presenting the first data rate inbound sub-packets to the destination device(s), at block 754. The method 700 may also include assembling the first data rate inbound sub-packets into the inbound packet, at block 758. The method 700 may conclude with extracting the inbound data word from the inbound packet, at block 762, and presenting the inbound data word to an operating system associated with the destination device(s), at block 768.
It is noted that the activities described herein may be executed in an order other than the order described. The various activities described with respect to the methods identified herein may also be executed in repetitive, serial, and/or parallel fashion.
A software program may be launched from a computer-readable medium in a computer-based system to execute functions defined in the software program. Various programming languages may be employed to create software programs designed to implement and perform the methods disclosed herein. The programs may be structured in an object-oriented format using an object-oriented language such as Java or C++. Alternatively, the programs may be structured in a procedure-oriented format using a procedural language, such as assembly or C. The software components may communicate using well-known mechanisms, including application program interfaces, inter-process communication techniques, and remote procedure calls, among others. The teachings of various embodiments are not limited to any particular programming language or environment.
The apparatus, systems, and methods described herein may operate to substantially concurrently transfer a plurality of streams of commands, addresses, and/or data between one or more originating and/or destination devices (e.g., one or more processors) and a set of stacked-array memory vaults. Increased memory system density, bandwidth, parallelism, and scalability may result.
By way of illustration and not of limitation, the accompanying figures show specific embodiments in which the subject matter may be practiced. The embodiments illustrated are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. This Detailed Description, therefore, is not to be taken in a limiting sense.
Such embodiments of the inventive subject matter may be referred to herein individually or collectively by the term “invention” merely for convenience and without intending to voluntarily limit this application to any single invention or inventive concept, if more than one is in fact disclosed. Thus, although specific embodiments have been illustrated and described herein, any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all adaptations or variations of various embodiments. Combinations of the above embodiments and other embodiments not specifically described herein will be apparent to those of skill in the art upon studying the above description.
The Abstract of the Disclosure is provided to comply with 37 C.F.R. §1.72(b) requiring an abstract that will allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In the foregoing Detailed Description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted to require more features than are expressly recited in each claim. Rather, inventive subject matter may be found in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.
This application is a continuation of U.S. application Ser. No. 12/261,942, filed Oct. 30, 2008, now issued as U.S. Pat. No. 7,978,721, which is a continuation-in-part of U.S. application Ser. No. 12/166,814, filed Jul. 2, 2008; U.S. application Ser. No. 12/166,871, filed Jul. 2, 2008, now issued as U.S. Pat. No. 8,289,760; and U.S. application Ser. No. 12/176,951, filed Jul. 21, 2008, now issued as U.S. Pat. No. 7,855,931. Each of these applications is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
20110264858 A1 | Oct 2011 | US
Relation | Number | Date | Country
---|---|---|---
Parent | 12261942 | Oct 2008 | US
Child | 13179156 | | US
Relation | Number | Date | Country
---|---|---|---
Parent | 12166814 | Jul 2008 | US
Child | 12261942 | | US
Parent | 12166871 | Jul 2008 | US
Child | 12166814 | | US
Parent | 12176951 | Jul 2008 | US
Child | 12166871 | | US