The present invention generally relates to the field of random access memory (RAM). More specifically, the present invention is related to a DDR4-SSD dual-port DIMM with a DDR4 bus adaptation circuit configured to expand scale-out capacity and performance.
DDR4 and NVM technologies have been developed as single-port memory modules directly attached to CPUs. DDR4 provides a multi-channel architecture of point-to-point connections, allowing CPUs to host more high-speed DDR4 DIMMs (dual in-line memory modules) than the previous multi-drop DDR2/3 bus technologies, in which adding DIMMs forced a sacrifice in bus speed. However, the technology has yet to be widely adopted. So far, the vast majority of DDR4 motherboards still use the old multi-drop bus topology.
High-density all-flash-array (AFA) storage systems and large-scale NVM systems must use dual-port primary storage modules, similar to SAS-HDD devices, for higher reliability and availability (e.g., avoiding single-point failures in any data path). The higher the SSD/NVM density, the more critical the primary SSD/NVM device becomes. For example, a high-density DDR4-SSD DIMM may have 15 TB to 20 TB of storage capacity. Also, conventional NVDIMMs are focused on maximizing DRAM capacity, pairing it with the same amount of NAND flash for power-down protection as persistent DRAM. Furthermore, conventional UltraDIMM SSD units use a DDR3-SATA controller plus two SATA-SSD controllers and eight NAND flash chips to build SSDs in a DIMM form factor, with throughput less than 10% of the DDR3 bus bandwidth.
Accordingly, embodiments of the present invention provide a novel approach to placing high-density AFA primary storage in DDR4 bus slots. Embodiments of the present invention provide DDR4-SSD DIMM form factor designs for high-density storage, without bus-speed or utilization penalties, under high ONFI memory chip loads, that can be directly inserted into a DDR4 motherboard. Moreover, embodiments of the present invention provide a novel 1:2 DDR4-to-ONFI NV-DDR2 architecture design for signaling levels, terminations/relaying, and data-rate adaptation.
As such, embodiments can gang up N of the 1:2 DDR4-ONFI adaptors to form N-times ONFI channel expansions to scale out NAND flash storage. Also, embodiments introduce DDR4 1:2 data-buffer load-reducing technologies that enable higher fan-outs of N=10 or 16 in the DDR4 domain. In this fashion, NV-DDR2 channel load expansions can occur with lower speed loss and higher bus utilization. Furthermore, embodiments also include a plurality of DDR4-DRAM chips (e.g., 32 bits) for data buffering, FTL tables or KV tables, GC/WL tables, and control functions, and one DDR3-STTRAM chip for write caching and power-down protection.
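As a rough illustration, the scale-out arithmetic above can be sketched as follows. The 1:2 adaptor ratio and the N=10 or 16 fan-outs come from the description; the chips-per-channel and per-chip capacity figures are purely illustrative assumptions.

```python
# Hypothetical sketch of the 1:N DDR4-to-ONFI scale-out arithmetic.
# Only the 1:2 split ratio and N = 10/16 fan-outs come from the text;
# chip counts and per-chip capacity below are illustrative assumptions.

def onfi_channels(n_adaptors: int, split_ratio: int = 2) -> int:
    """Each 1:2 DDR4-ONFI adaptor fans one DDR4 channel out to `split_ratio` ONFI channels."""
    return n_adaptors * split_ratio

def total_capacity_tb(n_adaptors: int, chips_per_channel: int, tb_per_chip: float) -> float:
    """Aggregate NAND capacity reachable through all expanded ONFI channels."""
    return onfi_channels(n_adaptors) * chips_per_channel * tb_per_chip

# With N = 16 adaptors, an assumed 4 NAND chips per ONFI channel and
# an assumed 0.128 TB per chip:
channels = onfi_channels(16)                     # 32 ONFI channels
capacity = total_capacity_tb(16, 4, 0.128)       # aggregate TB
```

This is only a capacity sketch; it says nothing about the signaling-level and data-rate adaptation that the 1:2 adaptors also perform.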
Embodiments of the present invention include DDR4-DIMM interface circuits and DDR4-SDRAM to buffer high-speed DDR4 data flows. Embodiments include DDR4-ONFI controllers configured for ONFI-over-DDR4 adaptations, FTL controls, FTL-metadata management, ECC controls, GC and WL controls, and I/O command queuing. Embodiments of the present invention enable 1-to-2 DDR4-to-ONFI NV-DDR2 bus adaptations/terminations/relays as well as data buffering and/or splitting. Furthermore, embodiments of the present invention provide 1-to-N DDR4-ONFI bus expansion methods.
The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention:
Reference will now be made in detail to several embodiments. While the subject matter will be described in conjunction with the alternative embodiments, it will be understood that they are not intended to limit the claimed subject matter to these embodiments. On the contrary, the claimed subject matter is intended to cover alternatives, modifications, and equivalents, which may be included within the spirit and scope of the claimed subject matter as defined by the appended claims.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the claimed subject matter. However, it will be recognized by one skilled in the art that embodiments may be practiced without these specific details or with equivalents thereof. In other instances, well-known methods, procedures, components, and circuits have not been described in detail as not to unnecessarily obscure aspects and features of the subject matter.
Portions of the detailed description that follows are presented and discussed in terms of a method. Embodiments are well suited to performing various other steps or variations of the steps recited in the flowchart of the figures herein, and in a sequence other than that depicted and described herein.
Some portions of the detailed description are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits that can be performed on computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer-executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computing device. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout, discussions utilizing terms such as “accessing,” “writing,” “including,” “storing,” “transmitting,” “reading,” “associating,” “identifying” or the like, refer to the action and processes of an electronic computing device that manipulates and transforms data represented as physical (electronic) quantities within the system's registers and memories into other data similarly represented as physical quantities within the system memories or registers or other such information storage, transmission or display devices.
DDR4-SSD Controller 110 can receive control signals and/or data streams via several different channels capable of providing connectivity by CPUs to a network comprising a pool of network resources. The pool of resources may include, but is not limited to, virtual machines, CPU resources, non-volatile memory pools (e.g., flash memory), HDD storage pools, etc. As depicted in
DDR4-DBs 103a and 103b can be data buffers that serve as termination/multiplexing points for the DDR4 bus, allowing it to be shared by host CPUs and the DDR4-SSD controller. In this fashion, DDR4-DBs 103a and 103b include the functionality to manage the loads of external devices such that DDR4-DBs 103a and 103b can drive signals received through channels 101d and 101e to other portions of the DDR4-SSD controller 110 (e.g., DDR4 DRAM 104a, 104b, NAND units 106a through 106h, etc.).
As depicted in
For instance, if multiple host devices seek to perform procedures involving DDR4 DRAM (e.g., read and/or write procedures), SSD Controller 110 can determine whether a particular DDR4 DRAM (e.g., DDR4 DRAM 104a) is experiencing higher latency than another DDR4 DRAM (e.g., DDR4 DRAM 104b). Thus, when responding to a host device's request to perform the procedure, SSD Controller 110 can communicate the instructions sent by the requesting host device to the DDR4 DRAM that is available to perform the requested procedure, where they can then be stored for processing. In this manner, DDR4 DRAM 104a and 104b act as separate elastic buffers that are capable of performing DDR4-to-DDR2 rate-reduction procedures on buffered data received. This allows a transmission rate (e.g., a 2667 MT/s host rate) for host and eASIC bus masters to perform “ping-pong” access.
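The latency-driven choice between the two elastic DRAM buffers can be sketched as below. The latency table is a hypothetical stand-in; the controller's actual latency-tracking mechanism is not detailed in the description.

```python
# Sketch of latency-based steering between the two elastic DRAM buffers
# (104a / 104b). The latency dict is a hypothetical stand-in for whatever
# the controller actually measures.

def select_buffer(latencies: dict) -> str:
    """Pick the DDR4 DRAM buffer currently reporting the lowest latency."""
    return min(latencies, key=latencies.get)

# Example: 104a is busier than 104b, so a new request is steered to 104b,
# while 104a continues draining -- the "ping-pong" access pattern.
choice = select_buffer({"DRAM_104a": 120, "DRAM_104b": 45})
```

With two buffers, alternating requests this way keeps one buffer filling while the other drains, which is the elastic-buffer behavior described above.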
Also, as depicted in
As such, SSD Controller 110 can transform control bus signals and/or data bus signals in accordance with current ONFI communications standards. Moreover, SSD Controller 110 can communicate with a particular ONFI adapter using a respective DDR4 channel programmed for the ONFI adapter. In this fashion, DIMM device 100 enables communications between different DIMM components operating on different DDR standards. For example, NAND chips operating under a particular DDR (e.g., DDR1, DDR2, etc.) technology can send and/or receive data from DRAMs using DDR4 technology.
For example, a CPU can write commands through bus 102-2, which include instructions to write data to DDR4-DRAM. SSD Controller 110 stores the instructions within DDR4-DRAM 104a or 104b depending on DRAM traffic conditions. Upon receiving NVME write commands, SSD Controller 110 can allocate the input buffers in DRAM 104a and associated flash pages among NAND flash chip arrays 122a/b through 124a/b. Thereafter, ONFI-over-DDR4 write sequences can be carried out through bus 102-2 with Cmd/Addr signals, and through port1 101d and then DDR4-DB 103a with the data bursts written synchronously into pre-allocated buffers in DDR4-DRAM 104a. Moreover, NVME commands can be issued to each of 8 or 16 DIMMs 100 through bus 102 concurrently.
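The write path just described can be sketched as a small planning routine. All function and field names here (`handle_nvme_write`, `lba`, `target_chips`) are illustrative assumptions, not the controller's actual API.

```python
# Hypothetical sketch of the NVME write path: stage the command in the
# less-loaded DRAM, then pre-allocate an input buffer and flash pages
# before the ONFI-over-DDR4 write sequence streams the data bursts.

def handle_nvme_write(cmd: dict, dram_traffic: dict) -> dict:
    # 1. Stage the command in whichever DRAM (104a/104b) reports less traffic.
    dram = min(dram_traffic, key=dram_traffic.get)
    # 2. Allocate an input buffer in that DRAM and flash pages in the
    #    targeted NAND chip arrays (122a/b .. 124a/b in the text).
    plan = {
        "staging_dram": dram,
        "input_buffer": f"{dram}:buf[{cmd['lba']}]",
        "flash_pages": [f"NAND_{chip}" for chip in cmd["target_chips"]],
    }
    # 3. The ONFI-over-DDR4 write sequence then bursts data into the
    #    pre-allocated buffers synchronously with the Cmd/Addr stream.
    return plan

plan = handle_nvme_write(
    {"lba": 0x1000, "target_chips": ["122a", "122b"]},
    {"DRAM_104a": 7, "DRAM_104b": 3},
)
```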
Memory Controller 120 will generate sequences of Cmd/Address signals for BL8 writes or reads to perform long-burst access to DDR4 DRAM 104a and 104b (16 KB write page or 4 KB read page) under CPU control. Memory controller 120 includes the functionality to retrieve data from a particular NAND chip as well as from a DDR4-DRAM based on signals received by SSD Controller 110 from a host device. In one embodiment, memory controller 120 includes the functionality to perform ONFI-over-DDR4 adaptations, FTL controls, FTL-metadata management, ECC controls, GC and WL controls, I/O command queuing, etc. Host device signals can include instructions capable of being processed by memory controller 120 to place data in DDR4-DRAM for further processing. As such, memory controller 120 can perform bus adaptation procedures, which include interpreting random access instructions (e.g., instructions concerning DDR4-DRAM procedures) as well as page (or block) access instructions (e.g., instructions concerning NAND processing procedures). As illustrated in
Memory controller 120 can also include decoders which assist memory controller 120 in decoding instructions sent from a host device. For instance, decoders can be used by memory controller 120 to determine NAND addresses and/or the location of data stored in DDR4-DRAM 104a and 104b when performing an operation specified by a host device. DDR4-PHY 116a and 116b depict application interfaces which enable communications between memory controller 120 and DDR4-DRAM 104a and 104b and/or CMD queues 117. Memory controller 120 also includes the functionality to periodically poll processes occurring within a set of NAND units (e.g., NAND chips 122a-122d and 124a-124d) in order to assess when data can be made ready for communication to a DDR4-DRAM for further processing.
Furthermore, memory controller 120 includes the functionality to communicate output back to a host device (e.g., via CMD-queues 117) using the address of the host device. ONFI I/O timing controller 119 includes the functionality to perform load balancing. For instance, if a host device sends instructions to write data to DDR4-DRAM, ONFI I/O timing controller 119 can assess latency with respect to NAND processing and report status data to memory controller 120 (e.g., using a table). Using this information, memory controller 120 can optimize and/or prioritize the performance of read and/or write procedures specified by host devices.
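The load-balancing idea above can be sketched as follows: the timing controller's report is modeled as a per-channel latency table, and pending operations are ordered against it. The table format and the simple sort are assumptions for illustration only.

```python
# Sketch of latency-table-driven load balancing: ONFI I/O timing controller
# 119 reports per-channel NAND latency (modeled as a dict), and the memory
# controller orders pending ops so least-loaded channels are served first.
# The scoring scheme is an illustrative assumption.

def schedule(pending_ops: list, latency_table: dict) -> list:
    """Order pending ops by the reported latency of their target channel."""
    return sorted(pending_ops, key=lambda op: latency_table[op["channel"]])

ops = [
    {"id": 1, "channel": "ch0"},   # ch0 currently congested
    {"id": 2, "channel": "ch1"},   # ch1 currently idle
]
ordered = schedule(ops, {"ch0": 90, "ch1": 10})
```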
Moreover, as described herein, embodiments of the present invention utilize “active-passive” dual-access modes of DDR4-SSD DIMM. In one embodiment, only 1 port is used in the active-passive dual-access mode. Also, in one embodiment, 1 byte can be used in the dual-access mode. As depicted in
Channel control 129 includes the functionality to optimize and/or prioritize communications between NAND chips and memory controller 120. For example, channel control 129 can prioritize the transmission of data between NAND chips and memory controller 120 based on the size of the data to be carried and/or whether the operation concerns a read and/or write command specified by a host device. Channel control 129 also includes the functionality to synchronize the transmission of read and/or write command communications with polling procedures, which can optimize the speed at which data can be processed by DIMM device 100. Moreover, unified-memory-interface CPUs can also accept interrupts sent from the 8-bit Cmd/Addr buses 102-2 or 102-3.
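A size- and type-based prioritization of the kind attributed to channel control 129 might look like the sketch below. The particular weighting (reads before writes, smaller transfers first) is a hypothetical policy, not one stated in the description.

```python
# Sketch of channel-control prioritization driven by op type and data size.
# The concrete policy here (reads outrank writes, smaller transfers first)
# is an illustrative assumption.

def priority(op: dict) -> tuple:
    # Reads rank ahead of writes (a host is typically waiting on them);
    # among equals, smaller transfers go first.
    type_rank = 0 if op["kind"] == "read" else 1
    return (type_rank, op["size"])

queue = [
    {"id": "w1", "kind": "write", "size": 16384},   # 16 KB write page
    {"id": "r1", "kind": "read",  "size": 4096},    # 4 KB read page
]
ordered = sorted(queue, key=priority)
```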
DDR4-ONFI adapter 112 can receive command signals in the form of BCOM[3:0] and/or ONFI I/O control signals. In one embodiment, these command signals may be used to control MLC+ chips in accordance with the latest JESD79-4 DDR4 data-buffer specifications. BCOM[3:0] signals 136 can control ONFI read and write timings as well as the control pins to 4 chips using MDQ[7:0] and NDQ[7:0] channels and/or bus communication signals (e.g., signals 102-2, 102-3 shown in
The Vref
Furthermore, as depicted in
Furthermore, as depicted in
Furthermore, as depicted in
Thus, when responding to a command from either host device 910 or 915 to perform a procedure, SSD Controller 110 can communicate the instructions sent by the requesting host device to the DDR4 DRAM that is available to perform the requested procedure, where they can then be stored for processing. In this manner, DDR4 DRAM 104a and 104b act as separate elastic buffers that are capable of buffering data received from DDR4-DBs 103a and 103b. Moreover, in this fashion, the two paths can use active-passive (“standby”) or active-active modes to increase the reliability and availability of the storage systems on DIMM device 100.
Furthermore,
As illustrated in
As shown in
At step 1105, the DDR4-Solid State Drive (SSD) controller receives the first signal and saves it into an NVME command queue at the DRAM level.
At step 1110, the DDR4-Solid State Drive (SSD) controller allocates buffers and associated flash pages in NAND flash chip arrays through a port (e.g., an 8-bit port) corresponding to a pre-assigned data channel and stores the sequences of signals in the command queues in DRAMs resident on the DIMM. In one embodiment, the SSD controller can select the data buffers to store the signals and/or subsequent data bursts based on detected DRAM traffic conditions concerning each data buffer.
At step 1115, the SSD controller generates DRAM write cmd/addr sequences of BL8 (burst length 8). These sequences (e.g., writes) can be generated using pre-allocated write buffers. In this fashion, a host can perform DMA/RDMA write operations using 4 KB or 16 KB data bursts into DRAMs with cmd/addr sequences synchronized by the SSD controller. In one embodiment, the SSD controller can pack four 4 KB bursts into a 16 KB page.
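The burst arithmetic implied by the 4 KB / 16 KB page sizes can be checked as follows. A 64-bit (8-byte) data bus is assumed here, the standard DDR4 DIMM width; the page sizes and BL8 burst length come from the description.

```python
# Checking the BL8 burst arithmetic. A 64-bit (8-byte) data bus is an
# assumption (the standard DDR4 DIMM width); the 4 KB read / 16 KB write
# page sizes and BL8 come from the text.

BUS_BYTES = 8          # 64-bit DDR4 data bus
BL8_BEATS = 8          # burst length 8: eight beats per burst

def bursts_per_page(page_bytes: int) -> int:
    """Number of BL8 bursts needed to move one page over the DDR4 bus."""
    burst_bytes = BUS_BYTES * BL8_BEATS    # 64 bytes per BL8 burst
    return page_bytes // burst_bytes

write_bursts = bursts_per_page(16 * 1024)   # bursts per 16 KB write page
read_bursts = bursts_per_page(4 * 1024)     # bursts per 4 KB read page
packed = (16 * 1024) // (4 * 1024)          # 4 KB bursts packed per 16 KB page
```

Note that one BL8 burst on a 64-bit bus moves 64 bytes, which matches the 64 B burst granularity mentioned later in the read path.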
At step 1120, the SSD controller configures the first signal into a second signal (e.g., a signal in the form of a second double data rate dynamic random access memory protocol, such as DDR2) using an Open NAND Flash Interface (ONFI) standard. The ONFI-over-DDR4 interface can modify an ONFI NV-DDR2 Cmd/Addr/data stream by splitting one 8-bit channel into an ONFI Cmd/Addr bus to control 8 DDR4-SSD DIMMs and one 8-bit ONFI data channel to stream long-burst data transfers (reads or writes) to optimize bus utilization.
As shown in
At step 1130, the SSD controller transmits the read commands of the NVME command queues to all related available flash chips, with pre-allocated pages and associated output buffers, as flash page read ops. All related DDR4-ONFI adaptors along the cmd/addr/data streaming paths carry out the DDR4-to-DDR2 signal-level and data-rate adaptation and the termination and/or retransmission functions.
At step 1135, the SSD controller sets up status register regions within the DDR4 DRAM on the DIMM for ARM64/FPGA controllers to poll or check whether the ONFI write ops are completed, and also to check for ONFI read completions with data ready in the related caches on each flash chip or die(s) inside the chips. In one embodiment, the SSD controller can also send hardware interrupts to the unified memory interface at the ARM64/FPGA controllers via the 8-bit ONFI cmd/addr bus (a conventional DDR4 cmd/addr bus modified to be bi-directional). Upon the ARM64/FPGA controller polling a read completion, the ARM64/FPGA can interrupt the related host device for a DMA read directly from the DRAM on the DIMM, or will set up the RDMA engine in the ARM64/FPGA controller to RDMA-write a data packet (4 KB or 8 KB) to the assigned memory space in the host device by reading the associated read buffer on the DDR4-SSD DIMM. The SSD controller can generate the DRAM read cmd/address sequences to synchronously support this RDMA read burst (in 64 B or 256 B size).
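The completion-polling performed at this step can be sketched as below. Representing the DRAM-resident status register regions as a dict of op-name to state is a hypothetical simplification of whatever encoding the controller actually uses.

```python
# Sketch of the completion-polling loop over the DRAM-resident status
# register regions. The dict encoding (op name -> state string) is a
# hypothetical stand-in for the real register layout.

def poll_completions(status_regs: dict) -> list:
    """Return the ops whose status registers report completion."""
    return [op for op, state in status_regs.items() if state == "complete"]

# The ARM64/FPGA controller polls periodically; completed writes can be
# relayed upstream, and completed reads trigger a DMA/RDMA transfer of the
# ready page back to the host.
regs = {"write_0": "complete", "read_1": "pending", "read_2": "complete"}
finished = poll_completions(regs)
```

In the hardware-interrupt variant described in the text, this polling loop would be replaced (or supplemented) by interrupts delivered over the bi-directional 8-bit cmd/addr bus.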
At step 1140, upon receipt of a write completion, the SSD controller configures the data using the first double data rate dynamic random access memory protocol used when received at step 1100 for the next round of new read/write ops on available flash chips or dies. In one embodiment, the SSD controller can interrupt the ARM64/FPGA controller with relayed write-completion info in corresponding status registers; upon receipt of a read-ready, the SSD controller will fetch the cached page in the related flash chip, write it to the pre-allocated output buffer in DRAM, and then interrupt the ARM64/FPGA controller with relayed read-completion info.
Although exemplary embodiments of the present disclosure are described above with reference to the accompanying drawings, those skilled in the art will understand that the present disclosure may be implemented in various ways without changing the necessary features or the spirit of the present disclosure. The scope of the present disclosure will be interpreted by the claims below, and it will be construed that all techniques within the scope equivalent thereto belong to the scope of the present disclosure.
According to an embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be database servers, storage devices, desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
In the foregoing detailed description of embodiments of the present invention, numerous specific details have been set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention is able to be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention. Although a method is able to be depicted as a sequence of numbered steps for clarity, the numbering does not necessarily dictate the order of the steps. It should be understood that some of the steps may be skipped, performed in parallel, or performed without the requirement of maintaining a strict order of sequence. The drawings showing embodiments of the invention are semi-diagrammatic and not to scale and, particularly, some of the dimensions are for the clarity of presentation and are shown exaggerated in the drawing Figures. Similarly, although the views in the drawings for the ease of description generally show similar orientations, this depiction in the Figures is arbitrary for the most part.
Embodiments according to the present disclosure are thus described. While the present disclosure has been described in particular embodiments, it is intended that the invention shall be limited only to the extent required by the appended claims and the rules and principles of applicable law.
This application claims priority from U.S. Provisional Patent Application Ser. No. 61/951,987, filed Mar. 12, 2014 to Lee et al., entitled “DDR4 BUS ADAPTION CIRCUITS TO EXPAND ONFI BUS SCALE-OUT CAPACITY AND PERFORMANCE” which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61951987 | Mar 2014 | US