1. Field of the Invention
The present invention is directed in general to the field of computer storage systems. In one aspect, the present invention relates to an AHCI-based or NVMe-based SSD system that is directly connected to the system memory bus.
2. Description of the Related Art
PCIe SSDs have become extremely popular in a very short amount of time. They provide uncomplicated access to high performance storage, allowing latency problems to be significantly reduced on the server where the application is run. The problem with PCIe SSDs is that they require space in the server and can cause cooling problems. They also consume significant amounts of power and consume CPU cycles to reach maximum performance.
A SATADIMM, produced by Viking Modular Solutions, resides in a DIMM memory slot of a motherboard and uses spare DIMM slots to draw power. However, I/O operations, such as data transfers to and from a SATADIMM, go through a SATA cable connected to the SATADIMM, which does not take advantage of the significantly higher bandwidth of the main memory bus.
Many servers have available DIMM slots, since it is simply too expensive to fill them with maximum capacity DRAM modules. DIMM-based SSD technology should be considered a serious alternative to expensive high capacity DRAM. Since a single SSD DIMM provides far more capacity than a DRAM DIMM can, the system can use this storage as a cache or paging area for DRAM operations.
Therefore, there exists a need for an SSD system and method that provides performance similar to PCIe SSDs, retains the advantages of the SATADIMM, and is directly connected to the system memory bus as an alternative to expensive high capacity DRAM.
A SSD system directly connected to the system memory bus is disclosed. A SSD system includes at least one system memory bus interface unit, one storage controller with associated shared system memory as its data buffer/cache, one data interconnect unit, one nonvolatile memory module, and flexible association between AHCI/NVMe commands and the nonvolatile memory module. A logical device interface, the Advanced Host Controller Interface or NVM Express, is used for the SSD system programming, which makes the SSD appear to the system as a SATA SSD or an NVMe SSD.
The present invention may be better understood, and its numerous objects, features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference number throughout the several figures designates a like or similar element.
Referring to
The SSD system 100 enables scaling by parallelizing the system memory bus interface and associated processing. The storage system 100 is applicable to more than one interface simultaneously. The storage system 100 provides a flexible association between command quanta and processing resources. The storage system 100 is partitionable, and thus includes completely isolated resources for each partition. The storage system 100 is virtualizable.
The storage system 100 includes a flexible non-strict classification scheme. Classification is performed based on command type, destination address, and QoS requirements. The information used in classification is maskable and programmable. The storage command classification includes optimistically matching command execution orders during the non-strict classification to maximize system throughput. The storage system provides a flow table format that supports both exact command order matching and optimistic command order matching.
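The maskable classification described above can be illustrated with a minimal sketch. The field widths, the `FlowRule` layout, and the first-match lookup are assumptions made for illustration only; the description does not fix a flow table encoding, and the `exact_order` flag is simply carried along to show where the exact versus optimistic matching mode could be recorded.

```python
from dataclasses import dataclass
from typing import List, Optional

READ, WRITE = 0, 1  # hypothetical command type codes


@dataclass(frozen=True)
class FlowRule:
    """One hypothetical flow table entry: value/mask pairs plus a match mode."""
    cmd_type: int
    cmd_type_mask: int
    lba: int
    lba_mask: int
    qos: int
    qos_mask: int
    exact_order: bool   # True: exact order matching, False: optimistic matching
    target_queue: int


def matches(rule: FlowRule, cmd_type: int, lba: int, qos: int) -> bool:
    """A field takes part in the comparison only where its mask bits are set."""
    return ((cmd_type & rule.cmd_type_mask) == (rule.cmd_type & rule.cmd_type_mask)
            and (lba & rule.lba_mask) == (rule.lba & rule.lba_mask)
            and (qos & rule.qos_mask) == (rule.qos & rule.qos_mask))


def classify(rules: List[FlowRule], cmd_type: int, lba: int, qos: int) -> Optional[FlowRule]:
    """Return the first rule that matches, or None when no rule applies."""
    for rule in rules:
        if matches(rule, cmd_type, lba, qos):
            return rule
    return None


rules = [
    # Writes whose LBA bits 12-15 are zero, any QoS (QoS mask is 0).
    FlowRule(WRITE, 0xFF, 0x0000, 0xF000, 0, 0x00, True, 0),
    # Any command type and any LBA with QoS class 2.
    FlowRule(0, 0x00, 0, 0x0000, 2, 0xFF, False, 1),
]
```

Because every field carries its own mask, reprogramming a mask is enough to widen or narrow a rule without changing the lookup logic.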
Referring to
Referring to
Referring to
The RX command queues 221 receive SATA or NVMe commands. Storage commands received by the module are sent to the command parser 223.
The command parser 223 classifies the RX commands based on the type of command, the LBA of the target media, and the requirements of QoS. The command parser also terminates commands that are not related to the media read and write.
The command generator 224 generates the TX commands based on the requests from either the command parser 223 or the media processor 230. The generated commands are posted to the TX command queue 222 based on the tag and type of the corresponding RX command.
The command scheduler module 225 includes a strict priority (SP) scheduler module, a weighted round robin (WRR) scheduler module, and a round robin (RR) scheduler module. The scheduler module serves the storage interface units within the storage interface subsystem 110 in either a WRR scheme or an RR scheme. Commands coming from the same BIU are served based on the command type and target LBA. NCQ commands are served strictly based on the availability of the target channel processor. When multiple channel processors are available, they are served in an RR scheme. Non-NCQ commands are served in FIFO order, depending on the availability of the target channel processor.
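As an illustration of the WRR and RR service of the interface units, the following is a minimal sketch. The queue names, the weights, and the generator interface are hypothetical; the strict priority stage and the checks for channel processor availability are omitted.

```python
from collections import deque
from typing import Deque, Dict, Iterator, Tuple


def weighted_round_robin(queues: Dict[str, Deque[str]],
                         weights: Dict[str, int]) -> Iterator[Tuple[str, str]]:
    """Yield (unit, command) pairs, serving up to `weight` commands per unit per round.

    With every weight equal to 1 this degenerates to plain round robin.
    """
    while any(queues.values()):          # an empty deque is falsy
        for unit, queue in queues.items():
            for _ in range(weights.get(unit, 1)):
                if not queue:
                    break
                yield unit, queue.popleft()


def demo_order() -> list:
    """Serve two hypothetical interface units with weights 2 and 1."""
    queues = {
        "biu0": deque(["a1", "a2", "a3"]),
        "biu1": deque(["b1", "b2"]),
    }
    return [cmd for _, cmd in weighted_round_robin(queues, {"biu0": 2, "biu1": 1})]
```

In this sketch a unit with an empty queue simply forfeits its share of the round, so no service slot is wasted on idle units.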
Referring to
The media processor 230 includes a Microprocessor module 231, Virtual Zone Table module 232, a Physical Zone Table 234, a Channel Address Lookup Table 235, a DMA Manager module 233, and a Queue Manager module 236.
The Microprocessor module 231 includes one or more microprocessor cores. The module may operate as a large simultaneous multiprocessing (SMP) system with multiple partitions. One way to partition the system is based on the Virtual Zone Table. One thread or one microprocessor core is assigned to manage a portion of the Virtual Zone Table. Another way to partition the system is based on the index of the channel processor. One thread or one microprocessor core is assigned to manage one or more channel processors.
The Virtual Zone Table module 232 is indexed by host logical block address (LBA). It stores entries that describe the attributes of every virtual strip in the zone. One of the attributes is the host access permission, which can restrict a host to accessing only a portion of the system (host zoning). The other attributes include CacheIndex, which is the cache memory address for the strip if it can be found in cache; CacheState, which indicates whether the virtual strip is in the cache; CacheDirty, which indicates which modules' cache content is inconsistent with flash; and FlashDirty, which indicates which modules in flash have been written. All the cache related attributes are managed by the Queue Manager module 236.
The Physical Zone Table module 234 stores the entries of physical NVM blocks. Each entry also records the total lifetime flash write count of the block and where to find a replacement block in case the block goes bad. The table also has entries that indicate the corresponding LBA in the Virtual Zone Table.
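A Virtual Zone Table entry of this kind might be modeled as follows. This is a sketch under stated assumptions: the strip size `STRIP_LBAS`, the use of per-module bitmaps for CacheDirty and FlashDirty, and the helper names are all hypothetical and are not taken from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

STRIP_LBAS = 8  # hypothetical number of LBAs covered by one virtual strip


@dataclass
class VirtualStripEntry:
    allowed_hosts: Set[int] = field(default_factory=set)  # host access permission
    cache_index: Optional[int] = None  # CacheIndex: cache address, if cached
    cache_state: bool = False          # CacheState: strip is present in cache
    cache_dirty: int = 0               # CacheDirty: bitmap of modules newer in cache
    flash_dirty: int = 0               # FlashDirty: bitmap of modules written in flash


def lookup(table: Dict[int, VirtualStripEntry], lba: int) -> VirtualStripEntry:
    """Index the table by host LBA (one entry per strip)."""
    return table[lba // STRIP_LBAS]


def host_may_access(table: Dict[int, VirtualStripEntry], host_id: int, lba: int) -> bool:
    """Host zoning: a host only reaches strips that list it as allowed."""
    return host_id in lookup(table, lba).allowed_hosts


def cache_write(entry: VirtualStripEntry, module: int, cache_index: int) -> None:
    """Record that a module's data now lives in cache and differs from flash."""
    entry.cache_index = cache_index
    entry.cache_state = True
    entry.cache_dirty |= 1 << module


def flush_module(entry: VirtualStripEntry, module: int) -> None:
    """After write-back, the module is clean in cache and marked written in flash."""
    entry.cache_dirty &= ~(1 << module)
    entry.flash_dirty |= 1 << module
```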
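The bookkeeping held in the Physical Zone Table can be sketched as below. The entry layout and the chained replacement lookup are assumptions for illustration; the description only states that an entry records the write count, a replacement block, and the corresponding LBA.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PhysicalBlockEntry:
    write_count: int = 0                # total lifetime flash writes to this block
    is_bad: bool = False                # block has been retired
    replacement: Optional[int] = None   # where to look if this block went bad
    lba: Optional[int] = None           # corresponding LBA in the Virtual Zone Table


def resolve_block(table: Dict[int, PhysicalBlockEntry], block: int) -> int:
    """Follow replacement links until a usable block is found."""
    seen = set()
    while table[block].is_bad:
        nxt = table[block].replacement
        if nxt is None or block in seen:
            raise LookupError(f"no usable replacement for block {block}")
        seen.add(block)
        block = nxt
    return block


def record_write(table: Dict[int, PhysicalBlockEntry], block: int) -> int:
    """Count a write against the block that actually receives it."""
    target = resolve_block(table, block)
    table[target].write_count += 1
    return target
```

Counting writes against the resolved block rather than the requested one keeps the lifetime count meaningful after a block has been retired.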
Referring to
The channel processor 240 also supports data randomization using randomizer 243 and de-randomization using de-randomizer 244. The module performs CRC check on both receive and transmit data paths via the ECC encoder 241 and ECC decoder 242, respectively. The module controls the NVM interface timing, and access command sequences via the NVM interface controller 245.
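The description does not specify the algorithm used by the randomizer 243 and de-randomizer 244. As an assumption, the sketch below uses a common approach: the data is XORed with a keystream from a 16-bit linear feedback shift register, so that applying the same operation with the same seed restores the original data. The polynomial and seed are hypothetical choices.

```python
def lfsr_keystream(seed: int, nbytes: int) -> bytes:
    """Generate nbytes of keystream from a 16-bit Fibonacci LFSR.

    Taps correspond to x^16 + x^14 + x^13 + x^11 + 1 (an illustrative choice).
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("LFSR seed must be nonzero")
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for _bit in range(8):
            feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
            state = (state >> 1) | (feedback << 15)
            byte = (byte << 1) | (state & 1)
        out.append(byte)
    return bytes(out)


def randomize(data: bytes, seed: int) -> bytes:
    """XOR the data with the keystream to break up long runs of identical bits."""
    keystream = lfsr_keystream(seed, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


# XOR with the same keystream is its own inverse.
derandomize = randomize
```

In a design like this, the seed would typically be derived from something both paths already know, such as the page address, so that no extra state needs to be stored with the data.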
Referring to
The NVM system 510 includes a plurality of NVM modules (510a, 510b, . . . , 510n). Each NVM module includes a plurality of nonvolatile memory dies or chips. The NVM may be one of a Flash Memory, Phase Change Memory (PCM), Ovonic Universal Memory (OUM), and Magnetoresistive RAM (MRAM). Each NVM module may be in the form factor of a DIMM.
Referring to
Referring to
The complete set of registers exposed by an AHCI Host Bus Adapter (HBA) interface are described in the SATA AHCI specification, and not duplicated here. Some key registers are;
AHCI implements the concept of ports. A port is a portal through which a SATA attached device has its interface exposed to the host, allowing the host direct or indirect access depending on the operational mode of the AHCI HBA. Each port has an associated set of registers that are duplicated across all ports. Up to a maximum of 32 ports may be implemented. Port registers provide the low level mechanisms through which the host accesses attached SATA devices. Port registers contain primarily either address descriptors or attached SATA device status. In this invention, all the PHY layer, link layer, and transport layer logic of the HBA and SATA ports has been removed to shorten the system access time to the SSD. Each NVM module in 510 can be optionally configured as a SATA device attached to the AHCI controller.
As shown in
Issuance of a command to the SSD system 100 is a matter of constructing the command, staging it within an area of the DRAM module 410, and then notifying the AHCI controller 110 that it has commands staged and ready to be sent to the storage controller 210. The memory for each port's Command List is allocated statically because the AHCI registers must be initialized with the base address of the Command List. The data transfer related commands may have a Physical Region Descriptor (PRD) table, which is a data structure used by DMA engines to describe memory regions for transferring data to and from the SSD 100. Each PRD is an entry in a scatter/gather list. Since the DMA engine inside the storage controller 210 of the SSD cannot directly access system memory other than the DRAM module 410, the system memory associated with the PRD table must be allocated inside the DRAM module 410 address space.
Command completion is provided through mechanisms and constructs that are built on the SATA protocols. On command completion the storage controller 210 returns a Device-to-Host Frame Information Structure (FIS). Additional FIS types may play a role in command completion depending on the type of command that was issued and how it was relayed to the SSD 100. Regardless of the FIS types used, the purpose of the completion FIS is to communicate command completion status as well as to update overall device status. The return status FIS is contained within a table, based in the DRAM module 410, termed the Received FIS Structure. At the time the host initializes the AHCI controller inside BIU 110, it allocates host memory space inside the DRAM module 410 for the purpose of accepting received device FIS information. Each port of an adaptor has its own area of host memory reserved for this purpose.
Notification of command completion can be via interrupt or polling. The AHCI controller inside BIU 110 may be configured to generate an interrupt on command completion, or the host may choose to poll the port's Command Issue register and, if the command is an NCQ command, the Serial ATA Active register. If the host chooses to be notified of command completion via interrupts, then on interruption the host has to read the contents of three, possibly four, controller registers. The host reads the AHCI controller's interrupt status register to determine which port has caused the interrupt, reads the port interrupt status register to discover the reason for the interrupt, reads the port's Command Issue register to determine which command has completed and finally, if the command is an NCQ command, reads the port's Serial ATA Active register to determine the TAG for the queued command. A new pin or the EVENT# pin on the DIMM may be used to generate an interrupt to the system.
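The constraint that PRD buffers must lie inside the DRAM module 410 address space can be expressed as a simple range check. The sketch below is illustrative only: `PrdEntry`, the window base, and the window size are hypothetical, and a real PRD entry carries additional fields defined by the AHCI specification.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PrdEntry:
    """Simplified scatter/gather entry: a physical base address and a length."""
    base: int
    byte_count: int


def prd_table_in_window(prd_table: Iterable[PrdEntry],
                        window_base: int, window_size: int) -> bool:
    """True if every described region lies fully inside the given address window."""
    window_end = window_base + window_size
    return all(
        entry.byte_count > 0
        and window_base <= entry.base
        and entry.base + entry.byte_count <= window_end
        for entry in prd_table
    )


# Hypothetical placement of the DRAM module's window in the physical address map.
DRAM_410_BASE = 0x1_0000_0000
DRAM_410_SIZE = 1 << 30  # 1 GiB, for illustration
```

A driver following this scheme could run such a check before posting a command, and fall back to a bounce buffer inside the window when the check fails; that fallback is a design option, not something the description prescribes.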
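The register reads performed on an interrupt can be sketched against simulated registers. In the sketch, the register values are plain integers held in a hypothetical `PortRegs` structure, the host is assumed to remember which slots and tags it issued, and the writes needed to clear interrupt status are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PortRegs:
    interrupt_status: int  # port interrupt status register
    command_issue: int     # port Command Issue register (bit set = still outstanding)
    sata_active: int       # port Serial ATA Active register (bit set = tag outstanding)


def set_bits(mask: int) -> List[int]:
    """Indices of the set bits in a 32-bit mask."""
    return [i for i in range(32) if (mask >> i) & 1]


def service_interrupt(hba_interrupt_status: int,
                      ports: Dict[int, PortRegs],
                      issued: Dict[int, int],
                      ncq_issued: Dict[int, int]) -> Dict[int, Tuple[int, List[int], List[int]]]:
    """Return {port: (cause, completed command slots, completed NCQ tags)}."""
    result = {}
    for port, regs in ports.items():
        # 1. Controller interrupt status: which port raised the interrupt?
        if not (hba_interrupt_status >> port) & 1:
            continue
        # 2. Port interrupt status: why?
        cause = regs.interrupt_status
        # 3. Command Issue: slots the host issued that are no longer outstanding.
        done_slots = issued.get(port, 0) & ~regs.command_issue
        # 4. Serial ATA Active: only consulted when NCQ commands were queued.
        queued = ncq_issued.get(port, 0)
        done_tags = queued & ~regs.sata_active if queued else 0
        result[port] = (cause, set_bits(done_slots), set_bits(done_tags))
    return result
```

The fourth read is skipped when no NCQ commands are outstanding on the port, which matches the "three, possibly four" register reads described above.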
Referring to
The most significant difference between AHCI and NVMe is in the performance goals of the two interfaces. NVMe was architected from the ground up to provide the most bandwidth and lowest latency possible with today's systems and devices. While performance was important to AHCI, it was in the context of SATA HDDs, which do not place the same demands on the surrounding infrastructure and support matrix as PCIe SSDs. The main differences between the two interfaces are listed as follows:
NVMe as an interface to devices that have extremely low latency and high bandwidth characteristics has endeavored to enable the full benefit of the device to be realized by the system in which they are used. Efficiency in the transfer of commands and status was made a top priority in the interface design. Parallelism in the interface was also a priority so that the highly parallel systems of today could take full advantage of multiple concurrent IO paths all the way down to the device itself. Add a system memory controller 720 and a CPU core 710 to the storage system as shown in
Referring to
The present invention is well adapted to attain the advantages mentioned as well as others inherent therein. While the present invention has been depicted, described, and is defined by reference to particular embodiments of the invention, such references do not imply a limitation on the invention, and no such limitation is to be inferred. The invention is capable of considerable modification, alteration, and equivalents in form and function, as will occur to those ordinarily skilled in the pertinent arts. The depicted and described embodiments are examples only, and are not exhaustive of the scope of the invention.
For example, while particular architectures are set forth with respect to the SSD system and the SSD host interface unit, it will be appreciated that variations within these architectures are within the scope of the present invention. Also, while particular storage command flow descriptions are set forth, it will be appreciated that variations within the storage command flow are within the scope of the present invention.
Also for example, the above-discussed embodiments include modules and units that perform certain tasks. The modules and units discussed herein may include hardware modules or software modules. The hardware modules may be implemented within custom circuitry or via some form of programmable logic device. The software modules may include script, batch, or other executable files. Thus, the modules may be stored within a computer system memory to configure the computer system to perform the functions of the module. Other new and various types of computer-readable storage media may be used to store the modules discussed herein. Additionally, those skilled in the art will recognize that the separation of functionality into modules and units is for illustrative purposes. Alternative embodiments may merge the functionality of multiple modules or units into a single module or unit or may impose an alternate decomposition of functionality of modules or units. For example, a software module for calling sub-modules may be decomposed so that each sub-module performs its function and passes control directly to another sub-module.
Consequently, the invention is intended to be limited only by the spirit and scope of the appended claims, giving full cognizance to equivalents in all respects.
The present application is a continuation-in-part of U.S. patent application Ser. No. 11/953,080, filed on Dec. 10, 2007, which claims the benefit of U.S. Provisional Application No. 60/875,316 entitled “Nonvolatile memory (NVM) based solid-state disk (SSD) system for scaling and quality of service (QoS) by parallelizing command execution” filed Dec. 18, 2006, which is herein incorporated by reference.
| | Number | Date | Country |
|---|---|---|---|
| Parent | 11953080 | Dec 2007 | US |
| Child | 13629642 | | US |