In one embodiment, a multi-function non-volatile memory express (NVMe) subsystem is provided. The multi-function NVMe subsystem includes a plurality of primary controllers with each primary controller of the plurality of primary controllers being pre-allocated with a predetermined number of queue resources. A first primary controller of the plurality of primary controllers is configured to, after initialization, identify a first number of queue resources to be utilized by the first primary controller, and to request fewer queue resources than the predetermined number of queue resources allocated to the first primary controller when the first number of queue resources is less than the predetermined number of queue resources pre-allocated to the first primary controller. The first primary controller is further configured to reallocate any remaining queue resources pre-allocated to the first primary controller to a global queue resource pool for utilization by a different primary controller of the plurality of primary controllers.
In another embodiment, a method of managing queue resources in a multi-function NVMe subsystem is provided. The method includes pre-allocating a predetermined number of queue resources to each primary controller of a plurality of primary controllers of the multi-function NVMe subsystem. The method also includes identifying a first number of queue resources to be utilized by a first primary controller of the plurality of primary controllers. The method further includes requesting fewer queue resources than the predetermined number of queue resources allocated to the first primary controller when the first number of queue resources is less than the predetermined number of queue resources pre-allocated to the first primary controller, and reallocating any remaining queue resources pre-allocated to the first primary controller to a global queue resource pool for utilization by a different primary controller of the plurality of primary controllers.
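By way of a non-limiting illustration, the method above may be sketched as follows. The class name `QueueResourcePool`, its methods, the controller identifiers, and the counts used are hypothetical and are chosen only to show the steps of pre-allocating, requesting fewer queue resources, and returning the remainder to a global pool. The sketch does not describe any particular firmware implementation.

```python
class QueueResourcePool:
    """Hypothetical model of per-controller pre-allocation plus a global pool."""

    def __init__(self, controller_ids, preallocated_per_controller):
        # Every primary controller starts with the same predetermined share.
        self.allocated = {cid: preallocated_per_controller for cid in controller_ids}
        self.global_pool = 0

    def trim(self, controller_id, needed):
        """Keep only `needed` queue resources; return the remainder to the pool."""
        preallocated = self.allocated[controller_id]
        if needed < preallocated:
            self.allocated[controller_id] = needed
            self.global_pool += preallocated - needed
        return self.allocated[controller_id]

    def borrow(self, controller_id, extra):
        """Grant up to `extra` queue resources from the global pool."""
        granted = min(extra, self.global_pool)
        self.global_pool -= granted
        self.allocated[controller_id] += granted
        return granted


pool = QueueResourcePool(["ctrl_a", "ctrl_b", "ctrl_c"], preallocated_per_controller=8)
pool.trim("ctrl_a", needed=1)             # ctrl_a needs only one queue resource
granted = pool.borrow("ctrl_b", extra=5)  # ctrl_b draws on the freed resources
```

In this sketch, the total number of queue resources is conserved: resources released by one primary controller become available to a different primary controller only through the global pool.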
In yet another embodiment, a multi-function NVMe subsystem is provided. The multi-function NVMe subsystem includes a plurality of primary controllers, and a plurality of queue resources. The multi-function NVMe subsystem also includes a plurality of policies with each different policy of the plurality of policies differently dictating how the plurality of queue resources is divided amongst different primary controllers of the plurality of primary controllers.
This summary is not intended to describe each disclosed embodiment or every implementation of the NVMe policy-based input/output (I/O) queue allocation described herein. Many other novel advantages, features, and relationships will become apparent as this description proceeds. The figures and the description that follow more particularly exemplify illustrative embodiments.
Embodiments of the disclosure generally relate to queue resource management in non-volatile memory (NVM) subsystems, which utilize an NVM Express (NVMe) interface to enable host software to communicate with the NVM subsystem. An NVM subsystem that employs the NVMe interface is hereinafter referred to as an NVMe subsystem. The NVMe subsystem may include a single data storage device (e.g., a single solid state drive (SSD)) or a plurality of data storage devices.
In general, prior NVMe SSD designs statically divide available queue resources across controllers within the NVMe subsystem. This works in some customer use-cases, but lacks flexibility for more complex customer models (such as special controller models that include administrative controllers).
Embodiments of the disclosure provide for flexible queue resource management in multi-function NVMe subsystems. A function, or peripheral component interconnect (PCI) function, represents an endpoint in a PCI device. A host attaches a driver to the function, and the function exposes a protocol based upon the type of function (storage device, network device, display device, etc.). There may also be multiple functions that each expose the same protocol (such as a mass storage device using the NVMe protocol). A PCI function represents a single “controller” within the NVMe subsystem. In NVMe subsystems with multiple functions, each function has its own primary controller, and different numbers of queue resources may be suitable for the different primary controllers. In other words, there may be some asymmetry among different primary controller types. For example, an administrative controller or discovery controller may employ only one administrative queue resource, whereas input/output (I/O) controllers may generally employ one queue resource per central processing unit (CPU) core of the host system to which they are attached, plus an administrative queue resource. It should be noted that, in some embodiments, all primary controllers of the NVMe subsystem may be I/O controllers, but different I/O controllers may employ different numbers of queue resources.
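The asymmetry described above may be illustrated with a short sketch. The function name `queues_needed` and the controller type strings are hypothetical, and the counts follow only the general example given above (one administrative queue resource for an administrative or discovery controller, and one queue resource per host CPU core plus an administrative queue resource for an I/O controller).

```python
def queues_needed(controller_type, host_cpu_cores=0):
    """Return (admin_queues, io_queues) for a hypothetical controller type."""
    if controller_type in ("administrative", "discovery"):
        # Only a single administrative queue resource is employed.
        return (1, 0)
    if controller_type == "io":
        # One I/O queue resource per host CPU core, plus the administrative queue.
        return (1, host_cpu_cores)
    raise ValueError(f"unknown controller type: {controller_type}")


admin_need = queues_needed("administrative")
io_need = queues_needed("io", host_cpu_cores=16)
```

Under these assumptions, an I/O controller attached to a 16-core host would employ seventeen queue resources in total, while an administrative controller would employ one.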
Embodiments of the disclosure modify the manner in which queue resources are allocated to a given controller through a policy, which is described further below. As indicated above, past designs have implemented the static allocation policy, but richer policies are also provided herein to tailor the behavior of the controller for queue resource allocation. This policy-based approach is compatible with existing mechanisms defined in a current NVMe specification.
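As a hedged illustration of the policy-based approach, a policy may be thought of as a rule that maps the total number of queue resources to a per-controller share. In the sketch below, `static_policy` mirrors the static division attributed above to past designs, while `weighted_policy` is a purely hypothetical example of a richer policy and is not drawn from the disclosure. All names and numbers are assumptions.

```python
def static_policy(total_queues, controllers):
    """Divide queue resources evenly; any remainder stays unallocated."""
    share = total_queues // len(controllers)
    return {cid: share for cid in controllers}


def weighted_policy(total_queues, weights):
    """Hypothetical policy: divide queue resources in proportion to weights."""
    weight_sum = sum(weights.values())
    return {cid: total_queues * w // weight_sum for cid, w in weights.items()}


even = static_policy(96, ["ctrl_a", "ctrl_b", "ctrl_c"])
skewed = weighted_policy(96, {"ctrl_a": 1, "ctrl_b": 1, "ctrl_c": 6})
```

Both functions return the same kind of mapping, so a subsystem modeled this way could switch policies without changing the code that consumes the result.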
It should be noted that like reference numerals are used in different figures for same or similar elements. It should also be understood that the terminology used herein is for the purpose of describing embodiments, and the terminology is not intended to be limiting. Unless indicated otherwise, ordinal numbers (e.g., first, second, third, etc.) are used to distinguish or identify different elements or steps in a group of elements or steps, and do not supply a serial or numerical limitation on the elements or steps of the embodiments thereof. For example, “first,” “second,” and “third” elements or steps need not necessarily appear in that order, and the embodiments thereof need not necessarily be limited to three elements or steps. It should also be understood that, unless indicated otherwise, any labels such as “left,” “right,” “front,” “back,” “top,” “bottom,” “forward,” “reverse,” “clockwise,” “counter clockwise,” “up,” “down,” or other similar terms such as “upper,” “lower,” “aft,” “fore,” “vertical,” “horizontal,” “proximal,” “distal,” “intermediate” and the like are used for convenience and are not intended to imply, for example, any particular fixed location, orientation, or direction. Instead, such labels are used to reflect, for example, relative location, orientation, or directions. It should also be understood that the singular forms of “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
It will be understood that, when an element is referred to as being “connected,” “coupled,” or “attached” to another element, it can be directly connected, coupled or attached to the other element, or it can be indirectly connected, coupled, or attached to the other element where intervening or intermediate elements may be present. In contrast, if an element is referred to as being “directly connected,” “directly coupled” or “directly attached” to another element, there are no intervening elements present. Drawings illustrating direct connections, couplings or attachments between elements also include embodiments, in which the elements are indirectly connected, coupled or attached to each other.
As indicated above, NVMe subsystem 104 may include a single data storage device (e.g., a single solid state drive (SSD)) or a plurality of data storage devices. In the embodiment of
In the embodiment of
1) A default policy, which can be changed later in the field, may be set within the NVMe subsystem 104 before shipping.
2) The policy may be selected through a command from the host 102. The selection is enacted once the NVMe subsystem 104 is reset.
3) The policy may be selected through a side-band management channel (such as System Management Bus (SMBus) or PCI Vendor Defined Message (VDM) using the Management Component Transport Protocol (MCTP)).
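The selection options listed above may be modeled, in simplified form, as an active policy plus a pending selection that takes effect at reset. The class `PolicySelector` and the policy names are hypothetical. The sketch also assumes that a selection received over a side-band channel is deferred until reset in the same way as a host command, which the description above states only for the host command.

```python
class PolicySelector:
    """Hypothetical model of default, host-selected, and side-band-selected policies."""

    def __init__(self, factory_default):
        self.active = factory_default  # default policy set before shipping
        self.pending = None

    def select(self, policy_name):
        """Record a selection; it is not enacted until the next reset."""
        self.pending = policy_name

    def reset(self):
        """Enact any pending selection and return the active policy."""
        if self.pending is not None:
            self.active = self.pending
            self.pending = None
        return self.active


selector = PolicySelector(factory_default="static")
selector.select("weighted")
before_reset = selector.active
after_reset = selector.reset()
```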
Table 1 below shows examples of different policies 118.
In an initial state of NVMe subsystem 104, before the controllers 110A, 110B, 110C are initialized, all of the queue resources are unallocated. Then, once the controllers 110A, 110B, 110C are initialized, based upon the policy 118, the controllers 110A, 110B, 110C receive a number of queues that they can create and advertise to the host 102. In some environments (e.g., a server environment), host 102 may allocate a queue to each CPU 106 core. Processor 120 may be configured to update a table (not shown) in NVMe subsystem 104 that is utilized to track queue resource allocation for each controller 110A, 110B, 110C. It should be noted that the definition of the queues and the arbitration or management of the queues exist within the NVMe subsystem 104, but the memories of the queues themselves (e.g., the memories that store the host 102 commands to be processed by NVMe subsystem 104 and command completion notifications from NVMe subsystem 104) exist in host 102 memory (e.g., in memory 108). A canonical architecture of an NVMe subsystem is briefly described below in connection with
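The transition from the unallocated initial state to a populated tracking table may be sketched as follows. The function `initialize_controllers`, the dictionary layout of the table, and the share values are assumptions made for illustration; the table referred to above is not shown in the figures, and its actual form is not specified. The sketch tracks counts only, consistent with the queue memories themselves residing in host memory.

```python
def initialize_controllers(total_queues, policy_shares):
    """Build a hypothetical tracking table once the controllers initialize.

    policy_shares maps a controller identifier to the number of queues the
    selected policy permits that controller to create and advertise.
    """
    table = {cid: 0 for cid in policy_shares}  # initial state: nothing allocated
    unallocated = total_queues
    for cid, share in policy_shares.items():
        grant = min(share, unallocated)  # never hand out more than remains
        table[cid] = grant
        unallocated -= grant
    return table, unallocated


table, unallocated = initialize_controllers(
    64, {"110A": 32, "110B": 16, "110C": 8}
)
```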
Queues may be allocated through an NVMe get/set feature called “Number of Queues.” It should be noted that this is from the host perspective. The host will allocate queues for use, but internally in the NVMe subsystem the queue resources are allocated to a controller for host allocation. The host may use a “Set Number of Queues” feature to identify how many queues it wants for a given controller 310A, 310B, 310C. The NVMe subsystem 304 responds with the number of queues that are available for that controller 310A, 310B, 310C. As indicated above, in embodiments of the disclosure, the allocation policy defines how many queues a given controller 310A, 310B, 310C may receive.
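The request and response exchange described above may be sketched as a clamp of the host request to the limit set by the allocation policy. The function names and the `policy_limit` parameter are hypothetical. The packing of submission and completion queue counts into a single 32-bit value as zero-based fields follows the Number of Queues feature of the NVMe base specification as generally understood, and should be verified against the specification revision in use.

```python
def encode_queue_counts(submission_queues, completion_queues):
    """Pack zero-based counts: completion in bits 31:16, submission in bits 15:0."""
    return ((completion_queues - 1) << 16) | (submission_queues - 1)


def decode_queue_counts(dword):
    """Unpack a 32-bit value into (submission_queues, completion_queues)."""
    return (dword & 0xFFFF) + 1, ((dword >> 16) & 0xFFFF) + 1


def handle_set_number_of_queues(request_dword, policy_limit):
    """Grant the requested counts, clamped to the policy limit (assumed >= 1)."""
    req_sq, req_cq = decode_queue_counts(request_dword)
    return encode_queue_counts(min(req_sq, policy_limit), min(req_cq, policy_limit))


# Host asks for 32 queue pairs; the policy permits this controller only 8.
response = handle_set_number_of_queues(encode_queue_counts(32, 32), policy_limit=8)
granted = decode_queue_counts(response)
```

In this sketch, the host learns the granted counts from the response and then creates no more queues than were granted.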
The illustrations of the embodiments described herein are intended to provide a general understanding of the structure of the various embodiments. The illustrations are not intended to serve as a complete description of all of the elements and features of apparatus and systems that utilize the structures or methods described herein. Many other embodiments may be apparent to those of skill in the art upon reviewing the disclosure. Other embodiments may be utilized and derived from the disclosure, such that structural and logical substitutions and changes may be made without departing from the scope of the disclosure. Additionally, the illustrations are merely representational and may not be drawn to scale. Certain proportions within the illustrations may be exaggerated, while other proportions may be reduced. Accordingly, the disclosure and the figures are to be regarded as illustrative rather than restrictive.
One or more embodiments of the disclosure may be referred to herein, individually and/or collectively, by the term “invention” merely for convenience and without intending to limit the scope of this application to any particular invention or inventive concept. Moreover, although specific embodiments have been illustrated and described herein, it should be appreciated that any subsequent arrangement designed to achieve the same or similar purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all subsequent adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the description.
The Abstract of the Disclosure is provided to comply with 37 C.F.R. § 1.72 (b) and is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, various features may be grouped together or described in a single embodiment for the purpose of streamlining the disclosure. This disclosure is not to be interpreted as reflecting an intention that the claimed embodiments employ more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter may be directed to less than all of the features of any of the disclosed embodiments.
The above-disclosed subject matter is to be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other embodiments, which fall within the true spirit and scope of the present disclosure. Thus, to the maximum extent allowed by law, the scope of the present disclosure is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description.