This invention relates to systems and methods for managing interrupts from a peripheral device to a host system.
In a typical computing architecture, a storage device receives a command, executes the command, and sends an interrupt signal to a host with a completion queue entry to notify the host of completion of the command. In some approaches, interrupts are coalesced according to one or both of an aggregation time and an aggregation threshold. In the aggregation time approach, interrupts are aggregated for a threshold period of time before an interrupt is issued to the host system. In the aggregation threshold approach, interrupts are aggregated until a threshold number is reached before an interrupt is issued to the host system.
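By way of illustration only, the two conventional coalescing conditions described above may be sketched as follows. The function name, parameter names, and microsecond units are assumptions made for this sketch and do not appear in the description.

```python
def conventional_coalescing_fires(pending, oldest_age_us, agg_threshold, agg_time_us):
    """Return True when a conventional coalescing scheme would issue an interrupt.

    pending        -- completions aggregated so far (hypothetical counter)
    oldest_age_us  -- age of the oldest aggregated completion, in microseconds
    agg_threshold  -- aggregation threshold (number of completions)
    agg_time_us    -- aggregation time, in microseconds
    """
    if pending == 0:
        return False  # nothing has been aggregated, so there is nothing to signal
    # Fire when either the count condition or the time condition is satisfied.
    return pending >= agg_threshold or oldest_age_us >= agg_time_us
```

A single completion that is still younger than the aggregation time does not fire in this sketch, which is the source of the added latency at low queue depth.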
It would be an advancement in the art to improve the function of a storage device or other peripheral device that implements interrupt coalescing.
In order that the advantages of the invention will be readily understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered limiting of its scope, the invention will be described and explained with additional specificity and detail through use of the accompanying drawings, in which:
It will be readily understood that the components of the present invention, as generally described and illustrated in the Figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the invention, as represented in the Figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of certain examples of presently contemplated embodiments in accordance with the invention. The presently described embodiments will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout.
The invention has been developed in response to the present state of the art and, in particular, in response to the problems and needs in the art that have not yet been fully solved by currently available apparatus and methods.
Embodiments in accordance with the present invention may be embodied as an apparatus, method, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.
Any combination of one or more computer-usable or computer-readable media may be utilized. For example, a computer-readable medium may include one or more of a portable computer diskette, a hard disk, a random access memory (RAM) device, a read-only memory (ROM) device, an erasable programmable read-only memory (EPROM or flash memory) device, a portable compact disc read-only memory (CDROM), an optical storage device, and a magnetic storage device. In selected embodiments, a computer-readable medium may comprise any non-transitory medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++, or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on a computer system as a stand-alone software package, on a stand-alone hardware unit, partly on a remote computer spaced some distance from the computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions or code. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a non-transitory computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Computing device 100 includes one or more processor(s) 102, one or more memory device(s) 104, one or more interface(s) 106, one or more mass storage device(s) 108, one or more Input/Output (I/O) device(s) 110, and a display device 130 all of which are coupled to a bus 112. Processor(s) 102 include one or more processors or controllers that execute instructions stored in memory device(s) 104 and/or mass storage device(s) 108. Processor(s) 102 may also include various types of computer-readable media, such as cache memory.
Memory device(s) 104 include various computer-readable media, such as volatile memory (e.g., random access memory (RAM) 114) and/or nonvolatile memory (e.g., read-only memory (ROM) 116). Memory device(s) 104 may also include rewritable ROM, such as flash memory.
Mass storage device(s) 108 include various computer readable media, such as magnetic tapes, magnetic disks, optical disks, solid-state memory (e.g., flash memory), and so forth. As shown in
I/O device(s) 110 include various devices that allow data and/or other information to be input to or retrieved from computing device 100. Example I/O device(s) 110 include cursor control devices, keyboards, keypads, microphones, monitors or other display devices, speakers, printers, network interface cards, modems, lenses, CCDs or other image capture devices, and the like.
Display device 130 includes any type of device capable of displaying information to one or more users of computing device 100. Examples of display device 130 include a monitor, display terminal, video projection device, and the like.
Interface(s) 106 include various interfaces that allow computing device 100 to interact with other systems, devices, or computing environments. Example interface(s) 106 include any number of different network interfaces 120, such as interfaces to local area networks (LANs), wide area networks (WANs), wireless networks, and the Internet. Other interface(s) include user interface 118 and peripheral device interface 122. The interface(s) 106 may also include one or more user interface elements 118. The interface(s) 106 may also include one or more peripheral interfaces such as interfaces for printers, pointing devices (mice, track pad, etc.), keyboards, and the like.
Bus 112 allows processor(s) 102, memory device(s) 104, interface(s) 106, mass storage device(s) 108, and I/O device(s) 110 to communicate with one another, as well as other devices or components coupled to bus 112. Bus 112 represents one or more of several types of bus structures, such as a system bus, PCI bus, IEEE 1394 bus, USB bus, and so forth.
For purposes of illustration, programs and other executable program components are shown herein as discrete blocks, although it is understood that such programs and components may reside at various times in different storage components of computing device 100, and are executed by processor(s) 102. Alternatively, the systems and procedures described herein can be implemented in hardware, or a combination of hardware, software, and/or firmware. For example, one or more application specific integrated circuits (ASICs) can be programmed to carry out one or more of the systems and procedures described herein.
Referring to
The methods described below may be performed by the host, e.g. the host interface 208 alone or in combination with the SSD controller 206. The methods described below may be used in a flash storage system 200 or any other type of non-volatile storage device. The methods described herein may be executed by any component in such a storage device or be performed completely or partially by a host processor coupled to the storage device.
The host 300 may implement a command queue 304, a completion queue 306, and an interrupt handler 308. The command queue 304 stores commands that are fetched and executed by the storage device 302. The commands may be executed using one or both of the Flash Translation Layer (FTL) and Flash Interface Layer (FIL) of the storage device 302. The completion queue 306 stores outcomes from execution of the commands by the storage device 302. In some embodiments, multiple command queues 304 (also referred to as submission queues 304 in some contexts) may be mapped by the host 300 to the same completion queue 306. The command queue 304 and completion queue 306 may be implemented in hardware or firmware.
The interrupt handler 308 receives interrupts from the storage device 302 and performs functions corresponding to each interrupt. For example, the interrupt handler 308 may define a plurality of interrupts or interrupt vectors and perform a function corresponding to each interrupt when that interrupt is set by the storage device 302. For example, where a command is a read operation, the completion queue 306 may include the data read by the storage device in response to the read operation. Accordingly, the interrupt handler 308 may respond to an interrupt from the storage device 302 by reading the data from the completion queue 306 and returning it to a process that invoked the read operation. The manner in which the interrupt handler 308 implements and processes interrupts may be according to any approach known in the art.
The storage device 302 may include a command fetcher 310 that retrieves commands from the command queue 304 and invokes execution of the commands by a command processor 312. For example, the command processor 312 may read and write data from a storage medium in response to read and write commands, respectively, and return a result of the commands to a completion manager 314. The completion manager 314 places the result of each command (“the completion entry”) in the completion queue 306 and further generates an interrupt to the interrupt handler 308. The interrupt handler 308 will then read the completion entries and remove them from the completion queue 306.
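As a non-limiting illustration, the flow from command fetch to completion handling may be modeled in software as sketched below. The class name HostQueues, the function device_service_one, and the execute callback are hypothetical names introduced for this sketch; an actual device would implement these roles in hardware or firmware.

```python
from collections import deque


class HostQueues:
    """Hypothetical stand-in for command queue 304 and completion queue 306."""

    def __init__(self):
        self.command_queue = deque()
        self.completion_queue = deque()

    def submit(self, command):
        """Host side: place a command in the command queue."""
        self.command_queue.append(command)

    def handle_interrupt(self):
        """Role of interrupt handler 308: read and remove all completion entries."""
        entries = list(self.completion_queue)
        self.completion_queue.clear()
        return entries


def device_service_one(host, execute):
    """Fetch one command, execute it, and post its completion entry.

    Returns True when a completion entry was posted, i.e. when the
    uncoalesced flow would generate an interrupt to the host.
    """
    if not host.command_queue:
        return False
    command = host.command_queue.popleft()            # role of command fetcher 310
    result = execute(command)                         # role of command processor 312
    host.completion_queue.append((command, result))   # role of completion manager 314
    return True
```

In this sketch every completion produces an interrupt, which is the baseline behavior that interrupt coalescing modifies.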
The storage device 302 may be embodied as a Non-Volatile Memory Express (NVMe) device and the host 300 may define an interface according to the NVMe specification for interacting with an NVMe device.
The approach of
In the approach of
The method 600 may include evaluating 602 whether a completion is requested by the command processor 312 for a command. If so, the completion manager 314 sends 604 a completion entry, including a result of the command as received from the command processor 312, to the completion queue 306 of the host 300. The completion manager 314 may further evaluate 606 whether there are commands outstanding in the command queue 304. Step 606 may include evaluating whether there are any commands in the command queue 304 or whether the number of commands in the command queue 304 meets a depth threshold. If the result of step 606 is negative (no commands in the command queue 304 or the threshold condition is not met), then an interrupt is sent 610 to the interrupt handler 308. When step 606 leads directly to step 610 in this manner, coalescing of completions is bypassed and the completion queue 306 is not evaluated with respect to an aggregation threshold or aggregation time. This reduces latency in cases where the command queue depth is low.
For example, if the command queue depth is lower than the aggregation threshold, it is unlikely that the aggregation threshold will be met, and generation of an interrupt would therefore be delayed until the aggregation time (also referred to herein as the aggregation delay) has elapsed. Accordingly, the depth threshold may be selected to be a value that is smaller than the aggregation threshold. Likewise, the storage device may have the ability to process a certain number of commands per unit time (“throughput”). Accordingly, the depth threshold may be selected such that the depth threshold divided by the throughput is less than the aggregation delay.
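The two selection criteria above may be combined as in the following sketch, which computes the largest depth threshold satisfying both. The function name, the integer microsecond and commands-per-second units, and the choice to return None when no positive value qualifies are assumptions made for illustration only.

```python
def max_depth_threshold(agg_threshold, agg_delay_us, throughput_cmds_per_s):
    """Largest positive integer d such that
         d < agg_threshold, and
         d / throughput < aggregation delay,
    or None if no positive integer satisfies both.

    All arguments are assumed to be non-negative integers.
    """
    # Criterion 1: strictly smaller than the aggregation threshold.
    by_count = agg_threshold - 1
    # Criterion 2, kept in integer arithmetic to avoid rounding:
    #   d / throughput < delay   <=>   d * 1_000_000 < delay_us * throughput
    budget = agg_delay_us * throughput_cmds_per_s
    by_time = (budget - 1) // 1_000_000
    d = min(by_count, by_time)
    return d if d >= 1 else None
```

For instance, with an aggregation threshold of 8, an aggregation delay of 100 microseconds, and a throughput of 50,000 commands per second, the sketch returns 4: four commands take 80 microseconds at that rate, whereas five would take the full 100 microseconds.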
In instances where multiple command queues 304 are mapped to the same completion queue 306, step 606 may include evaluating all of these queues. For example, where a single command is sufficient to meet the condition of step 606, a command in any of the command queues 304 mapped to the completion queue will be sufficient to meet the condition of step 606. Where a threshold number of commands is required, when the aggregate number of commands in the command queues meets this threshold number, the condition of step 606 may be deemed to be met.
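The aggregate evaluation of step 606 across several command queues may be sketched as follows. The function name and the representation of each queue by its current depth are assumptions of this sketch; a depth threshold of 1 corresponds to the case in which a single command in any queue suffices.

```python
def outstanding_condition_met(queue_depths, depth_threshold=1):
    """Evaluate the condition of step 606 over all command queues mapped
    to one completion queue.

    queue_depths    -- iterable of the current depth of each mapped command queue
    depth_threshold -- aggregate number of commands required (1 = any command)
    """
    return sum(queue_depths) >= depth_threshold
```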
If the result of step 606 is positive (a command is in the command queue 304 or the threshold condition is met), then step 608 may be executed. Step 608 may include evaluating whether a threshold is met by the completions in the completion queue 306 that have not been read and removed by the interrupt handler 308. In particular, step 608 may include evaluating the completions in the completion queue 306 with respect to one or both of an aggregation threshold and an aggregation time. If the number of completions in the completion queue 306 meets the aggregation threshold, the result of step 608 is positive. If the oldest completion in the completion queue 306 is older than the aggregation time, then the result of step 608 is also positive. In some approaches, the completion queue 306 is only evaluated with respect to one of the aggregation threshold and the aggregation time.
If the result of step 608 is positive, then an interrupt is sent 610 to the host. If the result of step 608 is negative, processing may continue at step 602. In particular, an interrupt is not generated until the result of step 606 is negative or the result of step 608 is positive. Note that where no completion request is received 602 during an iteration of the method 600 but the result of step 608 is positive, then an interrupt is sent 610.
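The decision made by steps 606 through 610 may be summarized by the following sketch. The CoalescingConfig class, its default values, and the function and parameter names are hypothetical; the early return when no completions are pending is likewise an assumption of the sketch rather than a step recited above.

```python
from dataclasses import dataclass


@dataclass
class CoalescingConfig:
    depth_threshold: int = 1   # step 606: outstanding commands needed to coalesce
    agg_threshold: int = 4     # step 608: number of unread completions
    agg_time_us: int = 100     # step 608: age of the oldest unread completion


def should_send_interrupt(outstanding_cmds, unread_completions, oldest_age_us, cfg):
    """Return True when step 610 (send an interrupt) would be reached."""
    if unread_completions == 0:
        return False  # assumption: nothing to report, so no interrupt
    # Step 606: with too few outstanding commands, bypass coalescing entirely.
    if outstanding_cmds < cfg.depth_threshold:
        return True
    # Step 608: otherwise apply the aggregation threshold and aggregation time.
    return (unread_completions >= cfg.agg_threshold
            or oldest_age_us >= cfg.agg_time_us)
```

With the default values shown, a lone completion with an empty command queue is signaled immediately, while the same completion with commands still outstanding waits for either three more completions or the aggregation time.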
Note that a single storage device 302 may have multiple corresponding interrupt vectors implemented by the host 300. In some embodiments, each interrupt vector has its own corresponding command queue 304 and completion queue 306. Accordingly, completion entries for commands corresponding to a particular interrupt vector may be placed in the completion queue 306 for that interrupt vector. Accordingly, the method 600 may be executed for each interrupt vector based on the state of the command queue 304 and completion queue 306 of that interrupt vector.
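Per-vector execution of the method 600 may be sketched as shown below, where each interrupt vector carries its own queue state. The VectorState class, the integer vector identifiers, and the function name are assumptions introduced for illustration, as is the choice to skip vectors with no unread completions.

```python
from dataclasses import dataclass


@dataclass
class VectorState:
    outstanding_cmds: int     # depth of the command queue 304 for this vector
    unread_completions: int   # entries in the completion queue 306 for this vector
    oldest_age_us: int        # age of the oldest unread completion, microseconds


def vectors_to_interrupt(vectors, depth_threshold, agg_threshold, agg_time_us):
    """Return the identifiers of interrupt vectors for which an interrupt is due.

    vectors -- dict mapping a vector identifier to its VectorState
    """
    due = []
    for vector_id, state in vectors.items():
        if state.unread_completions == 0:
            continue  # assumption: nothing to report for this vector
        if state.outstanding_cmds < depth_threshold:
            due.append(vector_id)  # step 606 negative: coalescing bypassed
        elif (state.unread_completions >= agg_threshold
              or state.oldest_age_us >= agg_time_us):
            due.append(vector_id)  # step 608 positive
    return due
```

Because each vector is evaluated against its own queues, a lightly loaded vector can be signaled immediately while a heavily loaded one continues to coalesce.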
The advantage of the approach of
In prior systems, the aggregation delay and aggregation threshold are global to all interrupt vectors. Using the approach described above, the interrupt coalescing for each interrupt vector is managed according to the command queue 304 and completion queue 306 for that interrupt vector, thereby enabling finer control of the coalescing for each interrupt vector.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative, and not restrictive. In particular, although the methods are described with respect to a NAND flash SSD, other SSD devices or non-volatile storage devices such as hard disk drives may also benefit from the methods disclosed herein. The scope of the invention is, therefore, indicated by the appended claims, rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.