Benefit is claimed under 35 U.S.C. 119(a)-(d) to Foreign application Serial No. 175/CHE/2009 entitled “METHOD AND SYSTEM FOR OPTIMIZING NETWORK INPUT/OUTPUT PERFORMANCE” by Hewlett-Packard Development Company, L.P., filed on 27th January, 2009, which is herein incorporated in its entirety by reference for all purposes.
In a storage network comprising multiple hosts and a disk array, a disk driver is a computer program which allows one or more applications accessing a host to interact with a disk in the disk array. The disk driver typically communicates with the disk through a computer bus or communications subsystem to which the disk is connected. When an application invokes one or more input/output (I/O) requests to the driver, the disk driver issues respective commands to the disk. Once the disk sends data back to the disk driver, the disk driver may invoke routines in the original application program.
A queue may be used to deal with multiple I/O requests on the host side as well as on the disk array side. On the host side, multiple I/O requests from one or more applications may wait in a queue coupled to the disk driver until the disk driver is ready to service them. On the disk array side, many I/O requests from one or more hosts may wait in a queue coupled to each disk (or port) of the array so that the I/O requests are executed in order. Here, a maximum queue depth may refer to the maximum number of I/O requests or commands which can be concurrently issued by the disk driver, where not all of the I/O requests may be serviced or responded to by the disk.
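By way of illustration only, the host-side behavior described above may be sketched as follows. The class name `HostQueue`, its methods, and the request identifiers are hypothetical and do not appear in the embodiments; the sketch merely assumes that requests wait in a queue and that no more than a maximum queue depth of them are outstanding at the disk at any one time.

```python
from collections import deque


class HostQueue:
    """Toy host-side queue (hypothetical): at most max_queue_depth
    requests are outstanding at the disk at any one time."""

    def __init__(self, max_queue_depth):
        self.max_queue_depth = max_queue_depth
        self.pending = deque()   # requests waiting in the host-side queue
        self.in_flight = set()   # requests issued to the disk, not yet answered

    def submit(self, request_id):
        """Queue a new request and return the requests issued as a result."""
        self.pending.append(request_id)
        return self._issue()

    def complete(self, request_id):
        """Record a response from the disk and return newly issued requests."""
        self.in_flight.remove(request_id)
        return self._issue()

    def _issue(self):
        # Issue waiting requests in order until the depth limit is reached.
        issued = []
        while self.pending and len(self.in_flight) < self.max_queue_depth:
            request_id = self.pending.popleft()
            self.in_flight.add(request_id)
            issued.append(request_id)
        return issued
```

With a maximum queue depth of two, a third submitted request remains in the queue until one of the first two requests has been answered.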
In the industry, it is common practice to configure the maximum queue depth before a host device, such as a server, is delivered to a customer. However, the configuration may be done without considering the type or capability of the disk device(s) the host is connected to. Accordingly, if the maximum queue depth is too big for the customer's use, it may cause the disk driver to issue too many I/O requests, thus resulting in slowing down or termination of the I/O process. On the other hand, if the maximum queue depth is too small, the disk driver may issue too few I/O requests at a time, thus resulting in lower throughput and inefficient usage of the disk device. Furthermore, when too many I/O requests from multiple hosts are concurrently forwarded to the queue for the same disk of the storage array, a “queue full” condition may occur at the device side of the queue. In such a case, a host may have to retransmit the I/O requests which have been rejected, thus significantly reducing the I/O throughput of the storage network.
Embodiments of the present invention are illustrated by way of an example and not limited to the FIGs. of the accompanying drawings, in which like references indicate similar elements and in which:
Other features of the present embodiments will be apparent from the accompanying drawings and from the detailed description that follows.
A method and system for optimizing network I/O throughput is disclosed. In the following detailed description of the embodiments of the invention, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.
The file system 110, which sits on top of the volume manager 112, stores and organizes computer files and the data so as to make it easy to find and access them. The volume manager 112 may allocate space on the two storage devices 106A and 106B by concatenating, striping together or otherwise combining partitions into larger virtual ones that can be resized or moved while the storage devices 106A and 106B are used. Underneath the layer of the volume manager 112, an aggregate of multiple instances of disk driver (e.g., the disk driver 114) may reside. The disk driver 114 allows one or more applications (not shown in
The multi-pathing layer 117 determines whether there are multiple paths to a particular storage device (e.g., the disk 124A) in terms of multiple hardware adapters (e.g., the HBA 120A, the HBA 120B, etc.), in terms of multiple switches, which may reside in the network 104, or in terms of multiple controllers on the storage device. The multi-pathing layer 117 may aggregate all these paths and provide an optimal path to the storage device. It is appreciated that the multi-pathing layer 117 can be embedded or included within the layer of the disk driver 114. Below the multi-pathing layer 117, an interface driver (e.g., the HBA driver 118) manages an interface card (e.g., the HBA 120A or the HBA 120B, which could be operating at different data transfer speeds).
Then, the HBA 120A and the HBA 120B are used to connect the host 102A to the network 104 (e.g., switches) via cables. Then, cables run between the network 104 and the disk array 106A and the storage device(s) 106B. It is appreciated that the network 104 can be any one of a storage area network (SAN), a network attached storage (NAS), and a direct attached storage (DAS). As illustrated in
In one embodiment, the disk driver 114 may comprise the adaptive queue module 116 which adjusts a maximum queue depth (which is not shown in
It is appreciated that the adaptive queue module 116 illustrated in
In order to rectify the slow down of I/O performance by the device, the adaptive queue module 116 of
Once the disk 124A processes the I/O request 306, it may forward a response 310 back to the disk driver 114 (e.g., with accompanying data). In one embodiment, a service time 312 consumed by the disk 124A for responding to the I/O request 306 may be monitored (e.g., measured). Once the service time 312 is obtained, it may be compared with an expected service time 316 for a typical I/O request which is similar in size to the I/O request 306 and which accesses the same type of device as the disk 124A, in order to determine an I/O performance status 328 between the disk driver 114 and the disk 124A. In one embodiment, the I/O performance status 328 is used to adjust the maximum queue depth for the disk 124A and/or to control the number of I/O requests that can be concurrently issued by the disk driver 114.
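As a minimal sketch of how a service time might be monitored, the elapsed time between issuing a request and receiving its response can be taken from a monotonic clock. The function name `timed_request` and the injectable `clock` parameter are assumptions made for illustration; an actual disk driver would measure the interval inside the driver rather than in a helper of this kind.

```python
import time


def timed_request(issue_request, clock=time.monotonic):
    """Issue one request and return (response, service_time_in_seconds).

    issue_request is a hypothetical callable that blocks until the disk
    responds. clock is injectable so the measurement can be tested.
    """
    start = clock()
    response = issue_request()
    service_time = clock() - start
    return response, service_time
```

The measured service time may then be compared with the expected service time for a request of the same size on the same type of device.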
In one embodiment, the expected service time 316 may be obtained using a table listing expected service time 324 according to a size of I/O request 320 and a type of storage device 318 being accessed. Thus, typical or expected service times for I/O requests of different I/O sizes (e.g., 1 KB, 2 KB, 4 KB, etc.) and/or for different device types can be maintained in a persistent, yet updatable repository, such as a non-volatile random access memory (RAM) 326. As illustrated in
In another embodiment, a mathematical approximation technique, such as a linear parametric model 322, may be used to extrapolate the expected service time. In one exemplary embodiment, the parametric model for the disk 124A that can be accessed by the disk driver (e.g., the adaptive queue module 116 or a comparator 314) may be built or generated. For instance, the expected service time 316 can be modeled as “y=ax+b,” where “y” is the expected service time 316, “a” is related to the byte transfer rate of the link connecting the host 102A to the disk 124A, “x” is the number of bytes (e.g., the size of I/O request 320) in the I/O request 306, and “b” is the set-up time for the request. In one embodiment, the disk driver 114 of the host 102A may compute “a” and “b” during the initial phase of its operation. It is appreciated that the mathematical approximation technique can include other types of mathematical models, such as a quadratic model, a geometric model, and so on.
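The two ways of obtaining an expected service time described above may be sketched as follows. The table contents, the device-type label, and the units (microseconds, with “a” taken here as microseconds per byte) are illustrative assumptions, not values from the embodiments. Consulting the table first and falling back to the linear model is likewise only one possible arrangement, since the description presents the table and the model as separate embodiments.

```python
# Hypothetical table: (device type, request size in bytes) -> expected time in microseconds.
EXPECTED_TIME_TABLE = {
    ("disk", 1024): 600.0,
    ("disk", 2048): 750.0,
    ("disk", 4096): 1000.0,
}


def linear_expected_time(size_bytes, a, b):
    """Linear parametric model y = a*x + b.

    a: per-byte transfer cost (assumed here to be microseconds per byte)
    b: set-up time for the request (microseconds)
    """
    return a * size_bytes + b


def expected_service_time(device_type, size_bytes, table, a, b):
    """Return the tabulated value if one exists, otherwise the model's estimate."""
    key = (device_type, size_bytes)
    if key in table:
        return table[key]
    return linear_expected_time(size_bytes, a, b)
```

For example, with a = 0.125 and b = 500, a request of 8192 bytes that is absent from the table yields 0.125 × 8192 + 500 = 1524 microseconds.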
With the expected service time obtained for the type of storage device 318 and the size of the I/O request 320, the comparator 314 is used to generate the I/O performance status 328 (e.g., too slow service time, too fast service time, etc.). As will be illustrated in detail in
Then, in operation 408, the service time is compared with the expected service time (e.g., which can be obtained using one of the two methods illustrated in
In operation 504, a status of an I/O performance between the storage driver and the storage device is determined by comparing the service time with an expected service time for the storage device in completing the I/O request, where the expected service time is calculated based on a type of the storage device and a size of the I/O request. In one embodiment, the expected service time is stored in a non-volatile RAM coupled to the storage driver. In one exemplary implementation, the expected service time is generated by using a mathematical approximation technique, such as a linear parametric model, which sets the expected service time equal to a byte transfer rate between the storage driver and the storage device multiplied by a number of bytes in the I/O request compensated by a set-up time for the I/O request. It is appreciated that the byte transfer rate between the storage driver and the storage device and the set-up time for the I/O request are generated during an initial phase of I/O operations between the storage driver and the storage device. In addition, the status of the I/O performance may comprise a too slow service time and a too fast service time.
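The description does not state how the two parameters of the linear parametric model are generated during the initial phase. One possible approach, offered purely as an assumption, is an ordinary least-squares fit over (request size, measured service time) pairs collected from early I/O requests. The function name `fit_linear_model` and the sample values are hypothetical.

```python
def fit_linear_model(samples):
    """Least-squares fit of y = a*x + b.

    samples: list of (size_bytes, service_time) pairs observed during an
    initial phase. Returns (a, b). This fitting method is an assumption;
    the embodiments do not specify one.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover at least two distinct sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    a = sxy / sxx                # slope: per-byte transfer cost
    b = mean_y - a * mean_x      # intercept: set-up time
    return a, b
```

Samples that lie exactly on a line, such as (1024, 628), (2048, 756) and (4096, 1012), recover the slope and intercept of that line up to floating-point rounding.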
In operation 506, a maximum queue depth associated with the storage device, and hence with the storage driver, is adjusted based on the status of the I/O performance. In one embodiment, the maximum queue depth may be decreased (e.g., by half) if the service time is greater than the expected service time multiplied by a first factor (e.g., 10). In an alternative embodiment, the maximum queue depth may be increased if the service time is significantly less than the expected service time and if the number of pending I/O requests in the I/O queue is greater than the maximum queue depth multiplied by a second factor (e.g., 0.7).
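The adjustment rule of operation 506 may be sketched as below, using the example values given above (halving, a first factor of 10, a second factor of 0.7). Several details are assumptions introduced only to make the sketch runnable: “significantly less” is taken to mean less than half of the expected service time (`fast_ratio`), the increase is by one, and the depth is clamped between `min_depth` and `depth_limit`. None of these values are stated in the description.

```python
def adjust_max_queue_depth(max_depth, service_time, expected_time, pending,
                           slow_factor=10, busy_factor=0.7,
                           fast_ratio=0.5, min_depth=1, depth_limit=256):
    """Return a new maximum queue depth for one storage device.

    slow_factor and busy_factor follow the example values in the text.
    fast_ratio, the +1 increment, min_depth and depth_limit are assumptions.
    """
    # Too slow: the device is overloaded, so halve the depth.
    if service_time > expected_time * slow_factor:
        return max(min_depth, max_depth // 2)
    # Too fast while the queue is nearly full: allow more concurrent requests.
    if (service_time < expected_time * fast_ratio
            and pending > max_depth * busy_factor):
        return min(depth_limit, max_depth + 1)
    # Otherwise leave the depth unchanged.
    return max_depth
```

For instance, with a maximum queue depth of 32 and an expected service time of 10 units, a measured service time of 110 units reduces the depth to 16, whereas a measured service time of 4 units with 25 pending requests raises it to 33.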
In addition, it will be appreciated that the various operations, processes, and methods disclosed herein may be embodied in a machine-readable medium and/or a machine accessible medium compatible with a data processing system (e.g., a computer system), and may be performed in any order (e.g., including using means for achieving the various operations). Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
Although the present embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the various embodiments. For example, the various devices, modules, analyzers, generators, etc. described herein may be enabled and operated using hardware circuitry (e.g., CMOS based logic circuitry), firmware, software and/or any combination of hardware, firmware, and/or software (e.g., embodied in a machine readable medium). For example, the various electrical structures and methods may be embodied using transistors, logic gates, and electrical circuits (e.g., application specific integrated circuitry (ASIC)).
Number | Date | Country | Kind |
---|---|---|---|
175/CHE/2009 | Jan 2009 | IN | national |