The disclosed invention relates to RAID controllers and more specifically to improving I/O performance and controlling I/O latency for a RAID array.
There are many applications, particularly in business environments, whose needs exceed what a single hard disk can fulfill, regardless of its size, performance or quality level. Many businesses cannot afford to have their systems go down for even an hour in the event of a disk failure. They need large storage subsystems with capacities in the terabytes, and they want to insulate themselves from hardware failures to the extent possible. Some users working with multimedia files need data transfer rates exceeding what individual drives can deliver, without spending a fortune on specialty drives. These situations require that the traditional “one hard disk per system” model be set aside and a new approach employed. This technique is called Redundant Arrays of Inexpensive Disks, or RAID. (“Inexpensive” is sometimes replaced with “Independent”, but the former term is the one used when the term “RAID” was first coined by the researchers at the University of California at Berkeley, who first investigated the use of multiple-drive arrays in 1987. See D. Patterson, G. Gibson, and R. Katz, “A Case for Redundant Arrays of Inexpensive Disks (RAID)”, Proceedings of ACM SIGMOD '88, pages 109-116, June 1988.)
The fundamental structure of RAID is the array. An array is a collection of drives that is configured, formatted and managed in a particular way. The number of drives in the array, and the way that data is split between them, is what determines the RAID level, the capacity of the array, and its overall performance and data protection characteristics.
An array appears to the operating system to be a single logical hard disk. RAID employs the technique of “striping”, which involves partitioning each drive's storage space into units ranging from a sector (512 bytes) up to several megabytes. The stripes of all the disks are interleaved and addressed in order.
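By way of illustration only, the striping arithmetic for a simple striped array may be sketched as follows (in Python for concreteness; the stripe size, disk count and function name are hypothetical examples rather than part of the invention):

    # Illustrative striping arithmetic (hypothetical stripe size and disk
    # count): logical blocks are interleaved across the disks of the array
    # in stripe-sized units.
    STRIPE_BLOCKS = 128   # blocks per stripe unit (64 KB of 512-byte sectors)
    NUM_DISKS = 4

    def locate(logical_block):
        stripe_unit = logical_block // STRIPE_BLOCKS
        disk = stripe_unit % NUM_DISKS        # stripe units rotate across disks
        offset = ((stripe_unit // NUM_DISKS) * STRIPE_BLOCKS
                  + logical_block % STRIPE_BLOCKS)
        return disk, offset

    print(locate(130))   # -> (1, 2): disk 1, third block of its first stripe

In this sketch, logical blocks 0 to 127 fall on disk 0, blocks 128 to 255 on disk 1, and so on, wrapping around the array.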
In a single-user system where large records, such as medical or other scientific images, are stored, the stripes are typically set up to be relatively small (perhaps 64 KB) so that a single record often spans all disks and can be accessed quickly by reading all disks at the same time.
In a multi-user system, better performance requires establishing a stripe wide enough to hold the typical or maximum-size record. This allows overlapped disk I/O (Input/Output) across drives.
Most modern mid-range to high-end disk storage systems are arranged as RAID configurations. A number of RAID levels are known. RAID-0 “stripes” data across the disks. RAID-1 includes sets of N data disks and N mirror disks for storing copies of the data disks. RAID-3 includes sets of N data disks and one parity disk, accessed with synchronized spindles and hardware that performs the striping on the fly. RAID-4 also includes sets of N+1 disks; however, data transfers are performed in multi-block operations. RAID-5 distributes parity data across all disks in each set of N+1 disks. RAID levels 10, 30, 40, and 50 are hybrid levels that combine features of level 0 with features of levels 1, 3, 4, and 5. One description of RAID types can be found at http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci214332,00.html.
Thus RAID is simply several disks grouped together in various organizations to improve either the performance or the reliability of a computer's storage system. These disks are grouped and organized by a RAID controller.
All I/O to a redundant array passes through the RAID controller. I/O requests for a disk in a redundant array originate from an application and are conveyed by the OS (Operating System) to the RAID controller. These I/O requests are then issued by the RAID controller to the respective disks in the array.
Conventional method of improving I/O performance by using a sorted queue
A common method of improving random I/O performance in a redundant array involves sorting I/Os before issuing them to the respective disks in the array. I/Os are sorted according to their read or write location on the disk, thereby optimizing movement of the disk's head and reducing I/O processing delays. While this does reduce head movement, it is an “unfair” algorithm in that it will continuously sort new I/Os ahead of previously received I/Os whenever the read or write locations of the new I/Os precede those of the previously received I/Os. This is not an issue if the incoming I/O rate is low. If the incoming I/O rate is high, however, an excessive number of new I/Os may be sorted ahead of previously received I/Os; while head movement is minimized, existing I/Os in the queue may wait far longer than necessary to be processed. Alternatively, I/Os can be processed in the order they were received, providing a first-come, first-served methodology; the tradeoff is excessive disk head movement, which increases I/O latency. A “fair” algorithm would provide reasonable priority to the foremost I/Os while still minimizing disk head movement.
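By way of illustration only, the conventional sorted queue may be sketched as follows (in Python for concreteness; all names are hypothetical). Note how a newly arriving I/O with a low target location is sorted ahead of previously received I/Os bound for higher locations:

    import bisect

    # Conventional approach (illustrative sketch): a single queue kept
    # sorted by target disk location to minimize head movement.
    sorted_queue = []   # (location, request_id) pairs, ordered by location

    def enqueue(location, request_id):
        # A new I/O with a low location is sorted ahead of previously
        # received I/Os bound for higher locations -- the source of the
        # unfairness described above.
        bisect.insort(sorted_queue, (location, request_id))

    def issue_next():
        # Issue the pending I/O with the lowest target location to the disk.
        return sorted_queue.pop(0) if sorted_queue else None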
What is needed is a new method to improve I/O performance and control I/O latency when issuing I/Os to a redundant array.
The invention comprises a method and computer program product for improving I/O performance and controlling I/O latency when reading from or writing to a disk in a redundant array, comprising: determining an optimal number of I/O sort queues, their depth, and a latency control number; directing incoming I/Os to a second sort queue if the queue depth or latency control number of a first sort queue is exceeded; directing incoming I/Os to a FIFO queue if all sort queues are saturated; and issuing I/Os to a disk in the redundant array from the sort queue having the foremost I/Os.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed. The detailed description is not intended to limit the scope of the claimed invention in any way.
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. In the drawings:
While the present invention is described herein with reference to illustrative embodiments for particular applications, it should be understood that the invention is not limited thereto. Those skilled in the art with access to the teachings provided herein will recognize additional modifications, applications, and embodiments within the scope thereof and additional fields in which the invention would be of significant utility.
The invention uses n sort queues in combination with a First In First Out (FIFO) queue to improve I/O performance and control I/O latency by using an algorithm that provides fairness to previously received I/Os. A “latency control number” is used in the invention to control the switching of queues. The latency control number, in conjunction with other parameters such as the number of sort queues and the queue depth, is used to control latency and maintain a fair algorithm. The number of sort queues, the queue depth and the latency control number are determined based on I/O request rates and I/O statistics. Each disk in the array has its own FIFO queue and n sort queues with corresponding queue depths and latency control numbers. The FIFO queue is sufficiently deep to accept all incoming I/Os that cannot be directed to a sort queue.
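By way of illustration only, one possible arrangement of these structures is sketched below (in Python for concreteness; all names are hypothetical, and the latency control number is interpreted here as the number of I/Os accepted since the queue was last empty, the precise metric being left to the implementer):

    import bisect
    from collections import deque

    class SortQueue:
        """One sort queue; holds (location, request) pairs ordered by location."""

        def __init__(self, depth, latency_control):
            self.depth = depth                       # configured queue depth
            self.latency_control = latency_control   # latency control number
            self.items = []                          # sorted by disk location
            self.accepted = 0                        # accepted since last empty
            self.saturated = False

        def accept(self, location, request):
            bisect.insort(self.items, (location, request))
            self.accepted += 1
            if len(self.items) >= self.depth or self.accepted >= self.latency_control:
                self.saturated = True    # refuse new I/Os until the empty state

        def issue(self):
            location, request = self.items.pop(0)
            if not self.items:           # empty state: the queue may be reused
                self.saturated = False
                self.accepted = 0
            return location, request

    class DiskQueues:
        """Per-disk queue set: n sort queues plus one FIFO overflow queue."""

        def __init__(self, n, depth, latency_control):
            self.sort_queues = [SortQueue(depth, latency_control)
                                for _ in range(n)]
            self.fifo = deque()          # deep enough for all overflow I/Os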
Incoming I/Os are initially stored in a first sort queue, which sorts the I/Os according to their read or write location on the disk. When either the queue depth or the latency control number for the queue is exceeded, the queue is said to be “saturated” or in the “saturated state”. When the queue is completely empty, it is said to be “empty” or in the “empty state”. The queue remains in the saturated state until all stored I/Os have been issued and does not accept any new incoming I/Os until it is in the empty state. This ensures fairness in the algorithm by first issuing the foremost I/Os to a disk in the redundant array.
If the first sort queue enters the saturated state, incoming I/Os are directed to the next sort queue. While the second sort queue is receiving I/Os, the first sort queue continues issuing I/Os to the disk. After the first sort queue is empty, the second sort queue issues its I/Os to the disk, and so on.
If all the sort queues are in a saturated state, incoming I/O requests are directed to the FIFO queue. When the first sort queue is empty, I/Os are transferred to it from the FIFO queue. If the first sort queue saturates before the FIFO queue has transferred all of its I/Os, the FIFO queue transfers I/Os to the second sort queue when the second sort queue is empty, and so on.
Assume there are nine I/O requests D1 to D9 (numbered in the order they were received) issued by the OS to the RAID controller, and consider the conventional single sorted queue. Suppose D1 is to be issued to disk location 100, D2 to location 200, D3 to location 9000, D4 to location 9100, D5 to location 700, D6 to location 710, D7 to location 720, D8 to location 750 and D9 to location 770. After issuing D1 and D2 to locations 100 and 200 respectively, the sort queue will not process D3 and D4 until D5 to D9 have been issued, because D5 to D9 have been sorted ahead of D3 and D4 based on their issue locations on the disk. This results in a severe delay in processing I/O requests D3 and D4 (which target locations 9000 and 9100 respectively), even though they were received prior to I/O requests D5 through D9. If further I/Os with issue locations below 9000 are received, the issue latency of D3 and D4 will increase still further.
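This behavior can be reproduced with the conventional sorted-queue sketch from the background section (in Python; names hypothetical):

    import bisect

    queue = []   # conventional single sorted queue: (location, id) pairs

    arrivals = [("D1", 100), ("D2", 200), ("D3", 9000), ("D4", 9100),
                ("D5", 700), ("D6", 710), ("D7", 720), ("D8", 750),
                ("D9", 770)]

    # D1 and D2 arrive and are issued immediately.
    for name, loc in arrivals[:2]:
        bisect.insort(queue, (loc, name))
    issued = [queue.pop(0)[1], queue.pop(0)[1]]     # D1, D2

    # D3..D9 arrive before the next issue; the sort places D5..D9 ahead
    # of D3 and D4 because their locations are lower.
    for name, loc in arrivals[2:]:
        bisect.insort(queue, (loc, name))
    issued += [queue.pop(0)[1] for _ in range(len(queue))]

    print(issued)   # ['D1', 'D2', 'D5', 'D6', 'D7', 'D8', 'D9', 'D3', 'D4']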
One aspect of the invention employs n sort queues in conjunction with a FIFO queue to overcome the processing delays mentioned above and to control I/O latency.
The latency control number, the number of sort queues and their depth are determined based on such factors as the frequency of I/O requests, I/O statistics and the nature of the applications currently running. The latency control number determines whether incoming I/Os need to be redirected from the current sort queue to the next available sort queue. This number typically depends on the frequency of incoming I/Os. For example, if the I/O rate is extremely high, it is likely that some I/Os are being continuously sorted ahead of existing I/Os. In this case, if the latency control number is exceeded (or if the queue depth is exceeded), the queue enters the saturated state and incoming I/O requests are redirected to the next sort queue (or to the FIFO queue if all the sort queues are saturated).
It should be noted that parameters such as the number of queues, the queue depth and the latency control number, or the method used to determine them, can be implemented in various forms in different embodiments of the invention by those skilled in the art without departing from the spirit and scope of the invention. It should also be noted that the invention combines at least one sort queue with a FIFO queue to control I/O latency and improve I/O performance by minimizing the disk's head movement while maintaining a fair algorithm. The terms storage device, hard disk drive and disk drive are used interchangeably throughout. The terms I/Os, incoming I/Os, incoming I/O requests and I/O requests refer to read or write requests received from the OS that are to be issued to a disk in the array after being sorted by a sort queue (after which they are referred to as sorted I/Os). Although this invention is directed towards improving I/O performance for disk drives controlled by a RAID controller, it can be implemented for any storage device that writes based on location.
As shown in
It is possible that the FIFO queue is never empty if the incoming I/O rate is extremely high. In that case, the FIFO queue will continue receiving incoming I/Os and transferring the I/Os to the sort queues when they become available. This methodology maintains a fair algorithm while minimizing disk head movement.
An exemplary method employing the features of the invention proceeds along the following steps as shown in the flowchart of
When an incoming I/O request is received, it is first determined whether all sort queues are saturated in step 1001. A sort queue is saturated if its queue depth or its latency control number has been exceeded. Once a queue is saturated, it will not accept any more I/Os until all of its stored I/Os have been issued to the disk.
If at least one sort queue is not saturated, incoming I/O requests are directed to the next available sort queue in step 1002.
If all sort queues are saturated, incoming I/O requests are directed to the FIFO queue in step 1003.
The FIFO queue periodically checks to see if a sort queue is available (i.e., it is empty) in step 1004.
If there is an empty sort queue available then the FIFO queue transfers its stored I/Os to the empty sort queue in step 1002.
In step 1005, I/Os are issued continuously from the sort queue having the foremost I/Os to the disk in the array.
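By way of illustration only, these steps may be rendered as the following sketch (in Python for concreteness, for a single disk; the queue count, depth, latency control number and all names are hypothetical, and the latency control number is again interpreted as the number of I/Os accepted since the queue was last empty):

    import bisect
    from collections import deque

    # Hypothetical parameters for a single disk.
    N_QUEUES, DEPTH, LATENCY_CONTROL = 2, 4, 6

    sort_queues = [{"items": [], "accepted": 0, "saturated": False}
                   for _ in range(N_QUEUES)]
    fifo = deque()      # overflow queue, deep enough for all incoming I/Os
    active = deque()    # sort queues in the order they began filling

    def accept(q, loc, req):
        if not q["items"]:
            active.append(q)                       # remember the filling order
        bisect.insort(q["items"], (loc, req))      # sort by disk location
        q["accepted"] += 1
        if len(q["items"]) >= DEPTH or q["accepted"] >= LATENCY_CONTROL:
            q["saturated"] = True                  # drain fully before reuse

    def route(loc, req):
        for q in sort_queues:                      # step 1001: all saturated?
            if not q["saturated"]:
                accept(q, loc, req)                # step 1002: next available
                return
        fifo.append((loc, req))                    # step 1003: fall back to FIFO

    def issue_next():
        # Step 1005: issue from the sort queue holding the foremost I/Os.
        if not active:
            return None
        q = active[0]
        loc, req = q["items"].pop(0)
        if not q["items"]:                         # queue reached the empty state
            active.popleft()
            q["saturated"], q["accepted"] = False, 0
            while fifo and not q["saturated"]:     # step 1004: refill from FIFO
                accept(q, *fifo.popleft())
        return loc, req

In this sketch, route() implements steps 1001 through 1003, while issue_next() implements step 1005 and performs the FIFO transfer of step 1004 whenever a sort queue reaches the empty state.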
The following description of a general purpose computer system is provided for completeness. The present invention can be implemented in hardware, in software, or as a combination of software and hardware. Consequently, the invention may be implemented in the environment of a computer system or other processing system. An example of such a computer system 1100 is shown in
Computer system 1100 also includes a main memory 1105, preferably random access memory (RAM), and may also include a secondary memory 1110. The secondary memory 1110 may include, for example, a hard disk drive 1112, and/or a RAID array 1116, and/or a removable storage drive 1114, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage drive 1114 reads from and/or writes to a removable storage unit 1118 in a well known manner. Removable storage unit 1118 represents a floppy disk, magnetic tape, optical disk, etc. As will be appreciated, the removable storage unit 1118 includes a computer usable storage medium having stored therein computer software and/or data.
In alternative implementations, secondary memory 1110 may include other similar means for allowing computer programs or other instructions to be loaded into computer system 1100. Such means may include, for example, a removable storage unit 1122 and an interface 1120. Examples of such means may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 1122 and interfaces 1120 which allow software and data to be transferred from the removable storage unit 1122 to computer system 1100.
Computer system 1100 may also include a communications interface 1124. Communications interface 1124 allows software and data to be transferred between computer system 1100 and external devices. Examples of communications interface 1124 may include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, etc. Software and data transferred via communications interface 1124 are in the form of signals 1128 which may be electronic, electromagnetic, optical or other signals capable of being received by communications interface 1124. These signals 1128 are provided to communications interface 1124 via a communications path 1126. Communications path 1126 carries signals 1128 and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link and other communications channels.
The terms “computer program medium” and “computer usable medium” are used herein to generally refer to media such as removable storage drive 1114, a hard disk installed in hard disk drive 1112, and signals 1128. These computer program products are means for providing software to computer system 1100.
Computer programs (also called computer control logic) are stored in main memory 1105 and/or secondary memory 1110. Computer programs may also be received via communications interface 1124. Such computer programs, when executed, enable the computer system 1100 to implement the present invention as discussed herein. In particular, the computer programs, when executed, enable the processor 1104 to implement the processes of the present invention. Where the invention is implemented using software, the software may be stored in a computer program product and loaded into computer system 1100 using RAID array 1116, removable storage drive 1114, hard disk drive 1112 or communications interface 1124.
In another embodiment, features of the invention are implemented primarily in hardware using, for example, hardware components such as Application Specific Integrated Circuits (ASICs) and gate arrays. Implementation of a hardware state machine so as to perform the functions described herein will also be apparent to persons skilled in the relevant art(s).
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the invention.
The present invention has been described above with the aid of functional building blocks and method steps illustrating the performance of specified functions and relationships thereof. The boundaries of these functional building blocks and method steps have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Any such alternate boundaries are thus within the scope and spirit of the claimed invention. One skilled in the art will recognize that these functional building blocks can be implemented by discrete components, application specific integrated circuits, processors executing appropriate software and the like or any combination thereof. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.