This invention relates to data storage and more particularly to the I/O workload generated by applications such as video surveillance.
Many currently-provided data storage systems receive write requests directed to random or disparate disk locations. One reason may be that write requests are received by a storage system controller from multiple sources. Furthermore, file systems, such as Microsoft New Technology File System (NTFS), may themselves route incoming write requests to random disk addresses of a storage system, typically sending each incoming write request to whichever free area of the storage system they choose. The performance of a storage system would likely improve if data were written to it in a more orderly manner.
Aspects of the present invention provide methods, computer media encoding instructions, and systems for receiving write requests directed to logical block addresses and writing the write requests to sequential disk block addresses in a storage system. Aspects of the present invention include overprovisioning a storage system to include an increment of additional storage space such that it is more likely a large enough sequential block of storage will be available to accommodate incoming write requests.
Some embodiments of the present invention provide methods, computer media encoding instructions, and systems for receiving write requests directed to logical block addresses and writing the write requests to sequential disk block addresses in a storage system. Some embodiments further include overprovisioning a storage system to include an increment of additional storage space such that it is more likely a large enough sequential block of storage will be available to accommodate incoming write requests. Certain details are set forth below to provide a sufficient understanding of embodiments of the invention. However, it will be clear to one skilled in the art that embodiments of the invention may be practiced without various of these particular details. In some instances, well-known circuits, control signals, timing protocols, storage devices, and software operations have not been shown in detail in order to avoid unnecessarily obscuring the described embodiments of the invention.
A schematic diagram describing a method according to an embodiment of the present invention is shown in
When a read request is received, the read request specifies a logical block address. Since the data was not stored at the logical block address, but instead at a disk block address determined as described above, reference is made to the logical block address associations to determine which disk block address is associated with the logical block address. The read data is then retrieved from the associated disk block address.
A free disk block indicator may be stored and updated 160 as data is written to disk blocks. For example, the indicator may be a map of disk block addresses associating a particular value with the addresses that are free. By referencing this indication, a free disk block can be identified and selected. In some embodiments, a pointer is set at the first identified free disk block address. A second free disk block address for a subsequent write can be identified by incrementing 170 the pointer to the next available free disk block address. The next available disk block address may be a sequential disk block address. However, if the sequential disk block address is unavailable, the pointer is set to the next available free disk block address in some embodiments.
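The pointer behavior described above can be illustrated with a minimal sketch. It assumes the free disk block indicator is a simple list of booleans and that the pointer wraps to the start of the list when it reaches the end, as is done in some embodiments; the name `next_free_block` and the 8-block volume are hypothetical and chosen only for illustration.

```python
from typing import List, Optional


def next_free_block(free_map: List[bool], pointer: int) -> Optional[int]:
    """Return the first free disk block address at or after `pointer`,
    wrapping to the start of the list once; None if no block is free."""
    n = len(free_map)
    for step in range(n):
        candidate = (pointer + step) % n
        if free_map[candidate]:
            return candidate
    return None


# True marks a free disk block address (hypothetical 8-block volume).
free_map = [False, True, True, False, True, True, True, False]

first = next_free_block(free_map, 0)            # first identified free address
free_map[first] = False                         # consumed by a write

second = next_free_block(free_map, first + 1)   # sequential neighbour is free
free_map[second] = False

# The next sequential address is in use, so the pointer skips ahead.
third = next_free_block(free_map, second + 1)
```

In this sketch the second write lands on the address immediately after the first, while the third selection skips an occupied address and resumes at the next available free one.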
Each incoming write request may be directed to a disk block address determined by identifying the next available free disk block address. If the next sequential disk block address is not free, embodiments of the present invention direct the incoming write request to the next available disk block address, and sequential operation is continued for subsequent received write requests when possible.
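A minimal sketch of the write redirection and the corresponding read lookup follows. It is an illustration under stated assumptions, not the claimed implementation: the class name `SequentialRemapper`, the in-memory list standing in for the disk, and the choice to mark a superseded disk block as free at the moment its logical block address is rewritten are all assumptions made for this example.

```python
from typing import Dict, List


class SequentialRemapper:
    """Redirects writes for arbitrary logical block addresses (LBAs)
    to the next available free disk block address (DBA)."""

    def __init__(self, num_disk_blocks: int) -> None:
        self.free: List[bool] = [True] * num_disk_blocks   # free-space indicator
        self.lba_map: Dict[int, int] = {}                  # LBA -> DBA associations
        self.pointer = 0                                   # write pointer
        self.disk: List[bytes] = [b""] * num_disk_blocks   # stand-in for the disk

    def _next_free(self) -> int:
        n = len(self.free)
        for step in range(n):
            candidate = (self.pointer + step) % n
            if self.free[candidate]:
                return candidate
        raise RuntimeError("no free disk block address available")

    def write(self, lba: int, data: bytes) -> int:
        dba = self._next_free()
        self.disk[dba] = data
        self.free[dba] = False
        old = self.lba_map.get(lba)
        if old is not None:
            self.free[old] = True      # earlier copy is superseded
        self.lba_map[lba] = dba        # record the association
        self.pointer = (dba + 1) % len(self.free)
        return dba

    def read(self, lba: int) -> bytes:
        # Reads consult the association rather than the LBA itself.
        return self.disk[self.lba_map[lba]]


remapper = SequentialRemapper(8)
# Scattered logical addresses land on consecutive disk block addresses.
placed = [remapper.write(lba, str(lba).encode()) for lba in (50, 7, 300)]
rewritten_at = remapper.write(50, b"new")
```

Here three writes to unrelated logical block addresses occupy consecutive disk block addresses, and rewriting logical block address 50 places the new data at the next sequential address while the earlier copy becomes reusable.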
Generally, write requests received will be of a fixed size corresponding to a size of the disk blocks in a storage volume. Any fixed size may generally be used, such as 64 kilobytes, 32 kilobytes, or any other size. However, some write requests may be received which are longer or shorter than the fixed size. Larger write requests may be reformatted as a sequence of requests having the fixed size with header and/or trailer information added as needed. Write requests smaller than the fixed size can be dealt with in several ways. In one embodiment, a write request smaller than the fixed size is received (for example, a request to write 40 kilobytes of a 64 kilobyte disk block). The write request references a logical block address. The lba_map, or other indicator, may be consulted to determine where the data corresponding to the received logical block address is stored, and the write request is written to that disk block address instead of the next free disk block address. While this disrupts sequential writes, it preserves the remaining data in the disk block beyond the portion written. In another embodiment, the write request shorter than the block size is written to the next free disk block address; however, the remaining data needed to fill the block is read from the disk block address where it was previously stored, as determined by the lba_map or other indicator. This introduces a non-sequential read, but also preserves the data. Further, by writing to the next free disk block address, sequential operations can resume once the data in that block is written out or no longer needed.
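The second approach to short writes, in which the remaining data is read from the previously stored block and the merged block is written to the next free disk block address, might be sketched as follows. The function name `merge_partial_write` and the `offset` parameter are hypothetical; the description above does not specify where within the block a short write begins, so the offset is an assumption of this example.

```python
BLOCK_SIZE = 64 * 1024  # fixed disk block size used in the example above


def merge_partial_write(old_block: bytes, offset: int, payload: bytes) -> bytes:
    """Combine a short write with the previously stored block so that a
    full-size block can be written to the next free disk block address."""
    if len(old_block) != BLOCK_SIZE:
        raise ValueError("old block must be exactly one disk block long")
    if offset < 0 or offset + len(payload) > BLOCK_SIZE:
        raise ValueError("payload does not fit inside the disk block")
    # Bytes outside the written range are preserved from the old block.
    return old_block[:offset] + payload + old_block[offset + len(payload):]


# A 40-kilobyte write into a 64-kilobyte block (assumed to start at offset 0).
old_block = b"A" * BLOCK_SIZE          # read from the previous disk block address
payload = b"B" * (40 * 1024)
merged = merge_partial_write(old_block, 0, payload)
```

The merged block would then be written like any other full-size request, so the only non-sequential operation is the single read of the old block.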
An example of an embodiment of the present invention is now described with reference to
So, for example, if a write request arrives directed to logical block address 50 and the free_space_bitmap and lba_map contain the values shown in
When the write pointer reaches an end of a list of disk block addresses, it is incremented again to the start of the list, in some embodiments. Accordingly, after writing a write request directed to logical block address 50 to disk block address 123, the tables are updated as shown in
In some embodiments, data stored in the disk block address is no longer needed when it is deleted by a user or another process. Additionally or instead, in some embodiments, data stored in the disk block address is no longer needed when it has been stored for a predetermined time, such as a retention time, determined by the storage system or an end user or process (for example, storing video data for one month).
To ensure data consistency after a system crash or other interruption, in some embodiments, the free_space indicator and associations between logical disk block addresses and disk block addresses (free_space_bitmap and lba_map in
So, for example, after 1000 write requests are received, the delta map 410 may be written to non-volatile storage. Then, after another predetermined number of write requests, the entire free_space indicator and logical block associations, such as free_space_bitmap 210 and lba_map 220 in one embodiment, can be written to non-volatile storage. In one embodiment, free_space_bitmap and lba_map are written to non-volatile storage after 1000 writes of the delta map 410, or 1000*1000=1,000,000 write requests. Following a system crash or interruption requiring reconstruction of the free space indicator and logical block association indicator, a stored version of the indicators is retrieved. Using the exemplary numbers provided above, in one embodiment, these stored indicators are accurate to the last 1 million writes. The stored indicators are updated with the stored delta maps. In the embodiment described above, these delta maps provide data for the most recent n*1000 user writes. In some embodiments only the most recent writes (the most recent 1000 in the example described) are reconstructed by scanning metadata on the disk.
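The two-level checkpointing scheme can be sketched as shown below. This is a simplified illustration: it tracks only the logical block associations, it assumes a delta map holds the associations changed since the previous delta was written, and it uses in-memory containers as stand-ins for non-volatile storage. The class name `CheckpointedMap` and its parameters are hypothetical, and the small intervals (3 and 2) replace the 1000 and 1000 of the example only to keep the demonstration short.

```python
from typing import Dict, List


class CheckpointedMap:
    def __init__(self, writes_per_delta: int = 1000,
                 deltas_per_full: int = 1000) -> None:
        self.lba_map: Dict[int, int] = {}     # live associations (volatile)
        self.delta: Dict[int, int] = {}       # changes since the last delta write
        self.writes_per_delta = writes_per_delta
        self.deltas_per_full = deltas_per_full
        self.pending_writes = 0
        # Stand-ins for non-volatile storage.
        self.stored_full: Dict[int, int] = {}
        self.stored_deltas: List[Dict[int, int]] = []

    def record_write(self, lba: int, dba: int) -> None:
        self.lba_map[lba] = dba
        self.delta[lba] = dba
        self.pending_writes += 1
        if self.pending_writes == self.writes_per_delta:
            self.stored_deltas.append(dict(self.delta))   # persist the delta map
            self.delta.clear()
            self.pending_writes = 0
            if len(self.stored_deltas) == self.deltas_per_full:
                self.stored_full = dict(self.lba_map)     # persist the full map
                self.stored_deltas.clear()

    def recover(self) -> Dict[int, int]:
        """Rebuild the associations from the stored full map and deltas."""
        rebuilt = dict(self.stored_full)
        for delta in self.stored_deltas:
            rebuilt.update(delta)
        return rebuilt


m = CheckpointedMap(writes_per_delta=3, deltas_per_full=2)
for i in range(10):
    m.record_write(lba=i % 4, dba=i)
recovered = m.recover()
```

In this run the recovered map differs from the live map only for the one write made after the last delta was stored; that remainder corresponds to the most recent writes that, in some embodiments, are reconstructed by scanning metadata on the disk.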
A disk block address may be reused once it is indicated as free, in some embodiments. As described above, a disk block address may be indicated as free when its corresponding logical block address is written to again, such as disk block address 66 in
Recall that, as described above, newly freed disk block addresses are noted in the free_space_bitmap once their corresponding logical block addresses are again written to. This occurs following an adjustment time after the data in the disk block address is actually no longer needed, and may be deallocated by a file system. Accordingly, data in a disk block address may be no longer needed, and may be deallocated, after a predetermined retention time t in one embodiment. However, the status may not be updated in a free space indicator until an adjustment time, x, has elapsed in one embodiment. Accordingly, some embodiments of the present invention provide an overprovisioned amount of storage to take this adjustment time into account and increase the time when sequential disk block addresses are available. By providing an additional amount of disk block addresses beyond what is dictated by the system requirements of storing a predetermined amount of data for a predetermined time, the likelihood of having sequential disk block addresses to write to is increased. In the example described above, if there is sufficient overprovisioned area to continue storing received data for a time t+x, at that time sequential disk block addresses will become available through the recognition of deallocated space. Accordingly, in some embodiments, an overprovisioned amount of storage is provided. The amount of overprovisioning will be determined according to the amount of data expected in the time t+x, in some embodiments. The amount of overprovisioning may also be determined by considering disk block addresses used to store persistent data, such as file metadata, for example, which may not be deleted, and other space taken to store files that will not become available after a retention time, such as administrative or other non-deleted files.
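As a worked illustration of sizing for the time t+x, the sketch below computes a capacity from an ingest rate, a retention time t, an adjustment time x, and a persistent-data allowance. The function name `required_capacity_bytes` and every numeric value (10 MB/s ingest, 30-day retention, 3-day adjustment time) are hypothetical and are not taken from the description above.

```python
SECONDS_PER_DAY = 86_400


def required_capacity_bytes(ingest_bytes_per_sec: int,
                            retention_sec: int,
                            adjustment_sec: int,
                            persistent_bytes: int = 0) -> int:
    """Capacity needed to keep writing for the time t + x, plus space for
    persistent data that is never released after the retention time."""
    return ingest_bytes_per_sec * (retention_sec + adjustment_sec) + persistent_bytes


rate = 10_000_000                      # hypothetical 10 MB/s aggregate ingest
t = 30 * SECONDS_PER_DAY               # hypothetical retention time
x = 3 * SECONDS_PER_DAY                # hypothetical adjustment time

baseline = required_capacity_bytes(rate, t, 0)   # retention requirement alone
total = required_capacity_bytes(rate, t, x)      # with the adjustment time
overprovisioned = total - baseline               # the additional increment
```

With these assumed figures the overprovisioned increment equals the data expected during the adjustment time alone, which is one tenth of the baseline requirement.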
Embodiments of methods described above may be implemented by a processor-based system including a computer readable medium encoded with instructions that, when executed, cause the processor-based system to perform all or portions of the methods described above. An embodiment of a processor-based system 500 according to the present invention is shown in
An embodiment of a system 600 according to the present invention is shown in
The disk drive storage may be prepared in accordance with any storage method known in the art, including one or more Redundant Array of Independent Disks (RAID) levels, such as RAID 5 or 6 in some embodiments. The storage subsystem 610 may itself serve as non-volatile storage for storing versions of the free block indicator and logical block associations described above, in some embodiments. In other embodiments a separate back-up storage is provided to store these indicators and associations for use in the event of a system interruption. The storage subsystem 610 contains storage having a number of disk block addresses, described above. Write and read requests designed for the storage subsystem 610, however, specify logical block addresses, as described above.
A storage subsystem controller 620 is coupled to the storage subsystem 610 through any available communication medium, wired or wireless, and passes write and/or read requests to the storage subsystem 610 in accordance with embodiments of methods according to the present invention. For example, the storage subsystem controller 620 is operable to convert non-sequential write requests into sequential write operations within the storage subsystem 610 in some embodiments. The storage subsystem controller 620 may include the processor 510 and computer readable medium 520 shown in
The application 650 receives data from the cameras 630 and 635 in some embodiments and develops appropriate write requests. In some embodiments the application 650 may manipulate the data received from cameras 630 and 635, or perform data compression or other functionality. The data and/or write requests are passed to the file system 640 in some embodiments, and the file system, such as Microsoft NTFS in one embodiment, determines a logical block address for each write request. The logical block address and write request are passed to the storage subsystem controller 620, which can determine an appropriate disk block address of the storage subsystem 610 in accordance with embodiments of the invention.
Accordingly, embodiments of the present invention may increase performance of a data storage system or program. Some embodiments of the present invention may be used to improve performance of a video surveillance system or application, including but not limited to, Intransa's StorStac for video surveillance application. Video surveillance applications typically require many writes, but comparatively fewer reads, and therefore benefit from the described embodiments, which speed the storage of write requests. Other applications have similar properties, such as data archiving applications, and those may be similarly improved using embodiments of the present invention. Some embodiments of the present invention slow down read requests, however, because sequential read operations are turned into random reads. In some embodiments, read requests may be slowed down by as much as five times. However, this slowdown does not significantly hamper some embodiments of applications having relatively few read operations, such as video surveillance, for example.
Measurements taken using embodiments of the present invention to convert random writes to sequential writes indicate improved performance can be achieved relative to a system performing random writes. In one measured embodiment, performance of one configuration of a RAID5 subsystem improved from a random write throughput of 17 MB/s to a sequential write throughput of 60 MB/s. Another configuration of a RAID5 subsystem exhibited improvement from a random write throughput of 7 MB/s to a sequential write throughput of 70 MB/s. These measured performance results are intended to demonstrate advantages of some embodiments of the present invention. Other factors including the number of read requests, system loading, and the like, may slow the system or decrease the advantages of the present invention. The measured data is representative only of one set of measurements under otherwise fixed conditions and is meant to highlight some conditions under which improvement can be achieved using embodiments of the present invention.
Writing to sequential disk block addresses usually achieves better performance, in that the write requests can be completed faster. One reason for this is that writing to disparate locations requires more time for the disk to seek and spin to that location. Other reasons for improved performance may be associated with the particular disk formatting used. For example, in RAID5, a sequential write avoids two fill-in reads necessary for parity computation; in RAID6, a sequential write avoids three fill-in reads necessary for parity computation.
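The RAID5 case can be illustrated with XOR parity. The sketch below is a generic illustration of single-parity arithmetic rather than the behavior of any particular RAID implementation; the function names and the two-byte blocks are hypothetical. A full-stripe (sequential) write computes parity from the new data alone, whereas an isolated small write must first read the old data block and the old parity block.

```python
from functools import reduce
from typing import List


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(data_blocks: List[bytes]) -> bytes:
    """Parity for a full-stripe write: no reads from disk are needed."""
    return reduce(xor_blocks, data_blocks)


def small_write_parity(old_data: bytes, old_parity: bytes,
                       new_data: bytes) -> bytes:
    """Parity for a single-block update: needs the two fill-in reads
    (old data and old parity) before the new parity can be computed."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


stripe = [bytes([1, 2]), bytes([4, 8]), bytes([16, 32])]
parity = full_stripe_parity(stripe)

# Update only the middle block, as a random small write would.
new_block = bytes([7, 7])
updated_parity = small_write_parity(stripe[1], parity, new_block)

# The same result is obtained with no reads if the whole stripe is rewritten.
recomputed = full_stripe_parity([stripe[0], new_block, stripe[2]])
```

Both paths arrive at the same parity, but only the small-write path depends on data that must first be read back from the disks.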
From the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. Embodiments of the present invention described herein may be implemented in hardware, software, or combinations thereof.
This application is a continuation of U.S. application Ser. No. 14/924,287 filed Oct. 27, 2015, titled METHOD FOR ACHIEVING SEQUENTIAL I/O PERFORMANCE FROM A RANDOM WORKLOAD, issued as U.S. Pat. No. 9,588,687 on Mar. 7, 2017, which is a continuation of U.S. application Ser. No. 14/480,943 filed Sep. 9, 2014, titled METHOD FOR ACHIEVING SEQUENTIAL I/O PERFORMANCE FROM A RANDOM WORKLOAD, issued as U.S. Pat. No. 9,170,930 on Oct. 17, 2015, which is a continuation of U.S. application Ser. No. 13/906,699 filed May 31, 2013, titled METHOD FOR ACHIEVING SEQUENTIAL I/O PERFORMANCE FROM A RANDOM WORKLOAD, issued as U.S. Pat. No. 8,832,405 on Sep. 9, 2014, which is a continuation of U.S. application Ser. No. 12/057,120 filed Mar. 27, 2008, titled METHOD FOR ACHIEVING SEQUENTIAL I/O PERFORMANCE FROM A RANDOM WORKLOAD, issued as U.S. Pat. No. 8,473,707 on Jun. 25, 2013, all of which are incorporated herein in their entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 14924287 | Oct 2015 | US |
Child | 15443763 | | US |
Parent | 14480943 | Sep 2014 | US |
Child | 14924287 | | US |
Parent | 13906699 | May 2013 | US |
Child | 14480943 | | US |
Parent | 12057120 | Mar 2008 | US |
Child | 13906699 | | US |