The present invention relates to computer storage systems and to methods and apparatus for placement of multiple instances of data on disk storage to simultaneously benefit I/O profiles that are different, such as read and write access.
A significant job of a file system is to place data on a storage medium, such as a disk storage device. Where the data is written (placed on the disk), and when and how it is accessed, can have a significant effect on performance. For example, random reads of 4 KB blocks on a disk may result in a bandwidth of 512 KB/sec, whereas reading large 128 KB blocks sequentially can increase the bandwidth to 5 MB/sec, a factor of 10 greater.
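The arithmetic behind these example figures can be checked with a short calculation. The following is a minimal sketch that assumes binary units (1 KB = 1024 bytes); the helper name `ops_per_second` is illustrative only and is not part of the original disclosure.

```python
# Worked arithmetic for the example bandwidth figures quoted above.
# Assumption: binary units (1 KB = 1024 bytes, 1 MB = 1024 KB).
KB = 1024
MB = 1024 * KB


def ops_per_second(bandwidth_bytes_per_sec: int, block_bytes: int) -> float:
    """Number of I/O operations per second implied by a bandwidth and block size."""
    return bandwidth_bytes_per_sec / block_bytes


random_bandwidth = 512 * KB      # random reads of 4 KB blocks
sequential_bandwidth = 5 * MB    # sequential reads of 128 KB blocks

random_iops = ops_per_second(random_bandwidth, 4 * KB)
sequential_iops = ops_per_second(sequential_bandwidth, 128 * KB)
speedup = sequential_bandwidth / random_bandwidth

print(random_iops, sequential_iops, speedup)
```

Under these assumptions the sequential case issues fewer, larger operations per second yet moves ten times as many bytes, which is the factor of 10 stated above.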
This suggests that if one were to optimize for write access, a file system would write all new data in sequential order, treating the disk as a large queue. Subsequent read access, however, might suffer from a large amount of random I/O.
Conversely, a read-friendly data placement might place data that is logically adjacent in a file physically adjacent on a disk, regardless of the order in which the data is written, so that subsequent reads of the data will be sequential; many defragmenters do this, as do some file systems. This comes at the expense of write performance, as a write to a file touches other internal file system data structures (such as super blocks, inodes, and indirection tables), with the result that a read-optimized data placement tends to randomize write access.
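The trade-off between the two placements can be illustrated with a toy model. The following sketch is an illustration under stated assumptions, not a description of any actual file system: a "seek" is counted whenever the next physical address is not the immediate successor of the previous one, and the names `count_seeks`, `log_layout`, and `read_layout` are hypothetical.

```python
def count_seeks(physical_addresses):
    """Count accesses whose address does not immediately follow the previous one."""
    seeks = 0
    for prev, cur in zip(physical_addresses, physical_addresses[1:]):
        if cur != prev + 1:
            seeks += 1
    return seeks


# Arrival order of writes as (file, logical block); two files are interleaved.
writes = [("A", 0), ("B", 0), ("A", 1), ("B", 1), ("A", 2), ("B", 2)]
# Later, each file is read back sequentially: all of A, then all of B.
reads = sorted(writes)

# Write-optimized: the disk is treated as a queue, so address = arrival order.
log_layout = {block: addr for addr, block in enumerate(writes)}
# Read-optimized: logically adjacent blocks are made physically adjacent.
read_layout = {block: addr for addr, block in enumerate(reads)}


def trace(layout, order):
    """Physical addresses visited when blocks are accessed in the given order."""
    return [layout[block] for block in order]


log_write_seeks = count_seeks(trace(log_layout, writes))
log_read_seeks = count_seeks(trace(log_layout, reads))
read_write_seeks = count_seeks(trace(read_layout, writes))
read_read_seeks = count_seeks(trace(read_layout, reads))

print(log_write_seeks, log_read_seeks, read_write_seeks, read_read_seeks)
```

In this toy trace each layout is seek-free for the access pattern it favors and incurs a seek on nearly every access for the other pattern.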
File systems are most often forced to choose, at design time, the I/O profile of greatest interest, which then determines a fixed data location strategy. Some file systems select for read performance, others for writes, and others still for small file (high density) vs. large file (low density) access.
The allocation constraints of traditional prior art file systems are due in large part to their ability to place only one instance of data on the disk storage. Existing file systems do not have the capability to place and manage multiple copies of data on disks, in different orders, for the benefit of different I/O profiles and use cases.
Apparatus and methods are provided in accordance with the present invention wherein multiple instances of data can be placed at different locations on disk storage, and in different data orders (sequences), for the benefit of multiple input/output and/or other use profiles, and/or to provide data security. The data placement is performed in conjunction with an index for mapping a unique data identifier to multiple locations, wherein the data identifier does not change based on the locations.
In accordance with one embodiment, a storage system is provided comprising an interface component for locating data for storage on disk storage, wherein the interface component references each data by a globally unique identifier (GUID) and the GUID does not change based on where the data are stored on the disk storage; and a mapping index that allows, for a single GUID, multiple pointers to different locations on the disk storage for multiple instances of data.
In one embodiment, the interface comprises a file-based storage system.
In one embodiment, the interface comprises a block storage manager.
In one embodiment, a location strategizer is provided for determining multiple locations on the disk storage for multiple instances of the data for different purposes.
In one embodiment, the multiple purposes include read optimization and write optimization.
In one embodiment, the multiple purposes include read optimization, write optimization, and data security.
In one embodiment, the location strategizer dynamically determines the multiple locations as the data is referenced by the interface component.
In one embodiment, the mapping index is implemented by programmable logic.
In one embodiment, the mapping index is implemented by executable computer program instructions.
In another embodiment of the invention a storage system is provided comprising programmable logic configured to implement a method of locating data on disk storage, or a computer medium containing executable program instructions for executing the method, wherein the method comprises:
In one embodiment, the locating step comprises determining the physical locations for storing instances of the data based on one or more of read access, write access, and data security.
In one embodiment, the location step is performed dynamically during storage system activity.
In one embodiment, the method includes locating multiple instances for read optimization and write optimization.
In one embodiment, the method includes locating multiple instances on different disks for read optimization and write optimization.
In one embodiment, the method includes locating multiple instances on different disks for read optimization, write optimization and data security.
In one embodiment, the index includes a data structure for each data containing pointers to physical block addresses where the data instances are stored on the disk storage.
In one embodiment, the GUID comprises a hash of the data content, preferably a cryptographic hash or collision resistant hash of the data content.
In one embodiment, the index maps to physical locations on a plurality of disks.
In one embodiment, the disks are of different sizes and/or access speeds.
In one embodiment, the data comprises data and/or metadata, and collections of data have their own GUID derived from the contents of the collection such that a change to one or more data of the collection changes the collection GUID.
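One way a collection GUID could be derived from the contents of the collection is sketched below. This is a minimal illustration that assumes SHA-256 as the hash and assumes the collection GUID is a hash over the ordered member GUIDs; neither choice is specified in this description, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable


def data_guid(content: bytes) -> bytes:
    """Content-derived name for one data element (SHA-256 is an assumption)."""
    return hashlib.sha256(content).digest()


def collection_guid(member_guids: Iterable[bytes]) -> bytes:
    """Name for a collection, derived from the ordered GUIDs of its members.

    Member GUIDs have a fixed length, so concatenating them is unambiguous.
    """
    hasher = hashlib.sha256()
    for guid in member_guids:
        hasher.update(guid)
    return hasher.digest()


blocks = [b"alpha", b"beta", b"gamma"]
original = collection_guid(data_guid(b) for b in blocks)

# Changing a single data element changes that element's GUID,
# and therefore the GUID of the collection that contains it.
blocks[1] = b"BETA"
modified = collection_guid(data_guid(b) for b in blocks)

print(original.hex())
print(modified.hex())
```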
In one embodiment, the disk storage includes multiple disks and the method includes locating multiple instances for read optimization and write optimization on one or more of the disks.
In one embodiment, the disks are of different sizes and/or access speeds.
In another embodiment of the invention, a computing environment is provided for locating data of a storage system to disk storage, a data structure comprising, for each data of the storage system:
The invention will be more fully understood by reference to the following detailed description of various embodiments, along with the following drawings wherein:
According to one embodiment of the invention, a data placement method and apparatus are provided for use with a storage system that stores data on disk storage. The storage system may comprise for example a file system, a block storage device, or other storage system for storing data. Data written to such storage systems typically comprises many small (e.g., 4 KB) pieces of data, herein referred to interchangeably as data or data elements, which data may be of the same or variable sizes.
As used herein, data means an opaque collection of data, e.g., any sequence of symbols (typically denoted “0” and “1”) that can be input to a computer, stored and processed there, or transmitted to another computer. As used herein, data includes metadata, a description of other data.
The storage system uses an identifier or name to reference each data element in storage. The name is a GUID (globally unique ID), such as a hash of the data content, preferably a cryptographic hash or collision resistant hash of the data content; other naming conventions are possible, as long as each data element has a unique name within the storage system. In an alternative embodiment, a central GUID server generates the names from some distinguishing aspect of the data. Data names are usually fixed length binary strings intended for use by programs, as opposed to humans. In the present example, the data name is a content-derived (derived from the data content) and globally-unique identifier referred to as a GUID.
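A content-derived name of this kind can be sketched as follows. The example assumes SHA-256 as the collision resistant hash, which is only one possible choice; the function name `make_guid` is hypothetical.

```python
import hashlib


def make_guid(content: bytes) -> str:
    """Return a fixed-length, content-derived name for a data element.

    Assumption: SHA-256 stands in for the cryptographic hash; the name depends
    only on the content, never on where the data is stored.
    """
    return hashlib.sha256(content).hexdigest()


guid_a = make_guid(b"x" * 4096)   # a 4 KB data element
guid_b = make_guid(b"x" * 4096)   # identical content, possibly stored elsewhere
guid_c = make_guid(b"y" * 4096)   # different content

print(guid_a == guid_b, guid_a == guid_c, len(guid_a))
```

Because the name is computed from the content alone, two instances of the same data element share one name regardless of their physical locations.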
An index (sometimes referred to as a dictionary or catalog) of all the data is needed by the file system in order to access (locate) each data element. Each record in the index may contain the data name, location and other information. According to the present invention, an index entry for a given GUID can map a single GUID to a plurality of physical locations on disk for storing multiple instances of the one data element. However, as described further below, the GUID does not change based on the locations where the data instances are stored.
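An index record of this kind might be organized as in the following sketch. It is an illustration only: the class names `Location`, `IndexEntry`, and `MappingIndex`, the `purpose` field, and the in-memory dictionary are assumptions made for the example and are not taken from this description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Location:
    disk_id: int          # which disk holds this instance
    block_address: int    # physical block address on that disk
    purpose: str          # hypothetical tag, e.g. "read" or "write"


@dataclass
class IndexEntry:
    guid: str
    locations: List[Location] = field(default_factory=list)


class MappingIndex:
    """Maps one GUID to pointers for every stored instance of that data."""

    def __init__(self) -> None:
        self._entries: Dict[str, IndexEntry] = {}

    def add_instance(self, guid: str, location: Location) -> None:
        entry = self._entries.setdefault(guid, IndexEntry(guid))
        entry.locations.append(location)

    def locate(self, guid: str, purpose: Optional[str] = None) -> List[Location]:
        entry = self._entries[guid]
        return [loc for loc in entry.locations
                if purpose is None or loc.purpose == purpose]

    def relocate(self, guid: str, old: Location, new: Location) -> None:
        """Move one instance; the GUID used to reference the data is unchanged."""
        locations = self._entries[guid].locations
        locations[locations.index(old)] = new


index = MappingIndex()
guid = "guid-0001"  # placeholder name; in practice a content-derived identifier
index.add_instance(guid, Location(disk_id=0, block_address=1000, purpose="write"))
index.add_instance(guid, Location(disk_id=1, block_address=52, purpose="read"))
index.relocate(guid, Location(1, 52, "read"), Location(1, 77, "read"))

print(index.locate(guid))
```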
After a discussion of the problems inherent in prior art systems, a detailed description of various embodiments of the invention is set forth in conjunction with
The practical effect of a file system designed to store one instance of data is illustrated in
In contrast, in a read-optimized disk storage system (
Thus, the prior art file system optimizes for either writes or reads, resulting in different placement of the data on the disk. When placement is optimized for writes, a read will likely require movement of the disk head. Similarly, when placement is optimized for reads, a write will likely require movement of the disk head.
The mapping index 44 allows, for a single data name (GUID), multiple pointers to multiple instances of the data on the disk storage. A location strategizer 46 determines these multiple locations for different purposes and the physical locations can be dynamically changed over time for different purposes. Because the storage system knows the logical relationships (the “what”) of the objects, the storage system can suggest desired location strategies to the strategizer for determining these multiple locations. Still further, this strategy can be determined dynamically during storage system activity to change the strategy as the anticipated use of the data, condition of the disks, or other system parameters change over time. Thus, location is no longer fixed or limited in time as in the prior art methods of allocation.
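A simple location strategy of the kind described above can be sketched as follows. This is a hypothetical illustration: the class name `LocationStrategizer`, the two disk regions, and the use of a logical offset as the placement hint are assumptions of the example and do not describe the actual strategizer 46.

```python
from typing import Dict


class LocationStrategizer:
    """Hypothetical strategy: one instance in arrival (log) order for writes,
    and one instance in logical order for reads."""

    def __init__(self, log_start: int, read_start: int) -> None:
        self._next_log_address = log_start
        self._read_start = read_start

    def place(self, logical_offset: int) -> Dict[str, int]:
        # Write-optimized instance: appended at the next free log address.
        write_address = self._next_log_address
        self._next_log_address += 1
        # Read-optimized instance: positioned by logical offset, a hint the
        # storage system can supply because it knows the logical relationships.
        read_address = self._read_start + logical_offset
        return {"write": write_address, "read": read_address}


index: Dict[str, Dict[str, int]] = {}
strategizer = LocationStrategizer(log_start=0, read_start=1000)

# Data elements arrive out of logical order: (GUID, logical offset).
arrivals = [("guid-c", 2), ("guid-a", 0), ("guid-b", 1)]
for guid, offset in arrivals:
    index[guid] = strategizer.place(offset)

write_addresses = [index[guid]["write"] for guid, _ in arrivals]
read_addresses = [index[guid]["read"] for guid in ("guid-a", "guid-b", "guid-c")]

# The location can change later without changing the GUID:
index["guid-a"]["read"] = 2000

print(write_addresses, read_addresses, index["guid-a"])
```

In this sketch the write instances are contiguous in arrival order while the read instances are contiguous in logical order, and a later relocation touches only the pointer held in the index.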
In
The prior illustration is just one example of the invention for simultaneously optimizing both read and write access by locating multiple instances of data in different orders at multiple physical locations on the disk storage. These and other examples of a location strategy for disk storage will be apparent to those of ordinary skill in the art.
Various embodiments of the present invention can provide one or more benefits over the prior art methods, such as optimizing both read and write performance. For example, in the prior art, disk defragmentation may be utilized for read or write optimization. However, defragmentation is a time-consuming process, and one needs to know beforehand how the data will be used (e.g., read). In contrast, the present invention allows for a dynamic determination of location, which permits storing multiple instances of data at different disk locations and within different data sequences at each location.
Furthermore, most prior art storage systems do not allow for the maintenance of two instances of the content of a given file. The reason is that doing so would complicate the storage system data structures and require excessive (costly) storage capacity. In contrast, the present invention provides a simplified method allowing multiple instances of data to be stored without affecting the storage system data structure.
The prior art RAID systems operate by striping data across multiple disks so that if one disk crashes, another instance of the data will survive. Data locations in a RAID system are based on a fixed algorithm rather than the dynamically variable location made possible by pointers (as in the present invention). As a result, RAID generally requires all of the multiple disks to be of the same type, e.g., size and speed. This would be an undesirable limitation on a user who needs or wants to utilize, or already has, disk storage of different types.
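Pointer-based placement across dissimilar disks can be sketched as follows. The selection policy shown (the fastest disk for one instance, the disk with the most free space for a second instance) is an assumption chosen for illustration, as are the names `Disk` and `place_two_instances`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Disk:
    name: str
    capacity_blocks: int
    relative_speed: float   # higher is faster; the units are illustrative
    used_blocks: int = 0

    def has_room(self) -> bool:
        return self.used_blocks < self.capacity_blocks

    def allocate(self) -> int:
        """Reserve the next free block and return its address."""
        address = self.used_blocks
        self.used_blocks += 1
        return address


def place_two_instances(disks: List[Disk]) -> List[Tuple[str, int]]:
    """Return pointers (disk name, block address) for two instances on
    two different disks, which need not be of the same size or speed."""
    candidates = [d for d in disks if d.has_room()]
    if len(candidates) < 2:
        raise RuntimeError("need at least two disks with free space")
    fastest = max(candidates, key=lambda d: d.relative_speed)
    others = [d for d in candidates if d is not fastest]
    roomiest = max(others, key=lambda d: d.capacity_blocks - d.used_blocks)
    return [(fastest.name, fastest.allocate()),
            (roomiest.name, roomiest.allocate())]


disks = [
    Disk("small-fast", capacity_blocks=100, relative_speed=4.0),
    Disk("large-slow", capacity_blocks=10_000, relative_speed=1.0),
    Disk("medium", capacity_blocks=1_000, relative_speed=2.0),
]
pointers = place_two_instances(disks)
print(pointers)
```

Because each instance is found through a stored pointer rather than computed by a fixed layout formula, nothing in this sketch requires the disks to match.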
The subject matter of the present invention may be implemented as computer executable instructions (software). It may also be implemented in hardware as a series of logic blocks.
One or more components of the present invention may be implemented by a computing apparatus such as that illustrated in
The flowchart and block diagrams contained herein illustrate various examples of an architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flow chart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by a general computer or by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
As used herein, computer-readable media can be any media that can be accessed by a computer and includes both volatile and non-volatile media, removable and non-removable media.
As used herein, disk storage can be used for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Disk storage includes any magnetic, solid state or optical disk storage which can be used to store the desired information and which can be accessed by a computer.
As used herein, the term “includes” and its variants are to be read as open-ended terms that mean “includes, but is not limited to.” The term “or” is to be read as “and/or” unless the context clearly dictates otherwise.
It is to be understood that the foregoing description is intended to illustrate and not limit the scope of the invention.
Number | Date | Country |
---|---|---|
20130024615 A1 | Jan 2013 | US |