There are a wide variety of different types of computer storage devices. As used herein, a “storage device” is a device used to store data in a computing system, and can include devices such as magnetic storage devices (e.g., hard disk drives), solid state storage devices (e.g., solid state drives, block addressable flash memory on a system bus, and word addressable solid state memory such as phase change memory), smart cards, floppy drives, optical drives, etc. A storage device may be a logical drive, which can be spread across multiple physical drives and/or take up only a portion of a physical drive.
Currently, storage devices are mostly considered homogeneous by host computer components, including file systems. Subject to a few exceptions (e.g., in some situations where two storage devices appear as one), storage areas are considered equal by host computer components outside the storage devices themselves. Some storage devices are actually an amalgamation of multiple types of storage, such as a hard disk with an embedded solid state cache. In such a hybrid device, the solid state cache can make up a very small portion of the total space, and a file system can determine which files to place in the cache.
Whatever the advantages of previous data storage tools and techniques, they have neither recognized the tools and techniques for formatting data storage according to data classification described and claimed herein, nor the advantages produced by such tools and techniques.
In one embodiment, the tools and techniques can include classifying a set of data into a data level of multiple possible data levels. As used herein, a data level is a class into which data sets can be classified based on one or more criteria that indicate an expected importance and/or frequency of use of the data in the sets. Thus, different data levels can represent different levels of expected importance and/or expected frequency of use of data. Examples of different criteria are discussed below, but the term data level is not limited to any particular criteria or combination of criteria. Additionally, an indicator of the data level for the set of data can be transmitted to a storage device. Data level indicator is used herein broadly to refer to information that indicates a data level. Indicators can take numerous different forms. For example, an indicator could be a number corresponding to a pre-defined data level, a number representing the historical frequency with which a set of data has been accessed, a single bit representing a high or low data level, a number representing an importance level assigned by a user, etc. In response to receiving the indicator, a storage area in the device can be formatted to store data at a storage quality level. As used herein, a storage quality level represents quality of storage in terms of speed, reliability, and/or resiliency. Thus, one quality level of storage differs from another level by having a different expected speed (such as, for example, by having a different historical average access speed for that type of storage), a different expected reliability (such as, for example, by having a different historical average failure rate for that type of storage), and/or a different expected resiliency. Formatting, as used herein, refers to preparing a storage area to store data at one of multiple different quality levels.
For example, a solid state storage area may be programmed to store a set of data in a region formatted in a specified cell level configuration, such as a single level cell (SLC) configuration or any of multiple multi level cell (MLC) configurations (2 bit per cell MLC, 3 bit per cell MLC, etc.). The formatting may include reformatting from one cell level configuration to another. The set of data can be stored in the storage area at the storage quality level.
In another embodiment of the tools and techniques, a set of data can be classified into a data level of multiple possible data levels, such as by a processor outside a storage device. If the data level is one level, then an instruction to format a first storage area on the storage device according to a first storage quality level, and to store the set of data at the first quality level in the first storage area, can be sent to the device. If the data level is another level, then an instruction to format a second storage area on the storage device according to a second quality level that is different from the first quality level, and to store the set of data at the second quality level in the second storage area, can be sent to the device.
In yet another embodiment of the tools and techniques, a set of data can be classified into a data level of multiple possible data levels at a computing component outside a solid state storage device. If the data level is one level, the storage device can be programmed by the computing component to store the set of data at a first cell level configuration, such as by the computing component sending one or more instructions to the storage device. If the data level is another level, the storage device can be programmed by the computing component to store the set of data at another cell level configuration.
This Summary is provided to introduce a selection of concepts in a simplified form. The concepts are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Similarly, the invention is not limited to implementations that address the particular techniques, tools, environments, disadvantages, or advantages discussed in the Background, the Detailed Description, or the attached drawings.
Embodiments described herein are directed to techniques and tools for improved formatting of data storage according to data classification. Such improvements may result from the use of various techniques and tools separately or in combination.
Such techniques and tools may include classifying data at different data levels that correspond to different storage quality levels, and then storing the data in different ways for the different quality levels. The classification can be done outside the storage device, such as by a computing component outside the device (e.g., an operating system or storage stack component) being executed by one or more processors outside the device. Data can be classified into a data level and a device can be programmed to store the data at a storage quality level corresponding to the data level. For example, an indicator of the data level can be transmitted to a storage device. In response, the storage device can format a storage area in the device to store data at a storage quality level corresponding to the data level, and the set of data can be stored in the area. For example, the storage quality level could be a cell level configuration (SLC, two bit per cell MLC, etc.).
Accordingly, one or more substantial benefits can be realized from the data classification and storage tools and techniques described herein. For example, data may be stored in ways that are appropriate for that class of data. For example, frequently accessed data may be stored at a storage quality level with fast retrieval times, important data may be stored at a storage quality level that has a high reliability, etc. Additionally, data that is not frequently accessed and/or unimportant may be stored at a storage quality level that is more efficient in terms of storage space but is slower or less reliable, thereby freeing up faster and/or more reliable storage for other data.
The subject matter defined in the appended claims is not necessarily limited to the benefits described herein. A particular implementation of the invention may provide all, some, or none of the benefits described herein. Although operations for the various techniques are described herein in a particular, sequential order for the sake of presentation, it should be understood that this manner of description encompasses rearrangements in the order of operations, unless a particular ordering is required. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Techniques described herein with reference to flowcharts may be used with one or more of the systems described herein and/or with one or more other systems. Moreover, for the sake of simplicity, flowcharts may not show the various ways in which particular techniques can be used in conjunction with other techniques.
The computing environment (100) is not intended to suggest any limitation as to scope of use or functionality of the invention, as the present invention may be implemented in diverse general-purpose or special-purpose computing environments.
With reference to
Although the various blocks of
A computing environment (100) may have additional features. In
The storage (140) may be removable or non-removable, and may include computer-readable storage media such as magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment (100). The storage (140) stores instructions for the software (180). The storage (140) can include one or more storage devices, which can each include at least one storage device processing unit and storage device memory. The processing unit can execute computer-executable instructions (e.g., firmware stored in the storage device memory), typically to perform storage-related operations.
The input device(s) (150) may be a touch input device such as a keyboard, mouse, pen, or trackball; a voice input device; a scanning device; a network adapter; a CD/DVD reader; or another device that provides input to the computing environment (100). The output device(s) (160) may be a display, printer, speaker, CD/DVD-writer, network adapter, or another device that provides output from the computing environment (100).
The communication connection(s) (170) enable communication over a communication medium to another computing entity. Thus, the computing environment (100) may operate in a networked environment using logical connections to one or more remote computing devices, such as a personal computer, a server, a router, a network PC, a peer device or another common network node. The communication medium conveys information such as data or computer-executable instructions or requests in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
The tools and techniques can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment (100), computer-readable media include memory (120), storage (140), and combinations of the above.
The tools and techniques can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment. In a distributed computing environment, program modules may be located in both local and remote computer storage media.
For the sake of presentation, the detailed description uses terms like “determine,” “choose,” “adjust,” and “operate” to describe computer operations in a computing environment. These and other similar terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being, unless performance of an act by a human being (such as a “user”) is explicitly noted. The actual computer operations corresponding to these terms vary depending on the implementation.
The storage stack (230) can include a classification component (240). For data that is being passed down the storage stack (230) to be stored, the classification component (240) can be executed to classify the data into appropriate data levels that correspond to data storage quality levels. The classification component (240) can also program storage devices to store data at the corresponding storage quality levels, such as by passing information about the classification of the data farther down the storage stack. The classification component may also route data to appropriate storage devices. The classification component (240) may be located just above one or more device drivers (250 and 251) in the stack. Alternatively, the classification component (240) may be located in one or more other locations outside the storage devices in the system, such as farther up the storage stack, elsewhere inside or outside the operating system (220), or even as a separate device between the main host (205) and one or more storage devices.
In one example illustrated in
An example of operation of the environment (200) will now be described. The classification component (240) can classify sets of data into three data levels corresponding to the three storage regions (256, 268, and 270). For example, each set of data that is classified could be a file or even a portion of a file (e.g., a logical block or a set of logical blocks). Various different classification schemes can be used to determine appropriate data levels (and therefore appropriate data storage quality levels) for sets of data. For example, the classification component (240) can track the frequency with which sets of data are accessed. Data that is expected to be accessed more frequently could be classified into levels that correspond to storage quality levels with faster access times. As an example, the frequency of access could be tracked with counters in file metadata, with tables, or in some other manner. The past access frequency could be used to indicate the frequency with which sets of data are expected to be accessed in the future.
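By way of illustration only, the frequency-based classification described above might be sketched as follows. The class name, the threshold values, and the convention that a higher level number denotes more frequently accessed data are all assumptions made for this sketch and are not taken from the description.

```python
from collections import Counter


class AccessFrequencyClassifier:
    """Hypothetical sketch: tracks access counts in a table and maps them to data levels."""

    def __init__(self, thresholds=(10, 100)):
        # Assumed thresholds: each threshold reached raises the data level by one,
        # so two thresholds yield three data levels (1, 2, and 3).
        self.thresholds = sorted(thresholds)
        self.access_counts = Counter()

    def record_access(self, data_set_id):
        # A data set could be a file or a set of logical blocks.
        self.access_counts[data_set_id] += 1

    def classify(self, data_set_id):
        # Past access frequency stands in for expected future access frequency.
        count = self.access_counts[data_set_id]
        level = 1
        for threshold in self.thresholds:
            if count >= threshold:
                level += 1
        return level


classifier = AccessFrequencyClassifier()
for _ in range(12):
    classifier.record_access("report.docx")
print(classifier.classify("report.docx"))  # level for a data set accessed 12 times
print(classifier.classify("never-read.bin"))  # level for a data set never accessed
```

In this sketch the counters are held in an in-memory table; counters kept in file metadata would serve the same purpose.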
Additionally, the classification scheme could consider other factors, such as whether the file was authored by a particular user or on a particular machine, and/or whether the file was a particular file type. Authorship may be determined by examining metadata for a file. For example, some file types may be more likely to be important and/or more likely to be frequently accessed. Accordingly, some file types may always be at a higher data level (corresponding to a higher data storage quality level), some file types may always be at a lower data level (corresponding to a lower data storage quality level), and other file types may be assigned to a data level based on the frequency with which they have been used.
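As one possible sketch of such a scheme, file types can be checked against fixed rules first, with access frequency used only for the remaining types. The extension sets, the level numbers, and the threshold below are hypothetical values chosen for illustration.

```python
import os

# Hypothetical rule sets; a real system would choose its own file types.
ALWAYS_HIGH_TYPES = {".db", ".pst"}
ALWAYS_LOW_TYPES = {".tmp", ".log"}


def classify_by_file_type(path, access_count, high_level=3, low_level=1,
                          frequent_threshold=50):
    """Return a data level for a file based on its type, then on its usage."""
    extension = os.path.splitext(path)[1].lower()
    if extension in ALWAYS_HIGH_TYPES:
        return high_level
    if extension in ALWAYS_LOW_TYPES:
        return low_level
    # Other file types are assigned a level based on how often they have been used.
    return high_level if access_count >= frequent_threshold else low_level


print(classify_by_file_type("mail/inbox.pst", access_count=0))
print(classify_by_file_type("notes.txt", access_count=5))
```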
As another example, user input, an application (210), and/or an operating system (220) could identify data levels to which particular files or portions of files are classified. However, it may be useful to limit the effect of such classifications by particular applications because such applications may be programmed to promote their own performance at the expense of other applications. For example, an application may be allowed to identify the relative importance or expected frequency of use for different sets of data that are used by that application. As one specific example, an email program may request that emails received in the past two days be assigned to higher data levels than older emails because the newer emails are more likely to be important and more likely to be accessed frequently. Such identification by an application may be particularly useful for applications that handle large amounts of data, such as email routing servers and database servers.
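The description does not specify how the effect of an application's classification would be limited. One possible approach, offered purely as an assumption, is to let an application move a data set only a bounded number of levels away from the level the classification component would otherwise assign. The function name and the `max_shift` parameter are hypothetical.

```python
def apply_application_hint(base_level, hinted_level, max_shift=1,
                           min_level=1, max_level=3):
    """Clamp an application's requested level to within max_shift of the base level."""
    lowest_allowed = max(min_level, base_level - max_shift)
    highest_allowed = min(max_level, base_level + max_shift)
    return max(lowest_allowed, min(hinted_level, highest_allowed))


# An application asks for the highest level for data otherwise classified at level 1;
# under the assumed limit the data set is raised by only one level.
print(apply_application_hint(base_level=1, hinted_level=3))
```

Under this sketch an application can still express the relative importance of its own data sets, but cannot claim the highest storage quality level for all of them.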
As yet another example of a factor that can be used in classification, data that has been backed up could be more likely to be classified to a higher data level because backing up sets of data could indicate that those data sets are important.
Referring still to
If the classification component (240) classifies a data set to second or third data levels corresponding to the second storage region (268) and the third storage region (270), respectively, then the classification component (240) can program the second storage device (262) to store each set of classified data (264) in the appropriate storage region (268 or 270). For example, the classification component (240) can pass data (264) to the second device driver (251) along with a level indicator (266), thereby instructing the second device driver (251) to have the data stored in the data storage region (268 or 270) corresponding to the indicated level in the second storage device (262). The second device driver (251) can pass the data (264) and the level indicator (266) on to the second storage device (262).
The level indicator (266) may indicate a level and indicate a data set to which the level applies. For example, the level indicator (266) may identify a level (such as by including a level number), and may also identify a group of one or more data blocks, such as by listing one or more logical block addresses (LBAs). For contiguous logical block addresses, the level indicator (266) may indicate a logical block address range, or ranges, to which it applies.
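A level indicator of this kind might be represented, as a minimal sketch, by a level number together with inclusive logical block address ranges. The `LevelIndicator` type and the helper functions below are hypothetical and do not reflect any particular device interface.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


def coalesce_lbas(lbas: Iterable[int]) -> List[Tuple[int, int]]:
    """Merge logical block addresses into inclusive (first, last) ranges."""
    ranges: List[Tuple[int, int]] = []
    for lba in sorted(set(lbas)):
        if ranges and lba == ranges[-1][1] + 1:
            # Extend the current range when the address is contiguous with it.
            ranges[-1] = (ranges[-1][0], lba)
        else:
            ranges.append((lba, lba))
    return ranges


@dataclass(frozen=True)
class LevelIndicator:
    level: int                              # data level number
    lba_ranges: Tuple[Tuple[int, int], ...]  # blocks to which the level applies


def make_level_indicator(level: int, lbas: Iterable[int]) -> LevelIndicator:
    return LevelIndicator(level=level, lba_ranges=tuple(coalesce_lbas(lbas)))


indicator = make_level_indicator(3, [5, 6, 7, 10, 12, 13])
print(indicator)
```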
Upon receiving a level indicator (266), the second storage device (262) can store the indicated data set in the indicated storage region (268 or 270). The storage regions (268 and 270) may or may not be located in separate contiguous physical areas of the second storage device (262). Indeed, the regions (268 and 270) may be interspersed in a way that can be determined by the second storage device (262). Storing the data set can include formatting an area of the appropriate storage region (268 or 270) for storage of the indicated data set. This may include re-formatting an area of one region (268 or 270) so that it becomes part of another region (268 or 270).
For example, if the third storage region (270) is full and a level indicator (266) indicates that another data set (264) is to be stored in the third storage region, then the second storage device (262) can reformat an area of the second storage region (268) to have the characteristics of the third storage region (270), so that the area becomes part of the third storage region (270). For example, if the second storage region (268) is MLC solid state storage and the third storage region (270) is SLC solid state storage, then some MLC storage can be reformatted to be SLC storage. Of course, this reformatting can decrease the amount of data that can be stored in the second storage device (262), but it can improve performance when accessing the new data to be stored in the third storage region (270).
As another example, if some data stored in the third storage region (270) is not used for a long period of time, it may be re-classified to the second data level corresponding to the second storage region (268). In that situation, the classification component (240) can pass a level indicator (266) to the second device driver (251), indicating one or more logical block addresses and the data level to which they are now classified (the second data level). In response, the second storage device (262) can reformat some SLC storage to be MLC storage, and can store the indicated data set in the MLC storage. This change may decrease performance when accessing the indicated data set (because accessing MLC storage is typically slower than accessing SLC storage), but the change may free up additional storage space on the second storage device (262) (because MLC storage holds more bits per cell than SLC storage).
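The reformatting behavior in these two examples can be illustrated with a simplified simulation. The sketch below assumes that the device is divided into equal physical blocks, that a block formatted as SLC holds one unit of data and a block formatted as two bit per cell MLC holds two units, and that only empty blocks are reformatted. The class and method names are hypothetical.

```python
class SimulatedFlashDevice:
    """Hypothetical model of a device with an SLC region and an MLC region."""

    UNITS_PER_BLOCK = {"SLC": 1, "MLC": 2}  # assumed: two bit per cell MLC

    def __init__(self, slc_blocks, mlc_blocks):
        self.blocks = {"SLC": slc_blocks, "MLC": mlc_blocks}  # physical blocks
        self.used = {"SLC": 0, "MLC": 0}                      # units of data stored

    def capacity(self, config):
        return self.blocks[config] * self.UNITS_PER_BLOCK[config]

    def total_capacity(self):
        return sum(self.capacity(config) for config in self.blocks)

    def free_blocks(self, config):
        per_block = self.UNITS_PER_BLOCK[config]
        occupied = -(-self.used[config] // per_block)  # ceiling division
        return self.blocks[config] - occupied

    def reformat_block(self, source, target):
        # Move one empty physical block from the source region to the target region.
        if self.free_blocks(source) < 1:
            raise RuntimeError("no empty block available to reformat")
        self.blocks[source] -= 1
        self.blocks[target] += 1

    def store(self, config, units=1):
        other = "MLC" if config == "SLC" else "SLC"
        # If the indicated region is full, grow it by reformatting the other region.
        while self.capacity(config) - self.used[config] < units:
            self.reformat_block(other, config)
        self.used[config] += units


device = SimulatedFlashDevice(slc_blocks=1, mlc_blocks=3)
print(device.total_capacity())  # capacity before any reformatting
device.store("SLC")
device.store("SLC")             # SLC region is full, so one MLC block is reformatted
print(device.total_capacity())  # capacity after converting one block from MLC to SLC
```

In this model, converting a block from MLC to SLC lowers the total capacity by one unit, and converting it back restores that unit, which mirrors the trade-off between performance and storage space described above.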
While this specific example of a data classification and storage environment has been described, many variations and alternative data classification and storage environments can be implemented. For example, the first storage device (252) and the first device driver (250) could be omitted, or the classification component (240) could be configured to send classified data only to the second storage device (262). As another example, additional storage devices and/or storage regions could be included, such as by including storage regions in the second device (262) that are respectively formatted for SLC, two bit per cell MLC, three bit per cell MLC, etc.
As yet another example of a variation, referring now to
In addition to or instead of such tracking, the classification component (322) can track other information that is not particular to data sets stored at the data site (320). For example, users may opt into providing statistical information to the data site (320) over the network, and that statistical information could be used in classification. For example, the statistical information could indicate how frequently particular file types are accessed or backed up, what applications utilize those file types, etc. Such statistical information could be used to classify sets of data, at least initially.
Another machine (330) may be configured to access the network (310) to receive classification data from the data site (320). Such classification data may include particular assignments of particular data sets to particular data levels, statistical usage information, assignments of file types to default initial data levels (which may change with actual usage of the files on the machine (330)), etc. The machine (330) can include a classification component (332), which can use the classification information from the data site (320) to program a storage device (333) to store particular data sets to particular storage regions (334 or 336) having particular storage quality levels (e.g., SLC formatted regions, two bit per cell MLC formatted regions, etc.), as discussed above with reference to the second storage device (262) in
Referring to
An indicator of the data level for the set of data can be transmitted (420) to a storage device. In response to receiving that indicator, a storage area in the device can be formatted (430) to store data at a storage quality level, which can correspond to the data level. This formatting can include reformatting the storage area from a first quality level to a second quality level. The two quality levels can have different characteristics. For example, the second quality level can have different expected access speeds than the first quality level. As another example, the second quality level can have different expected reliability than the first quality level. As one specific example, formatting the storage area can include switching the storage area between SLC and MLC configurations. Formatting the storage area may change a ratio between amounts of different storage quality levels in the storage device. For example, the formatting may change a storage area between different cell level configurations (e.g., between MLC and SLC storage or between different levels of MLC storage), so that a ratio of bytes of one cell level configuration in the device to another cell level configuration in the device changes. Additionally, the formatting may change how much data storage space is available in the device. For example, switching a storage area from SLC storage to MLC storage can increase the data storage space available in a solid state storage device.
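The effect of such formatting on the byte ratio and on the available storage space can be worked through with simple arithmetic. The sketch below assumes one bit per cell for SLC, two bits per cell for MLC, a nonzero number of MLC cells, and illustrative cell counts. The function names are hypothetical.

```python
from fractions import Fraction


def capacity_bytes(cell_count, bits_per_cell):
    """Bytes that a group of cells can hold at a given cell level configuration."""
    return cell_count * bits_per_cell // 8


def describe_device(slc_cells, mlc_cells, mlc_bits_per_cell=2):
    # Assumes mlc_cells > 0 so that the ratio is defined.
    slc_bytes = capacity_bytes(slc_cells, 1)
    mlc_bytes = capacity_bytes(mlc_cells, mlc_bits_per_cell)
    return {
        "slc_bytes": slc_bytes,
        "mlc_bytes": mlc_bytes,
        "total_bytes": slc_bytes + mlc_bytes,
        "slc_to_mlc_ratio": Fraction(slc_bytes, mlc_bytes),
    }


# Illustrative figures: 8,000 cells are switched from MLC to SLC.
before = describe_device(slc_cells=8_000, mlc_cells=24_000)
after = describe_device(slc_cells=16_000, mlc_cells=16_000)
print(before["total_bytes"], before["slc_to_mlc_ratio"])
print(after["total_bytes"], after["slc_to_mlc_ratio"])
```

With these figures the ratio of SLC bytes to MLC bytes moves from 1/6 to 1/2 while the total capacity falls from 7,000 bytes to 6,000 bytes. Switching the same cells back from SLC to MLC reverses both changes.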
Moreover, the set of data can be stored (440) in the storage region at the storage quality level, such as in MLC storage, SLC storage, or some other storage configuration.
Referring now to
Referring now to
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.