1. Field of Art
This disclosure pertains in general to accessing media stored in an optical disc storage system, and specifically to a media library of a storage appliance.
2. Description of the Related Art
Because the consequences of data loss can be dire, methods of archiving data for long-term storage have been developed. Traditionally, there have been two choices for permanent storage: data is either kept online or archived. Online data offers the advantage of rapid access in a searchable format. Archived data offers the advantages of being removable, providing longer-term storage, and freeing space on high-cost online storage subsystems, such as hard drives.
One alternative for storing data is to copy data onto tape for archiving. Tape is not designed to provide easy, immediate access to information. It is typically written in a proprietary backup format and can only be searched sequentially. It is designed for the infrequent and unlikely retrieval of backup data when primary storage fails. It is designed for density, not access. Besides the inaccessibility of tape, there is the risk of storing important archives on a medium not intended for permanence. Tape is used for periodically overwriting files, not for preserving valuable fixed content in a permanently etched, unalterable form. Unlike certain types of optical media, tape is not native write-once read-many (WORM) compliant, and tape is susceptible to environmental influences such as magnetic interference. As a result, tape is not well-suited for archiving high-value content.
Figure (“FIG.”) 1 illustrates a software architecture of a hybrid storage appliance, in accordance with an embodiment.
The figures (“FIGS.”) depict embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
Embodiments disclosed include methods, systems and computer readable media for accessing and compressing data stored in an optical media library. In one embodiment, a simulation layer of a hybrid storage appliance allows one or more libraries of optical media with WORM properties to look like one or more logical block devices with non-WORM characteristics. In another embodiment, data from a user's files is compressed by the media library appliance in chunks in such a way that coarse granularity seeking is possible within a compressed user file. In another embodiment, a media cloud is used by a hybrid storage appliance to seamlessly recover from failures in optical media, library robotics, optical drives and network connections during the creation, recovery, and distribution of data.
Other embodiments provide methods, systems, and computer readable media for tracking optical media with media tags and cartridge manifests within a library. A manifest attached to a media cartridge contains detailed information on each piece of media contained in the cartridge. In addition, each piece of media has an associated media tag that follows the piece of media around inside of the library. The media tag is stored, for example, in flash on the device where the piece of media resides, be it in a cartridge, a robotics sled, or in an optical disc drive.
A simulation layer of a hybrid storage appliance (“HSA”) allows one or more libraries of optical media with WORM properties to appear as one or more logical block devices with non-WORM characteristics. In one embodiment, a direct-attached logical unit number (LUN) storage interface is provided for access to logical units of data on an HSA.
The HSA functions as a data pipeline. One end of the pipe is accessed via client computers and the other end is optical media. In one embodiment, clients write data into the pipeline using the network file system (NFS) or common internet file system (CIFS) file sharing protocols. The network file system daemon (NFSD) and server message block daemon (SMBD) blocks handle the file serving protocols and read/write data from/to a cache file system represented by XFS. The cached data is stored on a hard disk. When files are created and altered, XFS notifies a command and control daemon (CCD) of these attempts via a data migration application program interface (DMAPI). CCD can then decide to allow the access, deny the access, or delay the access until needed data is available. As files are created in XFS, CCD monitors the files until the files are no longer being changed. At this point, CCD marks the files as being immutable. Next, CCD adds the immutable files to an in-progress universal disk format (UDF) file system instance with a UDF image creator. The UDF image creator writes immutable files into a UDF file system image that is stored in a staging area. Once the UDF file system image is full, the UDF image creator directs a single board computer daemon (SBCD) to copy the UDF file system image to an optical disc. The SBCD uses robotics to move the appropriate optical disc into a drive and then performs the copy operation. Once data is stored on an optical disc, the copy of the data stored in the cache file system (XFS) can be purged, freeing up space in the cache.
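By way of illustration only, a minimal sketch of how the write path described above might be organized is shown below. The class and method names (UdfImageCreator, CommandControlDaemon, on_file_settled) and the injected sbcd and cache objects are assumptions for illustration and do not correspond to the actual daemon interfaces.

```python
import time

class UdfImageCreator:
    """Illustrative stand-in for the UDF image creator: accumulates frozen
    files in a staging area until the in-progress image is full."""
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = []

    def has_room_for(self, size):
        return self.used + size <= self.capacity

    def add(self, path, size):
        self.files.append(path)
        self.used += size

class CommandControlDaemon:
    """Sketch of the CCD write path: freeze quiesced files, stage them into a
    UDF image, and hand a full image to the SBCD for copying to optical disc."""
    FREEZE_SECONDS = 300                     # assumed quiet period before freezing

    def __init__(self, image, sbcd, cache):
        self.image = image                   # UdfImageCreator instance
        self.sbcd = sbcd                     # object with a burn(image) method
        self.cache = cache                   # object with mark_immutable()/purge()

    def on_file_settled(self, path, size, last_modified):
        """Called (e.g., on DMAPI notifications) when a file stops changing."""
        if time.time() - last_modified < self.FREEZE_SECONDS:
            return                           # file may still be changing; wait
        self.cache.mark_immutable(path)      # CCD marks the file immutable
        if not self.image.has_room_for(size):
            self.flush_image()               # image is full: copy it to disc first
        self.image.add(path, size)

    def flush_image(self):
        self.sbcd.burn(self.image)           # robotics move disc into a drive; copy
        for path in self.image.files:
            self.cache.purge(path)           # free cache space once on optical disc
        self.image = UdfImageCreator(self.image.capacity)
```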
At a later time, an NFS or CIFS client may wish to access data that has been purged from the cache file system. When this happens, DMAPI notifies CCD that data not present in the cache file system needs to be retrieved from an optical disc. CCD then directs SBCD to load the appropriate optical disc into a drive, read the needed data, and send it back to CCD. CCD writes the data back into the cache file system and informs XFS that the needed data is again available. XFS then lets NFSD or SMBD return a copy of the data to the requesting client.
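A correspondingly simplified sketch of the recall path might look like the following; the shape of the DMAPI event and the methods on the ccd, sbcd, and cache objects are assumptions for illustration.

```python
def handle_cache_miss(event, ccd, sbcd, cache):
    """Sketch of the recall path: DMAPI reports that purged data is needed,
    CCD has SBCD load the disc and read the data, and the data is restored
    to the cache so NFSD/SMBD can answer the client."""
    path = event["path"]                            # file the client tried to read
    disc_id, extents = ccd.lookup_location(path)    # where the data was archived
    sbcd.load_disc(disc_id)                         # robotics move disc into a drive
    data = sbcd.read_extents(disc_id, extents)      # read the needed data
    cache.restore(path, data)                       # write it back into the cache
    cache.notify_data_available(path)               # XFS lets NFSD/SMBD proceed
```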
The Jukebox manager (JBM) tracks where optical discs reside and whether they are in use or idle. When CCD needs to write to or read from an optical disc, it consults JBM to schedule access to the optical disc. Once JBM grants access, CCD can direct an SBCD to perform whatever access is needed. When the access is complete, JBM marks the involved optical disc as idle and schedules any other accessors waiting for that piece of media.
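The scheduling role of JBM can be pictured with a small sketch such as the one below, which simply serializes access to each optical disc and wakes the next waiting accessor when a disc becomes idle; the interface is an assumption for illustration, not the actual JBM.

```python
import collections
import threading

class JukeboxManager:
    """Illustrative scheduler: one accessor per optical disc at a time,
    with additional accessors queued in FIFO order."""
    def __init__(self):
        self._lock = threading.Lock()
        self._busy = set()                                  # discs currently in use
        self._waiters = collections.defaultdict(collections.deque)

    def request_access(self, disc_id, on_granted):
        """Grant access immediately if the disc is idle; otherwise queue."""
        with self._lock:
            if disc_id not in self._busy:
                self._busy.add(disc_id)
                grant = True
            else:
                self._waiters[disc_id].append(on_granted)
                grant = False
        if grant:
            on_granted(disc_id)

    def release(self, disc_id):
        """Mark the disc idle and schedule the next waiter, if any."""
        with self._lock:
            if self._waiters[disc_id]:
                next_cb = self._waiters[disc_id].popleft()  # disc stays busy
            else:
                self._busy.discard(disc_id)
                next_cb = None
        if next_cb:
            next_cb(disc_id)
```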
Also included in some embodiments of the HSA is a logical volume manager (LVM) and/or a redundant array of inexpensive disks (RAID). The optical disc storage system (ODSS) uses LVM and/or RAID to gather physical disk drives and treat them as a larger logical disk drive with protection from loss of data caused by the failure of a single disk drive.
In one embodiment, user file contents are compressed as they are written into a Universal Disk Format (“UDF”) archive volume of a media library. A problem presented by file compression for UDF increment generation is that the size of the compressed file is unknown without actually compressing it. To compress a file, the contents must be read, and it is desirable to read a file's contents only once when generating an increment. Thus, in one embodiment, the act of compressing a file's contents puts the compressed data into the increment being generated. Another problem presented by compression is that it is not efficient to uncompress a large mass of data when a user wants to retrieve only a small portion of a large archived file. It is desirable to compress data in such a way that coarse granularity seeking is possible within a compressed user file.
Historically, the increment generation process was split into two phases. The first phase gathered metadata for frozen files, built the corresponding UDF metadata into an in-memory tree structure, and repeated these steps until the UDF increment being assembled was full. An increment was allowed to be resized once if a big file did not fit into the remaining space in an increment. Once the increment was full, disk space for an increment (e.g., an adequate number of sectors of disk space) was pre-allocated and the UDF increment was generated by synthesizing the UDF metadata, copying user file data into the increment, and writing the manufactured error correction code data into the increment.
In step 702, with the increment size selected, the address of the File Set Descriptor that follows the compressed user data can be assigned. For example, the address of the File Set Descriptor can be the last two sectors in the increment that are protected by error correction code.
Once the increment size is fixed and space is allocated for the increment file, in step 703, the preamble to the user data is written to the UDF increment file. In one embodiment, the preamble includes the items in
In step 704, the user files are read, compressed, and written to the UDF increment. While compressed files are written, in step 705, the in-memory UDF metadata is updated with each file's location and file directory information. In one embodiment, a compressed chunk directory for each file is created and written into the UDF metadata. As files are added to an increment, there eventually comes a point where there is not enough room left to hold the next file and its metadata. At that point, in step 706, the UDF metadata is written into the increment.
After the UDF metadata is written to the increment file, in step 707, the trailing UDF information is written. In one embodiment, the trailing UDF information includes the items in
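The overall ordering of steps 702-707 can be summarized with the following sketch; the callables are placeholders for the operations described above, and the set of files is assumed to already fit the increment (in the text, files are added until no room remains).

```python
def build_udf_increment(out, files, write_preamble, compress_into,
                        write_metadata, write_trailer):
    """Illustrative write order for a UDF increment file.  'out' is a
    writable binary stream; the callables are placeholders for the steps
    described in the text."""
    write_preamble(out)                              # step 703: preamble to user data
    udf_metadata = []                                # in-memory UDF metadata
    for f in files:
        location, chunk_dir = compress_into(out, f)  # step 704: compress and write
        udf_metadata.append((f, location, chunk_dir))  # step 705: track location and
                                                       # the compressed chunk directory
    write_metadata(out, udf_metadata)                # step 706: UDF metadata
    write_trailer(out)                               # step 707: trailing UDF info,
                                                     # including the File Set Descriptor
                                                     # placed per step 702
```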
In one embodiment, files are compressed in chunks of a predetermined size, for example, 64 megabytes. In one embodiment, a 64 megabyte chunk is a preferred size because file contents are typically recalled in 64 megabyte chunks; however, it is noted that larger or smaller chunk sizes may be used. Compressing a user file involves reading 64 megabytes (or less) from the file, compressing that chunk into another buffer, and then writing the compressed result into the UDF increment. This process is repeated until the file is completely in the increment. If an attempt to compress a chunk results in a chunk that is larger than 64 megabytes, the uncompressed data is written into the increment. Since the ultimate goal is to save sectors on archive media, compressing a file should result in saving at least one sector (2048 bytes, in one embodiment) of space in order to justify the compression. Otherwise, the data is archived in an uncompressed state.
Each 64 megabyte chunk of a file (compressed or not) will have a byte offset relative to the beginning of the file stored into a compressed chunk directory. Each file will have a compressed chunk directory, as described above with reference to step 705, that is stored, for example, in the file's UDF extended attributes. The compressed chunk directory is used during file recall to quickly locate any 64 megabyte chunk in a compressed archived file.
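A minimal sketch of the chunked compression and of the compressed chunk directory it produces is shown below. zlib is used purely as a stand-in for the compressor, and the directory entry fields are assumptions; the text stores the directory in the file's UDF extended attributes, which is outside the scope of this sketch.

```python
import zlib

CHUNK_SIZE = 64 * 1024 * 1024          # 64 megabyte chunks, as in the example above

def write_compressed_file(src, increment):
    """Read a file one 64 MB chunk at a time, compress it, and write either
    the compressed or the original bytes into the increment.  Returns the
    compressed chunk directory used later for coarse granularity seeking.
    The per-chunk size comparison below is a simplification of the text's
    rule of saving at least one sector."""
    chunk_directory = []
    while True:
        raw = src.read(CHUNK_SIZE)
        if not raw:
            break
        packed = zlib.compress(raw)
        use_compressed = len(packed) < len(raw)      # keep only if it actually shrank
        data = packed if use_compressed else raw
        chunk_directory.append({
            "offset": increment.tell(),              # where this chunk was stored
            "stored_size": len(data),
            "original_size": len(raw),
            "compressed": use_compressed,
        })
        increment.write(data)
    return chunk_directory
```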
With the addition of compression, the recall process needs to know where the desired data is located within the compressed data for the file. In step 802, the location of the desired compressed data is obtained from the chunk directory. As described above, there is a compressed chunk directory in the UDF extended attributes for every compressed file. To allow the file recall code to get to the compressed chunk directory quickly, in one embodiment, the location of the chunk directory is stored in the cache file system extended attributes for the file. In one embodiment, a buffer is used to hold the compressed chunk directory. The recall process reads in the compressed chunk directory pointed to by the extended attributes. Then the archive sectors containing the compressed data can be identified.
In step 803, the compressed data in the identified sectors is uncompressed. Recalling the contents of an archived file requires that the contents of the file be uncompressed if they are compressed. A compressed file is detected by the presence of its compressed chunk directory. If there is no directory, the file is assumed to be uncompressed, in one embodiment. Since, in one embodiment, compression is performed in 64 megabyte chunks, two 64 megabyte buffers are used for file recall processing: one to contain the compressed data and one to hold the uncompressed data as it is uncompressed.
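Recall of a byte range from a compressed archived file can then be sketched as follows; the directory entries are assumed to have the shape produced by the illustrative writer above, and zlib again stands in for the compressor.

```python
import zlib

CHUNK_SIZE = 64 * 1024 * 1024

def read_range(archive, chunk_directory, file_offset, length):
    """Use the compressed chunk directory to locate, read, and (if necessary)
    uncompress only the chunks covering the requested byte range of the
    original file: this is the coarse granularity seek described above."""
    out = bytearray()
    first = file_offset // CHUNK_SIZE                      # first chunk needed
    last = (file_offset + length - 1) // CHUNK_SIZE        # last chunk needed
    for index in range(first, last + 1):
        entry = chunk_directory[index]
        archive.seek(entry["offset"])                      # seek within the archive
        stored = archive.read(entry["stored_size"])
        chunk = zlib.decompress(stored) if entry["compressed"] else stored
        start = file_offset - index * CHUNK_SIZE if index == first else 0
        end = min(len(chunk), file_offset + length - index * CHUNK_SIZE)
        out += chunk[start:end]                            # keep only requested bytes
    return bytes(out)
```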
The above described processes for compressing user data and accessing compressed user data are compatible with, and complementary to, many compression algorithms known in the art. In one embodiment, the LZO compression algorithms are used. The LZO compression algorithms are available from http://www.oberhumer.com/opensource/lzo.
Seamless Recovery from Media Cloud
The Hybrid Storage Appliance (“HSA”) provides online archival access to very large collections of files. In one embodiment, files are distributed in various forms in a cloud of optical media. The cloud refers to all optical media stored in libraries locally attached or remotely connected to the HSA via WAN/LAN or a sister HSA. The nature of the underlying optical media does not allow for the use of traditional technologies for redundancy and automatic error recovery. Traditional file systems are backed by block devices, which allow for various levels of RAID such as mirroring and parity drives. The HSA is backed by file-based optical media, so different techniques are used to seamlessly recover from failures in optical media, library robotics, optical drives, and network connections for the creation, recovery, and distribution of data across the libraries and optical media.
For file storage, files first arrive on the server in the front-end file system cache. The files go through a waiting period before they freeze and are marked eligible for migration to optical media. An increment is created containing one file, a portion of a file, or more than one file, for example, as described above with reference to
Once the file has been placed in an increment, the file is removed from the system and a stub is left that will trigger a file recovery to the front-end cache the next time the file is accessed. As described above, in one embodiment, a stub is a zero length file of the same name with extended attributes that have the information necessary to recover the file data from optical media. This information includes a list of volumes (burned optical discs) and for each volume a list of extents for the file. Each extent details a location on the optical media and its size.
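One way to picture the recovery information kept in a stub's extended attributes is the illustrative structure below; the field names and example values are assumptions, not the actual attribute layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Extent:
    """A contiguous run of file data on a burned optical disc (illustrative)."""
    sector: int              # starting location on the optical media
    length: int              # size of the extent in bytes

@dataclass
class StubAttributes:
    """Illustrative shape of a stub: a zero-length file's extended attributes
    list the volumes (burned discs) holding the file and, for each volume,
    the extents that make up the file's data."""
    volumes: List[str] = field(default_factory=list)
    extents: Dict[str, List[Extent]] = field(default_factory=dict)

# Hypothetical example: a file whose data spans two burned volumes.
stub = StubAttributes(
    volumes=["vol-0001", "vol-0002"],
    extents={
        "vol-0001": [Extent(sector=4096, length=64 * 1024 * 1024)],
        "vol-0002": [Extent(sector=128, length=10 * 1024 * 1024)],
    },
)
```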
If a failure occurs during the burn process, a new combination of library, media, and optical drive is picked, and the process continues until one or more copies of the increment have been created. In one embodiment, the final location of the data in the media cloud 100 is typically not known by a user of the HSA server 110.
A file is recovered from the media cloud 100 when a request is made to access the file through the front-end file system cache. The file stub access triggers a request to be made to the media cloud 100. A piece of media containing the file is chosen based upon resource availability. If the file exists on a single piece of media, then the decision is simply when to schedule loading the piece of media into an available drive. If the media exists in multiple locations in the cloud 100, the decision is based on a preference for local libraries 111 and 112 over remote libraries 121 and 122 and then on library and/or drive availability within the library.
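The selection preference described above can be sketched as a simple scoring rule; the copy descriptions and the scoring itself are assumptions for illustration.

```python
def choose_media(copies, drive_is_free):
    """Pick one copy of a file from the media cloud, preferring local
    libraries over remote ones and then libraries with an available drive.
    'copies' is a list of dicts describing where the file exists; the field
    names are illustrative."""
    def score(copy):
        return (
            0 if copy["library_is_local"] else 1,        # prefer local libraries
            0 if drive_is_free(copy["library"]) else 1,  # then drive availability
        )
    return min(copies, key=score) if copies else None
```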
If a failure occurs while trying to access this piece of media, the cloud automatically chooses a new combination of library, drive, and optical media. In one embodiment, the self-healing media cloud 100 has the following properties:
When data arrives in the server's front-end cache, the data is sent back to the original requester of the data. The end user need not be notified or even aware of how the user's request was fulfilled by the media cloud 100. After some period of inactivity, the contents of the file are purged from the front-end cache and again replaced with the stub. In one embodiment, no data is written to optical media during this purge.
The media cloud 100 provides an automatic fail over for the creation, recovery, and distribution of data across the libraries and optical media. The media cloud 100 can recover from failures in libraries, drives, and optical media, and the media cloud's activities may be transparent to the end-user of the HSA.
In one embodiment, the Hybrid Storage Appliance (HSA) supports 500 pieces of media in a library. This media is moved between 514 locations within the library, including storage cartridges, disc transfer assemblies, and media drives. Optical media normally resides in small (e.g., 25 slots) or bulk (e.g., 225 slots) cartridges that are frequently moved in and out of the libraries. Since loading and reading the contents of each disc can take well over 2 hours depending upon the configuration, a mechanism is used to track the location of each disc in the library along with a summary of the disc's contents. This information also follows the discs around in the cartridge as the cartridges are moved in and out of libraries.
A manifest is created per cartridge that has detailed information on each piece of media the cartridge contains. In one embodiment, this manifest is maintained in a flash device physically attached to the body of the media cartridge; alternative storage mechanisms or memory devices can also be used. In one implementation, flash devices are also attached to optical drives within the library and to the body of a robotics sled used to transport the media between slots of a cartridge and the optical drives. Each piece of media has an associated media tag that follows the piece of media around inside the library. Media can reside in a cartridge, a robotics sled, or an optical drive. The media tag is stored in flash or other storage medium on the device where the piece of media currently resides, be it a cartridge, a robotics sled, or an optical disc drive.
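The relationship between a media tag and its manifest entry can be illustrated with the simple structures below; the fields are assumptions rather than the actual flash layout.

```python
from dataclasses import dataclass

@dataclass
class MediaTag:
    """Follows a piece of media around the library; stored in the flash of
    whichever device currently holds the disc (cartridge, sled, or drive)."""
    media_id: str            # identifier the manifest entry is keyed on

@dataclass
class ManifestEntry:
    """Per-disc entry in the cartridge manifest, keyed by the media tag
    rather than by slot, so discs may move freely within the library."""
    media_id: str
    content_summary: str     # summary of the disc's contents
    bytes_free: int          # illustrative capacity bookkeeping
```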
When the library starts up, in one implementation, the library performs an inventory of all the media present in the library. This inventory is created from the contents of the various flash devices on cartridges, robotic sleds, and drives. For cartridges, the manifest entries 1010 and media tags 1001 reside in the cartridge flash so that the cartridges can be removed and replaced in libraries and still provide instant access to the inventory. As a result of the inventory, the library is presented with a map indicating the locations of media along with the associated media tags 1001. If a piece of media has a media tag 1001, the corresponding manifest entry is retrieved from the cartridge flash. This initial inventory process occurs very quickly and avoids the need to load discs into drives or for discs to be registered to a particular location.
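A sketch of this startup inventory, built entirely from the flash contents without loading any disc into a drive, might look like the following; the dictionary layout of each device is an assumption.

```python
def inventory_library(cartridges, sleds, drives):
    """Build a map of media locations from the flash devices alone.  Each
    device is assumed to be a dict with a 'location' and the media tags it
    currently holds; cartridges additionally carry a 'manifest' keyed by
    media identifier."""
    location_map = {}                        # media_id -> (location, manifest entry)
    for cartridge in cartridges:
        manifest = cartridge["manifest"]
        for media_id in cartridge["media_tags"]:
            location_map[media_id] = (cartridge["location"], manifest.get(media_id))
    for device in list(sleds) + list(drives):
        for media_id in device["media_tags"]:
            # Disc is currently in a sled or drive; its manifest entry still
            # lives in the owning cartridge's flash and can be fetched there.
            location_map[media_id] = (device["location"], None)
    return location_map
```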
In one embodiment, during normal operation, the manifest entry is only modified following an operation performed while the disc is in the drive (e.g., data written to the media). Loading a disc into a drive merely to read its contents, however, would not change the manifest contents. After an operation is performed on the disc while the disc is in the drive, the current state of the media is compared to the recorded state of the media in the manifest 1010 as the disc is unloaded. If the states differ, the manifest 1010 is updated to reflect the current state. As discussed above, the manifest entry is not tied to a particular slot in the cartridge; instead, the manifest entry is associated with the media by the media tag 1001. This allows the media to be moved around at will within the cartridge, robotics sled, and optical disc drive without changing the manifest entry.
In one embodiment, during normal operation, the media tag 1001 remains unchanged, except for the following situations:
Because the media tag 1001 remains unchanged during normal operations, except in certain circumstances detailed above, the frequency of updating the media tags 1001 and the manifest is manageable. Thus, the media tags and cartridge manifests provide a convenient mechanism to track the media in a library as the media are moved into, throughout, and out of the library.
The above description is included to illustrate the operation of embodiments and is not meant to limit the scope of the disclosure. From the above discussion, many variations will be apparent to one skilled in the relevant art that would yet be encompassed by the spirit and scope as set forth herein. Those of skill in the art will also appreciate other embodiments from the teachings herein. The particular naming of the components, capitalization of terms, the attributes, data structures, or any other programming or structural aspect is not mandatory or significant, and the mechanisms that implement the features may have different names, formats, or protocols. Also, the particular division of functionality between the various system components described herein is merely exemplary, and not mandatory; functions performed by a single system component may instead be performed by multiple components, and functions performed by multiple components may instead be performed by a single component.
The methods and operations presented herein are not inherently related to any particular computer or other apparatus. The required structure for a variety of these systems will be apparent to those of skill in the art, along with equivalent variations. In addition, the disclosure herein is not described with reference to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings as described herein, and any references to specific languages are provided for disclosure of enablement and best mode of the embodiments disclosed.
Embodiments disclosed are well suited to a wide variety of computer network systems over numerous topologies. Within this field, the configuration and management of large networks comprise storage devices and computers that are communicatively coupled to dissimilar computers and storage devices over a network, such as the Internet.
Finally, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure is intended to be illustrative, but not limiting, of the scope.
This application claims a benefit of, and priority under, 35 U.S.C. §119(e) to U.S. Provisional Patent Application Ser. No. 61/176,697, “Accessing, Compressing, And Tracking Media Stored In An Optical Disc Storage System,” filed on May 8, 2009, which is incorporated herein by reference in its entirety.