Combined stream auxiliary copy system and method

Description

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosures, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to data storage in a computer network and, more particularly, to a system and method for optimizing storage operations.

2. Description of Related Art

The GALAXY data storage management system software manufactured by COMMVAULT SYSTEMS, INC. of Oceanport, N.J., uses storage policies to direct how data is to be stored. Referring to FIG. 1, there is shown a library storage system 100 in accordance with the prior art. Storage policies 20 in a management server 21 may be used to map copy data from a source 24, through a media agent 26, to a physical media location 28, 30, 32, 34, 36, 38 using, e.g., tapes, drives, etc., where data is to be stored. Storage policies 20 are generally created at the time of installation of each media library and/or stand-alone drive. Numerous storage policies may be created and modified to meet storage management needs. A storage policy allows the user to define how, where, and for how long data should be stored without requiring intimate knowledge or understanding of the underlying storage architecture and technology. The management details of the storage operations are transparent to the user.

Storage policies 20 can be viewed as a logical concept that directs the creation of one or more copies of stored data, with each copy being a self-contained unit of information. Each copy may contain data from multiple applications and from multiple clients or data sources. Within each copy are one or more archives, each relating to a particular application. For example, one archive might contain log files related to a data store and another archive in the same copy might contain the data store itself.

Storage systems often have various levels of storage. A primary copy or data set, for example, indicates the default destination of storage operations for a particular set of data that the storage policy relates to and is tied to a particular set of drives. These drives are addressed independently of the library or media agent to which they are attached. In FIG. 1, the primary drives are media 28, 30, 32, 34, 36 and 38. Clearly other forms of storage media could be used such as tapes or optical media. The primary data set might, for example, contain data that is frequently accessed for a period of one to two weeks after it is stored. A storage administrator might find storing such data on a set of drives with fast access times preferable. On the other hand, such fast drives are expensive and once the data is no longer accessed as frequently, the storage administrator might find it desirable to move and copy this data to an auxiliary or secondary copy data set on a less expensive tape library or other device with slower access times. Once the data from the primary data set is moved to the auxiliary data set, the data can be pruned from the primary data set freeing up drive space for new data. It is thus often desirable to perform an auxiliary storage operation after a primary data set has been created. In FIG. 1, the auxiliary data set is copied to drives or tapes 40, 42 and 44.

Storage policies generally include a copy name, a data stream, and a media group. A primary copy name may be established by default whenever a storage policy for a particular client is created and contains the data directed to the storage policy. A data stream is a channel between the source of the data and the storage media, such as data streams 50 and 52 in FIG. 1. Such a data stream is discussed in HIGH-SPEED DATA TRANSFER MECHANISM, Ser. No. 09/038,440 referenced above. To increase the speed of a copy, data to be backed up is frequently divided into a plurality of smaller pieces of data and these pieces are sent to a plurality of storage media using their own respective data streams. In FIG. 1, data from source 24 is broken into two portions and sent using streams 50, 52 to media 28, 36.

A client's data is thereby broken down into a plurality of sub-clients. In FIG. 1, media 28, 30, 32 and 34 may comprise a single media group and media 36 and 38 a second media group. A media group generally refers to a collection of one or more physical pieces of storage media. Only a single piece of media within the group is typically active at one time and data streams are sent to that medium until it reaches full capacity. For example, data stream 50 will feed source data to medium 28 until it is full and then feed data to medium 30. Multiple copies may be performed using multiple streams each directed to a respective media group using multiple storage policies.
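The fill behavior of a media group can be sketched as follows. This is only an illustration under stated assumptions, not the GALAXY implementation: the function name `fill_media_group`, the capacity units, and the data sizes are hypothetical.

```python
def fill_media_group(media_capacities, stream_size):
    """Write stream_size units to a media group, one active medium at a time.

    media_capacities: free capacity of each medium, in group order
    (hypothetical units). Returns the units written to each medium used.
    """
    written = []
    remaining = stream_size
    for capacity in media_capacities:
        if remaining == 0:
            break
        # Only the active medium receives data until it is full.
        used = min(capacity, remaining)
        written.append(used)
        remaining -= used
    if remaining > 0:
        raise ValueError("media group has insufficient capacity")
    return written


# Stream 50 feeds medium 28 until it is full, then continues on medium 30.
print(fill_media_group([100, 100, 100, 100], 150))  # [100, 50]
```

In this sketch the remaining media in the group stay untouched until the active medium is full, which mirrors the one-active-medium behavior described above.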

Auxiliary copying, discussed in more detail in commonly owned application Ser. No. 10/303,640, denotes the creation of secondary copies, such as medium 40 or medium 42, of the primary copy. Since auxiliary copying involves multiple storage policies and data streams which each point to a particular media group, data is likely scattered over several pieces of media. Even data related to single stream copy operations might also be scattered over several media. Auxiliary copying is generally performed on a stream-by-stream basis and one stream at a time, in an attempt to minimize the number of times the primary media are mounted/unmounted. For example, for a copy of 10 pieces of primary media where four streams are used, auxiliary copying first entails copying all archive files of the first stream to a first set of auxiliary media, then the second stream to a second set of auxiliary media, etc. In FIG. 1, an auxiliary copy of stream 50 is made using auxiliary stream 50a to medium 40 and, if needed, medium 42. Thereafter, an auxiliary copy of stream 52 is made using auxiliary stream 52a to medium 44.

An archive file, at least with respect to auxiliary copying, is generally copied from a first chunk of data to a last chunk. When an auxiliary copy operation is cancelled or suspended before all chunks of an archive file are successfully copied to the destination copy, the chunks successfully copied are generally discarded or overwritten later when the archive file is again copied to the same copy or medium. This is undesirable because it wastes time and resources to copy the same chunks repeatedly; it wastes media because useless data occupies the media until the media is reusable; and if the network is not stable, a large archive file may never be successfully copied.

Although the GALAXY data storage management system software provides numerous advantages over other data storage management systems, the process for restoring copied data may require access to several media, which involves multiple mounting/unmounting of media, thereby increasing the time necessary for a restoration. Additionally, although an effort is made to minimize the number of times media are mounted and unmounted, the stream-by-stream basis used in auxiliary copying does not minimize the number of mount/unmount times necessary for the auxiliary copy and does not minimize tape usage. For example, in FIG. 1, media 40 and 44 may both be less than half full but both are needed to copy data through streams 50a, 52a using conventional techniques and both must be remounted for a restore. Performing auxiliary copying on a stream-by-stream basis is also generally a lengthy process. Finally, restarting a copy of an archive file that has been cancelled or suspended by always copying the first to the last chunk is inefficient with respect to media usage and the time necessary to complete a copy.

There is therefore a need in the art for a system and method for increasing the efficiency of storage management systems.

SUMMARY OF THE INVENTION

A system and method for transferring data in a library storage system. The library storage system comprises a media server including a storage policy. A media agent is connected to the media server. A plurality of storage media and a data source are connected to the media agent. The media agent divides the data source into at least a first and a second portion of data. The portions of data are transferred from the data source to a first and second primary storage medium using a first and a second data stream respectively. The media agent then causes the first and second portion of data to be transferred from the first and second storage medium to a third auxiliary storage medium using a third combined data stream. Auxiliary copying is performed in chunks and multiple streams are copied in parallel.

One aspect of the invention is a method for transferring data in a library storage system. The library storage system comprises a management server. A media agent is connected to the management server. A plurality of storage media are connected to the media agent and a data source is connected to the media agent. The method comprises dividing the data source into at least a first and a second portion of data. The method further comprises transferring the first and second portion of data from the data source to a first and second storage medium using a first and a second data stream respectively. The method still further comprises transferring the first and second portion of data from the first and second storage medium to a third storage medium using a third combined data stream.

Another aspect of the invention is a system for transferring data. The system comprises a data source, a media agent connected to the data source and a management server connected to the media agent. The system further comprises at least a first, second, and third storage medium connected to the media agent. The data source is divided into at least a first and a second portion of data. The media agent transfers the first and the second portion of data from the data source to the first and second storage medium using a first and second data stream respectively. The media agent transfers the first and second portion of data from the first and second storage medium to the third medium using a third combined data stream.

Still another aspect of the invention is a recording medium in a storage system with data stored thereon. The storage system comprises a management server, a media agent connected to the management server, a plurality of storage media connected to the media agent, and a data source connected to the media agent. The data is produced by splitting data source into at least a first and a second portion; transferring the first portion to a first storage medium using a first stream; transferring the second portion to a second storage medium using a second stream; and transferring the first and second portion of data from the first and second storage medium to a third storage medium using a third data stream.

Yet still another aspect of the invention is a method for transferring data in a storage system. The storage system comprises a management server, a media agent connected to the management server, a plurality of storage media connected to the media agent, and a data source connected to the media agent. The method comprises dividing the data source into at least a first and a second portion of data. The method further comprises transferring the first and second portion of data from the data source to a first number of pieces of storage media. The method further comprises transferring the first and second portion of data from the first number of pieces of storage media to a second number of pieces of storage media, the second number being less than the first number.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing the operation of a library storage system in accordance with the prior art.

FIG. 2 is a block diagram showing the operation of a library storage system in accordance with the invention.

FIG. 3 is a flow chart detailing some of the operations of an embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

The efficiency of data storage management systems is increased in the invention by providing a system and method that combines data streams of one or more storage policies during an auxiliary copying operation. Combining data streams generally denotes copying or backing-up archive files associated with different streams, onto a single or a fewer number of streams, thereby minimizing the number of media required for an auxiliary copy operation and consequently reducing the number of mount/unmount times necessary.

Combining streams may be enabled by allowing a plurality of applications in a source to be copied to point to a single storage policy. Referring to FIG. 2, there is shown a library storage system 200 in accordance with the invention. Data from a plurality of applications (e.g., EXCHANGE, WORD, EXCEL, etc.) from a source 25 is controlled via media agent 27 according to storage characteristics specified by a storage policy 19 in a management server 23. The data are each copied to a respective medium, such as a hard drive, tape, etc., through streams 54, 56, and 57. For example, for a primary copy or storage of three applications, the data from each of the applications will be saved to tapes 58, 60, 62, 64, 66 and 68 through streams 54, 56, and 57, respectively, thereby requiring at least three tapes for the copy operation, i.e., tapes 58, 62, and 66. During the auxiliary copy operation which combines streams, the data on tapes 58, 62 and 66 are combined onto fewer media, i.e., tape 70 and, only if needed, tape 72. This may be accomplished, for example, by a storage policy pointing to either the same or another media library for storing the auxiliary copies. Thus, the primary copy operation requires three tapes, but the auxiliary copy is reduced to one tape, assuming the capacity of one tape is sufficient to hold the data from the three applications.
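The media savings can be illustrated with a short calculation. The sketch below assumes hypothetical data sizes and a hypothetical tape capacity; the function names are illustrative only and are not part of the described system.

```python
import math


def media_needed_per_stream(stream_sizes, tape_capacity):
    """Each stream writes to its own media, so partly filled tapes are not shared."""
    return sum(math.ceil(size / tape_capacity) for size in stream_sizes if size > 0)


def media_needed_combined(stream_sizes, tape_capacity):
    """All streams are written through one combined stream onto shared media."""
    return math.ceil(sum(stream_sizes) / tape_capacity)


sizes = [30, 25, 40]  # hypothetical data volumes for streams 54, 56 and 57
capacity = 100        # hypothetical capacity of one tape, same units

print(media_needed_per_stream(sizes, capacity))  # 3
print(media_needed_combined(sizes, capacity))    # 1
```

With these assumed numbers the stream-by-stream copy occupies three tapes while the combined copy fits on one, matching the three-tape versus one-tape example in the text.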

The archive files may be given an application identification, e.g., appId (sub-client), and the files are copied by default in ascending order according to the appId in order to minimize the impact on restore speeds. Alternately, combining streams can be done stream-by-stream for the fastest copying times as discussed below.

In addition to combining data streams, data is copied to the auxiliary media 70, 72 in a logical order, such as in the order of the primary copy according to the date the primary archive files were created. In this way, the archive files of a copy may be copied together in a single medium, allowing users to more easily determine which medium contains a particular copy.

Combining streams helps in media recycling. For instance, assume that there are several primary copies on four media that correspond to four streams and archive pruning has pruned all but one copy. The remaining copy may still hold up the four media. If the primary copy is copied, job-by-job, into one stream to an auxiliary copy, the primary copy will be copied onto one medium and the other three primary media are then recyclable.

The option of combining data streams may be selected or specified as an optional copy method at the time a particular storage policy is created or defined. The combined data stream copy method may be applied to either synchronous data replication or selective data replication. Primary copies requiring multiple streams will generally not be copied to a medium with copies using combined streams. A copy made pursuant to a storage policy that combines streams generally cannot be changed into a copy that doesn't combine streams and vice versa.

A storage policy that combines streams includes a property, which may be selected or defined, that may be used for specifying the order for which data will be copied to media, e.g., a copy order. By default the copy order is in the order of application and job (explained below). This enhances efficiency with respect to restoring data from the copy. Alternately, the user may specify that the data be copied in the order of the stream number which is more efficient but yields a high penalty for restores.

The “order of application and job” technique works as follows. All copy jobs within a given instance/copySet are copied together, e.g., all jobs selected for each client, appType, and instance/copySet. The jobs, e.g., the archive files, are then copied in the ascending order of their archive file Identification (“Ids”). Once a job copy is started, all the job's archive files are copied together, even if those numbers are higher than other archive file Ids.

For example, assume a copy set has two sub-clients with the following archive files: (1) Sub-client 1: Job 1, archive file 1 (“AF1”) created 2:00 pm, (2) Sub-client 2: Job 2, archive file 2 (“AF2”) created 3:00 pm, (3) Sub-client 1: Job 1, archive file 3 (“AF3”) created 4:00 pm (resumed and finished, or a multi-archive file job like an EXCHANGE database), and (4) Sub-client 1: Job 3, archive file 4 (“AF4”) created 5:00 pm. The copy order is AF1, AF3 (to finish the archive files of the job being copied), AF2, and AF4.
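The ordering rule in this example can be sketched in a few lines. This is a simplified illustration for a single copy set; the `ArchiveFile` record and the function name are hypothetical, and the grouping by client, appType, and instance/copySet is omitted.

```python
from collections import namedtuple

# Hypothetical record; the field names are illustrative only.
ArchiveFile = namedtuple("ArchiveFile", ["af_id", "job_id", "sub_client"])


def order_by_application_and_job(archive_files):
    """Return archive file Ids in "order of application and job".

    Archive files are visited in ascending Id order, but once a job is
    started all of its archive files are copied together.
    """
    by_job = {}
    for af in archive_files:
        by_job.setdefault(af.job_id, []).append(af)

    copied_jobs = set()
    order = []
    for af in sorted(archive_files, key=lambda a: a.af_id):
        if af.job_id in copied_jobs:
            continue  # already copied together with its job
        copied_jobs.add(af.job_id)
        order.extend(sorted(by_job[af.job_id], key=lambda a: a.af_id))
    return [af.af_id for af in order]


files = [
    ArchiveFile(1, job_id=1, sub_client=1),  # AF1
    ArchiveFile(2, job_id=2, sub_client=2),  # AF2
    ArchiveFile(3, job_id=1, sub_client=1),  # AF3, same job as AF1
    ArchiveFile(4, job_id=3, sub_client=1),  # AF4
]
print(order_by_application_and_job(files))  # [1, 3, 2, 4]
```

The result reproduces the order AF1, AF3, AF2, AF4: AF3 is pulled forward because it belongs to the job that AF1 started.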

When a property or feature of the primary copy is changed or modified, the copy order of each auxiliary copy that combines streams may also be changed. For example, if the primary copy was copied on a non-magnetic medium and now will be copied on a magnetic medium, the copy order will automatically be set to in order of application and job for all secondary copies.

Otherwise, there will be generally no change in the copy order for the secondary copies. After the primary copy has been changed, the former primary copy by default will not combine data streams.

During the creation of a storage policy for a nonmagnetic media group or drive, the graphic user interface (“GUI”) includes a form element, e.g., check box, that allows the user to select the combine data stream option. The option is preferably checked OFF by default. Users can select the option by selecting or turning the feature ON in the Copy Policy interface screen in order to enable the combine data stream option.

If the combine data stream option is selected, the copy order property will be enabled which allows the user to select from one of two choices: in order of stream number and in order of application and job. For a storage policy or policies for auxiliary copies whose primary copy is saved or to be saved on magnetic media or drives, where the combine data stream option is selected, the copy order is preferably in order of application and job by default. Otherwise, the default copy order is in order of application and job. The copy order can be changed from one to the other at any time.

The GUI may display a message, such as a popup message, in the following situations:

- Where the primary copy is stored or to be stored on non-magnetic media or drives, if the user selects the combine data stream option or changes the copy order, the GUI warns the user about a higher amount of mount/unmount and tape seek activity during restores that will occur if the combine stream option is in order of stream number or during auxiliary copies if the option is in order of application and job.
- If the user tries to point a SQL or DB2 sub-client to a storage policy that has a copy with the combine data stream option selected, the GUI warns the user that the multi-stream SQL or DB2 copies will not be copied using combined streams.
- If a storage policy is pointed to by a SQL or DB2 sub-client and the user tries to create a new copy with the combine data stream option selected or tries to select the combine data stream option for an existing copy, the GUI warns the user that the multi-stream SQL or DB2 copies will not be copied to an existing copy using combined streams.

An archive manager is a computer program or instruction that manages archive operations, such as creating and updating a storage policy, and retrieving data related to a storage policy. The archive manager may be implemented as an application or module and resides on a reference storage manager or media agent. An archive manager is preferably embodied in an ArchiveManagerCS class that is implemented as an Application Program Interface (API). The class further interfaces with at least one database or table which preferably includes the details of storage policies, such as the copy name, data stream, media group, combine data stream properties, etc. The database or table includes values such as streamNum and flags, which indicate the selection of the combine data stream option. Additionally, the database or table may be accessed by other object classes, which may use the relevant data contained therein.

The stream number of an archive file copy is passed to a createCopy( ) method included in CVArchive. Additionally, an AuxCopyMgr process sends the stream number of an archive file copy to a remote auxCopy process in a CVA_COPYAFILE_REQ message.

All copies associated with a storage policy have the same number of streams, e.g., the maximum number of streams, of the storage policy. This does not mean that a library for each copy has to have the same number of drives. A primary copy needs enough drives to support multi-stream copy. An auxiliary copy that combines data streams actually needs only one drive for auxiliary copying and for data restoration. Consequently, the associated library can be a stand-alone drive. In order to take advantage of stream consolidation, users that select the combine stream option are allowed to create a storage policy pointing to a storage library with fewer auxiliary drives than copies.

Backup and synthetic full backups are allowed, which include a backup process writing the streamNum related to a storage policy into the archFileCopy table rather than archFile table when each archive file is created. The archive manager preferably handles this process.

A file system-like restore (involving indexing) includes one or more sub-clients. The sub-client restorations may be performed serially, one at a time, in an arbitrary order or based on archive file location. For example, for each sub-client restore, archive files may be restored chronologically, such as in the order that the files were created. Alternatively or in addition, files may be restored according to their offsets, such as in ascending order of offset within each archive file. Offset refers to the distance from a starting point, e.g., the start of a file. Movement within an archive file typically corresponds with higher physical offsets from the beginning of the archive file.
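One way to picture this restore ordering is a two-level sort: archive files in creation order, then files in ascending offset within each archive file. The sketch below assumes a hypothetical tuple layout and file names; it is not taken from the described software.

```python
def restore_order(items):
    """Order files for restore.

    items: (archive_file_created, offset, name) tuples, where
    archive_file_created is any sortable creation stamp of the archive
    file and offset is the file's distance from the archive file start.
    """
    ordered = sorted(items, key=lambda item: (item[0], item[1]))
    return [name for _, _, name in ordered]


files = [
    (2, 500, "c.txt"),
    (1, 900, "b.txt"),
    (1, 100, "a.txt"),
    (2, 0, "d.txt"),
]
print(restore_order(files))  # ['a.txt', 'b.txt', 'd.txt', 'c.txt']
```

Reading in ascending offsets means the reader only moves forward within each archive file, which is the property the paragraph above relies on.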

The archive files in a secondary or auxiliary copy that are created by combining data streams are by default ordered as required for restoration. Restore efficiency could therefore be better with the auxiliary copy than with the primary copy. With respect to combining data stream-by-stream, the order of the archive files on media holding an auxiliary copy may not agree with the order in which the primary copies were created, which may require backwards tape movement during the restore. Backwards tape movement, i.e., the need to rewind, may be correctly handled by programming, such as by DATAMOVER software by GALAXY, during data restoration. Backward movement, however, has a negative impact on performance. A multi-stream ORACLE or INFORMIX copy can be restored from a single stream. However, backwards tape movement during the restore may occur.

It is preferred that a copy involving multiple streams will not be copied to a copy medium that combines streams. Single stream copies may be copied to a copy medium that combines streams.

Referring to FIG. 3, there is shown a summary of the operations of the invention with respect to combining streams. At step S102, the storage policy is queried or the user is asked whether the combine streams option should be enabled for the copy. If the user answers no or the storage policy indicates no, control branches to step S112 and copying is performed as in the prior art. Otherwise, control branches to step S106, and the system determines whether the streams can be combined. For example, an auxiliary copy of SQL data should use the same number of streams as the primary copy. If the streams cannot be combined, control branches to step S104, the user is informed that the streams cannot be combined, and copying is performed as in the prior art in step S112.

If the streams can be combined, control branches to step S108 and the data is backed up to the primary storage media using a desired number of streams. Thereafter, control branches to step S110 where the auxiliary copy is performed combining data streams.
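The branching of FIG. 3 can be summarized in a small sketch. The function below only returns the sequence of step labels visited; the function name and its boolean inputs are hypothetical stand-ins for the storage policy query and the combinability check.

```python
def plan_copy(combine_requested, streams_combinable):
    """Return the FIG. 3 steps visited for the given decisions."""
    steps = ["S102"]  # query the storage policy or ask the user
    if not combine_requested:
        steps.append("S112")  # copy as in the prior art
        return steps
    steps.append("S106")  # determine whether the streams can be combined
    if not streams_combinable:
        steps += ["S104", "S112"]  # inform the user, then copy as in the prior art
        return steps
    steps += ["S108", "S110"]  # primary backup, then combined auxiliary copy
    return steps


print(plan_copy(True, True))    # ['S102', 'S106', 'S108', 'S110']
print(plan_copy(True, False))   # ['S102', 'S106', 'S104', 'S112']
print(plan_copy(False, True))   # ['S102', 'S112']
```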

Copy Restartable at Chunk

As stated above, in prior art auxiliary copying systems, auxiliary file copying restarts from a first chunk if the auxiliary copying was interrupted. This means if the copying operation is stopped in the middle, all copied chunks need to be copied again.

In the invention, auxiliary copying is performed such that data chunks of an archive file that have been successfully copied to a copy medium are not discarded and the copy operation resumes copying where the previous copying left off; auxiliary copy operations are restartable at a chunk, as opposed to restartable at an archive file. Copying that is restartable at a chunk may be achieved with an API which calls a class that includes methods that do not delete the copied chunks. For example, the createArchFileCopy( ) method in the ArchiveManagerCS class may include an instruction so that the successfully copied chunks are not deleted. A method may further be included to retrieve the last chunk copied for each archive file to be copied, such as a getToBeCopiedAfilesByCopy( ) method in the ArchiveManagerCS class. Additionally, new fields may be added into the CVA_COPYAFILE_REQ message, such as archFileSeqNum, startChunkNum, startLogicalOffset and startPhysicalOffset fields.

The process for restarting a copy at chunks includes the AuxCopyMgr process checking if the archive file to be copied has chunks that were successfully copied to the copy media. If chunks have been copied successfully, the AuxCopyMgr process retrieves variables archFileSeqNum, startChunkNum, startLogicalOffset and startPhysicalOffset for the archive file and sends them to the AuxCopy process in the CVA_COPYAFILE_REQ message. For each stream of the destination copy, the AuxCopyMgr process starts copying from the archive file that has chunks that were successfully copied. The AuxCopy process calls CVArchive::createCopy( ) using the parameters archFileSeqNum, startChunkNum, startLogicalOffset and startPhysicalOffset. This allows AuxCopy to start writing or copying from the correct chunk and offset. The AuxCopy process may also call DataMover::Seek( ) with startPhysicalOffset as one of the input parameters to find the starting chunk and offset before the first DataMover::Read( ) call.

Additionally, the CVArchive::createCopy( ) API, which is used by AuxCopy, includes input parameters archFileSeqNum, startChunkNum, startLogicalOffset and startPhysicalOffset. When startChunkNum>1, the API does not send a CVA_CREATEAFILECOPY_REQ message to commServer for creating an archFileCopy entry since there is one already. The API also uses the parameters passed in to it to call Pipelayer::create( ).
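The restart-at-chunk idea can be sketched independently of the classes named above. The following is a simplified model, not the CVArchive or AuxCopy code: the function `copy_archive_file`, the list-based destination, and the `fail_at` parameter used to simulate an interruption are all hypothetical.

```python
def copy_archive_file(chunks, destination, fail_at=None):
    """Copy chunks to destination, resuming after the last chunk already there.

    destination: list of chunks copied so far; it is kept across restarts
    rather than discarded.
    fail_at: hypothetical 1-based chunk number at which an interruption
    is simulated.
    Returns the number of chunks copied in this attempt.
    """
    start_chunk_num = len(destination) + 1  # 1-based, in the spirit of startChunkNum
    copied = 0
    for chunk_num in range(start_chunk_num, len(chunks) + 1):
        if chunk_num == fail_at:
            raise ConnectionError(f"interrupted before chunk {chunk_num}")
        destination.append(chunks[chunk_num - 1])
        copied += 1
    return copied


chunks = ["c1", "c2", "c3", "c4", "c5"]
dest = []
try:
    copy_archive_file(chunks, dest, fail_at=4)
except ConnectionError:
    pass  # the three chunks already copied are retained

print(dest)                             # ['c1', 'c2', 'c3']
print(copy_archive_file(chunks, dest))  # 2
print(dest == chunks)                   # True
```

The second attempt copies only the two missing chunks, whereas a restart at the archive file level would have copied all five again.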

Multi-Stream Auxiliary Copy

In another aspect of the invention, methods and systems are provided which allow multi-stream auxiliary copying. In the prior art, auxiliary copying is performed one stream at a time no matter how many streams are used during a copy. The amount of time for copying a copy job is therefore proportional to the number of streams used during a copy. This is referred to as single-stream Auxiliary Copying.

In the invention, multi-stream Auxiliary Copying refers to performing auxiliary copies for a plurality of streams in parallel. This may be accomplished by providing a sufficient number of drives so that each stream may copy to at least one drive, thereby reducing the time necessary for auxiliary copies involving multiple streams. For example, in an instance where two drives are required for each stream (e.g., one source and one destination), the number of streams that can be copied at the same time is half of the number of available drives. If six streams were used for copy jobs, an auxiliary copy job can copy archive files for three streams at a time if there are six drives available, and can copy archive files for six streams at a time if there are twelve drives available, etc.
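The drive arithmetic in this example can be written out directly. The sketch assumes, as in the example, two drives per stream (one source and one destination); the function names are illustrative.

```python
import math


def parallel_streams(available_drives, streams_used, drives_per_stream=2):
    """Number of streams that can be copied at the same time."""
    return min(streams_used, available_drives // drives_per_stream)


def copy_rounds(available_drives, streams_used, drives_per_stream=2):
    """Number of sequential rounds needed to copy every stream."""
    in_parallel = parallel_streams(available_drives, streams_used, drives_per_stream)
    if in_parallel == 0:
        raise ValueError("not enough drives to copy even one stream")
    return math.ceil(streams_used / in_parallel)


print(parallel_streams(6, 6))   # 3
print(parallel_streams(12, 6))  # 6
print(copy_rounds(6, 6))        # 2
```

With six drives the six streams are copied in two rounds of three; with twelve drives all six streams are copied in a single round.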

The process for multi-stream auxiliary copying includes the AuxCopyMgr process reserving more than one stream for the same destination copy or for multiple destination copies at the same time. One stream is assigned to one destination copy at a time. If the AuxCopyMgr process has not reserved enough streams, the process will keep trying if some streams are temporarily not available. When a copy is done with a stream for a destination copy, the AuxCopyMgr first releases the stream and then tries to reserve the next stream (the copy can be different). The AuxCopy process is able to run more than one worker thread that copies an archive file for a stream, and each thread uses its own pipeline. When an Auxiliary Copy job is interrupted, stopped, or cancelled, the AuxCopy process stops all the worker threads and exits, and the AuxCopyMgr process releases all the streams and exits. If an AuxCopy process fails to copy for one stream, the worker thread reports the failure to the AuxCopyMgr process and exits. The AuxCopy process continues to run until no worker thread is running or it is stopped by AuxCopyMgr. Depending on the nature of the failure, the AuxCopyMgr process decides whether it is necessary to stop copying archive files for all streams of a copy or stop copying archive files for all copies.
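A rough sketch of the worker-thread arrangement, using Python's standard thread pool, is given below. It is a simplified model under stated assumptions: the pool size stands in for the number of streams that could be reserved, the copy itself is simulated, a `None` entry is a hypothetical marker for an archive file that cannot be copied, and stream reservation and release are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_stream(stream_num, archive_files):
    """Worker: copy all archive files of one stream (simulated)."""
    copied = []
    for name in archive_files:
        if name is None:  # hypothetical marker for an unreadable archive file
            raise IOError(f"stream {stream_num}: copy failed")
        copied.append(name)
    return copied


def multi_stream_aux_copy(streams, max_parallel):
    """Copy several streams in parallel and collect per-stream outcomes.

    streams: dict mapping stream number to its list of archive files.
    Returns (results, failures); a failing worker reports its error and
    the remaining workers keep running.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {num: pool.submit(copy_stream, num, files)
                   for num, files in streams.items()}
        for num, future in futures.items():
            try:
                results[num] = future.result()
            except IOError as exc:
                failures[num] = str(exc)
    return results, failures


streams = {1: ["AF1", "AF3"], 2: ["AF2"], 3: [None]}
results, failures = multi_stream_aux_copy(streams, max_parallel=2)
print(results)   # {1: ['AF1', 'AF3'], 2: ['AF2']}
print(failures)  # {3: 'stream 3: copy failed'}
```

In this sketch a failure on one stream is reported back to the caller, which can then decide whether to continue with the other streams, loosely mirroring the decision left to AuxCopyMgr.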

Thus, by combining streams in auxiliary copying, auxiliary copy operations are optimized. By allowing auxiliary copies to be performed by chunk, auxiliary copying may be performed more efficiently even if the copying is interrupted. By allowing for multiple stream auxiliary copies, auxiliary copying may be performed even more quickly than that available in the prior art.

Although the invention has been described in connection with the GALAXY data management system by way of example, it is understood that the disclosure may be applied to other data management systems, and references to the GALAXY system should therefore not be viewed as limitations.

While the invention has been described and illustrated in connection with preferred embodiments, many variations and modifications as will be evident to those skilled in this art may be made without departing from the spirit and scope of the invention, and the invention is thus not to be limited to the precise details of methodology or construction set forth above, as such variations and modifications are intended to be included within the scope of the invention.

Claims

1. A method for transferring data in a multi-tiered storage system, the method comprising: performing a first backup copy of data stored in a data source, wherein said performing of the first backup copy further comprises: dividing the data in the data source into at least a first portion of data and a second portion of data, the data comprising multiple file types, and transferring the first and second portions of data from the data source to a first storage medium and a second storage medium using a first data stream and a second data stream respectively to create the first backup copy of the data; identifying the multiple file types of data in the first and second portions of data; determining based at least upon the file types if the first portion of data and the second portion of data in the first backup copy can be combined; if the first portion of data and the second portion of data can be combined, performing a second backup copy of the first and second portions of data, wherein the second backup copy saves the first and second portions of data in a combined format, wherein the performing of the second backup copy comprises: transferring the first and second portions of the first backup copy of the data from the first and second storage mediums to a third storage medium by combining data streams from the first and second storage mediums, and storing on the third storage medium, the additional copies of the data by storing in a combined format, the first and second portions of the first backup copy to create the second backup copy; and restoring the first portion of data to the data source by retrieving the first portion of data from the combined format of the second backup copy.
2. The method as recited in claim 1, wherein the transfer from the first and second storage medium to the third storage medium is performed in chunks.
3. The method as recited in claim 1, wherein the transfer using the third data stream is performed based on a client identification of the first and second portion of data.
4. The method as recited in claim 1, wherein the transfer using the third data stream is performed based on respective stream numbers of the first and second streams.
5. A recording medium in a storage system with data stored thereon, the data produced by:
  copying data from a data source to a plurality of storage media, wherein said copying comprises splitting the data source data into at least a first and a second portion,
    transferring the first portion to a first storage medium using a first stream,
    transferring the second portion to a second storage medium using a second stream;
  identifying file types of data in the first and second portions of data;
  determining, based upon the file types, whether or not the first portion and the second portion are combinable into one or more data streams; and
  transferring the first and second portion of data from the first and second storage medium to a third storage medium using a third combined data stream to create additional copies of the first and second portions of data wherein the additional copies store the first and second portions of data in a combined format; and
  restoring the first portion of data by retrieving the first portion of data from the combined format of the additional copies stored in the third storage medium.
6. A method for transferring data in a storage system, the method comprising:
  dividing a data source into at least a first and a second portion of data;
  transferring the first and second portion of data from the data source to a first number of pieces of storage media;
  accessing user input regarding whether the first and second portions of data should be combined;
  determining if the first portion of data and the second portion of data are combinable based upon file types contained in the first and second portions of data; and
  transferring the first and second portion of data from the first number of pieces of storage media to a second number of pieces of storage media, the second number being less than the first number to create additional copies of the first and second portions of data wherein the additional copies store the first and second portions of data in a combined format; and
  restoring the first portion of data by retrieving the first portion of data from the combined format of the additional copies stored in the second number of pieces of storage media.
7. The method of claim 1, additionally comprising providing a user notification if the first portion of data and the second portion of data cannot be combined.
8. The method of claim 1, wherein the first portion of data is associated with a first application and the second portion of data is associated with a second application.
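
The flow recited in claim 1 (a first backup over two streams, a file-type check, a combined second copy, and a restore from the combined format) can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions, not as the claimed implementation: the names Chunk, Medium, COMBINABLE_TYPES, first_backup, can_combine, auxiliary_copy, and restore are hypothetical, the round-robin division of the source is an arbitrary choice, and the rule that certain file types may share a medium is assumed for the example.

```python
from dataclasses import dataclass, field
from itertools import zip_longest

# Assumption for illustration: file types considered safe to store together.
COMBINABLE_TYPES = {"log", "db", "doc"}


@dataclass
class Chunk:
    stream_id: int   # number of the data stream that wrote this chunk
    file_type: str
    payload: bytes


@dataclass
class Medium:
    name: str
    chunks: list = field(default_factory=list)


def first_backup(files, media):
    """Divide the source round-robin and write each portion on its own stream."""
    for index, (file_type, payload) in enumerate(files):
        stream_id = index % len(media)
        media[stream_id].chunks.append(Chunk(stream_id, file_type, payload))


def can_combine(media):
    """Decide combinability from the file types found in every portion."""
    return all(c.file_type in COMBINABLE_TYPES for m in media for c in m.chunks)


def auxiliary_copy(sources, target):
    """Interleave chunks from the source media onto one target medium."""
    for group in zip_longest(*(m.chunks for m in sources)):
        target.chunks.extend(c for c in group if c is not None)


def restore(target, stream_id):
    """Recover one portion from the combined format by its stream number."""
    return [c.payload for c in target.chunks if c.stream_id == stream_id]


files = [("db", b"store-1"), ("log", b"log-1"),
         ("db", b"store-2"), ("log", b"log-2")]
first, second, third = Medium("medium-1"), Medium("medium-2"), Medium("medium-3")

first_backup(files, [first, second])
if can_combine([first, second]):
    auxiliary_copy([first, second], third)

print(restore(third, 0))  # payloads originally written by the first stream
```

Tagging each chunk with its stream number is what lets the first portion be retrieved from the combined copy without reading the original media, which mirrors the restoring step of claim 1 and the stream-number basis mentioned in claim 4.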

RELATED APPLICATIONS

This application claims priority to provisional application No. 60/411,202 filed Sep. 16, 2002, the entirety of which is hereby incorporated by reference. This application is related to the following pending applications, each of which is hereby incorporated herein by reference in its entirety:
application Ser. No. 09/610,738, titled MODULAR BACKUP AND RETRIEVAL SYSTEM USED IN CONJUNCTION WITH A STORAGE AREA NETWORK, filed Jul. 6, 2000;
application Ser. No. 09/609,977, titled MODULAR BACKUP AND RETRIEVAL SYSTEM WITH AN INTEGRATED STORAGE AREA FILING SYSTEM, filed Jul. 5, 2000;
application Ser. No. 09/354,058, titled HIERARCHICAL BACKUP AND RETRIEVAL SYSTEM, filed Jul. 15, 1999;
application Ser. No. 09/774,268, titled LOGICAL VIEW AND ACCESS TO PHYSICAL STORAGE IN MODULAR DATA AND STORAGE MANAGEMENT SYSTEM, filed Jan. 30, 2001;
application Ser. No. 09/038,440, titled HIGH-SPEED DATA TRANSFER MECHANISM, filed Mar. 11, 1998; and
application Ser. No. 10/303,640, titled SELECTIVE DATA REPLICATION SYSTEM AND METHOD, filed Nov. 25, 2002.

Related Publications (1)

	Number	Date	Country
	20040225834 A1	Nov 2004	US

Provisional Applications (1)

	Number	Date	Country
	60411202	Sep 2002	US
