Archiving data objects using secondary copies

Description

BACKGROUND

A primary copy of data is generally a production copy or other “live” version of the data which is used by a software application and is generally in the native format of that application. Primary copy data may be maintained in a local memory or other high-speed storage device that allows for relatively fast data access if necessary. Such primary copy data is typically intended for short term retention (e.g., several hours or days) before some or all of the data is stored as one or more secondary copies, for example, to prevent loss of data in the event a problem occurs with the data stored in primary storage.

To protect primary copy data or for other purposes, such as regulatory compliance, secondary copies (alternatively referred to as “data protection copies”) can be made. Examples of secondary copies include a backup copy, a snapshot copy, a hierarchical storage management (“HSM”) copy, an archive copy, and other types of copies.

A backup copy is generally a point-in-time copy of the primary copy data stored in a backup format as opposed to in native application format. For example, a backup copy may be stored in a backup format that is optimized for compression and efficient long-term storage. Backup copies generally have relatively long retention periods and may be stored on media with slower retrieval times than other types of secondary copies and media. In some cases, backup copies may be stored at an offsite location.

After an initial, full backup of a data set is performed, periodic, intermittent, or continuous incremental backup operations may be subsequently performed on the data set. Each incremental backup operation copies only the primary copy data that has changed since the last full or incremental backup of the data set was performed. In this way, even if the entire set of primary copy data that is backed up is large, the amount of data that must be transferred during each incremental backup operation may be significantly smaller, since only the changed data needs to be transferred to secondary storage. One or more full backup copies and subsequent incremental copies may be combined to periodically or intermittently create a synthetic full backup copy. More details regarding synthetic storage operations are found in commonly-assigned U.S. patent application Ser. No. 12/510,059, entitled “Snapshot Storage and Management System with Indexing and User Interface,” filed Jul. 27, 2009, now U.S. Pat. No. 7,873,806, which is hereby incorporated herein in its entirety.

An archive copy is generally a copy of the primary copy data, but typically includes only a subset of the primary copy data that meets certain criteria and is usually stored in a format other than the native application format. For example, an archive copy might include only that data from the primary copy that is larger than a given size threshold or older than a given age threshold and that is stored in a backup format. Often, archive data is removed from the primary copy, and a stub is stored in the primary copy to indicate its new location. When a user requests access to the archive data that has been removed or migrated, systems use the stub to locate the data and often make recovery of the data appear transparent, even though the archive data may be stored at a location different from the remaining primary copy data.

Archive copies are typically created and tracked independently of other secondary copies, such as backup copies. For example, to create a backup copy, the data storage system transfers a secondary copy of primary copy data to secondary storage and tracks the backup copy using a backup index separate from the archive index. To create an archive copy, a conventional data storage system transfers the primary copy data to be archived to secondary storage to create an archive copy, replaces the primary copy data with a stub, and tracks the archive copy using an archive index. Accordingly, the data storage system will transfer a primary copy data object that is both archived and backed up to secondary storage two separate times.

Since each transfer consumes network and computing resources, the data storage system may not be able to devote such resources to other tasks. Moreover, the data storage system is required to devote resources to maintaining each separate index. In some cases, the archive index may be unaware of the other secondary copy and the other secondary index may be unaware of the archive copy, which may lead to further inefficiencies. Moreover, in some cases, when an archive copy is moved or transferred (e.g., to another tier of secondary storage), it may not be possible to update the archive index to reflect the move or transfer. In such cases, the data storage system may be unable to use the stub to locate the archived data object.

Also, in conventional systems, archiving operations may require the transfer of large quantities of data during a single archive operation. For example, the retention criteria for an organization may specify that data objects more than two years old should be archived. On the first day of the organization's operation, it may be entirely unnecessary to archive any data, since the only data that exists at that point is newly created and thus ineligible for archiving. However, over the course of two years of operations, the organization may amass large quantities of data. Thus, when the first archive operation finally occurs, e.g., approximately two years into the operation of the organization, it may be necessary to transfer a large amount of the organization's data.

Additionally, backup, archive, and other secondary storage operations may unnecessarily preserve secondary copies of data created from primary data that has been deleted or is otherwise no longer being actively used as production data by a computing system, such as a workstation or server. Thus, secondary storage requirements may increasingly and unnecessarily bloat over time.

The need exists for systems and methods that overcome the above problems, as well as systems and methods that provide additional benefits. Overall, the examples herein of some prior or related systems and methods and their associated limitations are intended to be illustrative and not exclusive. Other limitations of existing or prior systems and methods will become apparent to those of skill in the art upon reading the following Detailed Description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an environment in which a system for archiving data objects using secondary copies operates.

FIG. 2 is a flow diagram illustrating a process implemented by the system in connection with archiving data objects using secondary copies.

FIG. 3 is a flow diagram illustrating a process implemented by the system in connection with reclaiming space used to store secondary copies.

FIGS. 4A-4C are data structure diagrams illustrating data structures used by the system.

FIG. 5 is a block diagram illustrating a data storage system in which the system operates.

DETAILED DESCRIPTION

The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the disclosure.

Overview

A software, firmware, and/or hardware system for archiving data objects using secondary copies (the “system”) is disclosed. The system creates one or more secondary copies of primary copy data (e.g., production data stored by a production computing system). The primary copy data contains multiple data objects (e.g., multiple files, emails, or other logical groupings or collections of data). The system maintains a first data structure that tracks the data objects for which the system has created secondary copies and the locations of the secondary copies.

To archive data objects in the primary copy data, the system applies rules to determine which data objects are to be archived. The system then verifies that previously-created secondary copies of data objects to be archived exist and replaces the data objects with stubs, pointers or logical addresses. The system maintains a second data structure that both tracks the stubs and refers to the first data structure, thereby creating an association between the stubs and the locations of the secondary copies. Notably, the system archives data objects without creating an additional or other secondary copy of the data objects. Instead, the association between the two data structures allows stubs to point to or refer to the previously-created secondary copy of the data objects. Accordingly, the existence of the previously-created secondary copy of the data objects allows the system to forego creating an additional or other secondary copy of the data objects, thereby saving resources.

The system may also perform a process to reclaim space used to store secondary copies. To do so, the system scans or analyzes the primary copy data to identify the data objects that exist in the primary copy data and stores the results of the scan or analysis in a third data structure. The system then compares the first and third data structures (e.g., the system performs a difference of the first and third data structures) to determine which data objects in the primary copy data have been deleted. For each deleted data object, the system updates the corresponding entry in the first data structure. Then the system accesses the first data structure and determines 1) which data objects in the primary copy data have not been deleted and 2) which have been deleted, but whose deletion occurred less than a predetermined period of time ago. For each data object determined in this fashion, the system then creates, from the first secondary copy of the data object, a second secondary copy of the data object. The system can then create a new first data structure or update the existing first data structure to reflect the second secondary copies of the data objects.

Various examples of the invention will now be described. The following description provides specific details for a thorough understanding and enabling description of these examples. One skilled in the relevant art will understand, however, that the invention may be practiced without many of these details. Likewise, one skilled in the relevant art will also understand that the invention may include many other obvious features not described in detail herein. Additionally, some well-known structures or functions may not be shown or described in detail below, so as to avoid unnecessarily obscuring the relevant description.

The terminology used below is to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific examples of the invention. Indeed, certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be overtly and specifically defined as such in this Detailed Description section.

Illustrative Environment

FIG. 1 is a block diagram illustrating an environment 100 in which the system may operate. The environment 100 includes one or more clients 130, one or more primary data stores 160, a secondary storage computing device 165 (alternatively referred to as a “media agent”), and one or more storage devices 115. Each of the clients 130 is a computing device, examples of which are described herein. Clients may be, as non-exclusive examples, servers, workstations, personal computers, computerized tablets, PDAs, smart phones, or other computers having social networking data, such as Facebook data. The clients 130 are each connected to one or more associated primary data stores 160 and to the secondary storage computing device 165. The secondary storage computing device 165 is connected to the storage device 115. The primary data stores 160 and storage device 115 may each be any type of storage suitable for storing data, such as Directly-Attached Storage (DAS) such as hard disks, a Storage Area Network (SAN), e.g., a Fibre Channel SAN, an iSCSI SAN or other type of SAN, Network-Attached Storage (NAS), a tape library, or any other type of storage. The clients 130 and the secondary storage computing device 165 typically include application software to perform desired operations and an operating system on which the application software runs. The clients 130 and the secondary storage computing device 165 typically also include a file system that facilitates and controls file access by the operating system and application software. The file system facilitates access to local and remote storage devices for file or data access and storage.

The clients 130, as part of their functioning, utilize data, which includes files, directories, metadata (e.g., ACLs, descriptive metadata, and any other streams associated with the data), and other data objects, which may be stored in the primary data store 160. The data of a client 130 is generally a primary copy (e.g., a production copy). Although described as a “client” of the secondary storage computing device 165, a client 130 may in fact be a production server, such as a file server or Exchange server, which provides live production data to multiple user workstations as part of its function. Each client 130 includes a data agent 195 (described in more detail with reference to FIG. 5). During a copy, backup, archive, or other storage operation, the data agents 195 send a copy of data objects in a primary data store 160 to the secondary storage computing device 165.

The secondary storage computing device 165 includes a memory 114. The memory 114 includes software 116 incorporating components 118 and data 119 typically used by the system. The components 118 include a secondary copy component 128 that performs secondary copy operations and a pruning component 129 that performs space reclamation or pruning operations. The data 119 includes secondary copy data structure 122, stubs data structure 124, and primary copy data structure 126. The system uses the data 119 to, among other things, track data objects copied during archive and other secondary copy operations and to track data objects in primary copy data.

While items 118 and 119 are illustrated as stored in memory 114, those skilled in the art will appreciate that these items, or portions of them, may be transferred between memory 114 and a persistent storage device 106 (for example, a magnetic hard drive, a tape of a tape library, etc.) for purposes of memory management, data integrity, and/or other purposes.

The secondary storage computing device 165 further includes one or more central processing units (CPU) 102 for executing software 116, and a computer-readable media drive 104 for reading information or installing software 116 from tangible computer-readable storage media, such as a floppy disk, a CD-ROM, a DVD, a USB flash drive, and/or other tangible computer-readable storage media. The secondary storage computing device 165 also includes one or more of the following: a network connection device 108 for connecting to a network, an information input device 110 (for example, a mouse, a keyboard, etc.), and an information output device 112 (for example, a display).

Illustrative Archiving Process and Data Structures

FIG. 2 is a flow diagram illustrating a process 200 implemented by the system in connection with archiving data objects using secondary copies in some examples. The process 200 begins at step 205, where the system creates a full secondary copy of the primary copy data of a client 130 by creating a secondary copy of the entire primary copy data and transferring the secondary copy to the storage device 115. The system may also create one or more incremental copies of the primary copy data by transferring only the primary copy data that has changed since the time of the full copy or a previous incremental copy. For example, the system may perform only a single full backup of all the primary copy data that is to be protected (as defined, for example, by a storage policy or other criteria) and store the full backup on the storage device 115. Thereafter, the system may create weekly, daily, periodic, intermittent or continuous incremental backup copies of only the primary copy data that has changed since the system performed the last backup operation. In such examples, the system may periodically use one or more of the full backup, incremental backups, and/or previous synthetic full backups to generate a new synthetic full backup copy via a synthetic full operation. As part of a synthetic full backup operation, the system may process data objects that have been deleted from the primary copy of the data and remove these data objects from the synthetic full copy. In some examples, the generation of a new synthetic full backup copy or other synthetic full operation requires reading one or more previous backup copies or other types of secondary copies, rehydrating or decompressing the previous secondary copy or copies, and re-deduplicating the previous secondary copy or copies. In other examples, the generation of a new synthetic full backup copy or other synthetic operation does not require reading, rehydrating, or re-deduplicating a previous backup or other secondary copy. Instead, reference counts may be updated and metadata may be added to the synthetic full copy.
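
As an illustration only, a synthetic full operation can be pictured as a merge of per-object references. The following is a minimal sketch, assuming each backup is modeled as a dictionary that maps a primary copy location to a secondary copy location. The function name `synthetic_full` and its `deleted` parameter are hypothetical and do not appear in the disclosure.

```python
def synthetic_full(full, incrementals, deleted=frozenset()):
    """Merge a full backup with later incrementals (ordered oldest first).

    Each backup is a dict: primary location -> secondary copy location.
    Later incrementals override earlier entries for the same object.
    Objects listed in `deleted` are left out of the synthetic full copy.
    """
    merged = dict(full)
    for incremental in incrementals:
        merged.update(incremental)
    return {path: loc for path, loc in merged.items() if path not in deleted}


# Hypothetical example data.
full = {"/docs/a.txt": "full:0", "/docs/b.txt": "full:1"}
incs = [{"/docs/a.txt": "inc1:0"}, {"/docs/c.txt": "inc2:0"}]
result = synthetic_full(full, incs, deleted={"/docs/b.txt"})
```

Because this sketch only rearranges references and never reads the underlying data, it most closely resembles the variant in which previous copies are not read, rehydrated, or re-deduplicated.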

At step 210 the system adds entries to the secondary copy data structure 122. FIG. 4A is a data structure diagram illustrating the secondary copy data structure 122. The secondary copy data structure 122 contains rows, such as rows 425a and 425b, each divided into the following columns: an ID column 405 containing an identifier of a data object (e.g., a globally unique identifier—GUID), a primary copy location column 410 containing the location of the primary copy of the data object, a secondary copy location column 415 containing the location of the secondary copy of the data object, and a deletion time column 420 containing a time stamp of when the primary copy of the data object was deleted. The secondary copy data structure 122 may also include other columns that may contain additional data about data objects.

Although absolute locations for the primary copy and the secondary copy are shown in FIG. 4A, the system may additionally or alternatively use relative locations to indicate the locations of data objects in the secondary copy data structure 122. For example, the system may store secondary copies of data objects using a logical archive file and specify a relative location within the logical archive file for a secondary copy location. As another example, the system may store secondary copies of data objects on tape and specify a tape and an offset within the tape for a secondary copy location. Those of skill in the art will understand that secondary copies can be stored using varied techniques and that the system is not limited to the techniques expressly illustrated or described in this disclosure.
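
The location variants described above could be represented, for illustration, as distinct record types. This is a minimal sketch under assumed names: `AbsoluteLocation`, `ArchiveFileLocation`, `TapeLocation`, and `describe` are all hypothetical and are not part of the disclosed system.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class AbsoluteLocation:
    path: str  # full path to the secondary copy


@dataclass(frozen=True)
class ArchiveFileLocation:
    archive_file_id: str  # which logical archive file
    offset: int           # relative position within that archive file


@dataclass(frozen=True)
class TapeLocation:
    tape_id: str  # which tape
    offset: int   # offset within the tape


SecondaryLocation = Union[AbsoluteLocation, ArchiveFileLocation, TapeLocation]


def describe(loc: SecondaryLocation) -> str:
    """Render any of the location variants as a short string."""
    if isinstance(loc, AbsoluteLocation):
        return loc.path
    if isinstance(loc, ArchiveFileLocation):
        return f"archive {loc.archive_file_id} @ {loc.offset}"
    return f"tape {loc.tape_id} @ {loc.offset}"
```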

Moreover, although FIG. 4A illustrates entries corresponding to files in the secondary copy data structure 122, the disclosed techniques may also be used with other types of data objects, such as emails and email attachments, database or spreadsheet objects, data blocks, and other data objects stored in other data repositories. Accordingly, the disclosure is not to be construed as limited solely to files.

The system may utilize a single secondary copy data structure 122 for each client 130 (or subclient thereof) or for each set of data subject to data protection operations, which may be the data of a single client 130 or the data of multiple clients 130. Additionally or alternatively, the system may use a single secondary copy data structure 122 for multiple clients 130 or for multiple sets of data subject to data protection operations, which may be the data of a single client 130 or the data of multiple clients 130. In such a case, the secondary copy data structure 122 may contain additional columns containing data that allows for differentiation of data associated with different clients 130 or different sets of data.

In adding entries for each new copy of a data object, the system adds a new row 425 to the secondary copy data structure 122. The system may generate the identifier for each secondary copy of a data object created and, in the new row 425, add the identifier to column 405, add the primary copy location of the data object to column 410, and add the secondary copy location to column 415. The system may also store additional data as part of step 210, such as in other columns of the secondary copy data structure 122 or in other data structures.
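
One way to picture step 210 is as appending a row to an in-memory table. The sketch below assumes a simple list of records whose fields mirror columns 405 through 420. The names `SecondaryCopyRow` and `add_entry`, and the use of a random UUID as the identifier, are assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class SecondaryCopyRow:
    object_id: str                             # column 405
    primary_location: str                      # column 410
    secondary_location: str                    # column 415
    deletion_time: Optional[datetime] = None   # column 420, empty until deletion


def add_entry(table: List[SecondaryCopyRow],
              primary_location: str,
              secondary_location: str) -> SecondaryCopyRow:
    """Generate an identifier and append a new row for a secondary copy."""
    row = SecondaryCopyRow(str(uuid.uuid4()), primary_location, secondary_location)
    table.append(row)
    return row


# Hypothetical example data.
table: List[SecondaryCopyRow] = []
row = add_entry(table, "C:/docs/report.doc", "archive-7/offset-0")
```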

Returning to FIG. 2, at step 215 the system identifies data objects in the primary copy data that are to be archived. For example, the system may apply one or more rules or criteria based on any combination of data object type, data object age, data object size, percentage of disk quota, remaining storage, metadata (e.g., a flag or tag indicating importance) and/or other factors. At step 220 the system verifies that a secondary copy of each data object has been made. To do so, the system may access the secondary copy data structure 122 to determine that secondary copies of the identified data objects exist. Also at step 220, the system obtains a token for each identified data object. The token represents confirmation or verification that a secondary copy of a data object was previously created, and is typically unique for each data object. At step 225, the system replaces each of the identified data objects in the primary copy data with a stub containing the token. The stub is typically a small data object that indicates, points to, or refers to the location of the secondary copy of the data object and facilitates recovery of the data object. More details as to archiving operations may be found in the commonly-assigned currently pending U.S. Patent Application Number 2008/0229037, the entirety of which is incorporated by reference herein.
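
Steps 215 through 225 can be sketched as follows. This is a simplified illustration, not the disclosed implementation: it assumes a rule that selects objects exceeding an age threshold or a size threshold, models the secondary copy data structure as a dictionary from primary location to identifier, and generates tokens as random UUID strings. The function names and record fields are hypothetical.

```python
import uuid
from datetime import datetime, timedelta


def select_for_archiving(objects, now, min_age, min_size):
    """Step 215 (assumed rule): older than min_age OR larger than min_size."""
    return [o for o in objects
            if now - o["modified"] > min_age or o["size"] > min_size]


def archive(objects, secondary_index, now, min_age, min_size):
    """Steps 220-225: verify a secondary copy exists, then build a stub.

    secondary_index maps primary location -> object identifier.
    Returns (stubs, skipped); skipped objects had no verified secondary copy.
    """
    stubs, skipped = {}, []
    for obj in select_for_archiving(objects, now, min_age, min_size):
        object_id = secondary_index.get(obj["path"])
        if object_id is None:
            skipped.append(obj["path"])  # cannot verify, so do not archive
            continue
        token = uuid.uuid4().hex  # assumed token format
        stubs[obj["path"]] = {"object_id": object_id, "token": token}
    return stubs, skipped


# Hypothetical example data.
now = datetime(2010, 9, 30)
objects = [
    {"path": "/a", "modified": datetime(2008, 1, 1), "size": 10},
    {"path": "/b", "modified": datetime(2010, 9, 1), "size": 10},
    {"path": "/c", "modified": datetime(2007, 1, 1), "size": 10},
]
index = {"/a": "id-a", "/b": "id-b"}
stubs, skipped = archive(objects, index, now, timedelta(days=730), 1_000_000)
```

In this example, `/a` is old enough and has a secondary copy, so it receives a stub. `/b` is too recent to be selected, and `/c` is selected but skipped because no secondary copy can be verified.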

At step 230 the system copies the stubs in the primary copy data to the storage device 115. At step 235 the system adds entries to the stubs data structure 124. FIG. 4B is a data structure diagram illustrating the stubs data structure 124. The stubs data structure 124 contains rows, such as rows 465a and 465b, each divided into the following columns: an ID column 455 containing the identifier of a data object (e.g., the GUID) and a token column 460 containing the token previously created or generated for the data object. The stubs data structure 124 may also include other columns that may contain additional data about data objects. The system may utilize a single stubs data structure 124 for a single secondary copy data structure 122, a single stubs data structure 124 for multiple secondary copy data structures 122, and/or multiple stubs data structures 124 for multiple secondary copy data structures 122.

In adding entries, the system adds a new row 465 to the stubs data structure 124. In the new row 465 the system adds the identifier that corresponds to the data object associated with the stub to column 455 and the token obtained in step 220 to column 460. The system may also store additional data as part of step 235, such as in other columns of the stubs data structure 124 or in other data structures. The entries in rows 465a and 465b indicate that the system archived the data objects identified in rows 425a and 425d, respectively, of the secondary copy data structure 122. Also in step 235 the system adds entries to the secondary copy data structure 122 for the stubs. In FIG. 4A, rows 425f and 425g correspond to the entries for the stubs.
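
The association between the two data structures can be illustrated as a two-step lookup: a token leads to an identifier in the stubs data structure, and the identifier leads to a secondary copy location in the secondary copy data structure. The sketch below assumes both structures are lists of dictionaries with hypothetical keys `id`, `token`, and `secondary_location`, and that each identifier appears once in the secondary table.

```python
def locate_secondary_copy(token, stubs_table, secondary_table):
    """Follow stub token -> object identifier -> secondary copy location.

    Returns None when the token or the identifier is unknown.
    """
    ids_by_token = {row["token"]: row["id"] for row in stubs_table}
    locations_by_id = {row["id"]: row["secondary_location"]
                       for row in secondary_table}
    object_id = ids_by_token.get(token)
    if object_id is None:
        return None
    return locations_by_id.get(object_id)


# Hypothetical example data.
secondary_table = [
    {"id": "obj-1", "secondary_location": "archive-7:0"},
    {"id": "obj-2", "secondary_location": "archive-7:4096"},
]
stubs_table = [{"id": "obj-2", "token": "tok-2"}]
location = locate_secondary_copy("tok-2", stubs_table, secondary_table)
```

Because the stub carries only a token, the secondary copy location can change (for example, after a move to another storage tier) without the stub itself having to be rewritten, provided the secondary table is kept current.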

Returning to FIG. 2, at step 240, the system determines which data objects in the primary copy data have been deleted. The system may use various techniques to determine which data objects in the primary copy data have been deleted. For example, the system may scan or analyze the primary copy data on a periodic or ad-hoc basis, and populate a data structure that contains entries for each of the data objects in the primary copy data. FIG. 4C is a data structure diagram illustrating the primary copy data structure 126. The primary copy data structure 126 (alternatively referred to as an “image map”) is generally similar to the secondary copy data structure 122 but contains entries only for data objects existing in the primary copy data as of the most recent scan or analysis of the primary copy data. To determine the data objects that have been deleted, the system can compare the secondary copy data structure 122 with the primary copy data structure 126. The data objects that are in the secondary copy data structure 122 but not in the primary copy data structure 126 are the data objects that have been deleted. Additionally or alternatively, the system can use other techniques to determine when a data object in the primary copy data has been deleted, such as by receiving information from a driver or file system filter on the client 130 that detects such deletions. Additionally or alternatively, the system can predict if and when a data object in primary copy data has been deleted based upon information available to the system, such as heuristics or historical data.

Returning to FIG. 2, at step 245 the system updates the entries in the secondary copy data structure 122 corresponding to the deleted data objects to include their deletion times. The system may use the time of the last scan or analysis as the deletion times or may use the actual deletion times of the data objects. After step 245, the process 200 concludes.
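
Steps 240 and 245 can be sketched as a membership check against the image map followed by an update of the deletion time. This is a minimal illustration that assumes the image map is a set of primary locations and that the scan time is used as the deletion time. The function name `mark_deleted` and the dictionary keys are hypothetical.

```python
from datetime import datetime


def mark_deleted(secondary_table, image_map, scan_time):
    """Stamp rows whose primary copy is absent from the latest scan.

    secondary_table: list of dict rows with 'primary_location' and
    'deletion_time' keys. image_map: set of primary locations found by the
    scan. Rows that already carry a deletion time are left unchanged.
    Returns the primary locations that were newly marked.
    """
    newly_marked = []
    for row in secondary_table:
        missing = row["primary_location"] not in image_map
        if missing and row["deletion_time"] is None:
            row["deletion_time"] = scan_time
            newly_marked.append(row["primary_location"])
    return newly_marked


# Hypothetical example data.
rows = [
    {"primary_location": "/a", "deletion_time": None},
    {"primary_location": "/b", "deletion_time": None},
]
newly = mark_deleted(rows, {"/a"}, datetime(2010, 6, 25))
```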

Those of skill in the art will understand that the process 200 may be varied while still coming within the general scope of the process 200. For example, if the system cannot verify that a secondary copy of the data object was previously created, the system may not archive the data object in the primary copy data. Alternatively, in such a case, the system may create a secondary copy of the data object and add an entry to the secondary copy data structure 122 before archiving the data object. Alternatively, the system may flag the data object for later archiving after the system has created a secondary copy of the data object at a later time. The system may perform other variations of the process 200.

Illustrative Space Reclamation Process

FIG. 3 is a flow diagram illustrating a process 300 implemented by the system in connection with reclaiming space used to store secondary copies in some examples (alternatively referred to as “pruning data”). The process 300 begins at step 305 where the system accesses the secondary copy data structure 122. At step 310, the system begins iterating through each entry in the secondary copy data structure 122. At step 315, the system determines whether the data object in the primary copy data identified in the entry has been deleted. If not, the process 300 continues to step 320, where the system creates a second secondary copy of the data object from the first secondary copy, and may delete the first secondary copy either immediately or at a later time, e.g., at the conclusion of the process 300. For example, the data object identified in row 425a of the secondary copy data structure 122, because it has no deletion time, has not been deleted. The system can create the second secondary copy of the data object on the same media as the first secondary copy or on different media (e.g., if the first secondary copy is stored on disk, the system can create the second secondary copy on another disk, on tape, and/or on a cloud storage service).

If the system determines that the data object in the primary copy data has been deleted, the process 300 continues to step 335, where the system determines whether the deletion time of the data object is longer ago than a predetermined, configurable, period of time (e.g., longer than one year ago). For example, the data object identified in row 425b, because it has a deletion time, has been deleted. If not (e.g., the data object was deleted less than a year ago), the process 300 continues to step 320, described above. If the deletion time of the data object is longer ago than the predetermined period of time, the process 300 skips step 320 (skips the step of creating a second secondary copy of the data object). Additionally, the system may delete the secondary copy of the long-deleted data object either immediately or at a later time, e.g., at the conclusion of the process 300. For example, if the system is performing the process 300 on Sep. 30, 2010 and the predetermined period of time is 90 days, then the system would not create a second secondary copy of the data object identified in row 425b because it was deleted on Jun. 25, 2010. However, the system would create a second secondary copy of the data object identified in row 425e because it was deleted on Jul. 10, 2010, which is less than 90 days before Sep. 30, 2010.
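
The decision made at steps 315 and 335 can be checked against the dates in the example above. The sketch below is illustrative only; the function name `should_copy_forward` is hypothetical.

```python
from datetime import date, timedelta


def should_copy_forward(deletion_date, run_date, retention):
    """Return True if a second secondary copy should be created (step 320).

    An object with no deletion date has not been deleted and is always
    copied. A deleted object is copied only if its deletion is not longer
    ago than the retention period.
    """
    if deletion_date is None:
        return True
    return run_date - deletion_date <= retention


run_date = date(2010, 9, 30)
retention = timedelta(days=90)

# Jun. 25, 2010 is 97 days before the run date, so the copy is not carried forward.
row_425b = should_copy_forward(date(2010, 6, 25), run_date, retention)
# Jul. 10, 2010 is 82 days before the run date, so the copy is carried forward.
row_425e = should_copy_forward(date(2010, 7, 10), run_date, retention)
```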

The predetermined period of time acts as a timer that starts when a data object in primary copy data has been deleted (or when the system detects the deletion). After the timer has expired, the system no longer needs to store the secondary copy of the data object. Storing the secondary copy of the data object for a period of time past the deletion time of the data object in primary copy data allows the secondary copy of the data object to be retrieved or recalled if, for example, the data object needs to be recovered to satisfy an e-discovery or legal hold request. The predetermined period of time can be set according to archival rules or storage policies (e.g., to comply with e-discovery or other requirements). The predetermined period may vary based on the type of data object. For example, certain types of data objects (e.g., financial data) may have a longer predetermined period of time than other types of data (e.g., personal emails). The system may determine the data type by content indexing the data objects or by accessing data classifications of the data objects.
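
A per-type retention period could be expressed, for illustration, as a lookup table with a default. The type labels and durations below are invented for the sketch and are not values taken from the disclosure.

```python
from datetime import timedelta

# Hypothetical retention periods keyed by data classification.
RETENTION_BY_TYPE = {
    "financial": timedelta(days=7 * 365),
    "personal_email": timedelta(days=90),
}
DEFAULT_RETENTION = timedelta(days=365)  # assumed fallback


def retention_for(data_type):
    """Return the retention period for a data type, or the default."""
    return RETENTION_BY_TYPE.get(data_type, DEFAULT_RETENTION)
```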

Moreover, the predetermined period of time allows for data objects to be recovered in the case of accidental or unintended deletion or in case data objects appear to have been deleted. For example, if a user accidentally or unintentionally deletes a data object in primary copy data, the user has until at least the expiration of the predetermined period of time to discover the accidental or unintended deletion and request that the deleted data object be recovered. As another example, if a volume containing a set of data objects becomes unmounted, upon scanning or analyzing the primary copy data, the system would determine that the data objects have been deleted and accordingly update the corresponding entries in the secondary copy data structure 122. As long as the volume is remounted before the predetermined period of time expires, the system will not delete the secondary copies of the data objects. When the volume is remounted, the system can recognize that the data objects are already tracked in the secondary copy data structure 122 and remove the deletion times from the corresponding entries in the secondary copy data structure 122.
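
The remount case can be sketched as the inverse of marking a deletion: when a later scan finds an object that carries a deletion time, the deletion time is cleared. The function name `clear_reappeared` and the dictionary keys are hypothetical.

```python
from datetime import datetime


def clear_reappeared(secondary_table, image_map):
    """Remove deletion times from rows whose primary copy is present again.

    secondary_table: list of dict rows with 'primary_location' and
    'deletion_time' keys. image_map: set of primary locations found by the
    latest scan. Returns the primary locations that were restored.
    """
    restored = []
    for row in secondary_table:
        present = row["primary_location"] in image_map
        if present and row["deletion_time"] is not None:
            row["deletion_time"] = None
            restored.append(row["primary_location"])
    return restored


# Hypothetical example: /vol/x was marked deleted while its volume was unmounted.
entries = [
    {"primary_location": "/vol/x", "deletion_time": datetime(2010, 8, 1)},
    {"primary_location": "/vol/y", "deletion_time": datetime(2010, 8, 1)},
]
restored = clear_reappeared(entries, {"/vol/x"})
```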

At step 325 the system moves to the next entry in the secondary copy data structure 122 and performs the above steps with respect to the data object identified in the next entry. After the system has iterated through all of the entries in the secondary copy data structure 122, the process 300 continues at step 330, where the system generates a new secondary copy data structure 122 that includes entries corresponding to only the data objects for which the system created second secondary copies. The new secondary copy data structure 122 also includes the locations of the second secondary copies of the data objects. At step 330, the system may also delete the old secondary copy data structure. After step 330 the process 300 concludes.
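
Taken together, the iteration of process 300 can be sketched as a single pass that builds the new secondary copy data structure. This is a minimal illustration under stated assumptions: rows are dictionaries, `copy_fn` is a hypothetical callback that creates the second secondary copy and returns its location, and deletion of the first secondary copies is omitted.

```python
from datetime import date, timedelta


def prune(table, run_date, retention, copy_fn):
    """Return a new table containing only objects that were copied forward.

    A row is copied forward if its object has not been deleted, or if the
    deletion is not longer ago than the retention period. Each surviving
    row records the location of the second secondary copy.
    """
    new_table = []
    for row in table:
        deleted = row["deletion_time"]
        if deleted is None or run_date - deleted <= retention:
            new_row = dict(row)
            new_row["secondary_location"] = copy_fn(row["secondary_location"])
            new_table.append(new_row)
    return new_table


# Hypothetical example data.
old_table = [
    {"id": "1", "secondary_location": "disk1/a", "deletion_time": None},
    {"id": "2", "secondary_location": "disk1/b", "deletion_time": date(2010, 6, 25)},
    {"id": "3", "secondary_location": "disk1/e", "deletion_time": date(2010, 7, 10)},
]
new_table = prune(old_table, date(2010, 9, 30), timedelta(days=90),
                  lambda loc: "tape1/" + loc)
```

The original table is left untouched in this sketch, which corresponds to generating a new data structure rather than updating the existing one in place.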

Those of skill in the art will understand that the process 300 may be varied while still coming within the general scope of the process 300. For example, to prune data, instead of creating second secondary copies of data objects from the first secondary copies of data objects, the system may instead delete certain first secondary copies of data objects, e.g., those data objects having a deletion time longer ago than a predetermined, configurable, period of time. Instead of or in addition to creating a new secondary copy data structure 122, the system may delete rows from the existing secondary copy data structure 122 corresponding to the data objects having a deletion time longer ago than a predetermined, configurable, period of time, for which the system did not create second secondary copies. The system may also update the secondary copy locations of the rows corresponding to the data objects for which the system did create second secondary copies. As another example, instead of pruning a secondary copy of a data object in response to the deletion of the data object in the primary copy data, the system may additionally or alternatively prune a secondary copy of a data object when other criteria are met, such as criteria relating to the creation time, modification time, size, file type, or other characteristics of the data object in the primary copy data. The system may perform other variations of the process 300.

One advantage of the techniques described herein is that the system can avoid creating additional secondary copies of data objects in primary copy data when archiving the data objects. Instead, the system can use the associations between the secondary copy data structure 122 and the stubs data structure 124 to point or refer stubs to the previously-created secondary copy of the data objects. Accordingly, the existence of the previously-created secondary copy of the data objects allows the system to forgo creating another secondary copy of the data objects when archiving the data objects, thereby saving resources. Since the system only transfers a data object from primary storage to secondary storage once instead of twice (e.g., once for backup, once for archive), it may save network bandwidth and processing capacity. Moreover, since the system often transfers a set of data objects from primary storage to secondary storage during the course of several incremental secondary copy operations (e.g., during several incremental backup operations), the system may avoid a single, large data transfer when it later archives the same set of data objects. Instead, the set of data objects in primary storage may simply be replaced with stubs when the time comes to archive them. As another example, since the system only stores a single copy of each data object in secondary storage, instead of two copies, the total secondary storage capacity needed by the system may be reduced.
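A minimal sketch of archiving by reference to an existing secondary copy follows. The dictionaries are hypothetical stand-ins for primary storage, the secondary copy data structure 122, the stubs data structure 124, and secondary storage; none of the names are taken from an actual implementation.

```python
from typing import Dict


def archive_using_existing_copy(
    object_id: str,
    primary: Dict[str, bytes],
    secondary_locations: Dict[str, str],
    stubs: Dict[str, str],
) -> bool:
    """Replace a primary data object with a stub that refers to its existing copy."""
    location = secondary_locations.get(object_id)
    if object_id not in primary or location is None:
        # Without a previously-created secondary copy, a new copy would be needed.
        return False
    stubs[object_id] = location  # the stub points at the existing secondary copy
    del primary[object_id]       # no data is transferred at archive time
    return True


def recall(
    object_id: str, stubs: Dict[str, str], secondary_storage: Dict[str, bytes]
) -> bytes:
    """Follow the stub to the secondary copy and return its contents."""
    return secondary_storage[stubs[object_id]]
```

Note that the archive step in this sketch moves no data object contents; it only records a reference and removes the primary instance.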

Yet another advantage of the techniques described herein is that the system can use a common set of data structures to track both archive operations and other secondary copy operations, thereby potentially simplifying the tracking of both types of operations. Another advantage is that since only one secondary copy of a data object needs to be created, other ancillary processes such as content-indexing, encryption, compression, data classification and/or deduplication or single-instancing of the secondary copy need only be performed once on the single secondary copy, instead of multiple times on each secondary copy.

Another advantage of the techniques described herein is that the secondary copy data structure 122 can be updated to account for moved or transferred secondary copies (e.g., data objects moved to another tier of secondary storage). Accordingly, the stub of a data object whose secondary copy was moved or transferred can still be used to locate and recall the moved or transferred data object.

Still another advantage of the techniques described herein is that by pruning data, e.g., in response to the deletion of corresponding primary data, the secondary storage capacity requirements are reduced.

Suitable Data Storage System

FIG. 5 illustrates an example of one arrangement of resources in a computing network, comprising a data storage system 500. The resources in the data storage system 500 may employ the processes and techniques described herein. The system 500 includes a storage manager 105, one or more data agents 195, one or more secondary storage computing devices 165, one or more storage devices 115, one or more computing devices 130 (called clients 130), one or more data or information stores 160 and 162, a single instancing database 123, an index 111, a jobs agent 120, an interface agent 125, and a management agent 131. The system 500 may represent a modular storage system such as the CommVault QiNetix system, and also the CommVault GALAXY backup system, available from CommVault Systems, Inc. of Oceanport, N.J., aspects of which are further described in the commonly-assigned U.S. patent application Ser. No. 09/610,738, now U.S. Pat. No. 7,035,880, the entirety of which is incorporated by reference herein. The system 500 may also represent a modular storage system such as the CommVault Simpana system, also available from CommVault Systems, Inc.

The system 500 may generally include combinations of hardware and software components associated with performing storage operations on electronic data. Storage operations include copying, backing up, creating, storing, retrieving, and/or migrating primary storage data (e.g., data stores 160 and/or 162) and secondary storage data (which may include, for example, snapshot copies, backup copies, hierarchical storage management (HSM) copies, archive copies, and other types of copies of electronic data stored on storage devices 115). The system 500 may provide one or more integrated management consoles for users or system processes to interface with in order to perform certain storage operations on electronic data as further described herein. Such integrated management consoles may be displayed at a central control system or several similar consoles distributed throughout multiple network locations to provide global or geographically specific network data storage information.

In one example, storage operations may be performed according to various storage preferences, for example, as expressed by a user preference, a storage policy, a schedule policy, and/or a retention policy. A “storage policy” is generally a data structure or other information source that includes a set of preferences and other storage criteria associated with performing a storage operation. The preferences and storage criteria may include, but are not limited to, a storage location, relationships between system components, network pathways to utilize in a storage operation, data characteristics, compression or encryption requirements, preferred system components to utilize in a storage operation, a deduplication, single instancing or variable instancing policy to apply to the data, and/or other criteria relating to a storage operation. For example, a storage policy may indicate that certain data is to be stored in the storage device 115, retained for a specified period of time before being aged to another tier of secondary storage, copied to the storage device 115 using a specified number of data streams, etc.
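As an illustration of a storage policy as a data structure, the following sketch models a few of the listed preferences. Every field name and the aging check are assumptions chosen for this example, not a definition of the policy format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoragePolicy:
    # All field names are hypothetical; the text lists these only as example criteria.
    storage_location: str
    retention_days: int
    data_streams: int = 1
    compress: bool = False
    encrypt: bool = False
    deduplication: Optional[str] = None  # e.g., "single_instance"
    next_tier: Optional[str] = None


def should_age_to_next_tier(policy: StoragePolicy, age_days: int) -> bool:
    """Return True when data retained under the policy is due to move tiers."""
    return policy.next_tier is not None and age_days >= policy.retention_days
```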

A “schedule policy” may specify a frequency with which to perform storage operations and a window of time within which to perform them. For example, a schedule policy may specify that a storage operation is to be performed every Saturday morning from 2:00 a.m. to 4:00 a.m. In some cases, the storage policy includes information generally specified by the schedule policy. (Put another way, the storage policy includes the schedule policy.) A “retention policy” may specify how long data is to be retained at specific tiers of storage or what criteria must be met before data may be pruned or moved from one tier of storage to another tier of storage. Storage policies, schedule policies and/or retention policies may be stored in a database of the storage manager 105, on archive media as metadata for use in restore operations or other storage operations, or in other locations or components of the system 500.
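The Saturday morning example can be expressed as a small window check. The class and method names are hypothetical, and the sketch assumes a single weekly window that does not cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class SchedulePolicy:
    weekday: int  # Monday is 0 and Saturday is 5, as in datetime.weekday()
    start: time
    end: time

    def in_window(self, moment: datetime) -> bool:
        """Return True if moment falls inside the weekly operation window."""
        return (
            moment.weekday() == self.weekday
            and self.start <= moment.time() < self.end
        )


# Every Saturday morning from 2:00 a.m. to 4:00 a.m.
saturday_policy = SchedulePolicy(weekday=5, start=time(2, 0), end=time(4, 0))
```

The end of the window is treated as exclusive here, so an operation would not be started at exactly 4:00 a.m.; that boundary choice is an assumption of the sketch.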

The system 500 may comprise a storage operation cell that is one of multiple storage operation cells arranged in a hierarchy or other organization. Storage operation cells may be related to backup cells and provide some or all of the functionality of backup cells as described in the assignee's U.S. patent application Ser. No. 09/354,058, now U.S. Pat. No. 7,395,282, which is incorporated herein by reference in its entirety. However, storage operation cells may also perform additional types of storage operations and other types of storage management functions that are not generally offered by backup cells.

Storage operation cells may contain not only physical devices, but also may represent logical concepts, organizations, and hierarchies. For example, a first storage operation cell may be configured to perform a first type of storage operations such as HSM operations, which may include backup or other types of data migration, and may include a variety of physical components including a storage manager 105 (or management agent 131), a secondary storage computing device 165, a client 130, and other components as described herein. A second storage operation cell may contain the same or similar physical components; however, it may be configured to perform a second type of storage operations, such as storage resource management (SRM) operations, and may include monitoring a primary data copy or performing other known SRM operations.

Thus, as can be seen from the above, although the first and second storage operation cells are logically distinct entities configured to perform different management functions (i.e., HSM and SRM, respectively), each storage operation cell may contain the same or similar physical devices. Alternatively, different storage operation cells may contain some of the same physical devices and not others. For example, a storage operation cell configured to perform SRM tasks may contain a secondary storage computing device 165, client 130, or other network device connected to a primary storage volume, while a storage operation cell configured to perform HSM tasks may instead include a secondary storage computing device 165, client 130, or other network device connected to a secondary storage volume and not contain the elements or components associated with and including the primary storage volume. (The term “connected” as used herein does not necessarily require a physical connection; rather, it could refer to two devices that are operably coupled to each other, communicably coupled to each other, in communication with each other, or more generally, refer to the capability of two devices to communicate with each other.) These two storage operation cells, however, may each include a different storage manager 105 that coordinates storage operations via the same secondary storage computing devices 165 and storage devices 115. This “overlapping” configuration allows storage resources to be accessed by more than one storage manager 105, such that multiple paths exist to each storage device 115, facilitating failover and load balancing and promoting robust data access via alternative routes.

Alternatively or additionally, the same storage manager 105 may control two or more storage operation cells (whether or not each storage operation cell has its own dedicated storage manager 105). Moreover, in certain embodiments, the extent or type of overlap may be user-defined (through a control console) or may be automatically configured to optimize data storage and/or retrieval.

Data agent 195 may be a software module or part of a software module that is generally responsible for performing storage operations on the data of the client 130 stored in data store 160/162 or other memory location. Each client 130 may have at least one data agent 195 and the system 500 can support multiple clients 130. Data agent 195 may be distributed between client 130 and storage manager 105 (and any other intermediate components), or it may be deployed from a remote location or its functions approximated by a remote process that performs some or all of the functions of data agent 195.

The overall system 500 may employ multiple data agents 195, each of which may perform storage operations on data associated with a different application. For example, different individual data agents 195 may be designed to handle Microsoft Exchange data, UNIX data, Lotus Notes data, Microsoft Windows file system data, Microsoft Active Directory Objects data, and other types of data known in the art. Other embodiments may employ one or more generic data agents 195 that can handle and process multiple data types rather than using the specialized data agents described above.

If a client 130 has two or more types of data, one data agent 195 may be required for each data type to perform storage operations on the data of the client 130. For example, to back up, migrate, and restore all the data on a Microsoft Exchange server, the client 130 may use one Microsoft Exchange Mailbox data agent 195 to back up the Exchange mailboxes, one Microsoft Exchange 2000 Database data agent 195 to back up the Exchange databases, one Microsoft Exchange 2000 Public Folder data agent 195 to back up the Exchange 2000 Public Folders, and one Microsoft Windows File System data agent 195 to back up the file system of the client 130. These data agents 195 would be treated as four separate data agents 195 by the system even though they reside on the same client 130.

Alternatively, the overall system 500 may use one or more generic data agents 195, each of which may be capable of handling two or more data types. For example, one generic data agent 195 may be used to back up, migrate and restore Microsoft Exchange 2000 Mailbox data and Microsoft Exchange Database data while another generic data agent 195 may handle Microsoft Exchange Public Folder data and Microsoft Windows File System data, etc.

Data agents 195 may be responsible for arranging or packing data to be copied or migrated into a certain format such as an archive file. Nonetheless, it will be understood that this represents only one example, and any suitable packing or containerization technique or transfer methodology may be used if desired. Such an archive file may include metadata, a list of files or data objects copied, and the files and data objects themselves. Moreover, any data moved by the data agents may be tracked within the system by updating indexes associated with appropriate storage managers 105 or secondary storage computing devices 165. As used herein, a file or a data object refers to any collection or grouping of bytes of data that can be viewed as one or more logical units.

Generally speaking, storage manager 105 may be a software module or other application that coordinates and controls storage operations performed by the system 500. Storage manager 105 may communicate with some or all elements of the system 500, including clients 130, data agents 195, secondary storage computing devices 165, and storage devices 115, to initiate and manage storage operations (e.g., backups, migrations, data recovery operations, etc.).

Storage manager 105 may include a jobs agent 120 that monitors the status of some or all storage operations previously performed, currently being performed, or scheduled to be performed by the system 500. (One or more storage operations are alternatively referred to herein as a “job” or “jobs.”) Jobs agent 120 may be communicatively coupled to an interface agent 125 (e.g., a software module or application). Interface agent 125 may include information processing and display software, such as a graphical user interface (“GUI”), an application programming interface (“API”), or other interactive interface through which users and system processes can retrieve information about the status of storage operations. For example, in an arrangement of multiple storage operation cells, through interface agent 125, users may optionally issue instructions to various storage operation cells regarding performance of the storage operations as described and contemplated herein. For example, a user may modify a schedule concerning the number of pending snapshot copies or other types of copies scheduled as needed to suit particular needs or requirements. As another example, a user may employ the GUI to view the status of pending storage operations in some or all of the storage operation cells in a given network or to monitor the status of certain components in a particular storage operation cell (e.g., the amount of storage capacity left in a particular storage device 115).

Storage manager 105 may also include a management agent 131 that is typically implemented as a software module or application program. In general, management agent 131 provides an interface that allows various management agents 131 in other storage operation cells to communicate with one another. For example, assume a certain network configuration includes multiple storage operation cells hierarchically arranged or otherwise logically related in a WAN or LAN configuration. With this arrangement, each storage operation cell may be connected to the other through each respective interface agent 125. This allows each storage operation cell to send and receive certain pertinent information from other storage operation cells, including status information, routing information, information regarding capacity and utilization, etc. These communications paths may also be used to convey information and instructions regarding storage operations.

For example, a management agent 131 in a first storage operation cell may communicate with a management agent 131 in a second storage operation cell regarding the status of storage operations in the second storage operation cell. Another illustrative example includes the case where a management agent 131 in a first storage operation cell communicates with a management agent 131 in a second storage operation cell to control storage manager 105 (and other components) of the second storage operation cell via management agent 131 contained in storage manager 105.

Another illustrative example is the case where management agent 131 in a first storage operation cell communicates directly with and controls the components in a second storage operation cell and bypasses the storage manager 105 in the second storage operation cell. If desired, storage operation cells can also be organized hierarchically such that hierarchically superior cells control or pass information to hierarchically subordinate cells or vice versa.

Storage manager 105 may also maintain an index, a database, or other data structure 111. The data stored in database 111 may be used to indicate logical associations between components of the system, user preferences, management tasks, media containerization and data storage information or other useful data. For example, the storage manager 105 may use data from database 111 to track logical associations between secondary storage computing device 165 and storage devices 115 (or movement of data as containerized from primary to secondary storage).

Generally speaking, the secondary storage computing device 165, which may also be referred to as a media agent, may be implemented as a software module that conveys data, as directed by storage manager 105, between a client 130 and one or more storage devices 115 such as a tape library, a magnetic media storage device, an optical media storage device, or any other suitable storage device. In one embodiment, secondary storage computing device 165 may be communicatively coupled to and control a storage device 115. A secondary storage computing device 165 may be considered to be associated with a particular storage device 115 if that secondary storage computing device 165 is capable of routing and storing data to that particular storage device 115.

In operation, a secondary storage computing device 165 associated with a particular storage device 115 may instruct the storage device to use a robotic arm or other retrieval means to load or eject a certain storage media, and to subsequently archive, migrate, or restore data to or from that media. Secondary storage computing device 165 may communicate with a storage device 115 via a suitable communications path such as a SCSI or Fibre Channel communications link. In some embodiments, the storage device 115 may be communicatively coupled to the storage manager 105 via a SAN.

Each secondary storage computing device 165 may maintain an index, a database, or other data structure 161 that may store index data generated during storage operations for secondary storage (SS) as described herein, including creating a metabase (MB). For example, performing storage operations on Microsoft Exchange data may generate index data. Such index data provides a secondary storage computing device 165 or other external device with a fast and efficient mechanism for locating data stored or backed up. Thus, a secondary storage computing device index 161, or a database 111 of a storage manager 105, may store data associating a client 130 with a particular secondary storage computing device 165 or storage device 115, for example, as specified in a storage policy, while a database or other data structure in secondary storage computing device 165 may indicate where specifically the data of the client 130 is stored in storage device 115, what specific files were stored, and other information associated with storage of the data of the client 130. In some embodiments, such index data may be stored along with the data backed up in a storage device 115, with an additional copy of the index data written to index cache in a secondary storage device. Thus the data is readily available for use in storage operations and other activities without having to be first retrieved from the storage device 115.

Generally speaking, information stored in cache is typically recent information that reflects certain particulars about operations that have recently occurred. After a certain period of time, this information is sent to secondary storage and tracked. This information may need to be retrieved and uploaded back into a cache or other memory in a secondary computing device before data can be retrieved from storage device 115. In some embodiments, the cached information may include information regarding format or containerization of archives or other files stored on storage device 115.

One or more of the secondary storage computing devices 165 may also maintain one or more single instance databases 123. Single instancing (alternatively called data deduplication) generally refers to storing in secondary storage only a single instance of each data object (or data block) in a set of data (e.g., primary data). More details as to single instancing may be found in one or more of the following commonly-assigned U.S. patent applications: 1) U.S. patent application Ser. No. 11/269,512 (entitled SYSTEM AND METHOD TO SUPPORT SINGLE INSTANCE STORAGE OPERATIONS); 2) U.S. patent application Ser. No. 12/145,347 (entitled APPLICATION-AWARE AND REMOTE SINGLE INSTANCE DATA MANAGEMENT); 3) U.S. patent application Ser. No. 12/145,342 (entitled APPLICATION-AWARE AND REMOTE SINGLE INSTANCE DATA MANAGEMENT); 4) U.S. patent application Ser. No. 11/963,623 (entitled SYSTEM AND METHOD FOR STORING REDUNDANT INFORMATION); 5) U.S. patent application Ser. No. 11/950,376 (entitled SYSTEMS AND METHODS FOR CREATING COPIES OF DATA SUCH AS ARCHIVE COPIES); or 6) U.S. Pat. App. No. 61/100,686 (entitled SYSTEMS AND METHODS FOR MANAGING SINGLE INSTANCING DATA), each of which is incorporated by reference herein in its entirety.
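The general idea of single instancing can be illustrated with a content-addressed store. This sketch is an assumption-laden illustration and is not drawn from the referenced applications: it identifies identical data objects by a SHA-256 digest, and the class and attribute names are hypothetical.

```python
import hashlib
from typing import Dict


class SingleInstanceStore:
    """Keeps one stored instance per distinct content, plus a reference count."""

    def __init__(self) -> None:
        self.instances: Dict[str, bytes] = {}
        self.references: Dict[str, int] = {}

    def put(self, data: bytes) -> str:
        """Store data only if its content is new; return the content digest."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.instances:
            self.instances[digest] = data
        self.references[digest] = self.references.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        return self.instances[digest]
```

Storing the same content twice in this sketch increases the reference count but leaves only one instance in the store.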

In some examples, the secondary storage computing devices 165 maintain one or more variable instance databases. Variable instancing generally refers to storing in secondary storage one or more instances, but fewer than the total number of instances, of each data block (or data object) in a set of data (e.g., primary data). More details as to variable instancing may be found in the commonly-assigned U.S. Pat. App. No. 61/164,803 (entitled STORING A VARIABLE NUMBER OF INSTANCES OF DATA OBJECTS).

In some embodiments, certain components may reside and execute on the same computer. For example, in some embodiments, a client 130 such as a data agent 195, or a storage manager 105, coordinates and directs local archiving, migration, and retrieval application functions as further described in the previously-referenced U.S. patent application Ser. No. 09/610,738. This client 130 can function independently or together with other similar clients 130.

As shown in FIG. 5, each secondary storage computing device 165 has its own associated metabase 161. Each client 130 may also have its own associated metabase 170. However, in some embodiments, each “tier” of storage, such as primary storage, secondary storage, tertiary storage, etc., may have multiple metabases or a centralized metabase, as described herein. For example, rather than a separate metabase or index associated with each client 130 in FIG. 5, the metabases on this storage tier may be centralized. Similarly, second and other tiers of storage may have either centralized or distributed metabases. Moreover, mixed architecture systems may be used if desired, that may include a first tier centralized metabase system coupled to a second tier storage system having distributed metabases and vice versa, etc.

Moreover, in operation, a storage manager 105 or other management module may keep track of certain information that allows the storage manager 105 to select, designate, or otherwise identify metabases to be searched in response to certain queries as further described herein. Movement of data between primary and secondary storage may also involve movement of associated metadata and other tracking information as further described herein.

In some examples, primary data may be organized into one or more sub-clients. A sub-client is a portion of the data of one or more clients 130, and can contain either all of the data of the clients 130 or a designated subset thereof. As depicted in FIG. 5, the data store 162 includes two sub-clients. For example, an administrator (or other user with the appropriate permissions; the term administrator is used herein for brevity) may find it preferable to separate email data from financial data using two different sub-clients having different storage preferences, retention criteria, etc.

CONCLUSION

Systems and modules described herein may comprise software, firmware, hardware, or any combination(s) of software, firmware, or hardware suitable for the purposes described herein. Software and other modules may reside on servers, workstations, personal computers, computerized tablets, PDAs, smart phones, and other devices suitable for the purposes described herein. Modules described herein may be executed by a general-purpose computer, e.g., a server computer, wireless device, or personal computer. Those skilled in the relevant art will appreciate that aspects of the invention can be practiced with other communications, data processing, or computer system configurations, including: Internet appliances, hand-held devices (including personal digital assistants (PDAs)), wearable computers, all manner of cellular or mobile phones, multi-processor systems, microprocessor-based or programmable consumer electronics, set-top boxes, network PCs, mini-computers, mainframe computers, and the like. Indeed, the terms “computer,” “server,” “host,” “host system,” and the like, are generally used interchangeably herein and refer to any of the above devices and systems, as well as any data processor. Furthermore, aspects of the invention can be embodied in a special purpose computer or data processor that is specifically programmed, configured, or constructed to perform one or more of the computer-executable instructions explained in detail herein.

Software and other modules may be accessible via local memory, a network, a browser, or other application in an ASP context, or via another means suitable for the purposes described herein. Examples of the technology can also be practiced in distributed computing environments where tasks or modules are performed by remote processing devices, which are linked through a communications network, such as a Local Area Network (LAN), Wide Area Network (WAN), or the Internet. In a distributed computing environment, program modules may be located in both local and remote memory storage devices. Data structures described herein may comprise computer files, variables, programming arrays, programming structures, or any electronic information storage schemes or methods, or any combinations thereof, suitable for the purposes described herein. User interface elements described herein may comprise elements from graphical user interfaces, command line interfaces, and other interfaces suitable for the purposes described herein.

Examples of the technology may be stored or distributed on computer-readable media, including magnetically or optically readable computer disks, hard-wired or preprogrammed chips (e.g., EEPROM semiconductor chips), nanotechnology memory, biological memory, or other data storage media. Indeed, computer-implemented instructions, data structures, screen displays, and other data under aspects of the invention may be distributed over the Internet or over other networks (including wireless networks), on a propagated signal on a propagation medium (e.g., an electromagnetic wave(s), a sound wave, etc.) over a period of time, or they may be provided on any analog or digital network (packet switched, circuit switched, or other scheme).

Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” As used herein, the terms “connected,” “coupled,” or any variant thereof, means any connection or coupling, either direct or indirect, between two or more elements; the coupling or connection between the elements can be physical, logical, or a combination thereof. Additionally, the words “herein,” “above,” “below,” and words of similar import, when used in this application, refer to this application as a whole and not to any particular portions of this application. Where the context permits, words in the above Detailed Description using the singular or plural number may also include the plural or singular number respectively. The word “or,” in reference to a list of two or more items, covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.

The above Detailed Description is not intended to be exhaustive or to limit the invention to the precise form disclosed above. While specific examples for the invention are described above for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize. For example, while processes or blocks are presented in a given order, alternative implementations may perform routines having steps, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified to provide alternatives or subcombinations. Each of these processes or blocks may be implemented in a variety of different ways. Also, while processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed or implemented in parallel, or may be performed at different times. Further, any specific numbers noted herein are only examples: alternative implementations may employ differing values or ranges.

The teachings of the invention provided herein can be applied to other systems, not necessarily the systems described herein. The elements and acts of the various examples described above can be combined to provide further implementations of the invention.

Any patents and applications and other references noted above, including any that may be listed in accompanying filing papers, are incorporated herein by reference. Aspects of the invention can be modified, if necessary, to employ the systems, functions, and concepts of the various references described above to provide yet further implementations of the invention.

These and other changes can be made to the invention in light of the above Detailed Description. While the above description describes certain examples of the invention and describes the best mode contemplated, no matter how detailed the above appears in text, the invention can be practiced in many ways. Details of the system may vary considerably in its specific implementation, while still being encompassed by the invention disclosed herein. As noted above, particular terminology used when describing certain features or aspects of the invention should not be taken to imply that the terminology is being redefined herein to be restricted to any specific characteristics, features, or aspects of the invention with which that terminology is associated. In general, the terms used in the following claims should not be construed to limit the invention to the specific examples disclosed in the specification, unless the above Detailed Description section explicitly defines such terms. Accordingly, the actual scope of the invention encompasses not only the disclosed examples, but also all equivalent ways of practicing or implementing the invention under the claims.

While certain examples are presented below in certain forms, the applicant contemplates the various aspects of the invention in any number of claim forms. Accordingly, the applicant reserves the right to add additional claims after filing the application to pursue such additional claim forms for other aspects of the invention.

Claims

1. A computer-implemented method for archiving multiple primary data objects, the computer-implemented method comprising: via one or more computing devices, comprising one or more processors: receiving, from a source computing device, both full and incremental backup copies of primary data associated with the source computing device; creating a secondary copy of multiple data objects comprising the primary data by using the received full and incremental backup copies of the primary data; for each of the multiple data objects for which a secondary copy was created, adding an entry for corresponding data object to a first data structure, wherein the entry includes an identifier associated with the corresponding data object; after creating the secondary copy, identifying one or more of the multiple data objects that satisfy one or more predetermined archival criteria, wherein the one or more predetermined archival criteria are specified by a storage policy assigned to the primary data associated with the source computing device; and for each identified data object of the identified one or more of the multiple data objects: verifying that a secondary copy of the identified data object exists in secondary storage by querying the first data structure using the identifier associated with the identified data object; replacing the identified data object in the primary data with a stub referencing the identified data object within the secondary copy of the multiple data objects, wherein the secondary copy was created in association with a prior backup job; and updating a second data structure with the identifier associated with the identified data object.
2. The computer-implemented method of claim 1, the computer-implemented method further comprising: receiving a token for the identified data object, wherein the token represents a verification that the secondary copy was created.
3. The computer-implemented method of claim 2, wherein the token is included in the stub.
4. The computer-implemented method of claim 1, wherein the one or more predetermined archival criteria comprises at least one of: a data object type, a data object age, a data object size, a percentage of disk quota, remaining storage, and metadata.
5. The computer-implemented method of claim 1, the computer-implemented method further comprising: receiving information regarding a first data object included in the primary data from a driver or file system that detects deletions; using the received information to determine that the first data object has been deleted from the primary data and a corresponding deletion time; and in response to determining that the corresponding deletion time is more than a predetermined period of time ago, deleting the secondary copy of the first data object.
6. The computer-implemented method of claim 5, wherein the predetermined period of time is determined at least in part by an object type of the first data object.
7. The computer-implemented method of claim 5, wherein the predetermined period of time is determined by the storage policy assigned to the source computing device.
8. The computer-implemented method of claim 1, the computer-implemented method further comprising: after replacing the identified data object in the primary data with the stub referencing the identified data object, performing at least one of following operations on the created secondary copy of the multiple data objects comprising the primary data: deduplication, decompression, compression, content-indexing, encryption, decryption, or data classification.
9. The computer-implemented method of claim 1, wherein the second data structure further comprises information indicating where a secondary copy of the identified data object is stored.
10. A system for archiving data objects using secondary copies, the system comprising: at least one processor; at least one memory coupled to the at least one processor; a first software component configured to create one or more secondary copies of primary data comprising multiple data objects; a first data structure comprising a mapping between the multiple data objects and locations of the one or more secondary copies; a second data structure that stores, for each data object for which a secondary copy had been created, a unique token; and a second software component configured to: identify data objects to be archived, generate corresponding tokens for identified data objects to be archived, verify that previously-created secondary copies of the identified data objects exist by confirming that corresponding tokens are present in the second data structure, and replace the identified data objects with stubs.
11. The system of claim 10, wherein the second software component is further configured to: determine that a first data object included in the primary data satisfies predetermined criteria; and in response to determining that the first data object satisfies the predetermined criteria, delete the secondary copy of the first data object.
12. A computer-implemented method for archiving data objects using secondary copies, the computer-implemented method comprising: creating one or more secondary copies of primary data comprising multiple data objects, wherein a mapping between the multiple data objects and locations of the one or more secondary copies is stored in a first data structure; and with a second software component: identifying data objects to be archived, generating corresponding tokens for identified data objects to be archived, verifying that previously-created secondary copies of the identified data objects exist by confirming that corresponding tokens are present in a second data structure, wherein the second data structure, for each data object for which a secondary copy had been created, stores a unique token, and replacing the identified data objects with stubs.
13. The computer-implemented method of claim 12, the computer-implemented method further comprising: receiving a token for the identified data object, wherein the token represents a verification that the secondary copy was created.
14. The computer-implemented method of claim 12, the computer-implemented method further comprising: receiving information regarding a first data object included in the primary data from a driver or file system that detects deletions; using the received information to determine that the first data object has been deleted from the primary data and a corresponding deletion time; and in response to determining that the corresponding deletion time is more than a predetermined period of time ago, deleting the secondary copy of the first data object.
15. The computer-implemented method of claim 12, the computer-implemented method further comprising: performing at least one of following operations on the created secondary copy of the multiple data objects comprising the primary data: deduplication, decompression, compression, content-indexing, encryption, decryption, or data classification.
16. The computer-implemented method of claim 12, the computer-implemented method further comprising: determining that a first data object included in the primary data satisfies predetermined criteria; and in response to determining that the first data object satisfies the predetermined criteria, deleting the secondary copy of the first data object.
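
By way of non-limiting illustration only, the archiving flow recited in claim 1 might be sketched in Python as follows. The sketch is not part of the claims and is not the disclosed implementation. All names (DataObject, Stub, ArchiveSystem, backup_index, archive_index, and the "secondary:" location string) are hypothetical, the in-memory dictionaries merely stand in for primary and secondary storage, and the storage policy is assumed to be a simple predicate over a data object.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Union


@dataclass
class DataObject:
    object_id: str   # identifier associated with the data object
    payload: bytes
    age_days: int


@dataclass
class Stub:
    object_id: str   # reference to the object within the secondary copy


@dataclass
class ArchiveSystem:
    # Hypothetical stand-ins for primary and secondary storage.
    primary: Dict[str, Union[DataObject, Stub]] = field(default_factory=dict)
    secondary: Dict[str, bytes] = field(default_factory=dict)
    # "First data structure": one entry per object that has a secondary copy.
    backup_index: Set[str] = field(default_factory=set)
    # "Second data structure": archived identifier -> assumed copy location.
    archive_index: Dict[str, str] = field(default_factory=dict)

    def backup(self) -> None:
        """Create secondary copies and record each identifier."""
        for oid, obj in self.primary.items():
            if isinstance(obj, DataObject):
                self.secondary[oid] = obj.payload
                self.backup_index.add(oid)

    def archive(self, criteria: Callable[[DataObject], bool]) -> List[str]:
        """Stub out objects that meet the criteria and have a verified copy."""
        archived: List[str] = []
        for oid, obj in list(self.primary.items()):
            if not isinstance(obj, DataObject) or not criteria(obj):
                continue
            # Verify the copy by querying the first data structure by identifier.
            if oid not in self.backup_index:
                continue  # no prior secondary copy, so leave the object in place
            self.primary[oid] = Stub(object_id=oid)
            self.archive_index[oid] = f"secondary:{oid}"
            archived.append(oid)
        return archived
```

In this sketch, an object that satisfies the archival criteria but was never covered by a prior backup job is simply skipped, so that no primary data is replaced by a stub unless a secondary copy has been confirmed through the first data structure.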

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/934,432, filed Jul. 21, 2020, which is a continuation of U.S. patent application Ser. No. 15/476,613, filed Mar. 31, 2017, issued as U.S. Pat. No. 10,762,036, which is a continuation of U.S. patent application Ser. No. 15/013,138, filed Feb. 2, 2016, issued as U.S. Pat. No. 9,639,563, which is a continuation of U.S. patent application Ser. No. 14/595,984, filed Jan. 13, 2015, issued as U.S. Pat. No. 9,262,275, which is a continuation of U.S. patent application Ser. No. 13/250,824, filed Sep. 30, 2011, issued as U.S. Pat. No. 8,935,492, which claims the benefit of U.S. Patent Application No. 61/388,566, filed Sep. 30, 2010, each of which is hereby incorporated herein by reference in its entirety.

US Referenced Citations (494)

Number	Name	Date	Kind
4686620	Ng	Aug 1987	A
4713755	Worley, Jr. et al.	Dec 1987	A
4995035	Cole et al.	Feb 1991	A
5005122	Griffin et al.	Apr 1991	A
5093912	Dong et al.	Mar 1992	A
5133065	Cheffetz et al.	Jul 1992	A
5193154	Kitajima et al.	Mar 1993	A
5212772	Masters	May 1993	A
5226157	Nakano et al.	Jul 1993	A
5239647	Anglin et al.	Aug 1993	A
5241668	Eastridge et al.	Aug 1993	A
5241670	Eastridge et al.	Aug 1993	A
5276860	Fortier et al.	Jan 1994	A
5276867	Kenley et al.	Jan 1994	A
5287500	Stoppani, Jr.	Feb 1994	A
5321816	Rogan et al.	Jun 1994	A
5333315	Saether et al.	Jul 1994	A
5347653	Flynn et al.	Sep 1994	A
5410700	Fecteau et al.	Apr 1995	A
5437012	Mahajan	Jul 1995	A
5448724	Hayashi et al.	Sep 1995	A
5491810	Allen	Feb 1996	A
5495607	Pisello et al.	Feb 1996	A
5504873	Martin et al.	Apr 1996	A
5544345	Carpenter et al.	Aug 1996	A
5544347	Yanai et al.	Aug 1996	A
5559957	Balk	Sep 1996	A
5604862	Midgely et al.	Feb 1997	A
5606686	Tarui et al.	Feb 1997	A
5619644	Crockett et al.	Apr 1997	A
5628004	Gormley et al.	May 1997	A
5634052	Morris	May 1997	A
5638509	Dunphy et al.	Jun 1997	A
5673381	Huai et al.	Sep 1997	A
5699361	Ding et al.	Dec 1997	A
5729743	Squibb	Mar 1998	A
5742792	Yanai	Apr 1998	A
5751997	Kullick et al.	May 1998	A
5758359	Saxon	May 1998	A
5761677	Senator et al.	Jun 1998	A
5764972	Crouse et al.	Jun 1998	A
5778395	Whiting et al.	Jul 1998	A
5794229	French et al.	Aug 1998	A
5806057	Gormley et al.	Sep 1998	A
5812398	Nielsen	Sep 1998	A
5813008	Benson	Sep 1998	A
5813009	Johnson et al.	Sep 1998	A
5813017	Morris	Sep 1998	A
5822780	Schutzman	Oct 1998	A
5862325	Reed et al.	Jan 1999	A
5875478	Blumenau	Feb 1999	A
5887134	Ebrahim	Mar 1999	A
5901327	Ofek	May 1999	A
5924102	Perks	Jul 1999	A
5940833	Benson	Aug 1999	A
5950205	Aviani, Jr.	Sep 1999	A
5974563	Beeler, Jr.	Oct 1999	A
5990810	Williams	Nov 1999	A
6021415	Cannon et al.	Feb 2000	A
6026414	Anglin	Feb 2000	A
6052735	Ulrich et al.	Apr 2000	A
6073133	Chrabaszcz	Jun 2000	A
6076148	Kedem et al.	Jun 2000	A
6094416	Ying	Jul 2000	A
6125369	Wu	Sep 2000	A
6131095	Low et al.	Oct 2000	A
6131190	Sidwell	Oct 2000	A
6148412	Cannon et al.	Nov 2000	A
6154787	Urevig et al.	Nov 2000	A
6161111	Mutalik et al.	Dec 2000	A
6167402	Yeager	Dec 2000	A
6173291	Jenevein	Jan 2001	B1
6212512	Barney et al.	Apr 2001	B1
6260069	Anglin	Jul 2001	B1
6269431	Dunham	Jul 2001	B1
6275953	Vahalia et al.	Aug 2001	B1
6301592	Aoyama et al.	Oct 2001	B1
6311252	Raz	Oct 2001	B1
6324544	Alam	Nov 2001	B1
6324581	Xu et al.	Nov 2001	B1
6328766	Long	Dec 2001	B1
6330570	Crighton et al.	Dec 2001	B1
6330642	Carteau	Dec 2001	B1
6343324	Hubis et al.	Jan 2002	B1
RE37601	Eastridge et al.	Mar 2002	E
6356801	Goodman et al.	Mar 2002	B1
6356915	Chtchetkine	Mar 2002	B1
6363400	Chtchetkine	Mar 2002	B1
6389432	Pothapragada et al.	May 2002	B1
6418478	Ignatius	Jul 2002	B1
6421711	Blumenau et al.	Jul 2002	B1
6477544	Bolosky et al.	Nov 2002	B1
6487561	Ofek et al.	Nov 2002	B1
6513051	Bolosky	Jan 2003	B1
6519679	Devireddy et al.	Feb 2003	B2
6538669	Lagueux, Jr. et al.	Mar 2003	B1
6564228	O'Connor	May 2003	B1
6609157	Deo	Aug 2003	B2
6609183	Ohran	Aug 2003	B2
6609187	Merrell	Aug 2003	B1
6658526	Nguyen et al.	Dec 2003	B2
6675177	Webb	Jan 2004	B1
6704730	Moulton	Mar 2004	B2
6708195	Borman	Mar 2004	B1
6745304	Playe	Jun 2004	B2
6757699	Lowry	Jun 2004	B2
6757794	Cabrera	Jun 2004	B2
6795903	Schultz	Sep 2004	B2
6810398	Moulton	Oct 2004	B2
6839819	Martin	Jan 2005	B2
6862674	Dice	Mar 2005	B2
6865655	Andersen	Mar 2005	B1
6868417	Kazar	Mar 2005	B2
6889297	Krapp et al.	May 2005	B2
6901493	Maffezzoni	May 2005	B1
6912645	Dorward	Jun 2005	B2
6928459	Sawdon	Aug 2005	B1
6952758	Chron	Oct 2005	B2
6959368	St. Pierre	Oct 2005	B1
6973553	Archibald et al.	Dec 2005	B1
6976039	Chefalas	Dec 2005	B2
7017113	Bourbakis	Mar 2006	B2
7035876	Kawai et al.	Apr 2006	B2
7035880	Crescenti	Apr 2006	B1
7035943	Yamane	Apr 2006	B2
7085904	Mizuno	Aug 2006	B2
7089383	Ji	Aug 2006	B2
7089395	Jacobson	Aug 2006	B2
7092956	Ruediger	Aug 2006	B2
7103740	Colgrove et al.	Sep 2006	B1
7107298	Prahlad et al.	Sep 2006	B2
7107418	Ohran	Sep 2006	B2
7111173	Scheidt	Sep 2006	B1
7117246	Christenson	Oct 2006	B2
7139808	Anderson et al.	Nov 2006	B2
7143091	Charnock	Nov 2006	B2
7143108	George	Nov 2006	B1
7191290	Ackaouy	Mar 2007	B1
7200604	Forman	Apr 2007	B2
7200621	Beck et al.	Apr 2007	B2
7246272	Cabezas	Jul 2007	B2
7272606	Borthakur	Sep 2007	B2
7277941	Ignatius et al.	Oct 2007	B2
7287252	Bussiere	Oct 2007	B2
7290102	Lubbers et al.	Oct 2007	B2
7310655	Dussud	Dec 2007	B2
7315923	Retnamma et al.	Jan 2008	B2
7320059	Armangau	Jan 2008	B1
7325110	Kubo	Jan 2008	B2
7330997	Odom	Feb 2008	B1
7343459	Prahlad	Mar 2008	B2
7370003	Pych	May 2008	B2
7376805	Stroberger et al.	May 2008	B2
7383304	Shimada et al.	Jun 2008	B2
7383462	Osaki	Jun 2008	B2
7389345	Adams	Jun 2008	B1
7395282	Crescenti	Jul 2008	B1
7403942	Bayliss	Jul 2008	B1
7409522	Fair et al.	Aug 2008	B1
7440982	Lu et al.	Oct 2008	B2
7444382	Malik	Oct 2008	B2
7444387	Douceur	Oct 2008	B2
7451166	Damani et al.	Nov 2008	B2
7478096	Margolus et al.	Jan 2009	B2
7478113	De Spiegeleer	Jan 2009	B1
7480782	Garthwaite	Jan 2009	B2
7487245	Douceur	Feb 2009	B2
7490207	Amarendran	Feb 2009	B2
7493314	Huang	Feb 2009	B2
7493456	Brittain et al.	Feb 2009	B2
7496604	Sutton	Feb 2009	B2
7512745	Gschwind et al.	Mar 2009	B2
7516208	Kerrison	Apr 2009	B1
7519726	Pallyill	Apr 2009	B2
7533331	Brown et al.	May 2009	B2
7536440	Budd et al.	May 2009	B2
7568080	Prahlad	Jul 2009	B2
7577687	Bank et al.	Aug 2009	B2
7590639	Ivanova	Sep 2009	B1
7603529	MacHardy et al.	Oct 2009	B1
7613748	Brockway	Nov 2009	B2
7617297	Bruce	Nov 2009	B2
7631120	Darcy	Dec 2009	B2
7631194	Wahlert	Dec 2009	B2
7636824	Tormasov	Dec 2009	B1
7647462	Wolfgang	Jan 2010	B2
7657550	Prahlad	Feb 2010	B2
7661028	Erofeev	Feb 2010	B2
7668884	Prahlad	Feb 2010	B2
7672779	Fuchs	Mar 2010	B2
7672981	Faibish et al.	Mar 2010	B1
7673089	Hinchey	Mar 2010	B2
7676590	Silverman	Mar 2010	B2
7685126	Patel	Mar 2010	B2
7685177	Hagerstrom	Mar 2010	B1
7685384	Shavit	Mar 2010	B2
7685459	De Spiegeleer	Mar 2010	B1
7698699	Rogers	Apr 2010	B2
7716445	Bonwick	May 2010	B2
7721292	Frasier et al.	May 2010	B2
7734581	Gu et al.	Jun 2010	B2
7739381	Ignatius et al.	Jun 2010	B2
7747579	Prahlad	Jun 2010	B2
7747584	Jernigan, IV	Jun 2010	B1
7747659	Bacon et al.	Jun 2010	B2
7778979	Hatonen et al.	Aug 2010	B2
7786881	Burchard et al.	Aug 2010	B2
7788230	Dile	Aug 2010	B2
7814142	Mamou	Oct 2010	B2
7818287	Torii	Oct 2010	B2
7818495	Tanaka et al.	Oct 2010	B2
7818531	Barrall	Oct 2010	B2
7830889	Lemaire	Nov 2010	B1
7831707	Bardsley	Nov 2010	B2
7831793	Chakravarty et al.	Nov 2010	B2
7831795	Prahlad	Nov 2010	B2
7836161	Scheid	Nov 2010	B2
7840537	Gokhale	Nov 2010	B2
7853750	Stager et al.	Dec 2010	B2
7856414	Zee	Dec 2010	B2
7865470	Fries	Jan 2011	B2
7865678	Arakawa	Jan 2011	B2
7870105	Arakawa	Jan 2011	B2
7870486	Wang	Jan 2011	B2
7873599	Ishii	Jan 2011	B2
7873806	Prahlad	Jan 2011	B2
7882077	Gokhale	Feb 2011	B2
7899990	Moll et al.	Mar 2011	B2
7921077	Ting et al.	Apr 2011	B2
7953706	Prahlad	May 2011	B2
7962452	Anglin	Jun 2011	B2
8028106	Bondurant et al.	Sep 2011	B2
8037028	Prahlad	Oct 2011	B2
8041907	Wu et al.	Oct 2011	B1
8051367	Arai et al.	Nov 2011	B2
8054765	Passey et al.	Nov 2011	B2
8055618	Anglin	Nov 2011	B2
8055627	Prahlad	Nov 2011	B2
8055745	Atluri	Nov 2011	B2
8078603	Chandratillake	Dec 2011	B1
8086799	Mondal et al.	Dec 2011	B2
8095756	Somavarapu	Jan 2012	B1
8108429	Sim-Tang	Jan 2012	B2
8112357	Mueller	Feb 2012	B2
8131687	Bates et al.	Mar 2012	B2
8140786	Bunte	Mar 2012	B2
8156092	Hewett	Apr 2012	B2
8156279	Tanaka et al.	Apr 2012	B2
8161003	Kavuri	Apr 2012	B2
8165221	Zheng	Apr 2012	B2
8166263	Prahlad	Apr 2012	B2
8170994	Tsaur	May 2012	B2
8190823	Waltermann et al.	May 2012	B2
8190835	Yueh	May 2012	B1
8213540	Rickey	Jul 2012	B1
8219524	Gokhale	Jul 2012	B2
8234444	Bates et al.	Jul 2012	B2
8239348	Bezbaruah	Aug 2012	B1
8244914	Nagarkar	Aug 2012	B1
8271992	Chatley	Sep 2012	B2
8285683	Gokhale	Oct 2012	B2
8295875	Masuda	Oct 2012	B2
8296260	Ting et al.	Oct 2012	B2
8296301	Lunde	Oct 2012	B2
8315984	Frandzel	Nov 2012	B2
8346730	Srinivasan	Jan 2013	B2
8352422	Prahlad et al.	Jan 2013	B2
8364652	Vijayan et al.	Jan 2013	B2
8375008	Gomes	Feb 2013	B1
8380957	Prahlad	Feb 2013	B2
8386436	Ben-Dyke	Feb 2013	B2
8392677	Bunte et al.	Mar 2013	B2
8401996	Muller	Mar 2013	B2
8412677	Klose	Apr 2013	B2
8412682	Zheng et al.	Apr 2013	B2
8484162	Prahlad et al.	Jul 2013	B2
8504515	Prahlad et al.	Aug 2013	B2
8548953	Wong	Oct 2013	B2
8572340	Vijayan et al.	Oct 2013	B2
8577851	Vijayan et al.	Nov 2013	B2
8578109	Vijayan et al.	Nov 2013	B2
8578120	Attarde	Nov 2013	B2
8620845	Stoakes et al.	Dec 2013	B2
8626723	Ben-Shaul	Jan 2014	B2
8712969	Prahlad	Apr 2014	B2
8712974	Datuashvili	Apr 2014	B2
8725687	Klose	May 2014	B2
8725698	Prahlad et al.	May 2014	B2
8769185	Chung	Jul 2014	B2
8782368	Lillibridge et al.	Jul 2014	B2
8880797	Yueh	Nov 2014	B2
8909881	Bunte et al.	Dec 2014	B2
8930306	Ngo et al.	Jan 2015	B1
8935492	Gokhale	Jan 2015	B2
8954446	Vijayan Retnamma et al.	Feb 2015	B2
8965852	Jayaraman	Feb 2015	B2
8997020	Chambers et al.	Mar 2015	B2
9015181	Kottomtharayil et al.	Apr 2015	B2
9020890	Kottomtharayil et al.	Apr 2015	B2
9020900	Vijayan Retnamma et al.	Apr 2015	B2
9026498	Kumarasamy	May 2015	B2
9058117	Attarde et al.	Jun 2015	B2
9063938	Kumarasamy et al.	Jun 2015	B2
9069799	Vijayan	Jun 2015	B2
9104623	Retnamma et al.	Aug 2015	B2
9116850	Vijayan Retnamma et al.	Aug 2015	B2
9213540	Rickey et al.	Dec 2015	B1
9218374	Muller et al.	Dec 2015	B2
9218375	Muller et al.	Dec 2015	B2
9218376	Muller et al.	Dec 2015	B2
9223597	Deshpande et al.	Dec 2015	B2
9236079	Prahlad et al.	Jan 2016	B2
9251186	Muller et al.	Feb 2016	B2
9262275	Gokhale	Feb 2016	B2
9275086	Kumarasamy et al.	Mar 2016	B2
9276871	Freitas	Mar 2016	B1
9286110	Mitkar et al.	Mar 2016	B2
9372479	Phillips	Jun 2016	B1
9575673	Mitkar et al.	Feb 2017	B2
9633022	Vijayan et al.	Apr 2017	B2
9633033	Vijayan et al.	Apr 2017	B2
9633056	Attarde et al.	Apr 2017	B2
9639563	Gokhale	May 2017	B2
9646166	Cash	May 2017	B2
9652283	Mitkar et al.	May 2017	B2
9665591	Vijayan et al.	May 2017	B2
9773025	Muller et al.	Sep 2017	B2
9781000	Kumar	Oct 2017	B1
9848046	Mehta et al.	Dec 2017	B2
9928144	Kumarasamy	Mar 2018	B2
9939981	Varadharajan et al.	Apr 2018	B2
10089337	Senthilnathan et al.	Oct 2018	B2
10169162	Hammer	Jan 2019	B2
10223212	Kumarasamy et al.	Mar 2019	B2
10310953	Vijayan et al.	Jun 2019	B2
10324897	Amarendran et al.	Jun 2019	B2
10338823	Kottomtharayil et al.	Jul 2019	B2
10339106	Vijayan et al.	Jul 2019	B2
10481824	Vijayan et al.	Nov 2019	B2
10742735	Kumar et al.	Aug 2020	B2
20010037323	Moulton et al.	Nov 2001	A1
20020055972	Weinman	May 2002	A1
20020065892	Malik	May 2002	A1
20020099806	Balsamo	Jul 2002	A1
20020107877	Whiting	Aug 2002	A1
20030004922	Schmidt et al.	Jan 2003	A1
20030074600	Tamatsu	Apr 2003	A1
20030110190	Achiwa	Jun 2003	A1
20030135480	Van Arsdale	Jul 2003	A1
20030167318	Robbin	Sep 2003	A1
20030172368	Alumbaugh et al.	Sep 2003	A1
20030177149	Coombs	Sep 2003	A1
20030236763	Kilduff	Dec 2003	A1
20040128287	Keller	Jul 2004	A1
20040148306	Moulton	Jul 2004	A1
20040177319	Horn	Sep 2004	A1
20040220975	Carpentier	Nov 2004	A1
20040230817	Ma	Nov 2004	A1
20050033756	Kottomtharayil	Feb 2005	A1
20050060643	Glass et al.	Mar 2005	A1
20050066190	Martin	Mar 2005	A1
20050097150	McKeon et al.	May 2005	A1
20050108435	Nowacki et al.	May 2005	A1
20050131961	Margolus	Jun 2005	A1
20050138081	Alshab et al.	Jun 2005	A1
20050195660	Kavuri	Sep 2005	A1
20050203864	Schmidt	Sep 2005	A1
20050203887	Joshi	Sep 2005	A1
20050210460	Rogers	Sep 2005	A1
20050234823	Schimpf	Oct 2005	A1
20050254072	Hirai	Nov 2005	A1
20050262193	Mamou et al.	Nov 2005	A1
20050283461	Sell	Dec 2005	A1
20050286466	Tagg et al.	Dec 2005	A1
20060005048	Osaki et al.	Jan 2006	A1
20060053305	Wahlert et al.	Jan 2006	A1
20060010227	Atluri	Mar 2006	A1
20060047894	Okumura	Mar 2006	A1
20060047978	Kawakami	Mar 2006	A1
20060056623	Gligor	Mar 2006	A1
20060089954	Anschutz	Apr 2006	A1
20060095470	Cochran	May 2006	A1
20060126615	Angtin	Jun 2006	A1
20060129576	Carpentier	Jun 2006	A1
20060129771	Dasgupta	Jun 2006	A1
20060174112	Wray	Aug 2006	A1
20060206547	Kulkarni	Sep 2006	A1
20060206621	Toebes	Sep 2006	A1
20060224846	Amarendran	Oct 2006	A1
20060230081	Craswell	Oct 2006	A1
20060230244	Amarendran et al.	Oct 2006	A1
20060259587	Ackerman	Nov 2006	A1
20070022145	Kavuri	Jan 2007	A1
20070067399	Kulkarni	Mar 2007	A1
20070079170	Zimmer	Apr 2007	A1
20070106863	Bonwick	May 2007	A1
20070118573	Gadiraju et al.	May 2007	A1
20070136200	Frank	Jun 2007	A1
20070156998	Gorobets	Jul 2007	A1
20070179995	Prahlad	Aug 2007	A1
20070226535	Gokhale	Sep 2007	A1
20070233638	Carroll et al.	Oct 2007	A1
20070260476	Smolen	Nov 2007	A1
20070271316	Hollebeek	Nov 2007	A1
20070288534	Zak	Dec 2007	A1
20080047935	Hinchey	Feb 2008	A1
20080082714	Hinchey	Apr 2008	A1
20080082736	Chow et al.	Apr 2008	A1
20080098083	Shergill	Apr 2008	A1
20080104291	Hinchey	May 2008	A1
20080126543	Hamada	May 2008	A1
20080162518	Bollinger	Jul 2008	A1
20080162597	Tysowski	Jul 2008	A1
20080229037	Bunte	Sep 2008	A1
20080243769	Arbour	Oct 2008	A1
20080243914	Prahlad	Oct 2008	A1
20080244172	Kano	Oct 2008	A1
20080244204	Cremelie et al.	Oct 2008	A1
20080307000	Paterson	Dec 2008	A1
20090012984	Ravid et al.	Jan 2009	A1
20090049260	Upadhyayula	Feb 2009	A1
20090083341	Parees	Mar 2009	A1
20090083344	Inoue et al.	Mar 2009	A1
20090106369	Chen et al.	Apr 2009	A1
20090112870	Ozzie	Apr 2009	A1
20090119678	Shih	May 2009	A1
20090150498	Branda et al.	Jun 2009	A1
20090204636	Li et al.	Aug 2009	A1
20090204650	Wong	Aug 2009	A1
20090228446	Anzai	Sep 2009	A1
20090268903	Bojinov et al.	Oct 2009	A1
20090271454	Anglin et al.	Oct 2009	A1
20090281847	Hamilton, II	Nov 2009	A1
20090319534	Gokhale	Dec 2009	A1
20090327625	Jaquette	Dec 2009	A1
20100036887	Anglin et al.	Feb 2010	A1
20100082529	Mace et al.	Apr 2010	A1
20100082672	Kottomtharayil	Apr 2010	A1
20100088296	Periyagaram	Apr 2010	A1
20100138500	Consul	Jun 2010	A1
20100281081	Stager	Nov 2010	A1
20100332401	Prahlad	Dec 2010	A1
20110125711	Meisenheimer	May 2011	A1
20120102286	Holt	Apr 2012	A1
20120150818	Vijayan Retnamma et al.	Jun 2012	A1
20120159098	Cheung	Jun 2012	A1
20120233417	Kalach	Sep 2012	A1
20120271793	Gokhale	Oct 2012	A1
20120311581	Balmin	Dec 2012	A1
20130041872	Aizman	Feb 2013	A1
20130086007	Bandopadhyay	Apr 2013	A1
20130117305	Varakin	May 2013	A1
20130218350	Phillips	Aug 2013	A1
20130262394	Kumarasamy et al.	Oct 2013	A1
20130262801	Sancheti	Oct 2013	A1
20130290598	Fiske	Oct 2013	A1
20130339298	Muller et al.	Dec 2013	A1
20130339310	Muller et al.	Dec 2013	A1
20140006382	Barber	Jan 2014	A1
20140012814	Bercovici et al.	Jan 2014	A1
20140067764	Prahlad et al.	Mar 2014	A1
20140129961	Zubarev	May 2014	A1
20140181079	Ghazal	Jun 2014	A1
20140188532	Liu	Jul 2014	A1
20140201485	Ahn et al.	Jul 2014	A1
20140250088	Klose et al.	Sep 2014	A1
20140310232	Plattner	Oct 2014	A1
20150178277	Singhal	Jun 2015	A1
20150199242	Attarde et al.	Jul 2015	A1
20150205678	Kottomtharayil et al.	Jul 2015	A1
20150205817	Kottomtharayil et al.	Jul 2015	A1
20150212889	Amarendran et al.	Jul 2015	A1
20150269035	Vijayan	Sep 2015	A1
20150363270	Hammer	Dec 2015	A1
20160019224	Ahn et al.	Jan 2016	A1
20160124658	Prahlad	May 2016	A1
20160179435	Haley	Jun 2016	A1
20160210064	Dornemann	Jul 2016	A1
20160253254	Krishnan et al.	Sep 2016	A1
20160299818	Vijayan et al.	Oct 2016	A1
20160342633	Senthilnathan	Nov 2016	A1
20160342661	Kumarasamy	Nov 2016	A1
20170031707	Mitkar et al.	Feb 2017	A1
20170083408	Vijayan et al.	Mar 2017	A1
20170206206	Gokhale	Jul 2017	A1
20180137139	Bangalore et al.	May 2018	A1
20180144000	Muller	May 2018	A1
20180239772	Vijayan et al.	Aug 2018	A1
20180288150	Wang et al.	Oct 2018	A1
20180364914	Prahlad	Dec 2018	A1
20190042609	Senthilnathan	Feb 2019	A1
20190108341	Bedhapudi et al.	Apr 2019	A1
20200327017	Vijayan et al.	Oct 2020	A1
20200358621	Ngo	Nov 2020	A1

Foreign Referenced Citations (13)

Number	Date	Country
0259912	Mar 1988	EP
0405926	Jan 1991	EP
0467546	Jan 1992	EP
0774715	May 1997	EP
0809184	Nov 1997	EP
0899662	Mar 1999	EP
0981090	Feb 2000	EP
WO9513580	May 1995	WO
WO9912098	Mar 1999	WO
WO03027891	Apr 2003	WO
WO2006052872	May 2006	WO
WO2008070688	Jun 2008	WO
WO2008080140	Jul 2008	WO

Non-Patent Literature Citations (48)

Entry
Anonymous, “NTFS Sparse Files (NTFS5 Only)”, Jun. 4, 2002, pp. 1-1, https://web.archive.org/web/20020604013016/http://ntfs.com/ntfs-sparse.htm.
Armstead et al., “Implementation of a Campus-Wide Distributed Mass Storage Service: The Dream vs. Reality,” IEEE, Sep. 11-14, 1995, pp. 190-199.
Arneson, “Mass Storage Archiving in Network Environments,” Digest of Papers, Ninth IEEE Symposium on Mass Storage Systems, Oct. 31, 1988-Nov. 3, 1988, pp. 45-50, Monterey, CA.
Cabrera et al., “ADSM: A Multi-Platform, Scalable, Backup and Archive Mass Storage System,” Digest of Papers, Compcon '95, Proceedings of the 40th IEEE Computer Society International Conference, Mar. 5, 1995-Mar. 9, 1995, pp. 420-427, San Francisco, CA.
Commvault Systems, Inc., “Continuous Data Replicator 7.0,” Product Data Sheet, 2007, 6 pages.
CommVault Systems, Inc., “Deduplication—How To,” <http://documentation.commvault.com/commvault/release_8_0_0/books_online_1/english_US/features/single_instance/single_instance_how_to.htm>, earliest known publication date: Jan. 26, 2009, 7 pages.
CommVault Systems, Inc., “Deduplication,” <http://documentation.commvault.com/commvault/release_8_0_0/books_online_1/english_US/features/single_instance/single_instance.htm>, earliest known publication date: Jan. 26, 2009, 9 pages.
Computer Hope, “File,” May 21, 2008, pp. 1-3, https://web.archive.org/web/20080513021935/https://www.computerhope.com/jargon/f/file.htm.
Diligent Technologies “HyperFactor,” <http://www.diligent.com/products:protecTIER-1:HyperFactor-1>, Internet accessed on Dec. 5, 2008, 2 pages.
Eitel, “Backup and Storage Management in Distributed Heterogeneous Environments,” IEEE, Jun. 12-16, 1994, pp. 124-126.
Enterprise Storage Management, “What Is Hierarchical Storage Management?”, Jun. 19, 2005, p. 1, http://web.archive.org/web/20050619000521/http://www.enterprisestoragemanagement.com/faq/hierarchical-storage-management.shtml.
Enterprise Storage Management, “What Is A Incremental Backup?”, Oct. 26, 2005, pp. 1-2, http://web.archive.org/web/20051026010908/http://www.enterprisestoragemanagement.com/faq/incremental-backup.shtml.
Examination Report dated Dec. 14, 2018 in European Patent Application No. 09816825.5, 7 pages.
Extended European Search Report for 09816825.5; dated Oct. 27, 2015, 15 pages.
Extended European Search Report for EP07865192.4; dated May 2, 2013, 7 pages.
Federal Information Processing Standards Publication 180-2, “Secure Hash Standard”, Aug. 1, 2002, <http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf>, 83 pages.
FlexHex, “NTFS Sparse Files for Programmers”, Feb. 22, 2006, pp. 1-4, https://web.archive.org/web/20060222050807/http://www.flexhex.com/docs/articles/sparse-files.phtml.
Gait, J., “The Optical File Cabinet: A Random-Access File System For Write-Once Optical Disks,” IEEE Computer, vol. 21, No. 6, pp. 11-22 (Jun. 1988).
Geer, D., “Reducing The Storage Burden Via Data Deduplication,” IEEE, Computer Journal, vol. 41, Issue 12, Dec. 2008, pp. 15-17.
Handy, Jim, “The Cache Memory Book: The Authoritative Reference on Cache Design,” Second Edition, 1998, pp. 64-67 and pp. 204-205.
International Preliminary Report on Patentability and Written Opinion for PCT/US2007/086421, dated Jun. 18, 2009, 8 pages.
International Preliminary Report on Patentability and Written Opinion for PCT/US2011/054378, dated Apr. 11, 2013, 5 pages.
International Search Report and Written Opinion for PCT/US07/86421, dated Apr. 18, 2008, 9 pages.
International Search Report for Application No. PCT/US09/58137, dated Dec. 23, 2009, 14 pages.
International Search Report for Application No. PCT/US10/34676, dated Nov. 29, 2010, 9 pages.
International Search Report for Application No. PCT/US11/54378, dated May 2, 2012, 8 pages.
Jander, M., “Launching Storage-Area Net,” Data Communications, US, McGraw Hill, NY, vol. 27, No. 4 (Mar. 21, 1998), pp. 64-72.
Kornblum, Jesse, “Identifying Almost Identical Files Using Context Triggered Piecewise Hashing,” www.sciencedirect.com, Digital Investigation 3S (2006), pp. S91-S97.
Kulkarni P. et al., “Redundancy elimination within large collections of files,” Proceedings of the Usenix Annual Technical Conference, Jul. 2, 2004, pp. 59-72.
Lortu Software Development, “Kondar Technology-Deduplication,” <http://www.lortu.com/en/deduplication.asp>, Internet accessed on Dec. 5, 2008, 3 pages.
Menezes et al., “Handbook Of Applied Cryptography”, CRC Press, 1996, <http://www.cacr.math.uwaterloo.ca/hac/about/chap9.pdf>, 64 pages.
Microsoft, “Computer Dictionary”, p. 249, Fifth Edition, 2002, 3 pages.
Microsoft, “Computer Dictionary”, pp. 142, 150, 192, and 538, Fifth Edition, 2002, 6 pages.
Microsoft, “Computer Dictionary,” Fifth Edition, 2002, p. 220.
Overland Storage, “Data Deduplication,” <http://www.overlandstorage.com/topics/data_deduplication.html>, Internet accessed on Dec. 5, 2008, 2 pages.
Partial Supplementary European Search Report in Application No. 09816825.5, dated Apr. 15, 2015, 6 pages.
Quantum Corporation, “Data De-Duplication Background: A Technical White Paper,” May 2008, 13 pages.
Rosenblum et al., “The Design and Implementation of a Log-Structured File System,” Operating Systems Review SIGOPS, vol. 25, No. 5, New York, US, pp. 1-15 (May 1991).
Searchstorage, “File System”, Nov. 1998, <http://searchstorage.techtarget.com/definition/file-system>, 10 pages.
Sharif, A., “Cache Memory,” Sep. 2005, http://searchstorage.techtarget.com/definition/cache-memory, pp. 1-26.
Techterms.com, “File,” May 17, 2008, 1 page, <https://web.archive.org/web/20080517102848/https://techterms.com/definition/file>.
Webopedia, “Cache,” Apr. 11, 2001, http://web.archive.org/web/20010411033304/http://www.webopedia.com/TERM/c/cache.html pp. 1-4.
Webopedia, “Data Duplication”, Aug. 31, 2006, <http://web.archive.org/web/20060913030559/http://www.webopedia.com/TERMID/data_deduplication.html>, 2 pages.
Webopedia, “File,” May 21, 2008, pp. 1-3, <https://web.archive.org/web/20080521094529/https://www.webopedia.com/TERM/F/file.html>.
Webopedia, “Folder”, Aug. 9, 2002, <https://web.archive.org/web/20020809211001/http://www.webopedia.com/TERM/F/folder.html> pp. 1-2.
Webopedia, “Logical Drive”, Aug. 13, 2004, pp. 1-2, https://web.archive.org/web/20040813033834/http://www.webopedia.com/TERM/L/logical_drive.html.
Webopedia, “LPAR”, Aug. 8, 2002, pp. 1-2, https://web.archive.org/web/20020808140639/http://www.webopedia.com/TERM/L/LPAR.html.
Webopedia, “Metadata”, Apr. 5, 2001, <https://web.archive.org/web/20010405235507/http://www.webopedia.com/TERM/M/metadata.html>, pp. 1-2.

Related Publications (1)

	Number	Date	Country
	20220309032 A1	Sep 2022	US

Provisional Applications (1)

	Number	Date	Country
	61388566	Sep 2010	US

Continuations (5)

	Number	Date	Country
Parent	16934432	Jul 2020	US
Child	17841575		US
Parent	15476613	Mar 2017	US
Child	16934432		US
Parent	15013138	Feb 2016	US
Child	15476613		US
Parent	14595984	Jan 2015	US
Child	15013138		US
Parent	13250824	Sep 2011	US
Child	14595984		US
