Synchronized data deduplication

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

Any and all applications for which a foreign or domestic priority claim is identified in the Application Data Sheet as filed with the present application are hereby incorporated by reference under 37 CFR 1.57.

BACKGROUND
Technical Field

The present invention generally relates to data deduplication, and more particularly, some embodiments relate to systems and methods for facilitating shared deduplication information.

Description of the Related Art

The storage and retrieval of data is an age-old art that has evolved as methods for processing and using data have evolved. In the early 18th century, Basile Bouchon is purported to have used a perforated paper loop to store patterns used for printing cloth. In the mechanical arts, similar technology in the form of punch cards and punch tape was also used in the 18th century in textile mills to control mechanized looms. Two centuries later, early computers also used punch cards and paper punch tape to store data and to input programs.

However, punch cards were not the only storage mechanism available in the mid-20th century. Drum memory was widely used in the 1950s and 1960s with capacities approaching about 10 kb, and the first hard drive was developed in the 1950s and is reported to have used 50 24-inch discs to achieve a total capacity of almost 5 MB. However, these were large and costly systems and although punch cards were inconvenient, their lower cost contributed to their longevity as a viable alternative.

In 1980 the hard drive broke the 1 GB capacity mark with the introduction of the IBM 3380, which could store more than two gigabytes of data. The IBM 3380, however, was about as large as a refrigerator, weighed ¼ ton, and cost in the range of approximately $97,000 to $142,000, depending on the features selected. This is in stark contrast to contemporary storage systems that provide for storage of hundreds of terabytes of data or more for seemingly instantaneous access by networked devices. Even handheld electronic devices such as digital cameras, MP3 players and others are capable of storing gigabytes of data, and today's desktop computers boast hundreds of gigabytes of storage capacity.

However, with the advent of networked computing, storage of electronic data has migrated from the individual computer to network-accessible storage devices. These include, for example, optical libraries, Redundant Arrays of Inexpensive Disks (RAID), CD-ROM jukeboxes, drive pools and other mass storage technologies. These storage devices are accessible to and can be shared by individual computers such as via a Local Area Network (LAN), a Wide Area Network (WAN), or a Storage Area Network (SAN) to name a few. These client computers not only access their own local storage devices but also storage devices of the network to perform backups, transaction processing, file sharing, and other storage-related operations.

The large volumes of data often stored and shared by networked devices can cause overloading of the limited network bandwidth. For example, during operations such as system backups, transaction processing, file copying and transfer, and other similar operations, the communication bandwidth of the network often becomes the rate-determining factor.

In addition, even with large capacity storage systems, computing enterprises are being overloaded by vast amounts of data. Documents sent via email, for example, can be copied and resent multiple times and several instances of the very same document might be stored many times in many different locations. IT administrators are struggling to keep up with the seemingly exponential increase in the volume of documents, media and other data. This problem is severely compounded by other factors such as the large file sizes often associated with multi-media files, and file proliferation through email and other content sharing mechanisms. However, additional storage capacity requires capital expenditures, consumes power, takes up floor space and burdens administrative overhead. Even with additional storage capacity, the sheer volume of data becomes a strain on backup and data recovery plans, leading to greater risk in data integrity.

As an alternative to simply increasing the amount of storage capacity, contemporary enterprises have turned to compression and other like technologies to reduce the volume of data. One such technology that can be used is known as data deduplication. Data deduplication in its various forms eliminates or reduces the amount of redundant data by implementing policies that strive to reduce the quantity of, or even eliminate, instances of redundant data blocks in storage. With data deduplication, data is broken up into segments or blocks. As new data enters the system, the segments are checked to see if they already exist in storage. If a segment already exists, rather than store that segment again, a pointer to the location of the existing segment is stored.

The segment size selected for data deduplication can be defined at various levels, from small segment sizes (for example, 1 kB or less) to much larger segment sizes, and to entire files. A larger segment size can yield greater space or bandwidth savings on a per-instance basis; however, the opportunities for identifying redundancies may be reduced with larger segment sizes. These tradeoffs can depend on the system with which deduplication is implemented and the types of data or files it handles.

As indicated above, in some instances, deduplication can be performed on a file-by-file basis. With such a system, rather than storing multiple copies of the same file, one instance of the file is stored, for example, in a central repository, and pointers to the file are stored in place of the redundant copies. However, deduplication at the file level can suffer in efficiencies as compared to deduplication using smaller segment sizes because even a small change in the file generally requires that an entire copy of the file be re-stored.

In addition to reducing the amount of storage space consumed, data deduplication can also help to relieve congestion on crowded communication pathways. In addition, the more efficient use of disk space can often allow data retention periods to increase, adding more integrity to the enterprise. Data deduplication is frequently used in conjunction with other forms of data reduction, including conventional data compression algorithms and delta difference storage.

Data deduplication often relies on hashing algorithms that hash the data segments to generate an identifying signature for the segments. Accordingly, each segment is processed using the hashing algorithm to generate a hash value. The resultant hash value is compared against hash values stored in a hash table to determine whether the segment already exists. If so, the segment is replaced with a pointer to the entry in the table containing the appropriate hash value or pointing to the location of the data in storage. Otherwise, the new data is stored and its hash value is added to the table along with an address for the data.

Because hash functions are not perfect, the same hash value can in some cases be returned for segments containing different data. When such a false-positive occurs, the system can mistake new data for already-stored data and fail to store the new segment. Accordingly, multiple hash algorithms and other techniques can be employed to reduce the likelihood of these so-called hash collisions.
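By way of illustration only, the following minimal sketch shows one way in which two hash algorithms can be combined into a single signature, with a byte-for-byte comparison as a further safeguard. The function names, the choice of SHA-256 and BLAKE2b, and the table layout (signature mapped to stored bytes) are assumptions made for this example and are not prescribed by the description.

```python
import hashlib


def signature(segment: bytes) -> tuple[str, str]:
    # Assumed scheme: a pair of independent digests. Both must match
    # before two segments are treated as candidates for being identical.
    return (
        hashlib.sha256(segment).hexdigest(),
        hashlib.blake2b(segment).hexdigest(),
    )


def already_stored(segment: bytes, table: dict) -> bool:
    # 'table' is a hypothetical mapping of signature -> stored segment bytes.
    stored = table.get(signature(segment))
    # A final byte comparison rules out a false positive entirely.
    return stored is not None and stored == segment


table = {signature(b"abc"): b"abc"}
```

In this sketch a segment is reported as already stored only when both digests match and the stored bytes are identical, so a collision in either hash alone cannot cause new data to be discarded.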

BRIEF SUMMARY OF EMBODIMENTS OF THE INVENTION

According to various embodiments, systems and methods are provided for data deduplication. Particularly, in some embodiments, techniques for performing reference table distribution and synchronization are provided. Accordingly, a reference table generated as a result of the deduplication process at a storage repository can be shared among a plurality of client systems that utilize a repository for data storage. This can be implemented to allow the client systems to perform local data deduplication before their data is sent to the repository. Likewise, this can also allow the client systems to receive deduplicated data from the storage repository. Accordingly, systems and methods can be implemented to allow deduplicated data to be transferred among a plurality of computing systems thereby reducing bandwidth requirements for data storage and retrieval operations.

In some embodiments, rather than distribute the entire reference table to each client for synchronization, a proper subset of reference table entries can be identified and shared with the client devices for synchronization. This can be implemented so as to reduce the amount of bandwidth required to synchronize the reference table among the computing systems. In further embodiments, the subset can be identified based on data utilization criteria.

According to an embodiment of the invention, systems and methods are provided for performing data deduplication for data used by a plurality of computing systems. The systems and methods can be configured to perform the steps of receiving at a shared storage repository data from the plurality of computing systems, performing a data deduplication operation on the received data, and transmitting an instantiation of a reference table for the deduplication to determined ones of the plurality of computing systems to allow deduplication to be performed by the determined ones of the plurality of computing systems.

The deduplication operation can include defining a segment of the received data; applying an algorithm to the defined data segment to generate a signature for the defined data segment; comparing the signature for the defined data segment with one or more signatures stored in a reference table for one or more previously defined data segments to determine whether the defined segment is already stored in the shared storage repository; and updating the reference table to include the signature for the defined data segment and a reference for the defined data segment if the defined data segment is not in the shared storage repository.

In one embodiment, a first instantiation of the reference table is sent to a first group of one or more of the plurality of computing systems and a second instantiation of the reference table is sent to a second group of one or more of the plurality of computing systems, wherein the first instantiation of the reference table is different from the second instantiation of the reference table.

The operation can further include a step of determining the instantiation of the reference table to be transmitted, and wherein the instantiation of the reference table is a proper subset of the reference table. The step of determining the instantiation of the reference table can include selecting one or more entries of the reference table based on at least one of utilization rate of data segments represented by the entries and size of the data segments represented by the entries. In another embodiment, the step of determining the instantiation of the reference table comprises selecting one or more entries of the reference table based on a combination of utilization rate of data segments represented by the entries and size of the data segments represented by the entries. Any of a number of combinations can be used. For example, the combination can be a weighted combination of utilization rate of data segments represented by the entries and size of the data segments represented by the entries.
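As a non-limiting sketch of the weighted combination described above, the following example scores each reference table entry by a weighted sum of its normalized utilization rate and normalized segment size, and keeps the highest-scoring entries. The Entry fields, the weights (0.7 and 0.3), and the normalization by the maximum value are assumptions chosen for illustration; other combinations can be used.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    signature: str
    utilization: float  # assumed unit: references per unit time
    size: int           # segment size in bytes


def select_subset(entries, k, w_util=0.7, w_size=0.3):
    # Normalize each criterion to [0, 1] so the weights are comparable.
    max_util = max((e.utilization for e in entries), default=0) or 1
    max_size = max((e.size for e in entries), default=0) or 1

    def score(e):
        return w_util * e.utilization / max_util + w_size * e.size / max_size

    # For a proper subset, k should be smaller than len(entries).
    return sorted(entries, key=score, reverse=True)[:k]


entries = [Entry("a", 10, 100), Entry("b", 1, 1000), Entry("c", 5, 500)]
chosen = select_subset(entries, k=2)
```

With these example values the scores are 0.73 for entry "a", 0.37 for entry "b" and 0.50 for entry "c", so entries "a" and "c" are selected.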

Other features and aspects of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate, by way of example, the features in accordance with embodiments of the invention. The summary is not intended to limit the scope of the invention, which is defined solely by the claims attached hereto.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention, in accordance with one or more various embodiments, is described in detail with reference to the following figures. The drawings are provided for purposes of illustration only and merely depict typical or example embodiments of the invention. These drawings are provided to facilitate the reader's understanding of the invention and shall not be considered limiting of the breadth, scope, or applicability of the invention. It should be noted that for clarity and ease of illustration these drawings are not necessarily made to scale.

FIG. 1 is a diagram illustrating an example process for data deduplication in accordance with various embodiments.

FIG. 2 is a diagram illustrating an example environment in which data from multiple computing systems is stored in one or more shared storage facilities.

FIG. 3 is a simplified block diagram illustrating another example environment with which embodiments of the invention can be implemented.

FIG. 4 is a diagram illustrating an example process for reference table synchronization in accordance with one embodiment of the invention.

FIG. 5 is a diagram illustrating an example of reference table synchronization in accordance with one embodiment of the invention.

FIG. 6 is a diagram illustrating an example of client groupings for targeted reference table synchronization in accordance with one embodiment of the invention.

FIG. 7 is a diagram illustrating an example process for reference table subset updating for synchronization in accordance with one embodiment of the invention.

FIG. 8 is a diagram illustrating another example process for reference table subset updating for synchronization in accordance with one embodiment of the invention.

FIG. 9 is a diagram illustrating an example computing system with which aspects of the systems and methods described herein can be implemented in accordance with one embodiment of the invention.

The figures are not intended to be exhaustive or to limit the invention to the precise form disclosed. It should be understood that the invention can be practiced with modification and alteration, and that the invention be limited only by the claims and the equivalents thereof.

DETAILED DESCRIPTION OF THE EMBODIMENTS OF THE INVENTION

The present invention is directed toward a system and method for data deduplication, and more particularly various embodiments are directed toward systems and methods for synchronization of reference tables to facilitate data deduplication. In various embodiments, data is stored for a plurality of clients in a shared storage environment, and rather than transfer large amounts of data among the clients and the shared storage, the data in shared storage is deduplicated and the hash table or other reference table for the data is shared among some or all of the clients. When a client has data to transfer to or place in the shared storage, that client can run a deduplication algorithm on segments of the data and use its own representative instantiation of the reference table to determine whether the data segments already exist in a shared data store. Accordingly, for a given segment, the client can determine whether to send the entire data segment to the shared storage or just send a reference or pointer or other information from the reference table if the segment is duplicative of what is already in the data store. In a situation where the analyzed segment is not in the data store, the client device can send the hash value or other reference table information to the central storage (or other location maintaining the main reference table) so that the primary reference table can be updated with the information on the newly added segment.

In various embodiments, the data store or shared storage repository can comprise any of a number of data storage architectures. For example, in one application, the shared storage can comprise one or more data storage systems accessible by and shared among multiple client systems such as, for example, one or more dedicated storage repositories or centralized storage repositories. In another example, the shared storage repository can comprise a plurality of storage locations distributed across some or all of the multiple clients among which the data is shared, or a combination of distributed and centralized storage devices.

In further embodiments of the invention, rather than send an entire reference table to each of the clients or client groups that are sharing the common storage, a subset of the table can be identified and only that subset is transmitted to the client systems to synchronize the reference tables among all of the devices. Additionally, different client systems or groups of client systems can receive different subsets of the reference table. Subsets can be defined for a given client or group of clients based on data utilization or other factors. Transferring a subset of the reference table rather than the entire reference table can reduce bandwidth consumption across the network, result in increased efficiency and reduce synchronization problems with the database.

For clarification, in the various embodiments described herein, the term synchronization is not intended to require that all client devices be updated with a representative instantiation of the reference table at the same time or that all client devices receive the same reference table. Although in some embodiments the reference table or updates thereto can be broadcast to all participating systems simultaneously (or close to simultaneously), in other embodiments the reference table or its updates can be sent to different client devices or groups of client devices at different times. Likewise, in some embodiments the same reference table, or the same subset can be sent to all clients. However, in other embodiments, subsets can be tailored for a given client or group of clients.
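For illustration, the tailoring of subsets to client groups can be sketched as follows. The usage mapping (group name to per-signature reference counts), the function name, and the use of a simple per-group top-k ranking are assumptions for this example only.

```python
def subsets_by_group(usage, k):
    """Return, for each group, the k signatures it references most often.

    'usage' is a hypothetical mapping:
        group name -> {signature: reference count within that group}
    """
    return {
        group: set(sorted(counts, key=counts.get, reverse=True)[:k])
        for group, counts in usage.items()
    }


usage = {
    "eng": {"a": 9, "b": 1, "c": 5},
    "sales": {"b": 7, "d": 3, "a": 1},
}
subsets = subsets_by_group(usage, k=2)
```

Here the two groups receive different subsets of the reference table, each reflecting the segments that group uses most.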

Before describing the invention in detail, it is useful to describe a few example environments with which the invention can be implemented. One such example is that of a straightforward data deduplication algorithm with which the systems and methods described herein can be implemented. FIG. 1 is a diagram illustrating an example process for data deduplication in accordance with various embodiments. It will be appreciated after reading the description provided herein that the various embodiments of the invention are not limited to applicability with this example data deduplication process, but can be implemented with any of a variety of forms of data deduplication.

Referring now to FIG. 1, the illustrated example assumes an incoming data stream 120. For purposes of deduplication, the data stream can be segmented into a plurality of preferably equal-length segments. For example, in some embodiments, the data is broken up into segments that are 128 kB in length. In the illustrated example, incoming data stream 120 includes the segments A, B, C, A, C and D in that order. To perform the data deduplication, the computing system 132 receives the data, segments the data, and runs a hash function or other signature generation algorithm against each segment. The computing system 132 checks the resultant hash value for a given segment with hash values stored in hash table 127. If the resultant hash value does not match a hash value already stored, this indicates that the segment is a new segment of data. In this case, this new segment of data is written into file system 124, its hash value is added to hash table 127 as is a pointer to its address in the file system 124.

On the other hand, if the resultant hash value for a given segment already exists in the hash table 127, this indicates that the same data sequence in that segment is already stored in file system 124. Accordingly, rather than storing the entire segment in the file, only the pointer or address to the same segment that was previously stored needs to be retained. Following this methodology for data stream 120 results in the example file 129 illustrated as being stored in file system 124. This example file includes the first instances of segments A, B, C and D. However, for the subsequent instances of segments A and C that occurred in the incoming data stream 120, the file includes a pointer to the originally stored segments A and C. This is illustrated by ADDRESS A and ADDRESS C in file system 124.

To re-create the original data stream, segments are retrieved from file system 124 and assembled in order. Where an address exists in place of the actual data elements of the segment (ADDRESS A and ADDRESS C in the instant example), that address is accessed, the data retrieved and packaged into the reassembled data stream. In this example, resultant data stream 121 contains the same data as existed in original data stream 120.
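The process of FIG. 1 can be sketched in simplified form as shown below. This is an illustrative approximation only: the list named store stands in for file system 124, the dictionary stands in for hash table 127, SHA-256 is an assumed choice of hash function, and, as a simplification, the stored file records an address for every segment rather than holding first instances inline.

```python
import hashlib


def deduplicate(segments):
    hash_table = {}   # signature -> address of the stored segment
    store = []        # unique segments, in order of first appearance
    stored_file = []  # one address per segment of the incoming stream
    for seg in segments:
        sig = hashlib.sha256(seg).hexdigest()
        if sig not in hash_table:
            # New segment: write it and record its signature and address.
            store.append(seg)
            hash_table[sig] = len(store) - 1
        # Known or new, the file keeps only the address of the segment.
        stored_file.append(hash_table[sig])
    return hash_table, store, stored_file


def reassemble(store, stored_file):
    # Follow each address to rebuild the original stream in order.
    return [store[addr] for addr in stored_file]


stream = [b"A", b"B", b"C", b"A", b"C", b"D"]
hash_table, store, stored_file = deduplicate(stream)
```

For the example stream A, B, C, A, C, D, only four segments are stored, and reassembly by following the recorded addresses yields the original sequence.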

Although the illustrated example depicts a system that utilizes a simple reference table having a hash value and pointer value for each segment, more complex systems can also make up an environment for the systems and methods described herein. For example, for the hash values the reference table can also include the source or sources of the data segment represented by the hash, a counter of the number of times that a given segment is encountered, the location of where the segments occur on client devices, and so on. As such, the reference table can be implemented as a hybrid of a hash table and a file allocation table (FAT). This can be useful as a backup in the event that a client system crashes or otherwise goes off line.

The above example is described in terms of an individual computing system (having one or more computing devices) performing local data deduplication for local data storage using a hash function. Data deduplication can also be performed for a plurality of computing systems using shared or local data storage or a combination thereof. For example, the data segments need not be stored at a central location such as file system 124 but can be stored at one or more client locations or at a combination of client locations and central storage locations. Accordingly, the pointers or addresses stored in the reference table can point to the actual storage location of the referenced segment whether that location be at a client storage location or in a central storage repository.

In addition, techniques other than hash functions can be used for data deduplication. Other algorithms can be used to generate a signature for the blocks of data. Likewise, other deduplication methods can also be used to identify redundancies or duplicate entries. Accordingly, the terms hash table, signature table, or reference table might be used in this document interchangeably to refer to the table, index, or other like mechanism used to track the data deduplication process, regardless of the actual file structure and regardless of the function used to arrive at the signatures.

As illustrated in the example of FIG. 1, data deduplication can be implemented with a single or small computing system using local data storage. In other examples data deduplication can be implemented for data communications as well as for environments where information storage is at a centralized or other shared location, or spread across storage devices associated with multiple computing systems. FIG. 2 is a diagram illustrating an example environment in which data from multiple computing systems is stored in one or more shared storage facilities. Referring now to FIG. 2, the illustrated example includes a plurality of computer systems 132 connected via one or more networks 147, 149 to two network-accessible storage facilities. These storage facilities in this example include a storage area network 128 and a network attached storage facility 137. Networks 147, 149 can be implemented utilizing any of a number of network technologies or topologies. The physical layer can include, for example, fiber, copper, or wireless communication channels.

In this example, storage area network 128 can include a plurality of data storage devices 122 to provide sufficient quantities of data storage for the networked computing systems 132. For example, hard disk drives, disk arrays, optical storage drives and other high-volume memory or storage devices can be included with storage area network 128 to provide desired amounts of data storage at specified access speeds. Similarly, network attached storage can include any variety of data storage devices 122 to provide sufficient quantities of data storage at desired access speeds. Illustrated in this example, network attached storage 137 includes removable storage media 124, although fixed media can also be used. Likewise, data storage 122 associated with storage area network 128 can also use fixed or removable media.

Computing systems 132 connected to networks 147, 149 typically include application software 122 to perform desired operations. Although not illustrated, computing systems 132 typically also include an operating system on which the application software 122 runs. The file system 124 can be provided to facilitate and control file access by the operating system and application software 122. File systems 124 can facilitate access to local and remote storage devices for file or data access and storage. As also illustrated, computer systems 132 can include local storage such as a media module media drive 126 with fixed or removable media 136.

FIG. 3 is a simplified block diagram illustrating another example environment with which embodiments of the invention can be implemented. In the example illustrated in FIG. 3, a plurality of computer systems 132 rely on a centralized server repository system 151 for data storage. In such an environment, computing systems 132 may retain some level of local data storage but may also rely on repository system 151 for larger-volume data storage. In such environments, computer systems 132 can transfer data files and the like to repository system 151 for storage via, for example, a communication network. Preferably, to reduce the volume of storage at repository system 151, data deduplication can be performed on data items that are received for storage using any of a number of different data deduplication techniques.

In one example, as data is received from a computer system 132 for storage, repository system 151 performs the data deduplication in an in-line or post-processing methodology for storage. For example, in terms of the exemplary deduplication methodology described above with respect to FIG. 1, repository system 151 can break up the received data into a plurality of data segments or chunks; hash or otherwise process the bit patterns in each segment to generate a hash value or other signature; and compare the signature value of the newly received segment to signatures already stored in table 127. If the signature value already exists in table 127, this indicates the same bit sequence is already in the data storage and accordingly, the pointer to that segment is retrieved from table 127 and inserted in the file in place of that segment. The reference table 127 can be updated to include information reflecting this instance of the data segment. For example, a reference counter for the signature value can be incremented, the source of this instance of the data segment can be added to the table, the location of where the segment existed on the client can be added and so on.

If, on the other hand, the signature value does not already exist in table 127, the bit sequence is not in data storage. In such a case, this segment is placed into storage and the signature is placed in a new entry in table 127 along with a pointer to the storage location of the new segment. The reference table 127 can also include additional information reflecting information about the data segment such as, for example, the source of this first instance of the data segment, the location of where the segment existed on the client, and a reference counter indicating the number of times the segment was encountered.
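The repository-side handling described in the two preceding paragraphs can be sketched as follows. The class and field names (Repository, TableEntry, ref_count, sources, client_locations) are hypothetical, SHA-256 is an assumed signature function, and an in-memory list stands in for the repository's storage.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    pointer: int                 # storage location of the segment
    ref_count: int = 0           # number of times the segment was encountered
    sources: set = field(default_factory=set)
    client_locations: list = field(default_factory=list)


class Repository:
    def __init__(self):
        self.table = {}    # signature -> TableEntry (stands in for table 127)
        self.storage = []  # stands in for the repository's data storage

    def ingest(self, segment: bytes, source: str, client_path: str) -> int:
        sig = hashlib.sha256(segment).hexdigest()
        entry = self.table.get(sig)
        if entry is None:
            # Signature not in the table: store the segment, add a new entry.
            self.storage.append(segment)
            entry = TableEntry(pointer=len(self.storage) - 1)
            self.table[sig] = entry
        # In both cases, record this instance of the segment.
        entry.ref_count += 1
        entry.sources.add(source)
        entry.client_locations.append((source, client_path))
        return entry.pointer


repo = Repository()
p1 = repo.ingest(b"payload", "client-1", "/docs/a.txt")
p2 = repo.ingest(b"payload", "client-2", "/mail/b.txt")
```

In this sketch the second client's copy of the segment is not stored again; only the counter, source, and client location in the existing entry are updated.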

As the example of FIG. 3 illustrates, in environments where data is stored and deduplicated at a centralized repository, large volumes of data that have not been deduplicated might still be stored or utilized locally and communicated across the network or other communication channels between the clients and the repository. The same scenario can hold true with other environments including the example illustrated in FIG. 2. Accordingly, in various embodiments, the hash table or other like reference table can be shared among the various computer systems 132 so that synchronized deduplication can be performed. This can be accomplished by sending an instantiation of the reference table to the client devices 132. This instantiation can be the entire reference table itself, or a subset of the reference table.

Sharing the reference table with the client computing systems 132 allows the client systems 132 to deduplicate the data before it is passed to repository system 151 for storage. Further, this can allow the client systems 132 to consider segments already stored in the repository system 151 when doing their local deduplication. With such an arrangement, and assuming again the example deduplication process described above with respect to FIG. 1, a client system 132 can segment its data, hash the data to obtain a signature, and compare the obtained signature with the signatures in its local hash table. If the signature already exists, this indicates that the segment already exists in storage, assuming it is not a false positive. Accordingly, rather than transmit the entire segment to the repository system 151, client computing system 132 can retrieve the designated pointer for the signature from its local table and pass that pointer along in place of the data.
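A minimal sketch of this client-side behavior is given below. The message shapes ("ref" and "data" tuples), the function name, and the use of SHA-256 are assumptions for illustration; the description does not specify a wire format.

```python
import hashlib


def prepare_transfer(segments, local_table):
    """Decide, per segment, whether to send a pointer or the data itself.

    'local_table' is the client's instantiation of the reference table,
    assumed here to map signature -> pointer in the repository.
    """
    messages = []
    for seg in segments:
        sig = hashlib.sha256(seg).hexdigest()
        if sig in local_table:
            # Segment is believed to be in the repository: send the pointer.
            messages.append(("ref", local_table[sig]))
        else:
            # Unknown segment: send the signature together with the data.
            messages.append(("data", sig, seg))
    return messages


local_table = {hashlib.sha256(b"A").hexdigest(): 7}
messages = prepare_transfer([b"A", b"B"], local_table)
```

In this example, segment A is replaced by its pointer while segment B, which is absent from the client's local table, is sent in full along with its signature so that the primary reference table can be updated.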

For large networks or other large computing environments, the hash table or other reference table 127 can grow to be quite large. Where this is the case, a substantial amount of bandwidth can be consumed by synchronizing the reference table 127 amongst repository system 151 and the various client computing systems 132. This situation can be compounded where large amounts of data are being stored by a large number of computing systems 132. In addition, as the repository is pruned, further reference-table synchronization opportunities are presented, leading to additional bandwidth consumption. Accordingly, embodiments can be implemented wherein a proper subset or portion of the reference table 127 is shared among the multiple computing systems to reduce, minimize or avoid sending the entire reference table 127 to the various systems for synchronization. For example, in one implementation, an entire reference table 127 can be sent to the client devices initially, and updates to synchronize the table to account for ongoing changes can be done by sending a proper subset containing less than the entire original table. As another example, in another embodiment, a proper subset is defined and sent initially rather than sending the entire table. Then, updates to the subset are made on an ongoing basis.
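One way to picture the first implementation above, in which the full table is sent once and later updates carry only the changes, is the following sketch. The function names and the representation of the table as a signature-to-pointer dictionary are assumptions for this example.

```python
def make_delta(previous, current):
    # Entries that are new or whose pointer changed since the last sync.
    upserts = {s: p for s, p in current.items() if previous.get(s) != p}
    # Entries that were pruned from the repository since the last sync.
    removals = [s for s in previous if s not in current]
    return upserts, removals


def apply_delta(local, upserts, removals):
    # Bring a client's local instantiation up to date in place.
    for sig in removals:
        local.pop(sig, None)
    local.update(upserts)
    return local


previous = {"a": 1, "b": 2}   # table as last sent to the client
current = {"a": 1, "c": 3}    # table after additions and pruning
upserts, removals = make_delta(previous, current)
client_table = apply_delta(dict(previous), upserts, removals)
```

Only the one added entry and the one removed signature are transferred in this example, rather than the whole table.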

FIG. 4 is a diagram illustrating an example process for reference table synchronization in accordance with one embodiment of the invention. This example is described with reference to the exemplary environment of FIG. 3 and assumes the exemplary data deduplication process described above with reference to FIG. 1. Description of this and other embodiments in terms of this exemplary environment and exemplary data deduplication process is made for ease of discussion purposes only. After reading these descriptions, one of ordinary skill will understand how the various embodiments described herein can be used in other data storage and communication environments and with other deduplication algorithms or processes.

Referring now to FIG. 4, in step 181 data for deduplication and storage is received. In terms of the exemplary environments described above, data can be received from a client device 132 by storage repository 151 for storage. Storage repository 151 deduplicates the data, creating a reference table in the process. For example, in terms of the example described above with respect to FIG. 1, storage repository 151 segments the data, hashes each segment to create a hash value signature, and compares the signature to existing entries in the reference table 127. Accordingly, a result of step 181 is storage of deduplicated data and creation of the reference table (such as, for example, a hash table 127).

As illustrated at step 185, as additional data is received by storage repository 151 and other data removed from storage repository 151, the reference table is updated and maintained by storage repository 151. For example, new entries to the reference table are made to include signatures and pointers for new data segments received and old signatures and pointers are removed from the table as segments are deleted from the repository.
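As a non-limiting sketch of the table maintenance of step 185, entries might be added and removed as shown below. The use of a per-entry reference counter to decide when a segment has been deleted is an assumption of this sketch, as are the names `Entry` and `ReferenceTable`.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    location: int       # pointer to the stored segment (hypothetical integer address)
    ref_count: int = 0  # number of stored data items that use the segment


class ReferenceTable:
    def __init__(self):
        self.entries = {}  # signature -> Entry

    def add_reference(self, signature: str, location: int) -> bool:
        """Record one more use of a segment; return True if the entry is new."""
        entry = self.entries.setdefault(signature, Entry(location))
        entry.ref_count += 1
        return entry.ref_count == 1

    def remove_reference(self, signature: str) -> bool:
        """Drop one use of a segment; return True if the entry was removed."""
        entry = self.entries[signature]
        entry.ref_count -= 1
        if entry.ref_count == 0:
            # No remaining users: the signature and pointer leave the table.
            del self.entries[signature]
            return True
        return False
```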

At step 188, the system performs reference table selection to identify a subset of entries in the reference table for streamlined reference-table-synchronization operations. Various algorithms or processes can be used to identify or define a subset of entries for streamlined synchronization. Generally, in one embodiment, algorithms are implemented to identify those data segments being stored in repository system 151 that have the highest utilization or highest likelihood of occurrence. In such an embodiment, synchronization of a relatively small portion of the reference table can result in bandwidth savings of a relatively larger proportion.

At step 192, the reference table is synchronized with one or more client devices 132. In the illustrated example, it is the subset identified in step 188 that is shared with (for example, sent to) client devices 132 to synchronize or update their local instances of the reference table. The process can continue through subsequent deduplication operations in which the reference table is updated with new entries, relevant subsets of the entries are identified in light of the changes to the reference table, and synchronization performed based on the reference table subset. This is indicated by step 195, which shows the reference-table subset generation and synchronization being performed on a periodic basis. In addition to adding new entries to the reference table for newly stored segments, updates to the reference table can also include updates to reference counter values, sources of data segments, and so on. In one embodiment, the synchronization operations can be run on a periodic basis based on temporal criteria such as the expiration of a period of time, or they can be triggered based on throughput metrics or other criteria. Examples of throughput criteria can include criteria such as, for example, the amount of new data stored in or removed from the system, the number of updates made to the reference table, and so on.
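For illustration, a synchronization trigger that combines a temporal criterion with throughput criteria might be sketched as follows. The function name `should_synchronize` and all default threshold values are hypothetical and chosen only for this example.

```python
def should_synchronize(seconds_since_sync: float,
                       updates_since_sync: int,
                       bytes_changed: int,
                       period_seconds: float = 3600.0,
                       max_updates: int = 1000,
                       max_bytes: int = 1 << 30) -> bool:
    """Return True when any one of the configured criteria is met."""
    return (seconds_since_sync >= period_seconds   # temporal criterion
            or updates_since_sync >= max_updates   # reference table updates
            or bytes_changed >= max_bytes)         # data stored or removed
```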

As indicated above with reference to step 188, one criterion that can be used to define the relevant subsets for the reference table is based on data utilization. For example, in one embodiment, the system tracks not only the existence of a segment in the repository but also the utilization of each of the segments. One way in which utilization can be tracked is by tracking the quantity or frequency of occurrences of a given segment or the number of times it is accessed. The segments can be scored based on the utilization or access rates and ranked accordingly. This can be used in one embodiment to define or identify segments whose signatures will appear on the reference table subset.
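A minimal sketch of utilization-based ranking is given below, assuming that utilization is measured simply as the number of times each signature has been presented. The function name `select_by_utilization` and the fixed `subset_size` cutoff are assumptions of the sketch.

```python
from collections import Counter


def select_by_utilization(presented_signatures, subset_size: int) -> list:
    """Rank signatures by how often they were presented and keep the top ones."""
    counts = Counter(presented_signatures)
    return [signature for signature, _ in counts.most_common(subset_size)]
```

For example, if signature "a" is presented three times, "b" twice and "c" once, a subset size of two yields "a" and "b".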

As one example, the number of times that a given segment is presented to repository 151 for storage can be tracked by the system. This number can, in many applications, be directly proportional to the amount of communication bandwidth that is being consumed by transferring the segment from client devices 132 to storage repository 151. Accordingly, these higher utilization segments tend to have a larger impact on system bandwidth than segments with lower utilization. In such environments, defining the reference table subset based on utilization can allow a trade-off to be made between reference table size and marginal improvements in bandwidth savings. In other words, where a reference table for synchronization includes entries for infrequently used data segments, inclusion of these entries in the table for synchronization could consume more bandwidth than is saved by allowing these entries to be used for deduplication at the client side.
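The trade-off can be made concrete with a simple cost model, offered here only as an assumption for illustration. In this model, sending a table entry costs a fixed number of bytes, and each time the client would otherwise have presented the segment, the client sends only the signature instead of the full segment. The byte sizes and the names `net_savings` and `worth_synchronizing` are hypothetical.

```python
def net_savings(expected_presentations: float,
                segment_bytes: int,
                signature_bytes: int = 32,
                entry_bytes: int = 40) -> float:
    """Bytes saved by client-side deduplication minus bytes spent sending the entry."""
    saved = expected_presentations * (segment_bytes - signature_bytes)
    return saved - entry_bytes


def worth_synchronizing(expected_presentations: float, segment_bytes: int) -> bool:
    """Include an entry in the subset only when it is expected to pay for itself."""
    return net_savings(expected_presentations, segment_bytes) > 0
```

Under these assumed sizes, a 4096-byte segment expected to be presented twice before the next synchronization saves 2 × (4096 − 32) − 40 = 8088 bytes, whereas the same segment expected to be presented 0.005 times saves about 20 bytes against a 40-byte entry cost and would be excluded.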

Another way to define a subset of the reference table for table synchronization can be to identify changes to the table since the last synchronization operation, such as new table entries or deletions. With changes identified, the system can be configured to send only those changes to the clients to update their local copies of the table. As noted above, in some embodiments not all clients are updated at the same time. Accordingly, changes to the reference table can be tracked on a client-by-client basis, or on the basis of groups of clients so that the updates can be managed based on the actual need of given clients.
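As an illustrative sketch of change-based synchronization tracked per client, the repository might compare the signatures it last sent to a given client against the current subset and send only the differences. Representing each table as a set of signatures, and the names `compute_delta` and `apply_delta`, are assumptions of this sketch.

```python
def compute_delta(client_signatures: set, current_subset: set) -> dict:
    """Return the entries a client must add and remove to match the current subset."""
    return {
        "add": current_subset - client_signatures,
        "remove": client_signatures - current_subset,
    }


def apply_delta(client_signatures: set, delta: dict) -> set:
    """Update a client's local copy of the table from a received delta."""
    return (client_signatures - delta["remove"]) | delta["add"]
```

Because the delta is computed against what each client last received, clients that were updated at different times receive different deltas yet converge on the subset intended for them.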

FIG. 5 is a diagram illustrating an example of reference table synchronization in accordance with one embodiment of the invention. Referring now to FIG. 5, in this example a reference table 211 is maintained in server repository system 151 for data deduplication purposes. Reference table 215 represents distribution of reference table 211 or portions thereof to the client computer systems 132. As indicated above, reference table 215 can be a subset of reference table 211 and can be updated using subsets identified for reference table 211 such as, for example, in accordance with the embodiment described above with reference to FIG. 4. In an optimized system, a trade-off is made between the number of entries of the subset reference table pushed to each client system 132 and the actual or anticipated bandwidth saved by the inclusion of each additional entry.

In various environments, the process might be further streamlined by defining the reference table subset differently for different clients or different groups of clients 132. Thus, for example, different subsets of reference table 211 can be sent to different computing devices resulting in some or all of the client devices 132 having a reference table that is different from other client devices 132. As another example, client devices 132 can be combined into groupings based on various characteristics (described below) and different subsets of reference table 211 can be sent to the different groupings of client devices. FIG. 6 is a diagram illustrating an example of client groupings for targeted reference table synchronization in accordance with one embodiment of the invention. Referring now to FIG. 6, client systems 132 are illustrated as being broken into two groupings 201, 202. This example also illustrates that two different reference tables 216, 217 are distributed to groupings 201, 202, respectively.

A number of factors or criteria can be considered when identifying targeted reference tables for a client or a group of clients. For example, the system can be configured to analyze the traffic received from each client or client grouping and perform client-specific utilization measurements. The most utilized entries can be identified on a client-by-client basis and the reference table subset identified accordingly. In another example, the reference table subsets for particular clients or groups of clients can be identified based on a number of other criteria including, for example, the size of the segments utilized by each of the clients, the type of processes being performed by each client, the client environment, characteristics of the client system, and so on. Each of these may have an effect on the quantity and type of data to be stored. As noted above, in some embodiments data in the reference table can be included to indicate the source of the data, its storage location, the number of occurrences and so on. Such data can be used in making the determination as to which clients or groups of clients will receive which subsets of the reference table.
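By way of example only, client-specific utilization measurement might be sketched as follows, assuming the repository logs each received segment as a (client identifier, signature) pair. The log format and the name `per_client_subsets` are hypothetical.

```python
from collections import Counter, defaultdict


def per_client_subsets(traffic, subset_size: int) -> dict:
    """Build a targeted subset for each client from its own most utilized signatures.

    traffic is an iterable of (client_id, signature) pairs.
    """
    counts = defaultdict(Counter)
    for client_id, signature in traffic:
        counts[client_id][signature] += 1
    return {
        client_id: [sig for sig, _ in counter.most_common(subset_size)]
        for client_id, counter in counts.items()
    }
```

The same function could be applied to groupings of clients by substituting a group identifier for the client identifier in each pair.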

As stated above, both segment size and utilization can be used as a metric to identify a subset of entries for the targeted reference tables for a client or group of clients. For example, in embodiments where stored segments can be of different sizes, selecting reference table data for larger segments would generally result in a greater gain in efficiencies than sharing reference data for smaller segments. Likewise, selecting reference table data for more frequently encountered segments generally results in a greater gain in efficiencies than sharing of reference data for infrequently used data. However, these generalizations are not always without exception. Consider for example a large segment of data that is very rarely used, or consider the opposite case of a smaller segment of data that is frequently utilized by a client or group of clients. In the first case, selecting the rarely used large segment for the subset may not yield the greatest gain in efficiencies. Therefore, a combination of factors, such as a combination of object size and utilization can be used to determine the subset of entries for synchronization.

Such a combination can be made, for example, by multiplying the size of an object by its utilization frequency. As another example such a combination can be made as a weighted combination of utilization frequency and object size. As a further illustration of this latter example, objects can be weighted by their size, with larger objects being weighted higher than smaller objects. Their weight can then be multiplied by their utilization frequency to rank them for subset selection. Weighting can be done on a sliding scale or it can be done in groupings based on data object sizes in the system. As an example of such a grouping, the top 20% of objects by size can be given a weighting of 100%, the next 20% by size a weighting of 80% and so on. As another example, size thresholds can be defined such that objects above a certain size threshold can be given a certain weighting. As a specific example of this, objects above 1 MB might be weighted 100%, objects between 750 kB and 1 MB weighted 90% and so on. As these examples illustrate, there are a number of specific implementations that can be used to define a subset based on a combination of factors such as block size and frequency of utilization.
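The size-and-utilization combinations described above might be sketched as follows. The grouping example gives only the first two tiers (100% and 80%); the continuation to 60%, 40% and 20% for the remaining groups is an assumption of this sketch, as are the names `quintile_weights` and `rank_objects`.

```python
def quintile_weights(sizes: dict) -> dict:
    """Weight objects by size tier: top 20% -> 1.0, next 20% -> 0.8, and so on."""
    ranked = sorted(sizes, key=sizes.get, reverse=True)  # largest first
    n = len(ranked)
    weights = {}
    for rank, obj in enumerate(ranked):
        tier = (rank * 5) // n          # 0 for the top fifth, 4 for the bottom fifth
        weights[obj] = (5 - tier) / 5   # assumed continuation: 1.0, 0.8, 0.6, 0.4, 0.2
    return weights


def rank_objects(sizes: dict, frequencies: dict, subset_size: int) -> list:
    """Score each object as size weight times utilization frequency and keep the top ones."""
    weights = quintile_weights(sizes)
    scores = {obj: weights[obj] * frequencies.get(obj, 0) for obj in sizes}
    return sorted(scores, key=scores.get, reverse=True)[:subset_size]
```

In this sketch a small but very frequently used object can outrank a large, rarely used one, which is the behavior motivating the combined metric.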

FIG. 7 is a diagram illustrating an example process for reference table subset updating in accordance with one embodiment of the invention. At step 227, client systems 132 that utilize repository system 151 can be identified. In embodiments where reference tables are targeted to particular clients or groups of clients, client characteristics can also be identified. At steps 231 and 234, a subset of the reference table is identified for a given client or client grouping and that subset is sent to that client or client group for synchronization. As illustrated by step 237, this process can be repeated for a plurality of clients or groups of clients.

At step 240, the reference table is monitored as data is received and deduplicated. The table is checked to determine whether changes are made to the reference table as a result of the new data. This check can also be made for data that is deleted from storage. This monitoring can be performed at the client side, the central repository, or at both locations. If changes are made, the process of identifying the subset and resynchronizing one or more of the clients with the new subset can be repeated as illustrated by step 244. In one example, the repository can receive data from a client, process and deduplicate the data for storage, and identify and send a representative hash table to the client for use in future storage operations. Because the data and hence the reference table can change over time, the subset can be re-defined and redistributed to the clients. As noted above, in one embodiment the subset is resent while in another embodiment only deltas to the subset are sent to the client to update the client's reference table.

FIG. 8 is a diagram illustrating another example process for reference table subset updating for synchronization in accordance with one embodiment of the invention. At steps 262 and 266, a client system obtains data for storage and hashes representative segments of that data to determine deduplication signatures. Any of a number of techniques can be used to identify the representative segments that are determined for this process. For example, the first several segments can be chosen, a periodic or random sampling can be made, the most utilized segments from previous deduplication operations can be chosen, or other criteria used to identify a representative sampling. In steps 270 and 275, the signatures for these representative segments are sent to repository 151 and compared with signatures already existing at the central repository. As illustrated by steps 280 and 284, if changes are made the tables can be resynchronized.
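As one hedged illustration of the client-side sampling and repository-side comparison, a periodic sampling might be implemented as shown below. The stride-based sampling, the use of SHA-256, and the names `representative_signatures` and `compare_with_repository` are assumptions made for this sketch.

```python
import hashlib


def representative_signatures(segments: list, stride: int = 2) -> list:
    """Hash a periodic sample of the segments (every stride-th segment)."""
    return [hashlib.sha256(segment).hexdigest() for segment in segments[::stride]]


def compare_with_repository(signatures: list, repository_table: set) -> tuple:
    """Split sampled signatures into those the repository already holds and those it does not."""
    known = [sig for sig in signatures if sig in repository_table]
    unknown = [sig for sig in signatures if sig not in repository_table]
    return known, unknown
```

A non-empty list of unknown signatures indicates that storing the data would change the reference table, which is the condition under which the tables can be resynchronized.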

The deduplication table subsets defined in the various embodiments can be used by the client devices for deduplication of data before it is sent to a central repository for storage. Because reference table updates can be client specific and because they can also be sent to different clients at different times, in some embodiments the synchronization does not result in an exact copy of the reference table being instantiated at each client device. Instead, at any given time, different clients can have different instantiations of the reference table and these instantiations are preferably selected based on size, utilization and other factors specific to the client or client group.
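A minimal sketch of client-side deduplication using a local table subset follows. The message format, the use of SHA-256, and the name `prepare_upload` are hypothetical and are offered only to illustrate the idea.

```python
import hashlib


def prepare_upload(segments: list, local_table: set) -> list:
    """Replace segments whose signatures are in the local table with the signature alone."""
    messages = []
    for segment in segments:
        signature = hashlib.sha256(segment).hexdigest()
        if signature in local_table:
            # The repository already holds this segment: send only its signature.
            messages.append(("signature", signature))
        else:
            # Not in the local subset: send the segment itself.
            messages.append(("segment", segment))
    return messages
```

Because the local table may be only a subset of the repository's table, a segment sent in full may still turn out to be a duplicate at the repository, where it can be deduplicated in the usual way.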

As used herein, the term module might describe a given unit of functionality that can be performed in accordance with one or more embodiments of the present invention. As used herein, a module might be implemented utilizing any form of hardware, software, or a combination thereof. For example, one or more processors, controllers, ASICs, PLAs, logical components, software routines or other mechanisms might be implemented to make up a module. In implementation, the various modules described herein might be implemented as discrete modules or the functions and features described can be shared in part or in total among one or more modules. In other words, as would be apparent to one of ordinary skill in the art after reading this description, the various features and functionality described herein may be implemented in any given application and can be implemented in one or more separate or shared modules in various combinations and permutations. Even though various features or elements of functionality may be individually described or claimed as separate modules, one of ordinary skill in the art will understand that these features and functionality can be shared among one or more common software and hardware elements, and such description shall not require or imply that separate hardware or software components are used to implement such features or functionality.

Where components or modules of the invention are implemented in whole or in part using software, in one embodiment, these software elements can be implemented to operate with a computing or processing module capable of carrying out the functionality described with respect thereto. One such example-computing module is shown in FIG. 9. Various embodiments are described in terms of this example-computing module 300. After reading this description, it will become apparent to a person skilled in the relevant art how to implement the invention using other computing modules or architectures.

Referring now to FIG. 9, computing module 300 may represent, for example, computing or processing capabilities found within desktop, laptop and notebook computers; hand-held computing devices (PDA's, smart phones, cell phones, palmtops, etc.); mainframes, supercomputers, workstations or servers; or any other type of special-purpose or general-purpose computing devices as may be desirable or appropriate for a given application or environment. Computing module 300 might also represent computing capabilities embedded within or otherwise available to a given device. For example, a computing module might be found in other electronic devices such as, for example, digital cameras, navigation systems, cellular telephones, portable computing devices, modems, routers, WAPs, terminals and other electronic devices that might include some form of processing capability.

Computing module 300 might include, for example, one or more processors, controllers, control modules, or other processing devices, such as a processor 304. Processor 304 might be implemented using a general-purpose or special-purpose processing engine such as, for example, a microprocessor, controller, or other control logic. In the example illustrated in FIG. 9, processor 304 is connected to a bus 302, although any communication medium can be used to facilitate interaction with other components of computing module 300 or to communicate externally.

Computing module 300 might also include one or more memory modules, simply referred to herein as main memory 308. For example, preferably random access memory (RAM) or other dynamic memory might be used for storing information and instructions to be executed by processor 304. Main memory 308 might also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 304. Computing module 300 might likewise include a read only memory (“ROM”) or other static storage device coupled to bus 302 for storing static information and instructions for processor 304.

The computing module 300 might also include one or more various forms of information storage mechanism 310, which might include, for example, a media drive 312 and a storage unit interface 320. The media drive 312 might include a drive or other mechanism to support fixed or removable storage media 314. For example, a hard disk drive, a floppy disk drive, a magnetic tape drive, an optical disk drive, a CD or DVD drive (R or RW), or other removable or fixed media drive might be provided. Accordingly, storage media 314 might include, for example, a hard disk, a floppy disk, magnetic tape, cartridge, optical disk, a CD or DVD, or other fixed or removable medium that is read by, written to or accessed by media drive 312. As these examples illustrate, the storage media 314 can include a computer usable storage medium having stored therein computer software or data.

In alternative embodiments, information storage mechanism 310 might include other similar instrumentalities for allowing computer programs or other instructions or data to be loaded into computing module 300. Such instrumentalities might include, for example, a fixed or removable storage unit 322 and an interface 320. Examples of such storage units 322 and interfaces 320 can include a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory module) and memory slot, a PCMCIA slot and card, and other fixed or removable storage units 322 and interfaces 320 that allow software and data to be transferred from the storage unit 322 to computing module 300.

Computing module 300 might also include a communications interface 324. Communications interface 324 might be used to allow software and data to be transferred between computing module 300 and external devices. Examples of communications interface 324 might include a modem or softmodem, a network interface (such as an Ethernet, network interface card, WiMedia, IEEE 802.XX or other interface), a communications port (such as, for example, a USB port, IR port, RS232 port, Bluetooth® interface, or other port), or other communications interface. Software and data transferred via communications interface 324 might typically be carried on signals, which can be electronic, electromagnetic (which includes optical) or other signals capable of being exchanged by a given communications interface 324. These signals might be provided to communications interface 324 via a channel 328. This channel 328 might carry signals and might be implemented using a wired or wireless communication medium. These signals can deliver the software and data from memory or other storage medium in one computing system to memory or other storage medium in computing module 300. Some examples of a channel might include a phone line, a cellular link, an RF link, an optical link, a network interface, a local or wide area network, and other wired or wireless communications channels.

In this document, the terms “computer program medium” and “computer usable medium” are used to generally refer to physical storage media such as, for example, memory 308, storage unit 322, and media 314. These and other various forms of computer program media or computer usable media may be involved in storing one or more sequences of one or more instructions to a processing device for execution. Such instructions embodied on the medium are generally referred to as “computer program code” or a “computer program product” (which may be grouped in the form of computer programs or other groupings). When executed, such instructions might enable the computing module 300 to perform features or functions of the present invention as discussed herein.

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not of limitation. Likewise, the various diagrams may depict an example architectural or other configuration for the invention, which is done to aid in understanding the features and functionality that can be included in the invention. The invention is not restricted to the illustrated example architectures or configurations, but the desired features can be implemented using a variety of alternative architectures and configurations. Indeed, it will be apparent to one of skill in the art how alternative functional, logical or physical partitioning and configurations can be implemented to implement the desired features of the present invention. Also, a multitude of different constituent module names other than those depicted herein can be applied to the various partitions. Additionally, with regard to flow diagrams, operational descriptions and method claims, the order in which the steps are presented herein shall not mandate that various embodiments be implemented to perform the recited functionality in the same order unless the context dictates otherwise.

Although the invention is described above in terms of various exemplary embodiments and implementations, it should be understood that the various features, aspects and functionality described in one or more of the individual embodiments are not limited in their applicability to the particular embodiment with which they are described, but instead can be applied, alone or in various combinations, to one or more of the other embodiments of the invention, whether or not such embodiments are described and whether or not such features are presented as being a part of a described embodiment. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments.

Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. As examples of the foregoing: the term “including” should be read as meaning “including, without limitation” or the like; the term “example” is used to provide exemplary instances of the item in discussion, not an exhaustive or limiting list thereof; the terms “a” or “an” should be read as meaning “at least one,” “one or more” or the like; and adjectives such as “conventional,” “traditional,” “normal,” “standard,” “known” and terms of similar meaning should not be construed as limiting the item described to a given time period or to an item available as of a given time, but instead should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now or at any time in the future. Likewise, where this document refers to technologies that would be apparent or known to one of ordinary skill in the art, such technologies encompass those apparent or known to the skilled artisan now or at any time in the future.

The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to” or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent. The use of the term “module” does not imply that the components or functionality described or claimed as part of the module are all configured in a common package. Indeed, any or all of the various components of a module, whether control logic or other components, can be combined in a single package or separately maintained and can further be distributed in multiple groupings or packages or across multiple locations.

Additionally, the various embodiments set forth herein are described in terms of exemplary block diagrams, flow charts and other illustrations. As will become apparent to one of ordinary skill in the art after reading this document, the illustrated embodiments and their various alternatives can be implemented without confinement to the illustrated examples. For example, block diagrams and their accompanying description should not be construed as mandating a particular architecture or configuration.

Claims

1. A computer-implemented data deduplication method, the method comprising: with one or more computing systems of a shared storage system that maintains a deduplicated data store accessible by a plurality of client computing systems: maintaining a first reference table including a plurality of data segment references corresponding to a plurality of data segments stored in the deduplicated data store; based on one or more of data segment size information and data segment utilization frequency information, determining a first subset of the data segment references in the first reference table to transmit for inclusion in a second reference table accessible by a first client computing system of the plurality of client computing systems, the first subset including a reference to the first data segment; transmitting the first subset for inclusion in the second reference table accessible by the first client computing system of the plurality of client computing systems; based on one or more of data segment size information and data segment utilization frequency information, determining a second subset of the data segment references in the first reference table to transmit for inclusion in a third reference table accessible by a second client computing system of the plurality of client computing systems, the second subset different than the first subset; and transmitting the second subset for inclusion in the third reference table accessible by the second client computing system of the plurality of client computing systems.
2. The method of claim 1 wherein the data segment references are hash signatures calculated based on the corresponding data segments.
3. The method of claim 1, wherein the second and third reference tables are different partial versions of the first reference table.
4. The method of claim 1 wherein the second reference table is local to the first client computing system and the third reference table is local to the second client computing system, and the first reference table is remote from the first and second client computing systems.
5. The method of claim 1, wherein said determining the first subset and said determining the second subset are based on data segment utilization frequency information and not data segment size information.
6. The method of claim 1, wherein said determining the first subset and said determining the second subset are based on data segment size information and not data segment utilization frequency information.
7. The method of claim 1, wherein said determining the first subset is based on a weighted combination of data segment utilization frequency information and data segment size information.
8. The method of claim 1 wherein said determining the first subset is in response to receiving the first data segment at the shared storage system from the first client computing system.
9. The method of claim 1 further comprising, subsequent to said transmitting the second subset for inclusion in the third reference table, receiving a signature corresponding to the first data segment from the first client computing system, without receiving the first data segment itself.
10. A non-transitory computer readable-medium comprising computer program code that, when executed by a computing device of a shared storage system that maintains a deduplicated data store accessible by a plurality of client computing systems, causes the computing device to perform operations comprising: maintaining a first reference table including a plurality of data segment references corresponding to a plurality of data segments stored in the deduplicated data store; based on one or more of data segment size information and data segment utilization frequency information, determining a first subset of the data segment references in the first reference table to transmit for inclusion in a second reference table accessible by a first client computing system of the plurality of client computing systems, the first subset including a reference to the first data segment; transmitting the first subset for inclusion in the second reference table accessible by the first client computing system of the plurality of client computing systems; based on one or more of data segment size information and data segment utilization frequency information, determining a second subset of the data segment references in the first reference table to transmit for inclusion in a third reference table accessible by a second client computing system of the plurality of client computing systems, the second subset different than the first subset; and transmitting the second subset for inclusion in the third reference table accessible by the second client computing system of the plurality of client computing systems.
11. The non-transitory computer readable-medium of claim 10, wherein the data segment references are hash signatures calculated based on the corresponding data segments.
12. The non-transitory computer readable-medium of claim 10, wherein the second and third reference tables are different partial versions of the first reference table.
13. The non-transitory computer readable-medium of claim 10, wherein the second reference table is local to the first client computing system and the third reference table is local to the second client computing system, and the first reference table is remote from the first and second client computing systems.
14. The non-transitory computer readable-medium of claim 10, wherein said determining the first subset is based on a weighted combination of data segment utilization frequency information and data segment size information.
15. A system comprising: one or more memory devices containing a deduplicated data store accessible by a plurality of client computing systems; and one or more physical processors configured to execute instructions to cause a computing system to: maintain a first reference table including a plurality of data segment references corresponding to a plurality of data segments stored in the deduplicated data store; based on one or more of data segment size information and data segment utilization frequency information, determine a first subset of the data segment references in the first reference table to transmit for inclusion in a second reference table accessible by a first client computing system of the plurality of client computing systems, the first subset including a reference to the first data segment; transmit the first subset for inclusion in the second reference table accessible by the first client computing system of the plurality of client computing systems; based on one or more of data segment size information and data segment utilization frequency information, determine a second subset of the data segment references in the first reference table to transmit for inclusion in a third reference table accessible by a second client computing system of the plurality of client computing systems, the second subset different than the first subset; and transmit the second subset for inclusion in the third reference table accessible by the second client computing system of the plurality of client computing systems.
16. The system of claim 15, wherein the data segment references are hash signatures calculated based on the corresponding data segments.
17. The system of claim 15, wherein the second and third reference tables are different partial versions of the first reference table.
18. The system of claim 15, wherein the second reference table is local to the first client computing system and the third reference table is local to the second client computing system, and the first reference table is remote from the first and second client computing systems.
19. The system of claim 15, wherein the determination of the first subset and the determination of the second subset are based on data segment utilization frequency information and not data segment size information.
20. The system of claim 15, wherein the determination of the first subset and the determination of the second subset are based on data segment size information and not data segment utilization frequency information.
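
By way of illustration only, the subset determination recited in claims 14 through 20 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names SegmentRef, make_ref and select_subset, the use of SHA-256 as the hash signature, the normalization of each factor, and the weights and table limit are all hypothetical and do not appear in the claims.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentRef:
    """One reference-table entry (hypothetical layout)."""
    signature: str  # hash signature calculated from the data segment
    size: int       # data segment size in bytes
    use_count: int  # data segment utilization frequency


def make_ref(segment: bytes, use_count: int) -> SegmentRef:
    # Assumption: SHA-256; the claims only call for a hash signature.
    digest = hashlib.sha256(segment).hexdigest()
    return SegmentRef(digest, len(segment), use_count)


def select_subset(table, limit, w_freq=1.0, w_size=0.0):
    """Return up to `limit` references with the highest weighted score.

    Each factor is normalized to [0, 1] so the weights are comparable.
    w_size=0 gives a frequency-only choice, w_freq=0 a size-only choice.
    """
    if not table or limit <= 0:
        return []
    max_use = max(r.use_count for r in table) or 1
    max_size = max(r.size for r in table) or 1

    def score(ref):
        return w_freq * ref.use_count / max_use + w_size * ref.size / max_size

    return sorted(table, key=score, reverse=True)[:limit]


# First (central) reference table with three illustrative segments.
central = [
    make_ref(b"a" * 4096, use_count=50),
    make_ref(b"b" * 65536, use_count=5),
    make_ref(b"c" * 1024, use_count=20),
]

# Two different partial tables that could be sent to two clients.
subset_for_first_client = select_subset(central, 2, w_freq=1.0, w_size=0.0)
subset_for_second_client = select_subset(central, 2, w_freq=0.0, w_size=1.0)
```

In this sketch the two clients receive different partial versions of the central table because different weights are applied. Subsets could equally differ because of per-client usage statistics or different table size limits; the claims do not restrict the choice.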

Vijayan, et al., U.S. Appl. No. 15/474,730 Now U.S. Pat. No. 10,229,133, filed Mar. 30, 2017, High Availability Distributed Deduplicated Storage System.
Vijayan, et al., U.S. Appl. No. 16/234,976 Published as 2019/0205290, filed Dec. 28, 2018, High Availability Distributed Deduplicated Storage System.
Attarde, et al., U.S. Appl. No. 14/216,703 Now U.S. Pat. No. 9,633,056, filed Mar. 17, 2014, Maintaining a Deduplication Database.
Attarde, et al., U.S. Appl. No. 14/216,689 Now U.S. Pat. No. 10,380,072, filed Mar. 17, 2014, Managing Deletions From a Deduplication Database.
Attarde, et al., U.S. Appl. No. 16/020,900 Now U.S. Pat. No. 10,445,293, filed Jun. 27, 2018, Managing Deletions From a Deduplication Database.
Vijayan, et al., U.S. Appl. No. 14/152,549 Now U.S. Pat. No. 9,665,591, filed Jan. 10, 2014, High Availability Distributed Deduplicated Storage System.
Mitkar, et al., U.S. Appl. No. 14/527,678 Now U.S. Pat. No. 9,575,673, filed Oct. 29, 2014, Accessing a File System Using Tiered Deduplication.
Mitkar, et al., U.S. Appl. No. 15/399,597 Now U.S. Pat. No. 9,934,238, filed Jan. 5, 2017, Accessing a File System Using Tiered Deduplication.
Mitkar, et al., U.S. Appl. No. 15/899,699 Published as 2018/0189314, filed Feb. 20, 2018, Accessing a File System Using Tiered Deduplication.
Ngo, et al., U.S. Appl. No. 12/499,717 Now U.S. Pat. No. 8,930,306, filed Jul. 8, 2009, Synchronized Data Deduplication.
Ngo, et al., U.S. Appl. No. 14/555,322 Now Abandoned, filed Nov. 26, 2014, Synchronized Data Deduplication.
Ngo, et al., U.S. Appl. No. 15/684,812 Published as 2018/0075055, filed Aug. 23, 2017, Synchronized Data Deduplication.
Vijayan, et al., U.S. Appl. No. 14/721,971 Published as 2016/0350391, filed May 26, 2015, Replication Using Deduplicated Secondary Copy Data.
Vijayan, et al., U.S. Appl. No. 15/282,445 Published as 2017/0083563, filed Sep. 30, 2016, Replication Using Deduplicated Secondary Copy Data.
Vijayan, et al., U.S. Appl. No. 15/282,553 Published as 2017/0083558, filed Sep. 30, 2016, Replication Using Deduplicated Secondary Copy Data.
Vijayan, et al., U.S. Appl. No. 15/282,668 Published as 2017/0090773, filed Sep. 30, 2016, Replication Using Deduplicated Secondary Copy Data.
Vijayan, et al., U.S. Appl. No. 14/682,988 Now U.S. Pat. No. 10,339,106, filed Apr. 9, 2015, Highly Reusable Deduplication Database After Disaster Recovery.
Vijayan, et al., U.S. Appl. No. 15/197,434 Published as 2016/0306707, filed Jun. 29, 2016, Highly Reusable Deduplication Database After Disaster Recovery.
Vijayan, et al., U.S. Appl. No. 15/197,435 Published as 2016/0306818, filed Jun. 29, 2016, Highly Reusable Deduplication Database After Disaster Recovery.
Vijayan, et al., U.S. Appl. No. 15/299,299 Published as 2017/0193003, filed Oct. 20, 2016, Redundant and Robust Distributed Deduplication Data Storage System.
Vijayan, et al., U.S. Appl. No. 15/299,254 Now U.S. Pat. No. 10,310,953, filed Oct. 20, 2016, System for Redirecting Requests After a Secondary Storage Computing Device Failure.
Vijayan, et al., U.S. Appl. No. 16/232,956 Published as 2019/0272221, filed Dec. 26, 2018, System for Redirecting Requests After a Secondary Storage Computing Device Failure.
Vijayan, et al., U.S. Appl. No. 15/299,290 Now U.S. Pat. No. 10,255,143, filed Oct. 20, 2016, Deduplication Replication in a Distributed Deduplication Data Storage System.
Vijayan, et al., U.S. Appl. No. 16/232,950 Published as 2019/0272220, filed Dec. 26, 2018, Deduplication Replication in a Distributed Deduplication Data Storage System.
Vijayan, et al., U.S. Appl. No. 15/299,280 Now U.S. Pat. No. 10,061,663, filed Oct. 20, 2016, Rebuilding Deduplication Data in a Distributed Deduplication Data Storage System.
Vijayan, et al., U.S. Appl. No. 15/299,298 Published as 2017/0192861, filed Oct. 20, 2016, Distributed File System in a Distributed Deduplication Data Storage System.
Vijayan, et al., U.S. Appl. No. 15/299,281 Published as 2017/0192868, filed Oct. 20, 2016, User Interface for Identifying a Location of a Failed Secondary Storage Device.
Vijayan, et al., U.S. Appl. No. 16/380,469, filed Apr. 10, 2019, Restore Using Deduplicated Secondary Copy Data.
Haridas, et al., U.S. Appl. No. 16/201,897, filed Nov. 27, 2018, Generating Backup Copies Through Interoperability Between Components of a Data Storage Management System and Appliances for Data Storage and Deduplication.
Haridas, et al., U.S. Appl. No. 16/201,856, filed Nov. 27, 2018, Using Interoperability Between Components of a Data Storage Management System and Appliances for Data Storage and Deduplication to Generate Secondary and Tertiary Copies.
Ngo, U.S. Appl. No. 16/407,040, filed May 8, 2019, Use of Data Block Signatures for Monitoring in an Information Management System.
Muller, et al., U.S. Appl. No. 16/455,090 Now Abandoned, filed Jun. 27, 2019, Dedicated Client-Side Signature Generator in a Networked Storage System.
Attarde, et al., U.S. Appl. No. 16/452,309, filed Jun. 25, 2019, Managing Deletions From a Deduplication Database.
Attarde, et al., U.S. Appl. No. 16/550,094, filed Aug. 23, 2019, Managing Deletions From a Deduplication Database.

Related Publications (1)

	Number	Date	Country
	20200250145 A1	Aug 2020	US

Continuations (3)

Relation	Number	Date	Country
Parent	15684812	Aug 2017	US
Child	16700938		US
Parent	14555322	Nov 2014	US
Child	15684812		US
Parent	12499717	Jul 2009	US
Child	14555322		US
