De-duplication objects may be used to eliminate redundant copies of data. In the de-duplication process, unique units of data may be identified and stored and subsequent units of data may be compared to the stored units.
As noted above, the de-duplication process may identify and store unique units of data and compare subsequent units of data to the stored units. If a redundant unit of data is received, the redundant unit of data may be replaced by a de-duplication object comprising a reference or pointer to the unique unit of data discovered earlier. A de-duplication object may be much smaller than the unit of data it replaces. Thus, given that the same unit of data may occur dozens, hundreds, or even thousands of times, de-duplication may greatly reduce the amount of data held in a storage device or transferred over a network. Unfortunately, de-duplication objects may eventually become corrupt and may no longer refer to the correct unit of data. Such corruption may be caused by disk failures, I/O errors, database corruption, or operational errors. While some techniques for checking the integrity of de-duplication objects exist, these techniques may check the objects randomly, without prioritizing them. In one example, a priority de-duplication object may be defined as a de-duplication object that is used or referenced frequently by a program accessing the data. In the event the system fails during an unprioritized integrity check, high priority de-duplication objects may not yet have been checked. Recovery of these de-duplication objects may then require a burdensome manual process.
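By way of a non-limiting illustration, the substitution of redundant units of data by references may be sketched as follows. The class name `DedupStore`, the use of a SHA-256 digest as the reference, and the in-memory dictionaries are assumptions made for this sketch only and are not taken from the disclosure. For simplicity, every write in the sketch yields a de-duplication object, including the first write of a unique unit.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store illustrating de-duplication."""

    def __init__(self):
        self.units = {}    # digest -> unique unit of data (bytes)
        self.objects = []  # de-duplication objects; each entry is a digest reference

    def write(self, unit: bytes) -> int:
        """Store a unit and return the index of its de-duplication object."""
        ref = hashlib.sha256(unit).hexdigest()
        if ref not in self.units:
            # First occurrence: keep the unique unit of data itself.
            self.units[ref] = unit
        # Every occurrence is represented by a small reference to that unit.
        self.objects.append(ref)
        return len(self.objects) - 1

    def read(self, index: int) -> bytes:
        """Follow a de-duplication object back to its unit of data."""
        return self.units[self.objects[index]]


store = DedupStore()
indices = [store.write(b"block A"), store.write(b"block B"), store.write(b"block A")]
```

In this sketch, three units are written but only two unique units are retained; the first and third de-duplication objects hold the same reference.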
In view of the foregoing, disclosed herein are a system, computer-readable medium, and method for checking the integrity of de-duplication objects. In one example, an integrity check of the most frequently referenced or used de-duplication objects is given higher priority. In a further example, a warning may be generated if the integrity check of a given de-duplication object fails. Thus, rather than verifying the de-duplication objects randomly or sequentially, the integrity check may be carried out intelligently such that the most referenced de-duplication objects are checked first. In the event of a system failure during an integrity check, it is therefore more likely that the high priority de-duplication objects have already been verified. The aspects, features and advantages of the present disclosure will be appreciated when considered with reference to the following description of examples and accompanying figures. The following description does not limit the application; rather, the scope of the disclosure is defined by the appended claims and equivalents.
Non-transitory computer readable media may comprise any one of many physical media such as, for example, electronic, magnetic, optical, electromagnetic, or semiconductor media. More specific examples of suitable non-transitory computer-readable media include, but are not limited to, a portable magnetic computer diskette such as floppy diskettes or hard drives, a read-only memory (“ROM”), an erasable programmable read-only memory, a portable compact disc or other storage devices that may be coupled to computer apparatus 100 directly or indirectly. Alternatively, non-transitory CRM 112 may be a random access memory (“RAM”) device or may be divided into multiple memory segments organized as dual in-line memory modules (“DIMMs”). The non-transitory CRM 112 may also include any combination of one or more of the foregoing and/or other devices as well. While only one processor and one non-transitory CRM are shown in
The instructions residing in non-transitory CRM 112 may comprise any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by processor 110. In this regard, the terms “instructions,” “scripts,” and “applications” may be used interchangeably herein. The computer executable instructions may be stored in any computer language or format, such as in object code or modules of source code. Furthermore, it is understood that the instructions may be implemented in the form of hardware, software, or a combination of hardware and software and that the examples herein are merely illustrative.
In one example, a storage device may store units of data and may store a de-duplication object in lieu of at least one redundant copy of a given unit of data. As noted above, the de-duplication object may comprise a pointer to the given unit of data. The storage device may be any device that allows information to be retrieved, manipulated, and stored by processor 110. Some examples of storage devices include, but are not limited to, disk drives, fixed or removable magnetic media drives (e.g., hard drives, floppy or zip-based drives), writable or read-only optical media drives (e.g., CD or DVD), tape drives, or solid-state mass storage devices. In a further example, integrity module 116 may instruct at least one processor to determine which de-duplication objects are most frequently referenced and to execute an integrity check of the de-duplication objects, such that the most frequently referenced de-duplication objects are given priority over other de-duplication objects. In a further example, integrity module 116 may generate a warning if the integrity check of a de-duplication object fails.
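By way of a non-limiting illustration, a prioritized integrity check of the kind attributed to integrity module 116 may be sketched as follows. The names `DedupObject` and `prioritized_integrity_check`, the per-object `reference_count` field, and the use of a stored SHA-256 digest to decide whether an object still refers to the correct unit of data are all assumptions made for this sketch; the disclosure does not prescribe a particular verification technique.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DedupObject:
    name: str
    pointer: str          # key of the unit of data this object refers to
    expected_digest: str  # digest recorded when the object was created (assumed)
    reference_count: int  # how often programs have referenced this object


def digest(unit: bytes) -> str:
    return hashlib.sha256(unit).hexdigest()


def prioritized_integrity_check(objects, units):
    """Check objects from most to least referenced; return (order, warnings)."""
    order, warnings = [], []
    for obj in sorted(objects, key=lambda o: o.reference_count, reverse=True):
        order.append(obj.name)
        unit = units.get(obj.pointer)
        # The check fails if the pointer is dangling or the data does not match.
        if unit is None or digest(unit) != obj.expected_digest:
            warnings.append(f"integrity check failed for {obj.name}")
    return order, warnings


units = {"u1": b"alpha", "u2": b"beta"}
objects = [
    DedupObject("obj-rare", "u1", digest(b"alpha"), reference_count=2),
    DedupObject("obj-hot", "u2", digest(b"beta"), reference_count=900),
    DedupObject("obj-stale", "u9", digest(b"gamma"), reference_count=40),
]
order, warnings = prioritized_integrity_check(objects, units)
```

In this sketch, `obj-hot` is checked first, and a warning is generated for `obj-stale`, whose pointer no longer resolves to a unit of data. Because the objects are visited in descending order of reference count, an interruption partway through the loop would leave only the less frequently referenced objects unverified.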
Working examples of the system, method, and non-transitory computer-readable medium are shown in
As shown in block 202 of
Referring back to
In another example, integrity module 116 may also check the integrity of the units of data themselves. In one example, a backup copy of each unit of data may be retained. If integrity module 116 determines that a unit of data is corrupt, integrity module 116 may modify each de-duplication object associated with the corrupt unit of data to point to the backup copy of that unit of data. Thus, integrity module 116 may check the integrity of both the de-duplication objects and their associated units of data.
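By way of a non-limiting illustration, the redirection of de-duplication objects to a backup copy may be sketched as follows. The function name `redirect_if_corrupt`, the dictionary layout, the `.bak` key naming, and the digest comparison used to detect a corrupt unit of data are assumptions made for this sketch only.

```python
import hashlib


def digest(unit: bytes) -> str:
    return hashlib.sha256(unit).hexdigest()


def redirect_if_corrupt(pointers, units, backups, expected):
    """Repoint objects whose unit of data fails its digest check.

    pointers: object name -> key of the unit of data it references
    units:    key -> stored bytes (primary and backup copies)
    backups:  primary key -> key of the backup copy
    expected: primary key -> digest recorded when the unit was stored
    """
    redirected = []
    for name, key in list(pointers.items()):
        if key in expected and digest(units[key]) != expected[key]:
            # The primary copy is corrupt: point the object at the backup copy.
            pointers[name] = backups[key]
            redirected.append(name)
    return redirected


units = {"u1": b"alphX", "u1.bak": b"alpha", "u2": b"beta", "u2.bak": b"beta"}
expected = {"u1": digest(b"alpha"), "u2": digest(b"beta")}
backups = {"u1": "u1.bak", "u2": "u2.bak"}
pointers = {"obj-a": "u1", "obj-b": "u1", "obj-c": "u2"}
redirected = redirect_if_corrupt(pointers, units, backups, expected)
```

In this sketch, unit `u1` no longer matches its recorded digest, so both objects that referenced it are repointed to `u1.bak`, while `obj-c` is left unchanged.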
Advantageously, the foregoing system, method, and non-transitory computer-readable medium may confirm the integrity of de-duplication objects in a prioritized manner and may also redirect the de-duplication objects if their associated units of data are corrupt. In this regard, rather than checking the de-duplication objects randomly or sequentially, the de-duplication objects may be verified in a more intelligent manner. In turn, users of programs that access the data via the de-duplication objects can rest assured that the most frequently referenced data has been verified first.
Although the disclosure herein has been described with reference to particular examples, it is to be understood that these examples are merely illustrative of the principles of the disclosure. It is therefore to be understood that numerous modifications may be made to the examples and that other arrangements may be devised without departing from the spirit and scope of the disclosure as defined by the appended claims. Furthermore, while particular processes are shown in a specific order in the appended drawings, such processes are not limited to any particular order unless such order is expressly set forth herein; rather, processes may be performed in a different order or concurrently and steps may be added or omitted.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2013/052590 | 7/29/2013 | WO | 00 |