One of the many benefits of computers and computing systems is their ability to process data and to make the data useful and readily available. People want immediate access to their email, for example, and email providers implement computing systems with sufficient processing power to handle email-related processing. Data is important in other contexts as well. Businesses rely on readily available data to manage products and inventories. Businesses use data, for example, to set prices, sell tickets, or manage schedules. If a business does not have access to its data, the business suffers.
The inability to access data in the short term is often annoying and inconvenient. The complete loss of data, however, can have serious consequences. As a result, it is advisable to back up data. Most businesses and enterprises today have an active backup application that protects their data.
There are different types of backup systems in use today. It has long been recognized that repeatedly performing a full backup of data can consume significant space—especially when the backups are retained over time. In an incremental backup system, for instance, the amount of data backed up is reduced because an incremental backup only backs up modifications or changes that have been made to the data since the last backup.
While this approach can minimize the amount of data that is backed up at a given time, incremental backups also have undesirable features. For example, identifying which data (e.g., which files) have changed since the last backup may require that all of the files be examined to analyze the modification time stamps. For larger systems, which may have millions of files, this can become a time consuming process and can degrade the computing performance.
More generally, conventional backup applications that support incremental backups trawl the entire file system to generate a list of modified files. This can consume significant resources as previously stated.
Instead of trawling the entire file system, some backup applications may take advantage of the file system's native change log. However, using the native change log can also result in degraded performance. This is partly related to the fact that conventional change logs are transactional in nature. Every change that occurs to a file is recorded in a conventional change log. For example, a change log may record that a particular file is created, changed a large number of times, and then deleted.
Because a transactional change log records transactions for all files in a temporal manner, the transactions associated with a particular file will be interspersed with changes to other files. During backup, all of these changes need to be processed even though the file is ultimately deleted. Consequently, the activity associated with performing a backup based on a transactional change log can also degrade the performance of the file system.
In order to describe the manner in which at least some of the advantages and features of the invention can be obtained, a more particular description of embodiments of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, embodiments of the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:
Embodiments of the invention relate to systems and methods and computer-readable media for backing up data or for performing backup operations. Embodiments of the invention further relate to incremental backups or to forever incremental backup systems. When performing an incremental backup, only changes that have occurred to the data since the last backup are included in the incremental backup. Embodiments of the invention can optimize the process of performing an incremental backup and/or of preparing for an incremental backup such that the need to trawl the file system is reduced or eliminated.
In one example, a layered file system (a layered file driver) is inserted into a computing environment. The layered file system may be inserted between a virtual file system or operating system and a physical file system. In some systems, a layered file system may be instantiated for each client or for each device or each file system or physical file system. The configuration may be dependent on how the computing system is organized and/or on how the physical file system is organized relative to the operating systems and user applications.
Placing the layered file system above the physical file system enables the layered file system to identify all changes made to any/all files in a file system. The layered file system is positioned to evaluate every transaction that occurs with respect to the physical file system. The layered file system can also identify changes to the file system itself, such as changes in the size of the file system.
The layered file system can evaluate the transactions and then selectively record some of the transactions or information related to some of the transactions in a change log. As a result, the layered file system does not typically record or keep all transactions that occur in the computing environment. In one embodiment, the layered file system only maintains a record of transactions that have an impact on a subsequent backup operation. The layered file system, however, can be configured to record other transactions. Advantageously, the layered file system begins to prepare the environment or system for the next backup operation and is able to reduce overhead associated with at least incremental backup operations. By selectively recording entries in a change log that relate to transactions with the files or data stored in the physical file system, embodiments enable a backup agent to rely on the change log when the next backup operation is performed. In some examples, more than one change log may be maintained. For example, when a backup operation is initiated, subsequent transactions may be recorded in a different change log to ensure that the next incremental backup is not mixed with the current incremental backup. The switch can be performed immediately or gradually. In either case, the change logs are configured and interpreted such that the incremental backups are consistent and minimized in size.
When a backup such as an incremental backup is then performed, the incremental backup can be based on the change log that has been prepared by the layered file system. This optimizes the incremental backup and reduces the impact of performing the backup on overall system performance.
Embodiments of the invention can be implemented in multiple hardware configurations and/or network configurations.
The backup agent 110 can coordinate with the backup server 104 to ensure that the data or files 130 being backed up are sent to the backup server 104 and included in the backups 102. In one example, the backup server 104 maintains or manages the backups 102 for one or more clients. In other words, the backups 102 can include backups for different data or different sets of files or different file systems. The backups 102 may include backups of a file system, backups of an email system or server, backups of a database, or the like. For ease of discussion, the various forms of data of a database, email server, or the accompanying storage or the like may be referred to generally as file systems.
The backups 102 can include a full backup of the files 130 (or other data or client) and one or more subsequent incremental backups of the files 130. When a restoration is required, the client can be restored using the appropriate backups selected from all of the backups 102. The files 130 can be restored at various points in time because multiple incremental backups are kept in one example.
In
The information identifying which files 130 or portions thereof have changed may be maintained in a change log 122. The layered file system 120 can trap or evaluate all transactions (e.g., all file system operations or all client operations on the files 130) and then selectively record these operations (or information identifying the files to be backed up) in the change log 122. As a result, the change log 122 effectively identifies the files that need to be backed up and the backup agent does not need to trawl the files 130 and does not need to deal with all of the transactions that have occurred to the files 130 since the last backup.
Instead, the change log 122 simply identifies that a certain file or portion thereof should be included in the next backup regardless of other transactions that may have occurred relative to that same file since the most recent backup. The layered file system 120 can selectively enter records in the change log 122 such that changes relevant to the next backup are included in the change log 122. Other changes that are not relevant to the next backup can be kept from the change log 122. The backup, which may be an incremental backup, is then created and stored in the backups 102.
Examples of transactions that are trapped or evaluated by the layered file system include, but are not limited to, writes, truncates, deletes, creates, appends, attribute changes, or the like or any combination thereof (generally referred to as modifications). In addition, the transactions trapped or evaluated by the layered file system may also include changes to the file system itself. For example, the transactions may include or reference changes like growth or shrinkage of the file system.
The transactions can be analyzed and then selectively recorded in the change log 122. The analysis of the transaction may involve a comparison between attributes of the files referenced in the transactions and attributes of the files already present in the file system. The analysis may also account for the file's status, such as whether the file was present in the file system at the time of the last backup. Based on the evaluation performed by the layered file system, the transaction may or may not be recorded in the change log 122. The process of performing an incremental backup is advantageously optimized by the selective content of the change log 122.
The change log may include records for each file in the file system. The records in the change log may be indexed, in one example, by an inode number of the corresponding files. A record in the change log 122 may include multiple fields or sections including, but not limited to, an old file name, a new file name, an event mask, a last backup file size, an inode number, or the like or any combination thereof.
When a transaction is evaluated, the transaction could be reflected in the corresponding record for the inode of the file involved in the transaction. An example of a record is as follows:
Old File Name: /mnt1/dir1/file1
New File Name: /mnt1/dir3/file4
Event Mask: DELETE | CREATE | SETATTR
Last Backup Size: 0
inode number: 0x34fd
This record contains the entire history (or portion thereof) of an inode in a single record. The history may be limited to the time since the most recent backup. Analyzing this record enables the backup agent to determine how the file should be handled during the backup operation. This record indicates that the inode was removed, which is reflected by the DELETE event mask. The old name was stored in the old file name field. The inode was later used for a new file and the new file name is stored in the new file name field. This transaction is reflected by the CREATE event mask in the event mask field. Later, the owner and permissions of the file were changed, which resulted in the SETATTR entry in the event mask field. The last backup size is zero because the file is new at least with respect to a most recent backup.
When performing the backup operation, this information enables the file corresponding to the record to be backed up appropriately without having to trawl the file system. Had the last entry in the event mask field been DELETE, then the backup agent would know that the file has been removed and that there is no need to search for the file. A conventional change log, however, would result in a disk access request for a file that does not exist because the transactions for the inode are not all grouped together.
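The record described above can be sketched as a small data structure. This is a minimal illustration, not the disclosed implementation: the class name `ChangeRecord`, the function `plan_for`, the returned labels, and the choice to keep events as an ordered list (so that the last event can be inspected) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChangeRecord:
    """One record per inode, covering the time since the most recent backup."""
    inode: int
    old_name: Optional[str] = None
    new_name: Optional[str] = None
    # Assumption: events are kept in order so the last one can be examined.
    events: List[str] = field(default_factory=list)
    last_backup_size: int = 0


def plan_for(record: ChangeRecord) -> str:
    """Decide how a backup agent might treat the inode (hypothetical policy)."""
    if record.events and record.events[-1] == "DELETE":
        # The file no longer exists, so no disk access is attempted.
        return "skip-read"
    if "CREATE" in record.events and record.last_backup_size == 0:
        # The file is new relative to the last backup.
        return "backup-whole-file"
    return "backup-changes"


# The example record from the text.
example = ChangeRecord(
    inode=0x34FD,
    old_name="/mnt1/dir1/file1",
    new_name="/mnt1/dir3/file4",
    events=["DELETE", "CREATE", "SETATTR"],
    last_backup_size=0,
)
print(plan_for(example))  # backup-whole-file
```

Because the whole history of the inode sits in one record, the decision needs a single lookup rather than a scan over interleaved transactions.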
In
In this example, a layered file system 210 is inserted between or operates between the operating system 214 and the physical file system 202. Placing the layered file system 210 at this level enables the layered file system 210 to trap or evaluate all of the transactions 218 that are sent to the physical file system as transactions 220. The layered file system 210 may simply pass through the transactions 218 unchanged and without detaining the transactions. Similarly, transactions 222 originating from the physical file system may also be trapped or analyzed by the layered file system 210 and are passed unaltered to the operating system as transactions 224. In one example, the layered file system 210 may be transparent to the operating system 214.
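The pass-through arrangement can be illustrated with a small wrapper. This is only a sketch in user-space Python, whereas an actual layered file driver would sit inside the operating system. The classes `InMemoryFS` and `LayeredFS` and the observer callback are hypothetical names introduced for the example.

```python
class InMemoryFS:
    """Stand-in for the physical file system (hypothetical, for illustration)."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = self.files.get(path, b"") + data
        return len(data)

    def read(self, path):
        return self.files[path]


class LayeredFS:
    """Forwards every call unchanged and reports a summary of it to an observer."""

    def __init__(self, physical, observer):
        self.physical = physical
        self.observer = observer

    def write(self, path, data):
        self.observer(("write", path, len(data)))  # evaluate, do not modify
        return self.physical.write(path, data)     # pass through unaltered

    def read(self, path):
        result = self.physical.read(path)
        self.observer(("read", path, len(result)))
        return result


seen = []
fs = LayeredFS(InMemoryFS(), seen.append)
fs.write("/mnt1/dir1/file1", b"hello")
data = fs.read("/mnt1/dir1/file1")
```

The caller sees the same results it would get from the underlying file system, while `seen` accumulates the observed operations that the layer could later evaluate.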
The layered file system 210, which is an example of the layered file system 120, evaluates or analyzes the transactions 218 and/or the transactions 222 to determine whether an entry needs to be made in the change log 212, which is an example of the change log 122. The change log 212 may be a representation of all of the transactions 218 and/or 222 that have occurred since the last backup and that need to be accounted for in the next backup. Once a backup is completed, the change log 212 may be cleared of existing entries in one example, or certain information may be retained in the change log 212 from one backup to the next backup.
In one example, when a backup operation is initiated, the layered file system 210 may switch to a different change log. Over time, the change logs can be purged as necessary and reused. By switching to a new change log, however, the files to be included in the next backup operation are identified in the new change log. Alternatively, the same change log can be used and the backup agent is configured to distinguish between entries that were part of previous backup operations and entries that should be processed for the next backup operation.
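A minimal sketch of the immediate switch between change logs follows. The class `ChangeLogSet` and its method names are assumptions for illustration; the gradual switch and the single-log alternative are not shown.

```python
class ChangeLogSet:
    """Keeps one active change log and, during a backup, one frozen log."""

    def __init__(self):
        self.active = {}    # receives records for the next backup
        self.frozen = None  # consumed by the backup currently running

    def record(self, inode, info):
        self.active[inode] = info

    def begin_backup(self):
        """Freeze the current log and start recording into a fresh one."""
        self.frozen, self.active = self.active, {}
        return self.frozen

    def end_backup(self):
        """Purge the frozen log once the backup has consumed it."""
        self.frozen = None


logs = ChangeLogSet()
logs.record(1, "CREATE")
snapshot = logs.begin_backup()   # the running backup works from this log
logs.record(2, "CREATE")         # lands in the new log for the next backup
```

After `begin_backup()`, transactions that arrive while the backup is running do not mix with the entries the backup is processing.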
The layered file system 210 is configured to make entries in the change log 212 so that it is not necessary for a backup agent to trawl the data or files 204 and so that the backup agent does not need to worry about all of the transactions that may have occurred since the last backup. The change log 212 may be an optimized representation of the transactions 218 or 222 that have occurred in a given time period or since the last backup. The change log 212 may include enough information to enable a backup agent to retrieve the files or portions of files that should be included in the next backup.
Over time, additional incremental backups are performed at some interval. This sequence of incremental backups is stored in the backups. The most recent incremental backup 304 is illustrated in
Because the layered file system 210 makes selective entries in the change log 212, the size of the change log 212 is smaller and can be handled more efficiently than a pure conventional transaction change log. As previously stated, performing an incremental backup based on a conventional transactional change log can significantly reduce system performance for various reasons. A change log that only includes certain entries may not reflect a transactional history, but the change log may be optimized for performing backup operations including incremental backup operations.
The layered file system 210 evaluates transactions related to files (or other data) in the physical file system. The analysis may rely on attributes 308 of files in a previous backup, on attributes of files as currently stored in the file system, or the like or any combination thereof. The following discussion describes some of the analysis that is performed by the layered file system when deciding whether to make an entry in the change log 212.
The analysis performed by the layered file system begins when a transaction is identified. An identified transaction can be trapped, copied, queued or the like. In one example, the transaction may simply be queued and the analysis is performed on the transaction without pausing the transaction itself. This allows the file system to operate normally while still allowing the layered file system to determine whether the transaction or other information should be entered into the change log. In one example, the transaction is copied temporarily in order to perform the analysis on the transaction.
For example, a transaction occurs when a file is created. When the file is created after the last backup, this transaction only needs to be recorded a single time in the change log 212. Subsequent writes to the newly created file do not need to be recorded in the change log because the file, however it exists at the time of the next backup, will be included in the next backup. Thus, a file that is created after the last backup should be recorded only once in the change log 212 in one example. The record may therefore include a CREATE entry in the event mask. Writes to such a newly created file (since the last backup) need not be recorded in the change log 212 since all contents of the file are new at least with respect to the last backup. A transaction other than a write, such as a DELETE or SETATTR, may be reflected in the record in the change log 212 for the file.
When the file is newly created, the layered file system 210 should create a record denoting the creation of the file but flush the record to the change log 212 in a delayed fashion after the last close on the file has been detected. In this example, the entries to the records in the change log may be delayed.
Transactions also occur when a file is written, truncated, or when a set-attribute operation is performed. If a file in the physical file system has experienced a write/truncate/set-attribute operation, the layered file system should first check to determine if the create time of the file is after the most recent backup time. If the create time is after the most recent backup time, then the write/truncate/set-attribute operation should not be recorded in the change log 212.
This is an example of selectively recording a transaction or of selectively making an entry in the records included in the change log 212. This also demonstrates that the change log may only identify, in one embodiment, changes that need to be accounted for in the next backup. When a file is newly created, the backup operation can be optimized by noting that the file needs to be backed up without worrying about changes that occur (e.g., writes, truncates) prior to the next backup operation. Operations that happen to a file that is created after the last backup do not need to be recorded in the change log 212 at least in the context of incremental backups. This information could be recorded in another log (e.g., a transactional log), however, if desired.
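The create-time check can be sketched as a small predicate. The function name `should_record`, its parameters, and the lowercase operation names are hypothetical. The sketch follows the rule stated for write, truncate, and set-attribute operations on files created since the last backup, and records a creation only once.

```python
WRITE_LIKE = {"write", "truncate", "setattr"}


def should_record(op, create_time, last_backup_time, already_logged):
    """Return True if the operation needs an entry in the change log."""
    new_since_backup = create_time > last_backup_time
    if op == "create":
        # A creation is recorded a single time.
        return not already_logged
    if op in WRITE_LIKE and new_since_backup:
        # The whole file is new to the next backup, so these add nothing.
        return False
    return True


print(should_record("create", 150, 100, already_logged=False))  # True
print(should_record("write", 150, 100, already_logged=True))    # False
print(should_record("write", 50, 100, already_logged=False))    # True
```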
When a transaction is detected or trapped that relates to the removal or deletion of a file from the file system, the layered file system should first check the creation time of the file being removed by the transaction. If the creation time is after the last backup time, the removal of the file and the creation record associated with the file do not need to be recorded in the change log 212. In other words, the transactions related to the creation and removal of the file can be skipped as long as the change log does not include an entry reflecting the creation of the file. If the creation transaction has been flushed to the change log 212, then the removal transaction should also be reflected in the change log 212 by removing the creation record in one example or by adding a DELETE entry in the event mask. Advantageously, the entries or records in the change log 212 are minimized and the ability to perform an incremental backup is enhanced. In either case, the record can be processed and enable the backup agent to know that the file has been removed and does not need to be backed up.
This discussion also illustrates that the entries in the change log 212 may be entered by the layered file system on a delayed basis. The layered file system, for example, may periodically flush entries to the change log or may flush entries on detecting certain events (e.g., flush a creation record to the change log 212 when a close transaction is detected for the newly created file). This enables the layered file system to wait a certain period of time or until a certain transaction has been detected before making the entry in the change log.
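The delayed flush and the handling of removals can be sketched together. The class `DelayedChangeLog` and its method names are assumptions. Of the two options described for a creation that has already been flushed, this sketch adds a DELETE entry rather than removing the creation record.

```python
class DelayedChangeLog:
    """Holds creation records back until the file is closed."""

    def __init__(self, last_backup_time):
        self.last_backup_time = last_backup_time
        self.pending = {}  # inode -> name; creations not yet flushed
        self.log = {}      # inode -> ordered events (the change log)

    def on_create(self, inode, name):
        self.pending[inode] = name

    def on_close(self, inode):
        # Flush the creation record when a close is detected.
        if self.pending.pop(inode, None) is not None:
            self.log[inode] = ["CREATE"]

    def on_delete(self, inode, create_time):
        if create_time > self.last_backup_time and inode in self.pending:
            # Created and removed since the last backup, never flushed:
            # neither transaction needs to reach the change log.
            del self.pending[inode]
            return
        self.log.setdefault(inode, []).append("DELETE")


clog = DelayedChangeLog(last_backup_time=100)
clog.on_create(1, "tmp")
clog.on_delete(1, create_time=150)   # dropped without touching the log
clog.on_create(2, "kept")
clog.on_close(2)
clog.on_delete(2, create_time=150)   # creation was flushed, so add DELETE
clog.on_delete(3, create_time=50)    # file existed at the last backup
```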
In another example, a transaction may occur where a file that was present during last backup is appended to. In this instance, the layered file system should record a size of the file (last_backup_file_size) during the last backup in the change log 212. No record for appending writes should be maintained in the change log 212 in one embodiment. When the backup is performed, the backup agent should copy all file data in the file past “last_backup_file_size”. This eliminates the need to create change records in the change log 212 for all appending writes made to the file. In this example, a portion of the file is included in the incremental backup since all file data up to the last_backup_file_size was included in the previous backup.
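As a rough illustration of how a backup agent might copy only the appended portion, the sketch below seeks to the recorded size and reads to the end of the file. The function name `read_appended` and the temporary file are assumptions for the example.

```python
import os
import tempfile


def read_appended(path, last_backup_file_size):
    """Return the bytes written past the size recorded at the last backup."""
    with open(path, "rb") as f:
        f.seek(last_backup_file_size)
        return f.read()


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "log.txt")
    with open(path, "wb") as f:
        f.write(b"backed-up")    # 9 bytes present at the last backup
    with open(path, "ab") as f:
        f.write(b"+appended")    # appended afterwards; no per-write records
    tail = read_appended(path, 9)

print(tail)  # b'+appended'
```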
In another example, a transaction in the file system may be related to a file truncation. When a file that was present during last backup is truncated, the layered file system should check to determine if the new size of the file is less than the size of the file at the time of the previous backup. If the size of the file is less, the new file size should be recorded in the change log 212. All writes past this new size (the truncated size) should now be treated as appending writes and handled as described above in the change log 212.
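The truncation rule can be combined with the append rule in a short sketch. The dictionary layout and the function names `on_truncate` and `range_to_copy` are hypothetical.

```python
def on_truncate(record, new_size):
    """Lower the recorded size if the file shrank below its last-backup size."""
    if new_size < record["last_backup_file_size"]:
        record["last_backup_file_size"] = new_size


def range_to_copy(record, current_size):
    """Return (offset, length) of the data to include in the next backup."""
    start = record["last_backup_file_size"]
    return (start, max(0, current_size - start))


record = {"inode": 7, "last_backup_file_size": 100}
on_truncate(record, 40)   # file shrinks below its backed-up size
on_truncate(record, 60)   # not below the recorded size, so nothing changes
print(range_to_copy(record, 70))  # (40, 30)
```

Everything written past offset 40 is then treated like an appending write, so no per-write records are needed.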
In another example, a transaction may involve changing or altering a time stamp or other attribute of a file. When a time stamp of the file is modified (e.g., using the setattr( ) or other system calls), the layered file system should synchronously record this change along with the inode number of the file and a “generation number” to the change log 212. Subsequently, if the file is removed or deleted, it is possible that the file removal record will be absent from the change log 212, especially if the setattr( ) call advanced the creation time of the file past the last backup time (this is due to the optimization previously discussed that files created and removed since the last backup are not recorded in the change log).
If such a file (wherein the time stamp has been modified) were to be written to, no write records would be present in the change log 212 since the creation time has been modified and is after the last backup time. To solve this type of issue, the backup agent may be configured to handle transactions associated with the setattr( ) call or other system calls or attribute changes differently. When reading a setattr( ) record or entry from the change log 212, the backup agent should check for the file's existence at the time of doing the backup. If the file exists, then the backup agent should compare the file's generation number with the generation number recorded with the setattr( ) change log record. If the generation numbers match, then the file will be backed up with all its data since it is not possible to detect whether the file was written to after the setattr( ) call was performed. If the file exists, but the generation number does not match, then the file is assumed to be a new file that was created (with the same name) after the original file was removed. The backup agent in this scenario may replace the previous version of the file with the new version. If, however, the file as recorded in the setattr( ) change log record does not exist, the backup agent may remove the file from the previous backup.
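The three outcomes for a setattr( ) record can be summarized in a small decision function. The function name and the returned labels are assumptions, and a missing file is represented here by `None`.

```python
def handle_setattr_record(recorded_generation, current_generation):
    """Decide what the backup agent does with a setattr change-log record.

    current_generation is None when the file no longer exists.
    """
    if current_generation is None:
        return "remove-from-previous-backup"
    if current_generation == recorded_generation:
        # Later writes cannot be detected, so all data is backed up.
        return "backup-whole-file"
    # Same name, different generation: a new file replaced the original.
    return "replace-with-new-file"


print(handle_setattr_record(5, 5))     # backup-whole-file
print(handle_setattr_record(5, 6))     # replace-with-new-file
print(handle_setattr_record(5, None))  # remove-from-previous-backup
```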
In a transaction that involves a write that is an in-file write (which is different from an appending write in one instance), the regions of the file that have been modified may be tracked. This enables the backup operation to back up the in-file portions of the file that were modified. Alternatively, the entire file is backed up when the write is an in-file write. An appropriate entry may be made in the record of the file in the change log for such a write that allows the backup agent to identify the appropriate regions to include in the backup or that causes the entire file to be backed up.
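One way to track modified regions is to keep a merged list of byte ranges per file. The text does not specify a representation, so the half-open `(start, end)` tuples and the function `add_region` below are assumptions.

```python
def add_region(regions, offset, length):
    """Add [offset, offset + length) and merge overlapping or adjacent ranges."""
    start, end = offset, offset + length
    merged = []
    for s, e in sorted(regions):
        if e < start or s > end:
            merged.append((s, e))          # disjoint: keep as is
        else:
            start, end = min(start, s), max(end, e)
    merged.append((start, end))
    return sorted(merged)


regions = []
regions = add_region(regions, 0, 10)     # [(0, 10)]
regions = add_region(regions, 5, 10)     # overlaps: [(0, 15)]
regions = add_region(regions, 100, 5)    # disjoint: [(0, 15), (100, 105)]
print(regions)  # [(0, 15), (100, 105)]
```

Merging keeps the record small even when the same part of a file is rewritten many times between backups.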
The backup operation may be initiated by the backup server, the backup agent, a user, or the like. In block 404 (identify data to be backed up), the data to be backed up is identified or accessed. In one example, the data is identified by accessing the change log in block 406 (access change log).
Accessing the change log may include using the records recorded in the change log to identify which files and/or which parts of files should be included in the backup. The records in the change log may refer to newly created files, appended portions of files, files whose attributes have been modified, or the like or any combination thereof. The backup may also remove files from previous backups where appropriate.
In block 408 (perform backup), the backup is performed. The backup agent can cooperate with the backup server to back up the files that correspond to the records in the change log. The backup agent may access, read, or otherwise process the records in the change log to identify the files and/or portions of files to be backed up.
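The way a backup agent might turn change-log records into backup actions can be sketched as follows. The record layout (dictionaries keyed by inode), the function `plan_backup`, and the action labels are all hypothetical. Transfer of data to the backup server is not shown.

```python
def plan_backup(change_log):
    """Translate change-log records into a list of backup actions."""
    plan = []
    for inode in sorted(change_log):
        rec = change_log[inode]
        events = rec.get("events", [])
        if events and events[-1] == "DELETE":
            # The file is gone: drop it from the backup, no disk read.
            plan.append((inode, "remove-from-backup"))
        elif "CREATE" in events or "last_backup_file_size" not in rec:
            plan.append((inode, "whole-file"))
        else:
            # Only data past the recorded size is new.
            plan.append((inode, "copy-from-offset", rec["last_backup_file_size"]))
    return plan


change_log = {
    1: {"events": ["CREATE"]},
    2: {"events": ["CREATE", "DELETE"]},
    3: {"events": [], "last_backup_file_size": 4096},
}
plan = plan_backup(change_log)
```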
The change log may store certain attributes about each file to be backed up. The change log, in addition to fields previously discussed, may also store creation date, size, name, and other attributes, or the like or any combination thereof.
Once a transaction is detected or trapped, the transaction is processed in block 504 (process transaction), for example by the layered file system. Processing the transaction can include one or more acts or steps that are configured to determine whether a record should be made, removed, or updated in the change log. For example, a transaction that involves the creation of a file may involve a comparison between the creation date of the file and the time of the most recent backup. In this case, the layered file system may make an entry or record in the change log that reflects the creation of the file and that causes the backup agent to back up the newly created file in the next backup.
In one example, the layered file system may wait to flush the record of the transaction to the change log until a transaction is detected that closes the file or for a certain amount of time. By waiting to flush records or entries to the change log, it may be possible to eliminate unnecessary records from the change log that relate to transactions that do not impact the files included in the next backup.
When processing the transaction, the decision to create or update a record or entry in the change log is made. In this context, entries in the change log are selectively made. Records of some transactions may be entered into the change log while records of other transactions are not entered in the change log. A newly created file may result in a record in the change log while a transaction that appends data to the file may not result in a record being added to the change log.
In block 506 (record transaction to change log if necessary), the transaction is selectively recorded in the change log. Stated another way, a record of the transaction or that reflects the transaction is entered into the change log. The transaction is recorded in the change log, in one example, only when the transaction results in data that should be included in or accounted for by the next backup. A new file, file amendments or size increases, file truncations, file deletions, and the like are transactions that should be accounted for in the next backup and that may have corresponding entries in the records in the change log.
In block 508 (perform incremental backup using change log), a backup is performed based on the change log. A backup agent may access or use the change log to ensure that the appropriate files or data are backed up in the backup being performed. Embodiments of the invention advantageously reduce the overhead associated with conventional systems that trawl the system to identify changes to include in the incremental backup. In contrast to conventional systems, the change log is generated during normal use of the file system in one example. In addition, inefficiencies associated with transactional change logs are reduced because entries in the change log of the present disclosure are entered in a selective manner.
Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are physical storage media. Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: computer storage media (devices) and transmission media.
Computer storage media (devices) includes RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (devices) (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer storage media (devices) at a computer system. Thus, it should be understood that computer storage media (devices) can be included in computer system components that also (or even primarily) utilize transmission media.
Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, and the like. The invention may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
As used herein, the term ‘module’ or ‘component’ can refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system, for example, as separate threads. While the system and methods described herein can be implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In the present disclosure, a ‘computing entity’ may be any computing system as previously defined herein, or any module or combination of modules running on a computing system.
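As a minimal sketch of modules executing as separate threads, the following example runs two routines concurrently and connects them with a queue. The module names (producer_module, consumer_module), the run_modules helper, and the uppercase transformation are hypothetical placeholders chosen for illustration and do not correspond to any particular component of the disclosure.

```python
import queue
import threading
from typing import List

_DONE = None  # sentinel marking the end of the work stream


def producer_module(out_q: queue.Queue, items: List[str]) -> None:
    """First module: emits work items, then signals completion."""
    for item in items:
        out_q.put(item)
    out_q.put(_DONE)


def consumer_module(in_q: queue.Queue, results: List[str]) -> None:
    """Second module: processes items until the sentinel arrives."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        results.append(item.upper())  # placeholder for real processing


def run_modules(items: List[str]) -> List[str]:
    """Run both modules as separate threads and return the processed items."""
    q: queue.Queue = queue.Queue()
    results: List[str] = []
    threads = [
        threading.Thread(target=producer_module, args=(q, items)),
        threading.Thread(target=consumer_module, args=(q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With one producer and one consumer sharing a first-in, first-out queue, the results preserve the order of the input items. The same routines could equally be run as separate processes or implemented in hardware.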
In at least some instances, a hardware processor is provided that is operable to carry out executable instructions for performing a method or process, such as the methods and processes disclosed herein. The hardware processor may or may not comprise an element of other hardware, such as the computing devices and systems disclosed herein.
In terms of computing environments, embodiments of the invention can be performed in client-server environments, whether network or local environments, or in any other suitable environment. Suitable operating environments for at least some embodiments of the invention include cloud computing environments where one or more of a client, server, or target virtual machine may reside and operate in a cloud environment.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.