1. Field of the Invention
The present invention relates to a method, system, and program for determining changes made to a source file to transmit to a target location providing a mirror copy of the source file.
2. Description of the Related Art
Backup programs back up data at a computer system to a backup storage device, which may comprise a local or remote storage device. In certain backup environments, backup agents are installed on both the source and target systems. To determine whether changes need to be applied to the target file, the source and target backup agents each calculate checksum codes for segments of the file and exchange those checksum codes. If the checksum codes for a file segment differ, the source data in that segment must be transferred to the target backup agent and applied to the target file. This type of system transfers only the changed data rather than the entire file, but it requires installing separate agents on the source and target systems. Further, calculating the checksums can be computationally expensive.
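The related-art checksum exchange can be sketched as follows. This is a minimal illustration rather than any particular product's protocol: the 4-byte segment size, the use of SHA-256 from Python's `hashlib`, and the function names are assumptions chosen for readability.

```python
import hashlib

# Hypothetical segment size; real agents would use much larger segments.
SEGMENT_SIZE = 4


def segment_checksums(data: bytes, size: int = SEGMENT_SIZE) -> list[str]:
    """Checksum each fixed-size segment, as each backup agent would."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def changed_segments(source: bytes, target: bytes,
                     size: int = SEGMENT_SIZE) -> dict[int, bytes]:
    """Compare exchanged checksums; return {segment index: source bytes} to send."""
    src_sums = segment_checksums(source, size)
    tgt_sums = segment_checksums(target, size)
    return {
        idx: source[idx * size:(idx + 1) * size]
        for idx, checksum in enumerate(src_sums)
        if idx >= len(tgt_sums) or tgt_sums[idx] != checksum
    }


def apply_segments(target: bytes, changed: dict[int, bytes], new_length: int,
                   size: int = SEGMENT_SIZE) -> bytes:
    """Target agent: resize to the source length, then overwrite changed segments."""
    buf = bytearray(target[:new_length].ljust(new_length, b"\0"))
    for idx, chunk in changed.items():
        buf[idx * size:idx * size + len(chunk)] = chunk
    return bytes(buf)
```

Both agents must hash every segment of their copy before any comparison is possible, which is the computational cost noted above.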
In other backup environments, there is only one backup program, which keeps track of changed source files and, during synchronization, copies the changed source files to the target site to replace the target files. This type of backup environment avoids the need for separate source and target backup agents and for checksum calculations on file sections, but requires that the entire modified source file be transferred even if only a small part of it changed.
Provided are a method, system, and program for determining changes made to a source file to transmit to a target location providing a mirror copy of the source file. An operation to modify a source file at a source location is detected, wherein a target file at a target location includes a copy of a version of the source file. A base copy of the source file is created, and the operation to modify the source file is executed after the base copy is created. Differences between the modified source file and the base copy are determined and transmitted to the target location, wherein an aggregation of the target file and the transmitted differences comprises the modified source file.
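A minimal in-memory sketch of the summarized steps follows. It assumes file contents are held as `bytes` and that the target file still matches the base copy, uses Python's `difflib.SequenceMatcher` as a stand-in difference algorithm (none is specified), and represents each difference as a hypothetical `(start, end, replacement)` triple measured against the base copy.

```python
import difflib
from collections.abc import Callable

# Hypothetical difference format: (start, end, replacement) against the base copy.
Diff = tuple[int, int, bytes]


def compute_differences(base: bytes, modified: bytes) -> list[Diff]:
    """Determine the differences between the modified source and its base copy."""
    matcher = difflib.SequenceMatcher(a=base, b=modified, autojunk=False)
    return [(i1, i2, modified[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_differences(target: bytes, diffs: list[Diff]) -> bytes:
    """Aggregate the target file with the transmitted differences."""
    result = bytearray(target)
    for start, end, replacement in reversed(diffs):  # back to front keeps offsets valid
        result[start:end] = replacement
    return bytes(result)


def modify_and_mirror(source: bytes, target: bytes,
                      operation: Callable[[bytes], bytes]) -> tuple[bytes, bytes]:
    base_copy = bytes(source)                        # create the base copy first
    modified = operation(source)                     # then execute the detected modification
    diffs = compute_differences(base_copy, modified)  # determine the differences
    new_target = apply_differences(target, diffs)    # "transmit" and apply at the target
    return modified, new_target
```

For example, `modify_and_mirror(b"status: draft", b"status: draft", lambda d: d.replace(b"draft", b"final"))` returns `b"status: final"` twice, once as the modified source and once as the rebuilt target. Applying the triples back to front is a design choice that keeps earlier offsets valid while later regions change length.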
The backup program 8 is controlled by backup settings 22, including default settings and settings configured by a user of the backup program 8. The backup program 8 may generate a user interface 26 rendered on a computer monitor 28 in which the user may enter backup settings 22 to control the backup operations of the backup program 8.
A backup cache 24 maintains a base copy 25, which is a copy of a source file made before modifications are applied to that source file. In one embodiment, the backup cache 24 further includes delta files 17a for modified source files, each indicating the changes between a source file and the base copy 25 of that source file. The source storage 14, target storage 20, and backup cache 24 may be implemented in separate storage devices or in the same storage device or system. The memory 6 may further include a file system 30 that implements a hierarchical file system 12 to maintain and store user and system data. A backup extension 32 may be integrated with the file system 30 code to intercept file system 30 operations that modify source files in the file system 12 and to determine whether a base copy 25 needs to be created.
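The base-copy decision made by the backup extension 32 might look like the following user-space sketch. The `BackupCache` class, its method names, and the hash-based naming of cached files are hypothetical; an actual extension would run inside the file system 30 code rather than as a Python object. The `release` method reflects the deletion of the base copy once its changes have been synchronized.

```python
import hashlib
import os
import shutil


class BackupCache:
    """Hypothetical model of backup cache 24: at most one base copy per source file."""

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def base_copy_path(self, source_path: str) -> str:
        # Name each base copy after a hash of the absolute source path.
        key = hashlib.sha256(os.path.abspath(source_path).encode()).hexdigest()
        return os.path.join(self.cache_dir, key + ".base")

    def before_modify(self, source_path: str) -> bool:
        """Intercept hook; returns True if a base copy had to be created."""
        base = self.base_copy_path(source_path)
        if os.path.exists(base):
            return False  # file already modified since the last synchronization
        shutil.copy2(source_path, base)  # assumes the source file already exists
        return True

    def release(self, source_path: str) -> None:
        """Drop the base copy after the changes have reached the target."""
        base = self.base_copy_path(source_path)
        if os.path.exists(base):
            os.remove(base)


def intercepted_append(cache: BackupCache, path: str, data: bytes) -> None:
    """Stand-in for a file system write routed through the backup extension."""
    cache.before_modify(path)  # preserve the pre-modification contents first
    with open(path, "ab") as f:
        f.write(data)
```

Because `before_modify` returns early when a base copy already exists, repeated writes between synchronizations keep the original pre-modification contents instead of overwriting them.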
In certain embodiments, the file system 30 and backup extension 32 may operate at a high priority, such as by executing in kernel mode of the operating system. The backup extension 32 may be installed into the file system 30 code when the backup program 8 is installed. The backup program 8 may operate in user mode (user space), as opposed to kernel mode, to perform backup-related operations, such as synchronization, that do not involve intercepting file system operations such as writes.
The storages 14 and 20 may be implemented in storage devices known in the art, such as a single hard disk drive, a plurality of interconnected hard disk drives configured as a Direct Access Storage Device (DASD), Redundant Array of Independent Disks (RAID), Just a Bunch of Disks (JBOD), etc., a tape device, an optical disk device, a non-volatile electronic memory device (e.g., Flash Disk), etc.
In one embodiment, the target file system 18 replicates the backed-up source directories and files 10, such that the target directories and files 16 are in the native file format of the corresponding source directories and files 10 backed-up. Thus, the target files 16 may be directly accessed by the applications that created the files.
In one embodiment, when the backup program 8 is to transmit the changes to the target storage 20, the backup program 8 creates a new delta file 17a for the source file and determines the differences, or changes, of the modified source file over the base copy 25. For each change made during the current modification, the delta file 17a indicates the location 86a . . . 86n of the deleted, updated, or added data in the source file and the type of change 88a . . . 88n, i.e., deletion, update, or addition. Once the delta file 17a is transmitted to the target storage 20, the delta file 17a and the base copy 25 from which it was created may be deleted to clean up the backup cache 24. The delta file 17a may be stored in the backup cache 24 as shown in
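One possible encoding of a delta file 17a is a list of records giving each change's location and type. The sketch below derives the records from `difflib.SequenceMatcher` opcodes, mapping `replace` to update, `insert` to addition, and `delete` to deletion, and writes them as JSON. The record fields, the choice to measure offsets against the base copy (so the records apply directly to a target that still matches it), and the JSON layout are all assumptions.

```python
import difflib
import json

# Map SequenceMatcher opcode tags onto the change types named in the text.
CHANGE_TYPES = {"replace": "update", "insert": "addition", "delete": "deletion"}


def build_delta(base: bytes, modified: bytes) -> list[dict]:
    """Describe each change of `modified` over `base` as a location plus a type."""
    matcher = difflib.SequenceMatcher(a=base, b=modified, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        delta.append({
            "type": CHANGE_TYPES[tag],
            "offset": i1,                   # start of the change in the base copy
            "length": i2 - i1,              # bytes replaced or removed (0 for an addition)
            "data": modified[j1:j2].hex(),  # new bytes, hex-encoded (empty for a deletion)
        })
    return delta


def write_delta_file(path: str, delta: list[dict]) -> None:
    """Store the delta records, e.g. in the backup cache, as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(delta, f, indent=2)
```

For example, `build_delta(b"abcdef", b"abXdef")` yields a single update record at offset 2 with length 1.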
In one embodiment, the backup program 8 may issue a file system write operation to apply the differences between the source file and the base copy 25 directly to the target file. In such embodiments, the delta file 17a or 17b is not created or maintained, because the determined differences are applied directly to the target file in the target storage 20. The write operation may be executed by the file system 30 if the target storage 20 is local to the computer 2. Alternatively, if the target storage 20 is managed by a machine remote from the computer 2, the backup program 8 communicates the write operation over a network to the target location using techniques known in the art for communicating file system operations over a network.
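For a local target, applying the differences with file system writes might look like the sketch below. It assumes hypothetical delta records given as dictionaries with `offset`, `length`, and hex-encoded `data` keys, measured against the current target file, and it rewrites only from the first changed offset onward. Forwarding the same writes to a remote target over a network is not shown.

```python
def apply_delta_to_file(target_path: str, delta: list[dict]) -> None:
    """Apply delta records directly to a local target file via file system writes."""
    with open(target_path, "r+b") as f:
        content = bytearray(f.read())
        # Apply from the highest offset down so earlier offsets remain valid.
        for rec in sorted(delta, key=lambda r: r["offset"], reverse=True):
            start = rec["offset"]
            content[start:start + rec["length"]] = bytes.fromhex(rec["data"])
        # Bytes before the first changed offset are unchanged, so rewrite only the tail.
        first = min((r["offset"] for r in delta), default=len(content))
        f.seek(first)
        f.write(content[first:])
        f.truncate()  # drop leftover bytes if the file shrank
```

Because no delta file is kept in this embodiment, the records exist only in memory for the duration of the write.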
After information on the changes, i.e., the deltas, is transmitted to the target storage 20, either to be stored in a delta file 17b or to be applied immediately to the target file, the base copy 25 and any existing delta files 17a created from that base copy may be deleted. After deletion, a new base copy 25 and delta file 17a are created in response to further modifications of the source file following synchronization with the target.
In the embodiment of
In certain embodiments, multiple delta files 17b may be maintained for one target file. The user of the backup program 8 may then obtain the state of the file as of the point in time at which one of the delta files 17b was created by combining that delta file 17b and all previously created delta files with the target file, applying the modifications in order from the earliest to the most recent.
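Reconstructing a point-in-time version from several delta files 17b can be sketched as applying them to the target file oldest first. The sketch assumes the deltas have not yet been applied to the target file, that each delta's offsets refer to the file as it stood after all earlier deltas, and uses a hypothetical `Change` record; none of these details are specified in the text.

```python
from typing import NamedTuple


class Change(NamedTuple):
    offset: int   # position in the file version this delta was computed against
    length: int   # bytes replaced at that position (0 for an addition)
    data: bytes   # replacement bytes (empty for a deletion)


def apply_delta(content: bytes, delta: list[Change]) -> bytes:
    """Apply one delta file's changes, highest offset first."""
    result = bytearray(content)
    for ch in sorted(delta, key=lambda c: c.offset, reverse=True):
        result[ch.offset:ch.offset + ch.length] = ch.data
    return bytes(result)


def restore_point_in_time(target: bytes, deltas: list[list[Change]], upto: int) -> bytes:
    """Rebuild the file as of delta `upto` (0-based) by applying deltas oldest first."""
    content = target
    for delta in deltas[:upto + 1]:  # earliest to most recent
        content = apply_delta(content, delta)
    return content
```

The order matters: a delta recorded after an insertion near the start of the file uses shifted offsets, so applying it first would modify the wrong bytes.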
With the described embodiments, if a source file changes multiple times, the updates can be determined by comparing the source file with the base copy, and the determined changes are transferred only at specific synchronization times, so that the backup program does not have to continually transfer data to the target storage in response to each change. Moreover, the synchronization may be scheduled during a period of minimal usage of the computer 2. The described embodiments may be useful if a user modifies files while lacking access to the target storage 20, as is the case when the target storage 20 is normally accessible over a network to which the user is not currently connected. In such a case, all changes can be determined from the source file information in the backup cache 24, and the synchronization can be performed when the user reconnects to the target storage 20. Further, with certain described embodiments, base copies are maintained only for source files that have been modified, which conserves space in the backup cache 24.
The described operations may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term “article of manufacture” as used herein refers to code or logic implemented in a medium, where such medium may comprise hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.) or a computer readable medium, such as magnetic storage medium (e.g., hard disk drives, floppy disks, tape, etc.), optical storage (CD-ROMs, optical disks, etc.), volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, firmware, programmable logic, etc.). Code in the computer readable medium is accessed and executed by a processor. The computer readable medium in which the code or logic is encoded may also comprise transmission signals propagating through space or a transmission media, such as an optical fiber, copper wire, etc. The transmission signal in which the code or logic is encoded may further comprise a wireless signal, satellite transmission, radio waves, infrared signals, Bluetooth, etc. The transmission signal in which the code or logic is encoded is capable of being transmitted by a transmitting station and received by a receiving station, where the code or logic encoded in the transmission signal may be decoded and stored in hardware or a computer readable medium at the receiving and transmitting stations or devices. Additionally, the “article of manufacture” may comprise a combination of hardware and software components in which the code is embodied, processed, and executed. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the present invention, and that the article of manufacture may comprise any information bearing medium known in the art.
The terms “an embodiment”, “embodiment”, “embodiments”, “the embodiment”, “the embodiments”, “one or more embodiments”, “some embodiments”, and “one embodiment” mean “one or more (but not all) embodiments of the present invention(s)” unless expressly specified otherwise.
The terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless expressly specified otherwise.
The enumerated listing of items does not imply that any or all of the items are mutually exclusive, unless expressly specified otherwise.
The terms “a”, “an” and “the” mean “one or more”, unless expressly specified otherwise.
Devices that are in communication with each other need not be in continuous communication with each other, unless expressly specified otherwise. In addition, devices that are in communication with each other may communicate directly or indirectly through one or more intermediaries.
A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the present invention.
Further, although process steps, method steps, algorithms or the like may be described in a sequential order, such processes, methods and algorithms may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
When a single device or article is described herein, it will be readily apparent that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be readily apparent that a single device/article may be used in place of the more than one device or article or a different number of devices/articles may be used instead of the shown number of devices or programs. The functionality and/or the features of a device may be alternatively embodied by one or more other devices which are not explicitly described as having such functionality/features. Thus, other embodiments of the present invention need not include the device itself.
In certain embodiments, the file sets and metadata are maintained in separate storage systems and commands to copy the file sets and metadata are transmitted by systems over a network. In an alternative embodiment, the file sets and metadata may be maintained in a same storage system and the command to copy may be initiated by a program in a system that also directly manages the storage devices including the file sets and metadata to copy.
The illustrated operations of
The foregoing description of various embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto. The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.