1. Field of Invention
The present invention relates generally to disaster recovery and remote data replication in storage area networks (SANs), and more particularly to a system and method thereof for maintaining data consistency over an iSCSI network.
2. Discussion of Prior Art
Almost all business processing systems are concerned with maintaining backup data in order to ensure continued data processing when data is lost, damaged, or otherwise unreachable. Furthermore, business processing systems require data recovery in the case of an unplanned interruption, also referred to as a “disaster”, of a primary storage site. Specifically, disaster recovery protection requires that at least a secondary copy of data is stored at a location remote to the primary site.
There are a myriad of prior-art disaster protection solutions. A known method of providing disaster protection is to back up data to a tape on a regular basis. The tape is then shipped to a secure storage area, usually located at a distance from the primary data center. A problem with this protection solution is the recovery time after a disaster: restoring the backup data could take up to a few days, during which the data center cannot operate.
An improved disaster recovery solution, also referred to as “remote mirroring”, is to back up data remotely and continuously, where the secondary site is geographically distant from the primary site. The two sites are typically connected to each other via a high-speed wide area network (WAN) link. When data writes are made to a local volume at the primary site, these writes are replicated on a remote volume at the secondary site via the WAN link. This solution utilizes one of two different data replication methods referred to as synchronous mirroring or asynchronous mirroring.
In synchronous mirroring, data writes are simultaneously issued to both local and remote volumes. Write commands are placed in a holding queue while the host waits for the remote write to be completed and acknowledged. This method introduces substantial latency into the production environment even when the mirrored volumes share a high-speed connection. In asynchronous mirroring, data writes are made to the local volume and the host is acknowledged when the local write is completed. The data writes are then transferred off-line to a remote site. This method reduces latency; however, it results in data gaps between the local and remote sites.
In storage area networks (SANs) data blocks are transferred between hosts and storage devices mainly by using the Fiber Channel (FC) or small computer system interface (SCSI) protocols. Traditionally, the connection to a remote SAN, for the purpose of disaster recovery, is formed through an FC link. This provides a native solution to back up data for distances of up to tens of kilometers between a local and a remote site. However, such a solution is expensive as it mandates a dedicated FC fiber-optic cable spread between the two sites. To eliminate the distance limitation, a few technologies and protocols have been introduced. One of these is the internet FC protocol (iFCP), which provides a mechanism for transferring FC SCSI commands over IP networks. Yet, the iFCP solution requires dedicated and very expensive hardware for bridging between FC ports and the IP network. In addition, such hardware can bridge only a single FC port to the network, resulting in a bandwidth bottleneck.
Another connectivity means used in SANs is the internet SCSI (iSCSI) protocol. The iSCSI protocol utilizes the IP networking infrastructure to quickly transport large amounts of data blocks over existing local or wide area networks. The iSCSI protocol does not require any dedicated hardware and does not have distance limitations. Therefore, there is a need for a system and method thereof that provide disaster recovery and remote data replication functionalities and that maintain data consistency between two SANs over an iSCSI network.
The following references provide a general teaching in the area of data coherency and data recovery, but they fail to provide for many of the limitations of the present invention.
The patent to Duyanovich et al. (U.S. Pat. No. 5,555,371) provides for data backup copying with delayed directory updating and reduced numbers of DASD accesses at a backup site using a log structured array data storage. Data storage in both primary and secondary data processing systems is provided by a log structured array (LSA) system that stores data in a compressed form. Each time data are updated within LSA, the updated data are stored in a data storage location different from the original data. Selected data recorded in a primary storage of the primary system is remote dual copied to the secondary system for congruent storage in a secondary storage device for disaster recovery purposes.
The patent to Kern et al. (U.S. Pat. No. 5,720,029) provides for a disaster recovery system for asynchronously shadowing record updates in a remote copy session using track arrays. A host processor at a primary site of the disaster recovery system transfers a sequentially consistent order of copies of record updates to a secondary site for backup purposes. The copied record updates are stored on the secondary data storage devices which form remote copy pairs with the primary data storage devices at the primary site.
The patent to Kern et al. (U.S. Pat. No. 5,734,818) provides for a remote data shadowing system forming consistency groups using self-describing record sets for remote data duplexing. Record updates at a primary site cause write I/O operations in a storage subsystem therein. The write I/O operations are time stamped and the time sequence and physical locations of the record updates are collected in a primary data mover.
The patent to Crockett et al. (U.S. Pat. No. 6,105,078) provides for an extended remote copying system for reporting both active and idle conditions wherein the idle condition indicates no updates to the system for a predetermined time period. A primary data mover monitors both consistency time and idle time in a system that performs continuous, asynchronous, extended remote copying between primary and remote processors, and manages both with accuracy and consistency. The primary data mover detects system activity levels and manages data accuracy for the extended remote copying in both active and idle systems.
The patent to LeCrone et al. (U.S. Pat. No. 6,543,001) provides for a method and apparatus for maintaining data coherency in a data processing network including local and remote data storage controllers interconnected by independent paths. The remote storage controller(s) normally act as a mirror for the local storage controller(s), and, if transfer over one of the independent communication paths is interrupted, transfers to predefined devices in a group are suspended, thereby assuring data consistency at the remote storage controller(s). When the cause of the interruption has been corrected, the local storage controllers are able to transfer data modified since the last suspension occurred to their corresponding remote storage controllers to reestablish synchronism and consistency for the entire dataset.
The patent to Milillo et al. (U.S. Pat. No. 6,643,671) provides for a system and method for synchronizing a data copy using an accumulation remote copy trio consistency group. Target volumes transmit to secondary volumes in series relative to each other so that consistency is maintained at all times across the source volumes.
The patent application publication to Kodama et al. (US 2004/0133718) provides for a direct access storage system with combined block interface and file interface access, wherein the system includes a storage controller and storage media for reading data from or writing data to storage media in response to block-level and file-level I/O requests.
Whatever the precise merits, features, and advantages of the above cited references, none of them achieves or fulfills the purposes of the present invention.
The present invention provides for a method for maintaining data consistency over an internet small computer system interface (iSCSI) network, for disaster recovery purposes, wherein the method comprises the steps of: (a) copying the entire content of a primary volume to a secondary volume; (b) receiving data writes from at least one host; (c) saving simultaneously the data writes in the primary volume and in a primary journal, wherein the data writes in the primary journal are ordered in point-in-time (PiT) frames; and (d) according to a predefined policy initiating a process for transferring at least one PiT frame from the primary journal to a secondary journal by inserting in the primary journal a PiT marker ending the PiT frame, iteratively obtaining data writes saved in the PiT frame, generating for each data write to be transferred a small computer system interface (SCSI) command, transferring the SCSI command to a secondary site using the iSCSI protocol, and saving the data write encapsulated in the SCSI command in a secondary journal.
The present invention also provides for a system for maintaining data consistency over an internet small computer system interface (iSCSI) network, for disaster recovery purposes, wherein the system comprises: (a) a network interface capable of communicating with a plurality of hosts through a network; (b) a data transfer arbiter (DTA) capable of handling the transfer of data writes between a plurality of storage devices and the plurality of hosts, wherein the DTA is further capable of controlling the process of maintaining data consistency; (c) a device manager (DM) capable of interfacing with the plurality of storage devices; and, (d) a journal transcriber capable of transferring data writes from a primary site to a secondary site.
The present invention also provides for a computer program product comprising a computer readable medium with instructions to enable a computer to implement a method maintaining data consistency over an internet small computer system interface (iSCSI) network, wherein the medium comprises: (a) computer readable program code working in conjunction with the computer to copy the entire content of a primary volume to a secondary volume; (b) computer readable program code working in conjunction with the computer to receive data writes from at least one host; (c) computer readable program code working in conjunction with the computer to save, simultaneously, the data writes in the primary volume and in a primary journal, wherein the data writes in the primary journal are ordered in point-in-time (PiT) frames; and (d) computer readable program code working in conjunction with the computer to initiate, according to a predefined policy, a process for transferring at least one PiT frame from the primary journal to a secondary journal by inserting in the primary journal a PiT marker ending the PiT frame, iteratively obtaining data writes saved in the PiT frame, generating for each data write to be transferred a small computer system interface (SCSI) command, transferring the SCSI command to a secondary site using the iSCSI protocol, and saving the data write encapsulated in the SCSI command in a secondary journal.
The present invention also provides for a computer program product comprising a computer readable medium with instructions to enable a computer to implement a method maintaining data consistency over an internet small computer system interface (iSCSI) network, wherein the medium comprises: (a) computer readable program code working in conjunction with the computer to insert a PiT marker beginning a PiT frame to be transferred; (b) computer readable program code working in conjunction with the computer to log data writes in a primary journal, wherein said data writes are ordered in the point-in-time (PiT) frame; (c) computer readable program code working in conjunction with the computer to insert a PiT marker indicating end of said PiT frame to be transferred; (d) computer readable program code working in conjunction with the computer to iteratively obtain data writes saved in said PiT frame; (e) computer readable program code working in conjunction with the computer to generate, for each data write to be transferred, a small computer system interface (SCSI) command; (f) computer readable program code working in conjunction with the computer to transfer said generated SCSI command to said secondary site using the iSCSI protocol; and (g) computer readable program code working in conjunction with the computer to save a data write encapsulated in the SCSI command in a secondary journal.
While this invention is illustrated and described in a preferred embodiment, the invention may be produced in many different configurations. There is depicted in the drawings, and will herein be described in detail, a preferred embodiment of the invention, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and the associated functional specifications for its construction and is not intended to limit the invention to the embodiment illustrated. Those skilled in the art will envision many other possible variations within the scope of the present invention.
Disclosed are a method and system for maintaining data consistency over an Internet small computer system interface (iSCSI) network for disaster recovery purposes. Data consistency is maintained between primary and secondary sites geographically distant from each other. The method disclosed logs all changes (data writes) made to a primary volume in a primary journal, transmits the changes according to a predefined policy, to a secondary journal, and thereafter merges the changes in the secondary journal with a secondary volume. Changes logged in the primary journal are ordered in point-in-time (PiT) frames and transmitted using a vendor specific SCSI command utilizing the iSCSI protocol.
Referring to
Storage devices 114 and 124 are physical storage elements including, but not limited to, tape drives, optical drives, disks, and redundant arrays of independent disks (RAID). A virtual volume can be defined on one or more physical storage devices 114 and 124. Each virtual volume, and hence each storage device, is addressable by a logical unit (LU) identifier which usually comprises a target and a logical unit number (LUN). For the purpose of demonstrating the operation of the present invention a primary volume 118 comprising storage devices 114-1 and 114-2 is defined in SAN 110 and exposed to host 111, while a secondary volume 128 comprising storage device 124-1 is defined in SAN 120. The primary and secondary volumes are configured as a disaster recovery (DR) pair. A DR pair is a pair of volumes, one exposed on the primary site and the other exposed on the secondary site, where the latter volume is configured to be an asynchronous mirror volume of the former volume. It should be noted that a primary volume in the DR pair may be part of a consistency group. A consistency group is a group of volumes that maintain their consistency as a whole. All operations on volumes across a consistency group must be finished before any further action that may compromise the group consistency is performed.
The present invention discloses a point-in-time (PiT) based asynchronous mirroring technique for performing data replication for disaster recovery purposes. This technique provides a consistent recoverable volume at specific points in time. In accordance with the disclosed technique, primary volume 118 contains the updated data while secondary volume 128 contains a consistent copy of primary volume 118 at a specific point in time. Namely, the primary and secondary volumes have an intrinsic data gap.
To utilize the PiT based asynchronous mirroring technique a journal volume 119 (a primary journal) is linked to the primary volume 118 and another journal volume 129 (a secondary journal) is linked to the secondary volume 128. A journal may be considered as a first-in first-out (FIFO) queue where the first inserted record is the first to be removed from the journal. Journaling is used intensively in database systems and in file systems. In such systems the journal logs any transactions or file system operations. The present invention utilizes the journal volumes to log data writes (changes) in storage devices. Specifically, journal volume 119 records data writes made to primary volume 118 and journal volume 129 maintains a copy of these writes that is up-to-date to a certain point in time. The data writes in the journal volumes are ordered in PiT frames. Each PiT frame includes a series of sequential writes performed between two consecutive PiTs. The boundaries of a PiT frame are determined by a PiT marker that acts as a separator and is inserted by VS 112 each time a PiT synchronization procedure is called. This procedure is discussed in greater detail below. In an embodiment of this invention each of the journal volumes utilizes storage devices, e.g., disks. However, it should be noted that each of journal volumes 119 or 129 may be implemented using one or more non-volatile random access memory (NVRAM) units that may be connected to an uninterruptible power supply (not shown).
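By way of illustration only, the journal described above may be sketched as a FIFO queue in which PiT markers separate consecutive PiT frames. The names `Journal`, `DataWrite`, `PiTMarker`, `log_write`, `insert_marker`, and `pop_frame` are hypothetical and are not taken from the disclosure; an in-memory queue stands in for a journal volume.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DataWrite:
    lba: int     # logical block address in the primary volume
    data: bytes  # contents of the written block


class PiTMarker:
    """Separator record that closes a PiT frame (hypothetical representation)."""


class Journal:
    """In-memory stand-in for a journal volume, kept in FIFO order."""

    def __init__(self):
        self._records = deque()

    def log_write(self, lba, data):
        self._records.append(DataWrite(lba, data))

    def insert_marker(self):
        self._records.append(PiTMarker())

    def pop_frame(self):
        """Remove and return the oldest closed PiT frame, or None if none is closed yet."""
        if not any(isinstance(r, PiTMarker) for r in self._records):
            return None
        frame = []
        while True:
            record = self._records.popleft()
            if isinstance(record, PiTMarker):
                return frame
            frame.append(record)


journal = Journal()
journal.log_write(0, b"a")
journal.log_write(8, b"b")
journal.insert_marker()       # closes the first PiT frame
journal.log_write(0, b"c")    # belongs to the next frame, which is still open
first = journal.pop_frame()   # the two writes logged before the marker
second = journal.pop_frame()  # None: the open frame has no closing marker
```

In this sketch a frame becomes removable only once its closing marker is present, which mirrors the role of the marker as a frame boundary.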
To ensure a proper recovery in a case of a disaster there is also a need to maintain the state of the primary site. For that purpose, VS 112 exchanges control information with VS 122 using a vendor specific SCSI command utilizing the iSCSI protocol.
In
The process for maintaining data consistency begins with a replication of the entire content of primary volume 118 to secondary volume 128. This procedure is referred to as the “initial synchronization” and is further discussed below. Once those two volumes are synchronized, all data writes (i.e., changes from the initial state) are recorded in journal volume 119. According to a predefined policy, a PiT marker is inserted into journal volume 119 and the PiT frame including all data writes between the last and previous PiT markers is transmitted to journal volume 129. PiT frame entries are sent to the secondary site utilizing vendor-specific SCSI commands using the iSCSI protocol as a transport protocol over the IP network 140. In the secondary site the replicated PiT frame in journal volume 129 is merged with secondary volume 128 according to a predefined policy.
The predefined policy determines when to synchronize PiT frames with the secondary site and when to merge the PiT frames into the secondary volume. Specifically, the policies define the actions needed to be performed, the schedule of the actions, and the consistency group the actions should be performed on. A policy may be, but is not limited to, completion of the transmission of a PiT frame, a user command, a predefined number of PiT frames in journal 129, a predefined elapsed time from the last merge action, a predefined time interval, a predefined number of data writes in a PiT frame, a predefined number of PiT frames, a predefined amount of changes (e.g., MB, KB, etc.), a specific hour at which changes are replicated, and so on.
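The transfer of a PiT frame described above may be sketched as follows, for illustration only. The dictionary built by `build_command` is a hypothetical stand-in for the vendor-specific SCSI command, whose actual layout is not given here, and the `receive` function stands in for the iSCSI transport and the secondary site. Forwarding the marker itself to the secondary journal is an assumption of this sketch.

```python
PIT_MARKER = "PIT"  # hypothetical in-journal representation of a PiT marker


def close_and_extract_frame(primary_journal):
    """Insert a marker ending the current frame and return that frame's writes."""
    primary_journal.append(PIT_MARKER)
    end = len(primary_journal) - 1
    start = end
    # Walk back to the previous marker (or to the start of the journal).
    while start > 0 and primary_journal[start - 1] != PIT_MARKER:
        start -= 1
    return primary_journal[start:end]


def build_command(lba, block):
    # Placeholder for a vendor-specific SCSI command (assumed fields).
    return {"op": "JOURNAL_WRITE", "lba": lba, "data": block}


def transfer_frame(frame, send):
    """Send one command per data write, then signal the end of the frame."""
    for lba, block in frame:
        send(build_command(lba, block))
    send({"op": "PIT_MARKER"})  # assumption: the frame boundary is also forwarded


secondary_journal = []


def receive(command):
    """Secondary-site side: save each received write in the secondary journal."""
    if command["op"] == "PIT_MARKER":
        secondary_journal.append(PIT_MARKER)
    else:
        secondary_journal.append((command["lba"], command["data"]))


primary_journal = [(0, b"a"), (8, b"b")]
frame = close_and_extract_frame(primary_journal)
transfer_frame(frame, receive)
```

After the transfer in this sketch, the secondary journal holds the same writes in the same order, followed by a marker closing the frame.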
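As a minimal sketch of how such a policy might be evaluated, the following combines three of the triggers listed above: a number of data writes in a PiT frame, an amount of changes, and an elapsed time. The class `SyncPolicy`, its fields, and the rule that any single satisfied threshold triggers synchronization are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncPolicy:
    """Hypothetical policy; a threshold left as None is not checked."""
    max_writes: Optional[int] = None          # data writes in the open PiT frame
    max_bytes: Optional[int] = None           # amount of changes in the open frame
    max_interval_s: Optional[float] = None    # seconds since the last synchronization

    def should_sync(self, writes_in_frame, bytes_in_frame, seconds_since_last_sync):
        checks = [
            (self.max_writes, writes_in_frame),
            (self.max_bytes, bytes_in_frame),
            (self.max_interval_s, seconds_since_last_sync),
        ]
        # Assumption: reaching any one configured threshold is sufficient.
        return any(limit is not None and value >= limit for limit, value in checks)


policy = SyncPolicy(max_writes=100, max_interval_s=60.0)
by_count = policy.should_sync(100, 0, 0.0)     # write-count threshold reached
not_yet = policy.should_sync(10, 4096, 5.0)    # no configured threshold reached
by_time = policy.should_sync(1, 0, 61.0)       # interval threshold reached
```

Triggers such as a user command or a specific hour of the day would be handled outside this threshold check.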
In case of a disaster in the primary site, the data that resides at the secondary journal includes all the entries needed to maintain a consistent and recoverable volume state for a specific point in time. That point in time corresponds to the last PiT frame that was successfully merged or fully written to the secondary journal 129. If journal volume 129 includes PiT frames that have not been merged yet, the user may run a merging procedure to apply those PiT frames to secondary volume 128. To enable host 122 to access the latest consistent data, secondary volume 128 has to be exposed on host 122.
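The merging procedure referred to above may be sketched as follows, assuming that only PiT frames closed by a marker are applied and that a trailing, partially received frame is left unapplied so that the secondary volume stays at a consistent point in time. The function name `merge_complete_frames` and the representation of the journal as a list and of the volume as a dictionary keyed by block address are hypothetical.

```python
PIT_MARKER = "PIT"  # hypothetical in-journal representation of a PiT marker


def merge_complete_frames(secondary_journal, secondary_volume):
    """Apply every closed PiT frame, in order, to the secondary volume.

    Returns the number of frames merged and the writes of any trailing
    frame that was not closed by a marker (these are not applied).
    """
    pending = []
    merged_frames = 0
    for record in secondary_journal:
        if record == PIT_MARKER:
            for lba, block in pending:
                secondary_volume[lba] = block
            pending = []
            merged_frames += 1
        else:
            pending.append(record)
    return merged_frames, pending


journal_129 = [
    (0, b"a"), (1, b"b"), PIT_MARKER,  # first complete frame
    (0, b"c"), PIT_MARKER,             # second complete frame
    (1, b"z"),                         # partial frame, interrupted by the disaster
]
volume_128 = {}
merged, leftover = merge_complete_frames(journal_129, volume_128)
```

In this example the volume reflects the state at the second PiT, and the write from the interrupted frame is reported back rather than applied.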
Referring to
Referring to
At step S440, data writes made by a client application that resides in the primary host (e.g., host 111) are received and thereafter, at step S450, written to the synchronous mirror volume. Namely, these writes are simultaneously written both to the primary volume and to the journal volume. Generally, the data writes saved in the journal volume include a data block and a logical block address (LBA) indicating the block location in the primary volume, e.g., an offset in the primary volume address space. At step S460, a check is made to determine whether the PiT synchronization procedure should be executed. As mentioned above, the execution of the PiT synchronization procedure is triggered by DR manager 320 according to predefined policies. If step S460 results in an affirmative answer, execution continues with step S470 where the PiT synchronization procedure is performed; otherwise execution returns to step S440.
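Steps S440 through S470 may be sketched as a simple loop, for illustration only. The function and variable names are hypothetical, the two writes of step S450 are issued one after the other rather than truly simultaneously, and the policy used in the example (synchronize after two writes in the open frame) is an arbitrary assumption.

```python
PIT_MARKER = object()  # hypothetical sentinel standing in for a PiT marker


def handle_writes(writes, volume, journal, should_sync, pit_sync):
    for lba, block in writes:         # S440: receive a data write from the host
        volume[lba] = block           # S450: write to the primary volume
        journal.append((lba, block))  # S450: log the block and its LBA in the journal
        if should_sync(journal):      # S460: check the predefined policy
            pit_sync(journal)         # S470: run the PiT synchronization procedure


def count_policy(journal):
    """Example policy: trigger once the open frame holds two writes."""
    open_writes = 0
    for record in reversed(journal):
        if record is PIT_MARKER:
            break
        open_writes += 1
    return open_writes >= 2


sync_points = []


def pit_sync(journal):
    """Stand-in for the PiT synchronization procedure: close the frame."""
    journal.append(PIT_MARKER)
    sync_points.append(len(journal))


volume, journal = {}, []
handle_writes([(0, b"a"), (1, b"b"), (0, b"c")], volume, journal, count_policy, pit_sync)
```

After the three writes in this example, the volume holds the latest contents of each block, while the journal holds two writes, a marker, and one write in a new open frame.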
Referring now to
Referring back to
Referring to
Additionally, the present invention provides for an article of manufacture comprising computer readable program code contained within implementing one or more modules implementing a method to maintain data consistency over an internet small computer system interface (iSCSI) network. Furthermore, the present invention includes a computer program code-based product, which is a storage medium having program code stored therein which can be used to instruct a computer to perform any of the methods associated with the present invention. The computer storage medium includes any of, but is not limited to, the following: CD-ROM, DVD, magnetic tape, optical disc, hard drive, floppy disk, ferroelectric memory, flash memory, ferromagnetic memory, optical storage, charge coupled devices, magnetic or optical cards, smart cards, EEPROM, EPROM, RAM, ROM, DRAM, SRAM, SDRAM, or any other appropriate static or dynamic memory or data storage devices.
Implemented in computer program code based products are software modules for: (a) copying the entire content of a primary volume to a secondary volume; (b) receiving data writes from at least one host; (c) saving simultaneously the data writes in the primary volume and in a primary journal, wherein the data writes in the primary journal are ordered in point-in-time (PiT) frames; and (d) initiating, according to a predefined policy, a process for transferring at least one PiT frame from the primary journal to a secondary journal by inserting in the primary journal a PiT marker ending the PiT frame, iteratively obtaining data writes saved in the PiT frame, generating for each data write to be transferred a small computer system interface (SCSI) command, transferring the SCSI command to a secondary site using the iSCSI protocol, and saving the data write encapsulated in the SCSI command in a secondary journal.
Also implemented in computer program code based products are software modules for: (a) inserting a PiT marker beginning a PiT frame to be transferred; (b) logging data writes in a primary journal, wherein said data writes are ordered in the point-in-time (PiT) frame; (c) inserting a PiT marker indicating end of said PiT frame to be transferred; (d) iteratively obtaining data writes saved in said PiT frame; (e) generating, for each data write to be transferred, a small computer system interface (SCSI) command; (f) transferring said generated SCSI command to said secondary site using the iSCSI protocol; and (g) saving a data write encapsulated in the SCSI command in a secondary journal.
A system and method have been shown in the above embodiments for the effective implementation of a method and system for maintaining data consistency over an internet small computer system interface (iSCSI) network. While various preferred embodiments have been shown and described, it will be understood that there is no intent to limit the invention by such disclosure, but rather, it is intended to cover all modifications falling within the spirit and scope of the invention, as defined in the appended claims. For example, the present invention should not be limited by software/program, computing environment, or specific computing hardware.
The above enhancements are implemented in various computing environments. For example, the present invention may be implemented on a conventional IBM PC or equivalent, multi-nodal system (e.g., LAN) or networking system (e.g., Internet, WWW, wireless web). All programming and data related thereto are stored in computer memory, static or dynamic, and may be retrieved by the user in any of: conventional computer storage, display (i.e., CRT) and/or hardcopy (i.e., printed) formats. The programming of the present invention may be implemented by one of skill in the art of disaster recovery and remote data replication in storage area networks (SANs).