The present invention relates generally to remote data replication, and more particularly to a remote replication solution that improves application performance.
Many applications maintain persistent state information by storing data on disk. Often the data stored on disk is designed to allow an application to return to the same state after an unexpected restart. The ability of an application to reliably return to a previous state often depends on data being stored to disk in a specific order. In order to protect against data loss and business interruption due to disasters, application data is often replicated to a geographically remote site. Ideally the remote location is far enough away from the primary data center to ensure that a single disaster will not be able to destroy both data centers. In the event of a disaster, the remote copy of data can be used to either reconstruct a new primary data center or restart the affected applications at the remote location itself. In order for an application to be restarted at the remote site and return to its pre-failure state, data must be copied to the remote site in the appropriate order.
More particularly, to ensure that they can return to the same state, applications strictly control the order in which state information is written to disk. Typically, I/O requests to store new state information to disk are not issued until I/O operations to store previous state information have completed. Such write operations are said to be dependent on the previous write requests. Applications rely on this explicit control of dependent write ordering to ensure that there will be no gaps or misordering of the state information stored on disk. In order to guarantee that this strict write ordering occurs, disk storage systems must store write data to disk in the order that it is received. Furthermore, where remote copies of data are maintained (“remote replication”), the same write ordering restrictions exist. Some advanced storage systems are capable of performing remote replication automatically in a manner transparent to applications. Such solutions relieve critical applications from the burden of managing the remote data copy and allow them to focus on performing their particular business function.
At present, there are two primary methods to reliably maintain a remote copy suitable for application restart: synchronous and semi-synchronous remote replication. In accordance with the synchronous remote replication method, each write received is simultaneously applied to both the local disks and the remote disks. In order to ensure correct ordering of dependent writes, storage systems typically only allow one write to occur at a time and do not complete a write operation until the remote copy has been updated. Since write requests are not completed until the remote copy has been updated, the average latency of each write operation is increased to the time required to update the remote copy. That amount of time depends on, amongst other things, the geographic distance between the source of the request and the remote system, as well as the speed of the link between the two. Generally, the greater the distance, the longer the latency. This increased latency, combined with the serial restriction needed to ensure the correct ordering of dependent writes, can have a significant impact on application performance. As a result, it is difficult to construct geographically diverse disaster recovery solutions using a synchronous replication solution while maintaining acceptable application performance.
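The effect of distance on a strictly serialized synchronous scheme can be illustrated with a rough calculation. The sketch below is illustrative only: it assumes a propagation speed of about 200 km per millisecond in optical fiber, ignores link bandwidth and protocol overhead, and uses hypothetical names (`round_trip_ms`, `max_serial_writes_per_second`) that do not come from the description above.

```python
# Assumption: light travels roughly 200 km per millisecond in optical fiber.
FIBER_KM_PER_MS = 200.0


def round_trip_ms(distance_km: float) -> float:
    """Propagation delay to the remote site and back, in milliseconds."""
    return 2.0 * distance_km / FIBER_KM_PER_MS


def max_serial_writes_per_second(distance_km: float, local_ms: float) -> float:
    """Upper bound on write rate when only one write may be in flight at a time.

    Each write costs its local service time plus one round trip to the
    remote copy, because it is not completed until the remote is updated.
    """
    per_write_ms = local_ms + round_trip_ms(distance_km)
    return 1000.0 / per_write_ms


if __name__ == "__main__":
    for km in (10, 100, 1000):
        rate = max_serial_writes_per_second(km, local_ms=1.0)
        print(f"{km:>5} km: at most {rate:.1f} serialized writes per second")
```

Under these assumptions, a separation of 1000 km yields a 10 ms round trip, so strictly serialized writes could not exceed 100 per second even if the local service time were zero.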
In accordance with the semi-synchronous remote replication method, write operations are allowed to complete locally before the remote copy has been updated. Doing so decouples the application from the latency of updating the remote copy and thereby attempts to avoid the associated performance penalties. However, in order to ensure that the remote copy remains consistent, the writes must still be applied to the remote copy in the order that they were received. Typically, storage systems accomplish this by storing writes that need to be applied to the remote copy in a queue. Sometimes, to control how out of date the remote copy gets, a maximum length for this queue is defined that, when reached, causes the replication to fall back to synchronous behavior. When this happens, application performance is negatively impacted just as it would be with a purely synchronous solution.
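The queueing behavior described above can be sketched as follows. This is a minimal model under stated assumptions, not any particular storage system's implementation: the class name `SemiSyncQueue`, its methods, and the choice to model the synchronous fallback as waiting for the oldest queued write to reach the remote copy are all hypothetical.

```python
from collections import deque


class SemiSyncQueue:
    """Toy model of a semi-synchronous replication queue (hypothetical API)."""

    def __init__(self, max_len: int) -> None:
        self.max_len = max_len
        self.pending = deque()  # writes completed locally, not yet applied remotely
        self.remote = []        # writes applied to the remote copy, in order

    def drain_one(self) -> None:
        """Apply the oldest pending write to the remote copy (FIFO order)."""
        if self.pending:
            self.remote.append(self.pending.popleft())

    def write(self, data: str) -> str:
        """Accept a write; report whether it completed without waiting."""
        if len(self.pending) >= self.max_len:
            # Assumed fallback: the caller must wait for a remote update,
            # which is where the synchronous latency penalty reappears.
            self.drain_one()
            mode = "sync"
        else:
            mode = "async"
        self.pending.append(data)
        return mode


if __name__ == "__main__":
    q = SemiSyncQueue(max_len=2)
    print([q.write(x) for x in ("a", "b", "c")])
```

In this model the first two writes complete locally without waiting, while the third finds the queue full and must wait for the remote copy to catch up before it is accepted.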
While semi-synchronous solutions offer better performance than synchronous ones, they can still result in a stricter than necessary ordering of writes. In general, not every write issued by an application is a dependent one. Therefore there are some writes that could be allowed to complete in parallel. In practice, it is difficult for storage systems to distinguish between dependent and non-dependent writes. Therefore, semi-synchronous solutions must default to ordering all writes in order to maintain correctness. The resulting overly strict serialization of writes, however, may cause the ordering queue to reach its maximum length quickly, leading to the application performance degradation described above.
Both the synchronous and semi-synchronous solutions negatively impact application performance due to their strict serialization of writes. There is a need for an improved remote replication solution to allow better application performance while guaranteeing that the remote copy of application data remains consistent with the original, to ensure that the remote site can be used for application restart and failover in the event of a disaster.
In accordance with the invention, a remote replication solution is provided that significantly improves application performance. The remote replication method receives a stream of data including independent streams of dependent writes. The method is able to discern dependent from independent writes. The method causes writes from independent streams to be stored on a storage device in parallel until a dependent write in a stream needs to be stored on the storage device. The method discerns dependent from independent writes by assigning a sequence number to each write, the sequence number indicating a time interval in which the write began. It then assigns a horizon number to each write request, the horizon number indicating a time interval in which the first write that started at a particular sequence number ends. A write is caused to be stored on a disk drive if the sequence number associated with the write is less than the horizon number. If the sequence number associated with the write is greater than the horizon number, the method waits until all outstanding writes complete before issuing another write to disk.
A similar computer program and apparatus are also provided. In this manner the invention distinguishes dependent from independent writes and is able to parallelize some writes, thus resulting in application performance improvement.
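The comparison of sequence numbers against horizon numbers summarized above can be sketched as a small planning routine. This is a hedged illustration rather than the claimed method itself. It assumes that each queued write already carries its sequence number and horizon number, that the comparison is made against the smallest horizon number among the writes currently outstanding, and that the equal case (which the summary does not specify) is treated conservatively as a barrier. The names `Write` and `plan_batches` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Write:
    name: str
    start_seq: int    # interval in which the write began
    horizon_seq: int  # interval in which the first write with this start_seq ended


def plan_batches(queue: List[Write]) -> List[List[str]]:
    """Group queued writes into batches that may be issued in parallel.

    A write joins the outstanding batch only if its sequence number is
    less than the batch horizon; otherwise all outstanding writes must
    complete first, which is modeled here by closing the batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    horizon: Optional[int] = None
    for w in queue:
        if current and horizon is not None and w.start_seq < horizon:
            current.append(w.name)
            horizon = min(horizon, w.horizon_seq)  # assumed: tightest horizon wins
        else:
            if current:
                batches.append(current)  # wait for outstanding writes to complete
            current = [w.name]
            horizon = w.horizon_seq
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    queue = [
        Write("A1", 1, 2), Write("B1", 1, 2),  # began in the same interval
        Write("A2", 2, 3), Write("B2", 2, 3),  # began after A1 or B1 ended
    ]
    print(plan_batches(queue))
```

With the illustrative queue above, A1 and B1 began in the same interval and so cannot depend on one another; they are issued together. A2 began at the horizon of that batch, so it is held until the batch completes, after which A2 and B2 are issued together.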
In order to facilitate a fuller understanding of the present invention, reference is now made to the appended drawings. These drawings should not be construed as limiting the present invention, but are intended to be exemplary only.
In accordance with the principles of the invention, there is provided a method for remote replication that allows a storage system to distinguish between dependent and independent writes in real-time. The method of the invention could be used to enhance a standard semi-synchronous remote replication solution to allow independent writes to be performed in parallel, thereby avoiding the overly strict serialization that would otherwise occur while at the same time maintaining consistency.
Referring to
Application state information is stored by the servers 14 at the primary data center 10 on the disk drives 18 so that, if some failure occurs in a server or part of the storage system, the application(s) can be restarted and their state recovered. Also included in the storage system 16 is remote replication logic 28. The remote replication logic 28 causes the application state information to be copied to the disk drives 24 at the remote data center. This is done so that if a disaster occurs at the location of the primary data center 10 that destroys the servers 14 and/or storage system 16, 18, or renders the system completely inoperable, the copy of the application state information at the remote data center 12 can be used to restart the application(s) at the remote location.
Applications strictly control the order in which state information is written to disk to ensure that they can return to the same state in the event of a failure. Typically, I/O requests to store new state information to disk are not issued by an application until I/O operations to store previous state information have completed. Such write operations are said to be dependent on the previous write operations. Applications rely on this explicit control of dependent write ordering to ensure that there will be no misordering of the state information stored on disk. In order to guarantee that strict write ordering for each dependent write stream occurs, disk storage systems must store dependent write data to disk in the order that it is received. However, multiple dependent write streams are often issued by an application, each stream being independent of the others. The independent streams can be stored to disk in parallel, as long as the dependent writes of each individual stream are strictly ordered.
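The distinction between ordering within a stream and parallelism across streams can be illustrated with a short sketch. It is an assumption-laden model only: each independent stream is run on its own thread, a shared list stands in for the disk, and the function name `store_streams` is hypothetical.

```python
import threading
from typing import Dict, List, Tuple


def store_streams(streams: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Store several independent streams in parallel, one thread per stream.

    Within a stream, each write is issued only after the previous one has
    been recorded, so dependent writes stay strictly ordered. Writes from
    different streams may interleave in any order.
    """
    disk_log: List[Tuple[str, str]] = []  # stand-in for the disk
    lock = threading.Lock()

    def run_stream(name: str, writes: List[str]) -> None:
        for w in writes:
            with lock:  # protects the shared log, not the ordering
                disk_log.append((name, w))

    threads = [
        threading.Thread(target=run_stream, args=(name, writes))
        for name, writes in streams.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return disk_log


if __name__ == "__main__":
    log = store_streams({"A": ["A1", "A2", "A3"], "B": ["B1", "B2"]})
    print(log)
```

The interleaving between streams A and B may differ from run to run, but the relative order of A1, A2, A3 and of B1, B2 is always preserved, which is the only ordering the application relies upon.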
Known remote replication algorithms maintain ordering of dependent writes through strict serialization of all write operations to disk, regardless of whether they are writes from independent streams. This strict serialization of writes significantly impacts application performance. For example,
In contrast, the invention is able to distinguish between the different dependent write streams and replicate them in parallel. This provides significant improvements in application performance. As shown in
The invention utilizes two primary components:
Generally, referring to
The starting and horizon sequence numbers may be stored as entries associated with the writes in the queue 32, or may be stored separately. An implementation wherein the starting and horizon sequence numbers are stored in the queue 32 is shown in
A preferred implementation of the remote replication method of the invention is shown in
Referring to
In
Referring to
The results of the replication process of
Compare and contrast the results of the replication method of the invention, shown in
So far, for purposes of clarity, the invention has been described as receiving parallel independent streams of dependent writes. The invention also applies to write streams presented in other manners. First of all, a serial write stream might be received by the storage system 22 via the connection 26, and then divided into parallel independent streams of dependent writes. Or, the invention can operate on a serial write stream directly. In the case of a serial write stream, the separate dependent write streams would be interleaved amongst one another. An example of this case is shown in FIG. 12. In