The present invention relates to a large area data storage system wherein an external storage device can quickly recover from a blockage that occurs due to a disaster, and in particular, to a large area data storage system wherein three or more external storage devices located at distances of one hundred to several hundred kilometers perform complementary operations.
Disclosed in JP11338647, by the present inventor, is a method whereby doubling of a system or data is performed synchronously or asynchronously. Further, disclosed in JP2000305856, by the present inventor, is a technique for asynchronously copying data to a remote area.
As is described above, the present inventor has proposed asynchronous remote copy techniques whereby an external storage device (hereinafter referred to as a storage sub-system), without receiving special control information specifying data order, receives data from a large computer system, a server or a personal computer connected to a network, or another higher computer system (hereinafter referred to as a host), and employs asynchronous transmission to continuously write data to a remotely situated second storage sub-system, while constantly maintaining the order of the data.
Further, when data is to be copied using the synchronous transmission technique, the performance of the data update process between a host and a storage sub-system connected thereto interacts with the exercise of the copy control process between the storage sub-system and a second storage sub-system located in the vicinity or in a remote area. Therefore, macroscopically, data exchanged by the two storage sub-systems are constantly being matched, and the order in which the data are written is also obtained. When an appropriate data transfer path is selected, the copy process effected through the synchronous transfer of data can be performed even when the distance between the two storage sub-systems exceeds 100 km.
Recently, awareness has grown of the importance of the safe storage and maintenance of data, and the data storage market has voiced many demands for viable disaster recovery systems. Conventional means devised to satisfy these demands generally provide for the synchronous and asynchronous transfer of data between two connected data storage points. However, further market requests call for the inclusion of third and fourth data storage points (hereinafter referred to as data centers), and for the construction of comprehensive, or near comprehensive, disaster recovery systems to service these data centers.
The reasoning behind these requests is that so long as three or more data centers are established, even if a disaster strikes one of the data centers, the redundancy represented by the storage and maintenance of data at the remaining data centers will enable data to be recovered and will reduce the risk represented by the occurrence of a succeeding disaster.
According to the conventional technique, adequate consideration is not given for a case wherein three or more data centers have been established and I/O data is received from a host having a logical volume of only one storage sub-system, and the remote copy technique is used for transmissions to multiple data centers. For example, for an event wherein a data center is disabled by a disaster, little consideration is given as to whether a logical volume that guarantees data order can be maintained between two or more remaining data centers, whether the update state can be maintained and non-matching data can be removed, and whether a system that can copy data relative to a vicinity and a remote area can be re-constructed.
Since it cannot be known when a disaster will occur, the order in which data is updated must be constantly maintained among a group of three or more data centers.
Therefore, a large area data storage system must be constructed wherein a specific function is not uniquely provided for a host and a plurality of remote copying systems are coupled together, wherein received data having the same logical volume is distributed to another storage sub-system situated at a nearby or a remote location, and wherein the storage sub-systems of data centers constantly guarantee the order in which data received from the host are updated.
To resolve the above problem, according to the invention, a large area data storage system copies data to another storage sub-system without providing a redundant logical volume for a storage sub-system.
Further, according to the present invention, the reconstruction of a large area storage system is assumed to be the recovery operation objective following a disaster. During normal operation, management information is directly exchanged by storage sub-systems that do not perform data transfer functions, and the data update state is monitored and controlled by each storage sub-system. Then, during a recovery operation (re-synchronization, or resync) following a disaster, only the difference between the data held in the storage sub-systems immediately before the disaster occurred is transmitted, and the exchange of hosts (fail over) and the continuation of the application are performed immediately.
<To Constantly Guarantee the Order for Updating Data>
A supplementary explanation will now be given for the time range for holding a data order.
The I/O data issued by the host is written to the storage sub-system, and the host receives a data-write-complete notification from the storage sub-system before performing the next step. When the host does not receive a data-write-complete notification from the storage sub-system, or receives a blockage notification, the host does not normally issue the next I/O data. Therefore, the data writing order should be maintained when the storage sub-system performs a specific order holding process before and after it transmits a write-end notification to the host.
In the remote copy process performed by the synchronous transfer of data, the data to be transmitted and copied is written to a storage sub-system situated nearby or at a remote location (hereinafter referred to simply as a different location), and when a write-end notification is received from the storage sub-system at the different location, the write-end notification is reported to the host. Compared with when a remote copy process is not performed, remote copy time and data transfer time are increased, and the performance is delayed. When the connection distance for a remote copy process is extended, the processing time for the data transfer is increased, and the remote copy process causes the performance of the I/O process to be further deteriorated. One of the methods used to resolve this problem is the asynchronous transfer of data.
During the asynchronous transfer of data, upon receiving I/O data from the host, the storage sub-system transmits data to a storage sub-system at a different location, and returns a write-end notification to the host without waiting for the write-end notification from the storage sub-system at the different location. Thus, the transmission of data between the storage sub-systems is not associated with the I/O process performed by the host, and can be performed asynchronously with the I/O process of the host. However, unless the data is written to the storage sub-system at the different location in the order in which the data was received from the host, the data order may not be maintained by the storage sub-system at the different location, and data non-matching may occur between the two storage sub-systems. The additional provision of a function that constantly guarantees the data order is the best possible means by which to reduce occurrences of this problem.
Compared with the storage sub-system that has received the host I/O data, the updating of data in the storage sub-system at a different location is generally delayed. However, so long as the data is written to the storage sub-system following the order in which the data arrived from the host, there is no divergence in the data order, and the recovery from a blockage can be performed by a journal file system or a database recovery process.
There is another method by which, without maintaining data order, the remote copying of the data order to a storage sub-system at a different location and the reflection of the data can be performed. According to this method, data from the host that have been received up to a specific time are transmitted to a different location and are collectively written to the storage sub-system. When the data received up to a specific time have been written, the data transfer process is terminated, and thereafter, data transfer by remote copying is halted until collective writing is next performed, and while data transfer is halted, the data order and the consistency of the I/O data received from the host is guaranteed.
According to this method, the function for providing the data order information is not required. A specific amount of data to be updated is stored and is collectively transmitted, and when the writing of data to a remote side has been completed, the data matching is guaranteed. According to this method, however, when a blockage occurs during remote copying, the data is not updated while the data updating order on the remote side is maintained, so that all the data are lost. Only during a period in which the data transfer by remote copying is halted can the data matching be guaranteed and be called adaptive.
The technique of the present inventor of the “remote copying by the asynchronous transfer of data for constantly guaranteeing the data order” includes a feature that, before returning an end notification to the host, the storage sub-system performs a process for guaranteeing the data order. Since the data order information for each block is managed before the end notification is returned to the host, regardless of the overhead in the controller of the storage sub-system or the delay time for the internal process, the data order can be consistently guaranteed.
Actually, the data order information is managed or controlled for each block during a time considerably shorter than the interval whereat the host issues the I/O. The time out (Timeout) value for the distribution of data to the storage sub-system at the remote location is set for at least one hour. The importance of this is that the remote copy technique of the present invention transmits data, together with order information, to a data block and writes the data in order in accordance with the order information. This is possible, so long as the order is correct, because even when between the local and remote systems the time lag for the updating of data is half a day, for example, this is much better than when, due to the non-matching of data, all the updated data are lost.
A data transfer method is disclosed wherein among a plurality of apparatuses, where each apparatus comprises a plurality of disk sub-systems, a first apparatus issues a first inquiry of whether a second disk sub-system thereof is under a state of transferring data or not transferring data to a first disk sub-system thereof. The first apparatus transmits first data of the second disk sub-system to the first disk sub-system when the first disk sub-system is transferring data. The transfer states on transmitting data in the first and second disk sub-systems are stored. A second inquiry of transfer states to one of said apparatuses which is not transmitting data is issued. In response to a result of the second inquiry the transfer states are updated.
Storage sub-systems located at three or more data centers are interconnected by synchronous transfers of data, and by an asynchronous remote copy technique for constantly and sequentially guaranteeing the order of data. Thus, a storage sub-system of a primary data center receives data from a host, and transmits the data to each of the storage sub-systems of the data centers at the remaining two or more points, while maintaining the order wherein the host updated the data.
Since the data is thereby rendered redundant while maintenance of the order wherein the host updated the data is guaranteed, even when a disaster or a blockage occurs at the primary data center, the storage sub-systems of the remaining data centers need only transmit the differential data among themselves, so that the recovery of the remote copy operation can be quickly effected or the data loss can be minimized.
<Synchronization and Asynchronization>
First, copying through the synchronous transfer of data or the asynchronous remote copying is defined by referring to
During the copying process performed through the synchronous transfer of data, when a host 1 issues a data update (write) instruction to a storage sub-system 1, and when the data to be written are also those that are to be written to a storage sub-system 2 that is located in the vicinity, a data update end notification is transmitted to the host after the data has been updated (written), as instructed, relative to the storage sub-system. In this embodiment, the vicinity is a so-called metropolitan network included within a 100 km range.
Specifically, for the remote copying through the synchronous transfer of data (
When copying through the synchronous transfer of data is performed, macroscopically the data in the near storage sub-system 1 connected to the host 1 constantly matches the data stored in the farther distant storage sub-system 2 located in the vicinity. Thus, even if the function of one of these storage sub-systems is lost due to a disaster, the complete state immediately before the disaster occurred is held by the other storage sub-system, and the processing can be quickly resumed by the remaining systems. The fact that the data are consistently matched macroscopically indicates that during the performance of the synchronous transfer function, the data may not be matched at the granularity of the processing time of a controller or an electric circuit, but at the time at which the data updating is completed, the data is always matched. This is because the storage sub-system 1 nearer the host 1 cannot complete the updating process unless the updated data is reflected to the storage sub-system in the vicinity.
In the asynchronous remote copy process (
Thus, since the data updating is terminated within the processing time required by the nearer storage sub-system 1, the host 1 is not kept waiting longer than the transfer time or the storing process time due to the storage of data in the storage sub-system 2 sited at the remote location. The remote location is a point, further distant than the vicinity, in a so-called transcontinental network, wherefor data communication or transfer is enabled without any restriction on the distance.
More specifically, in the asynchronous remote copying process, the updated data block is received from the host 1 by the storage sub-system 1 (1), and the end of the writing of the updated data block is transmitted to the host 1 (2). Further, the storage sub-system 1 transmits the data, in accordance with its own schedule, to the storage sub-system 2 asynchronously with the process performed by the host 1.
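The difference between the two copy modes can be illustrated with a minimal sketch. The class and method names (`PrimaryStore`, `RemoteStore`, `host_write`, `drain`) are hypothetical and introduced only for illustration; the sketch models the point at which the write-end notification is returned to the host, not the actual controller micro code.

```python
from collections import deque


class RemoteStore:
    """Hypothetical stand-in for the storage sub-system 2 at the different location."""

    def __init__(self):
        self.blocks = []

    def write(self, block):
        self.blocks.append(block)
        return "write-end"


class PrimaryStore:
    """Hypothetical stand-in for the storage sub-system 1 connected to the host."""

    def __init__(self, remote, synchronous):
        self.remote = remote
        self.synchronous = synchronous
        self.blocks = []
        self.pending = deque()  # blocks not yet sent (asynchronous mode only)

    def host_write(self, block):
        self.blocks.append(block)
        if self.synchronous:
            # Synchronous: the remote write completes before the host is answered.
            self.remote.write(block)
        else:
            # Asynchronous: queue the block and answer the host immediately.
            self.pending.append(block)
        return "write-end"

    def drain(self, limit=None):
        """Send queued blocks on the sub-system's own schedule, in arrival order."""
        sent = 0
        while self.pending and (limit is None or sent < limit):
            self.remote.write(self.pending.popleft())
            sent += 1
        return sent
```

In the asynchronous case the host receives its write-end notification while the remote copy is still pending, which is why the remote sub-system generally lags the primary.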
Because of the complicated data transfer path to the remote location or to the vicinity and the en route bottleneck of the data transfer path, the order of the data that is being transmitted is not guaranteed (see an elliptical block indicated by a broken line in
Generally, in order to improve the data transfer performance, or in many cases, to increase the transfer speed, the data may be transmitted along multiple transfer paths by a transmission source. Further, when the destination is far distant, even when from the source only one transfer path is extended outward, the route taken to the destination is not always a single path because communication relay devices, such as a switch and a router, are located between the source and the destination. And when multiple paths are employed for the transmission of data, depending on the path taken, time differences may be generated since data may be transmitted along a fast path or a slow path, so that the order in which data arrives at the transfer destination does not always correspond with the order in which the data is transmitted by the source.
In an example enclosed by an ellipse in
In this embodiment, when the storage sub-system 1 receives a data block from the host 1 and transmits it to the storage sub-system 2, the storage sub-system 1 attaches sequence number information indicating the data updating order. Therefore, the storage sub-system 2 can sort the data based on the sequence number information, guarantee the order, and complete the storing of the data. After the process sequence required for the data transmission is completed, the data order is stored in the storage sub-system 2 situated at the remote location. As is described above, when the data process inherent to the asynchronous copying is continuously performed (asynchronous remote copying), the data updating order can be constantly guaranteed.
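The sorting performed at the remote side can be sketched as a simple reorder buffer. This is an illustrative sketch only: the class name `OrderedReceiver` and the assumption that sequence numbers start at 1 and increase by one are not taken from the specification.

```python
class OrderedReceiver:
    """Hypothetical model of the storage sub-system 2 applying blocks in sequence order."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq  # next sequence number that may be stored
        self.held = {}             # blocks that arrived ahead of their turn
        self.stored = []           # blocks written in guaranteed order

    def receive(self, seq, block):
        # Hold the block until every earlier sequence number has been stored.
        self.held[seq] = block
        while self.next_seq in self.held:
            self.stored.append(self.held.pop(self.next_seq))
            self.next_seq += 1
```

For example, if blocks arrive in the order 1, 2, 4, 3, block 4 is held back until block 3 has arrived, and the stored order is 1, 2, 3, 4.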
The asynchronous remote copying includes as a feature the extension of the distance between the storage sub-systems 1 and 2 without any deterioration in the performance of the host 1, and the consistent guarantee of the data order. Thus, when the user of the large area data storage system carries out his or her job, the matching of the databases or the journal file systems at a substantially arbitrary time can be obtained by the storage sub-system situated at a remote location.
<Large Area Data Storage System 1>
In
The data center 1 and the data center 2 are present in the vicinity, and can exchange data through synchronous transmission. The data center 1 and the data center 3 are relatively situated at remote locations, and can exchange data through an asynchronous remote copying technique.
In the normal operating form, the updated data that the data center 1 receives from the host is stored in the storage sub-system of the data center and employed. This updated data is synchronously transmitted to the storage sub-system of the data center situated in the vicinity through a fiber channel, a main frame interface, an ethernet LAN, a public line or the Internet or another dedicated line. That is, macroscopically, the data centers 1 and 2 constantly maintain the performance of data matching between the storage sub-systems.
In the normal operating form, the updated data that the data center 1 receives from the host is transmitted to the storage sub-system of the data center situated at a remote location, along the same dedicated line while using the asynchronous remote copying technique in the same manner as the synchronous transmission of data. It should be noted that the same line need not be employed for the data centers 1 and 2 and the data centers 1 and 3, and the data transfer paths between them.
There is a long distance between the data center 1 and the data center 3, and the non-matching of the order in which the updated data arrive occurs due to the transfer path between the data centers 1 and 3. Further, differential data that becomes non-reflected data at the transfer destination is present in the storage sub-system of the data center 1 at the transfer source. However, according to the asynchronous remote copy technique of the invention, since data received from the host is maintained in the order that is required for the recovery of the database and since the file system following the performance of the data process inherent to a predetermined asynchronous transfer of data is guaranteed, the order of the data for which non-matching occurs can be recovered. As a result, the order of the updated data received from the host is maintained between the storage sub-systems of the data center 1 and the data center 3.
In order to perform the recovery process, the communication line along which the data is transmitted is laid and prepared between the data center 2 and the data center 3, and the updated data from the host is not transmitted during the normal operation of the large area data storage system. Further, in order to cope with the occurrence of a disaster or a blockage at the data center 1, in the normal operation mode, an inquiry command for the data transfer process status is transmitted along the communication line from the data center 2 to the data center 3, or from the data center 3 to the data center 2. The communication lines that are laid and prepared are a fiber channel, a main frame interface, an ethernet LAN, a public line and an Internet or dedicated line.
During normal operation, to determine whether the updated data is received from the host by the asynchronous remote copying performed between the storage sub-systems 1 and 3, an inquiry is transmitted along the communication line between the data centers 2 and 3 using a “data transfer state inquiry command” issued by the storage sub-system 2.
The “data transfer state inquiry command” is activated in accordance with the schedule for the storage sub-system 2. At the timing whereat data is received from the storage sub-system 1 through synchronous transmission, this command may be issued or may be collectively issued at a predetermined time interval. The predetermined time interval may be, for example, 100 msec to 500 sec, and should be appropriate so that not too much time is spent in the management of a transfer state/bit map, which will be described later, and in the management of the differential data. Multiple bit maps may be examined upon the reception of one inquiry.
During normal operation, data is not directly exchanged by the storage sub-systems 2 and 3. Therefore, the storage sub-system 2 issues a “data transfer state inquiry command” to gain an understanding of the data updating statuses of the storage sub-systems 1 and 3.
When a blockage has occurred at the data center 1, the host of the data center 2 is employed to continue the current system operation (fail over of the host), and the differential data between the storage sub-systems 2 and 3 is transmitted by the data center 2 to the data center 3 along the communication line that is prepared to perform the recovery process. The immediate recovery of the large area data storage system can be effected merely by the transmission of the differential data. A fail over means a change from the primary system to the sub-system, and was formerly also called a hot standby.
Thereafter, the data center 2 performs the above described asynchronous remote copying for the data center 3 along the communication path, and once the data center 1 has been recovered, the recovery process is performed between the data center 2 and the data center 1 through the synchronous transfer of data, so that the large area data storage system existing before the blockage occurred can be restored. It should be noted that the roles of the data center 1 and the data center 2 are exchanged relative to those before the blockage occurred.
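The roles of the three links before and after a blockage at the data center 1 can be summarized in a small sketch. The dictionary layout and the function name `links_after_failover_from_dc1` are hypothetical and serve only to restate the configuration described above.

```python
# Link roles during normal operation, keyed by (source data center, destination data center).
NORMAL_LINKS = {
    (1, 2): "synchronous copy",
    (1, 3): "asynchronous remote copy",
    (2, 3): "data transfer state inquiry only",
}


def links_after_failover_from_dc1(dc1_recovered):
    """Return the link roles once the host of the data center 2 has taken over."""
    # The data center 2 first sends the differential data, then continues
    # asynchronous remote copying to the data center 3.
    links = {(2, 3): "asynchronous remote copy"}
    if dc1_recovered:
        # The roles of the data centers 1 and 2 are exchanged.
        links[(2, 1)] = "synchronous copy"
    return links
```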
As is described above, the two data centers situated in the vicinity and the two data centers situated at the remote locations are unified to provide a total of three data centers, so that a large area data storage system connected by the remote copying technique can be provided. With this configuration, when a medium sized disaster or blockage has occurred, one of the data centers that are interconnected by the synchronous transfer of data can serve as a replacement for the other. Macroscopically, the data in the storage sub-systems of the two data centers are matched by the synchronous transfer of data, and the fail over can be immediately performed.
<Large Area Data Storage System 2>
Since the communication line between the data centers 2 and 3 in
In
With the above described configuration of the large area data storage system, even when a large disaster occurs, or blockages have continuously occurred in the two data centers situated in the vicinity, the fail over to the host of the data center 3 is performed, so that the data being processed by the system immediately before the disaster occurred can be continuously processed and the loss of data can be minimized.
That is, when a disaster large enough to destroy two data centers in the vicinity has occurred, the storage sub-system of the data center 3 or 5 situated at a remote location (
<Configuration of a Storage Sub-System>
These processes are implemented by the micro code of the controller of the storage sub-system. The updated data received from the host or another storage sub-system is temporarily stored in a cache 5 (
A controller 1 comprises a channel adaptor 3, for the exchange of data by a host and a remote copy destination; and a disk adaptor 9, for controlling a hard disk drive 7 in a disk device 2 along a disk interface (disk I/F) 8.
The channel adaptor 3 and the disk adaptor 9 each include a microprocessor, and are connected to the cache memory 5 via a data transfer bus or control bus 11. The bus structure is only an example, and may, as needed, be a cross-bar structure. Further, a plurality of controllers 1 may be provided to form a cluster structure, and a third common bus may be added to connect the controllers 1.
The cache memory 5 is used to store data that is to be exchanged with the host or with the remote copy destination. The control information, the configuration management information and the transfer state/bit map are stored in the control memory 6.
The remote copy function includes a transmission function and a reception function, and in this embodiment, the channel adaptors for receiving the I/O data from the host are separately mounted. The I/O data received from the host is temporarily stored in the cache 5. The transfer destination for the remote copying and the status management/bit map, which will be described later, are stored as control data in the control memory 6 and are controlled by the micro code.
The data stored in the cache 5 is written by the disk adaptor 9 to the hard disk drive 7 under RAID control. As a separate process, by using the micro code the data is transmitted to the remote copy destination that is defined in advance.
For example, the data received from the host is defined as the target for the succeeding remote copy process, data transmission by asynchronous transfer is defined, and the sequence number is provided for the data in the cache 5 in the order of the reception of data. The sequence number is also ID information indicating the data updating has been performed. The data is transmitted with the sequence number by the remote copy transmission function of the channel adaptor 3.
As another example, when the remote copying control is defined whereby the updated block received from the host is connected to multiple logical volumes, the data inside the cache memory 5 is processed for synchronous transfer and also for asynchronous transfer, and the resultant data, together with the sequence number, is transmitted by the channel adaptor 3 to the vicinity or to the remote location.
The example in
<Transfer State/Bit Map>
The transfer state/bit map is required for the paired logical volumes, and in this invention, at least two transfer states/bit maps can be obtained for one logical volume. In accordance with a pair of storage sub-systems and the definition of an assumption by the paired logical volumes, each bit map is employed to manage a difference with the logical volume of a partner. The block number in the bit map corresponds to a block that is the minimum unit for managing the update of the logical volume.
The host I/O need not be the same unit as the block number. The unit of the host I/O is normally 512 bytes, at the minimum, and an upper limit is also set; however, these are variable. The bit map is slightly smaller than 50 kB or around 700 kB; however, it can have various sizes ranging from 20 kB to 1000 kB. One bit map does not always correspond to one block of the host I/O data.
When the contents of the block corresponding to the block number are updated, differential management is conducted for all the data for the pertinent block number, and at the time of synchronization (resync), all the data for the block number is transmitted.
For each block number, the bit map is used as the unit for which the logical volume is updated. And “Update” information to be transmitted to another logical volume is waited for, so that only the updated block need be transmitted in order to reconstruct (re-synchronize) the pair of logical volumes used for remote copy. In other words, when the Update flag is On (1 in the embodiment in
The bit map has a further counter value whereat updates repeated multiple times are recorded using the same block number. The counter value is 0 for no update, or is 3 when the updating was repeated three times. When the size of a data block represented by a block number is larger than a data block updated by the host, the counter value is employed so that only the updated data can be transmitted to the logical volume partner.
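A transfer state/bit map entry holding both an Update flag and a counter value can be sketched as follows. The names `BlockEntry`, `TransferBitmap`, `record_update` and `blocks_to_resync` are hypothetical, and the sketch does not model any upper limit on the counter value.

```python
from dataclasses import dataclass


@dataclass
class BlockEntry:
    update: bool = False  # Update flag: True when the block differs from the partner
    counter: int = 0      # number of updates recorded for this block number


class TransferBitmap:
    """Hypothetical per-volume bit map, one entry per managed block number."""

    def __init__(self, num_blocks):
        self.entries = [BlockEntry() for _ in range(num_blocks)]

    def record_update(self, block_no):
        entry = self.entries[block_no]
        entry.update = True
        entry.counter += 1  # repeated updates to the same block are counted

    def blocks_to_resync(self):
        # Only blocks whose Update flag is on need be sent at re-synchronization.
        return [n for n, e in enumerate(self.entries) if e.update]
```

A block updated three times therefore carries the counter value 3, while an untouched block keeps the value 0 and is skipped at re-synchronization.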
A data copy monitoring function, which will be described later, compares the block number and the counter value that are stored in the “data transfer state inquiry command”, which will also be described later, with the block number and the counter value of the bit map for the storage sub-system at the inquiry destination. In this comparison, when the counter value stored in a specific storage sub-system is equal to or greater than the counter value included in the “data transfer state inquiry command”, that value is transmitted to the specific storage sub-system and the counter value of the bit map of the predetermined storage sub-system is decremented by one.
When the counter value held in the specific storage sub-system is smaller than the counter value included in the received “data transfer state inquiry command”, the counter value of the bit map of this storage sub-system is unchanged. Whether or not the counter value is decremented is transmitted in response to the “data transfer state inquiry command”.
When the counter value of the bit map of the storage sub-system is “equal to or greater than” the counter value included in the received “data transfer state inquiry command”, the data updating status indicates that the data have already been stored in or written to the pertinent storage sub-system by the normal remote copying function. When the counter value of the bit map is “less than” the counter value included in the “data transfer state inquiry command”, it means that data has not yet been received.
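The comparison made at the inquiry destination can be sketched as below. This is one reading of the rule described above, under the assumption that an inquiry always carries a counter value of at least 1; the function name `handle_inquiry` and the use of a plain dictionary for the bit map counters are hypothetical.

```python
def handle_inquiry(bitmap_counters, block_no, inquiry_counter):
    """Process one block number / counter value pair from an inquiry command.

    Returns True when the local counter was decremented (the data has already
    been received), and False when it was left unchanged (not yet received).
    """
    if inquiry_counter < 1:
        # Assumption of this sketch: inquiries refer only to updated blocks.
        raise ValueError("inquiry counter must be at least 1")
    held = bitmap_counters.get(block_no, 0)
    if held >= inquiry_counter:
        bitmap_counters[block_no] = held - 1
        return True
    return False
```

The returned value corresponds to the decremented or not decremented result that is reported in response to the inquiry.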
The counter value in
Once this permanent setup is performed (Over Flow in
The reason for the updating and the management using the counter value will now be supplementally explained.
When, for example, the bit map is to be managed in correlation with a track having a data capacity of about 50 kB, assume that three different portions of the data of 50 kB are updated at different times. The bit map is managed in correlation with the track because the recovery (re-synchronization) from a disaster or a blockage is performed by using the track unit.
When the bit map is not managed by using the counter value, only the Update flag is monitored. Even when it is determined at a specific time that the Update flag is 1, if at the following time the data is updated the second or the third time, the second and the following data updates are missed. Since a new concept for the counter value is introduced and the updating of the same data block using the command unit received from the host is precisely monitored, the above described inconvenience can be avoided.
An explanation will now be given for the definition of the transfer state/bit map function implemented inside the controller 1 in
1) The “normal pair state” is the state wherein the two overlapping volumes hold the same data while guaranteeing the data order.
2) The “transfer suppression bit map registration state” is the state wherein the data updating has not yet been registered in the bit map. It should be noted that the data has not yet been transferred to the paired volume.
3) The “copy state using a bit map” is the transitional state during which the “transfer suppression bit map registration state” is shifted to the “normal pair state”. This state corresponds to the initial state for double copying.
4) The “interrupted state” is the state wherein data can not be transmitted due to a blockage. This state is registered in the bit map.
5) The “no-pair bit map registration state” is a special state inherent to the present invention. This state arises from the need for the data updating state to be monitored and held by the two volumes before a disaster or a blockage occurs.
6) The “no pair state” is the state wherein, while a bit map is prepared, the logical volumes have not yet been paired, and no information for data updating is registered.
The presence of the “no-pair bit map registration state” is the feature of the present invention. As the proxy for this state, the suspended state, “transfer suppression bit map registration state”, may be employed. The suspended state is the state wherein the state of updating data in the logical volume is managed only by the bit map, and the transfer using the remote copy is not performed.
In this embodiment, the “no-pair bit map registration state” is provided because the transfer state/bit map must be held by the pair (
In order to monitor the data held by the data center 3, the data update state of the data center 3 must be included in the transfer state/bit map that is provided in accordance with the logical volume of the storage sub-system of the data center 2. Further, in order to monitor the data held by the data center 2, the data update state of the data center 2 must be included in the transfer state/bit map that is provided in accordance with the logical volume of the storage sub-system of the data center 3.
In the large area data storage system in
The transfer state/bit map function is implemented by the micro code that carries out the above described control and a control table that is related to the bit map. The specific function is performed by the micro code, for example, of the micro processor 4 in
<Operation of a Large Area Data Storage System>
In
When a disaster or a blockage has occurred in the data center 1, the storage sub-system of the data center 2 transmits differential data to the data center 3 using asynchronous transfer, and the system operation performed between the data center 2 and the remote data center 3 can be immediately recovered.
In
For a synchronous transfer and an asynchronous transfer, the storage sub-systems 2 and 3 have the functions of transfer state/bit map #3 and #6. During normal operation, the functions #1 and #3, and #2 and #6, hold the “normal pair state”.
The functions of the transfer state/bit map #4 and #5 are provided for the storage sub-systems 2 and 3. When the large data storage system is normally operated, the functions of transfer state/bit map #4 and #5 hold the “no-pair bit map registration state”.
The function of transfer state/bit map #4 performs differential management relative to the logical volume of the storage sub-system 3, and the function of transfer state/bit map #5 performs differential management relative to the logical volume of the storage sub-system 2.
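The arrangement of the six transfer state/bit map functions during normal operation can be summarized as a small table. The following is only a sketch: the type names `State` and `BitMapFunction` and the helper `functions_for_pair` are hypothetical, and the placement of functions #1 and #2 in the storage sub-system 1 is an assumption, since the passage above states explicitly only where functions #3 through #6 reside.

```python
from enum import Enum
from typing import List, NamedTuple


class State(Enum):
    NORMAL_PAIR = "normal pair state"
    NO_PAIR_BIT_MAP_REGISTRATION = "no-pair bit map registration state"


class BitMapFunction(NamedTuple):
    number: int            # "#n" in the description
    sub_system: int        # storage sub-system that hosts the function
    tracks_volume_of: int  # sub-system whose logical volume is tracked
    normal_state: State


FUNCTIONS = [
    BitMapFunction(1, 1, 2, State.NORMAL_PAIR),  # host sub-system assumed
    BitMapFunction(2, 1, 3, State.NORMAL_PAIR),  # host sub-system assumed
    BitMapFunction(3, 2, 1, State.NORMAL_PAIR),
    BitMapFunction(6, 3, 1, State.NORMAL_PAIR),
    BitMapFunction(4, 2, 3, State.NO_PAIR_BIT_MAP_REGISTRATION),
    BitMapFunction(5, 3, 2, State.NO_PAIR_BIT_MAP_REGISTRATION),
]


def functions_for_pair(a: int, b: int) -> List[int]:
    """Return the function numbers that manage the volumes of sub-systems a and b."""
    return [f.number for f in FUNCTIONS
            if {f.sub_system, f.tracks_volume_of} == {a, b}]
```

For example, `functions_for_pair(2, 3)` selects functions #4 and #5, which are the functions that hold differential information between the two sub-systems that do not directly exchange data.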
In a configuration extended from that in
<Data Copy Monitoring Function>
The data copy monitoring function will now be described. This function includes a bit map control function, a remote copy status management function, a configuration management function, a data transfer state inquiry command control function, and a remote copy data transfer instruction function.
The controller of the storage sub-system 2 in
The “data transfer state inquiry command” including the block number and the counter value is issued to the storage sub-system 3 by the storage sub-system 2. This command may be issued based on the synchronous transfer of data, or in accordance with the unique schedule of the storage sub-system 2.
The controller of the storage sub-system 3 receives the “data transfer state inquiry command” from the storage sub-system 2, and extracts the block number and the counter value for the transfer state/bit map, and compares them with the block number and the counter value for the transfer state/bit map #5 of the storage sub-system 3.
When the block number of the transfer state/bit map #5 indicates an Update flag of 1 (update), and the counter value is equal to or greater than the received counter value, it is assumed that the data concerning the synchronous transfer matches the data concerning the asynchronous remote copying, and the counter value is decremented by 1 based on the corresponding block number of the transfer state/bit map #6.
When the resultant counter value is “0”, the Update flag is set to “0”. And when the counter value is “Over Flow”, no further process is performed.
Furthermore, when the counter value registered at transfer state/bit map #5 is less than the counter value extracted from the inquiry command received from the storage sub-system 2, or when the Update flag is “0” (Off) and no update is performed, the updating to #5 is not performed, and this state is transmitted to the storage sub-system 2 as the response for the data transfer state inquiry command.
When the transfer state/bit map function #5 decrements the counter value of the transfer state/bit map function #6, this means that the data block that has been transmitted by the storage sub-system 1 to the storage sub-system 2 using a synchronous transfer has also been transmitted by the storage sub-system 1 to the storage sub-system 3 using an asynchronous transfer.
The data copy monitoring function employs the response results to control the transfer state/bit map function of the storage sub-system 2. When the storage sub-system 3 transmits a response indicating that the block number and the counter value included in the “data transfer state inquiry command” have already been registered (i.e., when the counter value can be decremented), similarly, the controller of the storage sub-system 2 employs the transfer state/bit map function to decrement the counter value and to set the Update flag.
When the response to the command indicates that the data has not yet been registered, it is assumed that the asynchronous transfer by the storage sub-system 1 to the storage sub-system 3 is incomplete, and transfer state/bit map function #4 of the storage sub-system 2 holds the updated state in its own bit map. This state is referred to when only the updated differential portion is re-synchronized later.
At this time, when a critical blockage has occurred in the storage sub-system 1 and when the remote copying configuration must be reconstructed (re-synchronized) between the storage sub-systems 2 and 3, only the non-transmitted data, i.e., only the differential data block, need be transmitted by the storage sub-system 2 to the storage sub-system 3 by referring to the bit map. As a result, a “normal pair” can be immediately constructed merely by the transfer of the differential data. The function for implementing this process is called the “data copy monitoring function”.
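The exchange of the “data transfer state inquiry command” and its response can be sketched as follows. This is a simplified illustration and not the micro code itself: the class `BitMap` and the functions `answer_inquiry` and `handle_response` are hypothetical names, the responding side is modeled with a single table rather than with separate functions #5 and #6, and overflow handling is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BitMap:
    # block number -> [update_flag, counter]
    entries: Dict[int, List[int]] = field(default_factory=dict)

    def record_update(self, block: int) -> None:
        entry = self.entries.setdefault(block, [0, 0])
        entry[0] = 1
        entry[1] += 1

    def decrement(self, block: int) -> None:
        entry = self.entries[block]
        entry[1] -= 1
        if entry[1] == 0:
            entry[0] = 0  # no outstanding difference for this block

    def differential_blocks(self) -> List[int]:
        return sorted(b for b, (flag, _) in self.entries.items() if flag == 1)


def answer_inquiry(remote: BitMap, block: int, counter: int) -> bool:
    """Responding side (storage sub-system 3 in the description)."""
    entry = remote.entries.get(block)
    if entry is None or entry[0] == 0 or entry[1] < counter:
        return False  # data not yet received; the bit map is left unchanged
    remote.decrement(block)
    return True


def handle_response(local: BitMap, block: int, registered: bool) -> None:
    """Inquiring side (storage sub-system 2 in the description)."""
    if registered:
        local.decrement(block)
    # Otherwise the entry is kept as differential data for re-synchronization.
```

If the storage sub-system 2 has recorded updates for blocks 10 and 11 while the storage sub-system 3 has so far received only block 10, the inquiry for block 10 clears both entries, and `differential_blocks()` on the inquiring side returns only block 11, which is the data that must be transmitted when the pair is reconstructed.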
<Difference Management Method 1 Performed Between Storage Sub-Systems that in a Normal Operation Do Not Directly Exchange Data>
When a blockage has occurred in the storage sub-system 2 of the large area storage system in
The controller 1 (
The position information of a block to be transmitted is stored, as update information for the logical volume of the storage sub-system 3, in the bit map present in the controller 1 of the storage sub-system 1. At this time, when the block already transmitted has been updated by the storage sub-system 3, the counter value of the bit map is incremented by one.
When the controller 1 of the storage sub-system 1 has completed the synchronous transfer to the controller 1 of the storage sub-system 2, the controller of the storage sub-system 1 issues an acknowledgement command along the communication line connecting the storage sub-systems 1 and 3 in order to ask whether the data block has been synchronously transmitted via the controller 1 of the storage sub-system 2 to the controller 1 of the storage sub-system 3.
The acknowledgement command includes, for the updated data received from the host, the block number and the counter value of the data block for the storage sub-system. Upon receiving the acknowledgement command, the controller 1 of the storage sub-system 3 determines whether the data block received via the controller 1 of the storage sub-system 2 matches the block for which the acknowledgement command inquiry was issued.
The controller 1 of the storage sub-system 3 includes not only the transfer state/bit map function relative to the logical volume of the controller 1 of the storage sub-system 2, but also a state management/bit map function relative to the logical volume of the controller 1 of the storage sub-system 1.
When the controller 1 of the storage sub-system 3 receives data from the controller 1 of the storage sub-system 2, the controller 1 of the storage sub-system 3 registers the state of the controller 1 of the storage sub-system 1 in the transfer state/bit map held in the storage sub-system 3. This bit map includes update information relative to the block position associated with the address in the logical volume, and also includes the counter value in order to manage the updating of the same block multiple times.
The block number and the counter value registered in the transfer state/bit map of the controller 1 of the storage sub-system 3 are compared with those included in the acknowledgement command issued by the controller 1 of the storage sub-system 1. When the block numbers and counter values are matched, or the registered counter value is equal to or greater than the counter value of the acknowledgement command, it is ascertained that the arrival of the data has been normally completed, and the counter value of the bit map is decremented by one using the transfer state/bit map function.
When the results received from the controller 1 of the storage sub-system 3 indicate that the data block has arrived at the storage sub-system 3 via the storage sub-system 2, the controller 1 of the storage sub-system 1, as well as the controller 1 of the storage sub-system 3, decrements the counter value by one using the transfer state/bit map function.
Since the bit map is monitored and managed in the above described manner, even when a critical blockage, such as a disaster, has occurred in the storage sub-system 2 and data can not be exchanged by either a synchronous or an asynchronous transfer, the asynchronous remote copy configuration can be constructed by the storage sub-system 1 to which the host issues the I/O data and the storage sub-system 3 that stores the data contents of the storage sub-system 2 using the asynchronous remote copying.
At this time, since the transfer state/bit map functions of the controllers of the storage sub-systems 1 and 3 can be employed to transmit only the differential data block without copying all the logical volume data, the asynchronous remote copying configuration can be immediately constructed.
<Difference Management Method 2 Performed Between Storage Sub-Systems that in a Normal Operation Do Not Directly Exchange Data>
In the large area data storage system in
When a blockage has occurred in the controller 1 of the storage sub-system 1, and neither the copying using a synchronous transfer nor the asynchronous remote copying can be continued any longer, first, the controllers 1 of the storage sub-systems 2 and 3 copy the differential data to match the two data sets. Then, the asynchronous remote copying is established between the storage sub-systems 2 and 3.
The controller 1 of the storage sub-system 1, which has received from the host data to be updated, uses a synchronous transfer to transmit a data block to the controller 1 of the storage sub-system 2. Upon receiving the data block, the controller 1 of the storage sub-system 2 stores the position information (block number) of the received data block in its own transfer state/bit map in order to compare the received data with the management information for the logical volume dominated by the controller 1 of the storage sub-system 3. The transfer state/bit map function increments the counter value by one when the received data block is updated, and the data block updating performed multiple times can be recorded.
After the controller 1 of the storage sub-system 2 has registered predetermined management information in the transfer state/bit map, along the data transfer path connecting the controller 1 of the storage sub-system 2 to the controller 1 of the storage sub-system 3, the controller 1 of the storage sub-system 2 issues, to the controller 1 of the storage sub-system 3, an acknowledgement command asking whether the data block has arrived at the storage sub-system 3.
The acknowledgement command includes a block number, which is position information for a data block that the controller 1 of the storage sub-system 2 has received from the storage sub-system 1 through the synchronous transfer, and a counter value, which indicates the number of times the data block was updated.
The controller 1 of the storage sub-system 3 employs its own transfer state/bit map function to store, in the bit map, the position information (block number) and the counter value of the data block that is received from the controller 1 of the storage sub-system 1 by using the asynchronous remote copying technique, so that the block number and the counter value can be compared with the management information of the logical volume dominated by the controller 1 of the storage sub-system 2. Then, the controller 1 of the storage sub-system 3 compares the values in the bit map with the corresponding values included in the acknowledgement command.
The block number and the counter value, which are included in the acknowledgement command issued by the storage sub-system 2 to the storage sub-system 3, are compared with the management information, which the controller 1 of the storage sub-system 3 holds for the logical volume dominated by the controller 1 of the storage sub-system 2. When the counter value is equal to or greater than that included in the acknowledgement command, the counter value of the data block is decremented by one using the transfer state/bit map function.
When the decremented counter value reaches 0, it is assumed that there is no differential data between the storage sub-systems 2 and 3, and the counter value is erased from the bit map. When the comparison results are not matched, the controller 1 of the storage sub-system 3 does not modify the counter value of the bit map.
The controller 1 of the storage sub-system 3 transmits the determination results to the controller 1 of the storage sub-system 2 as a response to the acknowledgement command. When the controller 1 of the storage sub-system 2 refers to these results and decrements the counter value, it is ascertained that between the storage sub-systems 2 and 3 the same data block has been normally updated.
When a data block to be updated is not received by the storage sub-system 3, it is assumed that the data block to be updated is stored only in the storage sub-system 2. The controller 1 of the storage sub-system 2 stores this data block by using its own transfer state/bit map function.
When the controller 1 of the storage sub-system 2 receives from the controller 1 of the storage sub-system 3 a response relative to the acknowledgement command, and when the data block to be updated has not yet been transmitted to the storage sub-system 3, the counter value in the transfer state/bit map that is held by the controller 1 of the storage sub-system 2 and that corresponds to the updated state of the logical volume of the storage sub-system 3 is not decremented. This indicates that the data block for updating the bit map is differential data between the storage sub-systems 2 and 3.
When the data has arrived, the counter value of the data block for updating the transfer state/bit map is decremented by one. And when the counter value reaches 0, the storage sub-systems 2 and 3 assume that the data block concerning the updating is the same and there is no non-matching data, and do not regard the data block as the target for the copying of differential data.
As is described above, during a normal operation, the controllers of the storage sub-systems that do not directly exchange data manage the differential data between the logical volumes on the assumption that a recovery from a disaster or a blockage will be effected. Thus, the differential data need only be copied between the storage sub-systems, and non-matching data can be removed quickly.
<Operation of a System After Fail Over>
While referring to
According to the present invention, only the differential data need be copied between the logical volumes (the storage sub-systems 1 and 3) that do not directly relate to the data transfer, so that a remote copy pair can be immediately generated and the remote copy operation can be resumed.
If the present invention is not applied, in the configuration in
The data copy monitoring function of the configuration in
The data transfer state inquiry command is issued by the storage sub-system 1 to the storage sub-system 3. The data copy monitoring function differs partially from that in
The storage sub-system 1 issues an inquiry to the storage sub-system 3 to determine whether the same data as the data (track) the storage sub-system 1 received from the host has been transmitted to the storage sub-system 3. When the data has not yet been received, the bit map for the transfer state/bit map #1 of the storage sub-system 1 is maintained unchanged. If the data has arrived, i.e., if the block number and the counter value of the bit map of the transfer state/bit map function #3 are the same, the Update flag and the bit map for the transfer state/bit map function #1 are deleted.
<Other Process for Re-Synchronization>
When an error or a defect occurs in the response to the “data transfer state inquiry command” detected by the data copy monitoring function, or when a defect occurs in the transfer state/bit map function, the difference management that concerns the recovery process to be performed upon the occurrence of a blockage or a disaster is inhibited.
For the transfer state/bit map function, the bit map includes a storage area for a finite counter value. When the same data block is updated more times than the finite value permits (overflow), even if the redundancy is maintained later by the two or more storage sub-systems, the data block is always regarded as the update target when the re-synchronization process or the difference copy process is performed after a blockage or a disaster has occurred.
In the normal operation, when a response is not issued for a predetermined period of time relative to an inquiry (acknowledge command) that is exchanged among the storage sub-systems that do not directly transmit data, it is assumed that the time has expired and the re-synchronization process is inhibited, without performing the reconstruction of a pair of logical volumes using asynchronous remote copying, or the transmission of only differential data. This is because, since the data updated state of the logical volume to be paired can not be obtained, it is not appropriate to perform the reconstruction of the pair of logical volumes.
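The three rules above (inhibition upon a defective response, permanent targeting of overflowed blocks, and inhibition upon expiration of the response time) can be combined in a short sketch. The names `Entry`, `inquiry_expired` and `plan_differential_copy`, and the representation of the bit map as a dictionary, are assumptions made only for this illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Entry(NamedTuple):
    update_flag: int
    counter: int
    overflow: bool


def inquiry_expired(sent_at: float, now: float, timeout_s: float) -> bool:
    """True when no response arrived within the predetermined period."""
    return now - sent_at > timeout_s


def plan_differential_copy(bit_map: Dict[int, Entry],
                           response_ok: bool,
                           expired: bool) -> Optional[List[int]]:
    """Return the blocks to copy, or None when re-synchronization is inhibited."""
    if expired or not response_ok:
        # The update state of the volume to be paired can not be obtained,
        # so differential re-synchronization is not attempted.
        return None
    return sorted(block for block, e in bit_map.items()
                  if e.overflow or e.update_flag == 1)
```

A return value of `None` stands for the inhibited case; what is performed instead in that case is outside the scope of this sketch.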
<Management of Matching of Data through an Asynchronous Transfer>
Assume that the storage sub-systems 1 and 2 connected to the host are operated using asynchronous transfers whereby the data is copied from the storage sub-system 1 to the storage sub-system 2. In this case, when the data writing order for the storage sub-system 1 differs from the data writing order for the storage sub-system 2, the matching of the data for the storage sub-systems 1 and 2 is not guaranteed. The arrangement for avoiding the non-matching of data will now be described.
First, blocks of predetermined size (e.g., 16 K bytes) are defined in the storage area of the resource for each of the storage sub-systems 1 and 2, and unique block numbers are allocated to the blocks. Then, for each block for which the host has written data, the correlation of the block number and the sequence number provided in the data writing order is entered in the control memory 6. For example, when as is shown in
For an asynchronous transfer from the storage sub-system 1 to the storage sub-system 2, as is shown in the transfer data format in
As is described above, the data is written to the storage resource of the storage sub-system 2 in the order whereat the host has written the data to the storage resource of the storage sub-system 1, so that the matching of the data in the storage sub-systems 1 and 2 can be guaranteed.
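The ordering guarantee can be illustrated with a minimal sketch of the receiving side. It is assumed here that each transferred item carries its block number and its sequence number, and that items arriving early are held until the preceding sequence numbers have been written; the class `OrderedApplier` and its attributes are hypothetical names and are not taken from the description.

```python
from typing import Dict, List, Tuple


class OrderedApplier:
    """Applies received writes strictly in sequence-number order."""

    def __init__(self, first_sequence: int = 1) -> None:
        self.next_sequence = first_sequence
        self.pending: Dict[int, Tuple[int, bytes]] = {}  # sequence -> (block, data)
        self.volume: Dict[int, bytes] = {}               # block -> data
        self.applied: List[int] = []                     # sequence numbers, in apply order

    def receive(self, sequence: int, block: int, data: bytes) -> None:
        self.pending[sequence] = (block, data)
        # Write every item whose predecessors have all been written.
        while self.next_sequence in self.pending:
            blk, payload = self.pending.pop(self.next_sequence)
            self.volume[blk] = payload
            self.applied.append(self.next_sequence)
            self.next_sequence += 1
```

If the item with sequence number 2 for a block arrives before the item with sequence number 1 for the same block, nothing is written until sequence number 1 arrives, after which both are written in order and the block holds the later data, as it does in the storage sub-system 1.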
Another example is disclosed in co-pending and co-owned U.S. application Ser. No. 10/892,958, filed Jul. 16, 2004, incorporated herein by reference for all purposes. U.S. application Ser. No. 10/892,958 describes a two-data center configuration which uses an asynchronous remote copy function. A disk subsystem that assures the sequence and the coherence of data updating with two or more disk subsystems is provided with an asynchronous remote copy function. A first data center includes a computer system having the configuration of slave subsystems connected to a master disk subsystem. The first data center secures coherence between data in the first data center and data in a second data center remote from the first data center by repeatedly suspending and releasing suspension of the remote copy operation of the master and slave disk subsystems at predetermined opportunities. This example is explained with reference to the architecture shown in
In the main center 9, a host unit 1, equipped with a central processing unit (CPU), that executes data processing is connected with a disk subsystem 3-1 (a master subsystem) and disk subsystems 3-2 . . . 3-n (slave subsystems) through an interface cable 2 that provides a transmission path. The master disk subsystem 3-1 is connected with a disk subsystem 7-1 of a remote center 10 through an interface cable 4-1. The slave disk subsystem 3-2 is connected with a disk subsystem 7-2 of a remote center through an interface cable 4-2 and the slave disk subsystem 3-n is connected with a disk subsystem 7-n of the remote center through an interface cable 4-n similarly. The interface cables 4-1, 4-2, . . . 4-n can be general communication lines using a network connection unit. In this embodiment, these are described as interface cables 4-1, 4-2 . . . 4-n.
When there are two or more disk subsystems 3, the disk subsystem 3-1 is connected through the interface cable 5 with the disk subsystems 3-2, . . . 3-n (those other than the disk subsystem 3-1) that store the data that are the object of the remote copy inside the main center 9. Thus, on the side at the main center 9, the disk subsystem 3 that stores the data that is the object of the remote copy comprises a master disk subsystem 3-1 and other slave disk subsystems 3-2, . . . 3-n connected by the interface cable 5.
When the host unit 1 issues a data write request to the master disk subsystem 3-1, the master disk subsystem 3-1 writes the corresponding data synchronously in the data buffer 12 of its own subsystem and instructs the data to be written to the disk subsystem 7-1, located at a distant place, asynchronously of the writing of data in the data buffer 12. The corresponding data written in the data buffer 12 of its own subsystem is recorded in the magnetic disk drive 13 synchronously or asynchronously.
Because the remote copy method writes data to the distant place asynchronously, there is a mode in which the disk subsystem 3 at the main center 9 transmits the update data to the disk subsystem 7 of the remote center 10, to which the subsystem is connected, according to the sequence of the update of the volume inside its own subsystem. The disk subsystem 7 in the remote center 10 stores the updated data in the volume inside its own subsystem according to the sequence of the receipt. There is also a mode in which the main center 9 transmits the data that is the object of transfer as a batch at an opportunity optimally scheduled by the disk subsystem 3, not depending on the sequence of the update of the volumes in its own subsystem, and the disk subsystem 7 of the remote center 10 reflects the updated data to the volumes inside its own subsystem regardless of the sequence of the receipt.
When the host unit 1 issues a data write request to the disk subsystems 3-2, . . . 3-n, the slave disk subsystems 3-2, . . . 3-n write the corresponding data synchronously in the data buffers 12 inside their own subsystems and then refer to the state of the remote copy control information storing part 16 of the master disk subsystem 3-1. Depending on the state of the remote copy, the slave disk subsystem judges whether to instruct the data write to the disk subsystem 7-2, . . . 7-n asynchronously of the writing of data in the data buffer 12 inside its own subsystem, or to store the information regarding the storing position of the update data in the remote copy control information storing part 16 inside its own subsystem. The disk subsystems 7-1, 7-2, . . . 7-n are connected with the disk subsystems 3-1, 3-2, . . . 3-n through the interface cable 4 and store the data received from the disk subsystems 3-1, 3-2, . . . 3-n in the data buffers 12 inside their own subsystems. That is, it shows the system configuration in which, when the host unit 1 issues a write data instruction to one or more of the disk subsystems 3-1, 3-2, . . . 3-n, the same data are stored in one or more of the disk subsystems 7-1, 7-2, . . . 7-n in the remote center 10 depending on the state of the disk subsystem 3-1. Arrows on
Because the master disk subsystem 3-1 has control bits that indicate the state of the remote copy, a system operator can suspend the remote copy state temporarily by altering the control bits at a predetermined opportunity or at any time by instruction of the system operator. When the remote copy is temporarily suspended, the disk subsystems 3-1, 3-2, . . . 3-n store the update data in the data buffer of their own disk subsystems, retain, in the remote copy control information storing part 16, the information of the address of the update data regarding the write instruction received after the start of the temporary suspension of the remote copy, and suspend, rather than issue, the write instruction of the update data to the disk subsystems 7-1, 7-2, . . . 7-n.
With the present invention, therefore, the data on the side at the main center 9 at the moment of the temporary suspension of the remote copy reside in all subsystems on the side of the remote center 10. That is, the coherence between the data on the side at the main center 9 and the data on the side of the remote center 10 at the time of temporary suspension can be secured. Therefore, the necessity of adding time stamps to the data for securing coherence is eliminated and the remote copy is realized without the intervention of a host unit, even in an open system where time information is not attached from a host unit. The disk subsystems 3-1, 3-2, . . . 3-n can release the temporary suspension of the remote copy based on an instruction sent to the master disk subsystem 3-1 by the system operator or an instruction by the system operator at any time.
When the temporary suspension of the remote copy is released, the disk subsystems 3-1, 3-2, . . . 3-n issue the write instruction of the data that is updated during the temporary suspension to the disk subsystems 7-1, 7-2, . . . 7-n. If the data write request is issued from the host unit 1 to the disk subsystems 3-1, 3-2, . . . 3-n, the disk subsystems 3-1, 3-2, . . . 3-n write the corresponding data synchronously to the data buffer 12 inside their own subsystems, and further, instruct the data write to the disk subsystems 7-1, 7-2, . . . 7-n asynchronously of the data write to the internal data buffer.
With such a configuration, the same data are held within both the volumes of the disk subsystem 3 that is the object of remote copy inside the main center 9 and the volumes of the disk subsystem 7 inside the remote center 10 (if the delay of the update timing can be ignored). During the temporary suspension of the remote copy state with the master subsystem 3-1, the state of the data of each disk subsystem 3 at the main center 9 at the time the master subsystem 3-1 is temporarily suspended, that is, the state of the data secured with coherence at the corresponding time point, are assured and sustained by each disk subsystem 7 of the remote center 10.
Temporary suspension of remote copy or release of the temporary suspension of remote copy can be set at the unit of a volume pair. If two or more volume pairs are set as a volume group, changing the state with the unit of a volume group is enabled. By displaying the temporary suspension or the release of the temporary suspension on a console of any of the subsystems 3 or 7, or the console of host unit 1 or 8, or on a monitor that is used for the control of these systems, a user can recognize whether remote copy is currently executed or not and with what unit the remote copy is executed. The user can arbitrarily set the interval between the temporary suspension and the release of the temporary suspension, provided that the interval is not so short that new data is copied to the side of the remote center 10 at the release of the temporary suspension before all of the data preceding the temporary suspension has been copied to the side of the remote center 10, so that the coherence between the data on the side at the main center 9 and on the side of the remote center 10 can be maintained.
As an example of the copying time, consider a case in which the data of the subsystem 7 at the moment of temporary suspension of the remote copy is stored inside the remote center 10: the remote copy is executed for 30 minutes from the main center 9 to the remote center 10, the remote copy is temporarily suspended for 30 minutes, and then the remote copy is executed for 30 minutes after the release of the temporary suspension. The time of the temporary suspension can be changed in conjunction with the copying time in case the copying time inside the remote center 10 is not 30 minutes, and the interval between the temporary suspension and the release of the temporary suspension can be set without retaining the copying time.
In this example, all of the volumes of the disk subsystem 3 are the object of the remote copy. Therefore, the following describes the state of the remote copy with the unit of the disk subsystem 3, but not with the unit of a volume pair or a volume group. This detail is not described in this example, but the volume groups are set separately for a database and a log file. Therefore, there can be a definition that the opportunity for temporary suspension or the opportunity for release of the temporary suspension of the remote copy is not set up for the volumes that store the log file.
As methods of setting the files and volumes that are the object of the remote copy, there is a method of assigning specific addresses that imply volumes or disk subsystems, and a method of selecting an address within an arbitrary range of addresses by the control program inside the disk subsystem. As the initial setting, examples of a path setting, a pair setting, and the setting of the temporary suspension and the release of the temporary suspension are described.
When the host unit 1 issues a data write request (a write command, hereafter) to the disk subsystem 3-1 (step 2), the disk subsystem 3-1 executes data storage processing in the own disk subsystem based on the write command (step 3). After completing data write (storage) processing in the own disk subsystem, the disk subsystem 3-1 reports the completion of the write command to the host unit 1 (step 4).
When the host unit 1 issues write commands to the disk subsystems 3-2 . . . 3-n (step 2), the disk subsystems 3-2 . . . 3-n execute the data storage processing in their own disk subsystems based on the write commands (step 5). Here, the write command is the command for the data writing instruction and the transfer of the write data itself, and the user sets, in advance in the host unit 1, the disk subsystem to which the request is issued (step 1).
When the disk subsystem 3-1 receives the write command, the disk subsystem 3-1 refers to the control bits inside the remote copy control information storage part 16 that indicate the state of the remote copy of the own subsystem and judges the state of the remote copy of the own subsystem (step 6 of
If its own subsystem retains the storage position information of the data that are updated during the temporary suspension state of the remote copy, the disk subsystem 3-1 judges the data of the corresponding position to be the object of transmission to the disk subsystem 7-1 of the remote center 10, issues a write command to write corresponding data, and after completion of processing the write command, erases the update position information. When the write command is received, the disk subsystems 3-2 . . . 3-n issue a command to the disk subsystem 3-1 through the interface cable 5 to inquire about the state of the disk subsystem 3-1, and by obtaining and referring to the control bits that indicate the state of the remote copy of the disk subsystem 3-1 (step 9), confirm whether the disk subsystem 3-1 is in temporary suspension state of the remote copy (step 10).
If the disk subsystem 3-1 is in the temporary suspension state of the remote copy, the disk subsystems 3-2 . . . 3-n retain the information related to the storage position of the updated data inside their own subsystems (step 12) and report the completion of the processing of the write command to the host unit 1 (step 13). If the disk subsystem 3-1 is not in the temporary suspension state, the disk subsystems 3-2 . . . 3-n report the completion of the processing of the write command to the host unit 1 (step 14) and issue the write command to the disk subsystems 7-2 . . . 7-n at an opportunity decided according to the processing capability of their own subsystems. If the disk subsystems 3-2 . . . 3-n retain the storage position information of the data updated during the temporary suspension state of the remote copy, the disk subsystems 3-2 . . . 3-n judge the data of the corresponding position to be the object of transmission to the disk subsystems 7-2 . . . 7-n of the remote center, issue a write command to write the corresponding data (step 15), and erase the update position information after the processing of the write command is completed.
That is, when the disk subsystem 3-1 is in the temporary suspension state of the remote copy, the other disk subsystems at the main center 9 that are connected with the disk subsystem 3-1 turn into the temporary suspension state of the remote copy due to the issuing of a write command from the host unit 1. When the disk subsystem 3-1 is not in the temporary suspension state of the remote copy, the other disk subsystems at the main center 9 that are connected with the disk subsystem 3-1 execute the remote copy due to the issuing of a write command from the host unit 1.
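The write handling described above for the disk subsystems 3-2 . . . 3-n can be illustrated by the following minimal sketch. It is an illustration under stated assumptions, not the implementation of the embodiment: the class name `MainSubsystem`, the callable `is_suspended` (standing in for the inquiry of steps 9 and 10), and the dictionaries that stand in for the local and remote storage are all hypothetical. As a simplification, every update position is recorded and later erased by a separate `transfer` call, which models the asynchronous opportunity at which the write command is issued to the remote center.

```python
class MainSubsystem:
    """Hypothetical model of one main-center disk subsystem (3-2 ... 3-n)."""

    def __init__(self, is_suspended):
        # is_suspended: callable standing in for the inquiry to the
        # disk subsystem 3-1 over the interface cable (steps 9 and 10).
        self.is_suspended = is_suspended
        self.local = {}        # storage position -> data in the own subsystem
        self.pending = set()   # update position information (step 12)
        self.remote = {}       # stand-in for the paired remote subsystem 7-x

    def write(self, position, data):
        """Handle a write command from the host unit."""
        self.local[position] = data   # step 5: store in the own subsystem
        self.pending.add(position)    # remember only where the update occurred
        return "complete"             # steps 13/14: report completion to the host

    def transfer(self):
        """Issue remote writes at an opportunity chosen by the subsystem."""
        if self.is_suspended():
            return 0                  # suspended: keep the position information
        sent = 0
        for position in sorted(self.pending):
            self.remote[position] = self.local[position]  # step 15
            sent += 1
        self.pending.clear()          # erase the update position information
        return sent
```

In this sketch the host always receives the completion report without waiting for the remote center, and the data written during a suspension reach the remote center only after the suspension is released.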
Instead of the disk subsystems 3-2 . . . 3-n inquiring of the disk subsystem 3-1 (step 9), it is also possible, as mentioned above, for the disk subsystem 3-1 to inform the disk subsystems 3-2 . . . 3-n when the remote copy state of the disk subsystem 3-1 itself is changed (step 9′, not shown).
In the case of the above mentioned settings, the disk subsystems 3-2 . . . 3-n need to retain their own remote copy state, as does the disk subsystem 3-1. That is, when the disk subsystem 3-1 is in the temporary suspension state of the remote copy at step 10, the disk subsystems 3-2 . . . 3-n change the state of the remote copy of their own disk subsystems to the temporary suspension state of the remote copy (step 11, not shown). When the disk subsystem 3-1 is not in the temporary suspension state of the remote copy at step 10, the disk subsystems 3-2 . . . 3-n change the state of the remote copy of their own disk subsystems to the released temporary suspension state of the remote copy (step 11′, not shown). When an indication of the remote copy state of the disk subsystems is desired while the disk subsystems 3-2 . . . 3-n inquire of the disk subsystem 3-1 as in step 9, the state of the remote copy can likewise be retained in their own disk subsystems, and step 11 and step 11′ can be arranged.
When the disk subsystems 7-1, 7-2, . . . 7-n confirm the receipt of a write command issued from the disk subsystems 3-1, 3-2, . . . 3-n, the disk subsystems 7-1, 7-2, . . . 7-n execute the processing of the write command, that is, the data storage processing into the data buffers 12 inside their own subsystems (step 16). After completing the processing of the write command, that is, the storage of the data in the data buffers 12 of their own subsystems, the disk subsystems 7-1, 7-2, . . . 7-n report the completion of the processing of the write command to the disk subsystems 3-1, 3-2, . . . 3-n (step 17).
When the temporary suspension state is released, the disk subsystems 3-1, 3-2, . . . 3-n issue, to the disk subsystems 7-1, 7-2, . . . 7-n, the write instruction for the data of the corresponding positions, based on the storage position information of the data that were updated after the remote copy of their own subsystems was temporarily suspended. When the host unit 1 issues a data write request to the disk subsystems 3-1, 3-2, . . . 3-n, the disk subsystems 3-1, 3-2, . . . 3-n write the corresponding data to the data buffers 12 of their own subsystems synchronously with the write request, and instruct the remotely located disk subsystems 7-1, 7-2, . . . 7-n to write the data asynchronously with respect to the writing of data to the data buffers 12 inside their own subsystems.
<Multi-Hop Method>
A large area data storage system in
The storage sub-systems 1 and 2 are employed for synchronous transfers whereby the data is copied from the storage sub-system 1 to the storage sub-system 2. Further, the storage sub-systems 2 and 3 are employed for asynchronous transfers whereby the data is copied from the storage sub-system 2 to the storage sub-system 3. The remote copy method in this form is hereinafter called a “multi-hop method”. It should be noted that with the multi-hop method either synchronous transfers or asynchronous transfers are arbitrarily set for communication among the storage sub-systems. Further, another transfer method may be employed.
While referring to
The storage sub-system 1 receives, from the host, target data to be written and a writing request (Write I/O) (S121). Then, the storage sub-system 1 writes the target data in the logical volume (first storage resource), provides a sequence number in the order whereat the data writing process was performed, and stores the sequence number (in a predetermined table) in correlation with the write position information that specifies the storage location in the logical volume (first storage resource) whereat the target data is written (S122). It should be noted that the write position information is represented using a sector number or a track number.
The storage sub-system 1 transmits, to the storage sub-system 2, the target data and the sequence number provided (S123). The transmission of the data and the sequence number is performed between the storage sub-systems after the data transmission command has been issued, and as needed, the data write position information is provided for the data transmission command.
The storage sub-system 2 receives, from the storage sub-system 1, the target data to be written and the sequence number, and writes them to its own logical volume (second storage resource). When the writing is completed, the storage sub-system 2 transmits a complete notification to the storage sub-system 1.
The storage sub-system 2 transmits the target data and the sequence number to the storage sub-system 3 at an appropriate timing (S124). (In
The storage sub-system 3 receives the data and the sequence number, and transmits, to the storage sub-system 1, the sequence number that is issued in correlation with the target data to be written (S125). The storage sub-system 1 receives the sequence number from the storage sub-system 3.
The storage sub-system 1 examines the received sequence number and the correlation (table) between the stored sequence number and the corresponding write position information. Thus, the data not reflected to the logical volume (third storage resource) in the storage sub-system 3, i.e., the differential data, can be obtained. The examination is performed by deleting, from the table, the write position information and the sequence numbers up to the write complete position that is received from the storage sub-system 3 (S126).
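The difference management of steps S122 to S126 can be sketched as follows. This is a minimal illustration under assumptions, not the table format of the embodiment: the class name `DifferenceTable` and its method names are hypothetical, sequence numbers are assumed to start at 1, and a write position is represented by a single integer such as a sector number.

```python
from collections import OrderedDict


class DifferenceTable:
    """Hypothetical table kept by the storage sub-system 1."""

    def __init__(self):
        self.next_seq = 1
        self.entries = OrderedDict()   # sequence number -> write position

    def record_write(self, position):
        """S122: assign a sequence number in writing order and store it
        in correlation with the write position information."""
        seq = self.next_seq
        self.next_seq += 1
        self.entries[seq] = position
        return seq

    def acknowledge(self, acked_seq):
        """S126: delete the entries up to the sequence number reported
        by the storage sub-system 3."""
        for seq in [s for s in self.entries if s <= acked_seq]:
            del self.entries[seq]

    def differential_positions(self):
        """Positions not yet reflected to the storage sub-system 3."""
        return sorted(set(self.entries.values()))
```

For example, after writes to positions 5, 9, 5, and 12 and an acknowledgment of sequence number 2, the entries that remain are those for sequence numbers 3 and 4, so the differential data lies at positions 5 and 12.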
An explanation will now be given for the recovery process when the storage sub-system 2 is halted due to a disaster.
As is shown in
When the storage sub-system 1 detects the occurrence of a blockage in the storage sub-system 2 (S131), first, the storage sub-system 1 generates a bit map in correlation with the data storage location for a predetermined block unit in the logical volume (first storage resource) of the system 1. Then, based on the correlation between the sequence number and the write location information, both of which are stored in the storage sub-system 1 as is the differential data that is not reflected to the storage sub-system 3, the storage sub-system 1 renders ON a bit at the location corresponding to the bit map for which the data is updated (S132).
Then, the differential data that is stored at the ON location in the bit map of the logical volume of the storage sub-system 1 is copied from the storage sub-system 1 to the corresponding storage location in the storage sub-system 3 (S133). When the copying is completed, the temporary operation is initiated in the form of copying the data from the storage sub-system 1 to the storage sub-system 3 using asynchronous transfers (S134).
To change the operation to the temporary operation when a blockage has occurred in the storage sub-system 2, not all the data need be copied from the storage sub-system 1 to the storage sub-system 3; only the differential data need be copied. Therefore, even when a satisfactory amount of data cannot be transmitted along the communication line between the storage sub-systems 1 and 3, the data stored in the logical volumes of the storage sub-systems can be easily synchronized.
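The bit map generation of S132 and the differential copy of S133 can be sketched as follows. The sketch rests on assumptions that are not taken from the embodiment: a logical volume is modeled as a Python list of sectors, the block unit `BLOCK_SIZE` is an arbitrary value of four sectors, and the function names are hypothetical.

```python
BLOCK_SIZE = 4  # hypothetical block unit: sectors per bit of the bit map


def build_bitmap(diff_positions, volume_sectors, block_size=BLOCK_SIZE):
    """S132: render ON the bit of every block that holds updated data."""
    n_blocks = (volume_sectors + block_size - 1) // block_size
    bitmap = [False] * n_blocks
    for position in diff_positions:
        bitmap[position // block_size] = True
    return bitmap


def copy_differential(source, target, bitmap, block_size=BLOCK_SIZE):
    """S133: copy only the blocks whose bit is ON; return the block count."""
    copied = 0
    for block, dirty in enumerate(bitmap):
        if dirty:
            start = block * block_size
            target[start:start + block_size] = source[start:start + block_size]
            copied += 1
    return copied
```

With a volume of ten sectors and differential positions 1 and 9, only the first and last blocks are transferred, which is why a narrow line between the storage sub-systems 1 and 3 can suffice.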
Now, an explanation will be given for the process sequence performed when the storage sub-system 2 is recovered and the temporary operation is changed to the normal operation.
First, the storage sub-system 1 copies, to the logical volume (second storage resource) of the storage sub-system 2, all the data stored in the logical volume (first storage resource) of the storage sub-system 1, and initiates the operation using synchronous transfers whereby data is copied from the storage sub-system 1 to the storage sub-system 2. Specifically, when data is written to the logical volume (first storage resource) upon receiving an instruction from the host, the storage sub-system 1 transmits the written data and the sequence number to the storage sub-system 2.
The storage sub-system 2 writes, to the logical volume thereof (second storage resource), the data and the sequence number that are received from the storage sub-system 1. When the writing process is completed, the storage sub-system 2 stores (in a predetermined table) the write location information, which specifies the location in the logical volume (second storage resource) wherein data has been written, together with the sequence number provided in the data writing order. The data transfer state at this time is shown in
Next, when the storage sub-system 3 receives the data and the sequence number from the storage sub-system 1, the storage sub-system 3 stores the data in the logical volume thereof (third storage resource) (
The storage sub-system 2 receives the sequence number from the storage sub-system 3. At this time, the storage sub-system 2 examines the received sequence number and the correlation between the stored sequence number and the corresponding write position information, so that data not reflected to the logical volume of the storage sub-system 3, i.e., the differential data, can be obtained.
Then, in the temporary operation, the asynchronous transfer process for copying the data from the storage sub-system 1 to the storage sub-system 3 is halted. After this process is halted, the storage sub-system 2 generates, in the control memory thereof, a bit map that corresponds to the data storage location for a predetermined block unit of the logical volume (second storage resource). Then, based on the correlation stored in the storage sub-system 2 between the write position information and the sequence number for the differential data that is not reflected to the storage sub-system 3, the storage sub-system 2 renders ON a bit at the pertinent location of the bit map for which the data has been updated.
In addition, the storage sub-system 2 transmits, to the storage sub-system 3, the differential data, which is not reflected to the logical volume (third storage resource) of the storage sub-system 3, and the write position information, both of which are obtained from the bit map.
The storage sub-system 3 receives the differential data and the write position information, and writes the differential data to the data storage location that is designated in the logical volume (third storage resource) by using the write position information. Thus, synchronization can be obtained between the contents of the logical volume (second storage resource) of the storage sub-system 2 and the contents of the logical volume (third storage resource) of the storage sub-system 3. After the above described process is terminated, the asynchronous transfer operation is resumed by the storage sub-systems 2 and 3 in the normal state in
The shifting from the temporary operation to the normal operation is completed in this manner.
<Multi-Copy Method>
A large area data storage system in
The storage sub-systems 1 and 2 are operated using synchronous transfers during which the data is copied from the storage sub-system 2 to the storage sub-system 1. The storage sub-systems 2 and 3 are operated using asynchronous transfers during which the data is copied from the storage sub-system 2 to the storage sub-system 3. Hereinafter, the remote copy method having this form is called a “multi-copy” method. It should be noted that either synchronous transfers or asynchronous transfers are arbitrarily set for the communication among the storage sub-systems when the multi-copy method is used. A transfer method other than the synchronous and the asynchronous transfer methods may be employed.
The data difference management method of the embodiment will now be described while referring to
The storage sub-system 1 receives the target data and the sequence number from the storage sub-system 2, and writes the target data to the logical volume thereof (first storage resource). At this time, the sequence number is stored (in a predetermined table) in correlation with the write position information that specifies the storage location in the logical volume (first storage resource) in which the data has been written (S163). The write position information is represented using, for example, a sector number or a track number.
Next, the storage sub-system 3 receives the target data and the sequence number from the storage sub-system 2, and writes the target data to the logical volume thereof (third storage resource). When the writing is completed, the storage sub-system 3 transmits, to the storage sub-system 1, the target data to be written and the sequence number that is paired with this data (S165). Thus, the storage sub-system 1 receives the sequence number from the storage sub-system 3.
The storage sub-system 1 examines the received sequence number against the stored correlation between the sequence numbers and the corresponding write position information, so that the data not reflected to the logical volume (third storage resource) of the storage sub-system 3, i.e., the differential data, can be obtained. This examination is performed, for example, by deleting from the table the sequence numbers and the write position information up to the write-end position that is received from the storage sub-system 3 (S166).
The normal operation using the multi-copy method is performed in the above described manner.
An explanation will now be given for the recovery process performed when the storage sub-system 2 is halted due to a disaster.
As is shown in
When the storage sub-system 1 has detected the occurrence of a blockage in the storage sub-system 2 (S171), upon, for example, an operator's instruction, the operation performed by the host connected to the storage sub-system 2 is transferred to the sub-host connected to the storage sub-system 1.
Then, the storage sub-system 1 generates, in the control memory 6, a bit map that corresponds to the data storage location for a predetermined block unit of the logical volume (first storage resource) for the storage sub-system 1. And, based on the correlation between the sequence number and the updated data position information, both of which are stored in the storage sub-system 1 as differential data that is not reflected to the storage sub-system 3, the storage sub-system 1 renders ON the bit at the pertinent position of the bit map for which the data has been updated (S172).
Further, the differential data, which is stored in the logical volume of the storage sub-system 1 at the position corresponding to the position in the bit map where the bit has been rendered ON, is copied from the storage sub-system 1 to the storage sub-system 3 (S173). When the copying is completed, the temporary operation is initiated in the form where the data is copied from the storage sub-system 1 using asynchronous transfers (S174).
To change to the temporary operation, even when a blockage has occurred in the storage sub-system 2, not all the data in the storage sub-system 1 need be copied to the storage sub-system 3, only the differential data. Therefore, even when a satisfactory amount of data is not transmitted along the communication line between the storage sub-systems 1 and 3, the data stored in the logical volumes of the storage sub-systems can be easily synchronized.
An explanation will now be given for the process sequence performed when the storage sub-system 2 is recovered from the blockage and the temporary operation is changed to the normal operation.
First, the storage sub-system 1 copies all the data stored in its logical volume (first storage resource) to the logical volume (second storage resource) of the storage sub-system 2, and the operation is initiated using synchronous transfers wherein data is copied from the storage sub-system 1 to the storage sub-system 2. At this time, the asynchronous transfers between the storage sub-systems 1 and 3 are also continued. The storage sub-system 1 transmits, to the storage sub-system 2, the data written by the host and the sequence number provided in the data writing order. The storage sub-system 1 also transmits to the storage sub-system 3 the written data and the sequence number that were provided. The storage sub-system 2 stores the correlation between the write position information, which specifies the position of its logical volume (second storage resource) whereat the data was written, and the sequence number, which is provided in the data writing order (prepares a position information management table). The operating state at this time is shown in
The storage sub-system 3 receives the data and the sequence number from the storage sub-system 1, stores the data in its own logical volume (third storage resource), and transmits the correlated sequence number to the storage sub-system 2.
The storage sub-system 2 receives the sequence number from the storage sub-system 3. The storage sub-system 2 then compares the received sequence number with the correlation stored in the storage sub-system 2, so that the data not reflected to the logical volume of the storage sub-system 3, i.e., the differential data, can be obtained.
Then, during the temporary operation, the asynchronous transfer copying of the data from the storage sub-system 1 to the storage sub-system 3 is halted. After the asynchronous transfer is halted, the storage sub-system 2 generates, in its control memory, a bit map that is correlated with the data storage position for a predetermined block unit of the logical volume (second storage resource) of the storage sub-system 2. Then, based on the correlation between the sequence number and the write position information that are stored in the storage sub-system 2 for the differential data that is not reflected to the storage sub-system 3, the storage sub-system 2 renders ON a bit at the pertinent position in the bit map for which the data has been updated.
Next, when the storage sub-system 2 obtains, from the bit map, the differential data that is not yet reflected to the logical volume (third storage resource) of the storage sub-system 3 and the write position information, the storage sub-system 2 transmits them to the storage sub-system 3.
The storage sub-system 3 receives the differential data and the write position information, and stores the differential data in its logical volume (third storage resource) based on the write position information. As a result, synchronization can be obtained between the contents of the logical volume (second storage resource) of the storage sub-system 2 and the contents of the logical volume (third storage resource) of the storage sub-system 3. The asynchronous transfer from the storage sub-system 2 to the storage sub-system 3 is then begun. The operation state at this time is shown in
When the data has been written from the host to the storage sub-system 1 connected thereto, and when synchronization is obtained between the storage sub-systems 1 and 2, the copying of data from the storage sub-system 1 to the storage sub-system 2 is changed to the copying of data from the storage sub-system 2 to the storage sub-system 1. That is, since the operation is switched while the data are synchronized, an extra process, such as the copying of differential data, is not required.
Following this, the job performed by the host connected to the storage sub-system 1 is transferred to the host connected to the storage sub-system 2. When the synchronous transfer copying of data from the storage sub-system 2 to the storage sub-system 3 is begun, the operation in the normal state in
Through the above processing, the switching from the temporary operation to the normal operation is completed.
<Another Blockage Removal Method>
A variation of the blockage removal method will now be explained.
When the storage sub-system 1 breaks down in the multi-hop system shown in
When the storage sub-system 1 is recovered, first, all the data in the storage sub-system 2 is copied to the storage sub-system 1, and the job of the sub-host is transferred to the host connected to the storage sub-system 1. In the above described manner, the data transfer direction is reversed between the storage sub-systems 1 and 2, and the normal operation is resumed (
When a blockage has occurred in the storage sub-system 3 in the multi-hop system in
When a blockage has occurred in the storage sub-system 1 in the multi-copy system in
When a blockage has occurred in the storage sub-system 3 in the multi-copy system in
<Management of Write Position Information at a Copy Source and a Copy Destination>
For the transmission of data among the storage sub-systems, the data transmission source and destination and the use of the synchronous transfer or the asynchronous transfer method is designated in various forms depending on the system configuration; for example, for this designation an operator may manipulate each storage sub-system (in this case, when a specific storage sub-system can not be used due to a blockage, a storage sub-system, as the next data transmission source, and a storage sub-system, as the next transmission destination, are registered in advance when the system is arranged), or a system attached to a storage sub-system may automatically perform the designation.
The correlation between the sequence number and the write position information is managed at the time whereat, for example, an operator begins to register the transmission source and the transmission destination for the storage sub-system.
<Method for Selecting a Storage Sub-System>
A large area data storage system in
The storage sub-system 2 detects the occurrence of a blockage in the host 1h or the storage sub-system 1 by determining, for example, whether data has been transmitted by the storage sub-system 1, or by monitoring a heart beat message transmitted by the storage sub-system 1 at a predetermined time.
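The heart beat monitoring mentioned above can be sketched as follows. This is only an illustration under assumptions: the class name `HeartbeatMonitor`, the timeout parameter, and the use of caller-supplied timestamps in seconds are hypothetical, and the embodiment does not specify how the predetermined time is chosen.

```python
class HeartbeatMonitor:
    """Hypothetical monitor run by the storage sub-system 2."""

    def __init__(self, timeout_s, start):
        self.timeout_s = timeout_s   # assumed allowance between messages
        self.last_seen = start       # time of the last data or heart beat

    def on_message(self, now):
        """Record data or a heart beat received from the storage sub-system 1."""
        self.last_seen = now

    def blockage_suspected(self, now):
        """True when nothing has arrived within the allowed interval."""
        return now - self.last_seen > self.timeout_s
```

Passing the current time as an argument keeps the sketch deterministic; an actual monitor would read a clock instead.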
Upon the detection of the blockage, the storage sub-system 2 quickly determines the primary sub-system, and changes the operation to the temporary operation using a sub-host 2 or 3. The selection of the primary storage sub-system is performed as follows. First, upon the detection of the blockage, the storage sub-system 2 transmits, to the storage sub-system 3, a message requesting the transmission of the latest sequence number. Upon receiving this message, the storage sub-system 3 transmits the latest stored sequence number to the storage sub-system 2.
The storage sub-system 2 compares the sequence number received from the storage sub-system 3 with the latest sequence number stored in the storage sub-system 2. The storage sub-system 2 then selects, as the primary storage sub-system, a storage sub-system that has received the later sequence number, stores the identifier of the selected storage sub-system as a selection choice, and transmits the identifier to the storage sub-system 3. Based on the received identifier, the storage sub-system 3 identifies the storage sub-system that has been selected as the primary storage sub-system.
During this selection process, due to matters such as the properties of the communication method used by the storage sub-systems, a sequence number may be missing from the sequence numbers stored in the storage sub-system 2 or 3. In this case, the latest sequence number among the available consecutive sequence numbers is employed for the above comparison.
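The selection of the primary storage sub-system, including the handling of a missing sequence number, can be sketched as follows. The function names are hypothetical, and two points are assumptions that the description does not state: the consecutive run is counted from the lowest stored sequence number, and a tie is resolved in favor of the identifier that sorts first.

```python
def latest_contiguous(sequence_numbers):
    """Return the latest sequence number reachable without a gap,
    counting from the lowest stored number (0 if none are stored)."""
    ordered = sorted(set(sequence_numbers))
    if not ordered:
        return 0
    latest = ordered[0]
    for seq in ordered[1:]:
        if seq != latest + 1:
            break                    # a sequence number is missing here
        latest = seq
    return latest


def select_primary(stored):
    """stored maps a sub-system identifier to its stored sequence numbers.
    The sub-system holding the later consecutive number is selected;
    the tie-breaking rule (first identifier in sorted order) is an assumption."""
    return max(sorted(stored), key=lambda ident: latest_contiguous(stored[ident]))
```

If the storage sub-system 2 holds the numbers 1, 2, 3, 5, and 6 and the storage sub-system 3 holds 1 through 4, the comparison is made between 3 and 4, and the storage sub-system 3 is selected even though the storage sub-system 2 has received a higher number.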
When the primary storage sub-system is selected, the data contents stored in the storage sub-systems 2 and 3 are matched in order to perform the double management of the data using the storage sub-systems 2 and 3. This matching is performed by copying all of the data or the differential data between the storage sub-systems 2 and 3. When the data in the storage sub-systems 2 and 3 match, the storage sub-system selected as the primary storage sub-system transmits to the sub-host connected thereto a message indicating that the pertinent storage sub-system is serving as the primary storage sub-system. Upon receiving this message, the sub-host begins the operation as a proxy. Further, double data management using either synchronous transfers or asynchronous transfers is initiated by the storage sub-systems 2 and 3.
In the above explanation, the storage sub-system 2 obtains the latest sequence number from the storage sub-system 3 and selects the primary storage sub-system. However, the storage sub-system 3 may perform this process.
In addition, for a large area data storage system constituted by three storage sub-systems 1 to 3, an example method has been explained for selecting a specific storage sub-system that is employed as a proxy when a blockage has occurred in the storage sub-system 1. This method can also be employed for a large area data storage system constituted by four or more storage sub-systems.
<Management of Data in a Cache Memory>
For a system wherein at least one secondary storage sub-system, which is a destination for the remote copying of data in the primary storage sub-system connected to a host, is connected to the primary storage sub-system, an example for the management of data in the cache memory of the primary storage sub-system will now be explained.
In this system, data that do not need to be copied (remote copying) from the primary storage sub-system to the secondary storage sub-system may be deleted from the cache memory of the primary storage sub-system after the data have been written to the storage resource of the primary storage sub-system. When the data is to be copied to the secondary storage sub-system, this data must be maintained in the cache memory at least until the data has been transmitted to the secondary storage sub-system. Further, when a plurality of secondary sub-systems are present as transmission destinations, generally, the data is not transmitted at the same time to these secondary storage sub-systems because of differences in communication means and in operations. Therefore, in this case, the data must be maintained until the data has been transmitted to all the secondary sub-systems.
Thus, the primary storage sub-system manages the data to determine whether the data stored in its cache memory has been transmitted to all the secondary storage sub-systems connected to the primary storage sub-system. Specifically, for example, as is shown in
In this table, bit “0” indicates that the transmission is completed, and bit “1” indicates that the transmission is incomplete. When the data from the host is written to the primary storage sub-system, “1” is set for the bit that corresponds to a secondary storage sub-system that is defined as a transmission destination for the storage block to which the data is written. Among the “1” bits for a specific block, a bit for the secondary storage sub-system for which the data transmission has been completed is set to “0”.
The data stored in the storage blocks, the bits for which have been set to “0” for all the secondary storage sub-systems, can be deleted from the cache memory.
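The bit management described above can be sketched as follows. The class name `TransmissionTable`, its method names, and the identifiers used for the secondary storage sub-systems are hypothetical; only the meaning of the bits ("1" for incomplete, "0" for completed) is taken from the description.

```python
class TransmissionTable:
    """Hypothetical per-block table kept by the primary storage sub-system."""

    def __init__(self, secondaries):
        self.secondaries = list(secondaries)
        self.bits = {}   # storage block -> {secondary identifier: 0 or 1}

    def on_host_write(self, block, destinations=None):
        """Set "1" for each secondary defined as a transmission destination."""
        targets = self.secondaries if destinations is None else destinations
        row = self.bits.setdefault(block, {s: 0 for s in self.secondaries})
        for secondary in targets:
            row[secondary] = 1

    def on_transmitted(self, block, secondary):
        """Set "0" once the transmission to that secondary is completed."""
        self.bits[block][secondary] = 0

    def can_delete(self, block):
        """The block may leave the cache memory only when every bit is "0"."""
        return all(bit == 0 for bit in self.bits.get(block, {}).values())
```

A block written by the host therefore stays in the cache memory until the slowest of its transmission destinations has received the data.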
In the large area data storage system in
In accordance with the effects of the invention, when only the differential data is copied between the logical volumes that do not directly relate to the data transmission, e.g., the storage sub-systems 1 and 3 in
Further, in the invention, since a redundant logical volume is not required in the storage sub-system in order to perform remote copying, the efficiency in the use of the memory resources of the storage sub-system can be increased, and the cost performance of the storage sub-system can be improved.
It should be further understood by those skilled in the art that the foregoing description has been made on embodiments of the invention and that various changes and modifications may be made in the invention without departing from the spirit of the invention and the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2001-240072 | Aug 2001 | JP | national |
2002-019971 | Jan 2002 | JP | national |
This application is a continuation in part of U.S. application Ser. No. 10/096,375 filed Mar. 16, 2003 and a continuation in part of U.S. application Ser. No. 10/892,958, filed Jul. 16, 2004 which is a continuation of U.S. application Ser. No. 10/424,495, filed Apr. 25, 2003 (U.S. Pat. No. 6,813,683) which in turn is a continuation of U.S. application Ser. No. 09/526,948, filed Mar. 16, 2000 now abandoned, the disclosures of which are included herein by reference for all purposes.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 10096375 | Mar 2002 | US |
Child | 11321842 | Dec 2005 | US |
Parent | 09854125 | May 2001 | US |
Child | 10096375 | Mar 2002 | US |