The present invention relates generally to safeguarding data, and more particularly to a system and method for mirroring data.
It is almost axiomatic that a good computer data network should still be able to function if a catastrophic event such as the “crash” of a disk should occur. Thus, network administrators typically perform routine processes in which data is backed up to prevent its permanent loss if such an event were to occur. When such an event occurs, the backup version of the data can be introduced into the computer network and operation of the network can continue as normal. Although routine backup processes are typically effective in restoring data on the network to allow normal operation to continue, they often do not safeguard against the loss of all data. For instance, data that is introduced into the computer network shortly after a routine backup operation is completed is often permanently lost if a catastrophic event occurs before a subsequent backup operation.
In an effort to prevent such a type of loss, in addition to performing backup processes, network administrators often use a process known as mirroring. Such a process typically includes copying data from a first data storage location to at least one other data storage location in real time. If a catastrophic event such as a “disk crash” occurs, a failover operation can then be implemented to switch to a standby database or disk storage space, thereby preventing or greatly reducing data loss. As the data is copied in real time, the data on the other data storage location is a substantial replica of the data residing on the first data storage location most of the time. Mirroring is often most effective when it is performed remotely. Although remote mirroring is ideal, it is sometimes not used because it degrades the input/output performance of the network. For instance, transmission latency, that is, the time it takes to copy data from the main storage device to the mirror, is often one of the greatest deterrents to remote data mirroring.
Data mirroring has a significant problem similar to that described above with respect to performing routine data backups. Data that is part of an I/O request introduced into the network prior to the mirroring process is subject to permanent loss if the main storage device becomes inoperable, for example, crashes, while processing an I/O request that has not yet been sent to the mirror storage device. Such a result can be disastrous for a critical computer data network such as one utilized by an intelligence agency, a financial institution or network, a medical computer data network, or any other computer data network in which it is essential to prevent any loss of data.
In light of the foregoing, what is needed is a system and method for mirroring data, reducing data transmission latency, and preparing for data failover and/or synchronization.
In at least one exemplary embodiment, a system according to the invention includes a primary data storage space having a first non-volatile buffer and a secondary data storage space having a second non-volatile buffer, wherein mirroring is performed to cause data stored on the secondary data storage space to replicate data stored on the primary data storage space, and input/output requests affecting the primary data storage space are logged on at least the first non-volatile buffer to manage an event affecting data on the primary data storage space or data on the secondary data storage space.
In at least one exemplary embodiment, a method of the present invention includes logging a current data operation in a non-volatile buffer on a first device, executing the current data operation on the first device, transmitting the current data operation to a second device as the current data operation occurs on the first device, receiving a confirmation from the second device that the current data operation has been executed, and executing a subsequent data operation on the first device. The system and method of the invention can reduce latency and better prepare a network storage device for failover procedures.
In at least one exemplary embodiment, a method for mirroring data and preparing for failover includes logging a first data operation in a non-volatile buffer on a first device; executing the first data operation on the first device; transmitting the first data operation to a second device from the buffer on the first device; executing the first data operation on the second device; receiving a confirmation from the second device that the first data operation has been executed; logging a second data operation in the buffer on the first device; and executing a subsequent data operation on the first device.
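The sequence of steps recited above can be illustrated with a minimal sketch. The `StorageUnit` class, the `mirrored_write` function, and the representation of a data operation as a `(block_number, data)` pair are hypothetical names chosen for illustration and are not part of the disclosure. Removing the logged entry once the confirmation is received follows the clearing of the non-volatile buffer described in the detailed description.

```python
from dataclasses import dataclass, field


@dataclass
class StorageUnit:
    """Hypothetical storage unit: a block store plus a list standing in for NVRAM."""
    blocks: dict = field(default_factory=dict)
    nvram: list = field(default_factory=list)

    def execute(self, op):
        # op is an assumed (block_number, data) write operation
        block, data = op
        self.blocks[block] = data


def mirrored_write(primary, mirror, op):
    """Log, execute, transmit, and confirm one write operation."""
    primary.nvram.append(op)       # log the operation in the buffer first
    primary.execute(op)            # execute on the first device
    sent = primary.nvram[-1]       # transmit from the buffer on the first device
    mirror.execute(sent)           # execute on the second device
    # Stand-in for the confirmation message returned by the second device
    confirmed = mirror.blocks.get(sent[0]) == sent[1]
    if confirmed:
        primary.nvram.remove(sent)  # entry is no longer needed for failover
    return confirmed


primary, mirror = StorageUnit(), StorageUnit()
first_ok = mirrored_write(primary, mirror, (7, b"first"))
second_ok = mirrored_write(primary, mirror, (8, b"second"))
```

In this sketch both devices live in one process, so the transmission and the confirmation are ordinary function calls; in a real deployment they would be network messages.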
In at least one exemplary embodiment, a system for providing fail-over for data storage includes a primary data storage unit including a buffer; a secondary data storage unit including a buffer; means for communicating between the primary data storage unit and the secondary data storage unit; and each buffer includes means for receiving a data operation and means for forwarding the data operation to at least one data storage unit.
In at least one exemplary embodiment, a system for providing failover protection for each data operation communicated to the system includes a first storage device having a non-volatile buffer; a second storage device; means for logging at least one data operation in the non-volatile buffer on the first storage device; means for executing the data operation on the first storage device; means for transmitting the data operation to the second storage device from the non-volatile buffer on the first storage device; means for executing the transmitted data operation on the second storage device; and means for receiving a confirmation from the second storage device that the transmitted data operation has been executed.
Like reference numerals in the figures represent and refer to the same element or function throughout.
The present invention relates to a system and method for mirroring data and preparing for data failover. The system also logs data input/output requests to prepare for failover and improve the integrity of the mirroring process. When one storage unit has a failure and becomes unusable, by switching the IP address or the DNS entry, the mirror storage unit can take the place of the primary storage unit (or a replacement storage unit or back-up storage unit can take the place of the mirror storage unit).
Each of the storage units preferably includes a buffer storage space. For example, the illustrated primary storage unit (or first device) 105 includes a non-volatile random access memory (NVRAM) or other buffer storage 107. Likewise, the illustrated mirror storage unit (or second device) 110 includes an NVRAM 112, which may be omitted, although without it the mirror storage unit will not be able to fully replace the primary storage unit. The NVRAM 107 and the NVRAM 112 in the discussed exemplary embodiments preferably have the same capabilities unless noted otherwise. In at least one embodiment, the NVRAM is included on a memory card such as an eight gigabyte PC3200 DDR REG ECC (8×1 gigabyte) random access memory card. In at least one embodiment, the system 100 includes an emergency reboot capability. In such an embodiment, the NVRAM resides on a card with its own processor so that if the primary storage unit 105 crashes and is unable to recover, the NVRAM is able to transmit the last few instructions relating to, for example, writing, deleting, copying, or moving data within the storage unit to the mirror storage unit 110. In at least one embodiment in which the system 100 includes an emergency reboot capability, the card includes a power source to supply power to the card to complete the transmission of the last few instructions. Either of the last two embodiments can be thought of as an emergency reboot capability.
For purposes of explanation, primary means for intercepting 120 and mirror means for intercepting 122 are also illustrated in
Referring now to
In step 205, the data operation is executed. In at least one exemplary embodiment, only data operations that change stored data are sent to the mirror storage unit 110. For example, a data write operation may be executed to write a new block of data to the primary storage unit 105 and this type of operation will also occur on the mirror storage unit 110. As illustrated in
In decision step 207, if it is determined that an event has occurred, then step 229 is executed.
In step 209, the data operation that was executed in step 205 is executed on the mirror storage unit, for example, mirror storage unit 110. After a determination is made as to whether an event has occurred in step 211, in step 213, data relating to the data operation is erased from the non-volatile buffers in both the primary and mirror storage units, for example, by having the mirror storage unit 110 notify the primary storage unit 105 of completion of the data operation. Steps 205 and 209 may be performed in reverse order to that illustrated in
In step 215, a subsequent data operation is logged in the non-volatile buffer to prepare for a fail over. In decision step 216, it is determined whether an event has occurred.
In step 217, in at least one embodiment, a subsequent data operation is executed before mirroring of the data operation executed in step 209 has completed. Executing the subsequent data operation before the previous data operation has been completed on the mirror storage unit 110 can reduce latency during the mirroring process, as data operations on the primary storage unit 105 can continue without being delayed due to waiting on the data operation on the mirror storage unit 110 to complete. Since the data operation is stored in a buffer 107, the data operation will be available for transmission to the mirror storage unit 110. In at least one embodiment, the subsequent data operation is not executed on the primary storage unit 105 until after the mirroring of the current data operation has occurred. In such a situation, after the current data operation has been completed on the primary storage unit 105, completion is not signaled to the process requesting the I/O on the primary storage unit 105 until after the current data operation has been completed on the mirror storage unit 110.
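The two completion policies described above can be contrasted in a short sketch. The `Primary` class, its `synchronous` flag, and the `drain` method are hypothetical names introduced for illustration; the mirror is modeled as a plain dictionary, and the in-memory queue merely stands in for the buffer 107.

```python
from collections import deque


class Primary:
    """Hypothetical primary unit with a FIFO queue standing in for buffer 107."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.blocks = {}
        self.buffer = deque()   # operations logged but not yet confirmed by the mirror

    def write(self, block, data, mirror):
        self.buffer.append((block, data))   # log the operation first
        self.blocks[block] = data           # execute locally
        if self.synchronous:
            self.drain(mirror)              # wait for the mirror before signaling completion
        return "complete"

    def drain(self, mirror):
        """Send every buffered operation to the mirror in logged order."""
        while self.buffer:
            block, data = self.buffer[0]
            mirror[block] = data            # mirror executes the operation
            self.buffer.popleft()           # confirmed, so drop it from the buffer


# Asynchronous policy: the second write runs before the first has been mirrored.
async_mirror = {}
async_primary = Primary(synchronous=False)
async_primary.write(1, b"a", async_mirror)
async_primary.write(2, b"b", async_mirror)
pending_before_drain = len(async_primary.buffer)
async_primary.drain(async_mirror)

# Synchronous policy: each write is mirrored before completion is signaled.
sync_mirror = {}
sync_primary = Primary(synchronous=True)
sync_primary.write(1, b"a", sync_mirror)
```

The asynchronous policy keeps the requesting process from waiting on the mirror, at the cost of a window in which the only copy of the pending operations outside the primary's disks is the buffer, which is why the buffer is non-volatile.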
In step 221, the subsequent data operation is mirrored. In step 225, data relating to the data operation is removed, for example, erased, from non-volatile buffers in both the primary storage unit 105 and the mirror storage unit 110 upon performance of the data operation by the mirror storage unit 110. In step 226, a determination is made regarding whether an event has occurred. If it is determined in step 227 that there are more data operations, steps 202-226 are repeated. Alternatively, if it is determined that there are no more data operations to be processed, in step 229, in at least one embodiment, the data is synchronized upon occurrence of an event such as one of the events described above. Alternatively, the system waits for the next data operation. Another embodiment eliminates one or more of the event decision steps from the method.
Referring now to
In step 310, the I/O request received in step 305 is intercepted and transmitted to (or logged in) the NVRAM-1 107, in preparation for a fail-over situation. In particular, if the primary storage unit 105 should experience a disk crash before the I/O request can be processed, when the repaired primary storage unit 105 or its replacement storage unit (such as the mirror storage unit 110) enters an on-line state, the I/O request can be transmitted from the NVRAM-1 107 and executed, thereby minimizing restoration time.
In at least one exemplary embodiment, at least one data block pointer to the data block associated with an instruction, for example, is written to the NVRAM-1 107. For example, continuing with the write operation offered above, in step 310, a pointer to the actual data block that is to be written to the primary storage unit 105 is sent to the NVRAM-1 107. If a mishap such as a crash of the mirror storage unit 110 were to occur before the data is actually written to the mirror storage unit 110, the copy of the data in the NVRAM-1 107 can be accessed and written to the mirror storage unit replacement. In at least one embodiment, the actual data to be written is stored in the NVRAM-1 107.
In addition to handling a failover situation in which the mirror storage unit 110 crashes, the present invention also provides an embodiment that handles a failover situation in which the primary storage unit 105 crashes. In particular, in at least one embodiment, data associated with an instruction is stored in the NVRAM-1 107. For example, continuing with the example offered above, in step 310, the actual data block that is to be written to the primary storage unit 105 is written to the NVRAM-1 107. In such a situation, if the primary storage unit 105 were to experience a disk crash, thereby rendering its data inaccessible, the data can be copied from the NVRAM-1 107 to the primary storage unit replacement and ultimately to the mirror storage unit 110, which likely would be the primary storage unit replacement. In particular, in at least one embodiment, a central processing unit (CPU) on the primary storage unit 105 reboots with an emergency operating system kernel which is responsible for accessing the NVRAM-1 107 and performing data synchronization with the mirror storage unit 110. The NVRAM logged data and the block pointers, for example, stored therein can be used to replay the mirror block updates and then the input/output requests that were “in flight” when the primary storage unit failed. The mirror storage unit 110 or another storage unit can then transparently take over input/output requests. In at least one embodiment, the processing card on which the NVRAM-1 107 is stored includes its own central processing unit (CPU) which can perform a synchronization regardless of whether the primary storage unit 105 is operable.
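The replay of logged operations after a crash of the primary storage unit can be sketched as follows. The `replay_log` function and the layout of a log entry as a dictionary with `block` and `data` keys are assumptions made for illustration; the disclosure does not specify a log format.

```python
def replay_log(nvram_log, target_blocks):
    """Apply logged writes, oldest first, to the unit that is taking over.

    nvram_log is an assumed list of {"block": int, "data": bytes} entries
    and target_blocks is the block store of the replacement unit.
    """
    replayed = 0
    for entry in nvram_log:
        # Later entries for the same block overwrite earlier ones,
        # so replaying in logged order reproduces the final state.
        target_blocks[entry["block"]] = entry["data"]
        replayed += 1
    nvram_log.clear()   # nothing remains "in flight" after the replay
    return replayed


# The mirror holds a stale copy of block 1 and has never seen block 2.
mirror_blocks = {1: b"old"}
nvram_log = [
    {"block": 1, "data": b"new"},
    {"block": 2, "data": b"x"},
    {"block": 1, "data": b"newest"},
]
replayed = replay_log(nvram_log, mirror_blocks)
```

Because each entry carries the full block contents, replaying the same log twice leaves the target in the same state, which is convenient if the replay itself is interrupted.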
In step 315, the I/O request is executed on the primary storage unit 105. For example, the data is written to a block address within the primary storage unit 105.
It should be noted that the order of steps in
In step 320, the instruction received in the NVRAM-1 107 (shown in
In step 325, the I/O request is transmitted from the intercepting means 122 to the NVRAM-2 112 in preparation for failover. In particular, if the primary storage unit 105 should experience a disk crash, for example, the mirror storage unit 110 can serve as the primary storage unit. In at least one embodiment, a synchronization is performed before the primary storage unit 105 experiences a disk crash to bring the mirror storage unit 110 up-to-date compared to the primary storage unit 105. When the primary storage unit 105 experiences a disk crash, the function of the mirror storage unit 110 will need to be taken over by a new mirror storage unit, which is preferably added to the system for that purpose. Logging to the NVRAMs preferably continues after the replacement with the mirror storage unit 110 serving as the primary storage unit. When the original mirror storage unit 110 receives an I/O request, the I/O request will be transmitted to an NVRAM on the original mirror storage unit 110 and then ultimately transmitted to an NVRAM on the new mirror storage unit. In at least one embodiment, the primary storage unit 105 is rebuilt from the mirror storage unit 110. After the primary storage unit 105 is rebuilt, input/output operations on the primary storage unit 105 are performed.
It should be noted that the primary storage unit 105 may crash before a synchronization is possible. In such an instance, the primary storage unit 105 preferably reboots with an emergency kernel whose job includes accessing the NVRAM-1 107 and performing a synchronization and/or transmission of any pending data operations. In at least one embodiment, as mentioned in the text accompanying
Failover preparation also covers the case in which the mirror storage unit 110 experiences a disk crash or the network to the mirror storage unit 110 fails. In that case, mirror block pointers preferably remain in the NVRAM-1 107, for example, as the asynchronous mirror input/output has not been completed. When the mirror storage unit 110 is again available, data blocks from the primary storage unit 105 identified by the NVRAM pointer(s) are preferably asynchronously copied over to the mirror storage unit 110.
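The pointer-based catch-up described above can be sketched as follows. The `resync_mirror` function, the use of a set of block numbers as the retained pointers, and the dictionaries standing in for the two storage units are all assumptions made for illustration.

```python
def resync_mirror(pending_pointers, primary_blocks, mirror_blocks):
    """Copy every block named by a retained pointer from primary to mirror.

    pending_pointers is an assumed set of block numbers kept in NVRAM
    while the mirror was unreachable.
    """
    copied = []
    for block in sorted(pending_pointers):
        if block in primary_blocks:
            # Only the current contents matter: intermediate versions written
            # while the mirror was offline never need to be transmitted.
            mirror_blocks[block] = primary_blocks[block]
            copied.append(block)
    pending_pointers.clear()   # the mirror is up to date again
    return copied


primary_blocks = {1: b"a", 2: b"b2", 3: b"c"}
mirror_blocks = {1: b"a", 2: b"b1"}   # offline before blocks 2 and 3 were updated
pending_pointers = {2, 3}
copied = resync_mirror(pending_pointers, primary_blocks, mirror_blocks)
```

Keeping pointers rather than data keeps the buffer small during a long mirror outage, since a block that is rewritten many times still occupies a single entry.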
In step 330, the I/O request is executed on the mirror storage unit 110.
In step 335, the NVRAM-1 107 is preferably cleared. For example, in step 335, after all data operations are allowed to complete, the data logged in the NVRAM-1 107 is preferably flushed or cleared. An exemplary method of accomplishing this is for the mirror storage unit 110 to send a signal to the NVRAM-1 107 confirming that the I/O request has been performed. It should be noted, however, that the NVRAM-1 107 may also be cleared at other times. In particular, in at least one embodiment, synchronization automatically occurs when the NVRAM-1 107 is full. In at least one exemplary embodiment, synchronization automatically occurs with a secondary mirror storage unit of the mirror storage unit when the NVRAM-2 112 is full. In an embodiment where there is not a secondary mirror storage unit to the mirror storage unit 110, the completed data operation is cleared from the NVRAM-2 112.
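The two ways of clearing the buffer described above, on confirmation from the mirror and by automatic synchronization when the buffer is full, can be sketched as follows. The `BoundedLog` class, its `capacity` parameter, and the `flush` callback are hypothetical names introduced for illustration.

```python
class BoundedLog:
    """Hypothetical fixed-capacity log standing in for an NVRAM buffer."""

    def __init__(self, capacity, flush):
        self.capacity = capacity
        self.flush = flush      # assumed callable that mirrors a batch of entries
        self.entries = []

    def log(self, op):
        if len(self.entries) >= self.capacity:
            self.synchronize()  # buffer is full: synchronize before accepting more
        self.entries.append(op)

    def acknowledge(self, op):
        """Clear one entry after the mirror confirms it was performed."""
        self.entries.remove(op)

    def synchronize(self):
        """Push everything still logged to the mirror and clear the buffer."""
        self.flush(list(self.entries))
        self.entries.clear()


flushed_batches = []
log = BoundedLog(capacity=2, flush=flushed_batches.append)
log.log("write-a")
log.log("write-b")
log.log("write-c")          # triggers a synchronization of the first two entries
log.acknowledge("write-c")  # normal path: cleared on confirmation
```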
It should be noted that the present invention can be utilized in conjunction with other utilities. For instance, the high availability clustering, mirroring, and fail-over capabilities of Linux distributions such as SUSE Linux, Knoppix Linux, Red Hat Linux, or Debian Linux can be utilized by the present invention in conjunction with the NVRAM data logging feature and the emergency reboot capability mentioned above. Such mirroring and fail-over facilities can work with networking input/output protocols used by storage devices, for example, Unix/Linux clients, SMB for Microsoft® Windows clients, and Internet Small Computer Systems Interface (iSCSI).
Domain Name Service (DNS), the standard Internet Protocol (IP) dynamic name service, can enable UNIX and Windows clients to locate remote NAS file resources. Using DNS round robin IP assignment, I/O workload balancing can be achieved between the primary and mirror NAS machines. In such a case, each NAS machine serves as a primary and also as a mirror for the other NAS machine, i.e., when one machine receives a data operation manipulating data, it transmits the data operation to the second machine. It should be noted that a code change to the root DNS server can be performed so that it only assigns an IP address if a particular machine is operable.
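Round robin assignment that skips an inoperable machine can be sketched as follows. This is not DNS server code; the `make_resolver` function, the `is_operable` callback, and the example IP addresses are hypothetical and only model the selection logic.

```python
from itertools import cycle


def make_resolver(addresses, is_operable):
    """Return a function that hands out addresses in round robin order,
    skipping any machine that the assumed is_operable check reports as down."""
    rotation = cycle(addresses)

    def resolve():
        # Try each address at most once per request.
        for _ in range(len(addresses)):
            candidate = next(rotation)
            if is_operable(candidate):
                return candidate
        return None   # no machine is currently operable

    return resolve


down = set()
resolve = make_resolver(["10.0.0.1", "10.0.0.2"], lambda addr: addr not in down)
balanced = [resolve() for _ in range(4)]     # both machines share the requests

down.add("10.0.0.1")                         # the first machine fails
after_failure = [resolve() for _ in range(3)]

down.add("10.0.0.2")                         # both machines are down
none_available = resolve()
```

While both machines are operable the requests alternate between them, and after a failure every request is directed to the surviving machine, which corresponds to the mirror taking over as described above.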
In the example shown in
The primary machine NAS-A-1 414 in
Good throughput is experienced by the system, as both NAS-A and NAS-B machines are used as DNS load balanced primaries in the illustrated embodiment. Thus, approximately half the workload was being accomplished by each machine. This arrangement is ideal because read activity is usually higher than update activity requiring mirroring. In situations of high update activity, it is probably best to configure the NAS-B machines as dedicated to mirroring and fail-over.
When it was required to recover a file from a NAS-C backup, the required NAS-C file system was mounted, and “DD copy” was used to copy the required file. In cases where client machines (that is, machines other than the NASs) required connectivity to NAS backup machines, corresponding NAS-A and NAS-B machines provided needed IP forwarding, as NAS-C machines did not have a direct connection to the big gigabyte switch 412 shown in
Backups for the systems illustrated in
The testing of the system 400 illustrated in
While the mirror storage device 610 is offline, the primary storage device 605 preferably continues to handle production operations and changed block numbers are preferably logged in non-volatile buffers, for example, NVRAMs so that the mirror storage device 610 can be updated, that is, synchronized when it is brought back on-line after the backup has been completed.
The illustrated functional relationship during the backup is that the mirror storage device 610 operates as a primary storage device 605, and the third storage device 612 operates as a mirror storage device through connection 608 as illustrated in
As will be appreciated by one of ordinary skill in the art, the present invention may be embodied as a computer implemented method, a programmed computer, a data processing system, a signal, and/or computer program. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program on a computer-usable storage medium having computer-usable program code embodied in the medium. Any suitable computer readable medium may be utilized including hard disks, CD-ROMs, optical storage devices, carrier signals/waves, or other storage devices.
Computer program code for carrying out operations of the present invention may be written in a variety of computer programming languages. The program code may be executed entirely on at least one computing device, as a stand-alone software package, or it may be executed partly on one computing device and partly on a remote computer. In the latter scenario, the remote computer may be connected directly to the one computing device via a LAN or a WAN (for example, Intranet), or the connection may be made indirectly through an external computer (for example, through the Internet, a secure network, a sneaker net, or some combination of these).
It will be understood that each block of the flowchart illustrations and block diagrams and combinations of those blocks can be implemented by computer program instructions and/or means. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowcharts or block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means or program code that implements the function specified in the flowchart block or blocks.
The computer program instructions may also be loaded, e.g., transmitted via a carrier wave, to a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.
Various templates and the database(s) according to the present invention may be stored locally on a provider's stand-alone computer terminal (or computing device), such as a desktop computer, laptop computer, palmtop computer, or personal digital assistant (PDA) or the like. Accordingly, the present invention may be carried out via a single computer system, such as a desktop computer or laptop computer.
As is known to those of ordinary skill in the art, network environments may include public networks, such as the Internet, and private networks often referred to as “Intranets” and “Extranets.” The term “Internet” shall incorporate the terms “Intranet” and “Extranet,” and any references to accessing the Internet shall be understood to mean accessing an Intranet and/or an Extranet as well, unless otherwise noted. The term “computer network” shall incorporate publicly accessible computer networks and private computer networks.
The exemplary and alternative embodiments described above may be combined in a variety of ways with each other. Furthermore, the steps and number of the various steps illustrated in the figures may be adjusted from that shown.
It should be noted that the present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, the embodiments set forth herein are provided so that the disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. The accompanying drawings illustrate exemplary embodiments of the invention.
Although the present invention has been described in terms of particular exemplary and alternative embodiments, it is not limited to those embodiments. Alternative embodiments, examples, and modifications which would still be encompassed by the invention may be made by those skilled in the art, particularly in light of the foregoing teachings.
Those skilled in the art will appreciate that various adaptations and modifications of the exemplary and alternative embodiments described above can be configured without departing from the scope and spirit of the invention. Therefore, it is to be understood that, within the scope of the appended claims, the invention may be practiced other than as specifically described herein.
This patent application claims the benefit of U.S. Provisional Patent Application No. 60/627,971, filed Nov. 16, 2004, which is hereby incorporated by reference.
Number | Date | Country
---|---|---
60627971 | Nov 2004 | US