Claims
- 1. A system for redundantly backing up data, comprising:
a first storage facility controlled by a first state machine having a finite number of states, each state of the first state machine having a set of allowed operations, and the first state machine including at least one state for controlling the first storage facility to operate as a primary storage facility, the primary storage facility being for storing and updating a primary copy of the data; and a second storage facility coupled to the first storage facility, wherein the second storage facility is controlled by a second state machine having a finite number of states, each state of the second state machine having a set of allowed operations, and the second state machine having at least one state for controlling the second storage facility to operate as a secondary storage facility, the secondary storage facility being for storing data that is redundant of the primary copy, and at least one state for controlling the second storage facility to operate as the primary storage facility.
- 2. The system according to claim 1, wherein prior to a failover event, the first storage facility operates as the primary storage facility and the second storage facility operates as the secondary storage facility and after the failover event, the second storage facility operates as the primary storage facility.
- 3. The system according to claim 2, wherein the failover event comprises a fault.
- 4. The system according to claim 3, wherein the fault occurs at a location selected from the group consisting of the first storage facility, the second storage facility and a communication medium between the first storage facility and the second storage facility.
- 5. The system according to claim 3, wherein detection of the fault is automatic.
- 6. The system according to claim 3, wherein the fault is detected manually.
- 7. The system according to claim 2, wherein the failover event comprises an operational event.
- 8. The system according to claim 7, wherein the failover event is selected from the group consisting of a manually initiated event and an automatically initiated event.
- 9. The system according to claim 2, wherein the failover event comprises a communication traffic condition.
- 10. The system according to claim 9, wherein the communication traffic condition comprises a greater portion of request traffic originating closer to the second storage facility.
- 11. The system according to claim 2, wherein the first state machine includes at least one state for controlling the first storage facility to operate as the secondary storage facility and wherein the first storage facility operates as the secondary storage facility after a fallback event.
- 12. The system according to claim 11, wherein the fallback event comprises recovery from a fault.
- 13. The system according to claim 11, wherein the fallback event comprises recovery from a fault at a location selected from the group consisting of the first storage facility, the second storage facility and a communication medium between the first and second storage facilities.
- 14. The system according to claim 11, wherein the first and second storage facilities reassume their original roles after the first storage facility operates as the secondary storage facility.
- 15. The system according to claim 11, wherein when the second storage facility operates as the primary storage facility, the second storage facility includes a primary log for storing records of updates to the primary copy of the data.
- 16. The system according to claim 11, wherein the first state machine includes at least one update state for receiving a snapshot of the primary copy of the data prior to the first storage facility operating as the secondary storage facility.
- 17. The system according to claim 2, wherein the second state machine includes a failover state for preparing the second storage facility for operating as the primary storage facility wherein the failover state is entered after a fault occurs at the first storage facility.
- 18. The system according to claim 17, wherein the secondary storage facility includes a secondary log for storing records of updates to the primary copy of the data before the updates are committed at the secondary storage facility and wherein any entries in the secondary log are committed in the failover state.
- 19. The system according to claim 1, wherein the first state machine includes at least one state for controlling the first storage facility to operate as the secondary storage facility and wherein the first and second storage facilities exchange roles based on origins of storage request traffic.
- 20. The system according to claim 19, wherein the first and second storage facilities exchange roles so that a greatest portion of the storage request traffic has its origin physically closer to the primary storage facility.
- 21. The system according to claim 1, wherein the first state machine includes at least one state in which records of updates to the primary copy of the data are stored in a primary log.
- 22. The system according to claim 21, wherein the first state machine further includes at least one logless state for updating the primary copy without storing the updates in the primary log.
- 23. The system according to claim 22, wherein the logless state is entered upon the primary log experiencing a fault.
- 24. The system according to claim 1, wherein the first state machine includes at least one state in which records of updates to the primary copy of the data are stored in a primary log in which write-ordering is preserved for use by the secondary storage facility and at least one state for recording updates to the primary copy in which write-ordering is not preserved.
- 25. The system according to claim 24, wherein the state for recording updates to the primary copy in which write-ordering is not preserved is entered upon the primary log reaching a predetermined capacity.
- 26. The system according to claim 1, wherein the first state machine includes at least one standalone state in which records of updates to the primary copy of the data are stored but not propagated to the secondary facility.
- 27. The system according to claim 26, wherein the standalone state is entered in response to the secondary facility becoming inaccessible.
- 28. The system according to claim 27, wherein the secondary facility becoming inaccessible is detected automatically.
- 29. The system according to claim 1, wherein the first and second state machines include substantially the same set of states.
- 30. The system according to claim 1, wherein the first and second state machines are each an instance of the same state machine.
- 31. A state machine for controlling a first data storage facility, the state machine having a finite number of states, each state of the state machine having a set of allowed operations, the state machine including at least one state for controlling the first data storage facility for generating a primary log of write requests in which write-ordering of the requests is preserved and including at least one state for generating a change record of data changed in response to said write requests in which write-ordering of the requests is not preserved and wherein operation in the state for generating the primary log ceases and operation in the state for generating the change record commences when the primary log exceeds a predetermined capacity.
- 32. The state machine according to claim 31, wherein propagation of the change record to a second data storage facility removes write requests from the primary log after an acknowledgement and wherein operation in the state for generating the primary log commences after the change record is propagated.
- 33. The state machine according to claim 32, wherein the change record specifies a batch of updates to be committed at the second data storage facility as a whole.
- 34. The state machine according to claim 31, wherein the write requests of the primary log are forwarded to a second data storage facility in absence of a fault that affects the second data storage facility or accessibility of the second data storage facility.
- 35. The state machine according to claim 34, wherein the write requests of the primary log are not forwarded to the second data storage facility after a fault is detected that affects the second data storage facility or accessibility of the second data storage facility.
- 36. A method for verifying a data redundancy system comprising:
applying a sequence of one or more events to the system; performing one or more verifications on the system; and determining whether a result of the verification is positive, and if the result is not positive, evaluating the system to determine a cause of the result.
- 37. The method according to claim 36, wherein the events are selected from the group consisting of: write operations, failover events and fallback events.
- 38. The method according to claim 36, wherein the verification comprises a state validity verification.
- 39. The method according to claim 36, wherein the verification comprises a data consistency verification.
- 40. The method according to claim 36, wherein the verification comprises a system liveness verification.
- 41. The method according to claim 36, wherein the verification comprises a state validity verification, a data consistency verification and a system liveness verification.
- 42. The method according to claim 36, further comprising repeating said steps of applying, performing and determining until the result is positive.
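The arrangement recited in claims 1, 2, 17 and 18, together with the verification steps of claims 36 to 39, can be illustrated by the following minimal sketch. It is not the claimed implementation. The names `State`, `ALLOWED`, `Facility` and `verify`, the `DOWN` state used to model a faulted facility, and the in-memory dictionaries standing in for stored data are all assumptions introduced for illustration. Fallback events and the liveness verification of claim 40 are not modeled.

```python
from enum import Enum, auto


class State(Enum):
    PRIMARY = auto()
    SECONDARY = auto()
    FAILOVER = auto()
    DOWN = auto()  # hypothetical state modeling a faulted facility


# Each state has its own set of allowed operations (hypothetical names).
ALLOWED = {
    State.PRIMARY: {"write"},
    State.SECONDARY: {"apply", "fail_over"},
    State.FAILOVER: {"promote"},
    State.DOWN: set(),
}


class Facility:
    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.data = {}  # committed copy of the data
        self.log = []   # secondary log: updates received but not yet committed

    def do(self, op, *args):
        # Reject any operation that the current state does not allow.
        if op not in ALLOWED[self.state]:
            raise ValueError(f"{op!r} not allowed in state {self.state.name}")
        return getattr(self, "_" + op)(*args)

    def _write(self, key, value):
        self.data[key] = value
        return key, value

    def _apply(self, key, value):
        self.log.append((key, value))

    def _fail_over(self):
        # Enter the failover state and commit any outstanding log entries.
        self.state = State.FAILOVER
        for key, value in self.log:
            self.data[key] = value
        self.log.clear()

    def _promote(self):
        self.state = State.PRIMARY


def verify(events):
    """Apply a sequence of events, then return the names of failed checks."""
    first = Facility("first", State.PRIMARY)
    second = Facility("second", State.SECONDARY)
    expected = {}  # what the primary copy should contain after all events
    for event in events:
        if event[0] == "write":
            _, key, value = event
            primary = next(f for f in (first, second) if f.state is State.PRIMARY)
            peer = second if primary is first else first
            record = primary.do("write", key, value)
            if peer.state is State.SECONDARY:
                peer.do("apply", *record)
            expected[key] = value
        elif event[0] == "failover":
            first.state = State.DOWN  # fault injected at the first facility
            second.do("fail_over")
            second.do("promote")
    failures = []
    primaries = [f for f in (first, second) if f.state is State.PRIMARY]
    if len(primaries) != 1:
        failures.append("state validity")
    elif primaries[0].data != expected:
        failures.append("data consistency")
    return failures
```

In this sketch, a call such as `verify([("write", "a", 1), ("failover",), ("write", "b", 2)])` applies two writes around a failover event and returns an empty list when both checks pass. A non-empty result names the failed verification, which would then be investigated for its cause as in claim 36.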
RELATED APPLICATIONS
[0001] The following applications disclose related subject matter: U.S. application Ser. No. (Attorney Docket No. 100204276-1), filed (on the same day as this application) and entitled, “Asynchronous Data Redundancy Technique”; U.S. application Ser. No. (Attorney Docket No. 200309042-1), filed (on the same day as this application) and entitled, “Redundant Data Consistency After Failover”; U.S. application Ser. No. (Attorney Docket No. 200309043-1), filed (on the same day as this application) and entitled, “Distributed Data Redundancy Operations”; U.S. application Ser. No. (Attorney Docket No. 200309044-1), filed (on the same day as this application) and entitled, “Fault-Tolerant Data Redundancy Technique”; U.S. application Ser. No. (Attorney Docket No. 200309045-1), filed (on the same day as this application) and entitled, “Adaptive Batch Sizing for Asynchronous Data Redundancy”; U.S. application Ser. No. (Attorney Docket No. 200309047-1), filed (on the same day as this application) and entitled, “Batched, Asynchronous Data Redundancy Technique”; U.S. application Ser. No. (Attorney Docket No. 200309499-1), filed (on the same day as this application) and entitled, “Data Redundancy Using Portal and Host Computer”; the contents of all of which are hereby incorporated by reference.