Claims
- 1. A method for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the method comprising:
managing the checkpoint, the checkpoint containing checkpoint information; creating the primary replica in a memory of the first node, the primary replica containing first checkpoint information; updating the primary replica so that the first checkpoint information corresponds to the checkpoint information; creating the secondary replica in a memory of the second node, the secondary replica containing second checkpoint information; and updating the secondary replica so that the second checkpoint information corresponds to the checkpoint information.
- 2. The method of claim 1, wherein the updating the secondary replica step uses a checkpoint message.
- 3. The method of claim 2, further comprising:
formatting the checkpoint message based on version information.
- 4. The method of claim 1, wherein the two updating steps are asynchronous.
- 5. The method of claim 1, wherein both the primary replica and the secondary replica have states.
- 6. The method of claim 5, further comprising:
maintaining the state of the primary replica; and maintaining the state of the secondary replica.
- 7. The method of claim 6, further comprising:
executing an error recovery procedure if either the state of the primary replica or the state of the secondary replica is invalid.
- 8. The method of claim 6, wherein the state of the primary replica and the state of the secondary replica each includes EMPTY, CHECKPOINTING, MISSED, COMPLETED, and CORRUPTED.
- 9. The method of claim 8, further comprising:
executing an error recovery procedure if either the state of the primary replica or the state of the secondary replica is MISSED or CORRUPTED.
- 10. The method of claim 1, further comprising:
synchronizing the first checkpoint information in the primary replica and the second checkpoint information in the secondary replica.
- 11. The method of claim 1, further comprising:
retaining the primary replica in the memory of the first node until a retention time of the primary replica expires; and retaining the secondary replica in the memory of the second node until a retention time of the secondary replica expires.
- 12. The method of claim 1, further comprising:
conducting a garbage collection based on a retention time of the primary replica and a retention time of the secondary replica.
- 13. The method of claim 1, wherein the checkpoint has a plurality of checkpoint attributes.
- 14. The method of claim 1, wherein there is a control block associated with the primary replica and there is a control block associated with the secondary replica.
- 15. The method of claim 14, further comprising:
maintaining first control block information in the control block of the primary replica; and maintaining second control block information in the control block of the secondary replica.
- 16. The method of claim 15, further comprising:
formatting a checkpoint message using first control block information, second control block information, or both, wherein the checkpoint message is used in the updating the secondary replica step.
- 17. The method of claim 1, further comprising:
executing a failure recovery procedure.
- 18. The method of claim 17, wherein the executing step further comprises:
when a primary component on the first node fails, restarting the primary component using the primary replica.
- 19. The method of claim 17, wherein the executing step further comprises:
when a primary component on the first node fails, starting a secondary component on the second node as a new primary component using the secondary replica.
- 20. A method for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, the plurality of replicas including a primary replica and a secondary replica, the method comprising:
creating the checkpoint; opening the checkpoint from the first node in a write mode; creating the primary replica in a memory of the first node; updating the checkpoint; updating the primary replica; propagating a checkpoint message, the checkpoint message including information regarding the checkpoint; opening the checkpoint from the second node in a read mode; creating the secondary replica in a memory of the second node; and updating the secondary replica based on the checkpoint message.
- 21. The method of claim 20, wherein the propagating and the updating steps are asynchronous.
- 22. The method of claim 20, further comprising:
executing a failure recovery procedure.
- 23. The method of claim 22, wherein the executing step further comprises:
making a secondary component in the second node a new primary component using the secondary replica.
- 24. The method of claim 22, wherein the executing step further comprises:
restarting a primary component in the first node using the primary replica.
- 25. The method of claim 20, further comprising:
formatting the checkpoint message using version information.
- 26. The method of claim 20, further comprising:
deleting the primary replica based on a first retention time of the primary replica; and deleting the secondary replica based on a second retention time of the secondary replica.
- 27. The method of claim 20, further comprising:
conducting a garbage collection using a first retention time of the primary replica and a second retention time of the secondary replica.
- 28. The method of claim 20, wherein the memory of the first node has a first control block for the primary replica and the memory of the second node has a second control block for the secondary replica.
- 29. The method of claim 28, further comprising:
maintaining the first control block; and maintaining the second control block.
- 30. The method of claim 20, wherein the primary replica has a state and the secondary replica has a state.
- 31. The method of claim 30, further comprising:
executing an error recovery procedure if the state of the primary replica or the state of the secondary replica is invalid.
- 32. The method of claim 30, wherein the state of the primary replica and the state of the secondary replica each includes EMPTY, CHECKPOINTING, MISSED, COMPLETED, and CORRUPTED.
- 33. The method of claim 32, further comprising:
executing an error recovery procedure if the state of the primary replica or the state of the secondary replica is MISSED or CORRUPTED.
- 34. The method of claim 20, wherein the checkpoint has checkpoint attributes.
- 35. A computer program product configured to provide cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the computer program product comprising:
computer readable program code configured to manage the checkpoint, the checkpoint containing checkpoint information; computer readable program code configured to create the primary replica in a memory of the first node, the primary replica containing first checkpoint information; computer readable program code configured to update the primary replica so that the first checkpoint information corresponds to the checkpoint information; computer readable program code configured to create the secondary replica in a memory of the second node, the secondary replica containing second checkpoint information; computer readable program code configured to update the secondary replica so that the second checkpoint information corresponds to the checkpoint information; and a computer readable medium having the computer readable program codes embodied therein.
- 36. A computer program product configured to provide cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the computer program product comprising:
computer readable program code configured to create the checkpoint; computer readable program code configured to open the checkpoint from the first node in a write mode; computer readable program code configured to create the primary replica in a memory of the first node; computer readable program code configured to update the checkpoint; computer readable program code configured to update the primary replica; computer readable program code configured to propagate a checkpoint message, the checkpoint message including information regarding the checkpoint; computer readable program code configured to open the checkpoint from the second node in a read mode; computer readable program code configured to create the secondary replica in a memory of the second node; computer readable program code configured to update the secondary replica based on the checkpoint message; and a computer readable medium having the computer readable program codes embodied therein.
- 37. A system for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the system comprising:
means for managing the checkpoint, the checkpoint containing checkpoint information; means for creating the primary replica in a memory of the first node, the primary replica containing first checkpoint information; means for updating the primary replica so that the first checkpoint information corresponds to the checkpoint information; means for creating the secondary replica in a memory of the second node, the secondary replica containing second checkpoint information; and means for updating the secondary replica so that the second checkpoint information corresponds to the checkpoint information.
- 38. The system of claim 37, wherein the means for updating the secondary replica uses a checkpoint message.
- 39. The system of claim 38, further comprising:
means for formatting the checkpoint message based on version information.
- 40. The system of claim 37, wherein both the primary replica and the secondary replica have states.
- 41. The system of claim 40, further comprising:
means for maintaining the state of the primary replica; and means for maintaining the state of the secondary replica.
- 42. The system of claim 41, further comprising:
means for executing an error recovery procedure if either the state of the primary replica or the state of the secondary replica is invalid.
- 43. The system of claim 37, further comprising:
means for executing a failure recovery procedure.
- 44. The system of claim 37, further comprising:
means for maintaining a control block of the primary replica; and means for maintaining a control block of the secondary replica.
- 45. The system of claim 37, further comprising:
means for conducting a garbage collection based on a retention time of the primary replica and a retention time of the secondary replica.
- 46. A system for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, the plurality of replicas including a primary replica and a secondary replica, the system comprising:
means for creating the checkpoint; means for opening the checkpoint from the first node in a write mode; means for creating the primary replica in a memory of the first node; means for updating the checkpoint; means for updating the primary replica; means for propagating a checkpoint message, the checkpoint message including information regarding the checkpoint; means for opening the checkpoint from the second node in a read mode; means for creating the secondary replica in a memory of the second node; and means for updating the secondary replica based on the checkpoint message.
- 47. The system of claim 46, wherein the propagating means and the updating means operate asynchronously.
- 48. A system for managing a checkpoint, the system comprising:
a first node running a primary component, including a primary replica having first checkpoint information in its memory, having a first checkpoint service, and connected to a network; and a second node running a secondary component, including a secondary replica in its memory, having a second checkpoint service, and connected to the network, wherein the first checkpoint service and the second checkpoint service are capable of accessing the checkpoint, wherein the first checkpoint service works with the primary component to update the checkpoint, issue a checkpoint message containing information regarding the checkpoint, asynchronously propagate the checkpoint message, and update the primary replica, and wherein the second checkpoint service is capable of asynchronously updating the secondary replica based on the checkpoint message.
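For illustration only, the flow recited in the claims above (updating a primary replica, asynchronously propagating a checkpoint message, updating a secondary replica, tracking replica states, recovering from a MISSED state, and retention-based garbage collection) can be sketched in a single process. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Replica`, `CheckpointMessage`, `PrimaryService`, `drain`, and `resync` are hypothetical, the `deque` merely stands in for the network, and the use of sequence numbers to detect a missed message is an assumption that the claims do not specify. Only the five state names are taken from the claims.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    """Replica states named in claims 8 and 32."""
    EMPTY = "EMPTY"
    CHECKPOINTING = "CHECKPOINTING"
    MISSED = "MISSED"
    COMPLETED = "COMPLETED"
    CORRUPTED = "CORRUPTED"


@dataclass
class CheckpointMessage:
    version: int  # message format version (hypothetical field)
    seq: int      # sequence number (assumption: used to detect gaps)
    data: dict    # checkpoint information carried by the message


@dataclass
class Replica:
    retention_s: float
    data: dict = field(default_factory=dict)
    state: State = State.EMPTY
    last_seq: int = 0
    last_used: float = field(default_factory=time.monotonic)

    def apply(self, msg: CheckpointMessage) -> None:
        """Apply one checkpoint message, or mark the replica MISSED on a gap."""
        if msg.seq != self.last_seq + 1:
            self.state = State.MISSED
            return
        self.state = State.CHECKPOINTING
        self.data.update(msg.data)
        self.last_seq = msg.seq
        self.last_used = time.monotonic()
        self.state = State.COMPLETED

    def needs_recovery(self) -> bool:
        return self.state in (State.MISSED, State.CORRUPTED)

    def expired(self, now: float) -> bool:
        """True once the retention time has elapsed since the last update."""
        return now - self.last_used > self.retention_s


class PrimaryService:
    """Updates the primary replica and queues messages for later delivery."""

    def __init__(self, replica: Replica, outbox: deque):
        self.replica = replica
        self.outbox = outbox  # stand-in for the network
        self.seq = 0

    def update(self, data: dict) -> None:
        self.seq += 1
        msg = CheckpointMessage(version=1, seq=self.seq, data=dict(data))
        self.replica.apply(msg)   # primary is updated immediately
        self.outbox.append(msg)   # secondary is updated later (asynchronously)


def drain(outbox: deque, secondary: Replica) -> None:
    """Deliver all queued checkpoint messages to the secondary replica."""
    while outbox:
        secondary.apply(outbox.popleft())


def resync(source: Replica, target: Replica) -> None:
    """Error recovery sketch: copy the source replica's contents to the target."""
    target.data = dict(source.data)
    target.last_seq = source.last_seq
    target.last_used = time.monotonic()
    target.state = State.COMPLETED
```

In this sketch, calling `PrimaryService.update` changes the primary replica at once, while the secondary replica changes only when `drain` runs, which mirrors the asynchronous updating of claims 4 and 21. A garbage collector in the sense of claims 12 and 27 could periodically discard any replica for which `expired(time.monotonic())` returns true.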
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application Nos. 60/201,092 and 60/201,099, which were filed on May 2, 2000, and which are hereby incorporated by reference.
Provisional Applications (2)
| Number | Date | Country |
|---|---|---|
| 60201092 | May 2000 | US |
| 60201099 | May 2000 | US |