Claims
- 1. In a computing environment having a cluster of servers and a plurality of storage devices, a method of operation, comprising:
a first of said cluster of servers having a need to write a first version of a unit of coherent data into said plurality of storage devices determining whether a valid second version of the unit of coherent data is replicated on a second of said cluster of servers as a result of a preceding delegated write operation; and the first server conditionally replicating the first version of the unit of coherent data on a selected one of said second and a third of said cluster of servers, based at least in part on the result of said determination, to delegate to the selected one of said second and third servers, the writing of the first version of the unit of coherent data into the plurality of storage devices.
- 2. The method of claim 1, wherein the method further comprises
a lock manager of the first server requesting a partition lock manager for a write lock on the unit of coherent data; and the partition lock manager, in response, identifying for the lock manager of the first server, the second server as having the valid second version of the unit of coherent data by virtue of the fact that the second server is a last synchronization server target of a last delegated write operation.
- 3. The method of claim 2, wherein the method further comprises the partition lock manager learning of the second server being the last synchronization server target, by examining an active synchronization server target property of a control object corresponding to the unit of coherent data.
- 4. The method of claim 1, wherein said second server is a last synchronization server target of a last delegated write operation of the unit of coherent data, and said determining comprises determining whether said second server continues to consider the second version of the unit of coherent data as active.
- 5. The method of claim 4, wherein the method further comprises
the second server maintaining the second version of the unit of coherent data as a valid active object if the second server continues to consider the second version of the unit of coherent data as active; and said determining of whether said second server continues to consider the second version of the unit of coherent data as active comprises determining whether the unit of coherent data is among the valid active objects maintained by the second server.
- 6. The method of claim 4, wherein the method further comprises the first server further determining whether the second server is an eligible synchronization server target based at least in part on a current usage level of the second server, if it is determined that the second server contains the valid second version of the unit of coherent data.
- 7. The method of claim 6, wherein said determining of whether the second server is an eligible synchronization server target based at least in part on a current usage level of the second server comprises the first server examining a usage indicia of the second server, and determining whether the usage indicia is below a predetermined threshold.
- 8. The method of claim 7, wherein said examining of the usage indicia comprises the first server examining a local copy of the usage indicia, and the method further comprises each of the cluster of servers periodically providing each other with its usage indicia, and maintaining local copies of the received indicia.
- 9. The method of claim 8, wherein said usage indicia is a composite usage indicia computed from a plurality of resource utilizations, and the method further comprises each of the cluster of servers periodically computing its own composite usage indicia.
- 10. The method of claim 6, wherein the method further comprises identifying said third server if it is determined that the second server is not an eligible synchronization server target, the third server being identified based at least in part on the relative current usage levels of the cluster of servers, excluding the second server.
- 11. The method of claim 10, wherein the third server is also identified based at least in part on its membership in an eligible synchronization server target group.
- 12. The method of claim 4, wherein the method further comprises identifying said third server if it is determined that the second server no longer considers the second version of the unit of coherent data as active, the third server being identified based at least in part on the relative current usage levels of the cluster of servers, excluding the second server.
- 13. The method of claim 12, wherein the third server is a selected one of the first server and a member of an eligible synchronization server target group.
- 14. The method of claim 1, wherein the method further comprises the selected one of the second and third servers writing the first version of the unit of coherent data into the plurality of storage devices at a subsequent point in time.
- 15. The method of claim 14, wherein said writing of the first version of the unit of coherent data into the plurality of storage devices comprises
obtaining at least shared read access to the unit of coherent data; and validating a timestamp of the unit of coherent data with a partition lock manager.
- 16. The method of claim 15, wherein said writing of the first version of the unit of coherent data into the plurality of storage devices further comprises notifying another server to cancel any scheduled write back of its version of the unit of coherent data.
- 17. The method of claim 14, wherein said writing of the first version of the unit of coherent data into the plurality of storage devices comprises
reading a prior version of the unit of coherent data and its corresponding parity data; computing a new parity value for the first version of the unit of coherent data to be written; RAID writing the first version of the unit of coherent data to be written and the computed new parity value; and updating a partition lock manager with a new write timestamp for the unit of coherent data.
- 18. The method of claim 14, wherein said writing of the first version of the unit of coherent data into the plurality of storage devices comprises invalidating another replicated version of the unit of coherent data on another server.
- 19. The method of claim 1, wherein the method further comprises
a fourth of the cluster of servers requesting a partition lock manager for a read lock on the unit of coherent data; the partition lock manager in response informing the first server of the request, and requesting the first server to demote its write lock on the unit of coherent data to a shared read lock; the first server in response demoting the lock as requested, and replicating a copy of the first version of the unit of coherent data on the fourth server.
- 20. The method of claim 1, wherein the unit of coherent data is a selected one of a data block, a data stripe, a map table, a state table and a unit of cached data.
- 21. In a first server of a cluster of servers coupled to each other and to a plurality of storage devices, a method of operation, comprising:
obtaining a write lock for a unit of coherent data, a first version of which is to be written into the plurality of storage devices; receiving a last synchronization server target; determining whether the last synchronization server target is to be selected as a current synchronization server target; selecting the last synchronization server target as the current synchronization server target, if it is to be selected; replicating the first version of the unit of coherent data on the selected current synchronization server target to delegate to the selected current synchronization server target the writing of the first version of the unit of coherent data into the plurality of storage devices.
- 22. The method of claim 21, wherein the method further comprises selecting a second other server of the cluster as the current synchronization server target if it is determined that the last synchronization server target is not to be selected as the current synchronization server target.
- 23. The method of claim 22, wherein the second other server is a member of an eligible synchronization server target group.
- 24. The method of claim 21, wherein the method further comprises
demoting the obtained write lock to a shared read lock; and further replicating the first version of the unit of coherent data on a second server of the cluster, the second server being a server wanting to read a current version of the unit of coherent data.
- 25. The method of claim 21, wherein the method further comprises
reading a prior version of the unit of coherent data and its corresponding parity data; computing a new parity value for the first version of the unit of coherent data to be written; RAID writing the first version of the unit of coherent data and the computed new parity value into the plurality of storage devices; and updating a partition lock manager with a new write timestamp for the unit of coherent data.
- 26. The method of claim 25, wherein the method further comprises invalidating another replicated version of the unit of coherent data on another server.
- 27. The method of claim 21, wherein the method further comprises computing a usage indicia of the first server, and providing the computed usage indicia to the other servers of the cluster.
- 28. The method of claim 27, wherein the usage indicia is a composite usage indicia, and said computing comprises computing the composite usage indicia of the first server based on a plurality of resource utilizations of the first server.
- 29. An article of manufacture comprising:
a storage medium; a software RAID driver stored in said storage medium, designed to program a server to enable the server to facilitate RAID writing of coherent data into a plurality of storage devices to which the server and other servers are coupled, and reading of the coherent data; and a distributed lock manager stored in said storage medium, also designed to program the server, to operationally assist the software RAID driver in said writing and reading of coherent data, including
obtaining a write lock for a unit of coherent data, a first version of which is to be written into the plurality of storage devices; receiving a last synchronization server target, determining whether the last synchronization server target is to be selected as a current synchronization server target, and selecting the last synchronization server target as the current synchronization server target, if it is to be selected; wherein the software RAID driver replicates the first version of the unit of coherent data on the selected current synchronization server target to delegate to the selected current synchronization server target the writing of the first version of the unit of coherent data into the plurality of storage devices.
- 30. The article of claim 29, wherein the distributed lock manager is also designed to enable the server to select a second other server of the cluster as the current synchronization server target if it is determined that the last synchronization server target is not to be selected as the current synchronization server target.
- 31. The article of claim 29, wherein
the distributed lock manager is further designed to enable the server to demote the obtained write lock to a shared read lock; and the software RAID driver is further designed to enable the server to replicate the first version of the unit of coherent data on a second other server, the second server being a server wanting to read a current version of the unit of coherent data.
- 32. The article of claim 29, wherein
the distributed lock manager is further designed to enable the server to read a prior version of the unit of coherent data and its corresponding parity data, and compute a new parity value for the first version of the unit of coherent data to be written; and the software RAID driver is further designed to enable the server to RAID write the first version of the unit of coherent data and the computed new parity value into the plurality of storage devices.
- 33. The article of claim 29, wherein the distributed lock manager is further designed to enable the server to update a partition lock manager with a new write timestamp for the unit of coherent data.
- 34. The article of claim 29, wherein the distributed lock manager is further designed to enable the server to invalidate another replicated version of the unit of coherent data on another server.
- 35. The article of claim 29, wherein the software RAID driver is further designed to enable the server to periodically compute a usage indicia for the server, and provide the computed usage indicia to the other servers of the cluster.
- 36. The article of claim 35, wherein the usage indicia is a composite usage indicia, and the software RAID driver is designed to compute the composite usage indicia based on a plurality of resource utilizations of the server.
- 37. A server comprising:
a software RAID driver to facilitate RAID writing of coherent data into a plurality of storage devices to which the server and other servers are coupled, and reading of the coherent data; and a distributed lock manager operationally coupled to the software RAID driver to assist the software RAID driver in said writing and reading of coherent data, including
obtaining a write lock for a unit of coherent data, a first version of which is to be written into the plurality of storage devices; receiving a last synchronization server target, determining whether the last synchronization server target is to be selected as a current synchronization server target, and selecting the last synchronization server target as the current synchronization server target, if it is to be selected; wherein the software RAID driver replicates the first version of the unit of coherent data on the selected current synchronization server target to delegate to the selected current synchronization server target the writing of the first version of the unit of coherent data into the plurality of storage devices.
- 38. In a first server of a cluster of servers coupled to each other and to a plurality of storage devices, a method of operation, comprising:
receiving from a second server of the cluster, a replicated copy of a first version of a unit of coherent data, to be written into the plurality of storage devices on behalf of the second server; scheduling the requested write; obtaining from a partition lock manager, at least a shared read lock on the unit of coherent data; validating with the partition lock manager, a timestamp of the replicated copy; obtaining a prior version of the unit of coherent data and its parity data; computing new parity data for the first version of the unit of coherent data; writing the first version of the unit of coherent data and the computed new parity data into the plurality of storage devices; and updating the partition lock manager with a new write timestamp.
- 39. The method of claim 38, wherein the method further comprises notifying another server to cancel any scheduled write back of its version of the unit of coherent data.
- 40. The method of claim 38, wherein the method further comprises computing a usage indicia of the first server, and providing the computed usage indicia to the other servers of the cluster.
- 41. The method of claim 40, wherein the usage indicia is a composite usage indicia, and said computing comprises computing the composite usage indicia of the first server based on a plurality of resource utilizations of the first server.
- 42. An article of manufacture comprising:
a storage medium; a distributed lock manager stored in the storage medium, designed to program a server to enable the server to facilitate obtaining of locks from a partition lock manager and validating timestamps of units of coherent data with the partition lock manager; and a software RAID driver stored in the storage medium, also designed to program the server, to facilitate RAID writing of coherent data into a plurality of storage devices to which the server and other servers are coupled, and reading of the coherent data, including performing delegated writes for other servers, wherein for the performance of a delegated write, the software RAID driver is designed to
receive from a second server of the cluster, a replicated copy of a first version of a unit of coherent data, to be written into the plurality of storage devices on behalf of the second server, schedule the requested write, obtain through the distributed lock manager, at least a shared read lock on the unit of coherent data, validate through the distributed lock manager, a timestamp of the replicated copy, obtain a prior version of the unit of coherent data and its parity data, compute new parity data for the first version of the unit of coherent data, write the first version of the unit of coherent data and the computed new parity data into the plurality of storage devices, and update the partition lock manager with a new write timestamp.
- 43. The article of claim 42, wherein the software RAID driver is further designed to enable the server to notify another server to cancel any scheduled write back of its version of the unit of coherent data.
- 44. The article of claim 42, wherein the software RAID driver is further designed to enable the server to periodically compute a usage indicia for the server, and provide the computed usage indicia to the other servers of the cluster.
- 45. The article of claim 44, wherein the usage indicia is a composite usage indicia, and the software RAID driver is designed to compute the composite usage indicia based on a plurality of resource utilizations of the server.
- 46. A server comprising:
a distributed lock manager to enable the server to facilitate obtaining of locks from a partition lock manager and validating timestamps of units of coherent data with the partition lock manager; and a software RAID driver operationally coupled to the distributed lock manager to facilitate RAID writing of coherent data into a plurality of storage devices to which the server and other servers are coupled, and reading of the coherent data, including performing delegated writes for other servers, wherein for the performance of a delegated write, the software RAID driver is designed to
receive from a second server of the cluster, a replicated copy of a first version of a unit of coherent data, to be written into the plurality of storage devices on behalf of the second server, schedule the requested write, obtain through the distributed lock manager, at least a shared read lock on the unit of coherent data, validate through the distributed lock manager, a timestamp of the replicated copy, obtain a prior version of the unit of coherent data and its parity data, compute new parity data for the first version of the unit of coherent data, write the first version of the unit of coherent data and the computed new parity data into the plurality of storage devices, and update the partition lock manager with a new write timestamp.
- 47. A cluster of servers comprising:
a first server having a first software RAID driver and a first distributed lock manager operationally coupled to each other to identify a coupled second server as a last synchronization server target, determine whether the second server is to be selected as a current synchronization server target, and if so, replicate a first version of a unit of coherent data on the second server to delegate to the second server, writing of the first version of the unit of coherent data into a plurality of storage devices coupled to the cluster of servers; and the second server, having a second software RAID driver and a second distributed lock manager operationally coupled to each other to receive from the first server a replicated copy of the first version of the unit of coherent data, and subsequently perform the delegated write for the first server.
- 48. The cluster of claim 47, wherein the second software RAID driver and the second distributed lock manager are designed to perform the delegated write by
obtaining at least a shared read lock on the unit of coherent data, validating a timestamp of the replicated copy, obtaining a prior version of the unit of coherent data and its parity data, computing new parity data for the first version of the unit of coherent data, writing the first version of the unit of coherent data and the computed new parity data into the plurality of storage devices, and updating the partition lock manager with a new write timestamp.
- 49. The cluster of claim 48, wherein the second software RAID driver and the second distributed lock manager are further designed to notify the first server to cancel any scheduled write of its version of the unit of coherent data.
- 50. The cluster of claim 47, wherein both the first and second software RAID drivers are further designed to periodically compute respective usage indicia of the first and second servers, and notify each other of the computed result.
- 51. A cluster of servers comprising:
a first server having a first software RAID driver and a first distributed lock manager operationally coupled to each other to delegate to a coupled second server, writing of a first version of a unit of coherent data into a plurality of storage devices coupled to the cluster of servers; and the second server, having a second software RAID driver and a second distributed lock manager operationally coupled to each other to perform the delegated write on behalf of the first server; wherein for the performance of the delegated write, the second software RAID driver and the second distributed lock manager are designed to
receive from the first server, a replicated copy of the first version of the unit of coherent data, schedule the requested write, obtain at least a shared read lock on the unit of coherent data, validate a timestamp of the replicated copy, obtain a prior version of the unit of coherent data and its parity data, compute new parity data for the first version of the unit of coherent data, write the first version of the unit of coherent data and the computed new parity data into the plurality of storage devices, and update the partition lock manager with a new write timestamp.
- 52. The cluster of claim 51, wherein the second software RAID driver and the second distributed lock manager are further designed to notify the first server to cancel any scheduled write of its version of the unit of coherent data.
- 53. The cluster of claim 51, wherein both the first and second software RAID drivers are further designed to periodically compute respective usage indicia of the first and second servers, and notify each other of the computed result.
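Claims 1 and 4-13 together recite a target-selection policy: reuse the last synchronization server target if its replica of the unit is still active and its usage indicia is below a threshold, otherwise fall back to the least-loaded peer. The following is a minimal Python sketch of that policy; the identifiers (`Server`, `pick_sync_target`, `USAGE_THRESHOLD`) and the concrete threshold value are illustrative assumptions, not terms from the claims.

```python
from dataclasses import dataclass, field

USAGE_THRESHOLD = 0.75  # assumed eligibility cutoff (claim 7)

@dataclass
class Server:
    name: str
    active_units: set = field(default_factory=set)    # valid active objects (claim 5)
    usage_copies: dict = field(default_factory=dict)  # local copies of peer indicia (claim 8)

def pick_sync_target(first: Server, last_target: Server,
                     cluster: list, unit_id: str) -> Server:
    # Claim 4: does the last target still consider its replica of the unit active?
    if last_target is not None and unit_id in last_target.active_units:
        # Claims 6-7: eligible only while its usage indicia is below the threshold.
        if first.usage_copies.get(last_target.name, 1.0) < USAGE_THRESHOLD:
            return last_target
    # Claims 10 and 12: otherwise pick the least-loaded server, excluding the
    # last target (claim 13 allows the delegating server itself as a candidate).
    candidates = [s for s in cluster if s is not last_target]
    return min(candidates, key=lambda s: first.usage_copies.get(s.name, 1.0))

# Toy run: b holds an active replica but is over threshold, so c is chosen.
a = Server("a", usage_copies={"b": 0.9, "c": 0.3})
b = Server("b", active_units={"stripe-7"})
c = Server("c")
assert pick_sync_target(a, b, [a, b, c], "stripe-7") is c
```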
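Claims 8-9 (and their counterparts in claims 27-28, 35-36, 40-41, 44-45, 50, and 53) have each server periodically compute a composite usage indicia from several resource utilizations and distribute it so that peers can maintain local copies. The resource set and weights below are assumptions; the claims fix neither.

```python
# Assumed resource set and weights; claim 9 says only that the composite
# indicia is computed from a plurality of resource utilizations.
WEIGHTS = {"cpu": 0.5, "memory": 0.3, "io": 0.2}

def composite_usage(utilizations: dict) -> float:
    """Weighted sum of per-resource utilizations, each in [0, 1]."""
    return sum(w * utilizations[r] for r, w in WEIGHTS.items())

def broadcast_usage(name: str, utilizations: dict, peer_tables: list) -> None:
    """Claim 8: every peer keeps a local copy of the received indicia."""
    value = composite_usage(utilizations)
    for table in peer_tables:  # one usage table per peer server
        table[name] = value

print(composite_usage({"cpu": 0.8, "memory": 0.4, "io": 0.2}))  # approximately 0.56
```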
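Claims 17, 25, and 38 recite the same read-modify-write sequence for the delegated write: read the prior data and its parity, compute new parity, RAID write both, then record a new write timestamp with the partition lock manager. The sketch below assumes RAID-5 style XOR parity and models the partition lock manager as an in-memory timestamp table; every identifier here is a hypothetical stand-in.

```python
class PartitionLockManager:
    """Toy stand-in: tracks only the last write timestamp per unit."""
    def __init__(self):
        self.timestamps = {}

    def validate(self, unit_id, replica_ts):
        # Claims 15/38: the replica must still reflect the last recorded write.
        return self.timestamps.get(unit_id) == replica_ts

    def record_write(self, unit_id, ts):
        self.timestamps[unit_id] = ts

def delegated_write(unit_id, new_data, replica_ts, data, parity, plm, now):
    """Claims 17/25/38: read old data and parity, XOR in the change, write back."""
    if not plm.validate(unit_id, replica_ts):
        return False  # stale replica; a newer write superseded it
    old_data, old_parity = data[unit_id], parity[unit_id]
    # Read-modify-write parity: new_parity = old_parity ^ old_data ^ new_data.
    parity[unit_id] = bytes(p ^ o ^ n
                            for p, o, n in zip(old_parity, old_data, new_data))
    data[unit_id] = new_data
    plm.record_write(unit_id, now)  # new write timestamp for the unit
    return True

# Toy run over a one-byte "stripe".
plm = PartitionLockManager()
data, parity = {"u": b"\x0f"}, {"u": b"\x0f"}
assert delegated_write("u", b"\xff", None, data, parity, plm, now=1)
assert parity["u"] == b"\xff"  # 0x0f ^ 0x0f ^ 0xff
```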
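Claim 19 (with claims 24 and 31) covers a reader joining: the write-lock holder demotes to a shared read lock and replicates its current version onto the reading server. A small sketch under the same caveats, with `LockMode` and `serve_reader` invented for illustration:

```python
from enum import Enum

class LockMode(Enum):
    WRITE = 1
    SHARED_READ = 2

def serve_reader(locks: dict, replicas: dict, unit_id: str,
                 current_version: bytes, reader: str) -> None:
    # Claim 19: the partition lock manager has relayed the reader's request,
    # so the write-lock holder demotes to a shared read lock...
    assert locks.get(unit_id) is LockMode.WRITE
    locks[unit_id] = LockMode.SHARED_READ
    # ...and replicates its version of the unit onto the reading server.
    replicas.setdefault(unit_id, {})[reader] = current_version
```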
RELATED APPLICATION
[0001] This application is a non-provisional application of provisional application No. 60/305,282, filed on Jul. 12, 2001. This application claims priority to the filing date of the '282 provisional application, the specification of which is hereby incorporated by reference in its entirety.
Provisional Applications (1)

| Number | Date | Country |
| --- | --- | --- |
| 60/305,282 | Jul 2001 | US |