Claims
- 1. A method for managing data, the method comprising the steps of:
allocating a plurality of partitions of a shared cache, wherein said plurality of partitions include a partition in each of a set of two or more nodes of a multiple node system;
establishing a mapping between the plurality of partitions and a plurality of data items; and
in response to a request for a particular data item by a first node of said multiple node system, performing the steps of:
determining which partition of said plurality of partitions corresponds to the particular data item based on said mapping;
determining whether the particular data item currently resides in said corresponding partition;
if the particular data item does not currently reside in said corresponding partition, then loading a copy of the particular data item into the corresponding partition; and
providing the particular data item from the corresponding partition to the first node.
- 2. The method of claim 1 wherein the step of providing the particular data item includes the first node reading the data item from the corresponding partition.
- 3. The method of claim 1 wherein the step of providing the particular data item includes the particular data item being sent from the corresponding partition to the first node.
- 4. The method of claim 1 wherein the corresponding partition resides on a second node of said multiple node system that is different than said first node.
- 5. The method of claim 1, wherein the multiple node system includes one or more nodes that use said shared cache but do not themselves have any partition of said shared cache.
- 6. The method of claim 1 wherein:
each partition of said plurality of partitions maintains lock structures for data items that correspond to the partition based on the mapping; and
the method includes gathering information from each partition of the plurality of partitions to construct wait-for-graphs to perform deadlock detection.
- 7. The method of claim 1 wherein:
a dirty version of the particular data item resides in a second node of the multiple node system; and
the step of loading a copy of the particular data item into the corresponding partition includes loading the dirty version of the particular data item from the second node into the corresponding partition.
- 8. The method of claim 1 wherein the step of establishing a mapping includes:
establishing the mapping by performing a hash operation on hash keys to produce hash values; and
establishing the mapping based on the hash values.
- 9. The method of claim 8 wherein at least a portion of the hash keys are identifiers associated with data items.
- 10. The method of claim 8 wherein at least a portion of the hash keys are identifiers associated with persistent storage devices.
- 11. The method of claim 1 wherein the step of establishing a mapping includes mapping each partition to only data items that persistently reside on storage devices that can be directly accessed by the node on which the partition resides.
- 12. The method of claim 1 wherein recovery from failure of a particular partition of said plurality of partitions includes recovering a data item that resided in said particular partition based on a current version of the data item from a node-private cache of a surviving node of said plurality of nodes.
- 13. The method of claim 12 further comprising:
in response to a change made by the surviving node to the data item, generating redo information associated with the change; and
preventing the surviving node from sending the data item for recovery until the redo information is flushed to persistent storage.
- 14. The method of claim 1 wherein:
a node of said plurality of nodes includes a checkpoint associated with a redo log; and
the method further comprises preventing the node from advancing the checkpoint past a position in the redo log associated with a particular data item until the partition, of said plurality of partitions, that is associated with the data item has written the data item to persistent storage.
- 15. The method of claim 1 wherein recovery from failure of a particular partition of said plurality of partitions includes merging redo information from a plurality of nodes in said multiple node system.
- 16. The method of claim 1 wherein recovery from failure of a particular partition of said plurality of partitions includes:
recovering a first subset of data items that map to said failed cache based on current versions of data items that reside in node-private caches of nodes, in said multiple node system, that have not failed; and
recovering a second subset of data items that map to said failed cache based on redo information created by merging redo from a plurality of nodes in said multiple node system.
- 17. The method of claim 1 further comprising:
in response to a change made by a node to a data item, generating redo information associated with the change; and
preventing the node from sending a data item to another node until the redo information is flushed to persistent storage.
- 18. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 1.
- 19. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 2.
- 20. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 3.
- 21. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 4.
- 22. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 5.
- 23. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 6.
- 24. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 7.
- 25. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 8.
- 26. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 9.
- 27. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 10.
- 28. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 11.
- 29. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 12.
- 30. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 13.
- 31. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 14.
- 32. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 15.
- 33. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 16.
- 34. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 17.
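For illustration only, the request-handling steps recited in claim 1, combined with the hash-based mapping of claims 8 and 9, may be sketched as follows. This is a minimal single-process sketch under stated assumptions and not the claimed implementation: the names `Partition`, `SharedCache`, `partition_for`, `get`, and `fake_disk` are hypothetical, SHA-256 is an arbitrary choice of hash operation, and a Python dictionary stands in for the memory of each node that hosts a partition.

```python
import hashlib
from typing import Callable, Dict, List


class Partition:
    """One partition of the shared cache, hosted on a single node (hypothetical)."""

    def __init__(self, host_node: str) -> None:
        self.host_node = host_node
        self.items: Dict[str, bytes] = {}  # data items currently resident


class SharedCache:
    """Shared cache made of partitions spread across two or more nodes."""

    def __init__(self, host_nodes: List[str],
                 load_from_disk: Callable[[str], bytes]) -> None:
        if len(host_nodes) < 2:
            raise ValueError("partitions must reside on two or more nodes")
        # Allocating step: one partition on each hosting node.
        self.partitions = [Partition(node) for node in host_nodes]
        self.load_from_disk = load_from_disk

    def partition_for(self, item_id: str) -> Partition:
        # Mapping step: hash the data item identifier (the hash key) and
        # use the hash value to select a partition. SHA-256 is an assumption.
        digest = hashlib.sha256(item_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def get(self, requesting_node: str, item_id: str) -> bytes:
        # The same partition serves the item no matter which node asks,
        # so requesting_node does not affect the lookup in this sketch.
        partition = self.partition_for(item_id)
        if item_id not in partition.items:
            # Item is not resident: load a copy into the corresponding partition.
            partition.items[item_id] = self.load_from_disk(item_id)
        # Provide the item from the corresponding partition to the requester.
        return partition.items[item_id]


# Usage: "node-C" hosts no partition (compare claim 5) but still uses the cache.
disk_reads: List[str] = []


def fake_disk(item_id: str) -> bytes:
    disk_reads.append(item_id)  # record every load from persistent storage
    return f"contents of {item_id}".encode("utf-8")


cache = SharedCache(["node-A", "node-B"], fake_disk)
first = cache.get("node-C", "block-17")
second = cache.get("node-A", "block-17")
```

In this sketch the second request is satisfied from the partition without another load from persistent storage, because the mapping sends every request for a given data item to the same partition regardless of the requesting node.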
PRIORITY CLAIM/RELATED APPLICATIONS
[0001] This application claims the benefit of priority from U.S. Provisional Application Ser. No. 60/492,019 entitled “Shared Nothing on Shared Disk Hardware”, filed Aug. 1, 2003, which is incorporated by reference in its entirety for all purposes as if fully set forth herein.
[0002] This application is a Continuation-in-Part of U.S. application Ser. No. 10/665,062, entitled “Ownership Reassignment in a Shared-Nothing Database System,” filed Sep. 17, 2003; and U.S. application Ser. No. 10/718,875, entitled “One-Phase Commit in a Shared-Nothing Database System,” filed Nov. 21, 2003; which are incorporated by reference in their entirety for all purposes as if fully set forth herein.
[0003] This application is related to U.S. application Ser. No.______, (Attorney Docket No. 50277-2323) entitled “Dynamic Reassignment of Data Ownership,” by Roger Bamford, Sashikanth Chandrasekaran and Angelo Pruscino, filed on the same day herewith, and U.S. application Ser. No.______, (Attorney Docket No. 50277-2325) entitled “Parallel Recovery by Non-Failed Nodes,” by Roger Bamford, Sashikanth Chandrasekaran and Angelo Pruscino, filed on the same day herewith; both of which are incorporated by reference in their entirety for all purposes as if fully set forth herein.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60492019 | Aug 2003 | US |
Continuation in Parts (2)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 10718875 | Nov 2003 | US |
| Child | 10831248 | Apr 2004 | US |
| Parent | 10665062 | Sep 2003 | US |
| Child | 10831248 | Apr 2004 | US |