Claims
- 1. A method of logging a system having a database so that the system can recover the database in case of a system failure, comprising the steps of:
generating log records, each representing an update to the database by storing a physical image of the updated portion; assigning an update sequence number representing the sequence of database updates to each log record; and storing the generated log records into one or more log disks.
- 2. The method of claim 1, further comprising the step of:
checkpointing by storing a backup copy of the database into one or more backup disks.
- 3. The method of claim 1, wherein the database resides in main memory.
- 4. The method of claim 1, wherein the system processes transactions where a transaction is a set of operations forming a logical unit in an application.
- 5. The method of claim 1, wherein the update sequence number is a global sequence number (GSN) stored in a global counter representing the sequence of updates to the entire database.
- 6. The method of claim 5, wherein the step of assigning the update sequence number comprises the steps of:
acquiring a latch for the global counter; increasing the counter by 1 and saving its current value; releasing the acquired latch; and returning the saved counter value.
- 7. The method of claim 1, wherein the update sequence number is a transaction sequence number (TSN) representing the sequence of transactions performed by the system.
- 8. The method of claim 1, wherein the update sequence number is a slot sequence number (SSN) representing the sequence of updates to a given slot of the database.
- 9. The method of claim 1, wherein the step of assigning an update sequence number further comprises the step of assigning a page version number (PVN) that is increased by one when a page is formatted with a different slot size.
- 10. A method of logging a system having a database so that the system can recover the database in case of a system failure, comprising the steps of:
partitioning the database; generating log records, each representing an update to the database by storing a physical image of the updated portion; assigning an update sequence number representing the sequence of database updates to each log record for each database partition; and storing the generated log records into one or more log disks.
- 11. The method of claim 10, wherein the update sequence number is a global sequence number representing the sequence of updates to the entire database.
- 12. The method of claim 10, wherein the update sequence number is a transaction sequence number representing the sequence of transactions performed by the system.
- 13. The method of claim 10, wherein the update sequence number is a slot sequence number representing the sequence of updates to a given slot of the database.
- 14. A method of logging a system having a primary database in main memory for transaction processing so that the system can recover the database in case of a system failure, wherein the system maintains an active transaction table (ATT) for storing active transactions, private log buffers for storing redo and undo transactions, and public log buffers for storing committed redo transactions, the method comprising the steps of:
generating log records, each representing an update to the database by storing a physical image of the updated portion; assigning an update sequence number representing the sequence of database updates to each log record; assigning a page version number (PVN) that is increased by one when a page is formatted with a different slot size; and storing the generated log records into one or more log disks.
- 15. The method of claim 14, wherein the step of generating log records comprises the steps of:
generating redo log records and undo log records in a private log buffer; updating the slot; and copying the update sequence number of the slot and the PVN of the page into the redo log record and incrementing the update sequence number.
- 16. The method of claim 15, wherein the step of generating log records further comprises the steps of:
generating the redo log records during a transaction by combining the headers and after images, and appending them together with a transaction commit record; flushing the private log buffer to a corresponding public log buffer; waiting until the redo log records are written to the disk; releasing all the locks that the transaction holds; notifying the user involved in the transaction that the transaction has been committed; removing the matching element from the ATT; and deleting the private log buffer for the transaction.
- 17. The method of claim 16, wherein the step of generating log records further comprises the steps of:
applying the undo log record to the primary database; releasing all the locks that the transaction holds; notifying the user that the transaction has been aborted; removing the matching element from the ATT in a latched state; and deleting the private buffer for the transaction.
- 18. The method of claim 14, wherein the update sequence number is a global sequence number representing the sequence of updates to the entire database.
- 19. The method of claim 14, wherein the update sequence number is a transaction sequence number representing the sequence of transactions performed by the system.
- 20. The method of claim 14, wherein the update sequence number is a slot sequence number representing the sequence of updates to a given slot of the database.
- 21. The method of claim 14, further comprising the steps of:
creating a begin_checkpoint record and appending it to all the public log buffers; choosing the backup database that was the least recently checkpointed as the current backup database; for each checkpointing partition in parallel, while scanning the database page by page, copying all dirty pages into the current backup database asynchronously, and waiting until all asynchronous I/Os are completed; for each transaction/log partition in parallel, holding a latch on the ATT, writing the undo logs of all the active transactions on the assigned disk, writing the active transaction list on the assigned disk, and releasing the latch; and appending an end_checkpoint record into all the public log buffers and updating the log anchors with the current backup database ID and the begin_checkpoint positions of the log records.
- 22. A method of recovering a database in a system from a system failure, wherein the system generates log records representing updates to the database, the method comprising the steps of:
reading log records, each having an update sequence number representing the sequence of database updates and a physical image of the updated portion; and selectively replaying the log records based on the update sequence number.
- 23. The method of claim 22, wherein the selective replaying step comprises the step of:
replaying a log record if the update sequence number in the log record is larger than the most recently played update sequence number.
- 24. The method of claim 22, further comprising the step of:
reading a backup copy of the database stored in one or more backup disks by checkpointing before said step of reading log records.
- 25. The method of claim 22, wherein the database resides in main memory.
- 26. The method of claim 22, wherein the system processes transactions where a transaction is a set of operations forming a logical unit in an application.
- 27. The method of claim 22, wherein the step of selectively replaying comprises the step of applying redo and undo operations using public log buffering.
- 28. The method of claim 22, wherein the step of selectively replaying comprises the step of applying redo operations only using private buffering.
- 29. The method of claim 22, wherein the update sequence number is a global sequence number representing the sequence of updates to the entire database.
- 30. The method of claim 22, wherein the update sequence number is a transaction sequence number representing the sequence of transactions performed by the system.
- 31. The method of claim 22, wherein the update sequence number is a slot sequence number representing the sequence of updates to a given slot of the database.
- 32. The method of claim 22, wherein the selective replaying step comprises the step of:
replaying a log record if the update sequence number in the log record is larger than the most recently played sequence number.
- 33. A method of recovering a database in a system from a system failure, wherein the system generates log records representing updates to the database, wherein the system maintains an active transaction table (ATT) for storing active transactions, private log buffers for storing redo and undo transactions, and public log buffers for storing committed redo transactions, the method comprising the steps of:
reading log records, each having an update sequence number representing the sequence of database updates and a physical image of the updated portion; reading page version numbers (PVN), each of which was increased by one when a page was formatted with a different slot size; and selectively replaying the log records based on the update sequence number.
- 34. The method of claim 33, wherein the update sequence number is a global sequence number representing the sequence of updates to the entire database.
- 35. The method of claim 33, wherein the update sequence number is a transaction sequence number representing the sequence of transactions performed by the system.
- 36. The method of claim 33, wherein the update sequence number is a slot sequence number representing the sequence of updates to a given slot of the database.
- 37. The method of claim 33, wherein the step of selectively replaying further comprises the steps of:
reading the position of a begin_checkpoint record from a log anchor and marking it as the beginning of the log record; initializing the ATT from the active transaction list stored in the log anchor; going backward in the log record from the end until the first commit record is encountered; marking the position as the end of the log record; from the marked log beginning to the marked log end going forward in the log, doing the step further comprising the following steps of:
(A) for an updated log record,
(i) holding a latch on the page; (ii) if the update record's PVN is larger than or equal to the page header's PVN, proceeding to the next step, otherwise releasing the latch and ignoring the record; (iii) if the update record's update sequence number is larger than the current update sequence number, updating the current update sequence number with the record's update sequence number, otherwise, ignoring the current update record; (iv) releasing the latch; (B) for a committed log record,
(i) removing the corresponding transaction ID (TID) from the ATT, if it exists; waiting until the back loading completes; and rolling back the remaining TIDs in the ATT.
- 38. The method of claim 37, further comprising the steps of:
finding the recently checkpointed backup database copy from the log anchor information; reading the pages of the backup database into a backup database buffer; and for each page read in the backup database buffer,
A. holding a latch on the page; B. if the buffered page's PVN is equal to or larger than the primary page header's PVN, proceeding to the next step, otherwise releasing the latch and skipping this page; C. for each slot in the page,
(i) if the update sequence number is larger than the stored update sequence number, overriding the image and the stored sequence number with the after image and the new update sequence number, respectively, (ii) otherwise, ignoring the current update record; and D. releasing the latch.
- 39. A method for hot-standby in a transaction service system using a database where a slave server takes over a master server in case of a problem wherein the two servers exchange heartbeat messages for monitoring working conditions and the system stores log records representing incremental changes to the database, the method comprising the step of taking over the slave server that further comprises the steps of:
waiting until all received log records are replayed; aborting all active transactions at the moment; setting a sync_position as the recently received log record ID; setting a send_position as the tail address of the log file; and resuming the transaction service.
- 40. The method of claim 39, further comprising the step of normal processing of the slave server that further comprises the steps of:
A. if the connection with the slave server is available,
(i) receiving a log page from the master server, and replaying log records in the page; (ii) if the received log record is safely stored in the log disk, sending the acknowledgement message to the master server; B. otherwise,
(i) if the heartbeat message does not arrive for a period, invoking said step of taking over the slave server.
- 41. The method of claim 39, further comprising the step of restarting of failed master server that further comprises the steps of:
A. doing the following synchronization process for lost transactions:
(i) requesting the synchronization process by asking the taken-over server for the sync_position; (ii) collecting the transactions that were committed in the log located from the sync_position; (iii) sending all the log records generated by the collected transactions to the taken-over server; and B. invoking said step of normal processing of the slave server.
- 42. The method of claim 41, further comprising the step of synchronization process of the taken-over server that further comprises the steps of:
A. sending the sync_position to the failed master server; B. for each received log record of the lost transactions,
(i) if the timestamp of the log record is larger than that of the corresponding slot, replaying it with transactional locking; (ii) otherwise, ignoring the log record.
- 43. The method of claim 42, further comprising the step of normal processing of the master server that further comprises the steps of:
A. if the connection with the slave server is available,
(i) sending a log page located from the send_position to the slave server, if it is reflected to the log disk; (ii) receiving an acknowledgement message from the slave server, and incrementing the send_position by the size of the successfully sent log page; B. otherwise,
(i) waiting until it is recovered; (ii) if the recovered slave server requires the synchronization process, invoking said step of synchronization process of the taken-over server.
- 44. A system having a database for recovering from a system failure, comprising:
main memory for storing the database; one or more log disks for storing log records representing updates to the database by storing the physical image of the update; one or more backup disks for storing a copy of the main memory database; and a recovery manager having a counter for storing an update sequence number for representing the sequence of database updates.
- 45. The system of claim 44, wherein the system processes transactions where a transaction is a set of operations forming a logical unit in an application.
- 46. The system of claim 44, wherein the database comprises a plurality of fixed-size pages.
- 47. The system of claim 44, wherein the database comprises a plurality of slots.
- 48. The system of claim 44, wherein the update sequence number is a global sequence number representing the sequence of updates to the entire database.
- 49. The system of claim 44, wherein the update sequence number is a transaction sequence number representing the sequence of transactions created.
- 50. The system of claim 44, wherein the update sequence number is a slot sequence number representing the sequence of updates to a given slot of the database.
- 51. The system of claim 45, further comprising one or more buffers for storing log records before storing them to said one or more disks.
- 52. The system of claim 51, wherein said one or more buffers include one or more private buffers for storing both redo and undo transaction log records.
- 53. The system of claim 51, wherein said one or more buffers include one or more public log buffers for storing committed redo transaction log records.
- 54. The system of claim 51, wherein the recovery manager comprises:
a backup loader for loading the backup data from said one or more backup disks into the main memory database; and a log loader for loading the log from said one or more log disks into the main memory database in order to restore the main memory database to the most recent consistent state.
- 55. The system of claim 54, wherein said log loader comprises:
a log reader for reading the log records from said one or more log disks; and a log player for playing the log records to restore the main memory database to the latest consistent state.
- 56. A hot-standby system for a system having a database in main memory where one server takes over another in case of a problem, comprising:
a master server for logging updates to the database, further comprising:
an active transaction table (ATT) for storing the list of active transactions; a private buffer for storing redo and undo transaction log records; and a public log buffer for storing committed redo transaction log records to be written to one or more disks; and a slave server for logging updates to the database in case of a failure of the master server, further comprising:
an aborted transaction table for storing the list of active transactions; a private buffer for storing both redo and undo transaction log records; and a public log buffer for storing committed redo transaction log records to be written to said one or more disks.
- 57. A computer-readable storage medium that contains a program for logging updates in a system having a central processing unit (CPU) and main memory for storing a database, one or more log disks for storing log records representing updates to the database, and one or more backup disks for storing a copy of the main memory database, where the program under the control of a CPU performs the steps of:
generating log records where each log record contains the physical image of the updated portion of the database; assigning an update sequence number representing the sequence of updates to each log record; and recovering from a system failure by selectively replaying the log records.
- 58. The storage medium of claim 57, further comprising the step of checkpointing by storing the database in one or more backup disks.
- 59. The storage medium of claim 57, wherein the medium is a CD.
- 60. The storage medium of claim 57, wherein the medium is a magnetic disk.
- 61. The storage medium of claim 57, wherein the medium is a magnetic tape.
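By way of illustration only, the counter update recited in claim 6 can be sketched as follows. The class name `GlobalSequenceCounter`, the helper `assign_from_threads`, and the use of Python's `threading.Lock` as a stand-in for the latch are assumptions made for this sketch and are not part of the claims.

```python
import threading


class GlobalSequenceCounter:
    """Hypothetical global counter; threading.Lock stands in for the latch."""

    def __init__(self):
        self._latch = threading.Lock()
        self._value = 0

    def next_gsn(self):
        self._latch.acquire()        # acquire the latch for the global counter
        try:
            self._value += 1         # increase the counter by 1
            saved = self._value      # save its current value
        finally:
            self._latch.release()    # release the acquired latch
        return saved                 # return the saved counter value


def assign_from_threads(counter, n_threads=4, per_thread=250):
    """Draw sequence numbers from several threads and collect them all."""
    results = []
    results_lock = threading.Lock()

    def worker():
        local = [counter.next_gsn() for _ in range(per_thread)]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the increment and the save happen while the latch is held, concurrent callers never observe the same value, so the numbers handed out form a gap-free sequence regardless of thread scheduling.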
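The selective replay of claims 22 and 23 may be pictured with the minimal sketch below. It assumes the slot sequence number variant of claim 31, tracks the most recently played number per slot, and assumes sequence numbers start at 1; `LogRecord` and `replay` are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    slot_id: int         # slot the update applies to (assumed addressing scheme)
    seq: int             # update sequence number, assumed to start at 1
    after_image: bytes   # physical image of the updated portion


def replay(records, database=None, applied_seq=None):
    """Replay records in the order given, skipping stale ones."""
    database = {} if database is None else database
    applied_seq = {} if applied_seq is None else applied_seq
    for rec in records:
        # Replay only if the record is newer than what was last played for the slot.
        if rec.seq > applied_seq.get(rec.slot_id, 0):
            database[rec.slot_id] = rec.after_image
            applied_seq[rec.slot_id] = rec.seq
    return database, applied_seq
```

Since each record carries a full physical image and a sequence number, the result in this sketch does not depend on the order in which records are read, which is what allows log records from several log disks to be replayed without first merging them into a single serial order.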
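Steps (A)(i) to (A)(iv) of claim 37 combine a page version number check with the sequence number check. The sketch below is one possible reading, not a definitive implementation: `Page`, `UpdateRecord`, and `replay_update` are hypothetical names, the latch is modelled with `threading.Lock`, and the handling of a record whose PVN is larger than the page's PVN (discarding the old slots and adopting the new PVN) is an assumption, since the claim only says to proceed.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Page:
    pvn: int = 0
    slots: dict = field(default_factory=dict)      # slot index -> image
    slot_seq: dict = field(default_factory=dict)   # slot index -> last applied seq
    latch: threading.Lock = field(default_factory=threading.Lock)


@dataclass(frozen=True)
class UpdateRecord:
    slot: int
    pvn: int
    seq: int
    after_image: bytes


def replay_update(page, rec):
    """Apply rec to page if it passes both checks; return True if applied."""
    with page.latch:                       # (i) hold the latch; (iv) released on exit
        if rec.pvn < page.pvn:             # (ii) record predates the current page format
            return False
        if rec.pvn > page.pvn:
            # Assumption: a newer PVN means the page was reformatted, so the
            # slots held under the old format are discarded.
            page.pvn = rec.pvn
            page.slots.clear()
            page.slot_seq.clear()
        if rec.seq <= page.slot_seq.get(rec.slot, 0):   # (iii) stale update
            return False
        page.slots[rec.slot] = rec.after_image
        page.slot_seq[rec.slot] = rec.seq
        return True
```

The PVN test comes first because a sequence number is only meaningful within one page format: after a page is reformatted with a different slot size, records written against the old layout must not overwrite slots of the new layout.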
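The log shipping loop of claims 40 and 43 can be illustrated with a small in-process simulation. Everything here is a simplifying assumption: `Master`, `Slave`, `ship_next_page`, and `receive` are hypothetical names, the network is replaced by a direct method call, and appending to a byte buffer stands in for replaying the page and storing it on the log disk.

```python
class Slave:
    """Hypothetical slave: stores each received log page and acknowledges it."""

    def __init__(self):
        self.log_disk = bytearray()
        self.connected = True

    def receive(self, page):
        if not self.connected:
            return False           # no acknowledgement reaches the master
        self.log_disk += page      # stand-in for replaying and durably storing
        return True                # acknowledgement message


class Master:
    """Hypothetical master: ships log pages starting at send_position."""

    def __init__(self, log_file, page_size):
        self.log_file = log_file   # log already reflected to the log disk
        self.page_size = page_size
        self.send_position = 0

    def ship_next_page(self, slave):
        start = self.send_position
        page = self.log_file[start:start + self.page_size]
        if not page:
            return False           # nothing left to send
        if slave.receive(page):
            # Advance only after the acknowledgement, by the size actually sent.
            self.send_position += len(page)
            return True
        return False               # connection unavailable; position unchanged
```

Advancing `send_position` only on acknowledgement means that a page lost to a broken connection is sent again once the slave is reachable, at the cost of one round trip per page in this simplified form.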
RELATED APPLICATION
[0001] This application claims the benefit of co-pending U.S. Provisional Application Ser. No. 60/305,956, filed Jul. 16, 2001, entitled “Parallel Logging and Restart Method and System Based on Physical Logging in Main-Memory Transaction Processing System,” and co-pending U.S. Provisional Application Ser. No. 60/305,947, filed Jul. 16, 2001, entitled “Parallel Logging and Restart Method and System Based on Physical Logging in Disk-Based Transaction Processing System.”
Provisional Applications (2)
| Number | Date | Country |
| --- | --- | --- |
| 60305956 | Jul 2001 | US |
| 60305947 | Jul 2001 | US |