Claims
- 1. A method for facilitating reliable execution in a computer system by periodically checkpointing write operations to a main memory of the computer system, comprising:
receiving a write operation directed to the main memory at a memory controller, the write operation including data to be written to the main memory and a write address specifying a location in the main memory into which the data is to be written; looking up the write address in a checkpoint store coupled to the memory controller; if the write address is not associated with any entry in the checkpoint store, creating an entry for the write address in the checkpoint store; writing the data to be written to the entry in the checkpoint store; and periodically performing a checkpointing operation, wherein the checkpointing operation transfers the data to be written from the checkpoint store to the write address in the main memory.
- 2. The method of claim 1, further comprising:
receiving a read operation at the memory controller, the read operation being directed to a read address specifying a location in the main memory to be read from; looking up the read address in the checkpoint store; if the read address is associated with an entry in the checkpoint store, retrieving data from the entry in the checkpoint store to satisfy the read operation; and if the read address is not associated with any entry in the checkpoint store, retrieving data from the read address in the main memory to satisfy the read operation.
- 3. The method of claim 1, wherein the checkpoint store is organized as a cache memory.
- 4. The method of claim 1, wherein if a new entry is to be added to the checkpoint store and no room exists in the checkpoint store for the new entry, the method further comprises performing a checkpointing operation to transfer the contents of the checkpoint store to the main memory.
- 5. The method of claim 1, wherein performing the checkpointing operation involves:
stopping execution of a central processing unit in the computer system; storing an internal state of the central processing unit to the main memory; transferring the data to be written from the checkpoint store to the write address in the main memory; and recommencing execution of the central processing unit.
- 6. The method of claim 5, wherein the internal state of the central processing unit includes:
contents of internal registers in the central processing unit; and dirty cache lines associated with the central processing unit.
- 7. The method of claim 1, further comprising delaying I/O operations so that the I/O operations are performed after a subsequent checkpoint operation.
- 8. The method of claim 1, wherein if an error occurs during execution of the computer system, the method further comprises rolling back to a preceding checkpoint by clearing the checkpoint store and restoring the internal state of the central processing unit from the main memory.
- 9. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for facilitating reliable execution in a computer system by periodically checkpointing write operations to a main memory of the computer system, the method comprising:
receiving a write operation directed to the main memory at a memory controller, the write operation including data to be written to the main memory and a write address specifying a location in the main memory into which the data is to be written; looking up the write address in a checkpoint store coupled to the memory controller; if the write address is not associated with any entry in the checkpoint store, creating an entry for the write address in the checkpoint store; writing the data to be written to the entry in the checkpoint store; and periodically performing a checkpointing operation, wherein the checkpointing operation transfers the data to be written from the checkpoint store to the write address in the main memory.
- 10. The computer-readable storage medium of claim 9, wherein the method further comprises:
receiving a read operation at the memory controller, the read operation being directed to a read address specifying a location in the main memory to be read from; looking up the read address in the checkpoint store; if the read address is associated with an entry in the checkpoint store, retrieving data from the entry in the checkpoint store to satisfy the read operation; and if the read address is not associated with any entry in the checkpoint store, retrieving data from the read address in the main memory to satisfy the read operation.
- 11. The computer-readable storage medium of claim 9, wherein the checkpoint store is organized as a cache memory.
- 12. The computer-readable storage medium of claim 9, wherein if a new entry is to be added to the checkpoint store and no room exists in the checkpoint store for the new entry, the method further comprises performing a checkpointing operation to transfer the contents of the checkpoint store to the main memory.
- 13. The computer-readable storage medium of claim 9, wherein performing the checkpointing operation involves:
stopping execution of a central processing unit in the computer system; storing an internal state of the central processing unit to the main memory; transferring the data to be written from the checkpoint store to the write address in the main memory; and recommencing execution of the central processing unit.
- 14. The computer-readable storage medium of claim 13, wherein the internal state of the central processing unit includes:
contents of internal registers in the central processing unit; and dirty cache lines associated with the central processing unit.
- 15. The computer-readable storage medium of claim 9, wherein the method further comprises delaying I/O operations so that the I/O operations are performed after a subsequent checkpoint operation.
- 16. The computer-readable storage medium of claim 9, wherein if an error occurs during execution of the computer system, the method further comprises rolling back to a preceding checkpoint by clearing the checkpoint store and restoring the internal state of the central processing unit from the main memory.
- 17. An apparatus that facilitates reliable execution in a computer system by periodically checkpointing write operations to a main memory of the computer system, comprising:
a memory controller coupled to the main memory; a receiving mechanism that is configured to receive a write operation directed to the main memory at the memory controller, the write operation including data to be written to the main memory and a write address specifying a location in the main memory into which the data is to be written; a checkpoint store, coupled to the memory controller, which is configured to store write operations directed to the main memory; a lookup mechanism that is configured to look up the write address in the checkpoint store; wherein the checkpoint store is configured to create an entry for the data to be written and the write address, if the write address is not associated with any entry in the checkpoint store; a writing mechanism that is configured to write the data to be written to the entry in the checkpoint store; and a checkpointing mechanism that is configured to periodically perform a checkpointing operation, wherein the checkpointing operation transfers the data to be written from the checkpoint store to the write address in the main memory.
- 18. The apparatus of claim 17,
wherein the receiving mechanism is additionally configured to receive a read operation, the read operation being directed to a read address specifying a location in the main memory to be read from; wherein the lookup mechanism is additionally configured to look up the read address in the checkpoint store; and further comprising a reading mechanism that is configured to perform the read operation; wherein the reading mechanism is configured to retrieve data from the entry in the checkpoint store if the read address is associated with an entry in the checkpoint store; and wherein the reading mechanism is configured to retrieve data from the read address in the main memory to satisfy the read operation if the read address is not associated with any entry in the checkpoint store.
- 19. The apparatus of claim 17, wherein the checkpoint store is organized as a cache memory.
- 20. The apparatus of claim 17, wherein if a new entry is to be added to the checkpoint store and no room exists in the checkpoint store for the new entry, the checkpointing mechanism is configured to perform a checkpointing operation to transfer the contents of the checkpoint store to the main memory.
- 21. The apparatus of claim 17, wherein the checkpointing mechanism is configured to:
stop execution of a central processing unit in the computer system; store an internal state of the central processing unit to the main memory; transfer the data to be written from the checkpoint store to the write address in the main memory; and recommence execution of the central processing unit.
- 22. The apparatus of claim 21, wherein the internal state of the central processing unit includes:
contents of internal registers in the central processing unit; and dirty cache lines associated with the central processing unit.
- 23. The apparatus of claim 17, further comprising an I/O processing mechanism that is configured to delay I/O operations so that the I/O operations are performed after a subsequent checkpoint operation.
- 24. The apparatus of claim 17, further comprising a rollback mechanism, wherein if an error occurs during execution of the computer system the rollback mechanism is configured to roll back to a preceding checkpoint by clearing the checkpoint store and restoring the internal state of the central processing unit from the main memory.
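For illustration only, the behavior recited in the claims above (buffering writes in a checkpoint store, satisfying reads from the store when an entry exists, checkpointing when the store is full, and rolling back by clearing the store) can be sketched as a small software model. This is a minimal sketch under stated assumptions, not the claimed hardware: the class name `CheckpointedMemory`, its method names, the dictionary used as the checkpoint store, and the `saved_cpu_state` field (standing in for the processor state that the claims store to main memory) are all hypothetical and do not appear in the claims.

```python
import copy


class CheckpointedMemory:
    """Toy software model of a memory controller with a checkpoint store.

    All names here are hypothetical and chosen for illustration.
    """

    def __init__(self, size, store_capacity):
        self.main_memory = [0] * size        # state as of the last checkpoint
        self.checkpoint_store = {}           # write address -> pending data
        self.store_capacity = store_capacity
        self.cpu_state = {"pc": 0}           # stand-in for registers, etc.
        # Assumption: kept in a separate field rather than in main_memory.
        self.saved_cpu_state = copy.deepcopy(self.cpu_state)

    def write(self, address, data):
        # Look up the write address in the checkpoint store.
        is_new_entry = address not in self.checkpoint_store
        # If a new entry is needed and no room exists, checkpoint first.
        if is_new_entry and len(self.checkpoint_store) >= self.store_capacity:
            self.checkpoint()
        # Create the entry if absent, then write the data into it.
        self.checkpoint_store[address] = data

    def read(self, address):
        # Satisfy the read from the checkpoint store when an entry exists,
        # otherwise from main memory.
        if address in self.checkpoint_store:
            return self.checkpoint_store[address]
        return self.main_memory[address]

    def checkpoint(self):
        # Save processor state, then transfer pending writes to main memory.
        self.saved_cpu_state = copy.deepcopy(self.cpu_state)
        for address, data in self.checkpoint_store.items():
            self.main_memory[address] = data
        self.checkpoint_store.clear()

    def rollback(self):
        # Discard writes made since the last checkpoint and restore state.
        self.checkpoint_store.clear()
        self.cpu_state = copy.deepcopy(self.saved_cpu_state)


mem = CheckpointedMemory(size=8, store_capacity=2)
mem.write(3, 42)
assert mem.read(3) == 42 and mem.main_memory[3] == 0  # buffered, not committed
mem.checkpoint()
assert mem.main_memory[3] == 42                       # committed
mem.write(3, 99)
mem.rollback()
assert mem.read(3) == 42                              # uncommitted write discarded
```

In this model main memory always holds the state as of the last checkpoint, which is why a rollback only needs to clear the checkpoint store and restore the saved processor state. The periodic trigger for `checkpoint()` and the delayed handling of I/O operations are left out of the sketch.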
RELATED APPLICATION
[0001] The subject matter of this application is related to the subject matter in a co-pending non-provisional application by the same inventor(s) as the instant application and filed on the same day as the instant application entitled, “Method and Apparatus for Storing Prior Versions of Modified Values to Facilitate Reliable Execution,” having serial number TO BE ASSIGNED, and filing date TO BE ASSIGNED (Attorney Docket No. SUN-PS947-RSH).