This invention relates to apparatus and techniques for achieving fault tolerance in computer systems and, more particularly, to techniques and apparatus for establishing and recording a consistent system state from which all running applications can be safely resumed following a fault.
“Checkpointing” has long been used as a method for achieving fault tolerance in computer systems. It is a procedure for establishing and recording a consistent system state from which all running applications can be safely resumed following a fault. In particular, in order to checkpoint a system, the complete state of the system, that is, the contents of all processor and I/O registers, cache memories, and main memory at a specific instant in time, is periodically recorded to form a series of checkpointed states. When a fault is detected, the system, possibly after first diagnosing the cause of the fault and circumventing any malfunctioning component, is returned to the last checkpointed state by restoring the contents of all registers, caches and main memory from the values stored during the last checkpoint. The system then resumes normal operation. If inputs and outputs (I/Os) to and from the computer are correctly handled, and if, in particular, the communication protocols being supported provide appropriate protection against momentary interruptions, this resumption from the last checkpointed state can be effected with no loss of data or program continuity. In most cases, the resumption is completely transparent to users of the computer.
Checkpointing has been accomplished in commercial computers at two different levels. Early checkpoint-based fault-tolerant computers relied on application-directed checkpointing. In this technique, one or more backup computers were designated for each running application. The application was then designed, or modified, to send periodically to its backup computer all state information that would be needed to resume the application should the computer on which it was currently running fail in some way before the application was able to establish the next checkpoint.
This type of checkpointing could be accomplished without any specialized hardware, but required that all recoverable applications be specially designed to support this feature, since most applications would normally not write the appropriate information to a backup computer. This special design placed a severe burden on the application programmer not only to ensure that checkpoints were regularly established, but also to recognize what information had to be sent to the backup computer. Therefore, in general, application-directed checkpointing has been used only for those programs that have been deemed especially critical and therefore worth the significantly greater effort required to program them to support checkpointing.
System-directed checkpointing has also been implemented in commercial computer systems. The term “system-directed” refers to the fact that checkpointing is accomplished entirely at the system software level and applications do not have to be modified in any way to take advantage of the fault-recovery capability offered through checkpointing. System-directed checkpointing has the distinct advantage of relieving the application programmer of all responsibility for establishing checkpoints. Unfortunately, its implementation has been accomplished through the use of specialized hardware and software, making it virtually impossible for such systems to remain competitive in an era of rapidly advancing state-of-the-art commodity computers.
More recently, techniques have been disclosed for achieving system-directed checkpointing on standard computer platforms. These techniques, however, all require either specialized plug-in hardware components or else modifications to the operating system kernel. The plug-in components intercept either reads from memory or writes to memory so that the information needed to establish a checkpoint can be made available to the checkpointing software. This procedure suffers from the fact that the intercepting hardware introduces additional delays in the processor-to-memory path, making it difficult to meet the increasingly tight timing requirements for memory access in state-of-the-art computers. This problem can be circumvented if the operating system kernel is modified to enable certain memory writes to be interrupted momentarily so that either the pre-image of the addressed section of memory, or the address itself, can be captured and recorded elsewhere in memory. The problem with this approach is that it can be implemented only on systems having operating systems that have been so modified.
Additional features are embedded in an otherwise standard memory controller enabling it to support a number of different system-directed checkpoint strategies. Moreover, subsets of these features can support each of the various strategies individually. In particular, in the simplest embodiment of the present invention, the features embedded in the controller enable it to store, into a buffer located either in a dedicated region of main memory or in a designated I/O device, the address of each block of memory being written to and, optionally, a copy of the data being written. In addition, it is also given the ability, under explicit command, to handle all accesses to memory from any I/O device in a non-standard way that prevents checkpointed data from being corrupted and prevents protected data from being inadvertently released. These enhancements, along with the appropriate software support, make it possible to capture and retain the computer state at each checkpoint by flushing all of the modified contents of each processor's cache to main memory and then transferring the memory blocks that have been modified since the last checkpoint either to a local shadow memory or over an I/O communication link to a backup computer, and to restore the checkpointed state following a fault.
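Purely as an illustration of the block-capture behavior just described (and not as a definition of the claimed apparatus), the following C sketch models the controller's action on a block write; the structure and field names (mem_ctrl_t, addr_buf, buf_count, and so on), the block size, and the buffer capacity are assumptions introduced here for clarity only.

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   64u      /* assumed block size: one cache line           */
#define BUF_ENTRIES  4096u    /* assumed capacity of the address buffer (119) */

typedef struct {
    uint64_t *addr_buf;       /* memory- or I/O-resident address buffer       */
    uint8_t  *data_buf;       /* optional data buffer (120)                   */
    uint32_t  buf_count;      /* next free buffer entry                       */
    int       capture_data;   /* nonzero if the written data is also copied   */
    int       checkpoint_enabled;
} mem_ctrl_t;

/* Invoked, conceptually, on every write of a memory block. */
static void on_block_write(mem_ctrl_t *mc, uint64_t block_addr, const void *data)
{
    if (mc->checkpoint_enabled && mc->buf_count < BUF_ENTRIES) {
        mc->addr_buf[mc->buf_count] = block_addr;                /* record the address  */
        if (mc->capture_data)                                    /* optionally the data */
            memcpy(mc->data_buf + (size_t)mc->buf_count * BLOCK_SIZE, data, BLOCK_SIZE);
        mc->buf_count++;
    }
    /* the write itself then proceeds to main memory in the normal way */
}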
In a slightly more complex embodiment, the controller is also given the ability, under explicit command, to access those blocks in order to transfer their contents, along with their associated addresses, to a local shadow memory or to a remotely located backup computer.
In another embodiment of the invention, the controller is further embedded with features that enable it to store the relevant memory addresses into a main-memory-resident buffer in response to any of the following processor bus operations: read with intent to modify, read with exclusive ownership, and cache-line invalidation. This added capability can be used to eliminate the need to flush the processors' caches to establish a checkpoint.
In still another embodiment of the invention, a bit-map memory (or alternatively, an interface to an external bit-map memory), containing one bit for each main-memory block, is integrated into the memory controller. This bit-map memory offers advantages when used with any of the aforementioned enhancements by eliminating the need to copy more than once blocks having the same memory address. A second bit-map memory is also added in a further enhancement in accordance with the present invention. With two bit-map memories, blocks can be copied in the background, while normal processing continues, without the need for a buffer for storing modified data blocks. A bit is set in one of the bit maps whenever the corresponding main memory block address has been stored in the address buffer, and reset in the second bit map, which reflects the buffer state as of the last checkpoint, when the corresponding block has been copied to the shadow memory. Following each checkpoint, the roles of the two bit-maps are reversed. For this embodiment of the invention, the memory controller must also be enhanced so as to delay writes to memory blocks that are scheduled to be copied to shadow memory, as indicated in the relevant bit map, but have not yet been copied, until that copy can be effected. Alternatively, in yet another embodiment of the invention, the two bit-map memories can be used to enable a locally resident shadow memory to be kept in a state reflecting the most recent checkpoint without the need for any main memory blocks whatsoever to be copied from one location to another. In this case, checkpoints can be established simply by flushing the processor caches and reinitializing the bit maps.
In all of these embodiments of the invention, the write-address-buffering technique used for remote checkpointing can also be used in a clustered environment with each computer effectively serving as the unique backup for one other computer in the cluster.
All of the preceding embodiments of the invention require the existence of a shadow memory, either locally or in a second computer. Another embodiment of the invention, however, allows local checkpointing to be accomplished without the need for a shadow memory. In this case, additional logic is embedded in the memory controller that, on each memory write, delays the write until the memory block being accessed is copied to a main-memory-resident data buffer and its associated address to a main-memory-resident address buffer. Checkpointing is then accomplished simply by flushing the processors' caches. Memory-to-memory copies are needed only in the event of a fault, in which case fault recovery entails halting I/O-initiated writes to main memory and copying the buffered data back from the buffer to the corresponding main-memory locations in last-in, first-out order. This enhancement can also be combined with the aforementioned processor bus snooping capability to obviate the need to flush the processor caches and, independently, with the integrated bit map to eliminate the need to intervene in a write to any given memory block more than once during any checkpoint interval.
All of the aforementioned memory controller enhancements enable checkpointing techniques to be realized using otherwise standard hardware platforms running standard operating systems. As a consequence, when these techniques are used in conjunction with the checkpointing and rollback procedures described in U.S. Pat. No. 6,622,263, standard computers can be rendered fault tolerant without requiring the major hardware and software modifications normally associated with fault-tolerant computers. All applications receive the benefit of fault tolerance without having to be modified in any way.
The above and further advantages of the invention may be better understood by referring to the following description in conjunction with the accompanying drawings in which:
Two of the figures are modified versions of earlier figures;
another figure is a flowchart illustrating the procedure performed by the memory controller to copy the modified data blocks identified using a previously described procedure;
another is a flowchart showing the procedure performed by the memory controller to establish a checkpoint when the previously described procedures are used;
FIGS. 11a and 11b are flowcharts illustrating the procedures executed by the memory controller to support checkpoint and rollback operations, respectively, when the block-state-labeling method is used to implement local post-image checkpointing; and
FIGS. 13a and 13b are flowcharts showing the procedures executed by the memory controller to support checkpoint and rollback operations, respectively, in support of pre-image checkpointing.
Several embodiments of the invention are described. All of these embodiments can be implemented with the same enhanced memory controller since the required logic elements are similar for each of them. The different embodiments will be described separately, however, since none of them requires the full complement of enhancements. All of the required enhancements can be easily implemented using standard procedures by anyone knowledgeable in the state of the art and, with the possible exception of those embodiments utilizing integrated memory, represent a small increment in the complexity of the logic already present in existing memory controllers.
The checkpointing strategies implemented by these various embodiments fall into two general categories. The first is referred to as “post-image” checkpointing and requires the existence of a shadow memory located either in the subject computer itself, hereafter called the “primary” or “protected” computer, or in a second computer called the “backup” or “remote” computer. In either case, the shadow memory is updated at the conclusion of each checkpoint interval to reflect the state of the primary computer at that instant in time. If the shadow memory is in a backup computer, a strategy referred to as “remote” checkpointing, the updating process preferably involves first copying any shadow updates to a buffer in the backup and from there to the shadow memory. Handling the updates in this manner guarantees that the shadow does indeed represent a consistent checkpoint state even if the primary fails while the updates are being transferred. If the shadow memory is located in the primary computer, a strategy called “local” checkpointing, such precautions are unnecessary because any failure that would prevent the copying process from being resumed would presumably be fatal in any case. Nevertheless, local checkpointing is attractive since it has been shown to provide a high degree of resilience to faults caused both by software bugs and by hardware transient events and since these two types of events together account for a large majority of computer crashes.
The second checkpointing strategy, “pre-image” checkpointing, does not require a shadow memory and is applicable only to local checkpointing. In this case, the pre-image of any memory block is captured before it is allowed to be modified following a checkpoint and stored in a buffer location along with its address. The recovery process following a fault then entails copying the pre-images, i.e., the memory images that prevailed at the time of the last successful checkpoint, back to their original locations in main memory, thereby restoring the system state that existed at the time of that checkpoint.
It should be noted that all system-level checkpointing strategies rely on the assumption that the entire state of the system is captured at each checkpoint. This requires the processors in a multiprocessor system to rendezvous when it is time to establish a checkpoint and for each of them to force its state onto the appropriate memory stack and possibly, depending on the particular embodiment of the invention being implemented, to flush the modified contents of its caches out to main memory. In addition, sufficient state must be retained in main memory to ensure that I/O operations can be restarted correctly following a fault. These requirements can be satisfied through the use of separate I/O processors or through other procedures discussed in detail in U.S. Pat. No. 6,622,263. Similarly, the rollback and recovery procedures discussed in that patent are identical to those assumed here. The focus of this disclosure is on an apparatus and associated procedure for enabling the relevant contents of main memory to be captured at each checkpoint and either retained until the next checkpoint for use, in the event of a fault, to restore memory to its last checkpointed state, or else used to maintain a shadow memory in a state identical to the state of main memory at the time of the most recent checkpoint and, in either case, to do so with minimum modifications to an otherwise standard computer.
Regardless of how it is implemented, however, the memory control unit 112 contains the logic needed to communicate between main memory and the processors and I/O control units. The memory control unit typically implements a number of features that are of particular interest in the present invention.
The present disclosure entails no physical modification to this generic architecture other than the memory controller enhancements to be described here. In some embodiments of the invention, it requires a small segment of main memory (113) to be partitioned off and used as an address buffer (119), and in other embodiments, it also requires a second segment of memory to be partitioned as a data buffer (120). In all embodiments, the required memory controller enhancements include the ability to implement certain memory-access and data-transfer sequences to be described, either autonomously after being commanded to do so by one of the processors, or under step-by-step processor control. In support of these activities, the memory controller is also enhanced with a status register containing status bits that can be individually set by the processors to command certain controller operations and read by the processors to determine when these various operations have been completed. Some of these status bits can also be set or reset by the controller itself to indicate when certain operations have been completed. These status bits can either be monitored by the processors or, preferably, at the time they are set or reset, cause the memory controller to generate an interrupt to the processors informing them of that fact.
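As an illustration only, such a status register might be laid out as in the following C sketch; the specific bit names, positions, and accessor functions are assumptions, not a layout required by the invention.

#include <stdint.h>

/* assumed bit assignments within the controller's status register */
#define CKPT_ENABLED        (1u << 0)  /* set by a processor to activate the enhancements       */
#define CKPT_MODE           (1u << 1)  /* set while the controller operates in checkpoint mode  */
#define CKPT_COPY_COMPLETE  (1u << 2)  /* set by the controller when buffered blocks are copied */
#define FAULT_MODE          (1u << 3)  /* entered on fault detection, exited after recovery     */
#define BUF_NEAR_FULL       (1u << 4)  /* raised when the address buffer nears capacity         */

static inline void ctrl_set_bits(volatile uint32_t *status_reg, uint32_t bits)
{
    *status_reg |= bits;               /* processors command operations by setting bits */
}

static inline int ctrl_test_bits(const volatile uint32_t *status_reg, uint32_t bits)
{
    return (*status_reg & bits) != 0;  /* ...and poll them, or await an interrupt, to see when operations finish */
}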
In all of the local checkpoint embodiments of the invention, the memory controller is also enhanced so as to support a “fault mode” of operation. The controller is commanded by one or more processors to enter fault mode immediately upon detection of a fault and remains in fault mode until explicitly commanded to exit that mode of operation. When in fault mode, the controller continues to respond to I/O-initiated memory accesses in the normal way, using normal handshake protocols, but no data written to memory is actually stored in memory and, at least during pre-image restoration, all data read from memory is either read from the same, previously initialized, memory location, regardless of the memory location being addressed, or else is simply replaced by a string of zeros. This is to ensure that memory is not corrupted with I/O data while it is being restored to the state that existed at the time of the last successfully established checkpoint and that no protected data is inadvertently transmitted to an I/O device before memory restoration is completed and I/O activity can be restarted following recovery.
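A minimal sketch of this fault-mode behavior follows, assuming a simple memory-mapped model in which main_memory and ctrl_status are ordinary variables and the FAULT_MODE bit follows the illustrative layout given earlier; the function names are hypothetical.

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  64u
#define FAULT_MODE  (1u << 3)          /* illustrative bit position, as above */

extern uint8_t  main_memory[];         /* stand-in for physical memory        */
extern uint32_t ctrl_status;           /* stand-in for the status register    */

/* I/O-initiated write: normal handshake, but nothing is stored in fault mode. */
static void io_write_block(uint64_t addr, const void *data)
{
    if (ctrl_status & FAULT_MODE)
        return;
    memcpy(main_memory + addr, data, BLOCK_SIZE);
}

/* I/O-initiated read: protected data is never released while in fault mode. */
static void io_read_block(uint64_t addr, void *out)
{
    if (ctrl_status & FAULT_MODE)
        memset(out, 0, BLOCK_SIZE);    /* or the contents of one fixed, pre-initialized block */
    else
        memcpy(out, main_memory + addr, BLOCK_SIZE);
}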
Finally, since it may be desirable to suppress the enhancements described herein in cases in which checkpointing is not needed or not feasible for other reasons, the enhanced controller features are activated only after a processor sets a “checkpoint-enabled” status bit and are deactivated when this bit is reset.
In the following description of the various embodiments of the invention, the term “memory block” or simply “block” will be used repeatedly. This refers to a fixed-size segment of memory. At minimum, its size is the smallest segment of memory that can be modified in one operation, typically a cache line. It can, however, be as large as a memory page or even larger. The most efficient size is a function of both the bus transfer parameters of the computer in question and of the specific embodiment of concern. The specific block size, however, is not material so far as the details of the various embodiments are concerned.
1) Post-Image Checkpointing Using a Memory-Resident Address Buffer
The simplest of the embodiments of the present invention implements a post-image checkpointing strategy and involves only a main-memory resident address buffer (119) and the memory controller enhancements needed to implement the flowchart shown in
In this embodiment, as well as in all subsequent post-image checkpointing embodiments, the controller may implement either only local or only remote checkpointing, or if designed to implement both (i.e., to support both memory-to-memory and memory-to-I/O transfers of backup data) it must contain a status bit through which either the checkpointing software or a hardwired input pin can inform it which strategy is being implemented.
In accordance with the flowchart in
When it is time to establish a checkpoint, the computer's processors rendezvous in the usual manner; each processor flushes its internal state and the contents of all its modified cache lines out to main memory. When they have completed flushing their caches, they again rendezvous and a designated processor sends a command to the memory controller placing it in checkpoint mode. The processors then cease normal program execution and either periodically poll a status register in the memory controller to determine when it has exited checkpoint mode or, alternatively, await an interrupt from the controller informing them of that fact, before resuming normal execution. Upon exiting checkpoint mode in the case of remote checkpointing, either one of the local processors or the controller itself sends a checkpoint-complete message to the backup computer so that it can recognize a boundary in its buffer indicating that all blocks received prior to this boundary can now be moved to the appropriate locations in the backup's shadow memory. Since, in some implementations, it may be possible in rare circumstances for the backup computer to experience a buffer overflow, caused by data generated during the current checkpoint interval arriving faster than data buffered during the previous checkpoint can be transferred to the shadow memory, standard flow-control protocols are used in such cases to halt the copying process, leaving the memory controller in checkpoint mode until the buffer is able to accept new data. To prevent a failure in the backup computer from causing excessive delays, processors in the protected computer monitor the amount of time spent in checkpoint mode and, if that time becomes excessive, reset the checkpoint-enable status bit, causing the controller to exit checkpoint mode and cease further checkpoint operations. Alternatively, if the remote buffer does overflow, the backup can signal the protected computer to transmit the contents of its entire memory to the backup shadow memory using standard protocols for remote checkpointing resynchronization.
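The processor-side sequence just described might look roughly as follows; rendezvous(), flush_state_and_cache(), and the other helpers are hypothetical stand-ins for platform-specific operations and are sketched here only to make the ordering concrete.

#include <stdint.h>

#define CKPT_MODE (1u << 1)                  /* illustrative status-bit layout, as above */

extern void     rendezvous(void);            /* all processors meet                      */
extern void     flush_state_and_cache(void); /* push registers and modified lines out    */
extern int      designated_processor(void);
extern void     ctrl_command(uint32_t bits); /* set controller status bits               */
extern uint32_t ctrl_read_status(void);
extern int      remote_checkpointing(void);
extern void     send_checkpoint_complete_to_backup(void);

void establish_checkpoint(void)
{
    rendezvous();
    flush_state_and_cache();
    rendezvous();                            /* wait until every processor has flushed   */

    if (designated_processor())
        ctrl_command(CKPT_MODE);             /* place the controller in checkpoint mode  */

    while (ctrl_read_status() & CKPT_MODE) {
        /* normal execution is suspended; an interrupt could replace this poll */
    }

    if (remote_checkpointing() && designated_processor())
        send_checkpoint_complete_to_backup();  /* marks a buffer boundary in the backup  */
}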
The decision to enter checkpoint mode is governed by a number of factors (e.g., elapsed time since the last checkpoint, pended synchronous I/O events, etc.) one of which may be the fact that the address buffer is approaching capacity. To prevent buffer overflow, the memory controller may either make the buffer-address register available to be read by the processors or, alternatively, may generate an interrupt when the buffer reaches a pre-defined fraction of its capacity. In the latter case, the fraction precipitating the interrupt is preferably settable by the checkpoint software since different applications may require different strategies.
The controller operations in checkpoint mode are shown in
While the operations in the previous paragraph are described as though the controller itself implements the control functions needed to carry them out, it should be apparent that they can equally well be implemented by one or more processors reading the successive addresses from the address buffer and effecting the copy through ordinary read and store operations. Implementing these functions in the memory controller, however, adds only modest complexity to the controller and can significantly reduce the amount of time needed to effect the data transfer.
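For instance, a software-only version of the copy step could be as simple as the following loop, in which a processor reads the captured addresses and performs the copies itself with ordinary loads and stores; the shadow_mem pointer and the assumption of a same-offset shadow region are illustrative only.

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64u

void copy_modified_blocks(uint8_t *main_mem, uint8_t *shadow_mem,
                          const uint64_t *addr_buf, uint32_t n_entries)
{
    for (uint32_t i = 0; i < n_entries; i++) {
        uint64_t blk = addr_buf[i];                           /* address captured during the interval */
        memcpy(shadow_mem + blk, main_mem + blk, BLOCK_SIZE);
    }
}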
2) Post-Image Checkpointing Using Expanded “Block-Capture” Operation
In a second embodiment of the invention, the definition of “block-capture operation” is expanded to include, in addition to write operations, any operation that indicates the possibility of a deferred write to main memory, e.g., in the case of the MESI cache-coherency protocol, read-with-exclusive-ownership, read-with-intent-to-modify, and cache-line-invalidate operations. With this change in definition, and with the proviso that all data must be recognized as shared data, both the normal-mode operation shown in
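A sketch of the expanded test appears below; the operation codes are illustrative rather than bus-accurate, and only the classification itself is being shown.

typedef enum {
    BUS_READ,
    BUS_WRITE,
    BUS_READ_WITH_INTENT_TO_MODIFY,   /* MESI: the line will be modified in a cache */
    BUS_READ_EXCLUSIVE_OWNERSHIP,
    BUS_CACHE_LINE_INVALIDATE
} bus_op_t;

/* Any operation that can lead to a deferred write is now a block-capture operation. */
static int is_block_capture_op(bus_op_t op)
{
    switch (op) {
    case BUS_WRITE:
    case BUS_READ_WITH_INTENT_TO_MODIFY:
    case BUS_READ_EXCLUSIVE_OWNERSHIP:
    case BUS_CACHE_LINE_INVALIDATE:
        return 1;
    default:
        return 0;
    }
}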
3) Post-Image Checkpointing Using I/O-Resident Address and Data Buffers
The memory-resident buffers required with the first of the two previously described implementations can be replaced with buffers in an external I/O device dedicated, or partially dedicated, to this purpose. If the address and data associated with the write operation are both simultaneously stored to an I/O buffer, and if the checkpoints are to be established in a remote computer, the previously described memory controller functions can be relegated instead to the I/O device itself. On any memory write, the memory controller also simultaneously relays the address and associated data to the I/O device. If the controller-to-I/O transfer rate is less than the controller-to-memory rate, however, the memory controller must be able to delay successive write operations to accommodate the reduced I/O rate.
The I/O device transfers the captured addresses and data to address and data buffers in the corresponding I/O device in the backup. This I/O device, in turn, uses standard direct-memory-access (DMA) techniques to transfer the data into the backup's main memory once it has been sent a command indicating that a checkpoint has been established. The need to halt processing while the copy is taking place can also be eliminated if the buffers in the I/O device are designed to accept new post-checkpoint data while also transferring the pre-checkpoint data to the backup computer. Checkpointing occurs as previously described, but once the processors have flushed their caches and signaled the I/O device that the checkpoint has been established, normal processing can resume. To prevent a buffer overflow in the I/O device, either: 1) the I/O device must have a readable status register by which the processors can monitor how near the buffers are to capacity; 2) the I/O device must be designed to generate a processor-visible interrupt indicating that capacity is being approached; or 3) the memory controller must implement either of these two functions, as previously described.
The need for cache flushing can be eliminated in this case as well if all operations that can result in a deferred write to main memory are included in the definition of “block-capture operations”. Since the memory locations corresponding to the captured addresses must all be read following each checkpoint using this approach, however, the checkpoint operations are essentially identical, regardless of whether they are implemented in the memory controller or in the I/O device.
4) Post-Image Checkpointing Using Two Memory-Resident Address and Two Memory-Resident Data Buffers
Another embodiment of the invention allows the data to be copied in background mode simultaneously with normal processing and without requiring a dedicated I/O device of the sort required for the previous implementation. To accomplish this, three more main-memory buffers are defined, a second address buffer (119) and two data buffers (120), with each data buffer entry equal in size to a memory block. To support these additional buffers, the memory controller contains a total of four hardwired or, preferably, settable, base address registers, each pointing to the initial location of one of the buffers, two counters, an end-count register and three additional bits in its status register. Subsequent addresses are determined, as before, by concatenating the contents of these base address registers with the contents of a counter. One counter is used for one address and data buffer pair and the second is used for the other. Since a data block is generally larger than an address, the counter contents are shifted to the left by the amount needed to account for this difference before being concatenated with the remainder of the address. The end-count register is used to hold the incremented content of the buffer address counter at each checkpoint. The three status bits, called the “current-buffer pointer”, the “checkpoint-complete” bit and the “checkpoint-copy-complete” bit, enable the controller to determine, among other things, which set of buffers is to be used for current write operations and which for copy operations. In particular, the exclusive-nor of the first and third of these status bits determines which set of buffers is currently being copied to the shadow location. As before, a fourth status bit, either hardwired or settable by software, informs controllers designed to support both local and remote checkpointing whether the shadow memory is located locally or in a backup computer.
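To make the addressing scheme concrete, the sketch below forms buffer-entry addresses by concatenating a base register with a counter, shifting the counter further for the larger data-buffer entries, and selects the buffer pair to copy via the exclusive-nor just described; the widths, shift amounts, and names are assumptions made only for illustration.

#include <stdint.h>

#define ADDR_ENTRY_SHIFT 3   /* 8-byte address entries                  */
#define DATA_ENTRY_SHIFT 6   /* 64-byte (one memory block) data entries */

static inline uint64_t addr_buf_entry(uint64_t addr_base, uint32_t count)
{
    return addr_base | ((uint64_t)count << ADDR_ENTRY_SHIFT);
}

static inline uint64_t data_buf_entry(uint64_t data_base, uint32_t count)
{
    /* the counter is shifted left further to account for the larger entry size */
    return data_base | ((uint64_t)count << DATA_ENTRY_SHIFT);
}

/* Which of the two buffer pairs is currently being copied to the shadow location:
 * the exclusive-nor of the current-buffer pointer and the checkpoint-copy-complete bit. */
static inline int copy_pair_index(int current_buffer_bit, int copy_complete_bit)
{
    return !(current_buffer_bit ^ copy_complete_bit);
}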
As shown in
Checkpointing is initiated as before, but is accomplished without having to wait for the modified data blocks to be copied. As shown in
In a slight variation on this embodiment, the two address and two data buffers can be combined into one circular buffer, with one counter (the buffer-address counter) indicating the next available buffer location to which addresses and data are to be stored and the second (the checkpoint counter) indicating the next buffer location from which addresses and data are to be copied to the backup location. In this case, the two counters point to different locations in the same address buffer and different locations in the same data buffer. The response to a write operation is again that depicted in
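In this unified circular-buffer variation, the two counters might interact as in the following sketch; the capacity and the stall-on-full policy shown are assumptions introduced here for illustration.

#include <stdint.h>

#define BUF_ENTRIES 4096u

typedef struct {
    uint32_t buffer_count;      /* next entry to be filled with an address/data pair  */
    uint32_t checkpoint_count;  /* next entry to be copied toward the backup location */
} circ_buf_t;

static int buf_full(const circ_buf_t *b)
{
    /* the producer may not overrun entries that have not yet been copied */
    return ((b->buffer_count + 1) % BUF_ENTRIES) == (b->checkpoint_count % BUF_ENTRIES);
}

/* Returns the slot for a new entry, or -1 if the hardware would stall the write. */
static int claim_entry(circ_buf_t *b)
{
    if (buf_full(b))
        return -1;
    int slot = (int)(b->buffer_count % BUF_ENTRIES);
    b->buffer_count = (b->buffer_count + 1) % BUF_ENTRIES;
    return slot;
}

static int copy_pending(const circ_buf_t *b)
{
    return b->checkpoint_count != b->buffer_count;   /* modified blocks still awaiting copy */
}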
When the shadow memory resides in a backup computer, however, no I/O event pended on checkpoint completion can be released until all memory blocks that were modified during the interval immediately preceding that checkpoint have been copied to the remote buffer. Before releasing those I/O operations, therefore, the processors wait for the checkpoint-copy-complete status bit to be set and, as with the checkpoint-mode status bit, are informed of that event either by polling or, preferably, through an interrupt.
Once the controller resets the checkpoint-copy-complete bit, the buffer copy routine can immediately begin copying the buffer currently being filled. This is illustrated in the flowchart in
If the checkpoint-copy-complete bit is set (612) and remote checkpointing is in effect (617), the copy operation can continue from the buffer currently being filled since the data blocks and addresses are copied to a buffer in the backup computer and are not moved to the backup's shadow memory until a checkpoint is declared by the protected computer. If the protected computer fails before the next checkpoint, the contents of the remote buffer that were copied to it after the last declared checkpoint are simply ignored. Thus, if the contents of the checkpoint counter and the current address counter are not equal, i.e., if there are modified blocks that have not yet been transferred to the remote buffer (618), the corresponding block identified by the checkpoint counter can be copied as previously described (614). In this case, since the checkpoint-copy-complete bit is set, the block is copied from the buffer currently being filled. The primary advantages of doing this are the reduction in the size of the local buffers and, since it reduces the interval between the time the protected computer establishes a checkpoint and the time the checkpoint-copy-complete bit is set, a potentially substantial reduction in the delay before checkpoint-pended I/O can be released. In the vast majority of cases, blocks will be copied immediately after they are modified, thereby reducing the time needed to establish a checkpoint to a minimum.
It should be noted that, if the memory controller is implemented to carry out these copying operations autonomously, this same controller functionality can be used in the backup computer, enabling it to support the concurrent loading of one buffer pair through DMA operations from the designated I/O device while it is moving data from the second data buffer to the addresses specified in its associated address buffer. In this case, the status bit used to distinguish between remote and local checkpointing is set to “local”. The I/O device, upon receipt of a checkpoint-copy-complete message, generates a processor-visible interrupt and sends data indicating the number of blocks that have been transferred since the last checkpoint-copy-complete message. The processor then loads this count into the memory controller's end-count register, resets its checkpoint-copy-complete bit, and toggles its base-address-register pointer. When operating thus in the backup computer, the memory controller copy routine remains as shown in
Further, it should be apparent that a memory controller can be implemented to provide the functionality needed for it to implement concurrently any combination of the operations described in the previous paragraphs, and, in particular, operations needed both to enable a computer to accept checkpoint data from a remote computer and to transmit its own checkpoint data to a remote backup computer. The number of registers and counters it would have to support, of course, has to equal the sum of those needed for each role. For example, if it is to support both roles simultaneously using two address registers and two data buffers for each role, it would have to support four address and four data registers. Other combinations, e.g., using two address and two data registers to support checkpointing its own data in combination with an I/O device that simultaneously implements the transfer of a remote computer's checkpoint data into its shadow memory, are also possible as are combinations of any of the previously described implementations with any of those that follow.
5) Post-Image Checkpointing Using a Bit-Map Memory
It should also be noted that the copying time resulting from any of the aforementioned embodiments of the invention using memory-resident buffers could be reduced somewhat by integrating the address buffers into the controller itself, thereby saving one external memory access on each transfer. A generally more efficient use of internal memory is possible, however, by integrating into the controller a memory segment containing a single bit for each memory block in physical memory. In all the previously described post-image checkpointing embodiments of the invention, memory blocks are copied to their backup locations in first-in, first-out (FIFO) fashion. That is, the first blocks to be modified are the first copied. This ensures that, in the event of multiple modifications to a given block, the last modification is the one that survives, overwriting any earlier modifications of that same block in the copying process. But the need to copy any given block more than once can be eliminated entirely by copying, instead, in last-in, first-out (LIFO) order and by setting a bit in the controller's integrated memory corresponding to each physical memory block copied. Prior to any copy, the controller then checks this bit-map to determine if the block has already been copied and, if it has, skips to the next (in this case, previous) address on the queue of addresses to be copied. Once all blocks have been copied, the controller's memory is cleared. The copying time in all of the previously described embodiments can be reduced somewhat using this procedure.
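One way the LIFO-with-bit-map copy might be expressed is shown below, purely as an illustration; the bit-map is modeled here as an ordinary byte array rather than controller-integrated memory, and all names are assumptions.

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64u

static int test_and_set(uint8_t *map, uint64_t block_no)
{
    uint8_t mask = (uint8_t)(1u << (block_no & 7));
    int was_set = (map[block_no >> 3] & mask) != 0;
    map[block_no >> 3] |= mask;
    return was_set;
}

void lifo_copy(uint8_t *main_mem, uint8_t *shadow_mem,
               const uint64_t *addr_buf, uint32_t n_entries, uint8_t *copied_map)
{
    for (uint32_t i = n_entries; i-- > 0; ) {                 /* last-in, first-out      */
        uint64_t blk = addr_buf[i];
        if (test_and_set(copied_map, blk / BLOCK_SIZE))
            continue;                                         /* already copied: skip it */
        memcpy(shadow_mem + blk, main_mem + blk, BLOCK_SIZE);
    }
    /* once all entries are processed, the bit map is cleared for the next interval */
}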
When this embodiment is used, however, the checkpoint procedure needs to be modified slightly as shown in
In addition, the copying routine shown in
If a unified buffer is implemented, the only difference is that only one base address register is used in step (622) and the checkpoint counter test involves comparing it with the contents of the prior-boundary register (628), a match indicating that the most recent versions of all relevant blocks have now been copied.
Note that, in contrast to the copy routine described in
6) Post-Image Checkpointing Using Two Bit-Map Memories
An alternative use of two integrated (or accessible external) single-bit-wide memories is possible if one is used as a bit-map showing which memory blocks have been modified since the last checkpoint and the second used to show which of the blocks that were modified prior to the last checkpoint have been copied to a local shadow memory or remote computer. In this case, background copying can be supported without any main-memory-resident address or data buffers. The memory controller routine needed to exploit this enhancement is shown in
On any memory access, the routine first checks to see if it is a block-capture operation (711), with the term “block-capture” as previously defined (i.e., either only a write operation or any of the operations that will potentially result in the modification of the block in question, including, of course, write operations). If it is not, the access is handled in the normal way (716). If it is, the controller sets the bit in the modified map corresponding to the addressed block (712) and checks whether the copy-complete bit has been set (713). If it has, the access is again handled in the normal way; if it has not, the routine checks the corresponding bit in the copy map (714). If the latter bit is set, then, depending on whether local or remote checkpointing is being supported, the controller copies the current contents of the block either to the local shadow or, along with its associated address, to the remote shadow buffer, and then resets the copy bit (715). Following that, or if the copy bit is not set, it again handles the access in the normal way (716).
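A hedged sketch of that access handler follows; the ctrl_t structure, the helper names, and the bit-map representation as byte arrays are assumptions standing in for the controller's internal logic.

#include <stdint.h>

#define CKPT_COPY_COMPLETE (1u << 2)   /* illustrative status-bit position */

typedef struct {
    uint8_t  *modified_map;   /* one bit per memory block, set since the last checkpoint */
    uint8_t  *copy_map;       /* blocks from the previous interval not yet copied        */
    uint32_t  status;
} ctrl_t;

static inline void set_bit(uint8_t *m, uint64_t b)        { m[b >> 3] |= (uint8_t)(1u << (b & 7)); }
static inline void clear_bit(uint8_t *m, uint64_t b)      { m[b >> 3] &= (uint8_t)~(1u << (b & 7)); }
static inline int  test_bit(const uint8_t *m, uint64_t b) { return (m[b >> 3] >> (b & 7)) & 1; }

/* Stand-ins for operations described elsewhere in this disclosure. */
extern void copy_block_to_shadow(ctrl_t *c, uint64_t block_no);
extern void handle_normally(ctrl_t *c, uint64_t block_no, int op);
extern int  is_block_capture_op(int op);

static void handle_memory_access(ctrl_t *c, uint64_t block_no, int op)
{
    if (!is_block_capture_op(op)) {              /* ordinary access */
        handle_normally(c, block_no, op);
        return;
    }
    set_bit(c->modified_map, block_no);          /* record the (potential) modification */

    if (!(c->status & CKPT_COPY_COMPLETE) &&     /* previous interval still being copied */
        test_bit(c->copy_map, block_no)) {
        copy_block_to_shadow(c, block_no);       /* preserve the checkpointed image first */
        clear_bit(c->copy_map, block_no);
    }
    handle_normally(c, block_no, op);
}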
A flowchart of the copying routine implemented by the memory controller to support this embodiment of the invention is shown in
The controller routine needed to commit a checkpoint in this embodiment is depicted in the flowchart in
Note that this last action restarts the scan for modified, but not yet copied, memory blocks even though, in the case of remote checkpointing, many of the modified blocks may already have been copied. Since the main memory will, in general, contain a large number of blocks and since the vast majority of those blocks will not have been modified since the last checkpoint, it is preferable, with this embodiment of the invention, for a number, say 32 or 64, of copy-map bits to be scanned simultaneously. If all bits are zero, as will typically be the case, the copy routine can immediately proceed to the next set without having to test each bit individually.
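A word-at-a-time scan of the copy map could look like the following; the 64-bit word width matches the suggestion above and is otherwise arbitrary, and the function name is an assumption.

#include <stdint.h>

/* Returns the number of the next block whose copy-map bit is set, searching
 * from block `start`, or `n_blocks` if no such block remains. */
uint64_t next_block_to_copy(const uint64_t *copy_map, uint64_t n_blocks, uint64_t start)
{
    uint64_t n_words = (n_blocks + 63) / 64;
    for (uint64_t w = start / 64; w < n_words; w++) {
        uint64_t word = copy_map[w];
        if (w == start / 64)
            word &= ~0ull << (start % 64);      /* ignore bits below `start` in the first word */
        if (word == 0)
            continue;                           /* 64 unmodified blocks skipped in one test    */
        for (unsigned b = 0; b < 64; b++)
            if (word & (1ull << b))
                return w * 64 + b;
    }
    return n_blocks;
}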
7) Checkpointing Using a Block-State Memory
Even greater efficiencies can be realized with a bit-map memory containing two bits for each memory block in physical memory when checkpointing is directed to a local shadow memory. In this case, the need for memory-to-memory copies for checkpointing purposes can be eliminated entirely if, on each memory access, the controller checks the state of its internal memory location corresponding to the block being accessed and directs the access to either of two main memory locations in accordance with that state. In this embodiment, the computer's primary and shadow memories are no longer fixed physical locations; rather, either of two physical locations can be the primary location at any given time while the other retains the state of the system that existed at the time of the last checkpoint. The algorithm used by the controller to determine which is which is shown in
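The referenced figure defines the actual algorithm and is not reproduced here; as a hedged sketch of one way such per-block state could be used (not necessarily the method defined by that figure), consider the following, in which the state encoding, the flip-on-first-write policy, and all names are assumptions made for illustration.

#include <stdint.h>

/* Two bits of state per logical block: which physical copy is current, and
 * whether the block has been written since the last checkpoint. */
typedef struct {
    unsigned primary_is_b : 1;
    unsigned modified     : 1;
} block_state_t;

/* Translate a logical block number into the physical copy currently holding its data. */
static uint64_t physical_block_addr(const block_state_t *state, uint64_t blk,
                                    uint64_t region_a_base, uint64_t region_b_base,
                                    uint64_t block_size)
{
    uint64_t base = state[blk].primary_is_b ? region_b_base : region_a_base;
    return base + blk * block_size;
}

/* On the first write to a block after a checkpoint, redirect the write to the
 * other physical copy so that the copy holding the checkpointed image is left
 * untouched.  This assumes a write always replaces the whole block, as it does
 * when the block is one cache line. */
static void redirect_first_write(block_state_t *state, uint64_t blk)
{
    if (!state[blk].modified) {
        state[blk].primary_is_b ^= 1;
        state[blk].modified = 1;
    }
}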
To realize this embodiment, the memory controller implements the flowchart shown in
When a checkpoint is declared, the controller is sent a command to enter into checkpoint mode following which it sets the checkpoint-mode status bit and executes the routine shown in the flowchart in
When it is necessary to institute a rollback, the controller executes the routine shown in
Note that rollback mode and fault mode are two different things, the former subsumed by the latter. As previously stated, the controller is commanded to enter fault mode immediately on the discovery of a fault and remains in that mode until recovery is completed, handling I/O accesses as described above. Rollback mode in this instance simply forces the memory controller to execute the rollback routine depicted in the flowchart in
8) Pre-Image Checkpointing
Memory-controller enhancements of the sort described in the previous paragraphs can also be used to implement pre-image checkpointing. In this case, a partition of main memory is used to buffer the pre-images of any blocks that are modified following the establishment of each checkpoint and a second partition used to store the physical addresses of those blocks. Following each checkpoint, the buffers are effectively cleared by zeroing out the buffer counter and the process starts anew. If a fault is detected, the contents of the data buffer accumulated since the last checkpoint are copied back to the locations indicated by the corresponding addresses in the address buffer.
The procedure implemented by the memory controller to accomplish this is shown in
Once the data block and associated address are copied to the buffers, the buffer addresses are both incremented to point to the next available location (1213) and the controller then carries on in the normal way (1214), executing the standard memory access procedures and bus protocols. For purposes of discussion, it is assumed that the buffer addresses are generated as previously described using one counter, here called the checkpoint counter, concatenated with either hard-wired or settable base registers.
To effect a checkpoint, the processors rendezvous in the usual way, save their states and, if required, flush their caches, and then command the memory controller to enter checkpoint mode
Following a fault, the controller is, as always, first put in fault mode and then into rollback mode. In rollback mode, it executes the procedure shown in
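As an illustration of the pre-image mechanism as a whole (capture, checkpoint, and rollback), the following sketch models the two buffers as ordinary arrays; the sizes, names, and the software-level framing are assumptions, since the disclosure places this logic in the memory controller.

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  64u
#define BUF_ENTRIES 4096u

typedef struct {
    uint64_t addr[BUF_ENTRIES];              /* main-memory-resident address buffer */
    uint8_t  data[BUF_ENTRIES][BLOCK_SIZE];  /* main-memory-resident data buffer    */
    uint32_t count;                          /* checkpoint counter                  */
} preimage_buf_t;

/* Called, conceptually, before a block is allowed to be modified. */
void capture_preimage(preimage_buf_t *b, const uint8_t *main_mem, uint64_t blk_addr)
{
    if (b->count < BUF_ENTRIES) {
        b->addr[b->count] = blk_addr;
        memcpy(b->data[b->count], main_mem + blk_addr, BLOCK_SIZE);
        b->count++;
    }
    /* a full buffer would force an early checkpoint in a real implementation */
}

/* Establishing a checkpoint simply discards the accumulated pre-images. */
void commit_checkpoint(preimage_buf_t *b) { b->count = 0; }

/* Rollback: restore pre-images in LIFO order so the checkpoint-time image of
 * each block is the one that finally lands in main memory. */
void rollback(preimage_buf_t *b, uint8_t *main_mem)
{
    while (b->count > 0) {
        b->count--;
        memcpy(main_mem + b->addr[b->count], b->data[b->count], BLOCK_SIZE);
    }
}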
As with post-image buffering, the possibility of copying to the same main-memory location more than once can be eliminated by implementing a small memory having one bit for every physical block in main memory. In this case, the corresponding bit is inspected before any block is copied to the buffer.
This application is related to, and claims priority of, U.S. provisional application Ser. No. 60/640,356, filed on Jan. 3, 2005, by Jack J. Stiffler and Donald Burn.