The invention relates generally to managing the volatile storage of information and more particularly to assessing the integrity of executable code stored in volatile memory without significantly adding to the cost or the processing overhead of the system in which the volatile memory resides.
As integrated circuit geometries continue to decline, circuits become more susceptible to both hard errors and soft errors. Hard errors are introduced by contaminants that physically damage the circuitry. For example, a hard error may occur when a particle is lodged in the gate oxide of a dynamic random access memory (DRAM) cell during fabrication. Typically, hard errors require that the entire integrated circuit be discarded. On the other hand, soft errors are transient in nature. In a DRAM, a soft error is an error in memory content that can be corrected by a fresh writing to the DRAM.
In addition to the decline in integrated circuit geometries in order to increase circuit density, reductions in power supply voltages increase the susceptibility of circuits to soft errors. A DRAM cell is able to store a charge in a capacitor, with the charged capacitor representing a logic “one” and a discharged capacitor representing a logic “zero.” A lower supply voltage translates into a smaller capacitor charge, so there is a greater likelihood that the charge will be unintentionally depleted to a level at which it will be mistakenly read as logic “zero.”
Another factor that affects the susceptibility of a circuit to soft errors relates to the operation of the device in which the circuit resides. Some devices must retain data in internal memories for extended periods of time between actual uses. If the executable code is stored in volatile memory during the periods of inactivity, soft errors in the executable code will accumulate and be undetected until the next use. In a DRAM cell, charge leakage may be the result of bombardment by radiation, such as alpha particles. Alpha particle radiation is generated as a consequence of decay of trace radioactive elements in the packaging material of the integrated circuit chip. With extended periods of inactivity, the adverse effects of such decay are more likely to cause system failures when the corrupted executable code is finally run.
Two examples of devices which are often operated with extended periods of inactivity between peak periods of high use are printer controllers and handheld computing devices.
In operation, the cache SRAM 24 operates as conventional on-chip cache. The cache memory controller 22 manages the use of the cache SRAM 24 to free and fill space for instructions that are likely to be requested (i.e., fetched) by the processor 14. The embedded SRAM 28 and DRAM 26 may be managed by the second memory controller 30 to handle larger blocks of information for possible use by the processor. A printer having the ASIC is able to format incoming data in a page description language (PDL) and execute the instructions of a PDL interpreter program from cache, while the PDL program as a whole is stored in off-chip memory 12, which is typically memory of a host computer, such as a desktop computer.
There are a wide number of variations of ASICs that may be used in a printer, but each such circuit is likely to be operated in conditions in which it is “idle” for long periods between uses. It is during these idle times that soft errors are most likely to occur in the storage of executable code and/or data. Techniques for reducing such occurrences are known. For example, there are integrated circuit packaging materials that contain low levels of impurities that decay to generate alpha particles. By reducing the number of alpha particles that are generated, the statistical probability of a soft error is reduced. However, the cost of the materials is significantly greater than the cost of conventionally used packaging material. Thus, this solution increases the cost of integrated circuit chips.
Circuitry may be added to the device to provide detection of soft errors while the code is being run. One known scheme is to incorporate parity detection circuitry for detecting single and/or multiple bit errors. However, there is significant circuitry overhead that is added in providing the parity bit or bits. The additional circuitry increases the cost of the device. Another approach is to provide error-correction circuitry (ECC). ECC may be used to detect and correct single and/or multiple bit errors. The concerns with this approach are that there is an even greater increase in circuitry overhead that adds to the cost and there is additional complexity that can result in decreased performance in the SRAM/DRAM data path. The memory controller logic may become more complicated with the handling of extra processing states that may be required by the “repair” feature of the ECC.
What is needed is a low cost method and device for providing recovery from memory errors, particularly soft memory errors in executable code stored in volatile memory.
Managing volatile storage of information that is processed during operation of a device having extended periods of inactivity between periods of activity includes systematically checking the contents of volatile memory between periods of activity. In one application, the information is executable code for operating the device and the check of volatile memory is initiated upon detection of passage of a preselected time period. The volatile memory is random access memory that is particularly susceptible to soft memory errors, such as DRAM and SRAM. By limiting the volatile memory checking to times in which the device is inactive, there is no performance penalty resulting from the check. Moreover, by limiting the occurrences of volatile memory checking, the effect on power consumption can be managed. For example, the checking may be limited to once per hour or once per day. As an alternative to time-based checking, event-based checking can be utilized, such as initiating the volatile memory check of executable code immediately prior to use of the device following an extended period of inactivity.
In one embodiment of the invention, the system of managing volatile memory is implemented within an integrated circuit, such as an application specific integrated circuit (ASIC). The integrated circuit includes a processor, embedded volatile memory, and an integrated self-tester capability. The self-tester capability includes stored test code that is specific to detecting errors within the information (e.g., executable code) residing within the volatile memory. The stored test code includes instructions which implement memory testing routines. The test code may be stored within embedded non-volatile memory of the integrated circuit.
The self-testing capability also includes a module for triggering the execution of the stored test code. In the time-based embodiment, the module is a timing module which is coupled to an embedded clock or an external clock. The timing may identify times of day or may be based upon time increments since the last period of activity, such as initiating the test routines once per hour between periods of activity. Preferably, the memory checking capability is inhibited during periods of device activity.
The integrated circuit also includes a recovery module that is responsive to the self-testing capability to induce an operational sequence that transfers fresh information to the input of the volatile memory when the code error condition is detected. The memory checking may consist of a cyclic redundancy check (CRC) or a checksum on the entire memory space of the volatile memory. Alternatively, if sufficient non-volatile memory is available, a redundant copy of the entire volatile memory space can be stored and a “compare” routine may be run. Error correction is a possibility in the case that a code error condition is detected, but the preferred method is one in which the device reinitializes itself from the non-volatile memory and the information (e.g., executable code) is reloaded to the volatile memory. While not critical, the recovery module may include a “watchdog” that resets the system if a fatal error results in the checking routines being unintentionally terminated (such as by a system-wide “hang”).
In addition to applications to ASICs, the techniques for managing the volatile storage of information may be applied to applications in which the volatile memory is external to the integrated circuit chip that includes the processor. Moreover, the techniques may be applied to other “semi-permanent” memory types that are potentially susceptible to soft errors, including flash memory, random access memory and magnetic memory.
As previously noted, an advantage of the invention is that because the volatile memory checking only occurs when the system is inactive (“idle”), there is no performance penalty that is imposed by the implementation of the invention. While the invention is particularly suited for use in checking soft errors introduced into executable code, the techniques may be applied to data sections of the volatile memory, if desired. Another advantage of the invention is that the timing of the error checking routines can be set so as to impose minimal additional power consumption. In the embodiment in which the error checking occurs within an ASIC of a printer, the power consumption is typically not a significant concern. However, for a handheld computing device or a cellular telephone, the low-power feature is significant.
With reference to
A printer 34 typically includes both volatile memory and memory. “Volatile memory” is defined herein in the conventional manner as memory in which data is lost when power to the memory is terminated. On the other hand, non-volatile memory retains information in power-off conditions. In
The printer 34 may be an ink jet printer having a printhead 44, a carriage motor 46 that moves the printhead laterally, and a paper feed motor 48 that steps paper in a direction perpendicular to the path of the printhead-supporting carriage. While not shown in
Referring now to
In one application of the invention, the self-testing for soft errors occurs as a function of time. Thus, a test initialization module 60 tracks time. Time increments may be based upon the time of day. For example, the self-testing may be initiated once per day at the same time of day or may be initiated once per hour. In a preferred embodiment, the timing is based upon the last use of the ASIC. Consequently, in the printer application, self-testing routines may be initiated after each increment of time starting from the termination of the previous print operation. The test initialization module 60 is connected to an external clock 62 via a bus 64, allowing the module to track time. Alternatively, a clock may be embedded within the ASIC to cooperate with the test initialization module in determining when self-testing should be initiated. While the test initialization module is shown as being a separate component within the ASIC 42, the term “module” is defined herein as encompassing circuit hardware, programming software, and a combination of circuit hardware and programming software.
The test initialization module 60 may also be event-based. That is, rather than triggering self-testing on the basis of timing, the module 60 may sense an event that causes the module to initiate the soft error testing routines. The most suitable event is the onset of a print operation. As another alternative to time-based testing, the basis for initiating the self-testing routines may be a combination of timing and event sensing. For example, initiating self-testing immediately before a print operation may be limited to occasions in which it has been an extended period of time (e.g., one hour) since the last print operation.
It should be noted that the timing is not based upon conventional DRAM refresh rates of the volatile memory. That is, the invention is distinguishable from conventional refresh operations. As another item of interest, the self-testing preferably is inhibited during print operations. Thus, the module 60 may include or may be connected to an interrupt controller that is coupled to the conventional memory controller 66 of the ASIC 42. Flags may be imposed which both ensure that print operations do not occur during self-testing and ensure that self-testing is not initiated during an ongoing print operation.
During normal use, the memory controller 66 and the DRAM 56 operate in a conventional manner. The on-chip processor 58 sends addresses to the memory controller which uses the addresses to determine which instructions of the DRAM-stored executable code are to be sent to the processor. While not shown in
The checking routines of the test code that resides within the ROM 68 are not critical to the invention. In one embodiment, the volatile memory check consists of calculating a cyclic redundancy check (CRC) or checksum of the entire code space of the DRAM 56. The CRC may generate a polynomial with terms fed back at various stages so as to detect errors within the executable code sent from the DRAM 56 to the processor 58. If there is sufficient available non-volatile memory, a redundant copy of the entire code space may be stored, allowing a “compare” routine. In some applications, it may be possible to provide error correction, but this is not critical.
A recovery module 70 is activated if a specific error condition is detected. The error condition may merely be a first detection that there is an error within the executable code. On the other hand, more error-tolerant systems may not be adversely affected by a single error, so that the recovery module is activated only if multiple errors are detected or if a system-threatening error is detected. Activation of the recovery module causes the device (e.g., the printer) to reinitialize itself from the non-volatile memory 40 and a request for a fresh firmware download is sent to the host. As an added feature, the recovery module 70 may include a “watchdog” capability in which the system is reset if the system freezes during a memory checking routine.
A second embodiment of the invention is shown in
The significant difference between the embodiment of
In order to more clearly describe the invention,
At step 76, the memory checking parameters are set. This includes establishing the conditions for initiating the checking routines. As previously noted, the conditions for initiation may be time based, event based (such as immediately prior to each use of the device in which the ASIC 42 resides), or a combination of timing and event occurrences.
As the device (e.g., the printer) is left in a power-on state, the self-testing capability will loop through determinations at step 78, until an affirmative response is made to the issue of whether the testing conditions have been met. Thus, if the testing conditions are merely time-based, the affirmative response occurs when the appropriate time has passed, such as one hour. In the preferred embodiment, the process then moves to a step 80 of determining whether the system is presently active, but step 80 is not critical. In the printer application, if a print operation is occurring at the time that the affirmative response occurs at step 78, the process loops back to the input at step 78. Thus, if the system is active, the self-testing will not be initiated at the specified time. In essence, it will be assumed that the system activity will provide evidence that the executable code is not corrupt. Alternatively, an affirmative response at step 80 will cause a loop back onto itself, so that the memory checking routines will be initiated at step 82 as soon as the system is inactive.
The memory checking may consist of calculating the CRC or checksum on the entire code space of the volatile memory being checked. As previously noted, if sufficient memory is available, a redundant copy of the entire code space may be generated and a “compare” routine may be run. In step 84, if it is determined that a specific error condition exists, the device may reinitialize itself from the non-volatile memory and a fresh set of executable code may be sent to the volatile memory. The device reinitialization is represented by step 86. The process then returns to the input of step 78.
While the invention has been described primarily with regard to use with on-chip volatile memory, the memory management techniques can be extended to stand-alone volatile memory chips and even to storage of information within similar types of “semi-permanent memory,” i.e., memory that is susceptible to losses in the form of soft errors. These other types of memory include ROM, flash memory, and magnetic memory.
Number | Name | Date | Kind |
---|---|---|---|
4757503 | Hayes et al. | Jul 1988 | A |
5414861 | Horning | May 1995 | A |
5532962 | Auclair et al. | Jul 1996 | A |
5619513 | Shaffer et al. | Apr 1997 | A |
5748640 | Jiang et al. | May 1998 | A |
6085334 | Giles et al. | Jul 2000 | A |
6128094 | Smith | Oct 2000 | A |
6415403 | Huang et al. | Jul 2002 | B1 |
6560733 | Ochoa | May 2003 | B1 |
6625749 | Quach | Sep 2003 | B1 |
Number | Date | Country | |
---|---|---|---|
20030131307 A1 | Jul 2003 | US |