The present invention generally relates to digital computers and more specifically to dynamic memory devices.
Memory devices subjected to elevated radiation environments, such as at high altitudes and in outer space, suffer from single event effect sensitivity, in which memory cell values can change as a result of being bombarded with radiation particles (i.e., upsets). Traditional solutions to this problem include error scrubbing, where each memory location is periodically read and checked for errors. If an error is found, the correct value is retrieved from a redundant source and written back into the memory location. Additionally, many memory devices, such as dynamic random access memory (DRAM), must be periodically refreshed for proper function. Refreshing and error scrubbing of memory devices each consume memory access time that could otherwise be used for applications that store data in the memory devices. Radiation hardened memory technologies, while less susceptible to single event effects, are typically less dense, requiring more devices to obtain the same storage capacity available from fewer non-hardened devices.
For the reasons stated above and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the specification, there is a need in the art for improved methods and systems for refreshing and error scrubbing memory devices.
Embodiments of the present invention provide methods and systems for refreshing and error scrubbing memory devices and will be understood by reading and studying the following specification.
In one embodiment, a method for implementing a refresh-error-scrubbing function for a memory device is provided. The method comprises sequentially reading data contained in a plurality of memory locations of a memory device, checking for errors in data stored in the plurality of memory locations, and correcting memory location data when an error is found. Every memory location within the plurality of memory locations is repeatedly read with a periodicity not exceeding a refresh-scrub-access-period time interval.
In another embodiment, a system for storing data is provided. The system comprises one or more dynamic memory devices and a memory management module coupled to the one or more memory devices. The memory management module is adapted to periodically perform a refresh-scrub function on the one or more memory devices such that memory locations within the one or more memory devices are error-scrubbed within a time required by a refresh-scrub-access-period time interval.
In still another embodiment, a system for storing data is provided. The system comprises means for storing data wherein the means for storing data stores data values in a plurality of memory elements, means for reading the data values stored in the plurality of memory elements, means for identifying errors in data values stored in the plurality of memory elements, and means for correcting data values when an error is found. The means for reading reads each of the plurality of memory elements with a periodicity not exceeding a refresh-scrub-access-period time interval.
In yet another embodiment, a computer-readable medium having computer-executable program instructions for a method for maintaining the integrity of data stored in dynamic memory devices is provided. The method comprises performing an error-scrubbing function on each memory element of a dynamic memory device having a plurality of memory elements at a periodicity not exceeding a refresh-scrub-access-period time interval.
The present invention can be more easily understood and further advantages and uses thereof more readily apparent, when considered in view of the description of the preferred embodiments and the following figures in which:
In accordance with common practice, the various described features are not drawn to scale but are drawn to emphasize features relevant to the present invention. Reference characters denote like elements throughout figures and text.
In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific illustrative embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical and electrical changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense.
Embodiments of the present invention increase the availability of memory devices for the use of applications by combining error scrubbing and memory refresh functions into a single activity. Error scrubbing functions must check each memory device location within some minimum interval based on the probability of occurrence of an error. This probability is a function of both memory device characteristics and environmental factors. If an error is found, the correct value is retrieved from a redundant source and written back into the memory location.
An error scrub rate for a memory device is chosen to ensure that upsets are discovered and corrected before the number of cumulative upsets reaches a point where they can no longer be fixed. Typically, for a given device operating in a given environment, a radiation effect analysis is used to determine the probability of an upset occurring and how often an upset can be expected. The error scrub rate is typically chosen such that the entire memory is scrubbed within the probability time period in which one upset is expected. The probability time period establishes the element error scrub interval within which error scrubbing a specific element (i.e., a device address) must be repeated. For example, where the probability time period for expecting one upset in a 1600 element device is 1.6 seconds, then the error scrub rate for that device is equal to 1600 (elements)/1.6 (seconds), or 1000 elements per second. The error scrub function must error-scrub at a rate of at least 1000 elements every second to satisfy the 1.6 second element error scrub interval.
Dynamic memory devices, such as DRAM, require refreshing the entire contents of the memory device within some minimum interval to prevent loss of the memory device's content. In one embodiment, the refresh rate for a memory device is determined based on the manufacturer's specified row refresh interval, which specifies the maximum time interval for repeating the refresh cycle to a specific row. The entire contents of a memory device must be refreshed within the time interval specified by the manufacturer to prevent loss of the memory device's content. For example, where the manufacturer's row refresh interval for a 6400 row memory device is 64 milliseconds, then the refresh rate for that device is equal to 6400 (rows)/64 (milliseconds), or 100,000 rows per second. A refresh function must perform a refresh at a rate of at least 100,000 rows every second to satisfy the 64 millisecond row refresh interval requirement.
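By way of illustration only, the two rate calculations described above can be sketched as follows. The function name required_rate is hypothetical and is not part of the specification; the sketch simply assumes that each unit (an element for error scrubbing, a row for refresh) must be visited once per interval.

```python
def required_rate(units: int, interval_s: float) -> float:
    """Minimum accesses per second so every unit is visited once per interval.

    `units` is a count of elements (error scrub) or rows (refresh);
    `interval_s` is the element error scrub interval or the row refresh
    interval, expressed in seconds. Hypothetical helper for illustration.
    """
    if units <= 0 or interval_s <= 0:
        raise ValueError("units and interval_s must be positive")
    return units / interval_s


# Error scrub example from the text: 1600 elements, 1.6 second interval.
scrub_rate = required_rate(1600, 1.6)      # about 1000 elements per second

# Refresh example from the text: 6400 rows, 64 millisecond interval.
refresh_rate = required_rate(6400, 0.064)  # about 100,000 rows per second

print(f"scrub: {scrub_rate:.0f} elements/s, refresh: {refresh_rate:.0f} rows/s")
```

In these two examples the refresh requirement is the more demanding of the two by a wide margin, which is consistent with the observation that a sufficiently frequent pattern of error-scrub reads can also serve as the refresh.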
Embodiments of the present invention appropriately match the refresh and error scrubbing intervals, and define memory device access patterns that result in accomplishing both the refresh and error scrubbing functions simultaneously, thus reducing the memory access time wasted by these overhead activities.
Embodiments of the present invention take advantage of a characteristic of DRAM that reading from a memory location performs a refresh of the contents of the memory location. As long as memory locations within a memory device are continuously read in a cyclical pattern, a separate refresh operation is not necessary. Normal execution of computer applications cannot be relied upon to perform all the needed memory location reads to refresh an entire memory device because typical computer applications only access memory by reading from, and writing to, specific memory locations as required. Accordingly, access of the memory device by applications results in a mostly random pattern of reads and writes that is insufficient to ensure that every memory location is refreshed with enough frequency to satisfy the refresh rate requirement of the memory device. In contrast, because error scrubbing functions access every memory location while looking for upsets to fix, error scrubbing functions can be relied upon to refresh the memory device. Embodiments of the present invention provide methods for performing error scrubbing memory accesses of sufficient frequency and pattern as to satisfy both the error scrub rate and the refresh rate, eliminate the need to perform separate refresh operations, and thus simultaneously reduce the memory device's overhead time requirements while increasing the time the memory device is available for applications to access.
In one embodiment, method 100 pauses for time interval T (130), allowing applications to access the memory device. The pause time interval is chosen to ensure that the refresh-scrub function reads the contents of each memory location (122) within the time required by the refresh-scrub-access-period time interval. In one embodiment, the pause for time interval T (130) is placed between each subsequent error-scrubbing access to distribute the error-scrub accesses evenly throughout the refresh-scrub-access-period time interval. Distributing the delay throughout the interval reduces the latency of memory accesses by applications. If the refresh-scrub is performed all at one time, application access to the memory must be suspended for the duration. Some applications may not be able to tolerate that behavior. Alternatively, if an application is known to have periods of limited memory access, choosing to perform a refresh-scrub function during those periods is beneficial. The timing and granularity of refresh-scrub actions should be viewed as application-specific, so long as the constraints of the refresh-scrub-access-period are met. Most applications are likely to prefer a fine-grained, distributed behavior that introduces minimum latency for functional access to memory.
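A minimal sketch of one sequential refresh-scrub pass with an evenly distributed pause is given below for illustration. It is not the claimed implementation. The names pause_interval and refresh_scrub_pass, the use of Python lists to stand in for the memory device and the redundant source, and the per-access time parameter access_time_s are all assumptions introduced here. Error detection is reduced to a comparison against the redundant copy; an actual system would more likely rely on an error detection and correction code.

```python
import time
from typing import Callable, List


def pause_interval(period_s: float, num_locations: int, access_time_s: float) -> float:
    """Largest evenly distributed pause T that still lets every location be
    read once within the refresh-scrub-access-period (assumed model)."""
    t = period_s / num_locations - access_time_s
    if t < 0:
        raise ValueError("period too short to visit every location")
    return t


def refresh_scrub_pass(
    memory: List[int],
    redundant: List[int],
    pause_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Read each location in sequence, correct it on mismatch, then pause.

    Returns the number of locations corrected during the pass.
    """
    corrected = 0
    for addr in range(len(memory)):
        value = memory[addr]              # on DRAM, the read also refreshes
        if value != redundant[addr]:      # stand-in for the error check
            memory[addr] = redundant[addr]  # write back the correct value
            corrected += 1
        sleep(pause_s)                    # applications may access memory here
    return corrected


# Usage: inject one upset and scrub it without actually sleeping.
good = [10, 20, 30, 40]
mem = list(good)
mem[2] ^= 0x04                            # simulated single-bit upset
fixed = refresh_scrub_pass(mem, good, pause_s=0.0, sleep=lambda s: None)
print(fixed, mem == good)
```

Passing the sleep function as a parameter is a design choice made only so that the sketch can be exercised without real delays; a hardware or firmware realization would use a timer instead.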
The algorithm of method 200 defines three constants: LOC_INC, LOC_RANGE and INNER_LOOP_ITERATION_RANGE (ILIR). LOC_INC is defined as the difference in memory device locations between the first element on one row and the first element of the adjacent row (i.e. the number of memory location elements contained per row). LOC_RANGE is defined as the total number of memory location elements which must be refreshed and error-scrubbed within the refresh-scrub-access-period time interval. INNER_LOOP_ITERATION_RANGE is equal to LOC_RANGE divided by LOC_INC.
The algorithm of method 200 comprises a first loop sequence (210) which cycles in one step increments from 1 to LOC_INC. The variable OUTER_INDEX equals the current value of the first loop sequence minus one (220). Within the first loop sequence, a second loop sequence (230) cycles in one step increments from 1 to ILIR. INNER_INDEX equals the current value of the second loop sequence minus one (240). Within each second loop sequence, the algorithm cycles through the rows of memory device 350 (illustrated by 320-1 to 4) executing an error-scrub access (270) with one memory location within each row. In one embodiment, method 200 further defines a function POSITION (250) as equal to (OUTER_INDEX + (INNER_INDEX * LOC_INC)). The POSITION function calculates an integer value correlating the current OUTER_INDEX and INNER_INDEX values to memory location (as illustrated in
In order to satisfy the refresh-scrub-access-period time interval, the time required for completing the second loop (230) sequences must be less than the row refresh interval for the memory device. Additionally, the time required for completing the first loop (210) sequences must be less than the element error-scrub interval. However, allowing method 200 to continuously cycle prevents applications from accessing memory device 350 for storing or retrieving data. Therefore, method 200 further comprises time delay (280) that allows applications to access memory device 350 for a time interval of T1 after each error scrub access. Time interval T1 is chosen to ensure that the entire refresh-scrub function of method 200 is performed within the time required by the refresh-scrub-access-period time interval. In one embodiment, time interval T1 is chosen as the largest time interval possible that still satisfies the refresh rate and error-scrub rate requirements. Pausing after each error scrub access distributes error scrub accesses more evenly throughout the refresh-scrub-access-period time interval, thus reducing access latency for applications needing to access memory device 350.
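The access order and the delay constraint of method 200 can be sketched as follows for the hypothetical device of four rows of four elements. The function names and the fixed per-access time access_time_s are assumptions made for illustration. The sketch further assumes that each error-scrub access is followed by exactly one delay of T1, so that one pass of the second loop takes ILIR access slots and one pass of the first loop takes LOC_RANGE access slots.

```python
from typing import Iterator


def method_200_positions(loc_inc: int, loc_range: int) -> Iterator[int]:
    """Yield POSITION = OUTER_INDEX + INNER_INDEX * LOC_INC in loop order."""
    ilir = loc_range // loc_inc           # INNER_LOOP_ITERATION_RANGE
    for outer_index in range(loc_inc):    # first loop (210), index minus one
        for inner_index in range(ilir):   # second loop (230), index minus one
            yield outer_index + inner_index * loc_inc


def largest_t1(
    loc_inc: int,
    loc_range: int,
    row_refresh_s: float,
    element_scrub_s: float,
    access_time_s: float,
) -> float:
    """Largest delay T1 after each access under the assumed timing model.

    Second loop: ILIR * (access + T1) must not exceed the row refresh interval.
    First loop:  LOC_RANGE * (access + T1) must not exceed the scrub interval.
    """
    ilir = loc_range // loc_inc
    slot = min(row_refresh_s / ilir, element_scrub_s / loc_range)
    t1 = slot - access_time_s
    if t1 < 0:
        raise ValueError("intervals cannot be met even with no delay")
    return t1


# Four rows of four elements: one element per row is visited on each pass.
order = list(method_200_positions(loc_inc=4, loc_range=16))
print(order)  # [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```

Because consecutive POSITION values differ by LOC_INC, each pass of the second loop touches every row once, which is what allows the row refresh interval to be met long before every element has been scrubbed.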
As would be appreciated by one skilled in the art upon reading this specification, the sequential reading of data in memory elements is not limited to a sequence of consecutive memory device rows or columns, but can comprise any arbitrary sequence, as long as the refresh rate and error-scrub rate requirements are satisfied. In one embodiment, the algorithm of method 200 may be modified to take advantage of still other specific characteristics of the specific memory device used. For example, certain dynamic memory devices, such as but not limited to synchronous dynamic random access memory (SDRAM), are optimized for burst access to memory elements. In one embodiment, a memory device is optimized for burst access on the order of two to eight sequential addresses per burst. The total access overhead per address can be reduced by performing error-scrub accesses in bursts instead of one element location at a time. For example, in one embodiment, memory device 350 is optimized for burst access on the order of two addresses per burst. To take advantage of this two element burst access optimization, an alternate algorithm of one embodiment of the present invention cycles through POSITION values of 0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11, 14, 15, as illustrated in
Within each third loop sequence, the algorithm cycles through memory device 350 (illustrated by 330-1 to 330-3) executing error-scrub accesses (470) to a total number of memory locations equal to BURST. In one embodiment, method 400 further defines POSITION (450) as equal to (OUTER_INDEX+(INNER_INDEX * LOC_INC))+BURST_INDEX. The POSITION function calculates an integer value correlating the current OUTER_INDEX, INNER_INDEX and BURST_INDEX values to memory locations as described with respect to
Method 400 further comprises time delay (480) that allows applications to access memory device 350 for a time interval of T2 after each error scrub access burst. As with method 200, in order to satisfy the refresh-scrub-access-period time interval, the time required for completing the second loop (430) sequences must be less than the row refresh interval for memory device 350. Additionally, the time required for completing the first loop (410) sequences must be less than the element error-scrub interval. Therefore, time interval T2 is chosen to ensure that the entire refresh-scrub function of method 400 is performed within the time required by the refresh-scrub-access-period time interval. In one embodiment, time interval T2 is chosen as the largest time interval possible that still satisfies the refresh rate and error-scrub rate requirements. Pausing after each error scrub access burst distributes error scrub accesses more evenly throughout the refresh-scrub-access-period time interval, thus reducing access latency for applications needing to access memory device 350.
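The burst ordering of method 400 can be sketched in the same manner. The function names, the per-burst time burst_time_s, and the timing model are assumptions for illustration. The sketch also assumes that the first loop advances OUTER_INDEX in steps of BURST and that LOC_INC is an integer multiple of BURST; those loop bounds are inferred here from the two-element example sequence and the POSITION formula rather than quoted from the description.

```python
from typing import Iterator


def method_400_positions(loc_inc: int, loc_range: int, burst: int) -> Iterator[int]:
    """Yield POSITION = (OUTER_INDEX + INNER_INDEX * LOC_INC) + BURST_INDEX."""
    if loc_inc % burst != 0:
        raise ValueError("assumed: LOC_INC is a multiple of BURST")
    ilir = loc_range // loc_inc
    for outer_index in range(0, loc_inc, burst):   # assumed step of BURST
        for inner_index in range(ilir):            # one burst per row
            for burst_index in range(burst):       # third loop
                yield (outer_index + inner_index * loc_inc) + burst_index


def largest_t2(
    loc_inc: int,
    loc_range: int,
    burst: int,
    row_refresh_s: float,
    element_scrub_s: float,
    burst_time_s: float,
) -> float:
    """Largest delay T2 after each burst under the assumed timing model.

    Second loop: ILIR bursts must fit within the row refresh interval.
    First loop:  LOC_RANGE / BURST bursts must fit within the scrub interval.
    """
    ilir = loc_range // loc_inc
    bursts_total = loc_range // burst
    slot = min(row_refresh_s / ilir, element_scrub_s / bursts_total)
    t2 = slot - burst_time_s
    if t2 < 0:
        raise ValueError("intervals cannot be met even with no delay")
    return t2


# Two-element bursts on the four-by-four hypothetical device.
burst_order = list(method_400_positions(loc_inc=4, loc_range=16, burst=2))
print(burst_order)  # [0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11, 14, 15]
```

With BURST set to one, the sketch reduces to the single-element ordering of method 200, so the burst variant can be read as a generalization under the stated assumptions.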
Although FIGS. 2, 3A-C and 4 illustrate the application of methods with a hypothetical memory device comprising four rows of four elements, persons skilled in the art upon reading this specification would readily appreciate that embodiments of the present invention are not limited to this hypothetical device. To the contrary, embodiments of the present invention are applicable to any dynamic memory device possessing the characteristic where reading data from a memory location provides a refresh. For example, in one embodiment, memory device 350 is a double-data rate synchronous DRAM (DDR SDRAM) memory device having 8192 rows on 4 banks with a required refresh interval of 64 milliseconds.
In one embodiment, memory management module 520 periodically performs error-scrub accesses to ensure that an error-scrub access is performed on every memory location within the time required by a refresh-scrub-access-period time interval. The refresh-scrub-access-period time interval is the time interval within which every memory location within memory devices 510 must be read to ensure that both the refresh rate requirements of memory devices 510, and the error scrub rate requirements for correcting upsets, are satisfied.
Several means are available to implement the methods and algorithms, and realize a memory management module such as those described above. These means include controllers such as, but not limited to, digital computer systems, programmable controllers, or field programmable gate arrays. Therefore other embodiments of the present invention are program instructions resident on computer readable media which when implemented by such controllers, enable the controllers to implement embodiments of the present invention. Computer readable media include any form of computer memory, including but not limited to punch cards, magnetic disk or tape, any optical data storage system, flash read only memory (ROM), non-volatile ROM, programmable ROM (PROM), erasable-programmable ROM (E-PROM), random access memory (RAM), or any other form of permanent, semi-permanent, or temporary memory storage system or device. Program instructions include, but are not limited to computer-executable instructions executed by computer system processors and hardware description languages such as Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL).
Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that any arrangement, which is calculated to achieve the same purpose, may be substituted for the specific embodiment shown. This application is intended to cover any adaptations or variations of the present invention. Therefore, it is manifestly intended that this invention be limited only by the claims and the equivalents thereof.