System and method for facilitating elastic error correction code in memory

Description

BACKGROUND
Field

This disclosure is generally related to the technical field of data storage. Specifically, this disclosure is related to a system and method for facilitating elastic error correction code in memory.

Related Art

The memory capacity in modern servers has been continuously increasing due to an increasing demand for server applications, e.g., web applications, web services, etc., that are resource intensive. In addition, modern servers are expected to provide a reliable service. It is expected that the server-level Reliability, Availability and Serviceability (RAS) is sufficient to meet the requirements of cloud service providers in terms of Total Cost of Ownership (TCO) and of customer Service-Level Agreement (SLA). Several factors can affect the RAS level, one of the dominant factors being the occurrence of soft errors in the server's dynamic random access memory (DRAM) devices. Soft errors occur in a memory system when cosmic rays or particles with certain electrical charges hit a memory cell, thereby causing the cell to change its state to a different value. However, the memory cell remains functional and no damage is caused to its physical structure.

In order to improve the RAS level of the modern servers and to protect DRAM devices against the occurrence of soft errors, several error correction techniques have been integrated into memory devices in the modern servers. In the following paragraphs, some of the conventional error correction techniques and the challenges encountered by these error correction techniques are addressed.

Most of the modern server-class DRAMs are typically protected by standard error correction codes (ECC) that have the capability of Single-Error Correction and Double Error Detection (SECDED). Previously, such standard SECDED ECC provided reliable operation of memory devices, but in recent years this standard SECDED ECC has been incapable of meeting the high level of RAS requirements of the modern servers. Such poor performance of the standard SECDED ECC is due to the following factors. First, the memory capacity in modern servers has been continuously increasing. Specifically, the memory capacity in memory systems is increased by densely packing a high number of memory devices, e.g., DRAMs. Such dense packing of memory devices results in an increase in the percentage of multi-bit errors. Since the standard SECDED ECC is only capable of correcting a single-bit error, it does not provide sufficient error protection when the DRAM devices are subject to multi-bit errors.

Second, with the on-going evolution of Double Data Rate (DDR) memories, there has also been a continuous drop in the operating voltage of DRAM devices. Table 1 below shows the different DDR versions and their corresponding operating voltages.

TABLE 1

Operating voltages of different DDR versions

DDR version    Operating voltage
DDR3           1.5 V-1.65 V
DDR4           1.2 V-1.4 V
DDR5           1.1 V

With the decrease in the operating voltage of the DRAM devices, the noise margin is also lowered, thereby causing the DRAM devices to be susceptible to multi-bit soft errors that cannot be sufficiently corrected by the standard SECDED ECC. Such poor performance of the standard SECDED ECC in modern servers has led to the development of advanced error correction techniques to ensure server reliability.

One error correcting technique uses remapping or re-organization of bits of an ECC word to correct bit errors. The ECC word includes both data bits and check bits. The technique is suitable for a scenario in which soft errors are clustered. It scatters the bits of the ECC word across multiple memory chips. For example, instead of storing an entire cache line in one DRAM device, the technique re-arranges the data in the cache line by spreading the data across multiple DRAM devices. Hence, a failure of any single memory chip would affect only one ECC bit per word. However, the technique is not effective when the soft errors are uniformly distributed across the memory chips.

Another existing method for correcting multi-bit errors is full or partial memory mirroring. In this technique, a range of memory or half of the memory is duplicated in the DRAM available in the memory system. When the ECC is incapable of correcting the errors in a DRAM device, the mirrored or duplicated copy of data is used for processing the subsequent data access requests. Such a mirroring technique is capable of providing robust error correction because, even if the data bits in a portion of memory are completely corrupted, the system can use the uncorrupted data bits in the mirrored copy of this portion of the memory. However, this technique reduces the effective memory capacity by half, resulting in an expensive RAS feature.

Due to the above-mentioned drawbacks associated with different error correction techniques, some challenges still remain in designing an effective error correction technique that is capable of correcting multi-bit errors and providing a high level of RAS.

SUMMARY

According to one embodiment of the present disclosure, a system for performing error correction in memory is provided. During operation, the system can receive a memory access request from a host processor. The system can then compare a memory address specified in the memory access request with a set of entries in an error correction code (ECC) mapping table. In response to the system determining that the memory address corresponds to at least one entry in the ECC mapping table, the system may perform the following operations: determining, based on a value in the counter field, whether the memory address belongs to a first portion or a second portion of the address range specified in the ECC mapping table entry; selecting a current ECC mode when the memory address belongs to the first portion; and selecting a previous ECC mode when the memory address belongs to the second portion. The system may then process the memory access request based on the selected ECC mode.

In a variation on this embodiment, each entry in the ECC memory mapping table can include: a start address field, an end address field, a previous ECC mode field, a current ECC mode field, and a counter field.

In a variation on this embodiment, the previous ECC mode and the current ECC mode use a class of cyclic error correcting codes that is capable of performing: a 4-bit error correction and 5-bit error detection; a 5-bit error correction and 6-bit error detection; and a 6-bit error correction and 7-bit error detection.

In a variation on this embodiment, in response to determining that the memory address is not included in the ECC mapping table, the system can select a default ECC mode. The default ECC mode represents a Hamming code with 64-bit data and an 8-bit parity code.

In a further variation on this embodiment, the system can use a counter field in the ECC mapping table entry to track a boundary separating the address range into the two regions: the first portion of the address range and the second portion of the address range. The address range is defined by a start address and an end address specified in the ECC mapping table entry.

In a variation on this embodiment, the memory in the system can include a dynamic random access memory (DRAM).

In a further variation on this embodiment, the system can determine that the memory access request is a write request when the memory address is the last address in the first portion. Next, in response to determining that the memory access request is the write request, the system can update the ECC mapping table by: setting a write ECC mode field in the ECC mapping table to the current ECC mode; and incrementing a value in the counter field of the ECC mapping table entry.

According to another embodiment of the present disclosure, a system for performing error correction in memory by performing memory scrubbing and ECC mapping table update is provided. During operation, the system can monitor ECC decoding statistics to identify a set of intensities of soft errors in different address ranges in memory. In response to determining that an intensity of soft errors in an address range in memory is greater than at least one threshold in a set of thresholds, the system can read an ECC mapping table. Further, in response to determining that the address range is fully or partially included in an entry of the ECC mapping table, the system can prioritize memory scrubbing when the address range is not completely protected with an ECC mode specified in a current mode field of the entry in the ECC mapping table. The system can then update the ECC mapping table.

In a variation on this another embodiment, the system can update the ECC mapping table by: updating a previous ECC mode field in the entry of the ECC mapping table with a mode specified in the current mode field; setting, based on the threshold, the current mode field to a new mode; and resetting a counter field in the entry of the ECC mapping table. The new mode has a higher strength than the mode specified in the updated previous ECC mode field.

In a further variation on this another embodiment, in response to determining that the address range is not included in any entry of the ECC mapping table, the system can: add a new entry to the ECC mapping table; set a previous ECC mode field in the new entry to a default mode; set, based on the threshold, the current ECC mode field in the new entry to a new mode; and reset a counter field in the new entry of the ECC mapping table.

In a further variation on this another embodiment, the set of thresholds includes a first threshold, a second threshold, and a third threshold. When the system determines that the intensity of soft errors exceeds the first threshold, the current ECC mode field is set to mode 1. When the system determines that intensity of soft errors exceeds the second threshold, the current ECC mode field is set to mode 2. Next, when the system determines that the intensity of soft errors exceeds the third threshold, the current ECC mode field is set to mode 3.
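By way of illustration only, the threshold-based mode selection and the table update described in the variations above could be sketched as follows. The names Entry, mode_for_intensity, and update_table are hypothetical, and the sketch assumes that the three thresholds are supplied in increasing order; it is not presented as the disclosed implementation.

```python
from dataclasses import dataclass

DEFAULT_MODE = 0  # default mode (Mode 0)


@dataclass
class Entry:
    """Hypothetical software model of one ECC mapping table entry."""
    start: int
    end: int
    previous_mode: int
    current_mode: int
    counter: int = 0


def mode_for_intensity(intensity, thresholds):
    """Return the highest mode i in 1..3 whose threshold is exceeded, else 0.

    Assumption: thresholds = (first, second, third) in increasing order.
    """
    mode = DEFAULT_MODE
    for i, threshold in enumerate(thresholds, start=1):
        if intensity > threshold:
            mode = i
    return mode


def update_table(table, entry, start, end, new_mode):
    """Update a matching entry, or add a new entry when entry is None."""
    if entry is None:
        # No entry covers the range: previous mode is the default mode.
        entry = Entry(start, end, DEFAULT_MODE, new_mode)
        table.append(entry)
    else:
        # Existing entry: current mode becomes previous, counter is reset.
        entry.previous_mode = entry.current_mode
        entry.current_mode = new_mode
        entry.counter = 0
    return entry
```

In this sketch, resetting the counter means that the whole address range is initially treated as still protected by the previous mode until writes or scrubbing advance the boundary.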

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1A shows a conventional ECC word used in a standard SECDED ECC, in accordance with the prior art.

FIG. 1B shows two ECC modes used in an existing MECC technique, in accordance with the prior art.

FIG. 2 shows three different exemplary ECC modes, according to one embodiment of the present disclosure.

FIG. 3 shows an exemplary elastic error correction system architecture, in accordance with an embodiment of the present disclosure.

FIG. 4A presents a flowchart illustrating an exemplary process for performing elastic error correction in memory, in accordance with an embodiment of the present disclosure.

FIG. 4B is a continuation of FIG. 4A, in accordance with an embodiment of the present disclosure.

FIG. 5A presents a flowchart illustrating an exemplary process for performing elastic error correction in memory by applying memory scrubbing and ECC mapping table update, in accordance with an embodiment of the present disclosure.

FIG. 5B is a continuation of FIG. 5A, in accordance with an embodiment of the present disclosure.

FIG. 5C is a continuation of FIG. 5A, in accordance with an embodiment of the present disclosure.

FIG. 6 illustrates an exemplary computer system that facilitates elastic error correction in memory, according to one embodiment of the present disclosure.

FIG. 7 illustrates an exemplary apparatus that facilitates elastic error correction in memory, according to one embodiment of the present disclosure.

In the figures, like reference numerals refer to the same figure elements.

DETAILED DESCRIPTION

The following description is presented to enable any person skilled in the art to make and use the embodiments, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

Overview

In recent years, the idea of merging different levels of error correction capabilities in a single system has been explored. One such technique that uses different levels of error correction capabilities is Morphable ECC (MECC). FIG. 1A shows a conventional ECC word used in standard SECDED ECC, in accordance with the prior art. Standard SECDED operates at 8 byte granularity, i.e., an 8 byte data block (100) has 8 ECC bits (102). In other words, for every 64 bits of payload data, SECDED uses 8 bits of ECC.

The MECC technique proposes to use SECDED on 64 byte block granularity instead of the conventional 8 byte block granularity. Further, in addition to SECDED with 64 byte block granularity, MECC provides another error correcting capability on the 64 byte block granularity. Specifically, MECC provides an error correction mode that is capable of correcting 6-bit errors and is denoted as ECC-6. ECC-6 provides a better error correction capability than SECDED on a 64 byte block granularity.

MECC derives its 6-bit ECC from the standard SECDED. Specifically, for a 64 byte block granularity, the 8 ECC bits used in the standard SECDED on 8 byte block granularity are merged to provide 64 bits of ECC for a 64 byte block. Hence, the 64 bits of ECC can be sufficient to support the 6-bit error correction code, ECC-6. Since each mode in MECC may use a different combination of the ECC bits for each level of error protection, MECC uses the left-most four bits in the ECC field as the mode bits. These mode bits are used to identify an ECC mode or level of error protection used for a current 64 byte data block. FIG. 1B shows two modes used in an existing MECC technique, in accordance with the prior art. Each ECC word includes a 64 byte data block (104) and 64 bits of ECC (106). MECC uses the 64 bits of ECC to perform error correction using ECC mode 0 (108) or ECC mode 1 (110).

MECC was specifically designed for improving the RAS feature on mobile computing platforms. Since mobile computing devices come with limited battery power, it was important to reduce their energy consumption to provide a longer period of operation on battery power. A process that consumes a significant portion of the available battery power on mobile computing platforms is the refresh operation performed on DRAM. Reducing the number of refreshes performed on DRAMs would save battery power. Therefore, MECC was designed to reduce the number of DRAM refreshes by using a stronger error correction mode, i.e., ECC-6, thereby also reducing the DRAM power consumption for mobile systems.

Although MECC provides a better error protection technique compared to the standard SECDED, the performance of MECC deteriorates when it is used for improving the reliability of DRAMs in data centers. In the following paragraphs, some of the inherent drawbacks associated with using the MECC error correction technique are addressed.

Although standard SECDED is capable of correcting only single-bit errors, the error correction process or the ECC checking process can be overlapped with the transfer of 8 byte data, thereby partially hiding the ECC checking latency. But MECC performs ECC checking only after the 64 bytes of data have been transferred. MECC performs such ECC checking irrespective of the mode used, i.e., either SECDED or ECC-6. Therefore, MECC does not hide the ECC checking latency during the transfer of a cache line, thereby incurring an increase in memory access latency. Moreover, performing ECC checking on a 64 byte data block takes longer to complete than performing ECC checking on an 8 byte data block, thereby causing additional tens of cycles of latency when accessing cache-line-sized data in DRAM.

Further, MECC is exclusively a hardware solution for correcting errors in DRAM. While this can be a desirable feature in certain application scenarios, it proved to be disadvantageous for data center management systems. One of the reasons why MECC is incompatible with data center management systems is that data centers need to be aware of soft-error rates of the system to take proactive actions. The proactive actions can include: disabling the failing DRAM dual in-line memory module (DIMM), or preventing service disruption by migrating the applications away from a failing node. Since MECC is a hardware mechanism, it completely shields soft-error rate information from the data center management system, thereby causing the data center management system to be affected by abrupt service disruption.

Moreover, in MECC, the mode bits used for identifying the ECC mode could also be subject to soft errors. MECC provides a solution for addressing such soft errors in the mode bits by duplicating the mode bits 4 times. However, this solution is only capable of correcting one-bit errors in the mode bits. Therefore, when two-bit errors occur in the mode bits, MECC can be unable to identify a correct ECC mode to be used. Without correct identification of the ECC mode to be used, no error correction would take place, thereby leading to accumulation of soft errors in memory, which would cause a severe degradation in the system performance. Therefore, due to the above-mentioned drawbacks of MECC, the performance of MECC deteriorates when it is used for improving the reliability of DRAM in data centers.

Table 2 below provides a comparison between the MECC error correction technique and the elastic ECC technique proposed in the present disclosure.

TABLE 2

Comparison between MECC and Elastic ECC

MECC: Can support only two modes: SECDED and ECC-6.
ELASTIC ECC: Is capable of supporting four modes, thereby providing better flexibility in controlling the ECC.

MECC: Since MECC can support only two modes, moderate soft error intensities are managed by using strong ECC-6, which needs several cycles to perform the encode and decode operations, thereby increasing the memory access latency overhead.
ELASTIC ECC: Since Elastic ECC can support 4 modes, it provides less memory access latency overhead for moderate soft error intensities. Furthermore, elastic ECC provides a smooth and graceful trade-off between memory access latency overhead and ECC protection levels.

MECC: MECC stores ECC mode information in DRAM, hence can be susceptible to soft errors.
ELASTIC ECC: Stores ECC mode information in memory registers of the memory controller, which has better resilience to soft errors when compared with DRAM. Hence, elastic ECC provides a better DRAM reliability than MECC.

MECC: Does not provide any control on mapping of ECC and address range in memory.
ELASTIC ECC: Is capable of allowing the operating system to control the mapping of ECC to a given address range in memory. Such control on mapping can allow the operating system to take proactive actions before the soft error intensities increase beyond a threshold value.

Further, unlike full or partial memory mirroring, the present disclosure using elastic ECC does not incur memory capacity overhead, thereby reducing the server cost when performing multi-bit error correction. Moreover, the memory mirroring method involves additional memory writes, which impact the memory bandwidth. The present disclosure does not include such additional memory writes and hence does not impact the memory bandwidth.

According to one embodiment of the present disclosure, a system for performing error correction in memory is provided. During operation, the system can receive a memory access request from a host processor. The system can then compare a memory address specified in the memory access request with a set of entries in an error correction code (ECC) mapping table. In response to the system determining that the memory address corresponds to at least one entry in the ECC mapping table, the system may perform the following operations: determining, based on a value in the counter field, whether the memory address belongs to a first portion or a second portion of the address range specified in the ECC mapping table entry; selecting a current ECC mode when the memory address belongs to the first portion; and selecting a previous ECC mode when the memory address belongs to the second portion. The system may then process the memory access request based on the selected ECC mode.

According to another embodiment of the present disclosure, a system for performing error correction in memory by performing memory scrubbing and ECC mapping table update is provided. During operation, the system can monitor error correction code (ECC) decoding statistics to identify a set of intensities of soft errors in different address ranges in memory. In response to determining that an intensity of soft errors in an address range in memory is greater than at least one threshold in a set of thresholds, the system can read an ECC mapping table. Further, in response to determining that the address range is fully or partially included in an entry of the ECC mapping table, the system can prioritize memory scrubbing when the address range is not completely protected with an ECC mode specified in a current mode field. The system can then update the ECC mapping table.

Furthermore, the present disclosure is capable of addressing memory reliability issues in a flexible and cost-effective manner. The system can expand the size of the data blocks that ECC bits can protect from 64 bits to 512 bits or 64 bytes, e.g., a cache line size in x86 systems. Further, the system can use the aggregated 64 ECC bits for multi-bit error correction of the cache line block. In addition, the system enhances an integrated memory controller in a central processing unit (CPU) by introducing an ECC mapping table that can include address ranges and corresponding ECC modes. The system is capable of correcting multi-bit errors at the cache block level without incurring additional overhead in memory capacity. The system is also capable of allowing the co-existence of multiple ECCs and can provide flexibility on the type of ECC modes. Further, the system can also provide flexibility on protecting different memory regions; hence the system is capable of adapting to various application demands.

Elastic Error Correction Code Modes

Unlike MECC, the present disclosure is capable of providing additional programmable ECC protection modes for 64 byte cache line data. The system can use the default Mode 0, which is a conventional <72,64> Hamming code with 64-bit data and 8-bit parity code (see FIG. 1A). This ECC mode provides SECDED capability on 64-bit data granularity. When the system uses Mode 0, encoding and decoding operations can be fast. The system can also pipeline the encoding and decoding operations with DRAM accesses, thereby hiding most of the ECC decoding/encoding overhead.
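As a worked check of the <72,64> figures for Mode 0, the following minimal sketch computes the number of check bits for a Hamming single-error-correcting code extended with one overall parity bit for double-error detection. The function name secded_check_bits is hypothetical and the sketch relies only on the standard Hamming bound 2^r >= d + r + 1; it does not model the encoder or decoder hardware.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SEC code plus one overall parity bit (SECDED)."""
    r = 1
    # Smallest r such that 2**r >= data_bits + r + 1 (single-error correction).
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


check = secded_check_bits(64)
print(check, 64 + check)  # prints "8 72"
```

For 64 data bits, seven Hamming check bits suffice (128 >= 72), and the additional overall parity bit gives the 8 parity bits and the 72-bit code word of Mode 0.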

FIG. 2 shows three different exemplary ECC modes, according to one embodiment of the present disclosure. In addition to the conventional SECDED, the present disclosure can provide three additional modes: Mode 1 (206), Mode 2 (208), and Mode 3 (210). These additional ECC modes can support up to 6-bit error correction and 7-bit error detection at the cache line level. The system can use these three modes to protect the 64 byte data block 200 by merging together the eight corresponding 8-bit parity codes. Further, the system can use a class of cyclic error correcting codes that are capable of correcting random multi-bit errors, e.g., Bose-Chaudhuri-Hocquenghem (BCH) code. For example, when using the BCH code for correcting t errors and detecting t+1 errors in d-bit data, the constraints shown in Table 3 can be satisfied.

TABLE 3

BCH code constraints

Length of code word        [(t * m) + 1] bits
Length of data block, d    d < 2^m − 1

Based on the constraints listed in Table 3, the system may use different error correction levels with 64 bytes of data block granularity. For Mode 1 (206), the system can use 41 ECC bits for error correction, which provides the capability of correcting 4-bit errors and detecting 5-bit errors. The remaining bits in the ECC field are unused. Similarly, for Mode 2 (208), the system may use 51 ECC bits for 5-bit error correction and 6-bit error detection. For Mode 3 (210), the system can use 61 ECC bits for 6-bit error correction and 7-bit error detection. In a memory system, soft errors can be “localized” or “clustered”; to address such soft errors, the system can program the selection of different ECC modes. Table 4 below shows the different ECC modes with their corresponding ECC bits, error correction, and error detection capability. With these 4 modes, the system can be capable of providing different levels of granularity, protection strength, and robustness.

TABLE 4

Different ECC modes used in elastic error correction technique

Mode    Number of ECC bits    Capable of correcting    Capable of detecting
0       8 bits                1-bit error              1-bit error
1       41 bits               4-bit errors             5-bit errors
2       51 bits               5-bit errors             6-bit errors
3       61 bits               6-bit errors             7-bit errors
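The ECC bit counts for Modes 1 to 3 can be reproduced from the Table 3 constraints with the short sketch below. It reads the expression [(t * m) + 1] as the number of ECC bits, which is an interpretation chosen here because it agrees with Table 4, and it takes d = 512 bits for a 64 byte data block. The function name bch_ecc_bits is hypothetical.

```python
def bch_ecc_bits(data_bits: int, t: int) -> tuple[int, int]:
    """Return (m, ecc_bits) for correcting t errors and detecting t + 1 errors."""
    m = 1
    # Table 3 constraint: d < 2**m - 1; pick the smallest m that satisfies it.
    while not data_bits < 2 ** m - 1:
        m += 1
    return m, t * m + 1


for mode, t in ((1, 4), (2, 5), (3, 6)):
    m, ecc_bits = bch_ecc_bits(512, t)
    print(f"Mode {mode}: m={m}, ECC bits={ecc_bits}, unused={64 - ecc_bits}")
```

For a 512-bit data block, m = 9 fails the constraint (512 is not less than 511), so m = 10, which gives 41, 51, and 61 ECC bits and leaves 23, 13, and 3 bits of the 64-bit ECC field unused.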

System Architecture and Operation

FIG. 3 shows an exemplary elastic error correction system architecture, in accordance with an embodiment of the present disclosure. System 300 shown in FIG. 3 can include building blocks for facilitating an elastic ECC technique; these building blocks are described below. Memory controller 342 includes an encoder 330 for encoding data 336 from last level cache 334 to DRAM 302, and a decoder 332 for decoding data from DRAM 302 to last level cache 334. Memory controller 342 can include additional features to encoder 330 and decoder 332, so that they can support the three additional ECC modes, i.e., Mode 1, Mode 2, and Mode 3, in addition to the default Mode 0. Memory controller 342 can support an ECC DRAM 302 with 64-bit bus 304 for transferring data and 8-bit bus 306 for transferring the corresponding ECC bits.

Further, system 300 can include additional hardware for an ECC mapping table 318 in memory controller 342. Each entry in ECC mapping table 318 contains the following fields: a 56-bit start address 320, a 56-bit end address 322, a 2-bit previous ECC mode 324, a 2-bit current ECC mode 326, and a 64-bit counter 328. System 300 can allow an operating system to have access to these fields in each entry of ECC mapping table 318 as model specific registers (MSRs). The operating system can read or write to ECC mapping table 318 using instructions rdmsr or wrmsr, respectively. Such a feature allows system 300 to provide flexibility in controlling the ECC.

Start address 320 and end address 322 correspond to a 64 byte cache line address; hence they are 56 bits wide. The address range between start address 320 and end address 322 indicates a physical address range which is to be protected by one of the 4 ECC modes. Memory controller 342 can use counter 328 to track a boundary that separates an address space defined by start address 320 and end address 322 into two regions. A first region in the address space may use current ECC mode 326 and a second region in the address space may use previous ECC mode 324.
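For illustration, one entry of ECC mapping table 318 could be modeled in software as shown in the sketch below. The class name EccMapEntry, the FIELD_BITS dictionary, and the boundary helper are hypothetical; only the field names and widths come from the description above, and the actual table resides in registers of memory controller 342.

```python
from dataclasses import dataclass

# Field widths in bits, as listed for each entry of the ECC mapping table.
FIELD_BITS = {
    "start_address": 56,
    "end_address": 56,
    "previous_mode": 2,
    "current_mode": 2,
    "counter": 64,
}
ENTRY_BITS = sum(FIELD_BITS.values())  # 180 bits per entry


@dataclass
class EccMapEntry:
    start_address: int  # cache-line address
    end_address: int    # cache-line address
    previous_mode: int  # Mode 0..3
    current_mode: int   # Mode 0..3
    counter: int = 0    # offset of the boundary from start_address

    def __post_init__(self):
        # Reject values that would not fit in the corresponding register field.
        for name, width in FIELD_BITS.items():
            value = getattr(self, name)
            if not 0 <= value < 2 ** width:
                raise ValueError(f"{name} does not fit in {width} bits")
        if self.start_address > self.end_address:
            raise ValueError("start_address must not exceed end_address")

    def boundary(self) -> int:
        """Cache-line address that separates the two regions."""
        return self.start_address + self.counter
```

Under these widths each entry occupies 180 bits, and the boundary is simply the start address plus the counter value.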

Memory controller 342 can further include an ECC mode selector or controller 344. ECC mode controller 344 may continuously monitor the entries in ECC mapping table 318 and may determine an ECC mode for a current DRAM access request. ECC mode controller 344 can also provide an interface that can be used by the operating system to program ECC mapping table 318. Memory controller 342 may enable ECC mode controller 344 to receive an incoming address 314 and a read (RD)/write (WR) command 316 from a host processor or core. ECC mode controller 344 may then translate incoming address 314 into a corresponding DRAM 302 row and column address 310. Further, ECC mode controller 344 can translate RD/WR command 316 into a corresponding DRAM command 308. Address 310 and command 316 are queued in buffer 312 before sending to DRAM 302.

In addition, based on the entries in ECC mapping table 318, ECC mode controller 344 can generate read ECC mode bits and write ECC mode bits. ECC mode controller 344 can send the write ECC mode bits to a buffer 340 with the same number of entries as a buffer 346 that holds the corresponding incoming 64 byte cache line data to be written to DRAM 302. Alternatively, ECC mode controller 344 can queue the read ECC mode bits into a similar buffer 338 that feeds ECC decoder 332. Memory controller 342 can dequeue the read ECC mode bits whenever 64 bytes of data from DRAM 302 have been decoded by ECC decoder 332. Memory controller 342 can use the read ECC mode bits and write ECC mode bits in their corresponding buffers 338 and 340 to synchronize with data traffic coming from DRAM 302 or going out to DRAM 302.

Exemplary Methods for Facilitating Elastic Error Correction

FIG. 4A presents a flowchart 400 illustrating an exemplary process for performing elastic error correction in memory, in accordance with an embodiment of the present disclosure. Flowchart 400 in FIG. 4A describes a process for determining an ECC mode from a current address and entries in the ECC mapping table. During operation, the system may first receive the current address and a RD/WR command from a host processor or a core (operation 402). The system can compare the received current address with each entry in the ECC mapping table (operation 404). Let [start_address(i), end_address(i)] represent an address range specified in an i-th entry in the ECC mapping table, where i represents an index of a matching entry. The start_address(i) and end_address(i) correspond to the start address 320 and end address 322 fields in the i-th entry of the ECC mapping table.

Based on the comparison (operation 404) the system may determine whether the current address belongs to any of the address ranges specified in the ECC mapping table (operation 406). When the system determines that the current address is not included in any of the address ranges specified in the ECC mapping table then the system can select a default ECC mode 0 (operation 408). The default ECC mode 0 represents a <72, 64> Hamming code.

When the system determines that the current address is included in an i-th address range [start_address(i), end_address(i)] specified in the ECC mapping table, then the system may further determine if the current address is included in the address range [start_address(i), start_address(i)+counter(i)] which can correspond to a first portion of the i-th address range (operation 410). If the condition in operation 410 is not satisfied, then the system can indicate that the current address is still using an ECC mode that was previously used (operation 412). In other words, the system can detect that the ECC mode specified in the current ECC mode field of the i-th entry in the ECC mapping table has not been applied to the data corresponding to the current address.

If the system determines that the condition in 410 is satisfied, then the system can further compare the current address with [start_address(i)+counter(i)] (operation 414). When the condition in 414 is not satisfied (i.e., the current address is not on the boundary between the first and the second portions of the i-th address range), then the system may set the ECC mode for the current address to a current ECC mode specified in the current ECC mode field of the i-th entry in the ECC mapping table (operation 416).

Note that the system can use the counter to track a boundary that separates the address space defined by [start_address(i), end_address(i)] into two regions. When the system determines that the current address belongs to the first region or first portion, then the ECC mode can be set to the current ECC mode. Alternatively, the system can use the previous ECC mode when the current address belongs to the second region or second portion of the address space. With the integration of such a counter in the ECC mapping table, the system can be capable of providing a smooth transition between different ECC modes without causing disruption in service.

FIG. 4B is a continuation of flowchart 400 in FIG. 4A, in accordance with an embodiment of the present disclosure. When the system determines that the condition in 414 is satisfied, the system may further determine whether the current address is associated with a write operation (operation 418). If the condition in 418 is true, the system may set the write ECC mode to the current ECC mode specified in the i-th entry of the ECC mapping table (operation 420). The system may also increment counter(i) by 1 (operation 420), indicating that the range of memory using the current ECC mode is expanded by one cache block. The system can stop incrementing counter(i) when start_address(i)+counter(i)=end_address(i) (operations 422 and 424). When the condition in 422 is not true, the system may return to operation 404. If the condition in 418 is not true, the system can perform operation 416 and counter(i) is kept unchanged. Note that while the comparisons and other operations shown in FIG. 4A and FIG. 4B appear to be serialized, they can be carried out in parallel in hardware. Therefore, the additional latency introduced by the system (ECC mode controller) is negligible.
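The mode selection of FIGS. 4A and 4B can be illustrated with a minimal software sketch. It is not the disclosed hardware implementation, which may evaluate the comparisons in parallel. The names EccEntry and select_ecc_mode are hypothetical, addresses are assumed to be cache-block indices so that one counter step equals one cache block, and operation 416 is assumed to be the step that selects the current ECC mode, since the text does not label that step explicitly.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_ECC_MODE = 0  # mode 0: the default <72, 64> Hamming code


@dataclass
class EccEntry:
    """One ECC mapping table entry (field names follow the description)."""
    start_address: int
    end_address: int
    previous_ecc_mode: int
    current_ecc_mode: int
    counter: int = 0  # offset of the boundary from start_address


def select_ecc_mode(table: List[EccEntry], address: int, is_write: bool) -> int:
    """Return the ECC mode for one access; may advance the entry's counter.

    Assumption: addresses are cache-block indices, so counter += 1 moves
    the boundary by exactly one cache block.
    """
    for entry in table:  # operations 404/406: compare with each entry
        if not (entry.start_address <= address <= entry.end_address):
            continue
        boundary = entry.start_address + entry.counter
        if address > boundary:
            # Operation 410 not satisfied: second portion, old mode (412).
            return entry.previous_ecc_mode
        if address < boundary:
            # Operation 414 not satisfied: inside the first portion.
            return entry.current_ecc_mode
        # The address sits on the boundary (operation 414 satisfied).
        if is_write:  # operation 418
            if boundary < entry.end_address:  # operations 422/424
                entry.counter += 1  # operation 420: grow by one block
            return entry.current_ecc_mode
        # Read on the boundary: assumed operation 416, counter unchanged.
        return entry.current_ecc_mode
    return DEFAULT_ECC_MODE  # operation 408: no entry covers the address
```

In this sketch only a write that lands exactly on the boundary moves it, so the first portion grows one block at a time and never extends past end_address.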

FIG. 5A presents a flowchart 500 illustrating an exemplary process for performing error correction in memory by applying memory scrubbing and ECC mapping table update, in accordance with an embodiment of the present disclosure. During operation, the operating system may periodically monitor the ECC decoding statistics to identify an intensity of the soft errors in different memory ranges (operation 502). The operating system (OS) may use predetermined thresholds, T_i, for transitioning between ECC mode i−1 and ECC mode i, where i is an integer and i ∈ [1, 3].

During the process of monitoring the ECC decoding statistics, if the OS detects that an error intensity in a memory range [start_address, end_address] is greater than the threshold T_i (operation 504), the OS may first read the ECC mapping table (operation 506) with a read instruction, e.g., rdmsr (read from Model Specific Register). When the OS determines that the address range [start_address, end_address] is fully or partially included in an entry j of the ECC mapping table (operation 508), the OS may infer that the current ECC mode in entry j may not be strong enough to address an increase of the soft errors in the near future.
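The threshold comparison of operation 504 and the inclusion test of operation 508 can be sketched as follows. This is an illustration only: the function names are hypothetical, the thresholds T_1 to T_3 are assumed to be given in ascending order, and the table is reduced to a list of (start_address, end_address) pairs.

```python
from typing import List, Optional, Sequence, Tuple


def target_mode(intensity: float, thresholds: Sequence[float]) -> int:
    """Return the highest mode i whose threshold T_i is exceeded, else 0.

    Assumption: thresholds holds T_1, T_2, T_3 in ascending order.
    """
    mode = 0
    for i, threshold in enumerate(thresholds, start=1):
        if intensity > threshold:
            mode = i
    return mode


def find_overlapping_entry(
    ranges: List[Tuple[int, int]], start: int, end: int
) -> Optional[int]:
    """Return the index j of the first entry that fully or partially
    includes [start, end] (operation 508), or None if there is none."""
    for j, (entry_start, entry_end) in enumerate(ranges):
        if entry_start <= end and start <= entry_end:
            return j
    return None
```

A result of 0 from target_mode means that no threshold was exceeded and the range can stay in its present mode.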

The next steps are shown in FIGS. 5B and 5C, which are a continuation of FIG. 5A. Operation 510 in FIG. 5B indicates that the OS may further check whether the memory specified in entry j of the ECC mapping table has been completely protected by the ECC mode specified in the current ECC mode field of entry j. Specifically, in operation 510, the OS may compare start_address(j)+counter(j) with end_address(j). If they are equal, the OS may update the different fields in entry j of the ECC mapping table (operation 512) as follows: previous_ECC_mode(j)=current_ECC_mode(j); current_ECC_mode(j)=i; and counter(j)=0. The previous_ECC_mode(j), current_ECC_mode(j), and counter(j) correspond to the previous_ECC_mode field, the current_ECC_mode field, and the counter field, respectively, in the j-th entry of the ECC mapping table.

However, when start_address(j)+counter(j) and end_address(j) are not equal, then the OS may prioritize a memory scrubbing process to complete scrubbing the memory region specified in the entry j of the ECC mapping table (operation 514). After the OS completes the memory scrubbing operation 514, it can perform operation 512.

FIG. 5C is a continuation of FIG. 5A, in accordance with an embodiment of the present disclosure. Note that the start_address and the end_address in each entry in the ECC mapping table are kept unchanged. When the OS determines that the condition specified in 508 is not satisfied, i.e., the input address range is not included, fully or partially, in any of the memory ranges specified in the ECC mapping table, the OS may update the ECC mapping table by adding a new entry k. Specifically, in the new entry k, the corresponding start_address(k) and end_address(k) are set to the new address range; previous_ECC_mode(k) is set to mode 0; current_ECC_mode(k) is set to mode i; and counter(k) is set to 0 (operations 516 and 518).
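The table update of FIGS. 5B and 5C can be sketched in software as shown below. The sketch rests on several assumptions: the names EccEntry, apply_stronger_mode, and scrub_to_completion are hypothetical, the scrub callback stands in for the prioritized memory scrubbing of operation 514 and is expected to leave the entry fully protected, and addresses are treated as cache-block indices.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EccEntry:
    """One ECC mapping table entry (field names follow the description)."""
    start_address: int
    end_address: int
    previous_ecc_mode: int
    current_ecc_mode: int
    counter: int = 0


def scrub_to_completion(entry: EccEntry) -> None:
    """Hypothetical stand-in for operation 514: after scrubbing, every
    block of the entry is assumed to use the current ECC mode."""
    entry.counter = entry.end_address - entry.start_address


def apply_stronger_mode(
    table: List[EccEntry],
    start: int,
    end: int,
    new_mode: int,
    scrub: Callable[[EccEntry], None] = scrub_to_completion,
) -> EccEntry:
    """Raise the ECC mode for [start, end]; return the affected entry."""
    for entry in table:
        # Operation 508: is the range fully or partially in this entry?
        if entry.start_address <= end and start <= entry.end_address:
            # Operation 510: is the entry fully protected already?
            if entry.start_address + entry.counter != entry.end_address:
                scrub(entry)  # operation 514: finish the old transition
            # Operation 512: rotate the modes and restart the counter.
            entry.previous_ecc_mode = entry.current_ecc_mode
            entry.current_ecc_mode = new_mode
            entry.counter = 0
            return entry  # start_address and end_address stay unchanged
    # Operations 516 and 518: no entry covers the range, so add one.
    new_entry = EccEntry(start, end, 0, new_mode, 0)
    table.append(new_entry)
    return new_entry
```

Completing the scrub before operation 512 matters in this sketch because each entry stores only one previous mode and one current mode, so a region must not hold data in more than two modes at once.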

In one embodiment of the present disclosure, the number of entries in the ECC mapping table could be 3, 4, or more, depending on the number of regions with different ECC modes that the system can support simultaneously. However, if the number of entries is allowed to exceed a threshold value, this may unnecessarily increase the hardware complexity and may affect the other mechanisms in the system that are designed to increase the DRAM reliability.

The OS can reserve one ECC mapping table entry for the purpose of merging multiple entries into one entry. When the memory ranges specified in two entries are within a certain threshold distance in memory, these two entries can be selected for merging. Next, the OS may determine whether a selected entry in the ECC mapping table is fully protected with the same ECC mode as that used in other entries with neighboring memory ranges. Further, these entries may have the same settings as indicated in operations 516 and 518 of FIG. 5C. The OS may then merge all these entries into one entry in the ECC mapping table. Therefore, by including a mechanism for merging table entries, the system can prevent the number of table entries from increasing beyond a threshold value.
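A simplified sketch of the merging step is given below. It deliberately covers only the narrowest case and is not the disclosed procedure: entries are merged only when their ranges are contiguous or overlapping, both are fully protected, and their mode fields match. The description also permits entries within a threshold distance, and how the blocks in such a gap are handled is not specified, so that case is left out. The names EccEntry, fully_protected, and merge_entries are hypothetical, and addresses are assumed to be cache-block indices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EccEntry:
    """One ECC mapping table entry (field names follow the description)."""
    start_address: int
    end_address: int
    previous_ecc_mode: int
    current_ecc_mode: int
    counter: int = 0


def fully_protected(entry: EccEntry) -> bool:
    """True when the boundary has reached the end of the range."""
    return entry.start_address + entry.counter == entry.end_address


def merge_entries(table: List[EccEntry]) -> List[EccEntry]:
    """Return a new table with mergeable neighboring entries combined.

    Assumption: only contiguous or overlapping ranges are merged, so the
    merged entry never covers a block that is still in another mode.
    """
    merged: List[EccEntry] = []
    for entry in sorted(table, key=lambda e: e.start_address):
        last = merged[-1] if merged else None
        if (
            last is not None
            and fully_protected(last)
            and fully_protected(entry)
            and last.current_ecc_mode == entry.current_ecc_mode
            and last.previous_ecc_mode == entry.previous_ecc_mode
            and entry.start_address <= last.end_address + 1
        ):
            new_end = max(last.end_address, entry.end_address)
            # The merged entry is fully protected by construction.
            merged[-1] = EccEntry(
                last.start_address,
                new_end,
                last.previous_ecc_mode,
                last.current_ecc_mode,
                new_end - last.start_address,
            )
        else:
            merged.append(entry)
    return merged
```

Entries that are still in transition, i.e., whose counter has not reached the end of the range, are left untouched in this sketch.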

To summarize, FIGS. 5A-5C illustrate a typical scenario in which the elastic ECC process is in operation. Specifically, when the error correction method is in operation, the operating system may periodically monitor the ECC decoding statistics to identify an intensity of the soft errors. If the intensity of the soft errors in a certain memory address range is above a certain threshold, the operating system may anticipate that the DRAM in that address range can be susceptible to soft errors. Therefore, the system may take proactive actions by applying a stronger multi-bit error correction code to prevent any likely data corruption in the near future. Other proactive actions taken by the system can include disabling and mapping out a problematic DRAM DIMM, or migrating applications to a different node even before the soft errors in memory exceed the error correction capability of ECC-6. Furthermore, the system is capable of achieving elasticity in ECC by providing the flexibility of mapping ECCs with various strengths, i.e., ECC modes, to ranges of DRAM addresses without incurring any overhead on memory capacity.

In addition, the system can allow the co-existence of traditional SECDED ECC with different ECC modes at 64-byte data block granularity. The system is also capable of providing a finer-grained and smoother trade-off between different ECC modes and memory access latency overhead than the known ECC techniques.

Exemplary Computer System and Apparatus

FIG. 6 illustrates an exemplary computer system that facilitates elastic error correction in memory, according to one embodiment of the present disclosure. Computer system 600 includes a processor 602, a memory 604, a storage device 606, and a memory controller 608. Computer system 600 can be coupled to a plurality of peripheral input/output devices 632, e.g., a display device 630, a keyboard 626, and a pointing device 628, and can also be coupled via one or more network interfaces to network 634. Storage device 606 can store an operating system 610 and an error correction system 612.

In one embodiment, error correction system 612 can include instructions, which when executed by processor 602 can cause computer system 600 to perform methods and/or processes described in this disclosure. During operation of computer system 600, error correction system 612 can include instructions for receiving a memory access request including a current address and a RD/WR command (communication module 614). Error correction system 612 may further include instructions for analyzing the current address by comparing the current address with each entry in an ECC mapping table to determine whether the current address belongs to any of the address ranges specified in the ECC mapping table (analysis module 616). Error correction system 612 may then select an appropriate ECC mode based on a result of the comparison performed in analysis module 616 (ECC mode selector module 618).

Error correction system 612 may further be configured to update or program the ECC mapping table (ECC mapping table update module 620). Based on the selected ECC mode and an entry in the ECC mapping table, error correction system 612 may generate read ECC mode bits for a read command or write ECC mode bits for a write command. Error correction system 612 may use the read ECC mode bits for decoding data (ECC decoder module 622) and the write ECC mode bits for encoding data (ECC encoder module 624), respectively. In some embodiments, modules 614-624 can be partially or entirely implemented in hardware and can be part of the processor 602.

FIG. 7 illustrates an exemplary apparatus that facilitates elastic error correction in memory, according to one embodiment of the present disclosure. Apparatus 700 can comprise a plurality of units or apparatuses that may communicate with one another via a wired, wireless, quantum light, or electrical communication channel. Apparatus 700 may be realized using one or more integrated circuits, and may include fewer or more units or apparatuses than those shown in FIG. 7. Further, apparatus 700 may be integrated in a computer system, or realized as a separate device that is capable of communicating with other computer systems and/or devices. Specifically, apparatus 700 can comprise units 702-712, which perform functions or operations similar to modules 614-624 of computer system 600 of FIG. 6, including: a communication unit 702, an analysis unit 704, an ECC mode selector unit 706, an ECC mapping table update unit 708, an ECC decoder unit 710, and an ECC encoder unit 712.

The methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above. When a computer system reads and executes the code and/or data stored on the computer-readable storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the computer-readable storage medium.

The data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. The computer-readable storage medium includes, but is not limited to, volatile memory, non-volatile memory, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing computer-readable media now known or later developed.

Furthermore, the methods and processes described above can be included in hardware modules or apparatus. The hardware modules or apparatus can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), dedicated or shared processors that execute a particular software module or a piece of code at a particular time, and other programmable-logic devices now known or later developed. When the hardware modules or apparatus are activated, they perform the methods and processes included within them.

The foregoing descriptions of embodiments of the present disclosure have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the present disclosure to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims

1. A method for performing error correction in memory, comprising: receiving a memory access request from a host processor; comparing a memory address specified in the memory access request with a set of entries in an error correction code (ECC) mapping table; determining that the memory address corresponds to at least one entry in the ECC mapping table, determining whether the memory address belongs to a first portion or a second portion of the address range specified in the ECC mapping table entry; in response to the memory address belonging to the first portion and not being the last address in the first portion, selecting a first ECC mode; in response to the memory address being the last address in the first portion and the memory access request being a write request selecting the first ECC mode, and indicating that a range of memory using the first ECC mode is expanded by one cache block; and selecting a second ECC mode in response to the memory address belonging to the second portion; and processing the memory access request based on the selected ECC mode.
2. The method of claim 1, wherein each entry in the ECC mapping table includes: a start address field; an end address field; a previous ECC mode field; a current ECC mode field; and a counter field.
3. The method of claim 1, wherein the second ECC mode and the first ECC mode use a class of cyclic error correcting codes that is capable of performing at least one of: a 4-bit error correction and 5-bit error detection; a 5-bit error correction and 6-bit error detection; and a 6-bit error correction and 7-bit error detection.
4. The method of claim 1, wherein a counter field in the ECC mapping table entry tracks a boundary separating the address range into two regions: the first portion of the address range and the second portion of the address range, and wherein the address range is defined by a start address and an end address specified in the ECC mapping table entry.
5. The method of claim 1, wherein the memory includes a dynamic random access memory (DRAM).
6. The method of claim 1, wherein in response to the memory address being the last address in the first portion and the memory access request being a write request updating the ECC mapping table by: setting a write ECC mode to the first ECC mode; and incrementing a value in a counter field of the ECC mapping table entry.
7. The method of claim 1, further comprising: in response to determining that the memory address is not included in the ECC mapping table, selecting a default ECC mode.
8. The method of claim 7, wherein the default ECC mode represents a Hamming code with 64 bits data and 8 bit parity code.
9. A system for performing error correction in memory, comprising: a receiving module configured to receive a memory access request from a host processor, wherein the memory includes a dynamic random access memory (DRAM); an analysis module configured to: compare a memory address specified in the memory access request with a set of entries in an error correction code (ECC) mapping table; and determine that the memory address corresponds to at least one entry in the ECC mapping table; determine whether the memory address belongs to a first portion or a second portion of the address range specified in the ECC mapping table entry; an ECC mode selector module configured to: in response to the memory address belonging to the first portion and not being the last address in the first portion selecting the first ECC mode; in response to the memory address being the last address in the first portion and the memory access request being a write request selecting the first ECC mode, and indicating that a range of memory using the first ECC mode is expanded by one cache block; and select a second ECC mode in response to the memory address belonging to the second portion; and a processing module configured to process the memory access request based on the selected ECC mode.
10. The system of claim 9, wherein each entry in the ECC memory mapping table includes: a start address field; an end address field; a previous ECC mode field; a current ECC mode field; and a counter field.
11. The system of claim 9, wherein the second ECC mode and the first ECC mode use a class of cyclic error correcting codes for performing at least one of: a 4-bit error correction and 5-bit error detection; a 5-bit error correction and 6-bit error detection; and a 6-bit error correction and 7-bit error detection.
12. The system of claim 9, wherein the ECC mode selector module is further configured to: select a default ECC mode in response to determining that the memory address is not included in the ECC mapping table.
13. The system of claim 9, wherein a counter field in the ECC mapping table entry tracks a boundary separating the address range into two regions: the first portion of the address range and the second portion of the address range, and wherein the address range is defined by a start address and an end address specified in the ECC mapping table entry.
14. The system of claim 9, wherein an ECC mapping table update module is configured to update the ECC mapping table in response to determining that the memory access request is the write request by: setting a write ECC mode to the first ECC mode; and incrementing a value in the counter field of the ECC mapping table entry.
15. An apparatus for performing error correction in memory, comprising: one or more processors; and a storage medium storing instructions that, when executed by the one or more processors, cause the apparatus to perform a method comprising: monitoring error correction code (ECC) decoding statistics to determine intensity of soft errors in different address ranges in memory; and in response to determining that intensity of soft errors in an address range is greater than a corresponding threshold: reading an ECC mapping table; and in response to determining that the address range is fully or partially included in an entry of the ECC mapping table, prioritizing memory scrubbing when the address range is not completely protected with a current ECC mode specified in a current ECC mode field of the entry in the ECC mapping table; and updating a previous ECC mode field in the entry of the ECC mapping table with the current ECC mode.
16. The apparatus of claim 15, further comprising: setting, based on the threshold, the current ECC mode field to a new mode, wherein the new mode has a higher strength than the mode in the updated previous ECC mode field; and resetting a counter field in the entry of the ECC mapping table.
17. The apparatus of claim 15, the method further comprising: in response to determining that the address range is not included in any entry of the ECC mapping table, adding a new entry to the ECC mapping table; setting a previous ECC mode field in the new entry to a default mode; setting, based on the threshold, the current ECC mode field in the new entry to a new mode, wherein the new mode has higher strength than the default mode; and resetting a counter field in the entry of the ECC mapping table.
18. The apparatus of claim 15, wherein each entry in the ECC mapping table includes: a start address field; an end address field; a previous ECC mode field; the current ECC mode field; and a counter field.
19. The apparatus of claim 15, wherein the memory includes a dynamic random access memory (DRAM).
20. The apparatus of claim 15, wherein the threshold being a first threshold, a second threshold, or a third threshold; and wherein: when the intensity of soft errors exceeds the first threshold the current ECC mode field is set to mode 1; when the intensity of soft errors exceeds the second threshold the current ECC mode field is set to mode 2; and when the intensity of soft errors exceeds the third threshold, the current ECC mode field is set to mode 3.

Related Publications (1)

	Number	Date	Country
	20210294692 A1	Sep 2021	US
