Descriptions are generally related to computer memory, and more particular descriptions are related to reliability, availability, and serviceability (RAS)-based domains for memory.
There are a variety of computer memory technologies with varying techniques for preventing and mitigating errors; however, memory errors continue to be a hindrance to achieving desired system performance and uptime. This is especially true for the owners and operators of large data centers and cloud service providers (e.g., hyperscalers), which perform large memory application deployments. With the rapid growth in data volumes and the need to access data at memory speeds (e.g., for real time analytics, artificial intelligence (AI), etc.), the demand for memory capacity continues to increase across multiple domains in the cloud. With the increasing demand for capacity usage, there has been heightened focus on memory reliability and application recoverability.
The following description includes discussion of figures having illustrations given by way of example of an implementation. The drawings should be understood by way of example, and not by way of limitation. As used herein, references to one or more examples are to be understood as describing a particular feature, structure, or characteristic included in at least one implementation of the invention. Phrases such as “in one example” or “in an alternative example” appearing herein provide examples of implementations of the invention, and do not necessarily all refer to the same implementation. However, they are also not necessarily mutually exclusive.
Descriptions of certain details and implementations follow, including non-limiting descriptions of the figures, which may depict some or all examples, as well as other potential implementations.
As described herein, RAS-based memory domains can enable applications to store data in memory domains having different degrees of reliability to reduce downtime and data corruption due to memory errors. In one example, memory resources are classified into different RAS-based memory domains based on their expected likelihood of encountering errors. The mapping of memory resources into RAS-based memory domains can be dynamically managed and updated when information indicative of reliability (such as the occurrence of errors or other information) suggests that a memory resource is becoming less reliable. The RAS-based memory domains can be exposed to applications to enable applications to allocate memory in high reliability memory for critical data.
Memory errors have become a leading focus area for hyperscalers with large memory application deployments. For example, database as a service (DBaaS) deployments involving large amounts of memory per node can result in statistically higher memory error rates per node.
Some existing techniques attempt to reduce downtime due to memory errors; however, there have been some challenges with full application stack adoption. For example,
In the case of uncorrectable errors in pages being used by applications, there is more complexity in handling the errors. In some cases, the data can be reconstructed; however, enterprise-class usage typically requires full MCA-R (machine check recovery). Implementing a full MCA-R solution has been challenging due to the complexity of covering every single use case. For example, some regions of memory (e.g., database tables) are duplicated (e.g., on a disk) but others are not. Also, some regions of memory may be under a lock or be operated on by multiple threads, in which case an uncorrectable error typically needs all the threads accessing that line to be notified. In some cases, the data cannot be reconstructed by the application 106, and the application 106 is terminated. Thus, the techniques for handling uncorrectable errors in pages being used by applications can be especially complex and require adoption across the entire software stack.
In addition to reactive techniques that handle memory errors after they occur, other solutions seek to predict memory failures. Memory errors, including uncorrectable ones, are all but certain to occur. As different memory regions get used and accessed differently, over time, the probability of bit errors can increase. Memory failure prediction can be a function of usage, temperature, etc. As we get to more diverse landscapes, including memory pools, or greater numbers of DIMMs in a platform (e.g., tens of DIMMs or more), it becomes evident that all memory regions are not equal, and there are existing techniques to identify where memory errors are more likely to occur. However, existing memory error prediction schemes fail to consider that not all data used by an application have the same tolerance for errors.
Where (from an application memory usage standpoint) uncorrectable memory errors occur matters in terms of the impact of the errors. Let us consider the following example, where there are different memory regions used by an application. For example,
Given that not all memory regions are the same and that it is possible to know where memory errors are more likely to occur, techniques described herein enable data to be stored in different reliability memory domains. With at least two different reliability memory domains, error-sensitive data (e.g., data that cannot be recovered or that is recoverable but with significant performance impacts such as regions 204A and 204B of
In one example, memory regions are categorized (e.g., through prediction) into domains that are less likely to fail and domains that are more likely to fail, and the domains are carved out and exposed (e.g., as different reliability flags in the RAS-based memory domains). Note that domains do not necessarily have physically contiguous addresses within the domain; thus, hardware and/or software tracks address mapping for the different RAS-based memory domains. Further, RAS mitigation mechanisms like Adaptive Double DRAM Device Correction (ADDDC) and aggressive usage of ECC can be implemented in domains where it is desired to guarantee a higher RAS capability.
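By way of a non-limiting illustration, the following sketch shows one way software could track non-contiguous physical address ranges for different RAS-based memory domains. The names `AddressRange` and `DomainMap`, the domain labels, and the address values are hypothetical and are chosen only for this example; an actual implementation may track the mapping in hardware address decoders, in the operating system, or both.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AddressRange:
    start: int   # first physical address in the range (inclusive)
    end: int     # one past the last physical address (exclusive)
    domain: str  # hypothetical domain label, e.g. "high" or "low"


class DomainMap:
    """Maps physical addresses to RAS-based memory domains.

    A single domain may own several disjoint ranges.
    """

    def __init__(self, ranges: List[AddressRange]):
        self._ranges = sorted(ranges, key=lambda r: r.start)
        # Reject overlapping ranges so every address has at most one domain.
        for prev, cur in zip(self._ranges, self._ranges[1:]):
            if cur.start < prev.end:
                raise ValueError("overlapping address ranges")
        self._starts = [r.start for r in self._ranges]

    def domain_of(self, addr: int) -> Optional[str]:
        # Find the last range whose start is <= addr, then check its end.
        i = bisect_right(self._starts, addr) - 1
        if i >= 0 and addr < self._ranges[i].end:
            return self._ranges[i].domain
        return None  # address is not mapped to any domain
```

In this sketch, the "high" domain can own two ranges separated by a "low" range or by unmapped addresses, and a lookup still resolves each address to exactly one domain.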
In order to take advantage of different RAS-based memory domains, in one example, applications can request higher or lower reliability memory when allocating memory. For example, while making an mmap( ) or malloc( ) function call, applications can pass a bit or flag specifying to the hardware that a given memory region is critical or non-critical from a RAS/reliability perspective. Adding a field or flag to memory allocation functions gives the OS and HW a hook to manage the RAS-based memory domains to ensure that critical memory regions (such as regions 204A and 204B of
In one example, the operating system manages the RAS-based memory domains. For example,
The CPU 302 includes one or more caching agents 304. The caching agent(s) include hardware logic (e.g., circuitry) to ensure cache coherency of the system 300. The CPU 302 is coupled with memory resources including a pooled memory node 326, local memory 328, and other devices or memory 330. The pooled memory node (or a memory pool) 326 represents memory resources (e.g., typically remote memory resources) made available to the system 300. In one example, the pooled memory node 326 is a server (e.g., a memory server or memory node) with a large capacity of memory resources that are made available to other systems or servers. In one example, the pooled memory node 326 includes one or more pooled memory drawers/sleds/chassis in a rack to provide a memory pool. In one example, the pooled memory node 326 is coupled with the CPU 302 via a Compute Express Link (CXL).
In the illustrated example, the system 300 also includes local memory 328 (such as DRAM DIMMs coupled with the CPU 302), and other memory 330. Other memory may include, for example, another processor's local memory, or other memory that is accessible by the CPU 302.
In the example illustrated in
The local memory 328 is coupled with the CPU 302 via the controller 322. In one example, the controller 322 includes a DRAM memory controller. In one such example, controller 322 is an integrated memory controller (iMC) that is integrated into the CPU 302. Other memory devices 330 may also be coupled with the CPU via a controller 324. Although the system 300 depicts a controller for each of the memory resources 326, 328, and 330, in other examples, a system can include a single memory controller to control multiple memory resources. Additionally, although a single controller is shown for each of the memory resources 326, 328, and 330, it will be understood that different and/or additional components can couple the memory resources 326, 328, and 330 with the CPU 302 (e.g., fabric managers, switches, root ports, or other components).
The memory resources are mapped to a physical address range 301 to provide system memory for the system 300. In an example in which pooled memory makes up part of the system memory, the physical address range 301 is a distributed coherent address range (e.g., distributed CXL coherent address range). The example in
In one example, the operating system 309 includes prediction logic 310 to make predictions regarding the likelihood of errors in the memory resources 326, 328, and 330, error monitoring logic 312 to monitor errors in the memory resources 326, 328, and 330, RAS features management logic 316 to track RAS features of the memory resources 326, 328, and 330, and NUMA RAS logic 317 to determine which memory resources 326, 328, 330 to assign to the different RAS-based memory domains based on RAS capabilities and/or error monitoring. In one example, for the high and medium domains (both using advanced RAS features), the operating system monitors the occurrence of correctable errors being detected in the media where those types of memory ranges are mapped. When those numbers surpass specific thresholds (e.g., configurable percentages or other thresholds) or prediction logic in the OS, memory, or the controllers indicates that there is a likelihood of uncorrectable errors, the operating system will remap and move these memory ranges to other media (e.g., pooled memory or other memory available in the system) that have a lower percentage or rate of correctable errors. For example, data migration logic 314 can move data and trigger address remapping when a memory resource is moved to a different domain.
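By way of a non-limiting illustration, the threshold check described above can be sketched as follows. The `MediaStats` type, the `pick_migration_target` function, and the use of a correctable-error rate per access are assumptions made only for this example; the description does not prescribe a particular metric or selection policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaStats:
    name: str
    accesses: int             # accesses observed in the monitoring window
    correctable_errors: int   # correctable errors observed in that window

    @property
    def ce_rate(self) -> float:
        # Fraction of accesses that hit a correctable error.
        return self.correctable_errors / self.accesses if self.accesses else 0.0


def pick_migration_target(
    source: MediaStats, candidates: List[MediaStats], threshold: float
) -> Optional[MediaStats]:
    """Return the media to migrate to, or None if no move is needed or possible.

    A move is considered only when the source exceeds the configurable
    threshold; the target must be within the threshold and strictly better.
    """
    if source.ce_rate <= threshold:
        return None
    better = [
        m for m in candidates
        if m.ce_rate <= threshold and m.ce_rate < source.ce_rate
    ]
    # Prefer the candidate with the lowest correctable-error rate.
    return min(better, key=lambda m: m.ce_rate, default=None)
```

In this sketch, a memory range stays where it is while its media remains under the threshold, which avoids migrating data in response to isolated correctable errors.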
To enable applications to take advantage of the RAS-based memory domains, in one example, the operating system and CPU offer a new interface that allows the software stack to allocate memory regions with a certain reliability. For example, the OS 309 includes domain-aware memory allocation logic 318 to enable applications to allocate memory in the high RAS memory domain 332 for critical data. For example, the operating system provides a new type of allocation function (such as malloc( ) or mmap( )) or type of API (application programming interface) that allows applications to specify what type of RAS memory domain the allocated memory range should be mapped to. In one such example, the parameter is a reliability flag, or RAS memory domain flag or field, that enables applications to request memory with high reliability. A reliability or RAS memory domain parameter can also be added to existing APIs.
Although
The memory controller 408 includes input/output (I/O) interface circuitry 409 to enable the memory controller 408 to interface with the memory resources 326, 328, and 330, and the CPU 302. In the example illustrated in
The actual mapping and RAS-based domain information can be implemented in the address decoding logic in the caching agents and the memory controller, in the operating system, or in a combination of hardware and the operating system. In one example, a set of new bits enables the system address decoder (e.g., address decoder 305 in the caching agents 304/CPU 302) and the address decoder of the memory controller 408 to understand which system physical address ranges in the CPU domain are mapped into specific ranges for the RAS-based memory domains. In one example, the system address decoder and the physical address decoder do not need to include additional bits to indicate which RAS-based domains the physical addresses are mapped to, and instead, the operating system tracks which RAS-based memory domains the physical address ranges are in. For example, the page tables can include the type of RAS memory domain that they are mapped to and may also track replication. In another example, both the operating system and hardware track which physical address ranges correspond to which RAS-based memory domains.
In addition to tracking which physical address ranges correspond to which RAS-based memory domains, in one example, the RAS capabilities of the memory resources are identified and used to determine which domain to assign the memory resources to. For example, the table 540 is an example of RAS feature data tracked for the memory resources. In the example illustrated in
In one example, a list of available RAS features that can be configured and/or a list of potential knobs (e.g., configurable options) for each RAS feature is tracked. Examples of RAS features include error correction code (ECC) capabilities, single device data correction (SDDC), adaptive double DRAM device correction (ADDDC), advanced error detection and correction (AEDC), local machine check exceptions (LMCE), sparing, memory mirroring, and other RAS features. Examples of RAS feature configurable options include enabling or disabling RAS features, the granularity at which features are supported, and other configurable options. In one example, each RAS feature can be enabled or disabled, and some RAS features may include other configurable options, such as the granularity at which the feature is applied. RAS features help mitigate reliability problems. Thus, memory with higher reliability (higher RAS) is less likely to need RAS features enabled; conversely, RAS features are critical for memory with poor reliability (lower RAS).
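By way of a non-limiting illustration, tracked RAS features and their configurable options could be represented as in the following sketch. The `RasFeature` and `MemoryResource` types, their fields, and the granularity value "rank" are hypothetical and are used only to show the shape of such a record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RasFeature:
    name: str                          # e.g. "ECC", "SDDC", "ADDDC"
    enabled: bool = False              # the basic enable/disable knob
    granularity: Optional[str] = None  # optional knob; values are hypothetical


@dataclass
class MemoryResource:
    resource_id: str
    features: Dict[str, RasFeature] = field(default_factory=dict)

    def register(self, feature: RasFeature) -> None:
        # Record a feature that this memory resource supports.
        self.features[feature.name] = feature

    def set_enabled(self, name: str, enabled: bool) -> None:
        # Raises KeyError if the resource does not support the feature.
        self.features[name].enabled = enabled

    def enabled_features(self) -> List[str]:
        return sorted(n for n, f in self.features.items() if f.enabled)
```

Keeping supported features separate from enabled features lets management logic distinguish between what a memory resource can do and how it is currently configured.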
Different RAS features can have different levels of complexity and effectiveness, and logic in the memory controller and/or operating system (e.g., the RAS feature management logic 416 and/or prediction logic 410 of
In addition to determining the likelihood of errors in a memory resource based on the RAS features and configurations, actual observed errors and other RAS telemetry data can be used to determine the likelihood of errors in a memory resource. For example, error monitoring logic in the memory controller and/or operating system (e.g., error monitoring logic 412 of
In one example, memory resources provide information regarding encountered errors including one or more of: the memory range or list of sub-medias (e.g., ranks) in which an error occurred, the current percentage of correctable errors, the current percentage of uncorrectable errors, and/or other types of RAS telemetry data. Error data can be stored by the memory controller and/or operating system and used by the prediction logic (e.g., prediction logic 410 of
In one example, prediction logic (e.g., prediction logic 410 of
In one example, once a likelihood of failure is detected (or once there is a change in the likelihood of errors that exceeds a threshold), the information is provided to NUMA RAS logic 418 (e.g., the NUMA RAS logic 418 of
In one example, when media is moved to a lower RAS-based memory domain due to the media becoming less reliable, data may need to be moved in addition to reconfiguring the mapping of physical addresses to memory resources. Depending on implementation, remapping of memory ranges and data migration can be triggered by the operating system or the memory controller. In one example, the memory controller 408 includes an interface (e.g., one of the interfaces 409) that allows the memory controller to specify that a memory range is moved from one medium or set of media to another because there is a likelihood the media is becoming less resilient and memory ranges in the high and medium RAS memory domains that are mapped there need to be moved. In one such example, such an interface allows the memory controller to provide a memory range or a list of memory ranges being remapped (e.g., in case of interleaving across multiple DIMMs), and the new media mapped into each memory range. During the data movement, the caching agents 304 can block accesses to those regions and unblock and execute the accesses once the memory controller 408 acknowledges that data has been moved. Alternatively, as data gets moved, the memory controller can provide this feedback to the caching agents, which may be a beneficial approach when large data blocks are being moved.
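By way of a non-limiting illustration, the blocking behavior during data movement can be modeled in software as follows. The `CachingAgentModel` class and its method names are hypothetical; an actual caching agent is hardware logic, and this sketch only models the order in which accesses are deferred and then executed.

```python
from typing import List, Tuple


class CachingAgentModel:
    """Software model of deferring accesses to ranges that are being migrated."""

    def __init__(self) -> None:
        self._blocked: List[Tuple[int, int]] = []  # (start, end) ranges, end exclusive
        self._pending: List[int] = []              # deferred addresses, in arrival order
        self.executed: List[int] = []              # addresses whose access completed

    def _is_blocked(self, addr: int) -> bool:
        return any(start <= addr < end for start, end in self._blocked)

    def block(self, start: int, end: int) -> None:
        # Called when migration of [start, end) begins.
        self._blocked.append((start, end))

    def access(self, addr: int) -> bool:
        # Returns True if the access executed now, False if it was deferred.
        if self._is_blocked(addr):
            self._pending.append(addr)
            return False
        self.executed.append(addr)
        return True

    def on_migration_ack(self, start: int, end: int) -> None:
        # Called when the memory controller acknowledges the data has moved.
        self._blocked.remove((start, end))
        still_pending = []
        for addr in self._pending:
            if self._is_blocked(addr):
                still_pending.append(addr)  # covered by another ongoing migration
            else:
                self.executed.append(addr)
        self._pending = still_pending
```

Acknowledging one range at a time, as in this sketch, corresponds to the alternative in which the memory controller provides feedback as data gets moved, so accesses to already-moved ranges need not wait for the entire migration.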
Turning first to
The information to indicate reliability of the memory resource can be received in response to the memory resource being added as an available memory resource to the system (e.g., in response to the memory resource being hot plugged into the system or otherwise added to the system), or in response to some other trigger, such as an error being encountered, an error metric exceeding a threshold, a temperature metric exceeding a threshold, or some other trigger. Referring to
Referring again to
The method then involves classifying the memory resource into one of multiple RAS-based memory domains based on the likelihood of errors in the memory resource, at block 606. For example, NUMA RAS logic (e.g., the NUMA RAS logic 317 of
The RAS-based memory domains include at least two domains, where one RAS memory domain represents higher reliability or RAS than the other (e.g., a higher RAS memory domain and a lower RAS memory domain, where the lower RAS memory domain indicates reliability relative to the higher RAS memory domain and not necessarily low RAS capabilities). As illustrated in
For example,
While monitoring the memory resources for errors, the method 700 involves receiving information regarding errors encountered in a memory resource or other information indicative of reliability, at block 704. For example, error monitoring logic (e.g., the error monitoring logic 312 of
After reclassifying a memory resource to a lower RAS memory domain, data migration 711 may be needed to ensure the data at the reclassified memory locations is moved to a sufficiently reliable location. For example, data migration logic (e.g., data migration logic 314 of
The method 700 also involves updating reliability information (e.g., a field or flag indicating the RAS memory domain) in a page table for physical addresses moved to a different RAS-based memory domain, at block 716. For example, in an implementation in which the page tables include one or more bits to indicate which RAS memory domain a physical page is in, when the physical page is moved to a different domain, the operating system updates the page table entry for that physical page to reflect the reclassification.
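By way of a non-limiting illustration, a page table entry carrying a RAS memory domain field could be updated as in the following sketch. The two-bit field, its position (`RAS_SHIFT = 52`), and the domain encodings are assumptions made only for this example and do not reflect the page table layout of any particular architecture.

```python
# Hypothetical layout: a 2-bit RAS domain field at bits 53:52 of a 64-bit entry.
RAS_SHIFT = 52
RAS_MASK = 0b11 << RAS_SHIFT

# Hypothetical domain encodings.
DOMAIN_LOW, DOMAIN_MEDIUM, DOMAIN_HIGH = 0, 1, 2


def set_domain(pte: int, domain: int) -> int:
    """Return a copy of the entry with its RAS domain field replaced."""
    if not 0 <= domain <= 0b11:
        raise ValueError("domain does not fit in the 2-bit field")
    # Clear the old field, then write the new value; other bits are untouched.
    return (pte & ~RAS_MASK) | (domain << RAS_SHIFT)


def get_domain(pte: int) -> int:
    """Extract the RAS domain field from the entry."""
    return (pte & RAS_MASK) >> RAS_SHIFT
```

Because only the domain field is rewritten, reclassifying a physical page in this sketch leaves the frame address and permission bits of the entry unchanged.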
The physical address space 1009 is assigned or classified into RAS memory domains. In the example of
Referring to
The method 800 begins when a memory controller detects that a memory resource is added to the system (e.g., hot plugged), at block 802. The memory controller registers the RAS features of the newly added memory resource and notifies the operating system of the RAS features (e.g., by interrupting the operating system). The operating system receives notification of the memory resource and its RAS features, at block 804. The operating system classifies the memory resource into one of multiple RAS-based domains, at block 806. The operating system triggers reconfiguration of physical address-to-memory mapping, at block 808. For example, the operating system causes the CPU to update system address decoders to remap physical addresses in the assigned RAS-based domain to a memory controller/CXL controller, at block 810, and the memory controller or CXL controller to map physical addresses in the assigned RAS-based memory domain to the memory resources, at block 812.
The operating system updates page tables so that the entries for physical pages mapped to the newly added memory resources indicate the assigned RAS-based memory domain, at block 814. The memory controller and operating system monitor the occurrence of errors, at blocks 816 and 818. When the memory controller detects an error, it notifies the operating system of the detected errors, at block 820. The operating system receives the error data, at block 822, and determines the likelihood of errors based on the received data, at block 824. If the likelihood of errors exceeds a threshold, the operating system reclassifies the memory range with the errors into a different RAS-based domain, at block 826. The operating system then triggers data migration and remapping, at block 828. For example, the operating system causes data to be moved from the reclassified locations into locations in the higher RAS memory domain, remaps affected memory ranges (e.g., by updating system address decoders at block 810 and mapping in the memory or CXL controllers at block 812), and updates page tables as needed.
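By way of a non-limiting illustration, the operating system side of the method 800 can be sketched as an event-driven model. The `classify` function, its cutoff values, the `OsRasManager` class, and the recorded action tuples are hypothetical; in particular, the description does not specify how a likelihood of errors is computed or which cutoffs separate the domains.

```python
from typing import Dict, List, Tuple


def classify(likelihood: float, cutoffs: Tuple[float, float] = (0.01, 0.1)) -> str:
    """Map an estimated likelihood of errors to a domain (lower is better).

    The cutoffs are hypothetical example values.
    """
    if likelihood < cutoffs[0]:
        return "high"
    if likelihood < cutoffs[1]:
        return "medium"
    return "low"


class OsRasManager:
    def __init__(self) -> None:
        self.domain_of: Dict[str, str] = {}
        self.actions: List[tuple] = []  # record of triggered reconfigurations

    def on_resource_added(self, resource_id: str, likelihood: float) -> None:
        # Roughly blocks 804-814: classify, then trigger mapping and
        # page table updates (modeled here as a recorded action).
        domain = classify(likelihood)
        self.domain_of[resource_id] = domain
        self.actions.append(("map", resource_id, domain))

    def on_error_data(self, resource_id: str, likelihood: float) -> None:
        # Roughly blocks 822-828: re-evaluate and, if the likelihood crossed
        # a cutoff, reclassify and trigger data migration and remapping.
        old = self.domain_of[resource_id]
        new = classify(likelihood)
        if new != old:
            self.domain_of[resource_id] = new
            self.actions.append(("migrate_and_remap", resource_id, old, new))
```

In this sketch, error data that does not cross a cutoff leaves the classification alone, so migration and remapping are triggered only when the domain actually changes.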
The request can also include a parameter to indicate whether the requested reliability is strict or preferred. In one such example, a memory allocation request with a strict reliability request will return NULL (or another value to indicate failure to allocate memory with the requested reliability) if memory in the requested RAS-based memory domain is unavailable. In one such example, a memory allocation request with a preferred reliability request can allocate memory in a non-preferred RAS-based memory domain if memory in the preferred RAS-based memory domain is unavailable. For example, consider a request that is received to allocate memory with parameters indicating “high reliability” and “preferred.” If the operating system is unable to allocate memory in the highest RAS memory domain, the operating system can attempt to allocate memory in the next highest RAS memory domain, and so forth, until the operating system is able to allocate memory.
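By way of a non-limiting illustration, the strict and preferred behaviors can be sketched as follows. The `DomainAllocator` class, the page-count bookkeeping, and the fallback order for requests other than "high" (requested domain first, then the remaining domains from highest to lowest reliability) are assumptions made only for this example. Returning `None` stands in for returning NULL.

```python
from typing import Dict, Optional

# Domains ordered from highest to lowest reliability (labels are hypothetical).
DOMAIN_ORDER = ["high", "medium", "low"]


class DomainAllocator:
    def __init__(self, free_pages: Dict[str, int]) -> None:
        # Number of free pages currently available in each domain.
        self.free = dict(free_pages)

    def allocate(self, pages: int, domain: str, strict: bool = False) -> Optional[str]:
        """Return the domain the pages were allocated from, or None on failure."""
        if domain not in DOMAIN_ORDER:
            raise ValueError(f"unknown domain: {domain}")
        if strict:
            # Strict: only the requested domain is acceptable.
            order = [domain]
        else:
            # Preferred: try the requested domain first, then fall back.
            order = [domain] + [d for d in DOMAIN_ORDER if d != domain]
        for candidate in order:
            if self.free.get(candidate, 0) >= pages:
                self.free[candidate] -= pages
                return candidate
        return None
```

Returning the domain that was actually used lets a caller with a preferred request detect that its data landed in a less reliable domain and, for example, keep a recoverable copy elsewhere.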
Referring again to
Thus, the methods in
The processor 1110 represents a processing unit of a computing platform that may execute an operating system (OS) and applications, which can collectively be referred to as the host or the user of the memory. The OS and applications execute operations that result in memory accesses. The processor 1110 can include one or more separate processors. Each separate processor can include a single processing unit, a multicore processing unit, or a combination. The processing unit can be a primary processor such as a CPU (central processing unit), a peripheral processor such as a GPU (graphics processing unit), or a combination. Memory accesses may also be initiated by devices such as a network controller or hard disk controller. Such devices can be integrated with the processor in some systems or attached to the processor via a bus (e.g., PCI express), or a combination. The system 1100 can be implemented as an SOC (system on a chip), or be implemented with standalone components. The system of
Reference to memory devices can apply to different memory types. The term memory devices often refers to volatile memory technologies. Volatile memory is memory whose state (and therefore the data stored on it) is indeterminate if power is interrupted to the device. Nonvolatile memory refers to memory whose state is determinate even if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state. One example of dynamic volatile memory includes DRAM (dynamic random access memory), or some variant such as synchronous DRAM (SDRAM). A memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR4 (Double Data Rate version 4, initial specification published in September 2012 by JEDEC (Joint Electron Device Engineering Council)), DDR4E (DDR version 4), LPDDR3 (Low Power DDR version 3, JESD209-3B, August 2013 by JEDEC), LPDDR4 (LPDDR version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide Input/Output version 2, JESD229-2 originally published by JEDEC in August 2014), HBM (High Bandwidth Memory, JESD235, originally published by JEDEC in October 2013), DDR5 (DDR version 5, JESD79-5A, published October, 2021), DDR version 6 (DDR6) (currently under draft development), LPDDR5, HBM2E, HBM3, and HBM-PIM, or others or combinations of memory technologies, and technologies based on derivatives or extensions of such specifications. The specification for LPDDR6 is currently under development. The JEDEC standards are available at www.jedec.org.
In addition to, or alternatively to, volatile memory, in one example, reference to memory devices can refer to a nonvolatile memory device whose state is determinate even if power is interrupted to the device. A memory device can include a three dimensional crosspoint memory device, or other byte addressable nonvolatile memory devices. A memory device can include a nonvolatile, byte addressable media that stores data based on a resistive state of the memory cell, or a phase of the memory cell. In one example, the memory device can use chalcogenide phase change material. In one example, the memory device can be or include single or multi-level phase change memory (PCM) or phase change memory with a switch (PCMS), a resistive memory, nanowire memory, ferroelectric transistor random access memory (FeTRAM), magnetoresistive random access memory (MRAM), memory that incorporates memristor technology, or spin transfer torque (STT)-MRAM, or a combination of any of the above, or other memory.
The memory controller 1120 represents one or more memory controller circuits or devices for the system 1100. In one example, the memory controller 1120 is part of host processor 1110, such as logic implemented on the same die or implemented in the same package space as the processor. The memory controller 1120 represents control logic that generates memory access commands in response to the execution of operations by the processor 1110. The memory controller 1120 accesses one or more memory devices 1140. The memory devices 1140 can be DRAM devices in accordance with any referred to above. In one example, the memory devices 1140 are organized and managed as different channels, where each channel couples to buses and signal lines that couple to multiple memory devices in parallel. Each channel is independently operable. Thus, each channel is independently accessed and controlled, and the timing, data transfer, command and address exchanges, and other operations are separate for each channel. Coupling can refer to an electrical coupling, communicative coupling, physical coupling, or a combination of these. Physical coupling can include direct contact. Electrical coupling includes an interface or interconnection that allows electrical flow between components, or allows signaling between components, or both. Communicative coupling includes connections, including wired or wireless, that enable components to exchange data.
The memory controller 1120 includes registers 1131. The registers 1131 represent one or more storage devices or storage locations that provide configuration or settings for the operation of the memory device. In one example, the registers 1131 include one or more registers that can be initialized or otherwise programmed to store data related to RAS-based memory domains as described herein. In one example, settings for each channel are controlled by separate mode registers or other register settings. In one example, each memory controller 1120 manages a separate memory channel, although system 1100 can be configured to have multiple channels managed by a single controller, or to have multiple controllers on a single channel.
The memory controller 1120 includes I/O interface logic 1122 to couple to a memory bus, such as a memory channel as referred to above. The I/O interface logic 1122 (as well as I/O interface logic 1142 of memory device 1140) can include pins, pads, connectors, signal lines, traces, or wires, or other hardware to connect the devices, or a combination of these. The I/O interface logic 1122 can include a hardware interface. As illustrated, the I/O interface logic 1122 includes at least drivers/transceivers for signal lines. Commonly, wires within an integrated circuit interface couple with a pad, pin, or connector to interface signal lines or traces or other wires between devices. The I/O interface logic 1122 can include drivers, receivers, transceivers, or termination, or other circuitry or combinations of circuitry to exchange signals on the signal lines between the devices. The exchange of signals includes at least one of transmit or receive. While shown as coupling the I/O 1122 from memory controller 1120 to the I/O 1142 of the memory device 1140, it will be understood that in an implementation of the system 1100 where groups of memory devices 1140 are accessed in parallel, multiple memory devices can include I/O interfaces to the same interface of the memory controller 1120. In an implementation of the system 1100 including one or more memory modules 1170, the I/O 1142 can include interface hardware of the memory module in addition to interface hardware on the memory device itself. Other memory controllers 1120 will include separate interfaces to other memory devices 1140.
The bus between memory controller 1120 and memory devices 1140 can be implemented as multiple signal lines coupling memory controller 1120 to memory devices 1140. The bus may typically include at least clock (CLK) 1132, command/address (CMD) 1134, and write data (DQ) and read data (DQ) 1136, and zero or more other signal lines 1138. In one example, a bus or connection between memory controller 1120 and memory can be referred to as a memory bus. In one example, the memory bus is a multi-drop bus. The signal lines for CMD can be referred to as a “C/A bus” (or ADD/CMD bus, or some other designation indicating the transfer of commands (C or CMD) and address (A or ADD) information) and the signal lines for write and read DQ can be referred to as a “data bus.” In one example, independent channels have different clock signals, C/A buses, data buses, and other signal lines. It will be understood that in addition to the lines explicitly shown, a bus can include at least one of strobe signaling lines, alert lines, auxiliary lines, or other signal lines, or a combination.
The memory devices 1140 represent memory resources for system 1100. In one example, each memory device 1140 is a separate memory die. In one example, each memory device 1140 can interface with multiple (e.g., 2) channels per device or die. Each memory device 1140 includes I/O interface logic 1142, which has a bandwidth determined by the implementation of the device (e.g., x16 or x8 or some other interface bandwidth). The I/O interface logic 1142 enables the memory devices to interface with the memory controller 1120. I/O interface logic 1142 can include a hardware interface, and can be in accordance with the I/O 1122 of the memory controller, but at the memory device end.
In one example, memory devices 1140 are disposed directly on a motherboard or host system platform (e.g., a PCB (printed circuit board) on which processor 1110 is disposed) of a computing device. In one example, memory devices 1140 can be organized into memory modules 1170. In one example, memory modules 1170 represent dual inline memory modules (DIMMs). In one example, memory modules 1170 represent other organization of multiple memory devices to share at least a portion of access or control circuitry, which can be a separate circuit, a separate device, or a separate board from the host system platform. Memory modules 1170 can include multiple memory devices 1140, and the memory modules can include support for multiple separate channels to the included memory devices disposed on them. In another example, memory devices 1140 may be incorporated into the same package as memory controller 1120, by techniques such as multi-chip-module (MCM), package-on-package, through-silicon via (TSV), or other techniques or combinations. Similarly, in one example, multiple memory devices 1140 may be incorporated into memory modules 1170, which themselves may be incorporated into the same package as memory controller 1120. It will be appreciated that for these and other implementations, the memory controller 1120 may be part of the host processor 1110.
The memory devices 1140 each include one or more memory arrays 1160. The memory array 1160 represents addressable memory locations or storage locations for data. Typically, the memory array 1160 is managed as rows of data, accessed via wordline (rows) and bitline (individual bits within a row) control. The memory array 1160 can be organized as separate channels, ranks, and banks of memory. Channels may refer to independent control paths to storage locations within memory devices 1140. Ranks may refer to common locations across multiple memory devices (e.g., same row addresses within different devices) in parallel. Banks may refer to sub-arrays of memory locations within a memory device 1140. In one example, banks of memory are divided into sub-banks with at least a portion of shared circuitry (e.g., drivers, signal lines, control logic) for the sub-banks, allowing separate addressing and access. It will be understood that channels, ranks, banks, sub-banks, bank groups, or other organizations of the memory locations, and combinations of the organizations, can overlap in their application to physical resources. For example, the same physical memory locations can be accessed over a specific channel as a specific bank, which can also belong to a rank. Thus, the organization of memory resources will be understood in an inclusive, rather than exclusive, manner.
In one example, the memory devices 1140 include one or more registers 1144. The register 1144 represents one or more storage devices or storage locations that provide configuration or settings for the operation of the memory device. In one example, the register 1144 can provide a storage location for memory device 1140 to store data for access by memory controller 1120 as part of a control or management operation. In one example, the registers 1144 include one or more Mode Registers. In one example, the registers 1144 include one or more multipurpose registers. The configuration of locations within the registers 1144 can configure the memory device 1140 to operate in different “modes,” where command information can trigger different operations within memory device 1140 based on the mode. Additionally or in the alternative, different modes can also trigger different operation from address information or other signal lines depending on the mode. Settings of register 1144 can indicate configuration for I/O settings (e.g., timing, termination or ODT (on-die termination), driver configuration, or other I/O settings).
In one example, the registers 1144 include one or more registers that indicate a temperature of the memory device 1140, the memory module 1170, or both. For example, the register value can be indicative of a temperature of the memory device 1140 or memory module 1170 based on one or more thermal sensors on the memory device 1140 or memory module 1170 (e.g., the thermal sensor 1135). It can also indicate the temperature of thermal sensor 1133 on the processor or memory controller, temperature of one or more dies for stacked memory dies, a case temperature, or any other memory subsystem or system temperature. The controller 1150 of the memory device 1140 can sample the temperature from the thermal sensor and store a value representing the temperature, a range of temperatures, a temperature gradient, a change in temperature, or some other temperature information based on the reading of the thermal sensor. In one example, the thermal sensor(s) are sampled at regular intervals and the register storing temperature information can be updated at regular intervals. In another example, a thermal event (such as a temperature reaching or exceeding a threshold temperature) may trigger the register to be updated. Temperature data from the thermal sensors can be used in determining which RAS-based memory domain a memory resource is assigned to.
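By way of non-limiting illustration, the periodic and event-triggered register updates described above can be sketched in software as follows. The class name TemperatureRegister, the 85 degree threshold, the update interval, and the stored fields are assumptions made only for this sketch; they are not taken from the description or from any memory standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemperatureRegister:
    """Hypothetical register holding temperature information for a device."""

    threshold_c: float = 85.0   # assumed thermal-event threshold
    update_interval: int = 4    # assumed: periodic update on every 4th sample
    last_value_c: Optional[float] = None
    delta_c: float = 0.0        # change since the previously stored value
    samples_seen: int = 0

    def sample(self, reading_c: float) -> bool:
        """Feed one sensor reading; return True if the register was updated."""
        self.samples_seen += 1
        periodic = self.samples_seen % self.update_interval == 0
        thermal_event = reading_c >= self.threshold_c
        if not (periodic or thermal_event):
            return False
        if self.last_value_c is not None:
            self.delta_c = reading_c - self.last_value_c
        self.last_value_c = reading_c
        return True
```

In this sketch a reading at or above the threshold updates the register immediately, while other readings are stored only on the periodic interval. The stored value and its change could then serve as one input when deciding which RAS-based memory domain a memory resource is assigned to.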
The memory device 1140 includes the controller 1150, which represents control logic within the memory device to control internal operations within the memory device. For example, the controller 1150 decodes commands sent by memory controller 1120 and generates internal operations to execute or satisfy the commands. The controller 1150 can be referred to as an internal controller, and is separate from memory controller 1120 of the host. The controller 1150 can determine what mode is selected based on the registers 1144, and configure the internal execution of operations for access to the memory resources 1160 or other operations based on the selected mode. The controller 1150 generates control signals to control the routing of bits within the memory device 1140 to provide a proper interface for the selected mode and direct a command to the proper memory locations or addresses. The controller 1150 includes command logic 1152, which can decode command encoding received on command and address signal lines. The command logic 1152 can be or include a command decoder. With the command logic 1152, the memory device 1140 can identify commands and generate internal operations to execute requested commands.
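By way of non-limiting illustration, mode-dependent command decoding of the kind described above can be sketched as follows. The mode names, the two-bit command encodings, and the class name DeviceController are hypothetical and chosen only for this sketch; actual encodings are defined by the applicable device specification.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = 0
    TEST = 1  # hypothetical alternate mode selected through a register


# Hypothetical encodings: the same command bits decode to different
# internal operations depending on the mode held in the register.
DECODE_TABLE = {
    (Mode.NORMAL, 0b01): "READ",
    (Mode.NORMAL, 0b10): "WRITE",
    (Mode.TEST, 0b01): "READ_PATTERN",
    (Mode.TEST, 0b10): "WRITE_PATTERN",
}


class DeviceController:
    """Sketch of an internal controller with a mode register and a decoder."""

    def __init__(self) -> None:
        self.mode_register = Mode.NORMAL

    def set_mode(self, mode: Mode) -> None:
        self.mode_register = mode

    def decode(self, command_bits: int) -> str:
        """Return the internal operation for the command in the current mode."""
        key = (self.mode_register, command_bits)
        if key not in DECODE_TABLE:
            raise ValueError(f"unsupported command {command_bits:#04b}")
        return DECODE_TABLE[key]
```

The lookup table keeps the mode dependence explicit: changing the register value changes how identical command bits are interpreted, without changing the decoder itself.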
Referring again to the host memory controller 1120, the memory controller 1120 includes address decoding logic 1123 to decode physical address information received from the processor 1110 into device addresses for memory devices 1140. The memory controller 1120 includes command (CMD) logic 1124, which represents logic or circuitry to generate commands to send to the memory devices 1140. The generation of the commands can refer to the command prior to scheduling, or the preparation of queued commands ready to be sent. Generally, the signaling in memory subsystems includes address information within or accompanying the command to indicate or select one or more memory locations where the memory devices should execute the command. In response to scheduling of transactions for the memory device 1140, the memory controller 1120 can issue commands via the I/O 1122 to cause the memory device 1140 to execute the commands. In one example, the controller 1150 of memory device 1140 receives and decodes command and address information received via I/O 1142 from the memory controller 1120. Based on the received command and address information, the controller 1150 can control the timing of operations of the logic and circuitry within the memory device 1140 to execute the commands. The controller 1150 is responsible for compliance with standards or specifications within the memory device 1140, such as timing and signaling requirements. The memory controller 1120 can implement compliance with standards or specifications by access scheduling and control.
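By way of non-limiting illustration, decoding a physical address into device address fields can be sketched as follows. The field order and bit widths (a 32-bit layout with 10 column bits, 1 channel bit, 4 bank bits, 1 rank bit, and 16 row bits) are assumptions made for this sketch; real address maps vary by platform and configuration.

```python
from typing import NamedTuple


class DeviceAddress(NamedTuple):
    channel: int
    rank: int
    bank: int
    row: int
    column: int


# Assumed field layout, lowest-order field first: (name, width in bits).
FIELD_BITS = (
    ("column", 10),
    ("channel", 1),
    ("bank", 4),
    ("rank", 1),
    ("row", 16),
)


def decode_physical_address(addr: int) -> DeviceAddress:
    """Split a physical address into channel, rank, bank, row, and column."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    fields = {}
    for name, bits in FIELD_BITS:
        fields[name] = addr & ((1 << bits) - 1)  # mask off this field
        addr >>= bits                            # move to the next field
    if addr:
        raise ValueError("address exceeds the assumed 32-bit layout")
    return DeviceAddress(**fields)
```

Placing the channel bit just above the column bits is one common way to interleave consecutive blocks across channels; it is shown here only as an example of a design choice, not as the layout used by the memory controller 1120.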
The memory controller 1120 includes scheduler 1130, which represents logic or circuitry to generate and order transactions to send to memory device 1140. From one perspective, the primary function of the memory controller 1120 could be said to be scheduling memory access and other transactions to the memory device 1140. Such scheduling can include generating the transactions themselves to implement the requests for data by the processor 1110 and to maintain integrity of the data (e.g., with commands related to refresh). The transactions can include one or more commands, and result in the transfer of commands or data or both over one or multiple timing cycles such as clock cycles or unit intervals. Transactions can be for access such as read or write or related commands or a combination, and other transactions can include memory management commands for configuration, settings, data integrity, or other commands or a combination.
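By way of non-limiting illustration, the generation and ordering of transactions can be sketched as a simple priority queue. The policy shown (refresh ahead of read and write, otherwise first in, first out) and the names Scheduler, submit, and next_transaction are assumptions made for this sketch and do not describe the actual policy of scheduler 1130.

```python
import heapq
from itertools import count
from typing import List, Optional, Tuple

# Assumed priorities: lower value is issued first.
PRIORITY = {"refresh": 0, "read": 1, "write": 1}


class Scheduler:
    """Sketch of a scheduler that orders queued memory transactions."""

    def __init__(self) -> None:
        self._queue: List[Tuple[int, int, str, Optional[int]]] = []
        self._seq = count()  # arrival order, used as a tie-breaker

    def submit(self, kind: str, address: Optional[int] = None) -> None:
        if kind not in PRIORITY:
            raise ValueError(f"unknown transaction kind: {kind}")
        heapq.heappush(
            self._queue, (PRIORITY[kind], next(self._seq), kind, address)
        )

    def next_transaction(self) -> Optional[Tuple[str, Optional[int]]]:
        """Return the next (kind, address) to issue, or None if idle."""
        if not self._queue:
            return None
        _, _, kind, address = heapq.heappop(self._queue)
        return kind, address
```

A practical scheduler would also account for timing constraints, bank state, and fairness; the sketch shows only the ordering step.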
In the illustrated example, the memory controller 1120 includes RAS-based memory domain logic 1137. The RAS-based memory domain logic 1137 includes hardware logic for implementing one or more aspects of a RAS-based memory domain infrastructure, such as one or more of the interfaces 409, prediction logic 410, error monitoring logic 412, data migration logic 414, RAS features management logic 416, and NUMA RAS logic 418 of
Processors 1270 and 1280 are shown including integrated memory controller (IMC) circuitry 1272 and 1282, respectively. Processor 1270 also includes interface circuits 1276 and 1278; similarly, second processor 1280 includes interface circuits 1286 and 1288. Processors 1270, 1280 may exchange information via the interface 1250 using interface circuits 1278, 1288. IMCs 1272 and 1282 couple the processors 1270, 1280 to respective memories, namely a memory 1232 and a memory 1234, which may be portions of main memory locally attached to the respective processors.
Processors 1270, 1280 may each exchange information with a network interface (NW I/F) 1290 via individual interfaces 1252, 1254 using interface circuits 1276, 1294, 1286, 1298. The network interface 1290 (e.g., one or more of an interconnect, a bus, and/or a fabric, and in some examples a chipset) may optionally exchange information with a coprocessor 1238 via an interface circuit 1292. In some examples, the coprocessor 1238 is a special-purpose processor, such as, for example, a high-throughput processor, a network or communication processor, compression engine, graphics processor, general purpose graphics processing unit (GPGPU), neural-network processing unit (NPU), embedded processor, or the like.
A shared cache (not shown) may be included in either processor 1270, 1280 or outside of both processors, yet connected with the processors via an interface such as P-P interconnect, such that either or both processors' local cache information may be stored in the shared cache if a processor is placed into a low power mode.
Network interface 1290 may be coupled to a first interface 1216 via interface circuit 1296. In some examples, first interface 1216 may be an interface such as a Peripheral Component Interconnect (PCI) interconnect, a PCI Express interconnect or another I/O interconnect. In some examples, first interface 1216 is coupled to a power control unit (PCU) 1217, which may include circuitry, software, and/or firmware to perform power management operations with regard to the processors 1270, 1280 and/or coprocessor 1238. PCU 1217 provides control information to a voltage regulator (not shown) to cause the voltage regulator to generate the appropriate regulated voltage. PCU 1217 also provides control information to control the operating voltage generated. In various examples, PCU 1217 may include a variety of power management logic units (circuitry) to perform hardware-based power management. Such power management may be wholly processor controlled (e.g., by various processor hardware, and which may be triggered by workload and/or power, thermal or other processor constraints) and/or the power management may be performed responsive to external sources (such as a platform or power management source or system software).
PCU 1217 is illustrated as being present as logic separate from the processor 1270 and/or processor 1280. In other cases, PCU 1217 may execute on a given one or more of cores (not shown) of processor 1270 or 1280. In some cases, PCU 1217 may be implemented as a microcontroller (dedicated or general-purpose) or other control logic configured to execute its own dedicated power management code, sometimes referred to as P-code. In yet other examples, power management operations to be performed by PCU 1217 may be implemented externally to a processor, such as by way of a separate power management integrated circuit (PMIC) or another component external to the processor. In yet other examples, power management operations to be performed by PCU 1217 may be implemented within BIOS or other system software.
Various I/O devices 1214 may be coupled to first interface 1216, along with a bus bridge 1218 which couples first interface 1216 to a second interface 1220. In some examples, one or more additional processor(s) 1215, such as coprocessors, high throughput many integrated core (MIC) processors, GPGPUs, accelerators (such as graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays (FPGAs), or any other processor, are coupled to first interface 1216. In some examples, second interface 1220 may be a low pin count (LPC) interface. Various devices may be coupled to second interface 1220 including, for example, a keyboard and/or mouse 1222, communication devices 1227 and storage circuitry 1228. Storage circuitry 1228 may be one or more non-transitory machine-readable storage media as described below, such as a disk drive or other mass storage device which may include instructions/code and data 1230. Further, an audio I/O 1224 may be coupled to second interface 1220. Note that other architectures than the point-to-point architecture described above are possible. For example, instead of the point-to-point architecture, a system such as multiprocessor system 1200 may implement a multi-drop interface or other such architecture.
As discussed above, in some embodiments the processors illustrated herein may comprise Other Processing Units (collectively termed XPUs). Examples of XPUs include one or more of Graphic Processor Units (GPUs) or General Purpose GPUs (GP-GPUs), Tensor Processing Units (TPUs), Data Processing Units (DPUs), Infrastructure Processing Units (IPUs), Artificial Intelligence (AI) processors or AI inference units and/or other accelerators, FPGAs and/or other programmable logic (used for compute purposes), etc. While some of the diagrams herein show the use of CPUs, this is merely exemplary and non-limiting. Generally, any type of XPU may be used in place of a CPU in the illustrated embodiments. Moreover, as used in the following claims, the term “processor” is used to generically cover CPUs and various forms of XPUs.
While various embodiments described herein use the term System-on-a-Chip or System-on-Chip (“SoC”) to describe a device or system having a processor and associated circuitry (e.g., Input/Output (“I/O”) circuitry, power delivery circuitry, memory circuitry, etc.) integrated monolithically into a single Integrated Circuit (“IC”) die, or chip, the present disclosure is not limited in that respect. For example, in various embodiments of the present disclosure, a device or system can have one or more processors (e.g., one or more processor cores) and associated circuitry (e.g., Input/Output (“I/O”) circuitry, power delivery circuitry, etc.) arranged in a disaggregated collection of discrete dies, tiles and/or chiplets (e.g., one or more discrete processor core die arranged adjacent to one or more other die such as memory die, I/O die, etc.). In such disaggregated devices and systems the various dies, tiles and/or chiplets can be physically and electrically coupled together by a package structure including, for example, various packaging substrates, interposers, active interposers, photonic interposers, interconnect bridges and the like. The disaggregated collection of discrete dies, tiles, and/or chiplets can also be part of a System-on-Package (“SoP”).
Examples of RAS-based memory domains follow.
Example 1: A method including: receiving information to indicate reliability of a memory resource, determining, based on the information to indicate the reliability of the memory resource, a likelihood of errors in the memory resource, and classifying the memory resource into one of multiple reliability, availability, and serviceability (RAS)-based memory domains based on the likelihood of errors in the memory resource.
Example 2: The method of example 1, wherein: the memory resource includes one or more of: a memory pool, a memory module, device-attached memory, and a dual inline memory module (DIMM).
Example 3: The method of examples 1 or 2, wherein: the multiple RAS-based memory domains include at least two domains of memory resources, including a lower RAS memory domain and a higher RAS memory domain.
Example 4: The method of any of examples 1-3, wherein: the information to indicate the reliability of the memory resource includes one or more of: information related to errors encountered in the memory resource, RAS capabilities for the memory resource, and temperature data.
Example 5: The method of any of examples 1-4, wherein: the information to indicate the reliability of the memory resource is received in response to the memory resource being added as an available memory resource.
Example 6: The method of any of examples 1-5, wherein: the information to indicate the reliability of the memory resource is received in response to an error encountered in the memory resource or in response to an error threshold being exceeded.
Example 7: The method of any of examples 1-6, further including: receiving, from an application, a request to allocate memory, the request including a value to indicate a requested level of memory reliability, and in response to the request, allocating memory in a RAS-based memory domain based on the requested level of memory reliability.
Example 8: The method of example 7, wherein: allocating the memory in the RAS-based memory domain includes: allocating memory mapped to one or more physical addresses assigned to the RAS-based memory domain.
Example 9: The method of any of examples 1-8, further including: reclassifying at least one page in the memory resource into a second RAS-based memory domain based on a change in the likelihood of errors in the page and remapping a physical address range based on the reclassification.
Example 10: The method of example 9, further including: copying data from the reclassified page in the memory resource to a different page in a desired RAS-based memory domain in response to the reclassification.
Example 11: The method of any of examples 1-10, further including: updating RAS-based memory domain information in a page table for physical addresses moved to a different RAS-based memory domain.
Example 12: The method of any of examples 1-11, wherein: a RAS-based memory domain includes at least a portion of multiple memory resources.
Example 13: A method including: classifying memory resources into RAS-based memory domains based on an expected likelihood of errors in the memory resources, receiving, from an application, a request to allocate memory, the request including a value to indicate a requested level of memory reliability, and in response to the request, allocating memory in one of multiple RAS-based memory domains based on the requested level of memory reliability.
Example 14: The method of example 13, further including: receiving information to indicate reliability of a memory resource, and determining, based on the information to indicate the reliability of the memory resource, the likelihood of errors in the memory resource.
Example 15: A non-transitory machine-readable medium having instructions stored thereon configured to be executed on one or more processors to perform a method in accordance with any of examples 1-14.
Example 16: A controller including: input/output (I/O) interface circuitry to couple with one or more memory resources, and logic to: reconfigure a mapping of physical addresses to locations in the one or more memory resources in response to reclassification of at least one location in the one or more memory resources from a first RAS-based memory domain to a second RAS-based memory domain.
Example 17: The controller of example 16, wherein: the logic is to reconfigure the mapping in response to a request from an operating system.
Example 18: The controller of any of examples 16-17, wherein: the logic is to: monitor errors in the one or more memory resources, and reconfigure the mapping in response to a number, percentage, or rate of errors exceeding a threshold.
Example 19: The controller of any of examples 16-18, wherein: the logic is to: copy data from the at least one reclassified location to a different location in the first RAS-based memory domain.
Example 20: The controller of example 19, wherein: the logic to reconfigure the mapping is to: remap a physical address or address range previously mapped to the at least one reclassified location to the different location.
Example 21: The controller of example 19, wherein: the logic to reconfigure the mapping is to: remap a second physical address or address range to the at least one reclassified location.
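By way of non-limiting illustration, the classification, allocation, and reclassification recited in the examples above can be sketched in software as follows. The scoring heuristic, the 0.3 cutoff, the 85 degree temperature limit, the domain names higher_ras and lower_ras, and the names MemoryResource and DomainManager are all assumptions made for this sketch. Data migration and physical address remapping are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemoryResource:
    name: str
    corrected_errors: int = 0       # errors encountered in the resource
    has_advanced_ecc: bool = False  # stands in for RAS capabilities
    temperature_c: float = 40.0     # temperature data


def error_likelihood(res: MemoryResource) -> float:
    """Assumed heuristic score in [0, 1]; higher means errors are more likely."""
    score = min(res.corrected_errors / 100.0, 0.6)
    if res.temperature_c >= 85.0:
        score += 0.2
    if not res.has_advanced_ecc:
        score += 0.2
    return score


def classify(res: MemoryResource) -> str:
    """Map a resource to a RAS-based memory domain using an assumed cutoff."""
    return "higher_ras" if error_likelihood(res) < 0.3 else "lower_ras"


class DomainManager:
    def __init__(self) -> None:
        self.domains: Dict[str, List[str]] = {"higher_ras": [], "lower_ras": []}

    def add_resource(self, res: MemoryResource) -> str:
        """Classify a newly available resource and record its domain."""
        domain = classify(res)
        self.domains[domain].append(res.name)
        return domain

    def allocate(self, requested_reliability: str) -> str:
        """Return a resource from the domain matching the requested level."""
        pool = self.domains[requested_reliability]
        if not pool:
            raise MemoryError(f"no memory available in {requested_reliability}")
        return pool[0]

    def reclassify(self, res: MemoryResource) -> str:
        """Move a resource to the domain that matches its current likelihood."""
        for names in self.domains.values():
            if res.name in names:
                names.remove(res.name)
        domain = classify(res)
        self.domains[domain].append(res.name)
        return domain
```

In this sketch, an application that requests the higher_ras level receives memory only from resources whose score is below the cutoff, and a resource that accumulates errors is moved to the lower_ras domain the next time it is reclassified.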
Flow diagrams as illustrated herein provide examples of sequences of various process actions. The flow diagrams can indicate operations to be executed by a software or firmware routine, as well as physical operations. In one embodiment, a flow diagram can illustrate the state of a finite state machine (FSM), which can be implemented in hardware and/or software. Although shown in a particular sequence or order, unless otherwise specified, the order of the actions can be modified. Thus, the illustrated embodiments should be understood only as an example, and the process can be performed in a different order, and some actions can be performed in parallel. Additionally, one or more actions can be omitted in various embodiments; thus, not all actions are required in every embodiment. Other process flows are possible.
Note that actions triggered in response to a value being greater than or lower than a threshold can mean greater than or equal to, or lower than or equal to, and are design choices. Thus, it is understood that the terms “greater than” or “lower than” a threshold are intended to encompass embodiments in which a trigger occurs in response to the value being “greater than or equal to” or “lower than or equal to.”
To the extent various operations or functions are described herein, they can be described or defined as software code, instructions, configuration, and/or data. The content can be directly executable (“object” or “executable” form), source code, or difference code (“delta” or “patch” code). The software content of the embodiments described herein can be provided via an article of manufacture with the content stored thereon, or via a method of operating a communication interface to send data via the communication interface. A machine readable storage medium can cause a machine to perform the functions or operations described and includes any mechanism that stores information in a form accessible by a machine (e.g., computing device, electronic system, etc.), such as recordable/non-recordable media (e.g., read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.). A communication interface includes any mechanism that interfaces to any of a hardwired, wireless, optical, etc., medium to communicate to another device, such as a memory bus interface, a processor bus interface, an Internet connection, a disk controller, etc. The communication interface can be configured by providing configuration parameters and/or sending signals to prepare the communication interface to provide a data signal describing the software content. The communication interface can be accessed via one or more commands or signals sent to the communication interface.
Various components described herein can be a means for performing the operations or functions described. Each component described herein includes software, hardware, or a combination of these. The components can be implemented as software modules, hardware modules, special-purpose hardware (e.g., application specific hardware, application specific integrated circuits (ASICs), digital signal processors (DSPs), etc.), embedded controllers, hardwired circuitry, etc.
The hardware design embodiments discussed above may be embodied within a semiconductor chip and/or as a description of a circuit design for eventual targeting toward a semiconductor manufacturing process. In the case of the latter, such circuit descriptions may take the form of a (e.g., VHDL or Verilog) register transfer level (RTL) circuit description, a gate level circuit description, a transistor level circuit description or mask description or various combinations thereof. Circuit descriptions are typically embodied on a computer readable storage medium (such as a CD-ROM or other type of storage technology).
Besides what is described herein, various modifications can be made to the disclosed embodiments and implementations of the invention without departing from their scope. Therefore, the illustrations and examples herein should be construed in an illustrative, and not a restrictive sense. The scope of the invention should be measured solely by reference to the claims that follow.