The LPDDR4 (Low-Power Double Data Rate 4th Generation) and LPDDR5 standards support a mode called BYTE mode, in which the device width is cut in half relative to an x16 device and the number of rows in a bank is doubled. A goal of BYTE mode is to increase DRAM capacity by doubling the number of DRAMs (e.g., LPDDR4 or LPDDR5 DRAM chips) on a rank. However, the number of bank resources remains the same.
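The BYTE mode trade-off described above can be illustrated with a little arithmetic. The following is a minimal sketch, not taken from the LPDDR4 or LPDDR5 standards: the `DeviceGeometry` class, the 16-bit channel width, and the bank, row, and column counts are hypothetical values chosen for illustration, and the sketch assumes that each column address stores one device-width of bits.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DeviceGeometry:
    """Hypothetical geometry of one DRAM device (illustrative values only)."""
    width: int          # DQ width of the device in bits (16 for an x16 device)
    banks: int          # number of banks in the device
    rows_per_bank: int  # rows in each bank
    cols_per_row: int   # column addresses in each row

    def capacity_bits(self) -> int:
        # Assumption: each column address stores `width` bits.
        return self.banks * self.rows_per_bank * self.cols_per_row * self.width


def to_byte_mode(dev: DeviceGeometry) -> DeviceGeometry:
    """Halve the width and double the rows; the bank count is unchanged."""
    return replace(dev, width=dev.width // 2, rows_per_bank=dev.rows_per_bank * 2)


def devices_per_rank(channel_width: int, dev: DeviceGeometry) -> int:
    """Number of devices needed to fill the data width of one rank."""
    return channel_width // dev.width


CHANNEL_WIDTH = 16  # assumed channel data width in bits
x16 = DeviceGeometry(width=16, banks=8, rows_per_bank=32768, cols_per_row=1024)
x8 = to_byte_mode(x16)

rank_capacity_x16 = devices_per_rank(CHANNEL_WIDTH, x16) * x16.capacity_bits()
rank_capacity_x8 = devices_per_rank(CHANNEL_WIDTH, x8) * x8.capacity_bits()

print(devices_per_rank(CHANNEL_WIDTH, x16), devices_per_rank(CHANNEL_WIDTH, x8))
print(rank_capacity_x8 // rank_capacity_x16)
```

Under these assumptions the per-device capacity is unchanged, the rank holds twice as many devices, and the rank capacity doubles, while `banks` stays the same in both configurations.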
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified:
Embodiments of methods and apparatus implementing half-width modes in DRAM and doubling of bank resources are described herein. In the following description, numerous specific details are set forth to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
For clarity, individual components in the Figures herein may also be referred to by their labels in the Figures, rather than by a particular reference number. Additionally, reference numbers referring to a particular type of component (as opposed to a particular component) may be shown with a reference number followed by “(typ)” meaning “typical.” It will be understood that the configuration of these components will be typical of similar components that may exist but are not shown in the drawing Figures for simplicity and clarity or otherwise similar components that are not labeled with separate reference numbers. Conversely, “(typ)” is not to be construed as meaning the component, element, etc. is typically used for its disclosed function, implement, purpose, etc.
To better understand aspects of the teachings and principles of the embodiments disclosed herein, a brief primer on the operation of DRAM is provided with reference to an exemplary memory subsystem illustrated in
As further shown in
As described herein, reference to memory devices (e.g., DRAM devices) can apply to different volatile memory types. Volatile memory is memory whose state (and therefore the data stored on it) is indeterminate if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state. One example of dynamic volatile memory includes DRAM, or some variant such as synchronous DRAM (SDRAM). A memory subsystem as described herein may be compatible with a number of memory technologies or standards, such as DDR3 (double data rate version 3, JESD79-3, originally published by JEDEC (Joint Electron Device Engineering Council) on Jun. 27, 2007), DDR4 (DDR version 4, JESD79-4, originally published in September 2012 by JEDEC), LPDDR3 (low power DDR version 3, JESD209-3B, originally published in August 2013 by JEDEC), LPDDR4 (low power DDR version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide IO 2 (WideIO2), JESD229-2, originally published by JEDEC in August 2014), HBM (high bandwidth memory DRAM, JESD235, originally published by JEDEC in October 2013), LPDDR5 (originally published by JEDEC in February 2019, current version published in June 2021), HBM2 (HBM version 2, originally published by JEDEC in December 2018), DDR5 (DDR version 5, originally published by JEDEC in July 2020), or others or combinations of memory technologies, and technologies based on derivatives or extensions of such specifications. In addition to the foregoing, the specification for LPDDR6 is currently being developed.
Under conventional (S)DRAM memory, data are generally accessed (Read and Written) using cachelines (also called cache lines) comprising a sequence of memory cells (bits) in a wordline. The cachelines for a given memory architecture generally have a predetermined width or size, such as 64 Bytes, noting other widths/sizes may be used.
Reference to memory devices may apply to different memory types. Memory devices often refers to volatile memory technologies such as DRAM. In addition to, or alternatively to, volatile memory, in some examples, reference to memory devices can refer to a nonvolatile memory device whose state is determinate even if power is interrupted to the device. In one example, the nonvolatile memory device is a block addressable memory device, such as NAND or NOR technologies. A memory device may also include byte or block addressable types of non-volatile memory having a 3-dimensional (3-D) cross-point memory structure that includes, but is not limited to, chalcogenide phase change material (e.g., chalcogenide glass) hereinafter referred to as “3-D cross-point memory”. Non-volatile types of memory may also include other types of byte or block addressable non-volatile memory such as, but not limited to, multi-threshold level NAND flash memory, NOR flash memory, single or multi-level phase change memory (PCM), resistive memory, nanowire memory, ferroelectric transistor random access memory (FeTRAM), anti-ferroelectric memory, resistive memory including a metal oxide base, an oxygen vacancy base and a conductive bridge random access memory (CB-RAM), a spintronic magnetic junction memory, a magnetic tunneling junction (MTJ) memory, a domain wall (DW) and spin orbit transfer (SOT) memory, a thyristor based memory, a magnetoresistive random access memory (MRAM) that incorporates memristor technology, spin transfer torque MRAM (STT-MRAM), or a combination of any of the above.
Descriptions herein referring to a “RAM” or “RAM device” can apply to any memory device that allows random access, whether volatile or nonvolatile. Descriptions referring to a “DRAM”, “SDRAM”, “DRAM device” or “SDRAM device” may refer to a volatile random access memory device. The memory device, SDRAM or DRAM may refer to the die itself, to a packaged memory product that includes one or more dies, or both. In some examples, a system with volatile memory that needs to be refreshed may also include at least some nonvolatile memory.
Memory controller 220, as shown in
According to some examples, settings for each channel are controlled by separate mode registers or other register settings. For these examples, memory controller 220 may manage a separate memory channel, although system 200 may be configured to have multiple channels managed by a single memory controller, or to have multiple memory controllers on a single channel. In one example, memory controller 220 is part of processor 210, such that logic and/or features of memory controller 220 are implemented on the same die or implemented in the same package space as processor 210, sometimes referred to as an integrated memory controller.
Memory controller 220 includes Input/Output (I/O) interface circuitry 222 to couple to a memory bus, which is replicated for two memory channels 0 and 1. I/O interface circuitry 222 (as well as I/O interface circuitry 242 of memory device(s) 240) may include pins, pads, connectors, signal lines, traces, or wires, or other hardware to connect the devices, or a combination of these. I/O interface circuitry 222 may include a hardware interface. As shown in
In some examples, memory controller 220 may be coupled with memory device(s) 240 via multiple signal lines. The multiple signal lines may include at least a clock (CLK) 232, command/address (C/A) 234, and write data (DQ) and read data (DQ) 236, and zero or more other signal lines 238. According to some examples, a composition of signal lines coupling memory controller 220 to memory device(s) 240 may be referred to collectively as a memory bus. The signal lines for C/A 234 may be referred to as a “command bus”, a “C/A bus” or a CMD/ADD bus, or some other designation indicating the transfer of commands and/or address data. The signal lines for DQ 236 may be referred to as a “data bus”.
According to some examples, independent channels may have different clock signals, command buses, data buses, and other signal lines. For these examples, system 200 may be considered to have multiple “buses,” in the sense that an independent interface path may be considered a separate bus. It will be understood that in addition to the signal lines shown in
In some examples, the bus between memory controller 220 and memory device(s) 240 includes a subsidiary command bus routed via signal lines included in C/A 234 and a subsidiary data bus to carry the write and read data routed via signal lines included in DQ 236. In some examples, C/A 234 and DQ 236 may separately include bidirectional lines. In other examples, DQ 236 may include unidirectional write signal lines to write data from the host to memory and unidirectional lines to read data from the memory to the host.
According to some examples, in accordance with a chosen memory technology and system design, signal lines included in other 238 may augment a memory bus or subsidiary bus, for example, strobe signal lines for a DQS. Based on a design of system 200, or memory technology implementation, a memory bus may have more or less bandwidth per memory device included in memory device(s) 240. The memory bus may support memory devices included in memory device(s) 240 that have either a x32 interface, a x16 interface, a x8 interface, or other interface. In the convention “xW,” W is an integer that refers to an interface size or width of the interface of memory device(s) 240, which represents a number of signal lines to exchange data with memory controller 220. The interface size of these memory devices may be a controlling factor on how many memory devices may be used concurrently per channel in system 200 or coupled in parallel to the same signal lines. In some examples, high bandwidth memory devices, wide interface memory devices, or stacked memory devices, or combinations, may enable wider interfaces, such as a x128 interface, a x256 interface, a x512 interface, a x1024 interface, or other data bus interface width.
According to some examples, memory device(s) 240 represent memory resources for system 200. For these examples, each memory device included in memory device(s) 240 is a separate memory die. Separate memory devices may interface with multiple (e.g., 2) channels per device or die. A given memory device of memory device(s) 240 may include I/O interface circuitry 242 and may have a bandwidth determined by an interface width associated with an implementation or configuration of the given memory device (e.g., x16 or x8 or some other interface bandwidth). I/O interface circuitry 242 may enable the memory devices to interface with memory controller 220. I/O interface circuitry 242 may include a hardware interface and operate in coordination with I/O interface circuitry 222 of memory controller 220.
In some examples, multiple memory device(s) 240 may be connected in parallel to the same command and data buses (e.g., via C/A 234 and DQ 236). In other examples, multiple memory device(s) 240 may be connected in parallel to the same command bus but connected to different data buses. For example, system 200 may be configured with multiple memory device(s) 240 coupled in parallel, with each memory device responding to a command, and accessing memory resources 260 internal to each memory device. For a write operation, an individual memory device of memory device(s) 240 may write a portion of the overall data word, and for a read operation, the individual memory device may fetch a portion of the overall data word. As non-limiting examples, a specific memory device may provide or receive, respectively, 8 bits of a 128-bit data word for a read or write operation, or 8 bits or 16 bits (depending on whether the device is a x8 or a x16 device) of a 256-bit data word. The remaining bits of the word may be provided or received by other memory devices in parallel.
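The division of a data word among parallel devices can be sketched as follows. This is an illustrative model only: the function names `split_word` and `join_word` are hypothetical, and the assumption that device 0 carries the least-significant bits is made purely for the example; an actual system may map bits to devices differently.

```python
def split_word(word: int, word_bits: int, device_width: int) -> list[int]:
    """Split a data word into per-device slices of `device_width` bits.

    Assumption: device 0 carries the least-significant slice.
    """
    if word_bits % device_width != 0:
        raise ValueError("word width must be a multiple of the device width")
    if not 0 <= word < (1 << word_bits):
        raise ValueError("word does not fit in word_bits")
    mask = (1 << device_width) - 1
    device_count = word_bits // device_width
    return [(word >> (i * device_width)) & mask for i in range(device_count)]


def join_word(slices: list[int], device_width: int) -> int:
    """Reassemble the data word from the per-device slices (read path)."""
    word = 0
    for i, part in enumerate(slices):
        word |= part << (i * device_width)
    return word


# A 128-bit word over x8 devices needs 16 devices; a 256-bit word needs
# 32 x8 devices or 16 x16 devices.
print(len(split_word(0, 128, 8)), len(split_word(0, 256, 8)), len(split_word(0, 256, 16)))
```

Each device sees only its own slice, which is why all devices on the shared command bus respond to the same command while contributing different bits of the word.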
According to some examples, memory device(s) 240 may be disposed directly on a motherboard or host system platform (e.g., a PCB (printed circuit board) on which processor 210 is disposed) of a computing device. Memory device(s) 240 may be organized into memory module(s) 270. In some examples, memory module(s) 270 may represent dual inline memory modules (DIMMs). In some examples, memory module(s) 270 may represent other organizations or configurations of multiple memory devices that share at least a portion of access or control circuitry, which can be a separate circuit, a separate device, or a separate board from the host system platform. In some examples, memory module(s) 270 may include multiple memory device(s) 240, and memory module(s) 270 may include support for multiple separate channels to the included memory device(s) 240 disposed on them.
In some examples, memory device(s) 240 may be incorporated into a same package as memory controller 220, for example, in a multi-chip-module (MCM), a package-on-package with through-silicon via (TSV), or other techniques or combinations. Similarly, in some examples, memory device(s) 240 may be incorporated into memory module(s) 270, which themselves may be incorporated into the same package as memory controller 220. It will be appreciated that for these and other examples, memory controller 220 may be part of or integrated with processor 210.
As shown in
According to some examples, as shown in
In some examples, writing to or programming one or more registers of register(s) 244 may configure memory device(s) 240 to operate in different “modes”. For these examples, command information written to or programmed to the one or more registers may trigger different modes within memory device(s) 240. Additionally, or in the alternative, different modes can also trigger different operations from address information or other signal lines depending on the triggered mode. Programmed settings of register(s) 244 may indicate or trigger configuration of I/O settings, for example, configuration of timing, termination, on-die termination (ODT), driver configuration, or other I/O settings.
According to some examples, memory device(s) 240 includes ODT 246 as part of the interface hardware associated with I/O interface circuitry 242. ODT 246 may provide settings for impedance to be applied to the interface to specified signal lines. For example, ODT 246 may be configured to apply impedance to signal lines included in DQ 236 or C/A 234. The ODT settings for ODT 246 may be changed based on whether a memory device of memory device(s) 240 is a selected target of an access operation or a non-target memory device. ODT settings for ODT 246 may affect timing and reflections of signaling on terminated signal lines included in, for example, C/A 234 or DQ 236. Control over ODT settings for ODT 246 can enable higher-speed operation with improved matching of applied impedance and loading. Impedance and loading may be applied to specific signal lines of I/O interface circuitry 242, 222 (e.g., C/A 234 and DQ 236) and are not necessarily applied to all signal lines.
In some examples, as shown in
Referring again to memory controller 220, memory controller 220 includes CMD logic 224, which represents logic and/or features to generate commands to send to memory device(s) 240. The generation of the commands can refer to the command prior to scheduling, or the preparation of queued commands ready to be sent. Generally, the signaling in memory subsystems includes address information within or accompanying the command to indicate or select one or more memory locations where memory device(s) 240 should execute the command. In response to scheduling of transactions for memory device(s) 240, memory controller 220 can issue commands via I/O interface circuitry 222 to cause memory device(s) 240 to execute the commands. In some examples, controller 250 of memory device(s) 240 receives and decodes command and address information received via I/O interface circuitry 242 from memory controller 220. Based on the received command and address information, controller 250 may control the timing of operations of the logic, features and/or circuitry within memory device(s) 240 to execute the commands. Controller 250 may be arranged to operate in compliance with standards or specifications such as timing and signaling requirements for memory device(s) 240. Memory controller 220 may implement compliance with standards or specifications by access scheduling and control.
In some examples, memory controller 220 includes refresh (REF) logic 226. REF logic 226 may be used for memory resources that are volatile and need to be refreshed to retain a deterministic state. REF logic 226, for example, may indicate a location for refresh, and a type of refresh to perform. REF logic 226 may trigger self-refresh within memory device(s) 240 or execute external refreshes which can be referred to as auto refresh commands by sending refresh commands, or a combination. According to some examples, system 200 supports all bank refreshes as well as per bank refreshes. All bank refreshes cause the refreshing of banks within all memory device(s) 240 coupled in parallel. Per bank refreshes cause the refreshing of a specified bank within a specified memory device of memory device(s) 240. In some examples, controller 250 within memory device(s) 240 includes a REF logic 254 to apply refresh within memory device(s) 240. REF logic 254, for example, may generate internal operations to perform refresh in accordance with an external refresh received from memory controller 220. REF logic 254 may determine if a refresh is directed to memory device(s) 240 and determine what memory resources 260 to refresh in response to the command.
In accordance with aspects of the embodiments described and illustrated herein, a half-width mode (also referred to as a BYTE mode) is provided that doubles the bank resources for a DRAM device. Doubling the bank resources substantially increases channel efficiency for both random read and random write accesses. This also improves (reduces) average channel latency.
Under half-width mode configuration 300a, partial channels 0 and 1 operate independently and are enabled to concurrently access memory banks within respective memory bank groups 304-0 and 304-1 when coupled to separate channel I/O interfaces for a memory controller. However, partial channel 0 cannot be used to access any memory banks 266 in bank groups 304-1 and partial channel 1 cannot be used to access any memory banks 266 in bank groups 304-0.
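The access restriction in configuration 300a, where each partial channel reaches only its own bank groups, amounts to a fixed ownership mapping. The sketch below is a toy model under that reading: the dictionary and the `can_access` function are hypothetical names, and the strings simply reuse the reference numbers of the bank groups as labels.

```python
# Ownership of bank groups by partial channel in half-width configuration 300a.
# Keys are partial channel numbers; values reuse the reference numbers as labels.
PARTIAL_CHANNEL_BANK_GROUPS = {0: "304-0", 1: "304-1"}


def can_access(partial_channel: int, bank_groups: str) -> bool:
    """Return True if the partial channel owns the given bank groups."""
    return PARTIAL_CHANNEL_BANK_GROUPS.get(partial_channel) == bank_groups


for channel in (0, 1):
    for groups in ("304-0", "304-1"):
        print(channel, groups, can_access(channel, groups))
```

Because the two sets of bank groups are disjoint, accesses on the two partial channels never contend for the same bank, which is what allows them to proceed concurrently.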
ACT-1 (ACTIVATE-1 command) must be followed by ACT-2 (ACTIVATE-2 command) for the same bank. Since the number of Bank Groups for half-width configuration 300b is 8, an extra BG address bit is used in truth table 402. For half-width configuration 300a, an additional address bit R19 is added in truth table 400 (relative to the addressing available when using full-width (x24) channels). BL24 means Burst Length 24 bits.
Generally, the x24 LPDDR6 memory die described and illustrated herein may be implemented in a standalone package (e.g., an LPDDR6 integrated circuit package such as a chip, also referred to herein as an LPDDR6 memory device), in an LPDDR6 memory module including two or more LPDDR6 memory devices, or in a memory on package die layer. In some embodiments, an SoC with integrated memory controller and one or more LPDDR6 memory devices are coupled to a motherboard, system board, or the like. In some embodiments, an SoC die may be coupled to an LPDDR6 die via a die-to-die interconnect. In some embodiments a memory controller (or SoC with integrated memory controller) and an LPDDR6 die may be implemented in separate packages called chiplets that are interconnected with a Universal Chiplet Interconnect Express (UCIe) interconnect.
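The ACT-1/ACT-2 pairing rule and the extra bank group address bit can be illustrated with a short sketch. It is not derived from the truth tables themselves: the command tuples, the function names, and the reading that ACT-2 must be the very next command after ACT-1 are assumptions, as is the comparison against a four-bank-group configuration.

```python
def act_pairs_valid(commands: list[tuple[str, int]]) -> bool:
    """Check that every ACT-1 is followed by an ACT-2 for the same bank.

    `commands` is a sequence of (command_name, bank) tuples. Assumption:
    "followed by" is read as "immediately followed by".
    """
    pending_bank = None  # bank of an ACT-1 still waiting for its ACT-2
    for name, bank in commands:
        if pending_bank is not None:
            if name != "ACT-2" or bank != pending_bank:
                return False
            pending_bank = None
        elif name == "ACT-1":
            pending_bank = bank
        elif name == "ACT-2":
            return False  # ACT-2 with no preceding ACT-1
    return pending_bank is None


def address_bits(count: int) -> int:
    """Number of address bits needed to select one of `count` items."""
    return (count - 1).bit_length()


print(act_pairs_valid([("ACT-1", 3), ("ACT-2", 3)]))
print(act_pairs_valid([("ACT-1", 3), ("ACT-2", 4)]))
# 8 bank groups need one more BG bit than 4 bank groups would.
print(address_bits(8) - address_bits(4))
```

The same `address_bits` arithmetic explains the row address case: doubling the number of rows reachable through a partial channel requires one additional row address bit.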
Memory controller 220A is generally configured similar to memory controller 220 in
In some embodiments, DRAM devices such as LPDDR6 chips may be coupled to a memory channel interface for a memory controller and/or SoC with integrated memory controller directly, rather than having the DRAM devices reside on a memory module. An example of a system 600 employing this approach is shown in
System 600 includes a system board 602 to which an SoC 604 is mounted. SoC 604 includes a processor 606 and an integrated memory controller 220B having a configuration similar to memory controllers 220 and 220A discussed above, including I/O interface circuitry 222-0 and 222-1 for memory channels 0 and 1.
System 600 also includes a plurality of LPDDR6 DRAM chips 608 that are mounted to system board 602. Each LPDDR6 DRAM chip 608 includes integrated memory channel I/O interface circuitry that is configured to interconnect with memory channel I/O interface circuitry for one of the memory channels on memory controller 220B, wherein the interconnect comprises wiring in system board 602. Generally, the number of DQ lines 236, C/A lines 234 and CLK signal lines 232 for the memory channels on the memory controller will be greater than the number of DQ lines and C/A lines connected to an individual LPDDR6 DRAM chip. For example, in one embodiment each memory channel I/O interface on the memory controller side includes 96 DQ lines, while in another embodiment each memory controller memory channel I/O interface includes 192 DQ lines. Meanwhile, in one embodiment the LPDDR6 DRAM chips are x24 LPDDR6 devices having 24 DQ lines, as depicted by x24 memory channel I/O interface circuitry 242B. In yet other embodiments, an LPDDR6 DRAM chip may be configured from the manufacturer to only operate in a single half-width channel mode using an x12 DQ bus.
Generally, an LPDDR6 DRAM chip that is directly connected to a memory controller channel I/O interface will include applicable logic to facilitate operations in accordance with operating modes defined by a forthcoming LPDDR6 standard, including command logic, refresh logic, clock timing logic, etc. Accordingly, as further shown in
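The DQ line counts above imply how many chips one controller channel can serve in parallel. The following back-of-the-envelope sketch assumes that every controller DQ line is wired to exactly one chip and that no lines are reserved for other purposes; the function name `chips_per_channel` is hypothetical.

```python
def chips_per_channel(controller_dq_lines: int, chip_dq_lines: int) -> int:
    """Chips needed to populate every DQ line of one controller channel.

    Assumption: each controller DQ line connects to exactly one chip.
    """
    if controller_dq_lines % chip_dq_lines != 0:
        raise ValueError("controller width is not a multiple of the chip width")
    return controller_dq_lines // chip_dq_lines


for controller_width in (96, 192):
    for chip_width in (24, 12):
        print(controller_width, chip_width, chips_per_channel(controller_width, chip_width))
```

Under this assumption, a 96-line channel takes four x24 chips or eight chips operating with an x12 DQ bus, and a 192-line channel takes eight or sixteen, respectively.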
In addition to operating in a single channel half-width mode, such as depicted by partial channel 0 in
Generally, the principles and teachings disclosed herein may be applied to various packages and configurations, including stacked die structures and packages, such as processor-in-memory (PIM) modules. (PIM modules may also be called compute on memory modules or compute near memory modules.) PIMs may be used for various purposes but are particularly well-suited for memory-intensive workloads such as but not limited to performing matrix mathematics and accumulation operations. In a PIM module (sometimes called a PIM chip when the stacked die structures are integrated on the same chip), the processor or CPU and stacked memory structures are combined in the same chip or package.
An example of a PIM module 700 is shown in
An aspect of PIM modules is that the logic layer may perform compute operations that are separate from the compute operations performed by the CPU, and hence comprises a compute die. In some instances, the logic layer comprises a processor die or the like. For example, a system may be implemented using a 3D stacked structure similar to that shown in
In addition to systems with CPUs, the teachings and principles disclosed herein may be applied to Other Processing Units (collectively termed XPUs) including one or more of Graphic Processor Units (GPUs) or General Purpose GPUs (GP-GPUs), Tensor Processing Units (TPUs), Data Processing Units (DPUs), Infrastructure Processing Units (IPUs), Artificial Intelligence (AI) processors or AI inference units and/or other accelerators, FPGAs and/or other programmable logic (used for compute purposes), etc. While some of the diagrams herein show the use of CPUs, this is merely exemplary and non-limiting. Generally, any type of XPU may be used in place of a CPU or processor in the illustrated embodiments. Additionally, the term processor in the claims may refer to a CPU or an XPU.
In addition to 3D stacked structures with TSVs, other types of packaging may be used, such as multichip modules and packages using die-to-die or chiplet-to-chiplet interconnect structures. For instance, in one embodiment memory channels 706 in
Performance Improvement Results
Memory efficiency estimates for embodiments described and illustrated above demonstrate significant performance improvement when compared to existing techniques. For example, doubling of bank resources from 16 to 32 improves channel efficiency from 63% to 95% for 100% read case using random accesses (1 CAS per ACT). Doubling of bank resources from 16 to 32 improves channel efficiency from 50% to 100% for 100% write case using random accesses (1 CAS per ACT). In addition, there is a 10-15 ns improvement in average latency as a result of improved channel efficiency.
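One way to build intuition for why bank count limits efficiency under random accesses is a first-order bound, sketched below. This is an illustrative model and not the method used to produce the estimates above: it assumes that with 1 CAS per ACT each bank can supply one data burst per row cycle, and the parameter `row_cycle_slots` (the row cycle time expressed in data-burst slots) is a hypothetical value chosen for the example.

```python
def bank_limited_efficiency(banks: int, row_cycle_slots: float) -> float:
    """Upper bound on channel efficiency when each bank supplies one burst
    per row cycle (1 CAS per ACT), capped at full utilization.

    `row_cycle_slots` is the row cycle time measured in data-burst slots.
    """
    return min(1.0, banks / row_cycle_slots)


def relative_gain(before: float, after: float) -> float:
    """Ratio of channel efficiency after a change to efficiency before it."""
    return after / before


# Hypothetical row cycle of 32 burst slots: 16 banks fill half of the slots,
# 32 banks fill all of them.
print(bank_limited_efficiency(16, 32), bank_limited_efficiency(32, 32))

# Relative gains implied by the efficiency figures quoted in the text.
print(round(relative_gain(0.63, 0.95), 2), relative_gain(0.50, 1.00))
```

With the hypothetical 32-slot row cycle, the bound reproduces the shape of the quoted write result (50% to 100%). The quoted read figures (63% to 95%) do not follow this simple bound exactly, which suggests they reflect additional effects that the sketch does not model.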
Although some embodiments have been described in reference to particular implementations, other implementations are possible according to some embodiments. Additionally, the arrangement and/or order of elements or other features illustrated in the drawings and/or described herein need not be arranged in the particular way illustrated and described. Many other arrangements are possible according to some embodiments.
In each system shown in a figure, the elements in some cases may each have a same reference number or a different reference number to suggest that the elements represented could be different and/or similar. However, an element may be flexible enough to have different implementations and work with some or all of the systems shown or described herein. The various elements shown in the figures may be the same or different. Which one is referred to as a first element and which is called a second element is arbitrary.
In the description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Rather, in particular embodiments, “connected” may be used to indicate that two or more elements are in direct physical or electrical contact with each other. “Coupled” may mean that two or more elements are in direct physical or electrical contact. However, “coupled” may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. Additionally, “communicatively coupled” means that two or more elements that may or may not be in direct contact with each other, are enabled to communicate with each other. For example, if component A is connected to component B, which in turn is connected to component C, component A may be communicatively coupled to component C using component B as an intermediary component.
An embodiment is an implementation or example of the inventions. Reference in the specification to “an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the inventions. The various appearances of “an embodiment,” “one embodiment,” or “some embodiments” are not necessarily all referring to the same embodiments.
Not all components, features, structures, characteristics, etc. described and illustrated herein need be included in a particular embodiment or embodiments. If the specification states a component, feature, structure, or characteristic “may”, “might”, “can” or “could” be included, for example, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, that does not mean there is only one of the element. If the specification or claims refer to “an additional” element, that does not preclude there being more than one of the additional element.
Italicized letters, such as ‘n’ in the foregoing detailed description are used to depict an integer number, and the use of a particular letter is not limited to particular embodiments. Moreover, the same letter may be used in separate claims to represent separate integer numbers, or different letters may be used. In addition, use of a particular letter in the detailed description may or may not match the letter used in a claim that pertains to the same subject matter in the detailed description.
As used herein, a list of items joined by the term “at least one of” can mean any combination of the listed terms. For example, the phrase “at least one of A, B or C” can mean A; B; C; A and B; A and C; B and C; or A, B and C.
The above description of illustrated embodiments of the invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize.
These modifications can be made to the invention in light of the above detailed description. The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification and the drawings. Rather, the scope of the invention is to be determined entirely by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.