Disclosed aspects are directed to processing systems. More particularly, exemplary aspects are directed to reducing power consumption and/or improving performance using a variable length column command.
Processing systems may include a backing storage location such as a memory subsystem comprising a main memory. For main memory implementations with large storage capacity, e.g., utilizing double-data rate (DDR) implementations of dynamic random access memory (DRAM) technology, the memory subsystem may be implemented off-chip, e.g., integrated on a memory chip which is different from a processor chip or system on chip (SoC) on which one or more processors which access the memory subsystem are integrated.
Power consumption in memory systems is a well-recognized challenge. Several techniques are known in the art for reducing power consumption in memory, such as voltage scaling. For example, the trend in voltage scaling is seen by considering the supply voltages specified in the Joint Electron Device Engineering Council (JEDEC) standard for several generations or versions of low power DDR (LPDDR). The supply voltage VDD is 1.8V for LPDDR1; 1.2V for LPDDR2 and LPDDR3; 1.1V for LPDDR4. However, for future generations (e.g., LPDDR5, and beyond) the scope for further voltage scaling is limited, because if supply voltage continues to reduce, performance degradations may be observed due to limitations imposed by refresh operations and performance of memory peripheral input/output (IO) circuitry. Thus, any power efficiency gains which may be achieved by further voltage scaling may be offset by performance and quality degradations.
In order to reduce the power consumption, a single data rate (SDR) mode was introduced for a command bus for transferring commands and address transactions between the SoC and the memory subsystem since the command bus was seen to utilize lower bandwidth in comparison to data buses. However, in the SDR mode, the bandwidth utilization of the command bus is seen to be on the rise, for example in the case of applications such as gaming, video playback, and other multimedia applications which utilize large data transfers between masters or processors such as graphics processing units (GPUs) or multimedia controllers on the SoC and the DRAM. This is because in conventional implementations, a separate column command is sent for each transfer of a data block (e.g., 16 or 32 bytes for DDR devices supporting ×8 data interfaces; or 32 or 64 bytes for DDR devices supporting ×16 data interfaces) from the SoC to the DRAM in the memory subsystem. However, the total amount of data transferred in such applications may be of much larger sizes, e.g., spanning entire and sometimes multiple rows or pages, even though the column commands are sent for each of the smaller data block sizes.
Thus, it is seen that conventional implementations may involve a large number of column commands transferred from the SoC to the memory subsystem, with a plurality of column commands directed to different columns within the same row or page of a bank of the DRAM. The plurality of column commands transferred between the SoC and the memory subsystem lead to increased power consumption or redundancy of column commands for read/write operations. Particularly as the industry adopts newer standards such as LPDDR5 and beyond, which are designed to support speeds in the range of 3.2 to 4 GHz, the power consumption due to the increased transfer of the plurality of column commands starts to play a more significant role.
Since there is an ever increasing need to reduce power consumption in processing systems, particularly at advanced technology nodes (e.g., 7 nm technologies which may be seen for systems such as Internet-of-Things and other connected devices which adopt the newer generations of DRAM such as LPDDR5), there is also seen to be a corresponding need to reduce the power consumption of the command bus between the SoC and the memory subsystem.
Exemplary aspects of the invention include systems and methods directed to reducing power consumption and/or improving performance of a processing system comprising a processor subsystem or SoC and a memory subsystem comprising memory such as a DRAM. In some aspects, variable length column commands are used in place of a plurality of column commands directed to a same row or page of a memory bank of the DRAM, for example. The variable length column commands are provided by the SoC based on a detection of a plurality of accesses directed to the same row or page. The memory subsystem, upon receiving a variable length column command, is configured to perform a corresponding plurality of accesses indicated by the variable length column command. Transferring the variable length column command on a command bus between the SoC and the memory subsystem consumes less power in comparison to a corresponding transfer of the plurality of column commands. Furthermore, transfer of the variable length column command for a particular row or page of a memory bank reduces a time duration before which a subsequent command can be transferred, for example, to a different memory bank, which improves performance.
The accompanying drawings are presented to aid in the description of aspects of the invention and are provided solely for illustration of the aspects and not limitation thereof.
Aspects of the invention are disclosed in the following description and related drawings directed to specific aspects of the invention. Alternate aspects may be devised without departing from the scope of the invention. Additionally, well-known elements of the invention will not be described in detail or will be omitted so as not to obscure the relevant details of the invention.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects. Likewise, the term “aspects of the invention” does not require that all aspects of the invention include the discussed feature, advantage or mode of operation.
The terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting of aspects of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “includes” and/or “including”, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Further, many aspects are described in terms of sequences of actions to be performed by, for example, elements of a computing device. It will be recognized that various actions described herein can be performed by specific circuits (e.g., application specific integrated circuits (ASICs)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of computer-readable storage medium having stored therein a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects of the invention may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the aspects described herein, the corresponding form of any such aspects may be described herein as, for example, “logic configured to” perform the described action.
Exemplary aspects of this disclosure are directed to reducing power consumption in a processing system comprising a processor subsystem or SoC and a memory subsystem comprising memory such as a DRAM. In some aspects, variable length column commands are used in place of a plurality of column commands directed to a same row or page of a memory bank of the DRAM, for example. The variable length column commands are provided by the SoC based on a detection of a plurality of accesses directed to the same row or page. The memory subsystem, upon receiving a variable length column command, is configured to perform a corresponding plurality of accesses indicated by the variable length column command. Transferring the variable length column command on a command bus between the SoC and the memory subsystem consumes less power than the transfer of the plurality of column commands. Furthermore, transfer of the variable length column command for a particular row or page of a memory bank reduces a time duration before which a subsequent command can be transferred, for example, to a different memory bank. Thus, performance improvements may also be realized in the exemplary use of the variable length column command.
In
Additionally, in the case of DRAM in memory subsystem 130, periodic refresh of memory cells is required, as known in the art, and refresh counter 162 may provide periodic messages to command scheduler 156 to provide refresh commands to memory subsystem 130. The transactions from command scheduler 156 are transferred to memory interface 110 which may include a physical layer module for commands shown as CA PHY block 110a. Corresponding data to be transferred for some requests (e.g., write commands) is queued in data buffer 158, and with the control of data management block 160 for selected transactions, the data is provided to a physical layer module for data shown as DQ PHY block 110b in memory interface 110. Data received from memory subsystem 130 (e.g., read data), via DQ PHY block 110b and data management block 160 may also be placed in the same data buffer 158 or a different data buffer, per particular implementations, before being provided to a requesting processing element 104a-e. Various other control logic and functional blocks may be present in memory controller 108 and more generally, SoC 120, but these are not germane to this disclosure, and as such are not dealt with in further detail herein.
Two buses are shown for transferring commands and data between SoC 120 and memory subsystem 130—command bus (also referred to as CA) 114 for transferring addresses, commands, etc., from SoC 120 to memory subsystem 130 and data bus (also referred to as DQ) 112, which may be a bidirectional bus for transferring write data from SoC 120 to memory subsystem 130 and receiving read data at SoC 120 from memory subsystem 130.
Referring now to
With combined reference now to
Each page of the memory bank 180 may comprise several data blocks, for example, of 16 or 32 bytes each. In a conventional implementation, a write operation to a page of the memory bank is provided in terms of a column command for each data block to be written. With an open page policy, if a plurality of data blocks is targeted, a corresponding plurality of column commands is selected by command scheduler 156 and provided back to back (also referred to as a burst of column commands).
For example, with combined reference to
In order to reduce the above power consumption, in exemplary aspects, a variable length column command is disclosed. In place of a burst of a plurality of conventional column commands, each directed to an individual data block of the same page of a memory bank, the variable length column command may be used to direct write operations to a plurality of the data blocks targeted by the plurality of column commands. In exemplary aspects, the variable length column command consumes less power, both for transfer on the CA bus 114 and in corresponding circuitry on the SoC and the memory subsystem, in comparison to the plurality of column commands which are used to accomplish the same task in conventional processing system 100.
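As a rough illustration of the reduction in command count, the following Python sketch compares the number of conventional column commands needed to cover a transfer (one per data block) with the number of variable length column commands needed for the same transfer. The 2 KB page size and the parameter max_blocks_per_command are assumptions chosen for illustration only; neither value is specified in this disclosure.

```python
def column_commands_needed(transfer_bytes: int, block_bytes: int) -> int:
    """Conventional scheme: one column command per data block."""
    if transfer_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: a partially filled last block still needs a command.
    return -(-transfer_bytes // block_bytes)


def variable_length_commands_needed(
    transfer_bytes: int, block_bytes: int, max_blocks_per_command: int
) -> int:
    """Variable length scheme: one command covers up to
    max_blocks_per_command consecutive data blocks (hypothetical limit)."""
    if max_blocks_per_command <= 0:
        raise ValueError("max_blocks_per_command must be positive")
    blocks = column_commands_needed(transfer_bytes, block_bytes)
    return -(-blocks // max_blocks_per_command)


if __name__ == "__main__":
    PAGE_BYTES = 2048  # assumed page size, for illustration only
    for block_bytes in (16, 32):
        conventional = column_commands_needed(PAGE_BYTES, block_bytes)
        variable = variable_length_commands_needed(
            PAGE_BYTES, block_bytes, max_blocks_per_command=conventional
        )
        print(block_bytes, conventional, variable)
```

Under these assumptions, a full-page transfer that conventionally requires one command per block may collapse to a single command when the length field can span the whole page.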
With reference to
As such, processing system 200 is shown to comprise SoC 220 and memory subsystem 230, with processing elements 204a-e (which may be similar to counterpart processing elements 104a-e of
Command dependency and variable length checker 256 may be configured to check for dependencies in command transaction queue 254, such as for two or more commands which may be directed to the same page of the same memory bank, but to different data blocks, and more specifically, adjacent data blocks in some aspects. If such dependencies are found, the two or more commands are replaced by the exemplary variable length column command, an example format of which will be discussed with reference to
The variable length column command, when generated in place of the two or more commands by command dependency and variable length checker 256, may be provided to CA PHY block 210a of memory interface 210 to be transferred on CA bus 214 to memory subsystem 230. The remaining aspects such as DQ bus 212 and DQ PHY block 210b may be similarly configured as DQ bus 112 and DQ PHY block 110b and as such, will not be discussed in further detail herein.
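The dependency check described above may be sketched in Python as follows. This is a minimal software model under stated assumptions, not the circuit of command dependency and variable length checker 256: the ColumnCommand record, its field names, and the coalesce function are hypothetical, and the sketch merges only commands that are consecutive in the queue, of the same type, and directed to adjacent data blocks of the same page of the same bank.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ColumnCommand:
    """Hypothetical queue entry; field names are illustrative only."""
    op: str          # "WR" or "RD"
    bank: int        # memory bank index
    row: int         # row (page) within the bank
    block: int       # index of the first data block within the page
    length: int = 1  # consecutive data blocks covered; 1 = conventional


def coalesce(queue: List[ColumnCommand]) -> List[ColumnCommand]:
    """Replace runs of same-page, adjacent-block commands with one
    variable length command carrying the combined block count."""
    merged: List[ColumnCommand] = []
    for cmd in queue:
        if merged:
            prev = merged[-1]
            same_page = (prev.op, prev.bank, prev.row) == (cmd.op, cmd.bank, cmd.row)
            adjacent = prev.block + prev.length == cmd.block
            if same_page and adjacent:
                # Extend the previous command instead of queuing a new one.
                merged[-1] = ColumnCommand(
                    prev.op, prev.bank, prev.row, prev.block,
                    prev.length + cmd.length,
                )
                continue
        merged.append(cmd)
    return merged
```

For instance, four queued writes to blocks 0 through 3 of row 5 in bank 0 would be replaced by a single entry with length 4, while a following write to a different bank is left unchanged.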
Referring now to
With combined reference now to
Referring to
With reference now to
Referring to
In contrast,
Correspondingly, memory bank B1 will be precharged by the time the data transfer to memory bank B0 ends at time t+85, which means that the data transfer on DQ bus 112 to memory bank B1 can commence as early as time t+91, a delay of only 6 clock cycles from when the data transfer for memory bank B0 ended (in contrast to the 62 cycle wait time during which DQ bus 112 must remain idle in conventional implementations of command sequence 300).
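The idle-time comparison above reduces to simple arithmetic, shown here as a small Python sketch. The function name dq_idle_cycles is hypothetical; the cycle values are the ones stated above, expressed as offsets from time t.

```python
def dq_idle_cycles(previous_transfer_end: int, next_transfer_start: int) -> int:
    """Clock cycles during which the DQ bus carries no data between two transfers."""
    if next_transfer_start < previous_transfer_end:
        raise ValueError("next transfer cannot start before the previous one ends")
    return next_transfer_start - previous_transfer_end


# Offsets from time t, taken from the description above.
variable_length_gap = dq_idle_cycles(85, 91)   # B0 ends at t+85, B1 starts at t+91
conventional_gap = 62                          # stated wait for command sequence 300
cycles_saved = conventional_gap - variable_length_gap
```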
It will be appreciated that aspects include various methods for performing the processes, functions and/or algorithms disclosed herein.
For example, in block 402, the column address CA[5:0] received on CA bus 214 is decoded by command address multiplexor and decoder 272, e.g., to determine whether CA[3] is set.
In block 404, command address multiplexor and decoder 272 may determine the operation is for a conventional write or a conventional read and whether CA[3] is set at the sampling time, i.e., when CS-L is high (see
If the outcome of the determination in block 404 is no, then in block 406, the command CAS2 may be sampled and method 400 may return to conventional processing using CAS2 commands.
If the outcome of the determination in block 404 is yes, then in block 408, command address multiplexor and decoder 272 may extract information pertaining to the starting column address for performing the variable length column command and in block 410, determine the block length extension or number of data blocks for which the corresponding memory bank is to be accessed continuously.
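The decision flow of blocks 402 through 410 may be modeled in Python as sketched below. Only the use of CA[3] as the variable length indicator is taken from the description above; the Decoded record, the function name, and the way the starting column address and block count are supplied as separate arguments are assumptions for illustration, since the bit-level layout of those fields is not reproduced here.

```python
from typing import NamedTuple, Optional

VARIABLE_LENGTH_BIT = 3  # CA[3], as described for blocks 402 and 404


class Decoded(NamedTuple):
    """Hypothetical result of decoding one column command."""
    variable_length: bool
    start_column: Optional[int]
    block_count: Optional[int]


def decode_column_command(ca: int, start_column: int, block_count: int) -> Decoded:
    """Model of blocks 402-410.

    ca           -- the 6-bit value sampled on CA[5:0]
    start_column -- starting column address field (layout assumed)
    block_count  -- number of data blocks to access continuously (encoding assumed)
    """
    if not 0 <= ca < 64:
        raise ValueError("CA[5:0] must be a 6-bit value")
    # Blocks 402/404: check whether CA[3] is set.
    if not (ca >> VARIABLE_LENGTH_BIT) & 1:
        # Block 406: fall back to conventional CAS2 processing.
        return Decoded(False, None, None)
    # Blocks 408 and 410: extract the starting column and the block count.
    return Decoded(True, start_column, block_count)
```

A call such as decode_column_command(0b001000, 16, 4) takes the variable length path, whereas any value with CA[3] clear takes the conventional path.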
Those of skill in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The methods, sequences and/or algorithms described in connection with the aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
Accordingly, an aspect of the invention can include a computer-readable medium embodying a method for reducing power consumption in a processing system using a variable length column command. The invention is therefore not limited to the illustrated examples, and any means for performing the functionality described herein are included in aspects of the invention.
The foregoing disclosed devices and methods are typically designed and configured into GDSII and GERBER computer files, stored on computer-readable media. These files are in turn provided to fabrication handlers who fabricate devices based on these files. The resulting products are semiconductor wafers that are then cut into semiconductor dies and packaged into semiconductor chips. The chips are then employed in the devices described above.
While the foregoing disclosure shows illustrative aspects of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the aspects of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
The present Application for Patent claims the benefit of Provisional Patent Application No. 62/420,954 entitled “LOW POWER MEMORY SUB-SYSTEM USING VARIABLE LENGTH COLUMN COMMAND” filed Nov. 11, 2016, pending, and assigned to the assignee hereof and hereby expressly incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62420954 | Nov 2016 | US