Many computing devices, including portable computing devices such as mobile phones, include a System on Chip (“SoC”). Today's SoCs require ever-increasing levels of power, performance, and capacity from memory devices, such as double data rate (“DDR”) memory devices. Such requirements necessitate relatively faster clock speeds and wider buses, the buses typically being partitioned into multiple, narrower memory channels in an effort to manage efficiency.
Multiple memory channels may be address-interleaved together to uniformly distribute the memory traffic across memory devices and optimize performance. Using an interleaved traffic protocol, memory data is uniformly distributed across memory devices by assigning addresses to alternating memory channels. Such a technique is commonly referred to as symmetric channel interleaving.
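The round-robin address assignment behind symmetric channel interleaving can be sketched as follows; the stripe granularity and channel count are illustrative values, not taken from this disclosure:

```python
# Hypothetical sketch of symmetric channel interleaving: physical addresses
# are assigned to alternating channels at a fixed stripe granularity, so
# traffic is distributed uniformly across all memory devices.
INTERLEAVE_GRANULARITY = 1024  # bytes per stripe (illustrative)
NUM_CHANNELS = 4               # illustrative channel count

def channel_for_address(addr: int) -> int:
    """Return the channel that services a given physical address."""
    return (addr // INTERLEAVE_GRANULARITY) % NUM_CHANNELS

def channel_offset(addr: int) -> int:
    """Return the address within the selected channel's local space."""
    stripe = addr // (INTERLEAVE_GRANULARITY * NUM_CHANNELS)
    return stripe * INTERLEAVE_GRANULARITY + addr % INTERLEAVE_GRANULARITY
```

Consecutive 1024-byte stripes land on channels 0, 1, 2, 3, 0, 1, and so on, which is what spreads sequential traffic evenly across the devices.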
Existing symmetric memory channel interleaving techniques require all of the channels to be activated. For high performance use cases, this is intentional and necessary to achieve the desired level of performance. For low performance use cases, however, this leads to wasted power and inefficiency. Further, performance gains attributable to existing symmetric memory channel interleaving techniques in high performance use cases may sometimes be outweighed by adverse impacts on various parameters associated with a SoC, such as, for example, remaining battery capacity. Also, existing symmetric memory channel interleaving techniques are unable to optimize memory allocations between interleaved and linear zones when system parameters change, leading to inefficient use of memory capacity. Accordingly, there remains a need in the art for improved systems and methods for providing memory channel interleaving.
Systems and methods are disclosed for providing dynamic memory channel interleaving in a system on a chip. One such method comprises configuring a memory address map for two or more memory devices accessed via two or more respective memory channels with a plurality of memory zones. The two or more memory devices comprise at least one memory device of a first type and at least one memory device of a second type and the plurality of memory zones comprise at least one high performance memory zone and at least one low power memory zone. Next, a request is received from a process for a virtual memory page, the request comprising a preference for high performance. Also received is one or more system parameter readings, wherein the system parameter readings indicate one or more power management goals in the system on a chip. Based on the system parameter readings, at least one memory device of the first type is selected. Then, based on the preference for high performance, a preferred high performance memory zone within the at least one memory device of the first type is determined and the virtual memory page is assigned to a free physical page in the preferred memory zone.
The exemplary method may further comprise defining a boundary between the preferred memory zone and a low power memory zone using a sliding threshold address in the memory device so that if it is determined that the preferred memory zone requires expansion, the preferred memory zone may be modified accordingly by adjusting the sliding threshold address such that the low power memory zone is reduced. Additionally, the exemplary method may further comprise migrating the virtual memory page from the preferred memory zone within the at least one memory device of the first type to an alternative memory zone so that the at least one memory device of the first type may be powered down in order to reduce the overall power consumption of the system on a chip.
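The sliding threshold concept can be illustrated with a minimal sketch; the class and field names below are hypothetical, chosen only to show how expanding one zone necessarily shrinks its neighbor:

```python
# Illustrative sketch: a single memory device partitioned into a high
# performance zone and a low power zone by a sliding threshold address.
class ZonedDevice:
    def __init__(self, size: int, threshold: int):
        self.size = size            # total addressable bytes
        self.threshold = threshold  # addresses below this: high performance

    def expand_high_performance(self, extra: int) -> None:
        # Growing the high performance zone reduces the low power zone,
        # which occupies the address space above the threshold.
        if self.threshold + extra > self.size:
            raise ValueError("low power zone cannot shrink further")
        self.threshold += extra
```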
In the Figures, like reference numerals refer to like parts throughout the various views unless otherwise indicated. For reference numerals with letter character designations such as “102A” or “102B”, the letter character designations may differentiate two like parts or elements present in the same Figure. Letter character designations for reference numerals may be omitted when a reference numeral is intended to encompass all parts having the same reference numeral in all Figures.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
In this description, the term “application” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, an “application” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
The term “content” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, “content” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
As used in this description, the terms “component,” “database,” “module,” “system,” and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
In this description, the terms “communication device,” “wireless device,” “wireless telephone,” “wireless communication device,” and “wireless handset” are used interchangeably. With the advent of third generation (“3G”) and fourth generation (“4G”) wireless technologies, greater bandwidth availability has enabled more portable computing devices with a greater variety of wireless capabilities. Therefore, a portable computing device may include a cellular telephone, a pager, a PDA, a smartphone, a navigation device, or a hand-held computer with a wireless connection or link.
Certain multi-channel interleaving techniques provide for efficient bandwidth utilization by uniformly distributing memory transaction traffic across all available memory channels. Under use cases in which high bandwidth is not required to maintain a satisfactory quality of service (“QoS”) level, however, a multi-channel interleaving technique that activates all available memory channels may consume power unnecessarily. Consequently, certain other multi-channel interleaving techniques divide memory space into two or more distinct zones at the time of system boot, one or more for interleaved traffic and one or more for linear traffic. Notably, each of the interleaved zone and the linear zone may comprise memory space spanning multiple memory components accessible by different memory channels. Multi-channel interleaving techniques that leverage such static interleaved and linear memory zones may advantageously reduce power consumption by allocating all transactions associated with high bandwidth applications (i.e., high performance applications) to the interleaved zone while allocating all transactions associated with low bandwidth applications to the linear zone. For example, applications requiring a performance driven QoS may be mapped to the region best positioned to meet the performance requirement at the lowest possible level of power consumption.
Further improved multi-channel interleaving techniques dynamically define interleaved and linear memory zones such that the zones, while initially defined at system boot, may be dynamically rearranged and redefined during runtime on a demand basis and in view of power and performance requirements. Such dynamic partial interleaved memory management techniques may allocate transactions to the zones on a page-by-page basis, thereby avoiding the need to send all transactions of a certain application to a given zone. Depending on the real-time system requirements, dynamic partial channel interleaved techniques may allocate transactions from high performance applications to an interleaved memory zone or, alternatively, may seek to conserve power and allocate transactions from high performance applications to a linear zone, thereby trading performance level for improved power efficiency. It is also envisioned that certain embodiments of a dynamic partial channel interleaved solution may allocate transactions to memory zones that are defined as partially interleaved and partially linear, thereby optimizing the power/performance tradeoff for those applications that do not require the highest performance available through a fully interleaved zone but still require more performance than can be provided through a fully linear zone.
Dynamic partial channel interleaving memory management techniques according to the solution utilize a memory management (“MM”) module in the high level operating system (“HLOS”) that comprises a quality and power of service (“QPoS”) monitor module and a QPoS optimization module. The MM module works to recognize power and/or performance “hints” from the application programming interfaces (“APIs”) while keeping track of current page mappings for transactions coming from the applications. The MM module also monitors system parameters, such as power constraints and remaining battery life, to evaluate the impact of the power and/or performance hints in view of the parameters. For example, an application requesting high performance status for its transactions may be overridden in its request such that the transactions are allocated to a defined memory zone associated with a low power consumption (e.g., a single, low power memory channel accessing a low-power memory component earmarked for linear page transactions).
Embodiments of the solution may define memory zones in association with particular QPoS profiles (quality and power). For example, consider a multi-channel DRAM memory architecture with four channels: a given zone might be a linear zone on one of the channels, a linear zone on two of the channels, an interleaved zone across all four channels, an interleaved zone across a subset of the channels, or a mixed interleaved-linear zone with an interleaved portion across one subset of channels and a linear portion across a different subset of channels, etc. Also, consider an embodiment of the solution applied within a multi-channel memory with dissimilar memory types: a first zone might be defined wholly within one of the memory types while a second zone might be defined wholly within a different memory component of a different type. Further, an embodiment of the solution applied within a multi-channel memory with dissimilar memory types may be operable to allocate given transactions to interleaved or linear zones within a given type of the dissimilar memories (i.e., a cascaded approach where the solution uses the system parameters to dictate the memory type and then the particular zone defined within the memory type: a zone within a zone). As would be understood by one of ordinary skill in the art considering the present disclosure, depending on the particular zone defined by an embodiment of the solution, the associated QPoS will vary due to channel power consumption, memory device power consumption, interleaving/linear write protocol, etc.
Essentially, the monitor module receives the performance/power hints from the APIs and monitors the system parameters. Based on the system parameters, the optimization module decides how to allocate pages in the memory architecture based on QPoS tradeoff balancing. Further, the optimization module may dynamically adjust defined memory zones and/or define new memory zones in an effort to optimize the QPoS tradeoffs. For example, if power conservation is not a priority based on the system parameters, and a demand for high performance transactions from the applications exceeds the capacity of memory zones associated with high performance, the optimization module may dynamically adjust the zones such that more memory capacity is earmarked for high performance transactions.
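The override decision described above can be sketched minimally as follows; the battery and temperature thresholds are illustrative assumptions, since this disclosure does not specify particular cut-off values:

```python
def select_zone(perf_hint: bool, battery_pct: float, temp_c: float) -> str:
    """Decide the zone for an allocation based on a QPoS hint and
    monitored system parameters (thresholds are illustrative)."""
    # System parameters may dictate power conservation regardless of
    # the application's stated preference.
    power_constrained = battery_pct < 20.0 or temp_c > 85.0
    if perf_hint and not power_constrained:
        return "interleaved"  # high performance preference honored
    return "linear"           # hint overridden, or never requested
```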
It is also envisioned that embodiments of the solution for dynamic rearrangement of interleaved and linear memory zones may form multiple zones, each zone associated with a particular QPoS performance level. Certain zones may be in a linear zone of the memory devices while certain other zones are formed within the interleaved zones. Certain other zones may be of a mixed interleaved-linear configuration in order to provide an intermediate QPoS performance level not otherwise achievable in an all-interleaved or all-linear zone. Further, interleaved zones may be spread across all available memory channels or may be spread across a subset of available memory channels.
Advantageously, embodiments of the solution may work to dynamically allocate and free virtual memory addresses from any formed zone based on monitored parameters useful for estimating QPoS tradeoffs. For example, embodiments may assign transactions to zones with the lowest power level capable of supporting a required performance for page allocations. Moreover, embodiments may assign transactions without a request for high performance to the low power zone having the lowest power level of the various lower performance zones. Further, embodiments may assign transactions with a request for high performance to a high performance zone without regard for power consumption.
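Selecting the zone with the lowest power level capable of supporting the required performance can be sketched as a simple filter-and-minimize step; the zone names and the bandwidth/power figures below are hypothetical, for illustration only:

```python
# Hypothetical zone table: each zone advertises a bandwidth capability and
# a relative power cost (arbitrary illustrative units).
ZONES = [
    {"name": "linear_lp",       "bandwidth": 1, "power": 1},
    {"name": "mixed",           "bandwidth": 2, "power": 2},
    {"name": "interleaved_2ch", "bandwidth": 3, "power": 3},
    {"name": "interleaved_4ch", "bandwidth": 4, "power": 5},
]

def lowest_power_zone(required_bandwidth: int) -> str:
    """Pick the cheapest zone that still meets the performance need."""
    candidates = [z for z in ZONES if z["bandwidth"] >= required_bandwidth]
    return min(candidates, key=lambda z: z["power"])["name"]
```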
It is envisioned that certain embodiments may recognize a preferred zone for allocation of certain transactions in addition to “fallback” zones suitable for allocation of the same transactions in the event that the preferred zone is not available or optimal. Certain embodiments may seek to audit and migrate or evict pages from a “power hungry” but higher performance memory zone to a zone with a more optimal QPoS level for the given application associated with the pages. In this way, embodiments of the solution may migrate pages to a memory zone that results in a power savings without detrimentally impacting the QoS provided by the associated application. For example, an evicted page may be the last page active in a given DRAM channel and, therefore, by evicting the page to a memory device hosting a zone with a similar QPoS level and accessed by a different channel, the original DRAM channel may be powered down.
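The last-page eviction scenario can be sketched as follows; the channel names and the single-page heuristic are illustrative assumptions:

```python
def evict_last_page(channels: dict, src: str, dst: str) -> bool:
    """Migrate the last active page off the src channel so that the
    channel (and its memory device) can be powered down."""
    if len(channels[src]) != 1:
        return False  # only worthwhile when one page keeps src awake
    channels[dst].append(channels[src].pop())
    return True       # src hosts no pages and may now power down
```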
An advantage of embodiments of the solution is to optimize memory related power consumption for a given performance requirement. Essentially, if transactions may be serviced more efficiently (in terms of power consumption) in one memory zone than in another memory zone, then embodiments of the solution may allocate transactions to the more efficient zone and/or create a more efficient zone to service the allocations and/or increase the capacity of the more efficient zone and/or migrate pages to the more efficient zone.
The monitor module tracks the current page allocations in the interleaved zones (whether across all channels or a subset of channels), the linear zones, and the mixed zones. The optimization module works to dynamically rearrange the interleaved and/or linear zones and/or mixed zones within and across the memory devices to create new zones with QPoS levels available for incoming page allocations based on the needs detected by the monitor module through QPoS hints (from the APIs) and current power/performance states in the memory devices. Based on the QPoS requirements of a given application, or the monitored system parameters, the optimization module determines a memory zone for the allocations. For example, in the case of an interleaved zone formed from a subset of channels (e.g., two channels out of four available channels), the memory management module may dictate that the two unused channels forgo refresh in order to conserve power. As a further example, in the event that no applications require a QPoS level provided by an interleaved zone accessed by all available channels, the optimization module may continue to allocate transactions to an interleaved zone accessed by a subset of available channels while the remaining channels are powered down to conserve power.
As illustrated in the embodiment of
It should be appreciated that any number of memory devices, memory controllers, and memory channels may be used in the system 100 with any desirable types, sizes, and configurations of memory (e.g., double data rate (DDR) memory). In the embodiment of
As described below in more detail, the system 100 provides page-by-page memory channel interleaving based on static, predefined memory zones. An operating system (O/S) executing on the CPU 104 may employ the MM module 103 on a page-by-page basis to determine whether each page being requested by memory clients from the memory devices 110 and 118 is to be interleaved or mapped in a linear manner. When making requests for virtual memory pages, processes may specify a preference for either interleaved memory or linear memory. The preferences may be specified in real-time and on a page-by-page basis for any memory allocation request. As would be understood by one of ordinary skill in the art, a preference for interleaved memory may be associated with a high performance use case while a preference for linear memory may be associated with a low power use case.
In an embodiment, the system 100 may control page-by-page memory channel interleaving via the kernel memory map 132, the MM module 103, and the memory channel interleaver 106. It should be appreciated that herein the term “page” refers to a memory page or a virtual page comprising a fixed-length contiguous block of virtual memory, which may be described by a single entry in a page table. In this manner, the page size (e.g., 4 kbytes) comprises the smallest unit of data for memory management in an exemplary virtual memory operating system. To facilitate page-by-page memory channel interleaving, the kernel memory map 132 may comprise data for keeping track of whether pages are assigned to interleaved or linear memory.
As illustrated in the exemplary table 200 of
The interleave bits may be added to a translation table entry and decoded by the MM module 103. As further illustrated in
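Packing interleave bits into a translation table entry can be sketched as follows; the two-bit encodings, field shifts, and overall descriptor layout are hypothetical illustrations, not the layout used by the MM module 103:

```python
# Hypothetical page descriptor layout: two low-order "interleave bits"
# packed below the physical frame number.
LINEAR, INTERLEAVE_512, INTERLEAVE_1024 = 0b00, 0b01, 0b10
INTERLEAVE_SHIFT = 0  # bit position of the interleave field
FRAME_SHIFT = 2       # frame number occupies the remaining high bits

def make_descriptor(frame: int, interleave: int) -> int:
    """Encode a frame number and interleave type into one descriptor."""
    return (frame << FRAME_SHIFT) | (interleave << INTERLEAVE_SHIFT)

def decode_interleave(descriptor: int) -> int:
    """Recover the interleave type, as the MM module would on lookup."""
    return (descriptor >> INTERLEAVE_SHIFT) & 0b11
```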
An exemplary implementation of the memory address map is described below with respect to
As illustrated in
Linear zone 402 comprises a first portion of DRAM 112 (112a) and a first portion of DRAM 120 (120a). DRAM portion 112a defines a linear address space 410 for CH0. DRAM portion 120a defines a linear address space 412 for CH1. Interleaved zone 404 comprises a second portion of DRAM 112 (112b) and a second portion of DRAM 120 (120b), which defines an interleaved address space 414. In a similar manner, linear zone 408 comprises a first portion of DRAM 114 (114b) and a first portion of DRAM 122 (122b). DRAM portion 114b defines a linear address space 418 for CH0. DRAM portion 122b defines a linear address space 420 for CH1. Interleaved zone 406 comprises a second portion of DRAM 114 (114a) and a second portion of DRAM 122 (122a), which defines an interleaved address space 416.
In this manner, it should be appreciated that low performance use case data may be contained completely in either channel CH0 or channel CH1. In operation, only one of the channels CH0 and CH1 may be active while the other channel is placed in an inactive or “self-refresh” mode to conserve memory power. This can be extended to any number N memory channels.
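A simplified two-channel decode makes the containment property concrete: linear-zone addresses map wholly to one channel, while interleaved-zone addresses are striped across both. The boundary addresses and stripe size below are illustrative assumptions, not values from the memory address map described above:

```python
# Simplified two-zone, two-channel address decode (illustrative values).
LINEAR_ZONE_END = 0x40000000  # addresses below this are linear
CH0_LINEAR_END = 0x20000000   # first half of the linear zone lives on CH0
STRIPE = 1024                 # interleave granularity in bytes

def route(addr: int) -> int:
    """Return 0 for CH0 or 1 for CH1."""
    if addr < LINEAR_ZONE_END:
        # Linear zone: each address maps wholly to a single channel, so
        # the other channel may remain in self-refresh.
        return 0 if addr < CH0_LINEAR_END else 1
    # Interleaved zone: stripes alternate between the two channels.
    return ((addr - LINEAR_ZONE_END) // STRIPE) % 2
```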
In an embodiment, the memory channel interleaver 106 (
The interleave signals 138 received from the MM module 103 indicate whether the current write or read transaction on SoC bus 107 is, for example, linear, interleaved every 512 byte addresses, or interleaved every 1024 byte addresses. Address mapping is controlled via the interleave signals 138: the address mapping module(s) 750 take the high address bits 756 and map them to CH0 and CH1 high addresses 760 and 762. Data traffic entering on the SoC bus 107 is routed to a data selector 770, which forwards the data to memory controllers 108 and 116 via merge components 772 and 774, respectively, based on a select signal 764 provided by the address mapping module(s) 750. For each traffic packet, a high address 756 enters the address mapping module(s) 750. The address mapping module(s) 750 generate the output interleaved signals 760, 762, and 764 based on the value of the interleave signals 138. The select signal 764 specifies whether CH0 or CH1 has been selected. The merge components 772 and 774 may recombine the high addresses 760 and 762, the low address 705, and the CH0 data 766 and the CH1 data 768.
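The three mapping modes can be sketched as one function from a high address and a mode to a channel select and a channel-local high address; the mode names and the return convention are assumptions made for illustration, not the signal encoding of module(s) 750:

```python
def map_address(high_addr: int, mode: str):
    """Return (channel_select, channel_high_address) for one transaction.

    mode is one of "linear", "interleave_512", "interleave_1024".
    """
    if mode == "linear":
        return 0, high_addr  # linear traffic stays on one channel
    stride = 512 if mode == "interleave_512" else 1024
    stripe = high_addr // stride
    # Even-numbered stripes go to CH0, odd to CH1; each channel sees a
    # compacted local address covering only its half of the stripes.
    return stripe % 2, (stripe // 2) * stride + high_addr % stride
```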
Referring again to
As mentioned above, the O/S kernel running on CPU 104 may cooperate in managing the performance/interleave type for each memory allocation via the kernel memory map 132. To facilitate fast translation and caching, this information may be implemented in a page descriptor of a translation lookaside buffer 1000 in MM module 103.
Referring to
Notably, in the system 100a, the MM module 103 further comprises a QPoS monitor module 131 and a QPoS optimization module 133. Advantageously, the monitor module 131 and the optimization module 133 not only recognize QPoS “hints” or preferences from an API associated with an application running on a processing component (e.g., CPU 104), but also monitor and weigh various parameters of the SoC 102 that indicate constraints, or the lack thereof, on power consumption. In this way, the monitor module 131 and the optimization module 133 may recognize that power management goals across the SoC 102 may override the QPoS preference of any given application or individual memory transaction.
The monitor module 131, in addition to recognizing QPoS hints from applications and/or individual memory transactions, may actively monitor system parameters such as, but not limited to, operating temperatures, ambient temperatures, remaining battery capacity, aggregate power usage, etc. and provide the data to the optimization module 133. The optimization module may use the data monitored and provided by the monitor module 131 to balance the need for power efficiency against the performance preferences of the application(s). If the optimization module 133 determines that the temperature of the SoC 102 is dictating a reduction in power consumption, for example, then the optimization module 133 may override a high QPoS preference for a high performance QPoS memory zone (e.g., an interleaved zone) and allocate the transaction to a low power QPoS memory zone (e.g., a linear zone).
Using the data from the monitor module 131, it is envisioned that the optimization module 133 may dynamically adjust and/or create memory zones in the memory devices 110, 118, 119 and across memory channels CH0, CH1 or CH2 via memory controllers 108, 116 and 117, respectively. In the system 100a, the memory devices 110, 118 are of a common type while the memory device 119 is of a dissimilar type. As such, the optimization module 133 may use the monitored parameters and API hints to first select a memory type that is best suited for providing a required QPoS level to a requesting application without overconsumption of power and, subsequently, select a defined memory zone within the selected memory type that most closely provides the desired QPoS level.
It is further envisioned that the optimization module 133 may use the monitored parameters and the API hints to trigger adjustment, modification, and/or creation of memory zones. For example, if the optimization module recognizes that there are no restrictions on power consumption within the SoC 102, and that the requested transaction includes a preference for a high performance memory zone, and that a high performance interleaved zone defined across memory devices 110, 118 is low in available capacity, and that a relatively large linear zone in the memory devices 110, 118 is underutilized, the optimization module may work to reduce the allocated memory space of the linear zone in favor of reallocating the space to the high performance interleaved zone. In this way, embodiments of the system and method may dynamically adjust memory zones defined within and across the devices 110, 118, 119 and channels CH0, CH1, CH2 to optimize memory usage in view of system power considerations and application performance preferences.
As illustrated in
Returning to the
As further illustrated in
When freeing memory, unused macro blocks may be relocated into the free zone 1420. This may reduce latency when adjusting the sliding threshold. The optimization module 133, working with the monitor module 131, may keep track of free pages or holes in all used macro blocks. Memory allocation requests may be fulfilled using free pages from the requested interleave type.
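The macro block bookkeeping described above can be sketched as a small pool structure; the class and method names are hypothetical, and the sketch tracks whole blocks rather than individual free pages for brevity:

```python
# Illustrative sketch: macro blocks in use per interleave type, with
# unused blocks relocated into a shared free zone on release.
class MacroBlockPool:
    def __init__(self):
        self.used = {"linear": [], "interleaved": []}  # blocks in use
        self.free_zone = []                            # unassigned blocks

    def allocate(self, kind: str, block_id: int) -> None:
        """Claim a block from the free zone for one interleave type."""
        if block_id in self.free_zone:
            self.free_zone.remove(block_id)
        self.used[kind].append(block_id)

    def release(self, kind: str, block_id: int) -> None:
        """Return an unused block to the free zone, where it can later
        be claimed for either interleave type without repartitioning."""
        self.used[kind].remove(block_id)
        self.free_zone.append(block_id)
```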
Notably, the sliding threshold address described in
Returning to the method 1600, at block 1610, a request for a virtual memory page allocation in a high performance zone may be received. Generally, such a request may default to an interleaved zone that leverages the bandwidth of multiple channels accessing multiple memory devices. By contrast, a request for a low power page allocation may default to a linear zone that leverages a single channel accessing a single memory device subject to a linear mapping protocol.
At block 1615, the system parameter readings indicative of power limits, power consumption levels, power availability, remaining battery life and the like may be monitored. At blocks 1620 and 1625, the QPoS preference from the application API and the system parameter readings may be weighed by an optimization module to determine whether the QPoS preference should be overridden in favor of a more power efficient memory channel and device. In such a case, the optimization module may elect to allocate the virtual memory address to a low power zone instead of the preferred high performance zone at the expense of the QoS requested by the application.
At decision block 1630, if adequate memory capacity exists in the selected memory zone, the method proceeds to block 1655 and the virtual memory page is assigned to the memory zone. Otherwise, the method proceeds to block 1635. At block 1635, if the ideal memory zone has not been defined, or is defined but inadequate to accommodate the allocation, the optimization module may work to expand the ideal memory zone (or define it de novo) by dynamically adjusting a memory address range at the expense of an underutilized zone. At blocks 1640 and 1650, the optimization module may also determine that certain transactions or pages may be redirected or migrated to a different zone so that the memory channel and memory device associated with the current zone may be powered down or otherwise taken offline to conserve energy.
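The capacity check and expansion path at blocks 1630 and 1635 can be sketched as follows; the page-count bookkeeping and the eight-page reclamation figure are arbitrary illustrative assumptions:

```python
def handle_allocation(preferred: str, zones: dict, can_expand: bool) -> str:
    """Sketch of the decision flow: use the selected zone if capacity
    exists; otherwise try to expand it at another zone's expense."""
    if zones[preferred] > 0:       # adequate capacity exists
        zones[preferred] -= 1
        return preferred
    if can_expand:                 # reclaim pages from an underutilized
        zones[preferred] += 8      # zone, then satisfy the allocation
        zones[preferred] -= 1
        return preferred
    raise MemoryError("no capacity available in the preferred zone")
```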
It is envisioned that the optimization module may migrate (or make an initial allocation of) pages to an existing zone or to a newly created zone. To create a new zone, memory capacity associated with the free zone (see
As mentioned above, the system 100 may be incorporated into any desirable computing system.
A display controller 1716 and a touch screen controller 1718 may be coupled to the CPU 1702. In turn, the touch screen display 1725 external to the on-chip system 1701 may be coupled to the display controller 1716 and the touch screen controller 1718.
Further, as shown in
As further illustrated in
It should be appreciated that one or more of the method steps described herein may be stored in the memory as computer program instructions, such as the modules described above. These instructions may be executed by any suitable processor in combination or in concert with the corresponding module to perform the methods described herein.
Certain steps in the processes or process flows described in this specification naturally precede others for the invention to function as described. However, the invention is not limited to the order of the steps described if such order or sequence does not alter the functionality of the invention. That is, it is recognized that some steps may be performed before, after, or in parallel with (substantially simultaneously with) other steps without departing from the scope and spirit of the invention. In some instances, certain steps may be omitted or not performed without departing from the invention. Further, words such as “thereafter”, “then”, “next”, etc. are not intended to limit the order of the steps. These words are simply used to guide the reader through the description of the exemplary method.
Additionally, one of ordinary skill in programming is able to write computer code or identify appropriate hardware and/or circuits to implement the disclosed invention without difficulty based on the flow charts and associated description in this specification, for example.
Therefore, disclosure of a particular set of program code instructions or detailed hardware devices is not considered necessary for an adequate understanding of how to make and use the invention. The inventive functionality of the claimed computer implemented processes is explained in more detail in the above description and in conjunction with the Figures which may illustrate various process flows.
In one or more exemplary aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, NAND flash, NOR flash, M-RAM, P-RAM, R-RAM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer.
Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (“DSL”), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
Disk and disc, as used herein, include compact disc (“CD”), laser disc, optical disc, digital versatile disc (“DVD”), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Alternative embodiments will become apparent to one of ordinary skill in the art to which the invention pertains without departing from its spirit and scope. Therefore, although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.