Portable computing devices (“PCDs”) are becoming necessities for people on personal and professional levels. PCDs may include cellular telephones, portable digital assistants, portable game consoles, palmtop computers, and other portable electronic processing devices.
PCDs use memory management units (“MMUs”) to manage writing data to and reading data from one or more physical memory devices, such as random access memory devices. An MMU of a PCD may provide a virtual memory to the central processing unit (“CPU”) of the PCD that allows the CPU to run each application program in its own dedicated, contiguous virtual memory address space rather than having all of the application programs share the physical memory address space, which is often fragmented or non-contiguous. The purpose of such an MMU is to translate a virtual memory address (“VA”) into a physical memory address (“PA”) in response to a read or write transaction request from the CPU that identifies the VA. The CPU indirectly reads and writes PAs by directly reading and writing VAs to the MMU, which translates them into PAs and then writes or reads the PAs. Similarly, various systems of a PCD, such as a graphics processing unit (“GPU”), a multimedia client system, etc., may include their own system MMUs (“SMMUs”). An SMMU allows the system to operate in its own dedicated, contiguous virtual memory address space by translating VAs into PAs for that system.
In order to perform the translations, the MMU or SMMU accesses page tables, which may be stored in the PCD main memory. The page tables comprise page table entries, which the MMU or SMMU uses to map VAs to PAs. The MMU or SMMU may include a translation lookaside buffer (“TLB”), which is a cache memory used to store recently used VA-to-PA mappings. When the MMU or SMMU needs to translate a VA into a PA, it first checks the TLB to determine whether there is a match for the VA. If the MMU or SMMU finds a match, it uses the mapping found in the TLB to determine the PA and then accesses the PA (i.e., reads or writes the PA). This is known as a TLB “hit.” If the MMU or SMMU does not find a match in the TLB, this is known as a TLB “miss.” In the event of a TLB miss, the MMU or SMMU performs a method known as a table walk. In a table walk, the MMU or SMMU identifies a page table corresponding to the VA and then reads one or more locations in the page table until the corresponding VA-to-PA mapping is found. The MMU or SMMU then uses the mapping to determine the corresponding PA, writes the mapping back to the TLB, and accesses the PA.
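By way of illustration only, the following sketch in C models the TLB hit/miss behavior and the table-walk fallback described above. The direct-mapped TLB, the 4 kB page size, and the table_walk() helper are illustrative assumptions and are not drawn from any particular MMU implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 64u
#define PAGE_SHIFT  12u                               /* illustrative 4 kB pages */
#define PAGE_MASK   (~((uint64_t)(1u << PAGE_SHIFT) - 1u))

typedef struct {
    bool     valid;
    uint64_t va_page;                                 /* VA with page offset stripped */
    uint64_t pa_page;                                 /* PA with page offset stripped */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Hypothetical table walker: resolves a VA page to a PA page by reading
 * the page tables in system memory (details omitted). */
extern uint64_t table_walk(uint64_t va_page);

/* Translate a VA into a PA, checking the TLB first (a "hit") and falling
 * back to a table walk (a "miss"), as described above. */
uint64_t translate(uint64_t va)
{
    uint64_t va_page = va & PAGE_MASK;
    size_t   idx     = (size_t)((va_page >> PAGE_SHIFT) % TLB_ENTRIES);

    if (tlb[idx].valid && tlb[idx].va_page == va_page) {
        /* TLB hit: reuse the cached mapping. */
        return tlb[idx].pa_page | (va & ~PAGE_MASK);
    }

    /* TLB miss: walk the page tables, then write the mapping back to the TLB. */
    uint64_t pa_page = table_walk(va_page);
    tlb[idx] = (tlb_entry_t){ .valid = true, .va_page = va_page, .pa_page = pa_page };
    return pa_page | (va & ~PAGE_MASK);
}
```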
In a PCD or other processing device that implements operating system (“OS”) virtualization, a virtual machine monitor, also commonly referred to as a hypervisor, is interposed between the PCD hardware and the PCD system OS. The hypervisor executes in privileged mode and is capable of hosting one or more guest high-level OSs (“HLOSs”). In such systems, application programs running on the OSs use VAs of a first layer of virtual memory to address memory, and the OSs running on the hypervisor use intermediate physical addresses (“IPAs”) of a second layer of virtual memory to address memory. The MMU or SMMU performs a “Stage 1” translation to translate each VA into an IPA, and one or more “Stage 2” translations to translate each IPA into a PA.
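By way of illustration only, the two-layer translation described above may be sketched as the composition of a Stage 1 lookup and a Stage 2 lookup. The stage1_translate() and stage2_translate() helpers are hypothetical stand-ins for the respective page-table walks.

```c
#include <stdint.h>

/* Hypothetical helpers: Stage 1 maps a VA to an IPA on behalf of the
 * guest OS; Stage 2 maps the IPA to a PA on behalf of the hypervisor. */
extern uint64_t stage1_translate(uint64_t va);   /* VA  -> IPA */
extern uint64_t stage2_translate(uint64_t ipa);  /* IPA -> PA  */

uint64_t two_stage_translate(uint64_t va)
{
    uint64_t ipa = stage1_translate(va);          /* guest-visible address       */
    return stage2_translate(ipa);                 /* hypervisor-managed address  */
}
```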
In a Stage 1 translation, the MMU or SMMU may read the system memory in a burst mode. For example, an SMMU may read 16 page descriptors, each comprising a VA and corresponding IPA, in a single burst, and store them in its TLB. Such a burst-mode page descriptor read operation is also commonly referred to as a page descriptor pre-fetch operation. In a conventional Stage 2 translation, the SMMU does not operate in the burst mode except in an instance in which all (e.g., 16) of the IPAs from the Stage 1 translation are contiguous, i.e., increase linearly with respect to their corresponding VAs. Although it is generally desirable for IPAs in the system memory to be organized in a linearly increasing manner with respect to their corresponding VAs, operation of the PCD under real-world use cases inevitably leads to memory fragmentation. Under real-world use cases, the probability that the IPAs from a Stage 1 translation are discontiguous is very high. Therefore, the SMMU very frequently performs 16 individual read operations to read 16 IPAs, resulting in high translation latency.
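By way of illustration only, the contiguity condition that governs whether a conventional Stage 2 translation can use the burst mode may be sketched as follows, assuming 4 kB pages and a 16-descriptor burst; the names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define BURST_LEN  16u
#define PAGE_SIZE  0x1000u   /* 4 kB, as in the examples below */

/* Returns true only if every IPA in the burst increases linearly, one
 * page per descriptor, from the first IPA (i.e., the IPAs are contiguous). */
static bool burst_is_contiguous(const uint64_t ipa[BURST_LEN])
{
    for (unsigned i = 1; i < BURST_LEN; i++) {
        if (ipa[i] != ipa[0] + (uint64_t)i * PAGE_SIZE)
            return false;
    }
    return true;
}
```

In this conventional scheme, a single non-contiguous IPA in the burst defeats the check and forces the individual read operations noted above, which is the latency problem the linearity tag described below is intended to address.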
Methods, systems, and computer program products are disclosed for storing address translations in a memory system.
An exemplary method for storing an address translation in a memory system may include reading, by an MMU in a burst mode, a plurality of page descriptors from one or more page tables in a system memory. The plurality of page descriptors may comprise a plurality of VAs and a plurality of IPAs. Each of the IPAs may uniquely correspond to one of the VAs. The exemplary method may further include identifying in the plurality of page descriptors, by the MMU, a first plurality of contiguous IPAs beginning at a first base IPA and a second plurality of contiguous IPAs beginning at an offset from the first base IPA. The first plurality of contiguous IPAs may be separated from the second plurality of contiguous IPAs by at least one IPA that is not contiguous with the first plurality of contiguous IPAs or the second plurality of contiguous IPAs. The exemplary method may still further include reading, by the MMU, the PAs corresponding to the contiguous IPAs from the one or more page tables in the system memory, and determining that only a first PA need be stored when both the IPAs and the corresponding PAs are contiguous. The first PA may correspond to the first base IPA. The exemplary method may also include storing, by the MMU, an entry in a TLB comprising the first PA and a first linearity tag.
An exemplary system for storing an address translation in a memory system may include a system memory configured to store one or more page tables and an MMU having an MMU memory. The MMU may be configured to read a plurality of page descriptors in a burst mode from the one or more page tables in the system memory. The plurality of page descriptors may comprise a plurality of VAs and a plurality of IPAs. Each of the IPAs may uniquely correspond to one of the VAs. The MMU may further be configured to identify in the plurality of page descriptors a first plurality of contiguous IPAs beginning at a first base IPA and a second plurality of contiguous IPAs beginning at an offset from the first base IPA. The first plurality of contiguous IPAs may be separated from the second plurality of contiguous IPAs by at least one IPA that is not contiguous with the first plurality of contiguous IPAs or the second plurality of contiguous IPAs. The MMU may still further be configured to read the PAs corresponding to the contiguous IPAs from the one or more page tables in the system memory and to determine that only a first PA need be stored when both the IPAs and the corresponding PAs are contiguous. The first PA may correspond to the first base IPA. The MMU may also be configured to store an entry in a TLB in the MMU memory comprising the first PA and a first linearity tag.
An exemplary system for storing an address translation in a memory system may include means for reading a plurality of page descriptors in a burst mode from one or more page tables in a system memory. The plurality of page descriptors may comprise a plurality of VAs and a plurality of IPAs. Each of the IPAs may uniquely correspond to one of the VAs. The exemplary system may further comprise means for identifying in the plurality of page descriptors a first plurality of contiguous IPAs beginning at a first base IPA and a second plurality of contiguous IPAs beginning at an offset from the first base IPA. The first plurality of contiguous IPAs may be separated from the second plurality of contiguous IPAs by at least one IPA that is not contiguous with the first plurality of contiguous IPAs or the second plurality of contiguous IPAs. The exemplary system may still further comprise means for reading the PAs corresponding to the contiguous IPAs from the one or more page tables in the system memory and for determining that only a first PA need be stored when both the IPAs and the corresponding PAs are contiguous. The first PA may correspond to the first base IPA. The exemplary system may also comprise means for storing an entry in a TLB comprising the first PA and a first linearity tag.
An exemplary computer program product for storing an address translation in a memory system may comprise a computer-readable medium having stored thereon in executable form instructions that, when executed by a memory management processor, may configure the memory management processor to read a plurality of page descriptors in a burst mode from one or more page tables in a system memory. The plurality of page descriptors may comprise a plurality of VAs and a plurality of IPAs. Each of the IPAs may uniquely correspond to one of the VAs. The instructions, when executed by the memory management processor, may further configure the memory management processor to identify in the plurality of page descriptors a first plurality of contiguous IPAs beginning at a first base IPA and a second plurality of contiguous IPAs beginning at an offset from the first base IPA. The first plurality of contiguous IPAs may be separated from the second plurality of contiguous IPAs by at least one IPA that is not contiguous with the first plurality of contiguous IPAs or the second plurality of contiguous IPAs. The instructions, when executed by the memory management processor, may still further configure the memory management processor to read the PAs corresponding to the contiguous IPAs from the one or more page tables in the system memory and to determine that only a first PA need be stored when both the IPAs and the corresponding PAs are contiguous. The first PA may correspond to the first base IPA. The instructions, when executed by the memory management processor, may also configure the memory management processor to store an entry in a TLB comprising the first PA and a first linearity tag.
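By way of illustration only, the identification of contiguous IPA groups summarized above may be sketched as a scan over the burst of page descriptors. The function and parameter names are illustrative assumptions, not part of the disclosed system.

```c
#include <stdint.h>

#define BURST_LEN 16u
#define PAGE_SIZE 0x1000u   /* 4 kB */

/* Scan a burst of IPAs and report each run of contiguous IPAs as a
 * (start index, length) pair.  Runs separated by a non-contiguous IPA
 * become separate groups, as with the first and second pluralities of
 * contiguous IPAs described above. Returns the number of groups found. */
static unsigned find_groups(const uint64_t ipa[BURST_LEN],
                            unsigned start[BURST_LEN],
                            unsigned len[BURST_LEN])
{
    unsigned n = 0;
    start[n] = 0;
    len[n]   = 1;

    for (unsigned i = 1; i < BURST_LEN; i++) {
        if (ipa[i] == ipa[i - 1] + PAGE_SIZE) {
            len[n]++;                      /* IPA extends the current run  */
        } else {
            n++;                           /* discontinuity: start new run */
            start[n] = i;
            len[n]   = 1;
        }
    }
    return n + 1;
}
```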
In the Figures, like reference numerals refer to like parts throughout the various views unless otherwise indicated. For reference numerals with letter character designations such as “102A” or “102B”, the letter character designations may differentiate two like parts or elements present in the same Figure. Letter character designations for reference numerals may be omitted when it is intended that a reference numeral encompass all parts having the same reference numeral in all Figures.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
Wireless communication systems are widely deployed to provide various telecommunication services such as telephony, video, data, messaging, and broadcasts. Typical wireless communication systems may employ multiple-access technologies capable of supporting communication with multiple users by sharing available system resources (e.g., bandwidth, transmit power). Examples of such multiple-access technologies include code division multiple access (“CDMA”) systems, time division multiple access (“TDMA”) systems, frequency division multiple access (“FDMA”) systems, orthogonal frequency division multiple access (“OFDMA”) systems, single-carrier frequency division multiple access (“SC-FDMA”) systems, and time division synchronous code division multiple access (“TD-SCDMA”) systems.
These multiple access technologies have been adopted in various telecommunication standards to provide a common protocol that enables different wireless devices to communicate on a municipal, national, regional, and even global level. An example telecommunication standard is Long Term Evolution (“LTE”). An example of an advancement to LTE technology is referred to as 5G. The term 5G represents an advancement of LTE technology including, for example, various advancements to the wireless interface, processing improvements, and the enablement of higher bandwidth to provide additional features and connectivity.
By way of example, a wireless multiple-access communication system may include a number of base stations (which in some examples may be referred to as eNodeBs or eNBs), each simultaneously supporting communication for multiple communication devices, otherwise known as user equipments (“UE”s). A base station may communicate with UEs on downlink channels (e.g., for transmissions from a base station to a UE) and uplink channels (e.g., for transmissions from a UE to a base station).
The term “portable computing device” (“PCD”) is used herein to describe any device operating on a limited capacity power supply, such as a battery. A PCD is an example of a UE. Although battery operated PCDs have been in use for decades, technological advances in rechargeable batteries coupled with the advent of third generation (“3G”) and fourth generation (“4G”) wireless technology have enabled numerous PCDs with multiple capabilities. Examples of PCDs include a cellular telephone, a satellite telephone, a pager, a personal digital assistant (“PDA”), a smartphone, a navigation device, a smartbook or reader, a media player, a combination of the aforementioned devices, or a laptop, tablet, or other hand-held computer with a wireless connection, among others.
The terms “component,” “system,” “subsystem,” “module,” “database,” and the like are used herein to refer to a computer-related entity, either hardware, firmware, or a combination of hardware and firmware. For example, a component may be, but is not limited to being, a processor or portion thereof, or a processor or portion thereof as configured by a program, process, object, thread, executable, etc. A component may be localized on one system and/or distributed between two or more systems.
The terms “application” and “application program” may be used synonymously to refer to a software entity having executable content, such as object code, scripts, byte code, markup language files, patches, etc. In addition, an “application” may further include files that are not executable in nature, such as data files, configuration files, documents, etc.
The terms “central processing unit” (“CPU”), “digital signal processor” (“DSP”), and “graphics processing unit” (“GPU”) are non-limiting examples of processors that may reside in a PCD. These terms are used interchangeably herein except where otherwise indicated. A component, system, subsystem, module, etc., of the PCD may include and operate under the control of such a processor.
As illustrated in
The GPU subsystem 106 may include, among other elements (not shown for purposes of clarity), a system memory management unit (“SMMU”) 112. The SMMU 112 may include a memory 114. The memory 114 may be configured to store a translation lookaside buffer (“TLB”) 116. Similarly, the multimedia subsystem 108 may include, among other elements (not shown for purposes of clarity), an SMMU 118. The SMMU 118 may include a memory 120. The memory 120 may be configured to store a TLB 122. Likewise, the input/output subsystem 110 may include, among other elements (not shown for purposes of clarity), an SMMU 124. The SMMU 124 may include a memory 126. The memory 126 may be configured to store a TLB 128. Each of SMMUs 112, 118, and 124 may be configured to operate in a virtual address space and to translate virtual addresses (“VAs”) in its address space into physical addresses (“PAs”) in system memory 102.
The CPU cluster 104 may include two or more CPU cores 130A through 130N, each of which may include a corresponding MMU 132A-132N. Each of MMUs 132A-132N may be configured to operate in a virtual address space and to translate VAs in its address space into PAs in system memory 102. Each of MMUs 132A-132N may include a corresponding memory 134A-134N. Each of memories 134A-134N may be configured to store a corresponding TLB 136A-136N. The CPU cluster 104 may operate under control of an operating system (“OS”) 138 and a hypervisor 140. The hypervisor 140 manages the VA-to-PA address translation for the CPU cluster 104. The hypervisor 140 may also manage a guest high-level OS (“HLOS”) 142.
The MMUs 132A-132N and the SMMUs 112, 118, and 124 are configured to translate VAs into PAs. As the SMMUs 112, 118, and 124 and the MMUs 132A-132N are all similarly configured to perform VA-to-PA address translations, except where otherwise indicated in this disclosure the term “MMU” also includes “SMMU” within its scope of meaning. In the following descriptions and examples, the term “MMU” thus refers to any of the SMMUs 112, 118, and 124 and the MMUs 132A-132N, except where otherwise indicated.
When a subsystem performs a memory transaction to write data to or read data from system memory 102, the subsystem's MMU first determines whether the address translation or mapping is cached in its TLB (i.e., a TLB “hit”). If there is a TLB hit, the MMU may use the mapping found in its TLB. However, if the MMU determines that the mapping is not cached in its TLB (i.e., a TLB “miss”), then the MMU may perform a two-stage table walk to determine the mapping, using information obtained from page tables 144 stored in system memory 102. The following examples illustrate various aspects of storing and otherwise providing such address translations or mappings.
A first example is illustrated in
Each page descriptor comprises a VA and a corresponding intermediate physical address (“IPA”). In the example illustrated in
In
In other instances, if a set of IPAs is contiguous, all of the corresponding PAs are typically fetched in a burst mode during the second stage of the table walk. The fetched PAs are then checked to determine whether they are contiguous and have the same attributes. Depending on the outcome of this check, multiple PAs may be compressed into a single TLB entry or stored as multiple entries.
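By way of illustration only, the check described above may be sketched as follows. The attr parameter is a hypothetical stand-in for whatever page attributes the MMU compares; the function name and signature are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* After a Stage 2 burst fetch, decide whether the fetched PAs can be
 * compressed into a single TLB entry: they must be contiguous (linear
 * with the first PA) and carry identical attributes. */
static bool pas_compressible(const uint64_t pa[], const uint32_t attr[],
                             unsigned count, uint64_t page_size)
{
    for (unsigned i = 1; i < count; i++) {
        if (pa[i] != pa[0] + (uint64_t)i * page_size)
            return false;                 /* PAs not contiguous          */
        if (attr[i] != attr[0])
            return false;                 /* attributes differ           */
    }
    return true;                          /* eligible for one TLB entry  */
}
```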
In the examples described in this disclosure, at least some, but not necessarily all, of the IPAs are contiguous. The term “contiguous” with respect to IPAs in a group means that each IPA in the group represents a location in an intermediate physical memory space immediately adjacent to another location in the intermediate physical memory space, and the IPAs increase linearly from a base IPA of the group in relation to the corresponding VAs. In the example illustrated in
In the example illustrated in
In the example illustrated in
In the example illustrated in
In the example illustrated in
In a second stage of the two-stage VA-to-PA translation an MMU may translate one or more IPAs into corresponding PAs, using information obtained from the page tables 144 (
Continuing the example with reference to
As further illustrated in
The MMU need not individually store the remaining PAs corresponding to the IPAs of groups 202 and 204 because the tag enables those remaining PAs to be computed at a later time (e.g., contemporaneously with a memory transaction) from the stored base PA. The remaining PAs can be computed readily from the stored base PA because the remaining PAs increase linearly with respect to the base PA. In the example illustrated in
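By way of illustration only, the computation of a remaining PA from the stored base PA may be sketched as follows. The entry layout, the field names, and the assumption of one tag bit per 16 kB chunk are illustrative and do not represent a specific TLB format.

```c
#include <stdint.h>

#define PAGE_SIZE_B   0x1000u        /* 4 kB pages                        */
#define CHUNK_BYTES_B (4u * PAGE_SIZE_B)  /* one tag bit per 16 kB chunk  */

/* Illustrative layout of a compressed TLB entry: the base VA of the
 * burst-readable block, the PA corresponding to the base IPA, and the
 * 4-bit linearity tag. */
typedef struct {
    uint64_t base_va;
    uint64_t base_pa;
    uint8_t  tag;
} tlb_block_entry_t;

/* Compute the PA for a VA covered by the entry.  The remaining PAs are
 * not stored; because they increase linearly from the base PA, each can
 * be derived as the base PA plus the VA's offset from the base VA.
 * Returns 0 on success, -1 if the VA falls outside the entry or in a
 * chunk whose tag bit is clear. */
static int derive_pa(const tlb_block_entry_t *e, uint64_t va, uint64_t *pa)
{
    if (va < e->base_va)
        return -1;

    uint64_t offset = va - e->base_va;
    unsigned chunk  = (unsigned)(offset / CHUNK_BYTES_B);

    if (chunk >= 4u || !(e->tag & (1u << chunk)))
        return -1;                  /* not covered by this entry          */

    *pa = e->base_pa + offset;      /* linear with respect to the base PA */
    return 0;
}
```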
In the example illustrated in
For the present example, as each IPA (and VA) represents a 4 kB page, a size parameter of 16 kB indicates that the one or more contiguous IPAs can be found within a 64 kB region beginning at the base VA of groups 202 and 204. As understood by one of ordinary skill in the art, the size parameter may indicate the page size represented by each “1” in a linearity tag. In this case, groups 202 and 204 each span 16 kB, and the linearity tag 0101 is stored with block 302 and is interpreted as “0 1[16 kB of 204] 0 1[16 kB of 202].” Note in
The location information may have a format that indicates the locations of the contiguous IPA groups within the block 302. In the example illustrated in
The tag may be stored in the TLB in any manner. For example, the tag may be stored in the form of higher-order bits above the PA page bits (i.e., “PA_W”). Generally, a 4-bit linearity tag space may be added to the TLB cache. Because the TLB storage is implemented in SRAM, the area overhead is usually minimal. Linearity tags larger than four bits are possible. As understood by one of ordinary skill in the art, the size of the linearity tag depends upon the maximum page size supported (i.e., 4-bit linearity tags are used in the present examples because the maximum page size described is 64 kB). As noted here and below, maximum page sizes beyond 64 kB are possible, and thus linearity tags larger than four bits may be employed for such larger page sizes.
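By way of illustration only, one possible encoding of such a 4-bit linearity tag is sketched below: each bit covers one 16 kB chunk of the 64 kB burst region and is set when every IPA in that chunk is linear with respect to the base IPA. Applied to the example above, this would yield the tag 0101 for groups 202 and 204. The function name and the chunk granularity are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define BURST_LEN    16u
#define PAGE_SIZE    0x1000u          /* 4 kB pages, as in the example   */
#define CHUNK_PAGES  4u               /* each tag bit covers 16 kB here  */

/* Build the 4-bit linearity tag for a burst of IPAs: bit i is set when
 * the i-th 16 kB chunk is linear with respect to the base IPA, i.e.,
 * every IPA in that chunk equals the base IPA plus its offset from the
 * start of the burst.  (Hypothetical helper, for illustration.) */
static uint8_t encode_linearity_tag(const uint64_t ipa[BURST_LEN])
{
    uint8_t  tag  = 0;
    uint64_t base = ipa[0];

    for (unsigned chunk = 0; chunk < BURST_LEN / CHUNK_PAGES; chunk++) {
        bool linear = true;
        for (unsigned j = 0; j < CHUNK_PAGES; j++) {
            unsigned i = chunk * CHUNK_PAGES + j;
            if (ipa[i] != base + (uint64_t)i * PAGE_SIZE) {
                linear = false;
                break;
            }
        }
        if (linear)
            tag |= (uint8_t)(1u << chunk);
    }
    return tag;   /* e.g., 0b0101 when only chunks 0 and 2 are linear */
}
```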
Continuing the example with reference to
The tag includes the base VA of the third group 206, a size parameter indicating the size of a burst-readable block 402 of IPAs that encompasses the third group 206, and location information identifying locations of the contiguous IPAs within that block. In the illustrated example, a size parameter of 4 kB indicates that the one or more contiguous IPAs can be found within a compressed 16 kB region (i.e., compressed to 4 kB) beginning at the base VA of the third group 206, VA+4. That is, in the example illustrated in
The location information indicates the locations of the contiguous IPAs within the block 402. In the example illustrated in
Continuing the example with reference to
The tag includes the base VA of the fourth group 208, a size parameter indicating the size of a burst-readable block 502 of IPAs that encompasses the fourth group 208, and location information identifying locations of the contiguous IPAs within that block. In the illustrated example, a size parameter of 4 kB indicates that the one or more contiguous IPAs can be found within a compressed 16 kB region (i.e., compressed to 4 kB) beginning at the base VA of the fourth group 208, VA+12. Note in
The location information indicates the locations of the contiguous IPAs within the block 502. In the example illustrated in
Continuing the example with reference to
The tag includes the VA corresponding to IPA_N, a size parameter indicating the size of a read 602 that encompasses IPA_N, and location information identifying locations of the one or more contiguous IPAs within the data that is read. In the illustrated example, a size parameter of 4 kB indicates that the one or more contiguous IPAs can be found within a 4 kB region of the data that is read.
The location information indicates the locations of the one or more contiguous IPAs within the read 602. In the example illustrated in
In this manner, the same tag format may be extended, based on the decoding scheme, to cover most or all scenarios. In this case, a 4-bit value of 1000 and a size parameter of 4 kB mean that only one 4 kB page, corresponding to VA+15, is stored. This is equivalent to storing a single 4 kB page mapping in a single TLB entry, as is done in conventional systems without any 4-bit tag.
Accordingly, in the example illustrated in
Another example, illustrated in
As illustrated in
As indicated by block 1102, the MMU may read two or more page descriptors in a burst mode from one or more page tables in a system memory. Such a burst-mode read characterizes the first stage of the two-stage table walk. In the examples described above, the MMU reads 16 such page descriptors, each comprising a VA and a corresponding IPA. The MMU may then use the IPAs to determine the PAs in the second stage of the two-stage table walk.
Generally, the MMU may read any one or more of the IPAs to determine a corresponding PA. In some instances, the MMU may, for example, read each IPA individually (i.e., a single read operation) to determine a corresponding PA. However, as indicated by block 1104, in at least some instances the MMU may identify a first group of two or more contiguous IPAs beginning at a first base IPA, and a second group of two or more contiguous IPAs beginning at an offset from the first base IPA. The first group of contiguous IPAs may be separated from the second group of contiguous IPAs by at least one IPA that is not contiguous with either the first group or the second group. Nevertheless, the MMU may read the first and second groups together in a single burst-mode or block read.
As indicated by block 1106, the MMU may use the base IPA of the first and second groups of contiguous IPAs to read a corresponding first base PA from the one or more page tables in the system memory. The MMU need not read all of the PAs corresponding to all of the IPAs of the first and second groups because those remaining PAs may be computed from the first base PA.
As indicated by block 1108, the MMU may store a first entry in its TLB that includes the above-referenced first base PA and a tag. The tag may include the VA corresponding to the first base PA as well as a size parameter and location information. The size parameter may indicate the size of a burst-readable block of IPAs that encompasses the first and second groups. The location information may identify locations of the contiguous IPAs (defining the first and second groups) within that block.
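By way of illustration only, blocks 1102 through 1108 may be condensed into the following sketch. The helpers stage1_burst_read(), stage2_read_pa(), tlb_store_block(), and encode_linearity_tag() are hypothetical stand-ins for the operations described above, not functions defined by the disclosure.

```c
#include <stdint.h>

#define BURST_LEN 16u

/* Hypothetical helpers standing in for the hardware operations. */
extern void     stage1_burst_read(uint64_t va, uint64_t ipa[BURST_LEN]);
extern uint8_t  encode_linearity_tag(const uint64_t ipa[BURST_LEN]);
extern uint64_t stage2_read_pa(uint64_t ipa);
extern void     tlb_store_block(uint64_t base_va, uint64_t base_pa, uint8_t tag);

void handle_tlb_miss(uint64_t base_va)
{
    uint64_t ipa[BURST_LEN];

    /* Block 1102: Stage 1 burst read of the page descriptors. */
    stage1_burst_read(base_va, ipa);

    /* Block 1104: identify the contiguous IPA groups, which may be
     * separated by one or more non-contiguous IPAs. */
    uint8_t tag = encode_linearity_tag(ipa);

    /* Block 1106: read only the first base PA, using the first base IPA. */
    uint64_t base_pa = stage2_read_pa(ipa[0]);

    /* Block 1108: store a single TLB entry with the base PA and the tag. */
    tlb_store_block(base_va, base_pa, tag);
}
```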
Another exemplary method, illustrated in
As indicated by block 1210, the MMU may identify a third group of contiguous IPAs that is not contiguous with either of the first or second groups. The base IPA of the third group of contiguous IPAs may be referred to as a second base IPA. As indicated by block 1212, the MMU may use the second base IPA to read a corresponding second base PA from the one or more page tables in the system memory. As indicated by block 1214, the MMU may store a second entry in its TLB that includes the second base PA and a second tag. The second tag may include the VA corresponding to the second base PA as well as a size parameter and location information.
Although not shown in
As illustrated in
A display controller 1310 and a touchscreen controller 1312 may be coupled to the CPU 1304. A touchscreen display 1314 external to the SoC 1302 may be coupled to the display controller 1310 and the touchscreen controller 1312. The display controller 1310 and touchscreen controller 1312 may together be an example of the multimedia subsystem 108 described above with regard to
One or more memories may be coupled to the CPU 1304. The one or more memories may include both volatile and non-volatile memories. Examples of volatile memories include static random access memory (“SRAM”) 1328 and dynamic RAMs (“DRAM”s) 1330 and 1331. Such memories may be external to the SoC 1302, such as the DRAM 1330, or internal to the SoC 1302, such as the DRAM 1331. One or both of the DRAMs 1330 and 1331 may be an example of the system memory 102 described above with regard to
A stereo audio CODEC 1334 may be coupled to the analog signal processor 1308. Further, an audio amplifier 1336 may be coupled to the stereo audio CODEC 1334. First and second stereo speakers 1338 and 1340, respectively, may be coupled to the audio amplifier 1336. In addition, a microphone amplifier 1342 may be coupled to the stereo audio CODEC 1334, and a microphone 1344 may be coupled to the microphone amplifier 1342. A frequency modulation (“FM”) radio tuner 1346 may be coupled to the stereo audio CODEC 1334. An FM antenna 1348 may be coupled to the FM radio tuner 1346. Further, stereo headphones 1350 may be coupled to the stereo audio CODEC 1334. Other devices that may be coupled to the CPU 1304 include a digital (e.g., CCD or CMOS) camera 1352.
A modem or radio frequency (“RF”) transceiver 1354 may be coupled to the analog signal processor 1308. An RF switch 1356 may be coupled to the RF transceiver 1354 and an RF antenna 1358. In addition, a keypad 1360, a mono headset with a microphone 1362, and a vibrator device 1364 may be coupled to the analog signal processor 1308.
A power supply 1366 may be coupled to the SoC 1302 via a power management integrated circuit (“PMIC”) 1368. The power supply 1366 may include a rechargeable battery or a DC power supply that is derived from an AC-to-DC transformer connected to an AC power source.
The SoC 1302 may have one or more internal or on-chip thermal sensors 1370A and may be coupled to one or more external or off-chip thermal sensors 1370B. The one or more on-chip thermal sensors 1370A may be examples of junction thermal sensor 122 (
The touch screen display 1314, the video port 1320, the USB port 1324, the camera 1352, the first stereo speaker 1338, the second stereo speaker 1340, the microphone 1344, the FM antenna 1348, the stereo headphones 1350, the RF switch 1356, the RF antenna 1358, the keypad 1360, the mono headset 1362, the vibrator 1364, the thermal sensors 1370B, the ADC controller 1372, the PMIC 1368, the power supply 1366, the DRAM 1330, and the SIM card 1326 are external to the SoC 1302 in this exemplary or illustrative embodiment. It will be understood, however, that in other embodiments one or more of these devices may be included in such an SoC.
Alternative embodiments will become apparent to one of ordinary skill in the art to which the invention pertains without departing from its spirit and scope. Therefore, although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.