The various aspects and embodiments described herein relate to computer memory systems, and in particular, to increasing utilization associated with translation hardware used in a memory management unit (MMU).
Virtual memory is a memory management technique provided by most modern computing systems. Using virtual memory, a central processing unit (CPU) or a peripheral device of the computing system may access a memory buffer using a virtual memory address mapped to a physical memory address within a physical memory space. In this manner, the CPU or peripheral device may be able to address a larger physical address space than would otherwise be possible, and/or may utilize a contiguous view of a memory buffer that is, in fact, physically discontiguous across the physical memory space.
Virtual memory is conventionally implemented through the use of a memory management unit (MMU) for translation of virtual memory addresses to physical memory addresses. The MMU may be integrated into the CPU of the computing system (a CPU MMU), or may comprise a separate circuit providing memory management functions for peripheral devices (a system MMU, or SMMU). In conventional operation, the MMU receives memory access requests from “upstream” devices, such as direct memory access (DMA) agents, video accelerators, and/or display engines, as non-limiting examples. For each memory access request, the MMU translates the virtual memory addresses included in the memory access request to a physical memory address, and the memory access request is then processed using the translated physical memory address.
Because an MMU may be required to translate the same virtual memory address repeatedly within a short time interval, performance of the MMU and the computing system overall may be improved by caching address translation data within the MMU. In this regard, the MMU may include a structure known as a translation cache (also referred to as a translation lookaside buffer, or TLB). The translation cache provides translation cache entries in which previously generated virtual-to-physical memory address translation mappings may be stored for later access. If the MMU subsequently receives a request to translate a virtual memory address stored in the translation cache, the MMU may retrieve the corresponding physical memory address from the translation cache rather than retranslating the virtual memory address.
However, the performance benefits achieved through use of the translation cache may be lost in scenarios in which the MMU processes transactions in an incoming transaction stream in order of arrival. For example, the MMU may have multiple independent parallel translation machines that can each process one transaction at a time. However, the independent parallel translation machines could end up performing multiple identical translations based on the incoming translation request stream (e.g., where multiple requests are in the same memory region). This leads to a waste of hardware resources, specifically the translation machines and bus bandwidth.
The following presents a simplified summary relating to one or more aspects and/or embodiments disclosed herein. As such, the following summary should not be considered an extensive overview relating to all contemplated aspects and/or embodiments, nor should the following summary be regarded to identify key or critical elements relating to all contemplated aspects and/or embodiments or to delineate the scope associated with any particular aspect and/or embodiment. Accordingly, the following summary has the sole purpose to present certain concepts relating to one or more aspects and/or embodiments relating to the mechanisms disclosed herein in a simplified form to precede the detailed description presented below.
According to various aspects, a memory management unit (MMU) having multiple parallel translation machines may collect transactions in an incoming transaction stream and select appropriate transactions to dispatch to the parallel translation machines. For example, the MMU may include a dispatcher that can identify different transactions that belong to the same address set (e.g., have the same address translation) and dispatch one transaction from each transaction set to an individual translation machine. As such, the dispatcher may be used to ensure that multiple parallel translation machines do not perform identical memory translations, as other transactions that share the same address translation may obtain the translation results from a translation lookaside buffer. In this manner, utilization associated with memory translation hardware (e.g., the parallel translation machines) may be increased due to fewer duplicate memory accesses, which may also allow the data bus to serve other requests and thus increase bandwidth in the memory translation system.
According to various aspects, a method for dispatching memory transactions may comprise receiving a transaction stream comprising multiple memory transactions at a dispatcher coupled to a memory translation unit configured to perform multiple memory address translations in parallel, identifying, among the multiple memory transactions in the transaction stream, one or more transaction sets that each include one or more memory transactions that share a memory address translation, and dispatching, to the memory translation unit, one memory transaction for translation per transaction set such that the memory translation unit is configured to perform one memory address translation per transaction set.
According to various aspects, an apparatus for processing memory transactions may comprise a memory translation unit configured to perform multiple memory address translations in parallel and a dispatcher coupled to the memory translation unit, wherein the dispatcher may receive a transaction stream comprising multiple memory transactions, identify, among the multiple memory transactions in the transaction stream, one or more transaction sets that each include one or more memory transactions that share a memory address translation, and dispatch, to the memory translation unit, one memory transaction for translation per transaction set such that the memory translation unit performs one memory address translation per transaction set.
According to various aspects, an apparatus may comprise means for receiving a transaction stream comprising multiple memory transactions, means for identifying, among the multiple memory transactions in the transaction stream, one or more transaction sets, wherein the one or more transaction sets each include one or more memory transactions that share a memory address translation, and means for dispatching, to a memory translation unit configured to perform multiple memory address translations in parallel, one memory transaction for translation per transaction set such that the memory translation unit is configured to perform one memory address translation per transaction set.
According to various aspects, a non-transitory computer-readable storage medium may have computer-executable instructions recorded thereon, wherein the computer-executable instructions may be configured to cause one or more processors to receive a transaction stream comprising multiple memory transactions, identify, among the multiple memory transactions in the transaction stream, one or more transaction sets, wherein the one or more transaction sets each include one or more memory transactions that share a memory address translation, and dispatch, to a memory translation unit configured to perform multiple memory address translations in parallel, one memory transaction for translation per transaction set such that the memory translation unit is configured to perform one memory address translation per transaction set.
Other objects and advantages associated with the aspects and embodiments disclosed herein will be apparent to those skilled in the art based on the accompanying drawings and detailed description.
A more complete appreciation of the various aspects and embodiments described herein and many attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings which are presented solely for illustration and not limitation, and in which:
Various aspects and embodiments are disclosed in the following description and related drawings to show specific examples relating to exemplary aspects and embodiments. Alternate aspects and embodiments will be apparent to those skilled in the pertinent art upon reading this disclosure, and may be constructed and practiced without departing from the scope or spirit of the disclosure. Additionally, well-known elements will not be described in detail or may be omitted so as to not obscure the relevant details of the aspects and embodiments disclosed herein.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments. Likewise, the term “embodiments” does not require that all embodiments include the discussed feature, advantage, or mode of operation.
The terminology used herein describes particular embodiments only and should not be construed to limit any embodiments disclosed herein. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Those skilled in the art will further understand that the terms “comprises,” “comprising,” “includes,” and/or “including,” as used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Further, various aspects and/or embodiments may be described in terms of sequences of actions to be performed by, for example, elements of a computing device. Those skilled in the art will recognize that various actions described herein can be performed by specific circuits (e.g., an application specific integrated circuit (ASIC)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of non-transitory computer-readable medium having stored thereon a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects described herein may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the aspects described herein, the corresponding form of any such aspects may be described herein as, for example, “logic configured to” and/or other structural components configured to perform the described action.
Before discussing exemplary apparatuses and methods for increasing hardware utilization, increasing throughput, and decreasing bus bandwidth requirements in a memory management unit (MMU) having multiple parallel translation machines as disclosed herein, a conventional computing system providing virtual-to-physical memory address translation is described. In this regard,
As seen in
As noted above, the computing system 100 also includes the CPU 104 having integrated therein the CPU MMU 102. The CPU MMU 102 may provide address translation services for CPU memory access requests (not shown) of the CPU MMU 102 in much the same manner that the SMMU 106 provides address translation services to the upstream devices 108, 110, and 112. After performing virtual-to-physical memory address translation of a CPU memory access request, the CPU MMU 102 may access the memory 132 and/or the slave device 134 via the system interconnect 136. In particular, a master port (M) 150 of the CPU 104 communicates with a slave port (S) 152 of the system interconnect 136. The system interconnect 136 then communicates via the master ports (M) 142 and 144 with the slave ports (S) 146 and 148, respectively, of the memory 132 and the slave device 134.
To improve performance, an MMU, such as the CPU MMU 102 and/or the SMMU 106, may provide a translation cache (not shown) for storing previously generated virtual-to-physical memory address translation mappings. However, in the case of an MMU that has multiple parallel translation machines, the performance benefits achieved through use of the translation cache (also referred to as a translation lookaside buffer, or TLB) may be reduced in scenarios in which the MMU processes transactions in an incoming transaction stream in order of arrival.
In this regard,
However, because the MMU 200 shown in
Accordingly, to avoid the wasted hardware resources and bus bandwidth from having the parallel translation machines 222, 224 perform identical translations,
In various embodiments, with reference to block 430 in
According to various aspects,
According to various aspects,
According to various embodiments, the CPU 610 can also access the display controller 640 over the system bus 620 to control information sent to a display 670. The display controller 640 can include a memory controller 642 and memory 644 to store data to be sent to the display 670 in response to communications with the CPU 610. The display controller 640 sends information to the display 670 to be displayed via a video processor 660, which processes the information to be displayed into a format suitable for the display 670. The display 670 can include any suitable display type, including but not limited to a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an LED display, a touchscreen display, a virtual-reality headset, and/or any other suitable display.
Those skilled in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those skilled in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted to depart from the scope of the various aspects and embodiments described herein.
The various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or other such configurations).
The methods, sequences, and/or algorithms described in connection with the aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of non-transitory computer-readable medium known in the art. An exemplary non-transitory computer-readable medium may be coupled to the processor such that the processor can read information from, and write information to, the non-transitory computer-readable medium. In the alternative, the non-transitory computer-readable medium may be integral to the processor. The processor and the non-transitory computer-readable medium may reside in an ASIC. The ASIC may reside in an IoT device. In the alternative, the processor and the non-transitory computer-readable medium may be discrete components in a user terminal.
In one or more exemplary aspects, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a non-transitory computer-readable medium. Computer-readable media may include storage media and/or communication media including any non-transitory medium that may facilitate transferring a computer program from one place to another. A storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of a medium. The term disk and disc, which may be used interchangeably herein, includes CD, laser disc, optical disc, DVD, floppy disk, and Blu-ray discs, which usually reproduce data magnetically and/or optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
While the foregoing disclosure shows illustrative aspects and embodiments, those skilled in the art will appreciate that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. Furthermore, in accordance with the various illustrative aspects and embodiments described herein, those skilled in the art will appreciate that the functions, steps, and/or actions in any methods described above and/or recited in any method claims appended hereto need not be performed in any particular order. Further still, to the extent that any elements are described above or recited in the appended claims in a singular form, those skilled in the art will appreciate that singular form(s) contemplate the plural as well unless limitation to the singular form(s) is explicitly stated.
The present application claims the benefit of U.S. Provisional Application No. 62/561,181, entitled “TRANSACTION DISPATCHER FOR MEMORY MANAGEMENT UNIT,” filed Sep. 20, 2017, the contents of which are hereby expressly incorporated by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
62561181 | Sep 2017 | US |