Embodiments as disclosed herein are in the field of memory management in computer systems.
Most contemporary computers, including personal computers as well as more powerful workstations, have some graphics processing capability. This capability is often provided by one or more special purpose processors, referred to as graphics processing units (GPUs), in addition to the central processing unit (CPU). Graphics processing is a task that requires a relatively large amount of data. Accordingly, GPUs typically have their own graphics memories (also referred to as video memories or video random access memory (VRAM)). All computer systems are limited in the amount of data they can process in a given amount of time. One of the limiting factors of performance is availability of memory. In particular, the availability of cache memory affects system performance.
Currently when systems that have GPUs and GPU memories are not performing graphics processing, the GPU memory is essentially unused (approximately 90% of VRAM is unused during non-graphics work). It would be desirable to provide a system in which the CPU could access the memory resources of the GPU to increase system performance.
The drawings represent aspects of various embodiments for the purpose of disclosing the invention as claimed, but are not intended to be limiting in any way.
Embodiments of a method and apparatus for using graphics memory (also referred to as video memory or video random access memory (VRAM)) for non-graphics related tasks are disclosed herein. In an embodiment a graphics processing unit (GPU) includes a VRAM cache module with hardware and software to provide and manage additional cache resources for a central processing unit (CPU). In an embodiment, the VRAM cache module includes a VRAM cache driver that registers with the CPU, accepts read requests from the CPU, and uses the VRAM cache to service the requests. In various embodiments, the VRAM cache is configurable to be the only GPU cache or alternatively, to be a first level cache, second level cache, etc.
In an embodiment the VRAM cache driver is divided into four logical blocks (not shown): an initialization block, including PnP (Plug‘n’Play), power, etc.; an IRP (I/O Request Packet) queuing and processing block; a cache management block handling cache hits/misses, least recently used (LRU) list, etc.; and a GPU programming block.
Various caching algorithms are usable. According to just one example caching algorithm, the size of one cache entry is selected to be large enough to minimize lookup time and size of supportive memory structures. For example, the cache entry is in the range of 16K-256K in an embodiment. Another consideration in choosing the size of cache entries involves particularities of the OS. For example, Windows™ input/output (I/O) statistics can be taken into consideration. Table 1 shows I/O statistics for Windows XP™ read requests, where the X-Axis is I/O size and the Y-Axis is the number of requests:
Most requests are smaller than the example cache entry size selected above, which necessitates reading more data than requested. However, from a disk I/O perspective, reading 4K takes approximately the same amount of time as reading 128K, because most of the time taken is hard disk drive (HDD) seek time. Thus such a scheme is essentially “read ahead” with almost zero cost in terms of time. It may be necessary to allocate additional non-paged memory in order to supply a bigger buffer for such operations. One example eviction algorithm is based on a single LRU list which is updated upon each cache hit.
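The read-ahead and eviction behavior described above can be illustrated with a minimal sketch. The sketch is not the disclosed driver; it models the VRAM cache as an in-memory dictionary, assumes an entry size of 128K (one value within the 16K-256K range mentioned earlier), and uses hypothetical names (VramCacheSketch, read_from_disk, ENTRY_SIZE) chosen only for illustration.

```python
from collections import OrderedDict

ENTRY_SIZE = 128 * 1024  # assumed entry size, within the 16K-256K example range


class VramCacheSketch:
    """Toy model of an entry-aligned read cache with a single LRU list."""

    def __init__(self, capacity_entries, read_from_disk):
        self.capacity = capacity_entries
        self.read_from_disk = read_from_disk  # hypothetical callable(offset, length) -> bytes
        self.entries = OrderedDict()          # entry index -> bytes, least recent first
        self.hits = 0
        self.misses = 0

    def _get_entry(self, index):
        if index in self.entries:
            self.hits += 1
            self.entries.move_to_end(index)   # LRU list is updated on each cache hit
            return self.entries[index]
        self.misses += 1
        # Read the whole entry even if the caller asked for less ("read ahead").
        data = self.read_from_disk(index * ENTRY_SIZE, ENTRY_SIZE)
        self.entries[index] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return data

    def read(self, offset, length):
        """Serve a read request, splitting it across cache entries as needed."""
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, ENTRY_SIZE)
            take = min(ENTRY_SIZE - start, end - offset)
            out += self._get_entry(index)[start:start + take]
            offset += take
        return bytes(out)
```

In this sketch a 4K request that misses causes the full 128K entry to be loaded, so a later request for a nearby offset in the same entry is served as a hit without another disk access.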
In an embodiment the VRAM cache driver is loaded before any other driver component from a video subsystem. The VRAM cache driver is notified when all necessary video components are loaded and the GPU is initialized. The VRAM cache driver can be called as a last initialization routine, for example.
Memory supplied to (or allocated by) the VRAM cache driver can be taken back by properly notifying the VRAM cache driver. According to one embodiment, such as for a particular operating system, the VRAM cache allocates memory in several chunks. When the CMM (customizable memory management) fails to satisfy a request for local memory (e.g., when a 3D application is starting), it calls the VRAM cache driver so that the VRAM cache driver can free one or more memory chunks.
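The chunk-based reclaim described above can be sketched as follows. This is an illustrative model only: the class names (ChunkedCacheSketch, MemoryManagerSketch) and method names are hypothetical, sizes are plain integers, and the sketch assumes the cache holds only read data, so a chunk can be dropped without writing anything back.

```python
class ChunkedCacheSketch:
    """Toy cache that holds its memory as a list of equally sized chunks."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = []  # each chunk is modeled as a dict of cached entries

    def add_chunk(self):
        self.chunks.append({})

    def release_chunks(self, bytes_needed):
        """Free whole chunks until bytes_needed is covered; return bytes freed."""
        freed = 0
        while self.chunks and freed < bytes_needed:
            self.chunks.pop()  # assumption: read-only data, safe to discard
            freed += self.chunk_size
        return freed


class MemoryManagerSketch:
    """Toy stand-in for the memory manager that owns local (video) memory."""

    def __init__(self, total_bytes, cache):
        self.free_bytes = total_bytes
        self.cache = cache

    def give_chunk_to_cache(self):
        """Hand one chunk of local memory to the cache, if available."""
        if self.free_bytes < self.cache.chunk_size:
            return False
        self.free_bytes -= self.cache.chunk_size
        self.cache.add_chunk()
        return True

    def allocate_local(self, size):
        """Allocate local memory, reclaiming cache chunks if the request fails."""
        if size > self.free_bytes:
            # Request cannot be satisfied: call the cache so it can free chunks.
            self.free_bytes += self.cache.release_chunks(size - self.free_bytes)
        if size > self.free_bytes:
            return False
        self.free_bytes -= size
        return True
```

Because memory is returned in whole chunks, the cache may give back more than was strictly requested; the surplus simply remains free for later allocations in this sketch.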
The video driver 214 sends messages to the VRAM cache driver 404 to indicate that the GPU is ready (also sending parameters), and an indication of a power state. The VRAM cache driver 404 sends messages to the video driver 214 to allocate memory and to free memory. When the video driver 214 sends a message to the VRAM cache driver 404 that it is out of memory for 3D operations, the VRAM cache driver 404 responds with a message to free memory. The VRAM cache driver 404 sends a transfer request to the video driver 214, and the video driver 214 sends a transfer-finished message to the VRAM cache driver 404. The VRAM cache driver 404 should be notified when a requested transfer is complete, for example by calling its DPC (Deferred Procedure Call) routine.
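The message exchange described above can be modeled with a small sketch. The message names, payload shapes, and the CacheDriverSketch class are hypothetical and are not taken from any real driver interface. In particular, requesting memory immediately upon the GPU-ready message is an assumption of this sketch, and the completion callback only stands in for the DPC-style notification.

```python
from enum import Enum, auto


class Msg(Enum):
    # Sent by the video driver to the VRAM cache driver
    GPU_READY = auto()
    POWER_STATE = auto()
    OUT_OF_MEMORY_3D = auto()
    TRANSFER_FINISHED = auto()
    # Sent by the VRAM cache driver to the video driver
    ALLOCATE_MEMORY = auto()
    FREE_MEMORY = auto()
    TRANSFER_REQUEST = auto()


class CacheDriverSketch:
    """Toy model of the VRAM cache driver side of the exchange."""

    def __init__(self, send_to_video_driver):
        self.send = send_to_video_driver  # hypothetical callable(msg, payload)
        self.gpu_ready = False
        self.power_state = None
        self.pending = {}                 # transfer id -> completion callback
        self.next_id = 0

    def on_message(self, msg, payload=None):
        if msg is Msg.GPU_READY:
            self.gpu_ready = True
            self.send(Msg.ALLOCATE_MEMORY, payload)  # assumed: allocate once ready
        elif msg is Msg.POWER_STATE:
            self.power_state = payload
        elif msg is Msg.OUT_OF_MEMORY_3D:
            self.send(Msg.FREE_MEMORY, payload)      # give memory back for 3D work
        elif msg is Msg.TRANSFER_FINISHED:
            # Completion notification, standing in for the DPC routine call.
            self.pending.pop(payload)()

    def request_transfer(self, on_done):
        """Ask the video driver to move data; on_done runs when it finishes."""
        transfer_id = self.next_id
        self.next_id += 1
        self.pending[transfer_id] = on_done
        self.send(Msg.TRANSFER_REQUEST, transfer_id)
        return transfer_id
```

Keeping pending transfers in a table keyed by an identifier lets the sketch match each transfer-finished message to the request that started it, even if several transfers are outstanding.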
Any circuits described herein could be implemented through the control of manufacturing processes and maskworks which would then be used to manufacture the relevant circuitry. Such manufacturing process control and maskwork generation are known to those of ordinary skill in the art and include the storage of computer instructions on computer readable media including, for example, Verilog, VHDL or instructions in another hardware description language.
Aspects of the embodiments described above may be implemented as functionality programmed into any of a variety of circuitry, including but not limited to programmable logic devices (PLDs), such as field programmable gate arrays (FPGAs), programmable array logic (PAL) devices, electrically programmable logic and memory devices, and standard cell-based devices, as well as application specific integrated circuits (ASICs) and fully custom integrated circuits. Some other possibilities for implementing aspects of the embodiments include microcontrollers with memory (such as electrically erasable programmable read only memory (EEPROM), Flash memory, etc.), embedded microprocessors, firmware, software, etc. Furthermore, aspects of the embodiments may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types. Of course the underlying device technologies may be provided in a variety of component types, e.g., metal-oxide semiconductor field-effect transistor (MOSFET) technologies such as complementary metal-oxide semiconductor (CMOS), bipolar technologies such as emitter-coupled logic (ECL), polymer technologies (e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures), mixed analog and digital, etc.
The term “processor” as used in the specification and claims includes a processor core or a portion of a processor. Further, although one or more GPUs and one or more CPUs are usually referred to separately herein, in embodiments both a GPU and a CPU are included in a single integrated circuit package or on a single monolithic die. Therefore a single device performs the claimed method in such embodiments.
Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number, respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import, when used in this application, refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.
The above description of illustrated embodiments of the method and system is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the method and system are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize. The teachings of the disclosure provided herein can be applied to other systems, not only for systems including graphics processing or video processing, as described above. The various operations described may be performed in a very wide variety of architectures and distributed differently than described. In addition, though many configurations are described herein, none are intended to be limiting or exclusive.
In other embodiments, some or all of the hardware and software capability described herein may exist in a printer, a camera, television, a digital versatile disc (DVD) player, a DVR or PVR, a handheld device, a mobile telephone or some other device. The elements and acts of the various embodiments described above can be combined to provide further embodiments. These and other changes can be made to the method and system in light of the above detailed description.
In general, in the following claims, the terms used should not be construed to limit the method and system to the specific embodiments disclosed in the specification and the claims, but should be construed to include any processing systems and methods that operate under the claims. Accordingly, the method and system are not limited by the disclosure, but instead the scope of the method and system is to be determined entirely by the claims.
While certain aspects of the method and system are presented below in certain claim forms, the inventors contemplate the various aspects of the method and system in any number of claim forms. For example, while only one aspect of the method and system may be recited as embodied in a computer-readable medium, other aspects may likewise be embodied in a computer-readable medium. Such computer readable media may store instructions that are to be executed by a computing device (e.g., personal computer, personal digital assistant, PVR, mobile device or the like) or may be instructions (such as, for example, Verilog or another hardware description language) that when executed are designed to create a device (GPU, ASIC, or the like) or software application that when operated performs aspects described above. The claimed invention may be embodied in computer code (e.g., HDL, Verilog, etc.) that is created, stored, synthesized, and used to generate GDSII data (or its equivalent). An ASIC may then be manufactured based on this data.
Accordingly, the inventors reserve the right to add additional claims after filing the application to pursue such additional claim forms for other aspects of the method and system.