Embodiments of the present invention are generally related to computer systems employing graphics processing units (GPUs).
As computer systems have advanced, the demand for system memory for execution of applications has increased rapidly. The amount of system memory in a computing device can have a significant impact on the performance of the computer system as well as the user experience.
Similarly, graphics processing units (GPUs) have become increasingly advanced. Correspondingly, the memory used by GPUs has increased to satisfy the demands of increasingly advanced GPUs. The memory is used for storing calculations and data necessary to generate an image (e.g., the frame buffer). Unfortunately, when a GPU is in an idle state, for example while the system executes non-graphics-intensive applications such as spreadsheets, word processors, and email programs, a vast portion of its memory remains unused. Thus, a majority of the memory remains unused until graphics intensive applications (e.g., games or applications involving graphics rendering) are launched, if ever. Among other things, this unused memory consumes power, which is thus wasted during a GPU idle state.
Accordingly, what is needed is a system capable of utilizing graphics memory that would otherwise not be used for graphics processing. Embodiments of the present invention provide a system for accessing graphics memory as part of a computer system memory pool so that the memory can be used for general system application. Embodiments of the present invention allow for increased power efficiency by making use of graphics memory for general system use that would otherwise consume power while being unused.
In one embodiment, the present invention is implemented as a method for enabling access to graphics memory for general system use. The method includes detecting an idle state (e.g., low performance graphics mode) of a graphics processing unit (GPU). The GPU comprises memory (e.g., frame buffer) operable for storing graphics data. The method further includes determining an amount of available memory of the memory of the GPU and signaling an operating system. Memory data transfers (e.g., from virtual memory) are then received to store data into the graphics memory for general system application. Memory accesses to the memory of the GPU are translated into a suitable format and executed. When at some point a graphics intensive application is started by the user and the GPU needs the entire amount of its memory for graphics processing, the GPU can pass the application data stored via a bus where it is redirected to the system's hard drive.
In another embodiment, the present invention is implemented as a system for facilitating access to graphics memory for general system application. The system includes a performance mode monitor for determining a performance mode of a graphics processing unit (GPU) including graphics memory (e.g., frame buffer) and a memory available module for determining an amount of the graphics memory available (e.g., for use as part of a system memory pool). The system further includes a resource signaling module for signaling that the graphics memory is available and an interpretation module for interpreting memory access requests. The interpreting of memory access requests may include converting the requests from a system memory format to a format compatible with a memory access request of graphics memory.
In this manner, embodiments of the present invention facilitate increased system performance by adding graphics memory to the general system memory pool of a computing system. In one embodiment, graphics memory provides a much quicker alternative for data storage over virtual memory which requires disk accesses. Embodiments increase the value of a GPU card by increasing system memory and therefore make the increasingly advanced graphics card more desirable even during periods of non-intensive graphics. Embodiments further increase energy efficiency (e.g., of a mobile GPU) by making use of graphics memory that would otherwise not be used.
In another embodiment, the present invention is implemented as a graphics processing unit (GPU) subsystem. The GPU subsystem includes a graphics processor for executing graphics instructions and a frame buffer comprising memory for storing data used for execution of the graphics instructions. The GPU subsystem further includes a signaling module for signaling (e.g., a chipset, operating system, or the like) that a portion of the frame buffer is available for use by a computer system. A frame buffer access module is also available for facilitating access to the frame buffer memory by the computer system. The frame buffer may be accessible via a PCI Express bus in one instance.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements.
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of embodiments of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention.
Notation and Nomenclature:
Some portions of the detailed descriptions, which follow, are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing” or “accessing” or “executing” or “storing” or “rendering” or the like, refer to the action and processes of an integrated circuit (e.g., computing system 100 of
Computer System Platform:
The CPU 101 and the GPU 110 can also be integrated into a single integrated circuit die and the CPU and GPU may share various resources, such as instruction logic, buffers, functional units and so on, or separate resources may be provided for graphics and general-purpose operations. The GPU may further be integrated into a core logic component. Accordingly, any or all the circuits and/or functionality described herein as being associated with the GPU 110 can also be implemented in, and performed by, a suitably equipped CPU 101. Additionally, while embodiments herein may make reference to a GPU, it should be noted that the described circuits and/or functionality can also be implemented in other types of processors (e.g., general purpose or other special-purpose coprocessors) or within a CPU.
System 100 can be implemented as, for example, a desktop computer system or server computer system having a powerful general-purpose CPU 101 coupled to a dedicated graphics rendering GPU 110. In such an embodiment, components can be included that add peripheral buses, specialized audio/video components, IO devices, and the like. Similarly, system 100 can be implemented as a portable device (e.g., cellphone, PDA, etc.), direct broadcast satellite (DBS)/terrestrial set-top box or a set-top video game console device such as, for example, the Xbox®, available from Microsoft Corporation of Redmond, Wash., or the PlayStation3®, available from Sony Computer Entertainment Corporation of Tokyo, Japan. System 100 can also be implemented as a “system on a chip”, where the electronics (e.g., the components 101, 115, 110, 114, and the like) of a computing device are wholly contained within a single integrated circuit die. Examples include a hand-held instrument with a display, a car navigation system, a portable entertainment system, and the like.
Embodiments of the present invention facilitate increased system performance by adding graphics memory to the general system memory pool of a computing system. In one embodiment, graphics memory provides a much quicker alternative for data storage over virtual memory which requires disk accesses. Embodiments increase the value of a GPU card by increasing system memory and therefore make the increasingly advanced graphics card more desirable even during periods of non-intensive graphics. Embodiments further increase energy efficiency (e.g., of a mobile GPU) by making use of graphics memory that would otherwise not be used.
Performance mode monitor 202 determines a performance mode of the graphics processing unit (GPU) 212 to determine graphics memory usage. In one embodiment, performance mode monitor 202 receives a signal including the performance mode of GPU 212. The GPU subsystem 210 includes graphics memory 214 (e.g., frame buffer) operable to be used in performing graphics processing (e.g., rendering). In one embodiment, GPU 212 has a low or first performance mode corresponding to low intensity graphics application execution, which utilizes a small portion of graphics memory (e.g., less than 100 MB or less than 64 MB), and a high or second performance mode corresponding to high intensity graphics application execution (e.g., video games, graphical simulations, and the like), which utilizes a relatively large portion of graphics memory (e.g., 256 MB, 512 MB, or 1 GB, etc.). Performance mode monitor 202 may thus monitor and report whether GPU 212 is in a high performance mode, a low performance mode, or any performance mode therebetween.
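By way of illustration only, the reporting behavior of a performance mode monitor might be sketched as follows. The sketch assumes the mode is inferred from the amount of graphics memory in use, which is only one possibility; the description also contemplates the monitor simply receiving the mode as a signal. The names `PerformanceMode` and `classify_mode` and the two thresholds are hypothetical and are chosen only to echo the example figures given above.

```python
from enum import Enum

MB = 1024 * 1024


class PerformanceMode(Enum):
    LOW = "low"
    INTERMEDIATE = "intermediate"
    HIGH = "high"


# Hypothetical thresholds, loosely based on the example figures in the text
# (under 100 MB for the low mode, 256 MB or more for the high mode).
LOW_MODE_CEILING = 100 * MB
HIGH_MODE_FLOOR = 256 * MB


def classify_mode(graphics_bytes_in_use: int) -> PerformanceMode:
    """Report a performance mode from the graphics memory currently in use."""
    if graphics_bytes_in_use < 0:
        raise ValueError("memory in use cannot be negative")
    if graphics_bytes_in_use < LOW_MODE_CEILING:
        return PerformanceMode.LOW
    if graphics_bytes_in_use >= HIGH_MODE_FLOOR:
        return PerformanceMode.HIGH
    # Anything between the two thresholds is treated as an in-between mode.
    return PerformanceMode.INTERMEDIATE
```

A real monitor would more likely read the mode from the GPU or its driver; the thresholds here merely make the three reported states concrete.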
Memory available module 204 determines an amount of graphics memory available for use outside the graphics processor. More specifically, memory available module 204 determines the graphics memory space that is available during each performance mode of a GPU. For example, when GPU 212 is in a low performance mode, memory available module 204 determines the graphics memory space which is not being used for graphics processing and therefore may be available for general system use. In one embodiment, the amount of graphics memory that can be made available may be a configurable option (e.g., user configurable). For example, a graphics driver or other application may allow a user to select a portion of graphics memory 214 to be dedicated for use by a computing system for general application use.
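As a minimal sketch of the determination described above, the amount of graphics memory offered for general system use can be modeled as the unused portion of graphics memory, optionally limited by a user-configured cap. The function name `available_for_system` and its parameters are hypothetical, and the cap is an assumed interpretation of the configurable option.

```python
MB = 1024 * 1024


def available_for_system(total_bytes, graphics_in_use_bytes, user_cap_bytes=None):
    """Return the bytes of graphics memory that could be offered for general use.

    total_bytes:            size of the graphics memory
    graphics_in_use_bytes:  portion currently needed for graphics processing
    user_cap_bytes:         optional user-selected upper limit (assumption)
    """
    if graphics_in_use_bytes < 0 or graphics_in_use_bytes > total_bytes:
        raise ValueError("graphics usage must lie between 0 and the total size")
    unused = total_bytes - graphics_in_use_bytes
    if user_cap_bytes is None:
        return unused
    # Never offer more than is actually unused, and never a negative amount.
    return min(unused, max(user_cap_bytes, 0))
```

For instance, a 512 MB frame buffer with 64 MB in use for graphics would yield 448 MB under this model, or 256 MB if the user had capped the shared portion at 256 MB.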
Interpretation module 206 interprets memory access requests. More specifically, interpretation module 206 receives memory access requests (e.g., reads and writes) from a computer system and interprets them for accessing graphics memory 214. For example, interpretation module 206 may carry out an algorithm for receiving a memory access request (e.g., from OS 220) in a system DDR format and converting the request to a GDDR format to be carried out on the allocated graphics memory. By using allocated graphics memory, rather than virtual memory, for general system applications, the overall efficiency of the computer increases as fewer disk accesses are required. In one embodiment, graphics memory 214 of the GPU subsystem 210 is used to store data that was stored in virtual memory (e.g., on a hard disk drive) and interpretation module 206 may reroute virtual memory calls to graphics memory for processing.
In one embodiment, the memory access requests may be received via a PCI Express bus which facilitates efficient use of graphics memory 214. The memory access requests allow graphics memory 214 to store application data. The storage of application data in graphics memory 214 effectively increases the overall memory available to a computer system.
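The conversion performed by an interpretation module is not specified at the signal level here, so the following sketch models only one plausible part of it: remapping a system address that falls within a window backed by graphics memory to an offset in the frame buffer. The class `GraphicsWindow`, its fields, and the example addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphicsWindow:
    """A range of system addresses backed by allocated graphics memory."""
    system_base: int  # first system address mapped to graphics memory (assumed)
    fb_offset: int    # start of the allocated region inside the frame buffer
    size: int         # number of bytes in the allocated region

    def translate(self, system_addr: int, length: int) -> int:
        """Map a system-side access to a frame buffer offset, with bounds checks."""
        relative = system_addr - self.system_base
        if relative < 0 or length < 0 or relative + length > self.size:
            raise ValueError("access falls outside the allocated graphics memory")
        return self.fb_offset + relative
```

A 16-byte access at system address `0x100000010` against a window with `system_base=0x100000000` and `fb_offset=0x04000000` would resolve to frame buffer offset `0x04000010`; an access that runs past the end of the window is rejected rather than allowed to touch memory reserved for graphics.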
Resource signaling module 208 signals that a portion of graphics memory 214 is available (e.g., when a GPU is in a low performance mode). In one embodiment, resource signaling module 208 signals a chipset (e.g., memory controller or Northbridge) that graphics memory is available and therefore, the overall level of system memory has increased. Resource signaling module 208 may also signal OS 220 that the graphics memory is available for storing application data.
Resource signaling module 208 may further signal that graphics memory is no longer available (e.g., signal OS 220). For example, when a graphics intensive application is started, the GPU may attempt to use a significant portion of graphics memory for graphics processing operations. In these instances, the GPU (or some other device) passes the application data stored in the graphics memory (e.g., frame buffer) back to the motherboard through the PCI Express bus where the data is redirected to virtual memory (e.g., hard disk drive) of the computer system.
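The reclaim path described above can be sketched, under simplifying assumptions, as a store that first signals that graphics memory is no longer available and then moves its contents to a stand-in for virtual memory. The class `SharedGraphicsMemory`, the page-keyed dictionaries, and the `notify_os` callback are hypothetical modeling choices, not part of the described system.

```python
class SharedGraphicsMemory:
    """Toy model of graphics memory lent out for application data."""

    def __init__(self):
        self.pages = {}        # page id -> bytes held in graphics memory
        self.available = True  # whether the OS may still place data here

    def store(self, page_id, data: bytes) -> None:
        if not self.available:
            raise RuntimeError("graphics memory is not available for system use")
        self.pages[page_id] = data

    def reclaim(self, swap: dict, notify_os) -> int:
        """Give the memory back to the GPU; return the number of pages moved."""
        notify_os(False)          # signal that graphics memory is no longer available
        self.available = False
        moved = len(self.pages)
        swap.update(self.pages)   # redirect application data to virtual memory
        self.pages.clear()        # frame buffer is free for graphics processing again
        return moved
```

In this model, `swap` stands in for the hard disk drive backing virtual memory, and the callback stands in for whatever mechanism notifies the operating system.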
Frame buffer 308 comprises memory for storing data used for execution of the graphics instructions. Frame buffer 308 may include one or more memory modules (e.g., one or more memory chips). Frame buffer 308 may further be a variety of types of memory including, but not limited to, GDDR2, GDDR3, GDDR4 or GDDR5. In one embodiment, frame buffer 308 is accessible via a PCI Express bus (e.g., interface 318).
Graphics processor 310 executes graphics instructions (e.g., for graphics rendering). Graphics processor 310 may further coordinate with signaling module 312 and frame buffer access module 310 to allow chipset 316 to access portions of frame buffer 308 thereby utilizing frame buffer 308 as part of system memory. It is appreciated that graphics processor 310 and chipset 316 may be designed to facilitate portions of frame buffer 308 being available for chipset 316 to utilize as system memory. Embodiments thus allow chipset 316 to access system DDR (double data rate) memory and GDDR memory as a memory pool. In one embodiment, graphics processor 310 transfers application data stored in frame buffer 308 to virtual memory before entering a high performance graphics mode in which all graphics memory would be needed for the GPU.
Frame buffer access module 310 enables a computer system to access frame buffer 308 for general system use. Frame buffer access module 310 converts or interprets memory access requests received from chipset 316 or motherboard 314 to a format compatible with the memory of frame buffer 308. In one embodiment, frame buffer access module 310 is operable to handle requests for data that were previously stored in virtual memory.
Signaling module 312 signals that a portion of frame buffer 308 is available for use by a computer system (e.g., via chipset 316). In one embodiment, signaling module 312 is operable to signal a memory controller of chipset 316 that a portion of frame buffer 308 is available (e.g., for use as part of a system memory pool). In another embodiment, an operating system may be signaled.
Interface 318 facilitates communication with motherboard 314, chipset 316, and other portions of a computer system. In one embodiment, interface 318 is a PCI Express interface or bus. It is appreciated that interface 318 may be any high speed interface operable to couple GPU subsystem 302 to a computer system. By using the high speed bus of interface 318, motherboard 314 may access the large amounts of frame buffer memory available in frame buffer 308 on GPU subsystem 302 for general system use when the GPU subsystem 302 is in an idle or low performance graphics mode.
Performance mode monitor 320 may be implemented in hardware and operate in a substantially similar manner to performance mode monitor 202. Memory available module 322 may be implemented in hardware and operate in a substantially similar manner to memory available module 204.
With reference to
In block 402, an idle state of a graphics processing unit (GPU) is detected. As described, the idle state may be a low performance graphics mode where substantial portions of graphics memory are unused.
In block 404, an amount of available graphics memory of the GPU is determined in real-time. In one embodiment, the amount of available graphics memory may be configurable or predetermined via a graphical user interface (GUI). For example, a computer system may be configured such that a certain amount of graphics memory is dedicated to the computer system even if the computer system is running a graphics intensive application. It is appreciated that the GPU and operating system can toggle the amount of memory used.
In block 406, a chipset is signaled. More specifically, the chipset may be signaled with the amount of available graphics memory that can be allocated for general system use. In one embodiment, the chipset includes a memory controller and reports to the operating system the amount of available memory (e.g., the combined memory pool of graphics memory and system memory).
In block 408, an operating system is signaled to indicate that graphics memory is available for application data; the signal also includes the size of the graphics memory available for application data. In one embodiment, the operating system is signaled via a GPU driver. In another embodiment, the operating system is signaled via a chipset driver. Any of a number of well known methods can be used.
In block 410, a memory data transfer is received. The memory data transfer may be from a virtual memory storage (e.g., hard disk drive) or system memory to the graphics memory. It is appreciated that the memory data transfer may be modified for storage in graphics memory.
In block 412, memory accesses for the data stored in the memory of the GPU are translated to graphics memory. As described herein, the memory accesses are translated from OS memory calls or chipset memory accesses to a format suitable for accessing graphics memory. Where the graphics memory is being used to store data previously stored in virtual memory, cache calls to a hard drive may be rerouted to graphics memory.
In block 414, the memory accesses are executed. The application data stored in graphics memory is accessed (e.g., read/write/erase) and results are returned (e.g., via a PCI Express bus).
In block 416, a change to a high end graphics mode is detected. As described herein, the launch of a graphics intensive application may cause a GPU to switch to a high end or high performance graphics mode and therefore need to use substantial portions of graphics memory that may temporarily be in use for general purposes.
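One way to picture the rerouting described in block 412 is a lookup that serves a page from graphics memory when the page has been placed there and otherwise falls back to the hard drive. This is an illustrative model only; the function `read_page` and the dictionary-based stores are hypothetical.

```python
def read_page(page_id, graphics_pages: dict, disk_pages: dict):
    """Return (data, source), preferring graphics memory over the hard drive."""
    if page_id in graphics_pages:
        # The call that would have gone to disk is rerouted to graphics memory.
        return graphics_pages[page_id], "graphics"
    # Page was never moved to graphics memory; use the original disk path.
    return disk_pages[page_id], "disk"
```

With `graphics_pages = {7: b"hot"}` and `disk_pages = {7: b"stale", 8: b"cold"}`, a read of page 7 is served from graphics memory and a read of page 8 still goes to disk.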
In block 418, data stored in the graphics memory of the GPU is transferred. As described herein, the portions of the data stored in the memory of the GPU may be transferred to a virtual memory storage (e.g., hard disk drive) or to system memory thereby making the memory of the graphics card again available for use in graphics instruction execution. In block 420, graphics memory is utilized for graphics instruction execution.
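The sequence of blocks 402 through 420 can be summarized, purely as a sketch, by a small event-driven loop. The event names (`"idle"`, `"access"`, `"high"`), the function `run_flow`, and the log strings are hypothetical; the sketch assumes the shared amount is simply the total graphics memory less the amount used in the low performance mode.

```python
def run_flow(events, total_mb, low_mode_use_mb):
    """Walk through blocks 402-420 for a sequence of mode and access events."""
    log = []
    shared_mb = 0  # graphics memory currently lent out for general system use
    for event in events:
        if event == "idle" and shared_mb == 0:
            # Blocks 402-404: idle state detected, available memory determined.
            shared_mb = total_mb - low_mode_use_mb
            # Blocks 406-408: chipset and operating system are signaled.
            log.append(f"signal chipset and OS: {shared_mb} MB available")
        elif event == "access" and shared_mb > 0:
            # Blocks 410-414: transfer received, access translated and executed.
            log.append("translate and execute access")
        elif event == "high" and shared_mb > 0:
            # Blocks 416-418: high end mode detected, data moved out.
            log.append("transfer data to virtual memory")
            shared_mb = 0
            # Block 420: graphics memory is used for graphics instructions.
            log.append("graphics memory used for rendering")
    return log
```

Calling `run_flow(["idle", "access", "high", "access"], 512, 64)` produces four log entries; the final access is ignored in this model because no graphics memory is shared once the GPU has reclaimed it.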
The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.
| Number | Date | Country | |
|---|---|---|---|
| 20100149199 A1 | Jun 2010 | US |