The present invention relates generally to the field of computer graphics, and more particularly to utilization of the central processing unit (CPU) and main system random access memory (RAM) in lieu of a graphical processing unit (GPU) and video random access memory (VRAM) to efficiently render computer graphic overlays (e.g., pop-ups, menus, and cursors) with primary output to form a composite image that is presented on-demand to the frame buffer for display on a display device.
Computer graphics primary output (PO), such as graphics output for an application program, is often rendered by the GPU in VRAM. Graphic overlays (GOs), such as pop-ups, menus, and/or cursors, are instead often rendered by the CPU in RAM, and one or more GOs are then combined with a PO to form a composite image (CI) for output to the display device (the “CPU Method”). To derive a CI from both the PO and the GOs, however, the frame buffer (or, for some embodiments, its logical equivalent in the VRAM, the VRAM shadow memory (VRAMSM)) must be copied from the graphics card to RAM, where the CPU creates the CI from the PO and the GO(s); the CI is then copied from RAM back to the frame buffer for display. Because an Accelerated Graphics Port (AGP) favors a system-to-video flow of data traffic, copying graphics from the frame buffer to system memory is time consuming and resource intensive, and thereby effectively negates any gains from utilizing the GPU on the graphics card.
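By way of illustration only, the round trip performed by the CPU Method can be sketched as follows. This is a minimal sketch, assuming that images are modeled as lists of rows of integer pixel values; the names `composite`, `cpu_method`, and the `TRANSPARENT` sentinel are hypothetical and are not taken from any actual driver interface.

```python
from typing import List, Tuple

Pixel = int
Frame = List[List[Pixel]]

TRANSPARENT = -1  # hypothetical sentinel: a GO pixel that lets the PO show through


def composite(po: Frame, overlays: List[Tuple[int, int, Frame]]) -> Frame:
    """Return a new composite image (CI): the PO with each GO drawn on top."""
    ci = [row[:] for row in po]  # work on a copy so the PO itself is untouched
    for x0, y0, go in overlays:
        for dy, go_row in enumerate(go):
            for dx, pixel in enumerate(go_row):
                y, x = y0 + dy, x0 + dx
                in_bounds = 0 <= y < len(ci) and 0 <= x < len(ci[y])
                if in_bounds and pixel != TRANSPARENT:
                    ci[y][x] = pixel
    return ci


def cpu_method(frame_buffer: Frame, overlays: List[Tuple[int, int, Frame]]) -> int:
    """Model the CPU Method; return the number of pixels read back to RAM."""
    # Video-to-system copy: the direction the AGP does not favor.
    ram_copy = [row[:] for row in frame_buffer]
    pixels_read_back = sum(len(row) for row in ram_copy)
    # The CPU builds the CI in RAM from the copied PO and the GOs.
    ci = composite(ram_copy, overlays)
    # System-to-video copy: the CI is written back for display.
    for y, row in enumerate(ci):
        frame_buffer[y][:] = row
    return pixels_read_back
```

In this sketch the whole frame is read back before any compositing takes place, which is the costly step that the background above identifies.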
CIs can also be rendered by the GPU in video working memory (VWM) of VRAM that is separate and distinct from the frame buffer (and VRAMSM), and this method (the “GPU Method”) does not suffer from this AGP-related limitation. However, as widely known and well-understood by those of skill in the art, there are other gains to be had by using the CPU to render “complex graphics” (including GOs) in RAM instead of using the GPU to render graphics in the VRAM. Some of these gains are described in detail in the patent applications cited in the cross-reference section herein above. Therefore, it is generally not desirable to render CIs in VWM with the GPU.
In addition, both the CPU Method and the GPU Method suffer from a “last-write problem.” Specifically, after a CI is formed from a PO and GOs and is written back to the frame buffer for display using either method, there is no mechanism to guarantee that the frame buffer will not be further altered (for example, by a subsequent update made to the PO by an application) before the display device is updated based on the CI data written to the frame buffer. This last-write problem can cause a “flicker” effect, erroneous graphics output, or other negative graphical display results.
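The last-write problem can be illustrated with a deliberately simplified, single-threaded sketch. It assumes a one-row frame buffer modeled as a Python list, with 0 standing for a PO pixel and 9 for a cursor (GO) pixel; the function name `display_refresh` and all pixel values are hypothetical.

```python
def display_refresh(frame_buffer):
    """What the display shows: a snapshot of the frame buffer at refresh time."""
    return list(frame_buffer)


# One-row frame buffer for brevity; 0 = PO pixel, 9 = cursor (GO) pixel.
frame_buffer = [0, 0, 0, 0]

# 1. A CI (the PO plus a cursor at index 2) is written to the frame buffer.
composite_image = [0, 0, 9, 0]
frame_buffer[:] = composite_image

# 2. Before the next refresh, an application updates its PO directly in the
#    frame buffer, unaware of the overlay. This is the "last write".
frame_buffer[1:3] = [5, 5]

# 3. The display refreshes from whatever the frame buffer now holds.
shown = display_refresh(frame_buffer)
cursor_visible = 9 in shown  # the cursor pixel has been overwritten
```

Because nothing orders step 2 after the refresh, the displayed image may lack the overlay for one or more refresh cycles, which a user would perceive as flicker.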
What is needed in the art is an improved approach to rendering CI graphics on a display device without flickers or errors that can occur with legacy methodologies for combining POs and GOs into CIs and displaying them on a display device. The present invention addresses these shortcomings.
One embodiment of the present invention is a method for rendering a CI (comprising a PO and at least one GO) wherein the GPU and VRAMSM are bypassed altogether and the resulting displayed graphics are instead rendered in RAM by the CPU and copied directly to the frame buffer. This method not only avoids the data flow problems inherent to computer systems that favor system-to-video flow of data traffic (that is, computer systems that utilize an AGP) and avoids the “last-write” problem altogether, but also takes advantage of modern CPUs having increased computational speeds (that are orders of magnitude greater than the speeds of legacy processors) and supports complex graphics functions that are necessarily performed by the CPU (and not the GPU) to achieve significant performance gains.
The foregoing summary, as well as the following detailed description of preferred embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings exemplary constructions of the invention; however, the invention is not limited to the specific methods and instrumentalities disclosed. In the drawings:
The subject matter is described with specificity to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the term “step” may be used herein to connote different elements of methods employed, the term should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.
Computer Environment
Numerous embodiments of the present invention may execute on a computer.
As shown in
A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35, one or more application programs 36, other program modules 37 and program data 38. A user may enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or universal serial bus (USB). A monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor 47, personal computers typically include other peripheral output devices (not shown), such as speakers and printers. The exemplary system of
The personal computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in
When used in a LAN networking environment, the personal computer 20 is connected to the LAN 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
While it is envisioned that numerous embodiments of the present invention are particularly well-suited for computerized systems, nothing in this document is intended to limit the invention to such embodiments. On the contrary, as used herein the term “computer system” is intended to encompass any and all devices capable of storing and processing information and/or capable of using the stored information to control the behavior or execution of the device itself, regardless of whether such devices are electronic, mechanical, logical, or virtual in nature.
Graphics Processing Subsystems
The CPU 21′ is connected to an AGP 230. The AGP provides a point-to-point connection between the CPU 21′, the system memory RAM 25′, and graphics card 240, and further connects these three components to other input/output (I/O) devices 232—such as a hard disk drive 32, magnetic disk drive 34, network 53, and/or peripheral devices illustrated in
The graphics card 240 further comprises a frame buffer 246 which is directly connected to the display device 47′. As is well known and appreciated by those of skill in the art, the frame buffer is typically dual-ported memory that allows a processor (the GPU 242 or the CPU 21′, as the case may be) to write a new (or revised) image to the frame buffer while the display device 47′ is simultaneously reading from the frame buffer to refresh (or “update”) the current display content. The graphics card 240 further comprises a GPU 242 and VRAM 244.
The GPU 242 is essentially a second processing unit in the computer system that has been specifically optimized for graphics operations. Depending on the graphics card, the GPU 242 may be either a graphics coprocessor or a graphics accelerator. When the graphics card is a graphics coprocessor, the video driver 224 sends graphics-related tasks directly to the graphics coprocessor for execution, and the graphics coprocessor alone renders graphics for the frame buffer 246 (without direct involvement of the CPU 21′). On the other hand, when the graphics card is a graphics accelerator, the video driver 224 sends graphics-related tasks to the CPU 21′ and the CPU 21′ then directs the graphics accelerator to perform specific graphics-intensive tasks. For example, the CPU 21′ might direct the graphics accelerator to draw a polygon with defined vertices, and the graphics accelerator would then execute the tasks of writing the pixels of the polygon into video memory (the VRAMSM 248) and, from there, copying the updated graphic to the frame buffer 246 for display on the display device 47′.
Accompanying the GPU 242 is VRAM 244 that enables the GPU to maintain its own shadow memory (the VRAMSM) close at hand for speedy memory calls (instead of using RAM), and that may also provide additional memory (e.g., VWM) necessary for additional processing operations such as the GPU Method. The VRAM 244 further comprises a VRAMSM 248 and VWM 249. The VRAMSM 248 is the location in VRAM 244 where the GPU 242 constructs and revises graphic images (including CIs in the GPU Method), and it is the location from which the GPU 242 copies rendered graphic images to the frame buffer 246 of the graphics card 240 to update the display device 47′. In the GPU Method, the VWM is an additional area of VRAM that is used by the GPU 242 to temporarily store graphics data, for example to store GOs and/or to store and restore POs (or portions thereof), among other things. (By offloading this functionality to the graphics card 240, the CPU 21′ and VSM 222 are freed from these tasks.)
The system memory RAM 25′ may comprise the operating system 35′, a video driver 224, video memory surfaces (VMSs) 223, and video shadow memory (VSM) 222. The VSM is the location in RAM 25′ where the CPU 21′ constructs and revises graphic images (including CIs in the CPU Method) and from where the CPU 21′ copies rendered graphic images to the frame buffer 246 of the graphics card 240 via the AGP 230. In the CPU Method, the VMSs are additional areas of RAM that are used by the CPU 21′ to temporarily store graphics data that might be used by the CPU 21′ to store GOs and/or store/restore POs (or portions thereof) among other things.
As illustrated in
The Direct Render Method
The method illustrated in
To address these shortcomings, the present invention employs a two-part general method comprising the steps illustrated in the flowchart of
In regard to the first step, “neutralizing” refers to placing the GPU 242 and the VRAM 244 in any state in which they are no longer receiving display data and/or writing display data to the frame buffer 246, and “isolating” the frame buffer refers to preventing anything but the CPU, as the “manager,” from writing data to the frame buffer. This step can be accomplished by a number of means; for example, the operating system 35′ might simply prevent any applications, drivers, etc. from communicating directly with the GPU or writing data to VRAM, redirect all graphics calls to the CPU and its “manage” process, and also prevent applications from circumventing the CPU's “manage” process for writing data to the frame buffer.
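One possible way to picture the “isolating” element is a frame buffer model that accepts writes from exactly one owner. The following is a minimal sketch under that assumption only; the classes `IsolatedFrameBuffer` and `Manager` and the key-based ownership check are hypothetical illustrations and do not describe how any particular operating system enforces isolation.

```python
class IsolatedFrameBuffer:
    """Frame buffer model that accepts writes from exactly one owner."""

    def __init__(self, size: int):
        self._pixels = [0] * size
        self._owner_key = None

    def claim(self) -> object:
        """Hand out the single write key; any later claim is refused."""
        if self._owner_key is not None:
            raise PermissionError("frame buffer already has a manager")
        self._owner_key = object()
        return self._owner_key

    def write(self, key: object, offset: int, data: list) -> None:
        """Write pixels, but only for the holder of the owner key."""
        if self._owner_key is None or key is not self._owner_key:
            raise PermissionError("only the manager may write to the frame buffer")
        if offset < 0 or offset + len(data) > len(self._pixels):
            raise ValueError("write falls outside the frame buffer")
        self._pixels[offset:offset + len(data)] = data

    def read(self) -> list:
        """Return a snapshot, as the display device would read it."""
        return list(self._pixels)


class Manager:
    """Hypothetical CPU-side manager: the sole holder of the write key."""

    def __init__(self, frame_buffer: IsolatedFrameBuffer):
        self._fb = frame_buffer
        self._key = frame_buffer.claim()

    def draw(self, offset: int, data: list) -> None:
        # All redirected graphics calls end up here.
        self._fb.write(self._key, offset, data)
```

In this sketch an application that attempts to write directly, bypassing the manager, receives a `PermissionError` and the frame buffer contents are left unchanged.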
In regard to the second step, using the CPU 21′ and the RAM 25′ alone to “manage” the process essentially equates to having the CPU, utilizing a single process or a coordinated series of processes (the “manager”), uniformly manage all graphics display data: storing POs and GOs in RAM, rendering CIs in RAM, writing POs and CIs to the frame buffer as appropriate and only as needed (which is the on-demand feature), and resolving conflicting requests for the graphics-based services the CPU provides.
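The on-demand behavior of such a manager can be sketched as follows. This is an illustrative sketch only, assuming a one-row display modeled as a list of integers and a simple “dirty” flag to decide when a write to the frame buffer is needed; the class `GraphicsManager`, its method names, and the dirty-flag policy are hypothetical choices, not a statement of how any embodiment must be implemented.

```python
from typing import Dict, List, Tuple


def _blit(target: List[int], offset: int, pixels: List[int]) -> None:
    """Copy pixels into target at offset, clipping anything out of range."""
    for i, pixel in enumerate(pixels):
        if 0 <= offset + i < len(target):
            target[offset + i] = pixel


class GraphicsManager:
    """Hypothetical single 'manager' holding all display data in RAM."""

    def __init__(self, width: int):
        self.po: List[int] = [0] * width                  # primary output in RAM
        self.gos: Dict[str, Tuple[int, List[int]]] = {}   # name -> (offset, pixels)
        self.frame_buffer: List[int] = [0] * width        # stand-in for the frame buffer
        self.writes = 0                                   # frame buffer writes performed
        self._dirty = False

    def update_po(self, offset: int, pixels: List[int]) -> None:
        _blit(self.po, offset, pixels)
        self._dirty = True

    def set_go(self, name: str, offset: int, pixels: List[int]) -> None:
        self.gos[name] = (offset, list(pixels))
        self._dirty = True

    def remove_go(self, name: str) -> None:
        if self.gos.pop(name, None) is not None:
            self._dirty = True

    def present(self) -> bool:
        """Render the CI in RAM and write it out only if something changed."""
        if not self._dirty:
            return False
        ci = list(self.po)             # the PO itself is never overwritten by GOs
        for offset, pixels in self.gos.values():
            _blit(ci, offset, pixels)
        self.frame_buffer[:] = ci      # single system-to-video write
        self.writes += 1
        self._dirty = False
        return True
```

Because the PO is kept intact in RAM, removing an overlay in this sketch restores the underlying pixels without any read-back from the frame buffer, and a call to `present` with no pending changes performs no write at all.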
One embodiment of the present invention to address the aforementioned shortcomings using this general methodology is illustrated in
The various systems, methods, and techniques described herein may be implemented with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. In the case of program code execution on programmable computers, the computer will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs are preferably implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and may be combined with hardware implementations.
The methods and apparatus of the present invention may also be embodied in the form of program code that is transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via any other form of transmission, wherein, when the program code is received and loaded into and executed by a machine, such as an EPROM, a gate array, a programmable logic device (PLD), a client computer, a video recorder or the like, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates to perform the indexing functionality of the present invention.
While the present invention has been described in connection with the preferred embodiments of the various figures, it is to be understood that other similar embodiments may be used or modifications and additions may be made to the described embodiment for performing the same function of the present invention without deviating therefrom. For example, while exemplary embodiments of the invention are described in the context of digital devices emulating the functionality of personal computers, one skilled in the art will recognize that the present invention is not limited to such digital devices, and that the methods described in the present application may apply to any number of existing or emerging computing devices or environments, such as a gaming console, handheld computer, portable computer, etc., whether wired or wireless, and may be applied to any number of such computing devices connected via a communications network and interacting across the network. Furthermore, it should be emphasized that a variety of computer platforms, including handheld device operating systems and other application specific hardware/software interface systems, are herein contemplated, especially as the number of wireless networked devices continues to proliferate. Therefore, the present invention should not be limited to any single embodiment, but rather construed in breadth and scope in accordance with the appended claims.
This application is a continuation-in-part of U.S. patent application Ser. No. 10/622,597 (Atty. Docket No. MSFT-1794), filed on Jul. 18, 2003, entitled “SYSTEMS AND METHODS FOR EFFICIENTLY UPDATING COMPLEX GRAPHICS IN A COMPUTER SYSTEM BY BY-PASSING THE GRAPHICAL PROCESSING UNIT AND RENDERING GRAPHICS IN MAIN MEMORY,” the entire contents of which are hereby incorporated herein by reference. This application is related by subject matter to the inventions disclosed in the following commonly assigned applications, the entire contents of which are hereby incorporated herein by reference: U.S. patent application Ser. No. 10/622,749 (Atty. Docket No. MSFT-1786), filed on Jul. 18, 2003, entitled “SYSTEMS AND METHODS FOR UPDATING A FRAME BUFFER BASED ON ARBITRARY GRAPHICS CALLS”; and U.S. patent application Ser. No. 10/623,220 (Atty. Docket No. MSFT-1787), filed on Jul. 18, 2003, entitled “SYSTEMS AND METHODS FOR EFFICIENTLY DISPLAYING GRAPHICS ON A DISPLAY DEVICE REGARDLESS OF PHYSICAL ORIENTATION.”
Relation | Number | Date | Country |
---|---|---|---|
Parent | 10622597 | Jul 2003 | US |
Child | 10778724 | Feb 2004 | US |