In current graphics processing systems, the number and processing speed of memory clients have increased enough to make memory access latency a barrier to achieving high performance. In some instances, various memory clients share a common memory, and each memory client issues requests for data stored in the common memory based on individual memory access requirements. Requests from these memory clients are typically serialized through a common interface. As a result, non-critical requests are sometimes queued for servicing ahead of a critical request, where “critical request” refers to a request that should be serviced promptly to prevent underflow of an output data stream or overflow of an input data stream, in which underflow or overflow can sometimes lead to screen corruption or other errors. In other instances, a request that was non-critical when issued subsequently becomes critical while still queued behind other non-critical requests. In either case, the critical request may have to wait for all prior non-critical requests to be serviced first, which increases its service time and can cause screen corruption.
Since the individual requests from the display engine 114 only include specific information about the data being retrieved such as address, size and width of the data, the memory controller 112 must have additional information about the memory it is accessing in order to effectively access the requested data. Additionally, the display buffer 116 must be big enough to store all of the data being requested by the display engine 114. Since the memory controller 112 simply processes individual requests as they are queued up, the display buffer 116 must be sufficiently large to store sufficient data in the event that changes occur and the display engine 114 cannot process the retrieved data as fast as it receives the data. Conversely, there may be delays in retrieving data, and display engine 114 must buffer enough data in display buffer 116 that it does not run out of data while waiting for data to be retrieved.
For example, when the display engine 114 sends an individual request for retrieving specific data to the memory controller 112, the memory controller 112 queues the individual requests, processes the requests sequentially and retrieves the specific requested data sequentially. Once the memory controller 112 queues the request, the request will not be processed until its turn arrives regardless of the criticality of the request. The retrieved data is then sequentially transmitted back to the display engine 114, regardless of whether the display engine 114 is ready to process the data or not. The display engine 114 must have a sufficiently large display buffer 116 to store all of this requested data because the display engine 114 may not be ready to process the data as fast as it receives the data. There are many reasons why the display engine 114 may not be able to process data as fast as it receives the data, including changed conditions, which could require processing data in a different order than the data was received. In such a situation, the earlier requested data would have to be stored until the later requested data can be processed. This has the disadvantage of requiring a larger display buffer 116 and more processing power to process this data in a different order than the data was received.
Therefore what is needed is a system and method for controlling the retrieval of data from memory that can accommodate changes in data requests without having to store large amounts of data in display buffers.
According to embodiments of the invention, methods are provided that retrieve data according to contracts having sets of instructions. A process engine, such as a display engine, retrieves data from memory by sending contracts, or sets of instructions, to the memory controller. The memory controller then retrieves data from the memory according to the set of instructions and transmits that data to the display engine for the display engine to process and display. Moreover, isochronous memory clients in the display engines can set up a single contract for large blocks of data (e.g., for an entire frame) with the memory controller, which can pre-fetch data, thus reducing the number of critical memory requests.
In one embodiment of the present invention, a method of retrieving data stored in a memory includes generating a contract in a process engine, where the contract includes instructions for multiple memory fetches, transferring the contract from the process engine to a memory controller, performing a plurality of memory access operations to fetch data from the memory according to the contract, sending the fetched data to the process engine according to the contract, and processing the fetched data using the process engine to generate pixels of an image for display. The process engine can be a display engine.
In another embodiment of the present invention, generating the contract includes generating a set of instructions for retrieving data from the memory.
In yet another embodiment of the present invention, generating a set of instructions for retrieving data comprises generating a set of instructions for each surface. The set of instructions for each surface can be generated by providing a base address for data corresponding to each surface, providing a starting offset for retrieving a portion of the data corresponding to each surface, providing a width and height of the surface from which to retrieve data, providing a relative scan-out positioning, and providing scaling.
In yet another embodiment of the present invention, generating a contract includes specifying in the contract a pixel rate for each head.
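The per-surface instruction sets and per-head pixel rates described above can be pictured as a simple record. The following is only an illustrative sketch: the class names, field names, units, and sample values are assumptions made for this example and are not specified in the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SurfaceInstructions:
    """Hypothetical per-surface instruction set; all field names are illustrative."""
    base_address: int                   # base address of the surface data in memory
    start_offset: int                   # offset at which retrieval of the surface begins
    width: int                          # surface width, assumed here to be in pixels
    height: int                         # surface height, assumed here to be in lines
    scanout_position: Tuple[int, int]   # relative scan-out position (x, y)
    scale: float = 1.0                  # scaling applied to the surface


@dataclass
class Contract:
    """Hypothetical contract: one instruction set per surface plus a pixel rate per head."""
    surfaces: List[SurfaceInstructions] = field(default_factory=list)
    pixel_rate_hz: Dict[int, float] = field(default_factory=dict)  # head id -> pixel rate

    def total_pixels(self) -> int:
        # Number of source pixels covered by all surfaces named in the contract.
        return sum(s.width * s.height for s in self.surfaces)


# Example values are made up purely for illustration.
contract = Contract(
    surfaces=[
        SurfaceInstructions(0x10000000, 0, 1920, 1080, (0, 0)),
        SurfaceInstructions(0x20000000, 256, 64, 64, (100, 200), scale=2.0),
    ],
    pixel_rate_hz={0: 148.5e6},
)
print(contract.total_pixels())
```

A single record of this kind stands in for the many individual address/size requests that would otherwise be queued one at a time.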
In yet another embodiment of the present invention, the fetched data is sent to the display engine in an isochronous stream.
In yet another embodiment of the present invention, generating a set of instructions for retrieving data comprises providing information as to an isochronous bandwidth. In one embodiment the isochronous bandwidth is greater than 8 GB/s.
In yet another embodiment of the present invention, the fetched data is buffered in the display engine.
In yet another embodiment of the present invention, retrieving data stored includes generating a second contract in a second process engine, where the second contract includes instructions for multiple memory fetches, transferring the second contract from the second process engine to a memory controller, performing a plurality of memory access operations to fetch a second data from the memory according to the second contract, sending the fetched second data to the second process engine according to the second contract, and processing the fetched second data using the second process engine to generate pixels of an image for display. The second contract can be transferred after the first contract is transferred and after one sweep is performed.
In yet another embodiment of the present invention, retrieving data stored includes generating a contract amendment in response to a change in the display engine, transferring the contract amendment to the memory controller, determining whether the contract amendment is processible, and fetching data from the memory controller according to the contract incorporating the contract amendment in the event that the contract amendment can be processed. The decision can be based on whether the memory controller has sufficient time to incorporate the contract amendment.
In yet another embodiment of the present invention, a method of retrieving data stored in a memory includes generating a first contract in a display engine, where the first contract includes instructions for multiple memory fetches, transferring the first contract from the display engine to a memory controller, performing a plurality of memory access operations to fetch a first data from the memory according to the first contract, sending the fetched first data to the display engine according to the first contract, processing the fetched first data using the display engine to generate pixels of an image for display, generating a second contract in the display engine, where the second contract includes instructions for multiple memory fetches, transferring the second contract from the display engine to a memory controller, performing a plurality of memory access operations to fetch a second data from the memory according to the second contract, sending the fetched second data to the display engine according to the second contract, and processing the fetched second data using the display engine to generate pixels of an image for display.
In yet another embodiment of the present invention, retrieving data stored includes generating a contract amendment in response to changes in the display engine, transferring the contract amendment to the memory controller, making a decision whether the contract amendment can be processed along with the first contract, and fetching data from the memory controller according to the first contract incorporating the contract amendment if the decision is that the contract amendment can be processed along with the first contract. The decision can be based on whether the memory controller has sufficient time to incorporate the contract amendment into the first contract.
In yet another embodiment of the present invention, retrieving data stored includes, in the event that the contract amendment cannot be processed with the first contract, incorporating the contract amendment into the second contract, and fetching data according to the second contract as amended. The first data and the second data can be sent to the display engine in an isochronous stream.
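The amendment handling described in the preceding embodiments (incorporate the amendment into the first contract if there is sufficient time, otherwise into the second contract) can be sketched as follows. This is a minimal illustration under stated assumptions: contracts and amendments are modeled as plain dictionaries of parameters, and the function name and the two timing arguments are hypothetical, since the description does not say how the memory controller estimates the available time.

```python
from typing import Dict, Tuple

Params = Dict[str, int]


def route_amendment(
    first: Params,
    second: Params,
    amendment: Params,
    time_remaining_us: float,
    time_needed_us: float,
) -> Tuple[Params, Params, str]:
    """Return (first, second, target) after deciding where the amendment goes.

    The timing comparison is an assumed stand-in for the "sufficient time" decision.
    """
    if time_remaining_us >= time_needed_us:
        # Enough time: the amendment overrides parameters of the first contract.
        return {**first, **amendment}, dict(second), "first"
    # Not enough time: the amendment is folded into the second contract instead.
    return dict(first), {**second, **amendment}, "second"


first = {"buffer_base": 0x1000, "height": 1080}
second = {"buffer_base": 0x1000, "height": 1080}
amendment = {"buffer_base": 0x8000}

_, _, target = route_amendment(first, second, amendment, 50.0, 20.0)
print(target)  # first
```

The inputs are copied rather than modified in place, so a rejected amendment leaves the first contract exactly as it was sent.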
In yet another embodiment of the present invention, a system for retrieving data stored in a memory includes a process engine configured to generate a contract, the contract comprising instructions for multiple memory fetches, a memory controller coupled to the process engine, wherein the memory controller is configured to receive the contract from the process engine, wherein the memory controller is configured to process the contract by performing a plurality of memory access operations to fetch data from the memory according to the contract, wherein the memory controller is configured to send the fetched data to the process engine according to the contract, and wherein the process engine is configured to process the fetched data to generate pixels of an image for display. The process engine can be a display engine.
In yet another embodiment of the present invention, the process engine is further configured to generate an amendment to the contract.
In another embodiment of the present invention, a processing apparatus for displaying an image includes a memory request generator configured to generate contracts specifying data ranges for respective presentation elements. The memory request generator is configured to assign priorities to the contracts based on a presentation order of the presentation elements. The processing apparatus also includes a memory request arbiter connected to the memory request generator. The memory request arbiter is configured to issue the contracts based on the priorities assigned to the contracts.
In yet another embodiment, the processing apparatus includes a memory request arbiter configured to receive a first contract specifying data for a first presentation element and a second contract specifying data for a second presentation element. The memory request arbiter is configured to arbitrate between the first contract and the second contract based on a presentation order of the first presentation element and the second presentation element.
Other aspects and embodiments of the invention are also contemplated. The foregoing summary and the following detailed description are not meant to restrict the invention to any particular embodiment but are merely meant to describe some embodiments of the invention.
Embodiments of the present invention use contracts to retrieve ranges of data stored in memory. Contracts used to request ranges of data include sets of instructions which instruct the memory controller what data to retrieve and in some cases how to retrieve that data, as is further described with reference to the figures below. Additionally, amendments can be used to amend or modify contracts after the contracts have been sent to the memory controller. As will be discussed in more detail below, isochronous memory clients (e.g. in a display engine of a graphics processor) can set up a contract for an entire frame of data with the memory controller, which can pre-fetch data. This ability to pre-fetch data based on information in a contract reduces the number of critical memory requests sent by the display engine to the memory controller. Sending a contract for the entire frame also improves the fetching order of data across different isochronous streams that are composed on the same frame. Additionally, by using a contract that predetermines the fetch order, the amount of buffering space required within the isochronous engine can be reduced. It should also be noted that by using contracts to retrieve data, instead of individual requests, data can be retrieved more intelligently than by sequentially processing individual data requests that have been queued up in the memory controller. For example, a contract can be amended to shift processes such as scaling pixels, manipulating pixels and combining pixels from the display engine to the memory controller, whereas individual data requests are simply executed by the memory controller and data is transmitted to the display buffer for processing.
Graphics processing subsystem 212 includes a graphics processing unit (GPU) 222 and a graphics memory 224, which may be implemented, e.g., using one or more integrated circuit devices such as programmable processors, application specific integrated circuits (ASICs), and memory devices. GPU 222 may be configured to perform various tasks related to generating pixel data from graphics data supplied by CPU 202 and/or system memory 204 via memory bridge 205 and bus 213, interacting with graphics memory 224 to store and update pixel data, and the like. For example, GPU 222 may generate pixel data from 2-D or 3-D scene data provided by various programs executing on CPU 202. GPU 222 may also store pixel data received via memory bridge 205 to graphics memory 224 with or without further processing. GPU 222 also includes a display engine configured to deliver pixel data from graphics memory 224 to display device 210. The display engine is an isochronous processing engine that obtains pixel data from graphics memory 224 using contracts, as described below.
CPU 202 operates as the master processor of system 200, controlling and coordinating operations of other system components. In particular, CPU 202 issues commands that control the operation of GPU 222. In some embodiments, CPU 202 writes a stream of commands for GPU 222 to a command buffer, which may be in system memory 204, graphics memory 224, or another storage location accessible to both CPU 202 and GPU 222. GPU 222 reads the command stream from the command buffer and executes commands asynchronously with operation of CPU 202. The commands may include conventional rendering commands for generating images as well as general-purpose computation commands that enable applications executing on CPU 202 to leverage the computational power of GPU 222 for data processing that may be unrelated to image generation.
It will be appreciated that the system shown herein is illustrative and that variations and modifications are possible. The bus topology, including the number and arrangement of bridges, may be modified as desired. For instance, in some embodiments, system memory 204 is connected to CPU 202 directly rather than through a bridge, and other devices communicate with system memory 204 via memory bridge 205 and CPU 202. In other alternative topologies, graphics subsystem 212 is connected to I/O bridge 207 rather than to memory bridge 205. In still other embodiments, I/O bridge 207 and memory bridge 205 might be integrated into a single chip. The particular components shown herein are optional; for instance, any number of add-in cards or peripheral devices might be supported. In some embodiments, switch 216 is eliminated, and network adapter 218 and add-in cards 220, 221 connect directly to I/O bridge 207.
The connection of GPU 222 to the rest of system 200 may also be varied. In some embodiments, graphics subsystem 212 is implemented as an add-in card that can be inserted into an expansion slot of system 200. In other embodiments, a GPU is integrated on a single chip with a bus bridge, such as memory bridge 205 or I/O bridge 207.
A GPU may be provided with any amount of local graphics memory, including no local memory, and may use local memory and system memory in any combination. For instance, in a unified memory architecture (UMA) embodiment, no dedicated graphics memory device is provided, and the GPU uses system memory exclusively or almost exclusively. In UMA embodiments, the GPU may be integrated into a bus bridge chip or provided as a discrete chip with a high-speed bus (e.g., PCI-E) connecting the GPU to the bridge chip and system memory.
It is also to be understood that any number of GPUs may be included in a system, e.g., by including multiple GPUs on a single graphics card or by connecting multiple graphics cards to bus 213. Multiple GPUs may be operated in parallel to generate images for the same display device or for different display devices.
In addition, GPUs embodying aspects of the present invention may be incorporated into a variety of devices, including general purpose computer systems, video game consoles and other special purpose computer systems, DVD players, handheld devices such as mobile phones or personal digital assistants, and so on.
The contract 400 is generated and sent out soon thereafter, if not immediately. The delay between when a contract is generated and when it is sent out can be, for example, several clock cycles or as little as one clock cycle. In one embodiment, contracts can be sent out anywhere from several times per sweep to once every several sweeps. For example, in one embodiment the contract is sent out once every sweep (i.e., once for each scan across the entire screen). If the sweep rate is between 60 Hz and 90 Hz, then the contract would be sent out about once every 0.017 seconds to about once every 0.011 seconds. Moreover, the contract can be sent out prior to the scan reaching the end of the last scan line. For example, a contract might be sent out five or six lines before the last portion of data is sent back from the memory. In another embodiment, if the contract is sent out at the frame rate of 30 Hz, then a contract would be sent out about once every 0.033 seconds. In one embodiment, operation of the display engine begins by first sending a contract and then starting the raster scan of the display device. Since the exact time or point within the sweep at which the contract is sent out can be programmable, those skilled in the art will realize that this time or point can be varied.
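The intervals quoted above follow directly from the reciprocal of the rate at which contracts are sent. The short calculation below reproduces them; the function name and the `sweeps_per_contract` parameter are assumptions introduced only for this illustration.

```python
def contract_interval_seconds(rate_hz: float, sweeps_per_contract: float = 1.0) -> float:
    """Time between contracts when one contract is sent every `sweeps_per_contract` sweeps."""
    if rate_hz <= 0 or sweeps_per_contract <= 0:
        raise ValueError("rate and sweeps per contract must be positive")
    return sweeps_per_contract / rate_hz


for rate in (90.0, 60.0, 30.0):
    print(f"{rate:.0f} Hz -> {contract_interval_seconds(rate):.3f} s")
# 90 Hz -> 0.011 s
# 60 Hz -> 0.017 s
# 30 Hz -> 0.033 s
```

Sending a contract several times per sweep, or once every several sweeps, corresponds to a `sweeps_per_contract` value below or above 1, respectively.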
In another embodiment, a warning signal indicating that a contract will be sent is transmitted prior to sending the contract. An advantage of signaling that a contract is imminent is that the system can adjust for the contract. In one embodiment, the warning is given by setting a bit. Warning signals can be used in VGA mode, which can include a split screen mode. In VGA mode, data can be scanned out from two different buffers for two halves of a screen. For example, when the warning signal, which can be interpreted as an instruction to wait for an amendment, is set in the contract, the isohub does not start on the contract until it receives the amendment, which is meant for a specified line on the screen. Once the amendment is received, the isohub fetches data from the first buffer for the top of the screen and then from the next buffer for the bottom of the screen.
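The split-screen behavior can be pictured with a small planning function: while the wait bit is set and no amendment has arrived, nothing is fetched; once the amendment supplies the split line, the top of the screen is fetched from the first buffer and the bottom from the second. This is only a sketch; the function name, the buffer labels, and the representation of the amendment as a single split line number are assumptions made for this example.

```python
from typing import List, Optional, Tuple

FetchRange = Tuple[str, int, int]  # (buffer label, first line, last line exclusive)


def plan_split_screen_fetch(
    total_lines: int,
    wait_for_amendment: bool,
    split_line: Optional[int],
) -> List[FetchRange]:
    """Return the line ranges to fetch, or an empty plan while still waiting."""
    if split_line is None:
        if wait_for_amendment:
            return []  # warning bit set and no amendment yet: do not start
        return [("first", 0, total_lines)]  # no split requested: whole screen
    if not 0 <= split_line <= total_lines:
        raise ValueError("split line must lie within the screen")
    plan: List[FetchRange] = []
    if split_line > 0:
        plan.append(("first", 0, split_line))             # top of the screen
    if split_line < total_lines:
        plan.append(("second", split_line, total_lines))  # bottom of the screen
    return plan


print(plan_split_screen_fetch(480, True, None))  # []
print(plan_split_screen_fetch(480, True, 240))   # [('first', 0, 240), ('second', 240, 480)]
```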
In some embodiments, contracts are used to coordinate requests from multiple memory clients within the display engine. Multiple clients might be used to generate composite images on the fly.
The computer 602 includes a Central Processing Unit (“CPU”) 608, which is connected to a memory 610. The memory 610 can include, for example, a Random Access Memory (“RAM”) and/or a Read Only Memory (“ROM”). As illustrated in
In the illustrated embodiment, the processing apparatus 612 includes a display engine 614, which includes memory clients 616, 618, and 620. While three memory clients are illustrated in
In the illustrated embodiment, the processing apparatus 612 also includes a memory controller 622, which is connected to the display engine 614 via a memory request generator 624 and a memory request arbiter 626. The memory controller 622 serves as an interface between the memory clients 616, 618, and 620 and the memory 610. In the illustrated embodiment, the memory request generator 624 creates contracts in response to the memory clients 616, 618, 620. The memory request arbiter 626 then issues contracts for the memory request generator 624. In response to these contracts, the memory controller 622 retrieves data from the memory 610 for the memory clients 616, 618, and 620. The operation of the memory request generator 624 and the memory request arbiter 626 is further described below.
As illustrated in
As illustrated in
In the illustrated embodiment, the memory request generator 624 identifies a presentation order of the components of the composite image and provides an indication of this presentation order to the memory request arbiter 626 through a contract. In some implementations, the components of the composite image include a presentation order, and the memory request generator 624 identifies through a contract a presentation order of these components of the composite image based on, for example, screen locations of these components of the composite image. Using an arbiter to arbitrate between memory requests can be implemented as discussed, for example, in the co-pending and co-owned U.S. patent application Ser. No. 10/961,574, filed on Oct. 8, 2004, titled “Apparatus, System, and Method for Arbitrating Between Memory Requests,” which application is hereby incorporated by reference in its entirety for all purposes.
As illustrated in
Advantageously, the illustrated embodiment allows contracts to be properly prioritized for servicing based on a presentation order of components of the composite image. As described above, at least one of the memory clients 616, 618, and 620 can correspond to an isochronous memory client. By servicing the memory requests based on the presentation order of the components of the composite image, the illustrated embodiment allows timely delivery of data to respective ones of the memory clients 616, 618, and 620 as the components of the composite image are generated, thus avoiding a stall and degradation of a display or an audio output. Accordingly, the illustrated embodiment serves to reduce instances in which a memory request becomes critical, since such a memory request can be prioritized for servicing ahead of other memory requests. In the event a memory request within a contract does become critical, the illustrated embodiment serves to reduce the service time for such a critical memory request, since such a critical memory request will typically be prioritized for servicing ahead of other memory requests. The service time is reduced by generating an amendment in the memory request generator 624 and sending that amendment to arbiter 626 and eventually to memory controller 622 for incorporation into previously sent contracts. Amendments include changes to previously sent contracts and will be incorporated if received in time as was discussed earlier with reference to
Although computer system 600 has been described in terms of displaying components of a composite image, computer system 600 can also be used to generate contracts for presentation elements and to display those presentation elements. A presentation element is a collection of pixels displayed along a scan line and can be a portion of an image that is being displayed at any given time.
In
In displaying the presentation elements of
Next in step 812, the memory request generator 624 assigns priorities to the contracts based on a presentation order of the presentation elements. In the illustrated embodiment, the memory request generator 624 identifies a display order of the presentation elements and assigns priorities to the contracts based on this display order. In particular, the memory request generator 624 assigns a higher priority to a contract that specifies a range of data for a presentation element to be displayed earlier in time. On the other hand, the memory request generator 624 assigns a lower priority to a contract that specifies a range of data for a presentation element to be displayed later in time. In the illustrated embodiment, the memory request generator 624 provides an indication of the assigned priorities in the form of one or more tags that are incorporated in the contracts. Alternatively, the memory request generator 624 can provide the indication of the assigned priorities separately from the contracts.
Next in step 814, the memory request arbiter 626 arbitrates between the contracts based on the priorities assigned to the contracts. In the illustrated embodiment, the memory request arbiter 626 issues the contracts to the memory controller 622 based on the indication of the assigned priorities provided by the memory request generator 624. In particular, the memory request arbiter 626 issues a contract earlier in time if that contract is assigned a higher priority. On the other hand, the memory request arbiter 626 issues a contract later in time if that contract is assigned a lower priority. Finally, in step 820 the process ends when the contract is sent to the memory controller for processing.
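Steps 812 and 814 amount to tagging each contract with a priority derived from the display order and then issuing contracts in priority order. The sketch below illustrates one way this could look; the function names, the use of a numeric display position as the ordering key, and the convention that a lower tag means a higher priority are all assumptions made for this example.

```python
import heapq
from typing import List, Tuple


def assign_priorities(contracts: List[Tuple[str, int]]) -> List[Tuple[int, str]]:
    """Tag each (contract name, display position) pair with a priority.

    Elements displayed earlier receive a lower tag, i.e. a higher priority.
    """
    ordered = sorted(contracts, key=lambda c: c[1])
    return [(tag, name) for tag, (name, _) in enumerate(ordered)]


def issue_order(tagged: List[Tuple[int, str]]) -> List[str]:
    """Issue contracts to the memory controller, highest priority first."""
    heap = list(tagged)
    heapq.heapify(heap)
    issued = []
    while heap:
        _, name = heapq.heappop(heap)
        issued.append(name)
    return issued


# Contracts arrive in arbitrary order; display positions are made-up values.
tagged = assign_priorities([("C", 300), ("A", 0), ("B", 120)])
print(issue_order(tagged))  # ['A', 'B', 'C']
```

Carrying the tag inside each tuple corresponds to incorporating the priority indication in the contract itself; the alternative of providing the indication separately would keep the tags in a side table instead.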
Additionally, amendments can be used to correct or change contracts after they are sent, e.g. by making changes to parameters in the contract. If a contract having a fixed set of instructions is sent but circumstances change such that the set of instructions should be modified, then an amendment can be sent to modify the contract. Amendments can be used to change parameters such as the location of the buffer from which data is to be fetched. An example of when an amendment can be used is if scanning is done slower than the rendering rate and a user selects a specific portion of the screen for display. If data was originally being fetched from a first buffer according to the first contract but displaying the selected portion of the screen required fetching data from a second buffer, then an amendment could be used to instruct that data be fetched from the second buffer instead of the first buffer.
In some embodiments, the contract has a cutoff point after which amendments to that contract can no longer be sent. In other embodiments, amendments can be sent at any time, and the system dynamically makes adjustments to the contract according to the amendment. However, if the amendment is sent but the next contract is ready, then the amendment is suppressed, and/or the amendment can be made part of the next contract instead. In one embodiment, the amendment can be suppressed by the memory controller, whereas in another embodiment the amendment can be suppressed by the client.
In another embodiment, a warning signal, similar to that sent for contracts, indicating that an amendment will be sent is transmitted prior to sending an amendment. An advantage of sending a warning signal indicating that an amendment is imminent is that the system can adjust for this amendment. In one embodiment the warning is done by setting a bit. Warning signals can be used in VGA mode, which can include a split screen mode, as described above with reference to
After the priorities are assigned in step 912, the memory request generator 624 makes a decision in step 914 whether there is an amendment to the contract. If the decision in step 914 is that an amendment has been sent then the memory request generator 624 makes another decision in step 916 whether there is sufficient time available to amend the contract. If there is sufficient time to amend the contract then the contract is amended in step 918 and the process proceeds to step 920. The contract can be amended in the memory request generator 624. Details of the amendment and whether there is sufficient time to amend the contract are discussed in detail above with reference to
Next in step 924, the memory request arbiter 626 arbitrates between the contracts based on the priorities assigned to the contracts and the amendments which may have been incorporated in steps 918 and 922. In the illustrated embodiment, the memory request arbiter 626 issues the contracts to the memory controller 622 based on the indication of the assigned priorities provided by the memory request generator 624 in the form of a contract or amendment. In particular, the memory request arbiter 626 issues a contract earlier in time if that contract is assigned a higher priority or amended to have a higher priority. On the other hand, the memory request arbiter 626 issues a contract later in time if that contract is assigned a lower priority or amended to have a lower priority.
While the invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention as defined by the appended claims. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, method, process operation or operations, to the objective, spirit and scope of the invention. All such modifications are intended to be within the scope of the claims appended hereto. In particular, while the methods disclosed herein have been described with reference to particular operations performed in a particular order, it will be understood that these operations may be combined, sub-divided, or re-ordered to form an equivalent method without departing from the teachings of the invention. Accordingly, unless specifically indicated herein, the order and grouping of the operations is not a limitation of the invention.
This application is a continuation of U.S. patent application Ser. No. 11/678,733, filed Feb. 26, 2007, which claims the benefit of U.S. Provisional Appln. No. 60/862,090, filed Oct. 19, 2006. The disclosures of both applications are incorporated herein by reference in their entirety for all purposes.
Number | Date | Country | |
---|---|---|---|
20120188261 A1 | Jul 2012 | US |
Number | Date | Country | |
---|---|---|---|
60862090 | Oct 2006 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11678733 | Feb 2007 | US |
Child | 13440380 | | US |