The invention relates generally to graphics processing. More particularly, the invention relates to an apparatus and method for buffering graphics data to support graphics processing operations.
In conventional graphics processing systems, an object to be displayed is typically represented as a set of one or more graphics primitives. Examples of graphics primitives include one-dimensional graphics primitives, such as lines, and two-dimensional graphics primitives, such as polygons. Typically, a graphics primitive is defined by a set of vertices each having a set of vertex attributes. For example, a graphics primitive can be a triangle that is defined by three different vertices, and each of the vertices can have up to 128 different vertex attributes, such as spatial coordinates, color components, fog components, normal vectors, specularity components, and texture coordinates.
Conventional graphics processing systems are typically implemented using a graphics pipeline having multiple pipeline stages. During operation of the graphics pipeline, one pipeline stage can perform a set of graphics processing operations on vertex attributes, and can then issue the vertex attributes for further processing by another pipeline stage. This seemingly straightforward routing of vertex attributes can quickly become complex if various pipeline stages have different processing requirements with respect to the vertex attributes. For example, one pipeline stage can operate on vertex attributes that are in one particular order, while another pipeline stage can operate on the vertex attributes that are in a different order. A further complication can occur if vertex attributes issued by one pipeline stage are stored in a memory, such as a Dynamic Random Access Memory (“DRAM”), pending retrieval of the vertex attributes for further processing by another pipeline stage. In particular, a DRAM typically permits memory access in increments of a particular byte size and for particular ranges of addresses. Unfortunately, vertex attributes as issued by a pipeline stage may not be arranged in a manner that is conducive to efficient memory access and, thus, can require an undesirable number of memory accesses for storage in the DRAM. As a result of these complications, it can be challenging to route vertex attributes while reducing congestion and achieving a desired level of throughput.
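The cost of a poorly arranged set of vertex attributes can be illustrated with a short sketch. It assumes, purely for illustration, a memory that is accessed in aligned 32-byte blocks and 4-byte attributes that never straddle a block; the name `count_accesses` and both sizes are hypothetical and are not taken from the description above.

```python
def count_accesses(byte_addresses, block_bytes=32):
    """Count the distinct aligned blocks touched by a set of attribute addresses.

    Assumption: writes that fall in the same aligned block can be merged
    into one memory access, and no attribute straddles a block boundary.
    """
    return len({address // block_bytes for address in byte_addresses})


# Four 4-byte attributes scattered across memory touch four blocks.
scattered = [0, 64, 128, 192]
# The same four attributes arranged contiguously touch a single block.
contiguous = [0, 4, 8, 12]

scattered_accesses = count_accesses(scattered)
contiguous_accesses = count_accesses(contiguous)
```

Under these assumptions, the contiguous arrangement needs one access where the scattered arrangement needs four.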
It is against this background that a need arose to develop the apparatus and method described herein.
In one aspect, the invention relates to a graphics processing apparatus. In one embodiment, the graphics processing apparatus includes a storage unit and a reorder control unit that is connected to the storage unit. The reorder control unit is configured to coordinate storage of vertex attributes in the storage unit so as to convert the vertex attributes from an initial order to a modified order. The reorder control unit is configured to identify a subset of the vertex attributes to be stored within a common range of addresses in the storage unit, and the reorder control unit is configured to access the storage unit such that the subset of the vertex attributes is written into the storage unit substantially in parallel.
In another embodiment, the graphics processing apparatus includes a buffering unit configured to convert vertex attributes from an initial order to a modified order. The buffering unit includes a storage unit having a memory layout represented as multiple partitions. The buffering unit also includes a reorder control unit that is connected to the storage unit. The reorder control unit is configured to perform a coalesce check to determine whether a subset of the vertex attributes belongs to a common partition among the partitions, and the reorder control unit is configured to issue a single write request to the storage unit if the subset of the vertex attributes passes the coalesce check.
In another aspect, the invention relates to a graphics processing method. In one embodiment, the graphics processing method includes receiving vertex attributes having an initial order and mapping the vertex attributes onto addresses in a storage unit so as to convert the vertex attributes to a modified order. The graphics processing method also includes identifying a subset of the vertex attributes that is mapped onto a common range of addresses in the storage unit. The graphics processing method further includes issuing a single write request to the storage unit such that the subset of the vertex attributes is written into the common range of addresses substantially in parallel.
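The operations of this method can be sketched in software as follows. This is a minimal model under stated assumptions, not the claimed implementation: the range size of four addresses, the `order_map` list, and the function names are all hypothetical, and the example mapping (per-vertex order to per-attribute order) is only one possible modified order.

```python
RANGE_SIZE = 4  # hypothetical: number of addresses in one common range


def map_to_addresses(attributes, order_map):
    """Pair each received attribute with its destination address."""
    return [(order_map[i], attr) for i, attr in enumerate(attributes)]


def build_write_requests(mapped):
    """Identify subsets that share a common range; one request per range."""
    requests = {}
    for address, attr in mapped:
        requests.setdefault(address // RANGE_SIZE, []).append((address, attr))
    return requests


def apply_requests(requests, storage):
    """Model each request as writing all of its attributes together."""
    for writes in requests.values():
        for address, attr in writes:
            storage[address] = attr


# Initial order: attributes a and b interleaved for vertices 0 to 3.
received = ["a0", "b0", "a1", "b1", "a2", "b2", "a3", "b3"]
# Hypothetical mapping that gathers all a's, then all b's.
order_map = [0, 4, 1, 5, 2, 6, 3, 7]

storage = [None] * 8
requests = build_write_requests(map_to_addresses(received, order_map))
apply_requests(requests, storage)
```

In this example the eight attributes fall into two common ranges, so two write requests are modeled instead of eight.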
Advantageously, certain embodiments of the invention operate in accordance with an improved technique for buffering graphics data to support graphics processing operations. In particular, the improved technique allows efficient routing of vertex attributes with reduced congestion and a higher level of throughput.
Other aspects and embodiments of the invention are also contemplated. The foregoing summary and the following detailed description are not meant to restrict the invention to any particular embodiment but are merely meant to describe some embodiments of the invention.
For a better understanding of the nature and objects of some embodiments of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings, in which:
Like reference numerals are used to refer to corresponding components of the drawings.
The computer 102 includes a Central Processing Unit (“CPU”) 108, which is connected to a system memory 110 over a bus 122. The system memory 110 can be implemented using a Random Access Memory (“RAM”) and a Read-Only Memory (“ROM”). As illustrated in
In the illustrated embodiment, the graphics processing apparatus 112 receives a set of vertices defining an object to be displayed, and the graphics processing apparatus 112 performs a number of graphics processing operations on the vertices. The graphics processing apparatus 112 includes a graphics pipeline 114, which includes a number of graphics processing modules that are connected to one another and that form different pipeline stages. In particular, the graphics pipeline 114 includes a vertex shading module 116, a geometry shading module 118, and a pixel shading module 120, which can be implemented using computational units that execute shading programs. While three graphics processing modules are illustrated in
Still referring to
During operation of the graphics pipeline 114, vertex attributes are routed through the graphics pipeline 114, and are operated upon by successive pipeline stages. In particular, one pipeline stage performs a set of graphics processing operations on the vertex attributes, and then issues the vertex attributes for further processing by another pipeline stage. Different pipeline stages typically have different processing requirements with respect to the vertex attributes. For example, the geometry shading module 118 can operate on the vertex attributes that are in one particular order, while the pixel shading module 120 can operate on the vertex attributes that are in a different order. As another example, the geometry shading module 118 and the pixel shading module 120 can operate on different subsets of the vertex attributes. Ultimately, the pixel shading module 120 produces a set of pixels that represent an object to be displayed, and the pixel shading module 120 then issues the pixels to the local memory 126, which stores the pixels pending display using the display device 106.
When routing vertex attributes through the graphics pipeline 114, it can be desirable to store a copy or some other representation of the vertex attributes in the local memory 126. As illustrated in
In the illustrated embodiment, the graphics pipeline 114 includes a buffering unit 124, which is connected between the geometry shading module 118 and the local memory 126. The buffering unit 124 buffers vertex attributes en route to the local memory 126. As received by the buffering unit 124, vertex attributes are arranged in accordance with processing requirements of the pixel shading module 120. In particular, the vertex attributes are arranged in groups according to a per-vertex arrangement. For example, during one clock cycle, the buffering unit 124 can receive one group of vertex attributes of different types for a particular vertex, and, during another clock cycle, the buffering unit 124 can receive another group of vertex attributes of different types for the same vertex or a different vertex. However, this initial arrangement of the vertex attributes may not be appropriate if the vertex attributes are to be fed back for further processing by the vertex shading module 116 or the geometry shading module 118. Moreover, this initial arrangement of the vertex attributes may not be conducive to efficient memory access to the local memory 126 and, thus, can require an undesirable number of memory accesses for storage in the local memory 126.
Advantageously, the buffering unit 124 coordinates storage of vertex attributes within the buffering unit 124 so as to effectively convert the vertex attributes into a modified arrangement that is appropriate for further processing and that allows efficient memory access to the local memory 126. As further described below, the buffering unit 124 performs efficient reordering of vertex attributes by reducing the number of clock cycles required to accomplish such reordering. Moreover, the buffering unit 124 promotes efficient use of the memory transfer bandwidth of the local memory 126 by coalescing vertex attributes to be stored within a common range of addresses in the local memory 126. Accordingly, the buffering unit 124 allows efficient transfer of vertex attributes to the local memory 126 by reducing the number of clock cycles required to accomplish such transfer. For example, the buffering unit 124 can reorder vertex attributes for different vertices, and the buffering unit 124 can then coalesce the reordered vertex attributes within a single group exiting the buffering unit 124. By operating in such manner, the buffering unit 124 allows routing of vertex attributes between the graphics pipeline 114 and the local memory 126, while reducing congestion and achieving a desired level of throughput.
The foregoing provides an overview of an embodiment of the invention. Attention next turns to
In the illustrated embodiment, the reorder control unit 200 coordinates storage of vertex attributes in the storage unit 206 so as to convert the vertex attributes from an initial order to a modified order. As illustrated in
As illustrated in
During reordering of vertex attributes, the reorder control unit 200 references the reorder table 210 to perform a coalesce check. In accordance with the coalesce check, the reorder control unit 200 determines whether particular ones of the vertex attributes are to be stored within a common range of addresses in the storage unit 206 and, therefore, belong to a common partition of the storage unit 206. If those vertex attributes pass the coalesce check, the reorder control unit 200 accesses the storage unit 206 such that those vertex attributes are written into the common partition substantially in parallel. Through use of the write mask 212, the reorder control unit 200 can issue a single write request to direct storage of those vertex attributes within respective slots of the common partition. On the other hand, if those vertex attributes do not pass the coalesce check, the reorder control unit 200 accesses the storage unit 206 such that those vertex attributes are sequentially written into different partitions of the storage unit 206. In particular, the reorder control unit 200 can issue multiple write requests to direct storage of those vertex attributes within the different partitions. By operating in such manner, the reorder control unit 200 can efficiently reorder vertex attributes with a reduced number of memory accesses to the storage unit 206.
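The coalesce check and the masked write can be modeled with a short sketch. It assumes, for illustration only, four slots per partition and slot addresses that have already been looked up (for example, from a reorder table); the names `coalesce_check`, `masked_write`, and `store_group` are hypothetical.

```python
SLOTS_PER_PARTITION = 4  # hypothetical partition width


def coalesce_check(slot_addresses):
    """Pass only if every address falls in one and the same partition."""
    return len({a // SLOTS_PER_PARTITION for a in slot_addresses}) == 1


def masked_write(storage, partition, lanes, mask):
    """Model one write request: update only slots whose mask bit is set."""
    for slot in range(SLOTS_PER_PARTITION):
        if mask & (1 << slot):
            storage[partition][slot] = lanes[slot]


def store_group(storage, slot_addresses, values):
    """Store a group of attributes; return the number of write requests."""
    if coalesce_check(slot_addresses):
        lanes = [None] * SLOTS_PER_PARTITION
        mask = 0
        for address, value in zip(slot_addresses, values):
            slot = address % SLOTS_PER_PARTITION
            lanes[slot] = value
            mask |= 1 << slot
        masked_write(storage, slot_addresses[0] // SLOTS_PER_PARTITION, lanes, mask)
        return 1
    # Check failed: fall back to one write request per attribute.
    for address, value in zip(slot_addresses, values):
        slot = address % SLOTS_PER_PARTITION
        lanes = [None] * SLOTS_PER_PARTITION
        lanes[slot] = value
        masked_write(storage, address // SLOTS_PER_PARTITION, lanes, 1 << slot)
    return len(values)


storage = [[None] * SLOTS_PER_PARTITION for _ in range(2)]
coalesced = store_group(storage, [5, 7], ["u", "v"])  # both in partition 1
split = store_group(storage, [0, 4], ["p", "q"])      # partitions 0 and 1
```

In this model, the first group passes the check and is stored with one masked write, while the second group spans two partitions and takes two writes.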
Still referring to
As illustrated in
During packing of vertex attributes, the write control unit 202 references addresses at which the vertex attributes are to be stored in the local memory 126. In accordance with those addresses, the write control unit 202 accesses the storage unit 208 so as to distribute the vertex attributes within appropriate partitions of the storage unit 208. In such manner, the write control unit 202 can coalesce, within a common partition, particular ones of the vertex attributes to be stored within a common range of addresses in the local memory 126. Through use of the write mask 214, the write control unit 202 can direct storage of those vertex attributes within respective portions of the common partition. In some instances, one or more gaps can be present within the common partition, depending on a stride requirement. For other ones of the vertex attributes, the write control unit 202 can access the storage unit 208 such that those vertex attributes are distributed within different partitions of the storage unit 208.
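The packing step, including the gaps that a stride requirement can leave, can be sketched as follows. This is an illustrative model only: addresses are expressed in attribute-sized lanes, four lanes per partition is an assumed width, and the `pack` function with its `base` and `stride` parameters is hypothetical.

```python
LANES = 4  # hypothetical: attribute-sized lanes per partition


def pack(attributes, base, stride):
    """Distribute attributes into partitions by destination address.

    Attribute i is assumed to go to address base + i * stride (in lanes,
    with stride >= 1). Each partition records its data and a write mask
    whose set bits mark the occupied lanes; clear bits are gaps.
    """
    partitions = {}
    for i, attr in enumerate(attributes):
        index, lane = divmod(base + i * stride, LANES)
        entry = partitions.setdefault(index, {"lanes": [None] * LANES, "mask": 0})
        entry["lanes"][lane] = attr
        entry["mask"] |= 1 << lane
    return partitions


# A stride of 2 leaves a one-lane gap after each attribute.
packed = pack(["a", "b", "c"], base=0, stride=2)
```

Here attributes `a` and `b` are coalesced within partition 0 with gaps at lanes 1 and 3, and `c` lands in partition 1.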
Still referring to
Additional aspects and advantages of the buffering unit 124 can be understood with reference to
Attention first turns to
Attention next turns to
Some embodiments of the invention relate to a computer storage product with a computer-readable medium having instructions or computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the invention, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as Compact Disc/Digital Video Discs (“CD/DVDs”), Compact Disc-Read Only Memories (“CD-ROMs”), and holographic devices; magneto-optical storage media such as floptical disks; carrier wave signals; and hardware devices that are specially configured to store and execute program code, such as Application-Specific Integrated Circuits (“ASICs”), Programmable Logic Devices (“PLDs”), and ROM and RAM devices. Examples of computer code include, but are not limited to, micro-code or micro-instructions, machine instructions, such as produced by a compiler, and files containing higher-level instructions that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using Java, C++, or other object-oriented programming language and development tools. Additional examples of computer code include, but are not limited to, control signals, encrypted code, and compressed code.
Some embodiments of the invention can be implemented using computer code in place of, or in combination with, hardwired circuitry. For example, with reference to
While the invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention as defined by the appended claims. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, method, process operation or operations, to the objective, spirit and scope of the invention. All such modifications are intended to be within the scope of the claims appended hereto. In particular, while the methods disclosed herein have been described with reference to particular operations performed in a particular order, it will be understood that these operations may be combined, sub-divided, or reordered to form an equivalent method without departing from the teachings of the invention. Accordingly, unless specifically indicated herein, the order and grouping of the operations is not a limitation of the invention.