3D graphics rendering has been implemented extensively in a variety of hardware (HW) architectures over the past few decades. With the advent of standardized rendering application programming interfaces (APIs) such as OpenGL and, more recently, DirectX/Direct3D, a similar macro-architectural structure has begun to emerge. The details and performance of any particular graphics HW architecture often hinge upon the number of pixel processing pipelines dedicated within that architecture, how many stages the various pipelines require, as well as the effectiveness of a variety of cache memories strategically placed throughout the architecture. For instance, some modern graphics architectures include eight or more pixel processing units to handle pixel shading, along with two or more cache memories associated with those processing units.
Dependencies between multiple graphics processing pipelines often restrict the overall processing speed of the graphics HW architecture. But such dependencies may also provide opportunities for enhancing processing speed by enabling the recognition of wasteful activities, such as the eviction of cache memory contents used by one processing pipeline that, as it turns out, will also be needed by another processing pipeline.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate one or more implementations consistent with the principles of the invention and, together with the description, explain such implementations. The drawings are not necessarily to scale, the emphasis instead being placed upon illustrating the principles of the invention. In the drawings,
The following detailed description refers to the accompanying drawings. The same reference numbers may be used in different drawings to identify the same or similar elements. In the following description, specific details are set forth, such as particular structures, architectures, interfaces, techniques, etc., in order to provide a thorough understanding of the various aspects of the claimed invention. However, such details are provided for purposes of explanation and should not be viewed as limiting. Moreover, it will be apparent to those skilled in the art, having the benefit of the present disclosure, that the various aspects of the invention claimed may be practiced in other examples that depart from these specific details. In certain instances, descriptions of well-known devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
System 100 may assume a variety of physical implementations. For example, system 100 may be implemented in a personal computer (PC), a networked PC, a server computing system, a handheld computing platform (e.g., a personal digital assistant (PDA)), a gaming system (portable or otherwise), a 3D capable cell phone, etc. Moreover, while all components of system 100 may be implemented within a single device, such as a system-on-a-chip (SOC) integrated circuit (IC), components of system 100 may also be distributed across multiple ICs or devices. For example, host processor 102 along with components 106, 112, and 114 may be implemented as multiple ICs contained within a single PC while graphics processor 104 and components 108 and 116 may be implemented in a separate device such as a television coupled to host processor 102 and components 106, 112, and 114 through communications pathway 110.
Host processor 102 may comprise a special purpose or a general purpose processor, including any processing logic, hardware, software, and/or firmware, capable of providing graphics processor 104 with 3D graphics data and/or instructions. Processor 102 may perform a variety of 3D graphics calculations, such as 3D coordinate transformations, etc., the results of which may be provided to graphics processor 104 over bus 110 and/or stored in memories 106 and/or 108 for eventual use by processor 104.
In one implementation, host processor 102 may be capable of performing any of a number of tasks that support 3D graphics processing. These tasks may include, for example, although the invention is not limited in this regard, providing 3D scene data to graphics processor 104, downloading microcode to processor 104, initializing and/or configuring registers within processor 104, interrupt servicing, and providing a bus interface for uploading and/or downloading 3D graphics data. In alternate implementations, some or all of these functions may be performed by processor 104. While system 100 shows host processor 102 and graphics processor 104 as distinct components, the invention is not limited in this regard, and those of skill in the art will recognize that processors 102 and 104, possibly in addition to other components of system 100, may be implemented within a single IC, where processors 102 and 104 may be distinguished by the respective types of 3D graphics processing that they implement.
Graphics processor 104 may comprise any processing logic, hardware, software, and/or firmware capable of processing graphics data. In one implementation, graphics processor 104 may implement a 3D graphics hardware architecture capable of processing graphics data in accordance with one or more standardized rendering application programming interfaces (APIs), such as OpenGL and DirectX/Direct3D, to name a few examples, although the invention is not limited in this regard. Graphics processor 104 may process 3D graphics data provided by host processor 102, held or stored in memories 106 and/or 108, and/or provided by sources external to system 100 and obtained over bus 110 from interfaces 112 and/or 114. Graphics processor 104 may receive 3D graphics data in the form of 3D scene data and process that data to provide image data in a format suitable for conversion by display processor 116 into display-specific data. In addition, graphics processor 104 may include a variety of 3D graphics processing components, such as one or more rasterizers coupled to one or more pixel shaders, as will be described in greater detail below.
Bus or communications pathway(s) 110 may comprise any mechanism for conveying information (e.g., graphics data, instructions, etc.) between or amongst any of the elements of system 100. For example, although the invention is not limited in this regard, communications pathway(s) 110 may comprise a multipurpose bus capable of conveying, for example, instructions (e.g., macrocode) between processor 102 and processor 104. Alternatively, pathway(s) 110 may comprise a wireless communications pathway.
Display processor 116 may comprise any processing logic, hardware, software, and/or firmware capable of converting image data supplied by graphics processor 104 into a format suitable for driving a display (i.e., display-specific data). For example, while the invention is not limited in this regard, processor 104 may provide image data to processor 116 in a specific color data format, for example in a compressed red-green-blue (RGB) format, and processor 116 may process such RGB data by generating corresponding LCD drive data levels, etc.
Those skilled in the art will recognize that some components typically found in graphics processors (e.g., tessellation modules, etc.) and not particularly germane to the claimed invention have been excluded from the drawings so as not to obscure the description with unnecessary detail.
Rasterizer 208 may be capable of processing pixel fragments provided by triangle setup module 206 to generate image data suitable for processing by display processor 116, described above with respect to system 100.
Rasterizer 208 and/or shader 209 may process pixel fragments in discrete portions and/or “spans” of pixel data provided by triangle setup module 206 and should, as those skilled in the art will recognize, process such spans in the order in which they are received from module 206 (i.e., in “rendering order”). Moreover, as will be described in greater detail below, rasterizer 208 and/or shader 209 may process two or more spans more or less concurrently. Pixel shader 209 may comprise any graphics processing logic, hardware, software, and/or firmware capable of using pixel depth and/or pixel color data supplied respectively by cache 210 and/or cache 212 to render and/or process pixel spans.
Those skilled in the art will recognize that, while some spans may take longer to process than other spans, two or more spans that correspond to spatially overlapping portions of a frame buffer (not shown) should be processed in the order received by rasterizer 208 and/or shader 209 to ensure compliance with conventional rendering order constraints (e.g., when alpha blending is enabled). Hence, in accordance with one implementation of the invention, rasterizer 208 may identify and/or recognize one or more rendering order conflicts between two or more spans it is processing and may use that information to control caches 210 and/or 212 so that one or more lines of cache content are retained for use by shader 209 in rendering and/or processing those spans. In other words, as will be described in more detail below, rasterizer 208 may provide one or more indicators to caches 210 and/or 212 that may cause the caches to retain at least some of their contents at least temporarily.
Cache 210 may comprise any memory or collection of memories capable of at least storing pixel depth information to be used by rasterizer 208 and/or shader 209. Pixel cache 212 may comprise any memory or collection of memories capable of at least storing pixel color information to be used by rasterizer 208 and/or shader 209. In accordance with an implementation of the invention, caches 210 and/or 212 may respond to one or more indicators and/or control data (e.g., cache line address data) provided by rasterizer 208 over line(s) 214 by holding and/or retaining one or more lines of cache data, as will be described in greater detail below.
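By way of a non-limiting illustration, the following C++ sketch models a cache that honors such retention indicators. The class and member names (e.g., LockableCache, hold_line, release_line) are hypothetical and are not drawn from any particular implementation; the sketch merely shows one way a cache might exempt selected lines, identified by cache line address, from its routine eviction scheme.

```cpp
#include <cstdint>
#include <iterator>
#include <list>
#include <unordered_map>
#include <unordered_set>

// Hypothetical model of a cache (e.g., depth cache 210 or pixel cache 212)
// whose lines may be "held" in response to control data, such as cache line
// addresses, supplied by a rasterizer.
class LockableCache {
public:
    // Mark a line as held so that routine eviction skips it.
    void hold_line(uint32_t line_addr)    { held_.insert(line_addr); }

    // Return a held line to routine cache retention schemes.
    void release_line(uint32_t line_addr) { held_.erase(line_addr); }

    // Evict the least-recently-used resident line that is NOT currently held.
    void evict_one_unheld() {
        for (auto it = lru_.rbegin(); it != lru_.rend(); ++it) {
            if (held_.count(*it) == 0) {
                lines_.erase(*it);
                lru_.erase(std::next(it).base());
                return;
            }
        }
        // All resident lines are held; nothing may be evicted right now.
    }

private:
    std::unordered_map<uint32_t, uint64_t> lines_;  // line address -> line data
    std::list<uint32_t>                    lru_;    // most recently used at front
    std::unordered_set<uint32_t>           held_;   // lines exempt from eviction
};
```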
Referring to one implementation of caches 210 and 212, each cache may include a buffer (e.g., buffers 303 and 305, respectively) for holding conflicted cache line addresses and a corresponding lock module (e.g., lock modules 304 and 306, respectively) for retaining the associated cache lines, as will be described in greater detail below.
Referring now to process 400, an exemplary method of retaining cache contents across conflicting pixel spans in accordance with an implementation of the invention will be described.
Process 400 may begin with the generation of a first pixel span [act 402]. In one implementation, rasterizer 208 may generate the first pixel span according to conventional procedures. For example, rasterizer 208 may generate the first pixel span using a conventional process of “scan” converting triangle-based primitives (specified in “vertex” or “object” space) into spans of discrete pixels (specified in “screen” or “display” space).
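By way of a non-limiting illustration of act 402, the following C++ sketch outlines one conventional way a rasterizer might scan convert a triangle (specified by screen-space vertices) into horizontal spans of pixels. The structures and the edge-intersection scheme shown are simplifying assumptions made for the purpose of the example, not a description of rasterizer 208.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vertex { float x, y; };           // screen-space position
struct Span   { int y, x_min, x_max; };  // one horizontal run of pixels

// Very simplified scan conversion: for each scanline covered by the
// triangle, intersect the scanline with the triangle's edges and emit
// a span covering the pixels between the leftmost and rightmost hits.
std::vector<Span> scan_convert(const Vertex& a, const Vertex& b, const Vertex& c) {
    std::vector<Span> spans;
    const int y_top = static_cast<int>(std::ceil(std::min({a.y, b.y, c.y})));
    const int y_bot = static_cast<int>(std::floor(std::max({a.y, b.y, c.y})));

    const Vertex edges[3][2] = {{a, b}, {b, c}, {c, a}};
    for (int y = y_top; y <= y_bot; ++y) {
        float x_min = 1e30f, x_max = -1e30f;
        for (const auto& e : edges) {
            const Vertex &p = e[0], &q = e[1];
            // Consider only edges that cross this scanline (half-open test
            // so a shared vertex is not counted twice).
            if ((p.y <= y && q.y > y) || (q.y <= y && p.y > y)) {
                const float t = (y - p.y) / (q.y - p.y);
                const float x = p.x + t * (q.x - p.x);
                x_min = std::min(x_min, x);
                x_max = std::max(x_max, x);
            }
        }
        if (x_min <= x_max)
            spans.push_back({y, static_cast<int>(x_min), static_cast<int>(x_max)});
    }
    return spans;
}
```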
Process 400 may continue with the shading and/or rendering of the first span [act 404]. In one implementation, shader 209 may process the first span using one or more of a number of conventional pixel shading techniques, although the invention is not limited in this regard. For example, shader 209 may compare the depth data of the first span as stored in and supplied by depth cache 210 to a depth value stored in a “z buffer” (not shown). In addition, shader 209 may render pixel colors for the span using color information stored in and supplied by pixel cache 212.
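As a non-limiting C++ sketch of act 404, the following code performs a conventional “less than” depth test for each pixel of a span and writes a color only for pixels that pass. The RenderTarget buffers and the frag_depth/frag_color callbacks are assumptions standing in for the z buffer, the color contents derived from pixel cache 212, and the shader's per-pixel computations.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct Span { int y, x_min, x_max; };  // one horizontal run of pixels

// Hypothetical frame-sized buffers standing in for the z buffer and the
// color data ultimately derived from pixel cache 212.
struct RenderTarget {
    int width = 0;
    std::vector<float>    depth;  // z buffer, one float per pixel
    std::vector<uint32_t> color;  // packed RGBA, one word per pixel
};

// Shade one span: for each pixel, compare the fragment depth against the
// stored depth (the act 404 depth test) and write color only on a pass.
void shade_span(const Span& s, RenderTarget& rt,
                const std::function<float(int, int)>& frag_depth,
                const std::function<uint32_t(int, int)>& frag_color) {
    for (int x = s.x_min; x <= s.x_max; ++x) {
        const std::size_t i = static_cast<std::size_t>(s.y) * rt.width + x;
        const float z = frag_depth(x, s.y);
        if (z < rt.depth[i]) {            // conventional less-than depth test
            rt.depth[i] = z;
            rt.color[i] = frag_color(x, s.y);
        }
    }
}
```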
Process 400 may continue with the generation of a second pixel span [act 406]. In one implementation, rasterizer 208 may generate the second pixel span in the manner described above for the first pixel span in act 402. Process 400 may then continue with an assessment of the rendering order of the first and second pixel spans [act 408]. In one implementation, rasterizer 208 may compare the spatial attributes of the first span to those of the second span. In other words, rasterizer 208 may compare the two spans to see whether they correspond or “map” to the same region of screen space (i.e., frame buffer space).
Process 400 may continue with a determination of whether a rendering order conflict exists [act 410]. In one implementation, rasterizer 208 may use the results of act 408 to determine whether a rendering order conflict exists. For example, in one implementation, if the first and second spans map to the same screen space and alpha blending is enabled, then a rendering order conflict exists and process 400 proceeds to act 412A or to act 412B. Otherwise, if the first and second spans do not map to the same screen space and/or alpha blending is not enabled, then a rendering order conflict does not exist and process 400 proceeds to act 414.
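As a non-limiting illustration of acts 408 and 410, the following C++ sketch applies the two conditions described above: the spans must map to overlapping frame buffer locations and alpha blending must be enabled. The function names are hypothetical.

```cpp
struct Span { int y, x_min, x_max; };  // one horizontal run of pixels

// Two spans map to the same region of screen space when they lie on the
// same scanline and their pixel ranges overlap (the act 408 comparison).
bool spans_overlap(const Span& a, const Span& b) {
    return a.y == b.y && a.x_min <= b.x_max && b.x_min <= a.x_max;
}

// A rendering order conflict (act 410) requires both spatial overlap and
// enabled alpha blending, since only then does the blended result depend
// on the order in which the spans are rendered.
bool rendering_order_conflict(const Span& first, const Span& second,
                              bool alpha_blending_enabled) {
    return alpha_blending_enabled && spans_overlap(first, second);
}
```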
If a rendering order conflict exists, then, in one implementation, one or more cache lines associated with the first span may be held and/or retained and/or locked [act 412A] for use in processing the second span. In one implementation, rasterizer 208 may supply the addresses of the conflicted cache lines over line(s) 214 to buffers 303 and/or 305, causing respective lock modules 304 and/or 306 of caches 210 and/or 212 to lock the contents of those cache lines (i.e., to exempt the content of those cache lines from routine cache retention and/or eviction schemes).
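The following C++ sketch models act 412A under the assumption that a buffer such as buffer 303 or 305 can be represented as a simple set of conflicted cache line addresses consulted by the cache's eviction logic. The names LockBuffer, lock_conflicted_lines, and may_evict are hypothetical.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for buffer 303 or 305: the set of cache line
// addresses that the associated lock module (304 or 306) must exclude
// from eviction.
using LockBuffer = std::unordered_set<uint32_t>;

// Act 412A: record the addresses of the cache lines used by the first span
// so the cache retains them until the second span has been rendered.
void lock_conflicted_lines(LockBuffer& lock_buffer,
                           const std::vector<uint32_t>& conflicted_line_addrs) {
    lock_buffer.insert(conflicted_line_addrs.begin(), conflicted_line_addrs.end());
}

// Eviction-side check performed by the lock module: a line whose address
// appears in the lock buffer may not be evicted.
bool may_evict(const LockBuffer& lock_buffer, uint32_t line_addr) {
    return lock_buffer.count(line_addr) == 0;
}
```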
Alternatively, one or more cache lines associated with the first span may be retained by other means [act 412B] for use in processing the second span.
Process 400 may continue with the rendering of the second pixel span [act 414]. In one implementation, shader 209 may render the second pixel span in the manner described above for the first pixel span in act 404. In accordance with implementations of the invention, shader 209 may render the second pixel span using, at least in part, those contents of caches 210 and/or 212 used to render the first span in act 404 and subsequently retained in either act 412A or act 412B.
Process 400 may conclude with the release [act 416] of any cache lines held and/or retained and/or locked in act 412A. One way to do this is to have rasterizer 208 provide control data via line(s) 214 to buffers 303 and/or 305 directing respective lock modules 304 and/or 306 of caches 210 and/or 212 to unlock the contents of those cache lines (i.e., to subject the content of those cache lines to routine cache retention schemes). For example, rasterizer 208 may remove from buffers 303 and/or 305 those conflicted cache line addresses supplied in act 412A.
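A matching C++ sketch of act 416, again using the hypothetical LockBuffer from the act 412A example, simply removes the addresses recorded earlier so that the corresponding cache lines return to routine retention schemes.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

using LockBuffer = std::unordered_set<uint32_t>;  // see the act 412A sketch

// Act 416: remove the conflicted cache line addresses supplied in act 412A,
// returning those lines to the cache's ordinary retention/eviction policy.
void release_conflicted_lines(LockBuffer& lock_buffer,
                              const std::vector<uint32_t>& conflicted_line_addrs) {
    for (uint32_t addr : conflicted_line_addrs)
        lock_buffer.erase(addr);
}
```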
The acts shown in process 400 need not be implemented in the order described above, nor must all of the acts necessarily be performed.
The foregoing description of one or more implementations consistent with the principles of the invention provides illustration and description, but is not intended to be exhaustive or to limit the scope of the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of various implementations of the invention.
No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Moreover, when terms such as “coupled” or “responsive” are used herein or in the claims that follow, these terms are meant to be interpreted broadly. For example, the phrase “coupled to” may refer to being communicatively, electrically and/or operatively coupled as appropriate for the context in which the phrase is used. Variations and modifications may be made to the above-described implementation(s) of the claimed invention without departing substantially from the spirit and principles of the invention. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.