This application claims the benefit under 35 U.S.C. § 119 from Korean Patent Application No. 2004-91939, filed on Nov. 11, 2004 in the Korean Intellectual Property Office, the entire disclosure of which is hereby incorporated by reference.
1. Field of the Invention
The present invention relates generally to a computer graphics system. In particular, the present invention relates to a graphics system and a memory device for effectively processing three-dimensional (3D) graphic compressed texture data in mobile phone applications, and to a method for 3D graphics processing.
2. Description of the Related Art
3D graphics processing is broken up into two major stages: geometry processing and rasterization. In geometry processing, the vertices that make up polygons of graphic forms, such as triangles, are transformed according to a viewing point. The color is computed for each vertex according to a predetermined lighting model. Rasterization is the process of converting the geometry-processed triangles into final pixels and carrying out texture mapping, depth comparison, and alpha-blending on the pixels.
3D graphics processing is composed, at least in part, of many independent operations. One conventional technique for performing these operations in parallel is pipelining. According to the technique of pipelining, individual processors are serially connected. After completing a series of operations on one data item, a first processor provides the processed data to a second processor responsible for other operations. At the same time, the first processor performs its operations on the next data item. A 3D graphics system is built with pipelines for texture mapping, depth comparison, and alpha-blending, to thereby improve processing efficiency.
A 3D graphics accelerator co-developed by SUN™ and Mitsubishi™ uses a 3D random access memory (RAM) which is a graphics memory with a Z-test pipeline and an alpha-blending pipeline built therein. In the 3D graphics accelerator, depth comparison and alpha-blending are carried out in the 3D RAM, not in a 3D graphics processor. Without the 3D RAM, the depth comparison and the alpha-blending require a read-modify-write operation, whereas with the 3D RAM, a write-only operation suffices. Therefore, the use of the 3D RAM reduces a bandwidth requirement between a graphics processor and a frame buffer, and increases performance.
A conventional fast memory, synchronous dynamic RAM (SDRAM), is suitable for consecutive read and write operations for one block of burst data, while conventional 3D RAM uses an internal cache and a pre-fetch technique in order to improve performance through processing of successive pixels. Therefore, the use of 3D RAM requires separately procured hardware, complicates control, and causes performance degradation due to cache misses.
Another drawback with 3D RAM is that, although 3D RAM is designed to store frame data in pixels and process depth comparison and alpha-blending effectively, texture storage and a stencil buffer are neglected in the configuration of 3D RAM. At the time when 3D RAM was developed, a dedicated memory system was generally used in which a frame buffer and a texture memory were separately procured. Developments in memory technology have enabled most of the current graphics memory systems to use a unified memory system in which a texture memory, a stencil memory, and a frame buffer exist together to store data associated with graphics processing. In this context, if a memory having 3D RAM functionality is designed with the current memory technology, a texture memory and a frame buffer must reside in a single chip. However, because 3D RAM operates very differently from the texture memory, an effective architecture is difficult to realize.
An exemplary object of the present invention is to address at least the above problems and/or disadvantages. Accordingly, an exemplary object of the present invention is to provide a 3D graphics processing apparatus and a 3D graphics processing method for rapidly performing depth comparison and alpha-blending on burst data of consecutive pixels.
Another exemplary object of the present invention is to provide a graphics DRAM structure for providing a unified memory system in which frame data and texture data reside in the same memory space, and an operation method thereof.
The above exemplary objects of the present invention are achieved by providing a graphics system and a memory device for 3D graphics acceleration, and a method for 3D graphics processing.
According to an exemplary aspect of the present invention, in a memory device in a graphics system for 3D graphics processing, a memory structure includes a first memory area allocated to a texture buffer for storing texture data, and a second memory area allocated to a frame buffer for storing frame data in pixels. A comparator controls the memory structure to operate as the texture buffer if an input address to the memory structure indicates the first memory area and controls the memory structure to operate as the frame buffer if the input address indicates the second memory area. If the memory structure operates as the frame buffer, an arithmetic-logic unit (ALU) performs depth comparison or alpha-blending on input frame data and frame data read from the frame buffer.
According to another exemplary aspect of the present invention, in a graphics system for 3D graphics processing, a graphics processor receives fragment information for processing a 3D object and performs texture mapping on the fragment information. At least one pair of memory devices stores texture data referenced for the texture mapping, stores frame data in pixels, and performs depth comparison and alpha-blending on the frame data.
The above and other exemplary objects, features and advantages of the exemplary embodiments of the present invention will become more apparent from the following detailed description of the exemplary embodiments, taken in conjunction with the accompanying drawings in which like reference symbols indicate the same or similar components, wherein:
Exemplary embodiments of the present invention will be described herein below with reference to the accompanying drawings. In the following description, well-known functions or constructions are not described in detail for conciseness.
Referring to
Referring to
Referring to
The graphics system 26 having the above-described configuration is enabled in response to a command from the CPU 22 via the system bus 20. The media processor 30 interprets the command and interfaces between the CPU 22 and the graphics system 26. The media processor 30 can also perform typical processing on graphics data, such as transformation and lighting. Programs and data for the media processor 30 are stored in, for example, a Direct Rambus (DR) DRAM 32.
The hardware accelerator 34 receives the graphics data from the media processor 30 and performs a number of functions on the graphics data, including rasterization, 3D texturing, pixel transfers, imaging, fragment processing, clipping, depth cueing, transparency processing, and rendering. The hardware accelerator 34 reads/writes graphics data from/to the frame buffer 38 and reads texel data from the texture buffer 36. A texel refers to the smallest graphic unit in a texture mapping image of a 3D object.
For one of the 3D graphics processes, namely rasterization, the hardware accelerator 34 is configured in a pipeline structure. Thus, it includes a texture mapping pipeline, a Z-test pipeline, and an alpha-blending pipeline.
Referring to
A texture mapping pipeline 110 reads four or eight texels 142 for corresponding texture coordinates from the graphics memory 140 (step 112) and performs texture filtering and blending on the texels (step 114). As noted above, a texel refers to a smallest graphic unit in a texture mapping image of a 3D object.
The resulting texel is blended with a pixel color set in the fragment information, thereby producing an alpha value. An alpha test (step 116) is performed by comparing the alpha value of a given pixel with a reference alpha value. The comparison can be made based on many criteria. For example, if the pixel alpha value is higher than the reference alpha value, the alpha test passes. According to another example, if the pixel alpha value is lower than the reference alpha value, the alpha test passes. The alpha test is carried out fragment by fragment. Therefore, if all pixels associated with the fragment information pass the alpha test, the procedure goes to the next pipeline step 120. If the alpha test fails, the fragment is dropped out from the pipeline.
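The fragment-by-fragment alpha test described above can be illustrated with a minimal sketch. The function name `alpha_test`, the criterion names `"greater"` and `"less"`, and the 8-bit alpha range used in the usage note are assumptions made for illustration only and do not appear in the described system.

```python
import operator

# Hypothetical names for the two comparison criteria given as examples.
ALPHA_CRITERIA = {
    "greater": operator.gt,  # pass if pixel alpha > reference alpha
    "less": operator.lt,     # pass if pixel alpha < reference alpha
}


def alpha_test(pixel_alphas, reference_alpha, criterion="greater"):
    """Return True only if every pixel of the fragment passes the test.

    A fragment that fails would be dropped from the pipeline; a fragment
    that passes would proceed to the Z-test stage.
    """
    compare = ALPHA_CRITERIA[criterion]
    return all(compare(alpha, reference_alpha) for alpha in pixel_alphas)
```

For instance, with a reference alpha value of 128, a fragment whose pixels have alpha values 200 and 220 would pass under the "greater" criterion, whereas a fragment containing a pixel with alpha value 100 would be dropped.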
The depth comparison and alpha-blending follow the texture mapping pipeline 110.
In the Z-test pipeline 120, a Z-value 144 is read from the graphics memory 140 (step 122) and compared with that of the current fragment in a depth test or a Z-test (step 124). The Z-test 124 can be carried out in different ways. For example, depending on the selected criterion, the Z-test 124 passes if the Z-value 144 is greater than, less than, equal to or greater than, or equal to or less than that of the current fragment.
If the Z-test 124 fails, that is, if the current fragment is obscured by the previously drawn pixel, the current fragment is removed from the pipeline 120. Otherwise, the Z-value 146 of the current fragment is written in the depth buffer of the graphics memory 140 (step 126).
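Steps 122 to 126 of the Z-test pipeline can be sketched as follows. This is an illustrative sketch only: the depth buffer is modeled as a Python list, and the function name and criterion names are hypothetical. Following the wording of the description, each criterion compares the stored Z-value with the Z-value of the current fragment.

```python
import operator

# Hypothetical names; each criterion is evaluated as
# criterion(stored_z, fragment_z), mirroring the description.
Z_CRITERIA = {
    "greater": operator.gt,
    "less": operator.lt,
    "greater_equal": operator.ge,
    "less_equal": operator.le,
}


def z_test_and_update(depth_buffer, index, fragment_z, criterion="greater"):
    """Run the Z-test for one fragment and update the depth buffer on pass."""
    stored_z = depth_buffer[index]                       # step 122: read Z
    passed = Z_CRITERIA[criterion](stored_z, fragment_z)  # step 124: compare
    if passed:
        depth_buffer[index] = fragment_z                 # step 126: write Z
    return passed
```

With the "greater" criterion, a fragment passes when the stored Z-value is greater than its own, which corresponds to the fragment lying nearer to the viewing point under the assumption that smaller Z-values are nearer.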
In the alpha-blending pipeline 130, a color value 148 is read from the graphics memory 140 (step 132) and alpha-blended with the result of texture blending (step 134). The final color value 150 is written into the color buffer of the graphics memory 140 (step 150). The alpha-blending includes combining the color value RGBA of the current fragment with the read color value RGBA.
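The description does not fix a particular blending equation. As one possible illustration, the following sketch assumes the commonly used "source-over" weighting, in which the alpha value of the current fragment weights the fragment color against the color read from the color buffer. The 8-bit channel range and the function name are likewise assumptions.

```python
def alpha_blend(fragment_rgba, stored_rgba):
    """Blend two RGBA colors with 8-bit channels (0 to 255).

    Assumed equation (not taken from the description):
        out = (fragment * alpha + stored * (255 - alpha)) / 255
    applied to every channel, with rounding to the nearest integer.
    """
    alpha = fragment_rgba[3]
    return tuple(
        (f * alpha + s * (255 - alpha) + 127) // 255
        for f, s in zip(fragment_rgba, stored_rgba)
    )
```

Under this assumption, a fully opaque fragment replaces the stored color, and a fully transparent fragment leaves the stored color unchanged.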
As described above, the pipelines for graphics processing access the buffers of the graphics memory 140, that is, the texture buffer and the frame buffer with the depth buffer and the color buffer.
Referring to
A write bus 217 and a read bus 218 have a capacity to transfer the entire pixels of one block of a predetermined size. They transfer pixel data between the caches 212a to 212d and a 2K-bit static RAM (SRAM) pixel cache 215 that can store the burst pixel data of a plurality of blocks. The pixel cache 215 can be configured as a level-1 cache memory that stores one block of pixel data in each cache tag entry, unlike the caches 212a to 212d. Each pixel block in the pixel cache 215 corresponds to the data stored in one DRAM bank. The pixel cache 215 has a dedicated port for connection to an arithmetic-logic unit (ALU) 216 as well as two ports for input/output from/to the caches 212a to 212d. The pixel cache 215 functions to match the different speeds of the fast operating ALU 216 and the DRAM banks 211.
The ALU 216 receives inbound pixel data from an external circuit outside the 3D RAM 210 as one operand. It fetches another operand from the pixel cache 215. The ALU 216 implements the mathematical functions needed for data combining or blending. In particular, the ALU 216 enables the 3D RAM 210 to perform write-only operations instead of read-modify-write operations in the Z-test or alpha-blending.
The 3D RAM 210 is further provided with two video buffer/shift registers 213a and 213b. The buffer/shift registers buffer parallel inputs from each of the DRAM banks and convert them to a serial output to a multiplexer (MUX) 214. The MUX 214 multiplexes the serial pixel streams received from the shift registers into an image output.
Referring to
A new-Z value 240 and a new-RGBA value 242 are generated in a 3D graphics processor (not shown) and provided to a 3D RAM for Z 210a and a 3D RAM for color 210b in synchronization with a 100-MHz read-only clock signal. In the 3D RAM for Z 210a, the comparator 224 compares the new-Z value 240 with a Z-value read from the depth buffer 220 via the pixel cache 222 and provides the depth comparison result to the 3D RAM for color 210b via a pass_out pin 244 and a pass_in pin 246. If the Z-test passes, the new-Z value 240 is written into the depth buffer 220 via the pixel cache 222.
In the 3D RAM for color 210b, the blender 236 alpha-blends the new-RGBA value 242 with a color value read from the color buffer 230 via the pixel cache 234. The final color value is written into the color buffer 230 via the pixel cache 234. Upon completion of graphics processing of one block of burst pixel data, the pixel value written in the color buffer 230 is provided to a RAM digital-to-analog converter (RAMDAC) 42 via the video buffer 323.
Referring to
The row decoder 322 receives a row address and activates the memory area of the DRAM 320 corresponding to the row address. The column decoder 324 receives a column address and activates a bit position corresponding to the column address in the DRAM 320. The pre-fetch 328 reads data from the DRAM 320 in each address cycle and provides the data to the output buffer 332, so that data can be accessed several times faster than the clock speed of the DRAM 320. In the illustrated exemplary memory structure, burst pixel data is read and written alternately, thereby obviating the need for a cache memory.
A texture buffer and a frame buffer may reside in different memory areas on the same chip in the DRAM 320. The comparator 326 determines whether an input address refers to frame data or texture data by checking the input address provided to the row decoder 322. For example, in the case where the texture data is allocated to an upper memory area in the DRAM 320, if predetermined upper bits of the input address are all 0s, the comparator 326 determines that the input address refers to texture data, and the DRAM 320 allows the 3D graphics processor to read the texture data. On the other hand, if the input address refers to frame data for depth comparison and alpha-blending, the ALU 310 performs depth comparison and alpha-blending.
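The address check performed by the comparator 326 can be sketched as follows. The address width, the number of upper bits examined, and all names are hypothetical values chosen for illustration; the description specifies only that predetermined upper bits being all 0s indicate texture data.

```python
ADDRESS_BITS = 24  # hypothetical width of the input address
REGION_BITS = 2    # hypothetical number of upper bits that are checked


def refers_to_texture(address):
    """Return True if the checked upper bits of the address are all 0s."""
    upper_bits = address >> (ADDRESS_BITS - REGION_BITS)
    return upper_bits == 0


def select_operation(address):
    """Choose between a plain texture read and ALU-assisted frame access."""
    if refers_to_texture(address):
        return "texture_read"  # the graphics processor reads texture data
    return "frame_alu"         # the ALU performs depth comparison/blending
```

With these assumed widths, address 0x000100 selects the texture buffer, while address 0x400000 selects the frame buffer.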
A graphics system according to an exemplary embodiment of the present invention can be configured with a plurality of DDR SDRAMs illustrated in
Referring to
A 3D graphics processor 350 provides 256-bit pixel data including four pairs of a 32-bit Z-value and a 32-bit color value to the depth buffers 310a and the color buffers 310b in the eight memory chips 300a to 300h. The memory chips 300a to 300h can receive the next 256 bits directly without the suspension of pipeline operation. Upon input of fragment information with Z-values and color values, the 3D graphics processor 350 reads texture data from the texture buffers 320c and 320f of the memory chips 300a to 300h and performs texture mapping on the color values using the texture data.
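The 256-bit pixel data, made up of four pairs of a 32-bit Z-value and a 32-bit color value, can be illustrated with the following packing sketch. The bit layout (pair i in bits 64i to 64i+63, with the Z-value in the lower 32 bits of each pair) is an assumption made for illustration; the description does not specify the ordering.

```python
MASK32 = 0xFFFFFFFF


def pack_pixels(pairs):
    """Pack four (z, color) pairs of 32-bit values into one 256-bit integer.

    Assumed layout: pair i occupies bits [64*i, 64*i + 63], with the
    Z-value in the low 32 bits and the color value in the high 32 bits.
    """
    if len(pairs) != 4:
        raise ValueError("expected exactly four (z, color) pairs")
    word = 0
    for i, (z, color) in enumerate(pairs):
        pair_bits = (z & MASK32) | ((color & MASK32) << 32)
        word |= pair_bits << (64 * i)
    return word


def unpack_pixels(word):
    """Recover the four (z, color) pairs from a 256-bit integer."""
    return [
        ((word >> (64 * i)) & MASK32, (word >> (64 * i + 32)) & MASK32)
        for i in range(4)
    ]
```

Packing followed by unpacking returns the original four pairs, so each Z-value and color value can be routed to its depth buffer or color buffer.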
The depth comparison result of the ALU 310a in the memory chip for Z 300a is output to the memory chip 300b via a pass_out pin. The memory chip for color 300b receives the depth comparison result via a pass_in pin and performs alpha-blending on a 32-bit color value read from the color buffer 320d.
To be more specific, the ALU 310a in the memory chip for Z 300a compares an input 32-bit Z-value with a 32-bit Z-value read from the depth buffer 320a. If the Z-test passes, the input Z-value is written in the depth buffer 320a and a pass signal is output through the pass_out pin. If the Z-test fails, a failure signal is output through the pass_out pin. The pass_out pin is connected to the pass_in pin of the memory chip for color 300b.
The ALU 310b in the memory chip for color 300b alpha-blends an input 32-bit color value with a 32-bit color value read from the color buffer 320d. If the pass_in signal indicates pass, the ALU 310b stores the alpha-blended value in the color buffer 320d. If the pass_in signal indicates fail, the ALU 310b discards the alpha-blended value.
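The cooperation between the memory chip for Z and the memory chip for color can be sketched as follows. The buffers are modeled as Python lists, the "less than" depth criterion is an assumption, and the blending operation is passed in as a parameter because the description does not fix a blending equation. All function names are hypothetical.

```python
def z_chip_write(depth_buffer, addr, new_z):
    """Memory chip for Z: compare in place and return the pass_out signal."""
    passed = new_z < depth_buffer[addr]  # assumption: smaller Z is nearer
    if passed:
        depth_buffer[addr] = new_z
    return passed


def color_chip_write(color_buffer, addr, new_rgba, pass_in, blend):
    """Memory chip for color: blend in place, store the result only on pass."""
    blended = blend(new_rgba, color_buffer[addr])
    if pass_in:
        color_buffer[addr] = blended
    # On fail, the blended value is simply discarded.


def write_burst(depth_buffer, color_buffer, base_addr, pixels, blend):
    """Send a burst of (z, rgba) pairs; the processor only writes."""
    for offset, (z, rgba) in enumerate(pixels):
        addr = base_addr + offset
        pass_out = z_chip_write(depth_buffer, addr, z)
        color_chip_write(color_buffer, addr, rgba, pass_out, blend)
```

From the point of view of the graphics processor, `write_burst` is a write-only operation: the read, compare, and blend steps all take place inside the modeled memory chips.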
Since the Z-test and alpha-blending are performed on burst data, the speed of externally input data can be matched to a memory reference. Therefore, the ALUs 310a and 310b can operate without the suspension of pipeline operation.
For example, assuming that burst data requires depth comparison and alpha-blending taking processing time k and a setup latency needed to write after reading the burst data is m cycles, each pipeline stage needs (k+m) time for processing. Because the pipeline operation proceeds for the next pixel data for the m cycles, the latency m does not cause the suspension of the pipeline operation. That is, a 32-bit pixel value is output from one pipeline stage (k+m) cycles later and the writing operation of the burst data immediately follows. Therefore, no more than (2k+m) cycles are required for depth comparison and alpha-blending of one burst data.
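The cycle counts in the preceding example can be checked with a short calculation. The values k = 4 and m = 2 used in the usage note are hypothetical and serve only to make the arithmetic concrete.

```python
def stage_cycles(k, m):
    """Cycles needed by one pipeline stage: processing k plus latency m."""
    return k + m


def burst_cycle_bound(k, m):
    """Upper bound on cycles for one burst.

    The pixel value leaves the stage after (k + m) cycles and the write
    of the burst data follows immediately, taking k more cycles.
    """
    return stage_cycles(k, m) + k
```

For example, with k = 4 and m = 2, each stage needs 6 cycles and the depth comparison and alpha-blending of one burst require no more than 10 cycles, which equals 2k + m.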
In accordance with exemplary embodiments of the present invention as described above, because a frame memory and a texture memory reside in one address space, a cost-effective, efficient unified memory system can be realized. Furthermore, since burst data with a plurality of pixels are subject to depth comparison and alpha-blending at one time, exemplary implementations of the present invention are suitable for fast DRAM technology. In addition, according to an exemplary implementation of the present invention, an internal cache is not needed, thereby reducing hardware and improving performance.
While only a few exemplary implementations of the present invention have been shown and described with reference to certain embodiments thereof, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
91939/2004 | Nov 2004 | KR | national |