The present disclosure relates generally to circuits, and more specifically to graphics processors.
Graphics processors are widely used to render 2-dimensional (2-D) and 3-dimensional (3-D) images for various applications such as video games, graphics, computer-aided design (CAD), simulation and visualization tools, imaging, etc. A 3-D image may be modeled with surfaces, and each surface may be approximated with polygons (typically triangles). The number of triangles used to represent a 3-D image is dependent on the complexity of the surfaces as well as the desired resolution of the image and may be quite large, e.g., in the millions. Each triangle is defined by three vertices, and each vertex is associated with various attributes such as space coordinates, color values, and texture coordinates. Each attribute may have up to four components.
A graphics processor may perform various graphics operations to render an image. The graphics operations may include rasterization, stencil and depth tests, texture mapping, shading, etc. The image is composed of many triangles, and each triangle is composed of picture elements (pixels). The graphics processor renders each triangle by determining the values of the components of each pixel within the triangle.
A graphics processor may employ a shader core to perform certain graphics operations such as shading. Shading is a highly complex graphics operation involving lighting, shadowing, etc. The shader core may need to compute elementary functions, some of them transcendental, such as sine, cosine, reciprocal, logarithm, exponential, square root, and reciprocal square root. These elementary functions may be approximated with polynomial expressions, which may be evaluated with relatively simple instructions executed by an arithmetic logic unit (ALU). However, shader performance may suffer greatly from computing the elementary functions in this manner using an ALU.
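The cost of evaluating an elementary function on an ALU may be illustrated with a minimal sketch. The example below approximates sine with a degree-7 polynomial evaluated by Horner's rule, so that every step is a multiply or a multiply-add. The Taylor coefficients, the input range, and the names `SIN_COEFFS`, `sin_poly`, and `alu_op_count` are assumptions chosen for illustration and are not taken from the disclosure; a practical implementation may use different coefficients and a range-reduction step.

```python
import math

# Taylor coefficients of sin(x) = x * (c0 + c1*x^2 + c2*x^4 + c3*x^6).
# Illustrative assumption: real shaders may use minimax coefficients instead.
SIN_COEFFS = (1.0, -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0)


def sin_poly(x):
    """Approximate sin(x) for |x| <= pi/2 using only multiplies and adds."""
    x2 = x * x                          # one multiply
    acc = SIN_COEFFS[-1]
    # Horner's rule: each loop iteration is a single multiply-add.
    for c in reversed(SIN_COEFFS[:-1]):
        acc = acc * x2 + c
    return acc * x                      # final multiply


def alu_op_count():
    """Number of ALU operations sin_poly issues for one component."""
    return 1 + (len(SIN_COEFFS) - 1) + 1


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, math.pi / 2):
        print(x, sin_poly(x), math.sin(x))
    print("ALU operations per component:", alu_op_count())
```

In this sketch a single sine costs several dependent ALU operations per component, which is the kind of overhead a dedicated elementary function unit is meant to avoid.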
Graphics processors capable of efficiently performing arithmetic operations and computing elementary functions are described herein. The terms “operation” and “function” are sometimes used interchangeably. A graphics processor comprises a shader core and possibly other units. The shader core has at least one ALU that can perform arithmetic operations and at least one elementary function unit that can compute elementary functions. In some embodiments, the ALU(s) and elementary function unit(s) are arranged and interconnected such that they can operate in parallel on instructions for the same or different threads to improve throughput. For example, the ALU(s) may execute one instruction for one thread, and the elementary function unit(s) may concurrently execute another instruction for another thread. These threads may be for the same or different graphics applications.
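The benefit of letting the ALU(s) and the elementary function unit(s) work on different threads at the same time may be sketched with a toy cycle count. The model below assumes that every instruction takes one cycle, that each thread issues in order, and that there is one ALU core and one elementary function core; the function name `run_cycles` and the unit labels `"alu"` and `"ef"` are hypothetical.

```python
def run_cycles(threads, parallel):
    """Count cycles needed to drain all threads.

    threads:  list of instruction lists, each entry "alu" or "ef"
    parallel: if True, the ALU core and the EF core may each accept one
              instruction (from different threads) in the same cycle;
              if False, only one instruction issues per cycle.
    """
    pcs = [0] * len(threads)            # per-thread program counters
    cycles = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        busy = set()                    # units already claimed this cycle
        for i, instrs in enumerate(threads):
            if pcs[i] >= len(instrs):
                continue                # thread has finished
            unit = instrs[pcs[i]]
            if unit in busy or (busy and not parallel):
                continue                # unit taken, or serial issue only
            busy.add(unit)
            pcs[i] += 1
        cycles += 1
    return cycles


thread_a = ["alu", "ef", "alu", "ef"]
thread_b = ["ef", "alu", "ef", "alu"]

if __name__ == "__main__":
    print("serial:  ", run_cycles([thread_a, thread_b], parallel=False))
    print("parallel:", run_cycles([thread_a, thread_b], parallel=True))
```

With these two complementary threads the overlapped case finishes in half the cycles of the serial case; real instruction mixes are unlikely to interleave this cleanly.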
In other embodiments, the shader core has fewer elementary function units than ALUs, e.g., four ALUs and a single elementary function unit. The four ALUs may perform an arithmetic operation on (1) up to four components of an attribute for one pixel or (2) one component of an attribute for up to four pixels. The single elementary function unit may operate on one component of one pixel at a time. The use of a single elementary function unit may reduce cost (since elementary function units are more complex and costly than ALUs) while still providing good performance (since elementary functions have lower average usage than arithmetic operations).
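The cost and performance tradeoff may be illustrated with a rough estimate. The sketch below assumes that each instruction covers four values, that an ALU instruction completes in one cycle on four ALUs, that each elementary function unit handles one value per cycle, and that ALU and elementary function instructions do not overlap. These assumptions, the function name `estimated_cycles`, and the 90/10 instruction mix are illustrative only and do not come from the disclosure.

```python
import math


def estimated_cycles(alu_instrs, ef_instrs, num_ef_units, lanes=4):
    """Rough cycle estimate for a shader program.

    Each EF instruction needs ceil(lanes / num_ef_units) passes because
    every EF unit processes one value per pass (an assumption).
    """
    ef_passes = math.ceil(lanes / num_ef_units)
    return alu_instrs + ef_instrs * ef_passes


if __name__ == "__main__":
    # Hypothetical mix: 90 arithmetic and 10 elementary function instructions.
    four_units = estimated_cycles(90, 10, num_ef_units=4)
    one_unit = estimated_cycles(90, 10, num_ef_units=1)
    print(four_units, one_unit, one_unit / four_units)
```

Under these assumptions, removing three of four elementary function units lengthens the hypothetical program by 30 percent. If the elementary function unit runs concurrently with the ALUs on another thread, the penalty may be smaller.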
Various aspects and embodiments of the invention are described in further detail below.
The features and nature of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or designs.
A graphics processor 120 receives the threads from graphics applications 110a through 110n and performs the tasks indicated by these threads. In the embodiment shown in
Graphics processor 120 may include other processing and control units, engines, and memories. For example, graphics processor 120 may include one or more additional engines that perform triangle setup, rasterization, stencil and depth tests, attribute setup, pixel interpolation, etc. The various graphics operations described herein are known in the art. The additional engine(s) may be coupled between graphics applications 110 and shader core 130 or may be coupled to shader core 130. Graphics processor 120 may implement a software interface such as Open Graphics Library (OpenGL), Direct3D, etc. OpenGL is described in a document entitled “The OpenGL® Graphics System: A Specification,” Version 2.0, dated Oct. 22, 2004, which is publicly available.
A main memory 160 is a large, slower memory located further away (e.g., off-chip) from graphics processor 120. Main memory 160 stores data and instructions that may be loaded into the caches within cache memory system 150.
In many cases, it is desirable to operate on groups of pixels in an image to be rendered. The group size may be selected based on various factors such as hardware requirements, performance, etc. A group size of 2×2 may provide a good tradeoff between the various factors. Processing on four pixels in a 2×2 grid may be performed in several manners.
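Two ways of feeding a 2×2 group to four ALU lanes may be sketched as follows: the four lanes may hold the four components of one pixel, or one component of each of the four pixels. The pixel values and the function names `component_parallel` and `pixel_parallel` are hypothetical and serve only to show which operands share an instruction.

```python
# A 2x2 group: each pixel carries one four-component attribute (R, G, B, A).
quad = [
    [0.25, 0.50, 0.75, 1.0],    # pixel 0
    [0.125, 0.25, 0.00, 1.0],   # pixel 1
    [0.75, 0.75, 0.25, 1.0],    # pixel 2
    [1.00, 0.50, 0.50, 1.0],    # pixel 3
]


def component_parallel(quad, op):
    """Each instruction's four lanes hold the components of one pixel."""
    issued = [list(pixel) for pixel in quad]
    results = [[op(v) for v in lanes] for lanes in issued]
    return results, issued


def pixel_parallel(quad, op):
    """Each instruction's four lanes hold one component of all four pixels."""
    issued = [list(lanes) for lanes in zip(*quad)]           # component-major
    per_component = [[op(v) for v in lanes] for lanes in issued]
    results = [list(pixel) for pixel in zip(*per_component)]  # back to pixels
    return results, issued


if __name__ == "__main__":
    double = lambda v: v * 2.0
    res_c, lanes_c = component_parallel(quad, double)
    res_p, lanes_p = pixel_parallel(quad, double)
    print("first component-parallel operand group:", lanes_c[0])
    print("first pixel-parallel operand group:    ", lanes_p[0])
    print("same results:", res_c == res_p)
```

Both orderings produce the same values; they differ only in how operands are grouped into instructions.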
For pixel-parallel processing in
Within shader core 130a, a multiplexer (Mux) 410 receives threads from graphics applications 110a through 110n and provides these threads to a thread scheduler and context register 420. Thread scheduler 420 performs various functions to schedule and manage execution of threads. Thread scheduler 420 determines whether to accept new threads, creates a register map table for each accepted thread, and allocates resources to the threads. The register map table indicates the mapping from logical register addresses to physical register file addresses. For each thread, thread scheduler 420 determines whether resources required for that thread are ready, pushes the thread into a sleep queue if any resource (e.g., instruction, register file, or texture read) for the thread is not ready, and moves the thread from the sleep queue to an active queue when all of the resources are ready. Thread scheduler 420 interfaces with a load control unit 460 in order to synchronize the resources for the threads.
Thread scheduler 420 also manages execution of threads. Thread scheduler 420 fetches the instruction(s) for each thread from an instruction cache 422, decodes each instruction if necessary, and performs flow control for the thread. Thread scheduler 420 selects active threads for execution, checks for read/write port conflicts among the selected threads and, if there is no conflict, sends instruction(s) for one thread into a processing core 430 and sends instruction(s) for another thread to load control unit 460. Thread scheduler 420 maintains a program/instruction counter for each thread and updates this counter as instructions are executed or program flow is altered. Thread scheduler 420 also issues requests to fetch missing instructions and removes threads that are completed.
Instruction cache 422 stores instructions for the threads. These instructions indicate specific operations to be performed for each thread. Each operation may be an arithmetic operation, an elementary function, a memory access operation, etc. Instruction cache 422 may be loaded with instructions from cache memory system 150 and/or main memory 160, as needed, via load control unit 460.
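The thread admission and queueing behavior described above may be sketched in software as follows. The class names `Thread` and `ThreadScheduler`, the resource label `"texture_read"`, and the policy of allocating physical registers from a free list are assumptions made for illustration; the disclosure does not specify how the register map table is populated.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Thread:
    tid: int
    # Resources the thread is still waiting on (hypothetical labels).
    pending: set = field(default_factory=set)
    # Register map table: logical register index -> physical register address.
    reg_map: dict = field(default_factory=dict)


class ThreadScheduler:
    def __init__(self, num_physical_regs):
        self.free_regs = deque(range(num_physical_regs))
        self.active = deque()   # threads whose resources are all ready
        self.sleep = deque()    # threads waiting on at least one resource

    def accept(self, thread, num_logical_regs):
        """Accept a thread only if enough physical registers are free."""
        if num_logical_regs > len(self.free_regs):
            return False
        thread.reg_map = {
            logical: self.free_regs.popleft()
            for logical in range(num_logical_regs)
        }
        (self.sleep if thread.pending else self.active).append(thread)
        return True

    def resource_ready(self, tid, resource):
        """Mark a resource ready and wake the thread if nothing is pending."""
        for thread in list(self.sleep):
            if thread.tid == tid:
                thread.pending.discard(resource)
                if not thread.pending:
                    self.sleep.remove(thread)
                    self.active.append(thread)


if __name__ == "__main__":
    sched = ThreadScheduler(num_physical_regs=8)
    sched.accept(Thread(0), 4)
    sched.accept(Thread(1, pending={"texture_read"}), 4)
    print([t.tid for t in sched.active], [t.tid for t in sched.sleep])
    sched.resource_ready(1, "texture_read")
    print([t.tid for t in sched.active], [t.tid for t in sched.sleep])
```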
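The per-cycle selection step may be sketched as follows: the scheduler walks the active threads, sends at most one instruction to the processing core and at most one to the load control unit, skips any instruction whose register ports collide with one already chosen, and advances the program counter of each issued thread. The `Instr` fields, the bank-based conflict test, and the destination labels `"core"` and `"load"` are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str               # "core" or "load" (hypothetical labels)
    read_banks: frozenset   # register banks read by the instruction
    write_banks: frozenset  # register banks written by the instruction


def ports_conflict(a, b):
    """Assume a conflict when two instructions share a read or write bank."""
    return bool((a.read_banks & b.read_banks) or (a.write_banks & b.write_banks))


def issue(active, programs, pcs):
    """Select up to one instruction per destination for this cycle.

    active:   thread ids in priority order
    programs: thread id -> list of Instr
    pcs:      thread id -> program counter, updated in place
    Returns a dict mapping destination -> issuing thread id.
    """
    issued = {}
    for tid in active:
        pc = pcs[tid]
        if pc >= len(programs[tid]):
            continue                      # thread has completed
        instr = programs[tid][pc]
        if instr.dest in issued:
            continue                      # destination already taken
        if any(ports_conflict(instr, other) for _, other in issued.values()):
            continue                      # port conflict with a chosen thread
        issued[instr.dest] = (tid, instr)
    for tid, _ in issued.values():
        pcs[tid] += 1                     # advance only the issued threads
    return {dest: tid for dest, (tid, _) in issued.items()}


if __name__ == "__main__":
    add = Instr("core", frozenset({0}), frozenset({1}))
    load_bad = Instr("load", frozenset({0}), frozenset({3}))  # shares bank 0
    load_ok = Instr("load", frozenset({2}), frozenset({3}))
    programs = {0: [add], 1: [load_bad], 2: [load_ok]}
    pcs = {0: 0, 1: 0, 2: 0}
    print(issue([0, 1, 2], programs, pcs), pcs)
```

In the usage shown, thread 1 is passed over because its load reads the same bank as the arithmetic instruction of thread 0, and thread 2 is issued to the load path instead.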
In the embodiment shown in
In the embodiment shown in
Load control unit 460 controls the flow of data and instructions for various units within shader core 130a. Load control unit 460 interfaces with cache memory system 150 and loads instruction cache 422, a constant buffer 432, and register file banks/output buffer 470 with data and instructions from cache memory system 150. Load control unit 460 also saves the data in output buffer 470 to cache memory system 150. Load control unit 460 also provides instructions to texture engine 140.
Constant buffer 432 stores constant values used by ALU core 440. Output buffer 470 stores temporary results as well as final results from ALU core 440 and elementary function core 450 for threads. A demultiplexer (Demux) 480 receives the final results for the executed threads from output buffer 470 and provides these results to the graphics applications.
In the embodiment shown in
Elementary function units are generally more complex than ALUs. Even with cost-effective implementations, elementary function units typically occupy much larger circuit area than ALUs and are thus more expensive than ALUs. To achieve high shader throughput for all shader instructions, the number of elementary function units may be selected to match the number of ALUs, which is four in the embodiment shown in
ALU core 540 may be a single quad ALU or four scalar ALUs. ALU core 540 couples to thread scheduler 520, constant buffer 532, and output buffer 570 via one set of buses. Elementary function core 550 may be composed of one, two or three (L) elementary function units that can compute an elementary function for either L components of one pixel or one component of L pixels. Elementary function core 550 couples to thread scheduler 520, constant buffer 532, and output buffer 570 via another set of buses. In the embodiment shown in
In the embodiment shown in
ALU core 640 may be a single quad ALU or four scalar ALUs. ALU core 640 couples to thread scheduler 620, constant buffer 632, and output buffer 670 via a set of buses. Elementary function core 650 may be composed of a single elementary function unit that can compute an elementary function for one component of one pixel at a time. In the embodiment shown in
Instructions for elementary functions (or EF instructions) may be generated in an appropriate manner given the design as well as the placement of elementary function core 650 within shader core 130c. If the number of EF units is equal to the number of ALU units (e.g., as shown in
The shader compiler may include synchronization (sync) bits 810 in instructions 800 as appropriate.
In the embodiment shown in
In general, a shader core may include any number of processing, control and memory units, which may be arranged in any manner. These units may also be referred to by other names. For example, a load control unit may also be called an input/output (I/O) interface unit. In some embodiments, a shader core may include fewer elementary function units than ALUs to reduce cost with little degradation in performance. In other embodiments, a shader core may include a separate ALU core and elementary function core that can operate on different instructions for the same or different graphics applications in parallel. The ALUs and elementary function units may be implemented with various designs known in the art. A shader core may also interface with external units via synchronous and/or asynchronous interfaces.
The graphics processors and shader cores described herein may be used for wireless communication, computing, networking, personal electronics, etc. An exemplary use of a graphics processor for wireless communication is described below.
Wireless device 700 is capable of providing bi-directional communication via a receive path and a transmit path. On the receive path, signals transmitted by base stations are received by an antenna 712 and provided to a receiver (RCVR) 714. Receiver 714 conditions and digitizes the received signal and provides samples to a digital section 720 for further processing. On the transmit path, a transmitter (TMTR) 716 receives data to be transmitted from digital section 720, processes and conditions the data, and generates a modulated signal, which is transmitted via antenna 712 to the base stations.
Digital section 720 includes various processing and interface units such as, for example, a modem processor 722, a video processor 724, an application processor 726, a display processor 728, a controller/processor 730, a graphics processor 740, and an external bus interface (EBI) 760. Modem processor 722 performs processing for data transmission and reception (e.g., encoding, modulation, demodulation, and decoding). Video processor 724 performs processing on video content (e.g., still images, moving videos, and moving texts) for video applications such as camcorder, video playback, and video conferencing. Application processor 726 performs processing for various applications such as multi-way calls, web browsing, media player, and user interface. Display processor 728 performs processing to facilitate the display of videos, graphics, and texts on a display unit 780. Controller/processor 730 may direct the operation of various processing and interface units within digital section 720.
Graphics processor 740 performs processing for graphics applications and may be implemented as described above. For example, graphics processor 740 may include shader core/processor 130 and texture engine 140 in
Digital section 720 may be implemented with one or more digital signal processors (DSPs), micro-processors, reduced instruction set computers (RISCs), etc. Digital section 720 may also be fabricated on one or more application specific integrated circuits (ASICs) or some other type of integrated circuits (ICs).
The graphics processors and shader cores/processors described herein may be implemented in various hardware units. For example, the graphics systems and shader cores/processors may be implemented in ASICs, digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and other electronic units.
Certain portions of the graphics processors may be implemented in firmware and/or software. For example, the thread scheduler and/or load control unit may be implemented with firmware and/or software modules (e.g., procedures, functions, and so on) that perform the functions described herein. The firmware and/or software codes may be stored in a memory (e.g., memory 750 or 770 in
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Entry |
---|
Wynn, Chris; “nVIDIA OpenGL Vertex Programming on Future-Generation GPUs;” May 8, 2004; NVIDIA Corporation; pp. 1-97. |
Lindholm et al.; “A User-Programmable Vertex Engine;” Aug. 12-17, 2001; ACM SIGGRAPH; pp. 149-158. |
Kilgariff et al.; “Chapter 30, The GeForce 6 Series GPU Architecture;” GPU Gems 2 Copyright 2005, pp. 471-491. |
International Search Report, PCT/US07/069803—International Search Authority—European Patent Office, Dec. 11, 2007. |
Bjorke K: “High quality filtering” Chapter 24 in Book ‘GPU Gems’, [Online] 2004, XP002534488 Retrieved from the Internet: URL:http://http.developer.nvidia.com/GPUGems/gpugems—ch24.html> [retrieved on Jun. 29, 2009]. |
Blamer K et al.: “A Single Chip Multimedia Video Processor,” Custom Integrated Circuits Conference, pp. 91-94, Proceedings of the IEEE (May 1994). |
Segal, M. et al.: “The OpenGL Graphics System: A Specification,” pp. 1-368. Version 2.0 (Oct. 22, 2004). |
Waldspurger et al., Register Relocation: Flexible Contexts for Multithreading, International Symposium on Computer Architecture, Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993. |
Written Opinion, PCT/US07/069803—International Searching Authority—European Patent Office—Dec. 11, 2007. |
Deering M. et al: “The SAGE graphics architecture” Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH'02), Jul. 23-26, 2002, San Antonio, Texas, USA, 2002, pp. 683-692, XP002534489. |
Hadwiger M. et al: “Hardware-accelerated high-quality filtering on PC hardware” Proceedings of 2001 Conference on Vision, Modelling and Visualization, Nov. 21-23, 2001, Stuttgart, Germany, [Online] 2001, XP002534490 Retrieved from the Internet: URL:http://wwwvis.informatik.uni-stuttgart.de/vmv01/d1/papers/8.pdf > [retrieved on Jun. 29, 2009]. |
Hopf Mi et al: “Accelerating 3D convolution using graphics hardware” Visualization '99. Proceedings San Francisco, CA, USA Oct. 24-29, 1999. Piscataway, NJ. USA, IEEE, US, Oct. 29, 1999, pp. 471-664, XP031385575 ISBN: 978-0-7803-5897-3. |
Novasad J: “Advanced high quality filtering” Chapter 27 in Book ‘GPU-Gems 2’, [Online]. 2005, XP002534486 Retrieved from the Internet:. URL:http://http.developer.nvidia.com/GPUGe ms2/gpugems2—chapter27.html> [retrieved on Jun. 29, 2009]. |
Owens J.D et al: “A survey of general-purpose computation on graphics hardware” Computer Graphics Forum, vol. 26, No. 1. Mar. 2007, pp. 80-113, XP002534491. |
Sigg C. et al: “Fast third-order texture filtering” Chapter 20 in Book ‘GPU Gems 2’, [Online] 2005. XP002534487 Retrieved from the Internet: URL:http://http.developer.nvidia.com/GPUGe ms2/gpugems2—chapter20.html> [retrieved on Jun. 29, 2009]. |
Akkary, H. and Driscoll, M. A. 1998. A dynamic multithreading processor. In Proceedings of the 31st Annual ACM/IEEE international Symposium on Microarchitecture (Dallas, Texas, United States). International Symposium on Microarchitecture. IEEE Computer So. 1998, pp. 226-236. |
Kenji Watanabe, Wanming Chu, Yamin Li, “Exploiting Java Instruction/Thread Level Parallelism with Horizontal Multithreading,” Australasian Computer Systems Architecture Conference, p. 122, 6th Australasian Computer Systems Architecture Conference (AustCSA.) IEEE 2001, pp. 122-129. |
Translation of Office Action in Japanese application 2009-511215 corresponding to U.S. Appl. No. 11/435,454, citing WO05086090, US20030080959 and JP2001222712 dated Feb. 22, 2011. |
Ying Chen, Resit Sendag, David J. Lilja, “Using Incorrect Speculation to Prefetch Data in a Concurrent Multithreaded Processor,” Parallel and Distributed Processing Symposium, International, p. 76b, International Parallel and Distributed Processing Sympos., IEEE 2003, pp. 1-9. |
Hiroaki Hirata, and 4 others, “An elementary processor Architecture with Parallel Instruction Issuing from Multiple Threads,” Information Processing Society article magazine, Information Processing Society of Japan, 1993, vol. 34, No. 4, pp. 595-605. |
Sohn, et al., “A 155-mW 50-Mvertices/s Graphics Processor With Fixed-Point Programmable Vertex Shader for Mobile Applications,” IEEE Journal of Solid-State Circuits, vol. 41, No. 5, May 2006, pp. 1081-1091. |
Onoue, M., et al., “3D Image Handbook”, 1st ed., Asakura Publishing Co., Ltd. (Kunizou Asakura), Feb. 20, 2006, pp. 152-170. |
Number | Date | Country |
---|---|---|
20070273698 A1 | Nov 2007 | US |