Many types of computer systems include display devices to display images, video streams, and data. Accordingly, these systems typically include functionality for generating and/or manipulating images and video information. In digital imaging, the smallest item of information in an image is called a “picture element” and more generally referred to as a “pixel.” To represent a specific color on a typical electronic display, each pixel can have three values, one each for the amounts of red, green, and blue present in the desired color. Some formats for electronic displays may also include a fourth value, called alpha, which represents the transparency of the pixel. This format is commonly referred to as ARGB or RGBA. Another format for representing pixel color is YCbCr, where Y corresponds to the luminance, or brightness, of a pixel and Cb and Cr correspond to two color-difference chrominance components, representing the blue-difference (Cb) and red-difference (Cr).
Display devices are used to view images produced by digital processing devices such as desktop computers, laptop computers, televisions, mobile phones, smart phones, tablet computers, digital cameras, and other devices. A wide variety of technologies including cathode-ray tubes (CRTs), liquid crystal displays (LCDs), plasma display panels, and organic light emitting diodes (OLEDs) are used to implement display devices. Consequently, different display devices are able to represent colors within different gamuts. As used herein, the term “gamut” refers to a complete subset of colors that can be accurately represented by a particular display device.
Furthermore, the same color, as perceived by the human eye, can be represented by different numerical values in different gamuts. For example, the RGB color system is commonly used in computer graphics to represent colors of pixels in images. The same color might be represented by different RGB values in different gamuts. Consequently, gamut mapping is used to map color values between different gamuts so that the perceived colors generated using the color values are the same in different devices. However, gamut mapping typically incurs some amount of interpolation error, and techniques for reducing interpolation error suffer from various shortcomings.
The advantages of the methods and mechanisms described herein may be better understood by referring to the following description in conjunction with the accompanying drawings, in which:
In the following description, numerous specific details are set forth to provide a thorough understanding of the methods and mechanisms presented herein. However, one having ordinary skill in the art should recognize that the various implementations may be practiced without these specific details. In some instances, well-known structures, components, signals, computer program instructions, and techniques have not been shown in detail to avoid obscuring the approaches described herein. It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements.
Various systems, apparatuses, and methods for reducing three dimensional (3D) lookup table (LUT) interpolation error while minimizing on-chip storage are disclosed herein. A display controller receives source pixel data encoded in a first gamut. In one implementation, the display controller uses a 3D LUT to convert the source pixel data from the first gamut to a second gamut associated with a target display. However, due to the finite nature of the storage capacity of the memory structures containing the 3D LUT, interpolation error is introduced when converting from the first gamut to the second gamut. The interpolation error can be reduced by increasing the number of mapping points (i.e., vertices) stored in the 3D LUT, but this comes at a cost of increased on-chip storage.
In various implementations, one of the objectives of the techniques described herein is to reduce interpolation error without significantly increasing the on-chip storage requirements of the 3D LUT. Accordingly, in one implementation, rather than significantly increasing the number of vertices and the size of the LUT, the display controller stores mappings for centroids of the sub-cubes of the 3D representation of the gamut translation space. As used herein, the term “centroid” is defined as the geometric center of a sub-cube or other geometric shape (e.g., tetrahedron or otherwise). For example, the centroid of a cube is the point within the cube that is equidistant from each face of the cube.
In one implementation, when an input pixel is received by the display controller, the display controller identifies which sub-cube contains the input pixel. The mapping of the centroid of the sub-cube is retrieved along with mappings of the corresponding vertices of the sub-cube. The display controller uses the centroid mapping and vertex mappings to convert the input pixel from the first gamut to the second gamut. By using the centroid rather than only vertices of the sub-cube, the interpolation error is reduced when converting the input pixel from the first gamut to the second gamut. In one implementation, the display controller performs tetrahedral interpolation to convert the input pixel from the first gamut to the second gamut using the vertices and centroid of the sub-cube which contains the input pixel. In other implementations, the display controller performs other types of interpolation, including, but not limited to prism interpolation, trilinear interpolation, tricubic interpolation, radial interpolation, or any combination thereof.
In one implementation, rather than adding entries to the lookup table for centroid mappings of all sub-cubes of the 3D-representation of the gamut translation space, the display controller stores centroid mappings for only those sub-cubes which have an interpolation error greater than a threshold. In this way, for pixel locations within sub-cubes that have an interpolation error less than or equal to the threshold, traditional interpolation will be used to convert these pixel locations to the second gamut. Traditional interpolation uses four vertices of the sub-cube to interpolate to a second gamut representation for an interior point that falls within the tetrahedra defined by those four vertices. For pixel locations within sub-cubes that have an interpolation error greater than the threshold, interpolation using the centroid and three vertices of the sub-cube is performed. The smaller size of the tetrahedra bounded by the centroid and three vertices of the sub-cube results in a reduced gamut-conversion error as compared to performing traditional interpolation using four vertices of the sub-cube. This helps to mitigate the gamut-conversion error that is introduced when converting from the first gamut to the second gamut for pixels within sub-cubes that have more than a threshold amount of interpolation error.
Referring now to
In one implementation, processor 105A is a general purpose processor, such as a central processing unit (CPU). In one implementation, processor 105N is a data parallel processor with a highly parallel architecture. Data parallel processors include graphics processing units (GPUs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and so forth. In some implementations, processors 105A-N include multiple data parallel processors. In one implementation, processor 105N is a GPU which provides a plurality of pixels to display controller 150 to be driven to display 155. In this implementation, processor 105N can convert pixels of a source frame from a first gamut to a second gamut associated with display 155. Alternatively, in another implementation, display controller 150 converts pixels of the source frame from the first gamut to the second gamut associated with display 155 and may include color assignment, scaling, alpha blending, and/or other functions. In this implementation, display controller 150 includes three-dimensional (3D) lookup table (LUT) 152 and any combination of hardware (e.g., control logic, processing elements) and/or software for converting pixels of the source frame from the first gamut to the second gamut. While 3D LUT 152 is shown as being located within display controller 150, it should be understood that in other implementations, 3D LUT 152 can be located elsewhere in system 100.
In one implementation, 3D LUT 152 includes the components illustrated in the expanded box shown below display controller 150. In other implementations, 3D LUT 152 includes other components in other suitable arrangements. In one implementation, the logic for converting input pixels from a first gamut to a second gamut includes address decoder 165, memory device(s) 180 storing pixel component values in the second gamut, and interpolation unit 190. Address decoder 165 receives the pixel components 160 of an input pixel in the first gamut. In one implementation, pixel components 160 are N-bit red, green, and blue values for the input pixel, where N is a positive integer. Address decoder 165 conveys pixel components 170 to the appropriate memory device(s) 180. In one implementation, pixel components 170 are the M most significant bits (MSBs) of pixel components 160, where M is a positive integer. Pixel components 170 are used to identify the sub-cube or other geometric shape which bounds the input pixel in a 3D representation of the first gamut space. A lookup of memory device(s) 180 is performed using pixel components 170 to retrieve values of pixel components in a second gamut for vertices and one or more interior points 185 of this sub-cube or other geometric shape.
The retrieved pixel component values (in the second gamut) of vertices and interior point(s) 185 are conveyed to interpolation unit 190. Also, address decoder 165 conveys pixel components 175 to interpolation unit 190. In one implementation, pixel components 175 are the (N−M) least significant bits (LSBs) of pixel components 160. Interpolation unit 190 performs interpolation using vertices and interior point(s) 185 and pixel components 175 to generate the pixel components 195 which represent the input pixel in the second gamut. Additional details on the gamut conversion process will be provided throughout the remainder of this disclosure.
Memory controller(s) 130 are representative of any number and type of memory controllers accessible by processors 105A-N and I/O devices (not shown) coupled to I/O interfaces 120. Memory controller(s) 130 are coupled to any number and type of memory devices(s) 140. Memory device(s) 140 are representative of any number and type of memory devices. For example, the type of memory in memory device(s) 140 includes Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), NAND Flash memory, NOR flash memory, Ferroelectric Random Access Memory (FeRAM), or others.
I/O interfaces 120 are representative of any number and type of I/O interfaces (e.g., peripheral component interconnect (PCI) bus, PCI-Extended (PCI-X), PCIE (PCI Express) bus, gigabit Ethernet (GBE) bus, universal serial bus (USB)). Various types of peripheral devices (not shown) are coupled to I/O interfaces 120. Such peripheral devices include (but are not limited to) displays, keyboards, mice, printers, scanners, joysticks or other types of game controllers, media recording devices, external storage devices, network interface cards, and so forth. Network interface 135 is used to receive and send network messages across a network.
In various implementations, computing system 100 is a computer, laptop, mobile device, game console, server, streaming device, wearable device, or any of various other types of computing systems or devices. It is noted that the number of components of computing system 100 varies from implementation to implementation. For example, in other implementations, there are more or fewer of each component than the number shown in
Turning now to
Network 210 is representative of any type of network or combination of networks, including wireless connection, direct local area network (LAN), metropolitan area network (MAN), wide area network (WAN), an Intranet, the Internet, a cable network, a packet-switched network, a fiber-optic network, a router, storage area network, or other type of network. Examples of LANs include Ethernet networks, Fiber Distributed Data Interface (FDDI) networks, and token ring networks. In various implementations, network 210 further includes remote direct memory access (RDMA) hardware and/or software, transmission control protocol/internet protocol (TCP/IP) hardware and/or software, router, repeaters, switches, grids, and/or other components.
Server 205 includes any combination of software and/or hardware for rendering video/image frames and/or encoding the frames into a bitstream. In one implementation, server 205 converts input video/image frames from a first gamut into a second gamut which is associated with display 250. Server 205 includes one or more processors which execute any number of software applications. Server 205 also includes network communication capabilities, one or more input/output devices, and/or other components. The processor(s) of server 205 include any number and type (e.g., graphics processing units (GPUs), CPUs, DSPs, FPGAs, ASICs) of processors. The processor(s) are coupled to one or more memory devices storing program instructions executable by the processor(s). Similarly, client 215 includes any combination of software and/or hardware for decoding a bitstream and driving frames to display 250. In one implementation, client 215 includes one or more software applications executing on one or more processors of one or more computing devices. Client 215 can be a computing device, game console, mobile device, streaming media player, or other type of device.
Referring now to
In various implementations, computing system 300 executes any of various types of software applications. In one implementation, as part of executing a given software application, a host CPU (not shown) of computing system 300 launches kernels to be performed on GPU 305. Command processor 335 receives kernels from the host CPU and issues kernels to dispatch unit 350 for dispatch to compute units 355A-N. Threads within kernels executing on compute units 355A-N read and write data to global data share 370, L1 cache 365, and L2 cache 360 within GPU 305. Although not shown in
Referring now to
In a 3D LUT associated with the lattice shown in
Referring now to
In one implementation, instead of incrementing the value of m and causing the number of vertices per linear dimension to nearly double, a centroid (e.g. centroid 510A of sub-cube 505) can be added to each sub-cube of the 3D LUT. As previously noted, the change from having 17×17×17 vertices in the 3D pixel component space to 33×33×33 vertices increases the number of vertices from 4913 to 35,937. On the other hand, a 17×17×17 3DLUT with entries for 4913 distinct vertices has 16×16×16=4096 sub-cubes. Adding centroids to each sub-cube results in 4913+4096=9009 vertices, which is less than double the original number of vertices In comparison, increasing the vertex density from 17×17×17 to 33×33×33 increases the total number of vertices in the 3D LUT by more than 7 times. Accordingly, adding centroids to each sub-cube can reduce interpolation error while resulting in a smaller increase to the size of the 3D LUT.
Sub-cube 505 shows how the additional centroid 510A is used when performing tetrahedral interpolation. Tetrahedral interpolation involves the look up of the 4 vertices of a tetrahedron for interpolation. For sub-cubes without a centroid, each sub-cube is divided into 6 tetrahedra and 4 vertices of one of the 6 tetrahedra are used for interpolation. When a sub-cube has a centroid, such as sub-cube 505 with centroid 510A, each sub-cube may be divided into 12 tetrahedra and 4 vertices of one of the 12 tetrahedra are used for interpolation. The centroid 510A provides a more accurate interpolation point for interpolation than using four vertices of the sub-cube because the smaller size of the tetrahedra results in a shorter distance from interior points to the vertices. This shorter distance reduces any interpolation error that is generated. Accordingly, using the centroid 510A along with vertices 510B-D when performing interpolation results in a smaller interpolation error as compared to using four vertices of the sub-cube 505.
Turning now to
To convert the source input pixel 605 from the source gamut to the target gamut, the processing unit retrieves mappings for three vertices 601, 602, and 603 and the mapping for centroid 604 from one or more 3D LUTs. The mappings store pixel component values for at least the target gamut. The vertices 601-603 and centroid 604 can also be referred to as the points A, B, C, D, respectively, and the associated pixel component values in the target gamut can be referred to as OA, OB, OC, OD, respectively. Also, the source input pixel 605 can also be referred to as point I.
The interpolated output value for a source pixel that maps to the input point 605 (also referred to as the input point I) is given by:
OI=(VA*OA+VB*OB+VC*OC+VD*OD)/V
where V is the volume of the tetrahedron 600 and Vi(i=A, B, C, D) is the volume for a sub-tetrahedron A, B, C, D, respectively. For example, VD is the volume for a sub-tetrahedron D bounded by the points IABC. The volumes VD and V share the same bottom surface ABC, and so the above equation can be rewritten as:
OI=OA*hA/HA+OB*hB/HB+OC*hC/HC+OD*hD/HD
where Hi(i=A, B, C, D) is the height of the tetrahedron 600 from the vertex i and B, C, D) is the height of the sub-tetrahedron (A, B, C, D) from input point 605. For example, the height 610 is equivalent to HD and the height 615 is equivalent to hD. Output weights are defined as:
wi=(hi*Δ)/Hi
where Δ is the length of a side of the cube. The output value OI can then be written as:
OI=(wA*OA+wB*OB+wC*OC+wD*OD)/Δ
It should be understood that the above is merely one example of a technique for performing tetrahedral interpolation using three vertices and a centroid of a sub-cube to calculate the pixel component values for a source pixel in the target gamut. In other implementations, other interpolation techniques using three vertices and a centroid of a sub-cube can be employed. By using the centroid of the sub-cube rather than a fourth vertex of the sub-cube when performing tetrahedral interpolation, the interpolation error is reduced when calculating the pixel component values in the target gamut.
Referring now to
A display controller receives a source pixel from a source image, where the source image is represented in a first gamut (block 705). Next, the display controller accesses one or more lookup tables to find vertices of a sub-cube which bounds pixel components of the source pixel in a three dimensional (3D) representation of the first gamut (block 710). It is assumed for the purposes of this discussion that the lookup table(s) store a plurality of entries, where each entry includes a mapping from the first gamut to a second gamut, where the second gamut is associated with a display on which the source image will be displayed. In another implementation, the display controller finds the vertices of another type of geometric shape (e.g., tetrahedron, rectangular prism) in block 710 that bounds the pixel components of the source pixel.
Then, the display controller determines whether an interior point of the sub-cube is stored in the lookup table(s) (conditional block 715). In one implementation, the interior point is a centroid of the sub-cube. In other implementations, the interior point is not located at the centroid of the sub-cube. For example, in another implementation, the sub-cube includes multiple interior points stored in the lookup table(s). In this implementation, the display controller selects the interior point which is closest to the source pixel. In a further implementation, the interior point is located on a boundary surface of the sub-cube (or other geometric shape). Accordingly, an interior point can be defined as any point within the geometric shape (e.g., sub-cube) or on a boundary surface of the geometric shape that is not a vertex of the geometric shape.
If an interior point of the sub-cube is stored in the lookup table(s) (conditional block 715, “yes” leg), then the display controller retrieves mappings of the interior point and three corresponding vertices of the sub-cube from the lookup table(s) (block 720). Alternatively, if multiple interior points are stored in the lookup table(s), the display controller could retrieve mappings of two or more interior points and some number of vertices of the sub-cube in block 720. Next, the display controller performs tetrahedral interpolation with the mappings of the interior point and three corresponding vertices of the sub-cube to convert the source pixel to a target pixel in a second gamut, where the second gamut is different from the first gamut (block 725). Then, the display controller provides the target pixel to a display (block 740). Alternatively, the display controller can write the target pixel to a memory device in block 740, or the display controller can convey the target pixel to another functional unit for additional processing in block 740.
If the sub-cube does not have a interior point stored in the lookup table(s) (conditional block 715, “no” leg), then the display controller retrieves mappings of four corresponding vertices of the sub-cube from the lookup table(s) (block 730). Next, the display controller performs tetrahedral interpolation with the mappings of the four corresponding vertices of the sub-cube to convert the source pixel to a target pixel in the second gamut (block 735). Then, the display controller provides the target pixel to a display (block 740). After block 740, method 700 ends. It is noted that method 700 can be performed for each source pixel of the source image.
Turning now to
Then, the processing unit treats the plurality of points that were calculated as vertices in a 3D space (block 815). Next, the processing unit partitions the 3D space based on locations of the vertices (block 820). Then, the processing unit calculates the interpolation error for each sub-cube of a plurality of sub-cubes that are bounded by the plurality of vertices of the 3D space, where the interpolation error is an error in mapping between the first gamut to the second gamut for points within the sub-cube when interpolating from the vertices (block 825). For example, in one implementation, the processing unit calculates the interpolation error at N separate points within the sub-cube when mapping from the first gamut to the second gamut using interpolation from the vertices, where N is a positive integer. The processing unit can then calculate the sum of the interpolation error for the N separate points, calculate the maximum of the interpolation error for the N separate points, or otherwise. Based on the technique utilized, the processing unit can generate a measure or score of the interpolation error for each sub-cube.
Next, the processing unit determines which sub-cubes have greater than a threshold amount of interpolation error (block 830). For example, in one implementation, the processing unit compares the sum of the interpolation error for the N points within each sub-cube to a threshold. In other implementations, the processing unit uses other techniques to determine if the interpolation error of a sub-cube is greater than the threshold. Then, the processing unit calculates mappings from the first gamut to the second gamut for centroids of these identified sub-cubes (block 835). Next, the processing unit stores the centroid mappings in a LUT for use in converting pixel component values from the first gamut to the second gamut (block 840). For example, the centroid mappings can be utilized when performing tetrahedral interpolation to convert pixel component values from the first gamut to the second gamut. After block 840, method 800 ends.
Referring now to
Turning now to
Referring now to
Then, for each sub-cube, the processing unit determines if the measured interpolation error of the sub-cube is greater than a first threshold (conditional block 1110). If the measured interpolation error of the sub-cube is greater than the first threshold (conditional block 1110, “yes” leg), then the processing unit uses a higher resolution setting for the number of mapping points in a 3D lookup table for the sub-cube (block 1115). For example, in one implementation, the higher resolution setting is a highest possible resolution setting, with the highest possible resolution setting being 8×8×8 in one particular implementation. In other implementations, the higher resolution setting can be any of various other values. In one implementation, the processing unit divides a sub-cube into eight or more level 2 sub-cubes in block 1115. In another implementation, the processing unit divides a tetrahedron into four or more tetrahedral in block 1115. In other implementations, the processing unit increases the resolution by other amounts and/or for other types of geometric shapes in block 1115.
If the measured interpolation error of the sub-cube is less than or equal to the first threshold (conditional block 1110, “no” leg), then the processing unit determines if the measured interpolation error of the sub-cube is greater than a second threshold (conditional block 1120). If the measured interpolation error of the sub-cube is greater than the second first threshold (conditional block 1120, “yes” leg), then the processing unit uses a medium resolution setting for the number of mapping points in a 3D lookup table for the sub-cube (block 1125). For example, in one implementation, the medium resolution setting is 5×5×5. In other implementations, the medium resolution setting can be any of various other values. It is assumed for the purposes of this discussion that the medium resolution setting is less than the higher resolution setting. If the measured interpolation error of the sub-cube is less than or equal to the second threshold (conditional block 1120, “no” leg), then the processing unit uses a lower resolution setting for the number of mapping points in a 3D lookup table for the sub-cube (block 1130). For example, in one implementation, the lower resolution setting is 2×2×2. In other implementations, the lower resolution setting can be any of various other values. It is assumed for the purposes of this discussion that the lower resolution setting is less than the medium resolution setting. After blocks 1115, 1125, and 1130, if there are any other sub-cubes for which a resolution setting needs to be calculated (conditional block 1135, “yes” leg), then method 1100 returns to conditional block 1110. Otherwise, if resolution settings have already been determined for all of the sub-cubes (conditional block 1135, “no” leg), then method 1100 ends.
It should be understood that in other implementations, the processing unit can compare the interpolation error to more than two different thresholds. In these implementations, the processing unit can have more than three different resolution settings. Also, in some implementations, rather than comparing the interpolation error of a sub-cube to one or more thresholds, the processing unit uses a formula to convert interpolation error into a resolution setting for the sub-cube. In other words, the processing unit sets a resolution of a number of mapping points in a 3D lookup table for the sub-cube which is proportional to the interpolation error of the sub-cube. Accordingly, the higher the interpolation error is for a given sub-cube, the higher the resolution will be for the given sub-cube.
In various implementations, program instructions of a software application are used to implement the methods and/or mechanisms described herein. For example, program instructions executable by a general or special purpose processor are contemplated. In various implementations, such program instructions are represented by a high level programming language. In other implementations, the program instructions are compiled from a high level programming language to a binary, intermediate, or other form. Alternatively, program instructions are written that describe the behavior or design of hardware. Such program instructions are represented by a high-level programming language, such as C. Alternatively, a hardware design language (HDL) such as Verilog is used. In various implementations, the program instructions are stored on any of a variety of non-transitory computer readable storage mediums. The storage medium is accessible by a computing system during use to provide the program instructions to the computing system for program execution. Generally speaking, such a computing system includes at least one or more memories and one or more processors configured to execute program instructions.
It should be emphasized that the above-described implementations are only non-limiting examples of implementations. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
This application is a continuation of U.S. patent application Ser. No. 16/289,260, now U.S. Pat. No. 11,100,889, entitled “REDUCING 3D LOOKUP TABLE INTERPOLATION ERROR WHILE MINIMIZING ON-CHIP STORAGE”, filed Feb. 28, 2019, the entirety of which is incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
10152772 | Fainstain | Dec 2018 | B2 |
10992938 | Saeedi et al. | Apr 2021 | B2 |
11100889 | Lee et al. | Aug 2021 | B2 |
20040096104 | Terekhov | May 2004 | A1 |
20050047504 | Sung et al. | Mar 2005 | A1 |
20070041026 | Tin | Feb 2007 | A1 |
20070046691 | Presley et al. | Mar 2007 | A1 |
20100128976 | Stauder et al. | May 2010 | A1 |
20140269919 | Rodriguez | Sep 2014 | A1 |
20140376624 | Li et al. | Dec 2014 | A1 |
20150055706 | Xu et al. | Feb 2015 | A1 |
20150256850 | Kottke et al. | Sep 2015 | A1 |
20170236306 | Kuchnio | Aug 2017 | A1 |
20180109804 | Saeedi | Apr 2018 | A1 |
20190027082 | Van Belle | Jan 2019 | A1 |
20190045210 | Guermazi et al. | Feb 2019 | A1 |
Entry |
---|
“Co-occurrence matrix”, Wikipedia.org, Sep. 7, 2016, 2 pages, https://en.wikipedia.org/wiki/Co-occurrence_matrix. [Retrieved Jul. 31, 2018]. |
International Search Report and Written Opinion in International Application No. PCT/IB2019/057945, mailed Dec. 9, 2019, 8 pages. |
Number | Date | Country | |
---|---|---|---|
20210383772 A1 | Dec 2021 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 16289260 | Feb 2019 | US |
Child | 17407981 | US |