1. Field of the Invention
This invention is related to the field of graphical information processing, more particularly, to image dithering.
2. Description of the Related Art
Part of the operation of many computer systems, including portable digital devices such as mobile phones, notebook computers and the like is the use of some type of display device, such as a liquid crystal display (LCD), to display images, video information/streams, and data. Accordingly, these systems typically incorporate functionality for generating images and data, including video information, which are subsequently output to the display device. Such devices typically include video graphics circuitry to process images and video information for subsequent display.
In digital imaging, the smallest item of information in an image is called a “picture element”, more generally referred to as a “pixel”. For convenience, pixels are generally arranged in a regular two-dimensional grid. By using this arrangement, many common operations can be implemented by uniformly applying the same operation to each pixel independently. Since each pixel is an elemental part of a digital image, a greater number of pixels can provide a more accurate representation of the digital image.
The intensity of each pixel can vary, and in color systems each pixel has typically three or four components such as red, green, blue, and black.
Most images and video information displayed on display devices such as LCD screens are interpreted as a succession of image frames, or frames for short. While generally a frame is one of the many still images that make up a complete moving picture or video stream, a frame can also be interpreted more broadly as simply a still image displayed on a digital (discrete, or progressive scan) display. A frame typically consists of a specified number of pixels according to the resolution of the image/video frame. Information associated with a frame typically consists of color values for every pixel to be displayed on the screen. Color values are commonly stored in 1-bit monochrome, 4-bit palletized, 8-bit palletized, 16-bit high color and 24-bit true color formats. An additional alpha channel is oftentimes used to retain information about pixel transparency. The color values can represent information corresponding to any one of a number of color spaces.
Under certain circumstances, the total number of colors that a given system is able to generate or manage within the given color space—in which graphics processing takes place—may be limited. In such cases, a technique called dithering is used to create the illusion of color depth in the images that have a limited color palette. In a dithered image, colors that are not available are approximated by a diffusion of colored pixels from within the available colors. Dithering in image and video processing is also used to prevent large-scale patterns, including stepwise rendering of smooth gradations in brightness or hue in the image/video frames, by intentionally applying a form of noise to randomize quantization error.
Systems that feature a display device, such as an LCD screen or other type of display, also typically feature a Display Controller to control the timing of the signals, including video synchronization signals that are provided—from a graphics processing unit, for example—to be displayed. Some Display Controllers are divided into multiple functional stages, for example an interface to receive the pixels from the source (e.g. from the graphics processing unit), and a port control unit to provide the appropriate signals to a display port physically coupling to the display. In some cases, additional functional or logic blocks are instantiated within the Display Controller between the interface and the port control unit. It is important for all components, including the additional functional/logic blocks within the Display Controller to communicate seamlessly and efficiently with each other.
Other corresponding issues related to the prior art will become apparent to one skilled in the art after comparing such prior art with the present invention as described herein.
Dithering in image and video processing typically includes intentionally applying a form of noise to randomize quantization error, for example to prevent large-scale patterns, including stepwise rendering of smooth gradations in brightness or hue in the image/video frames. Modules processing pixels of video and/or image frames may therefore be injected with dither-noise during processing of the pixels. In one set of embodiments, dithering may be performed within a Display Controller on received pixels that are to be provided to a display port by the Display Controller, to display the (dithered) pixels on a graphics display.
The Display Controller may include an RGB (Red, Green, Blue) Interface module and a Display Port module that both use a “receiver is the master”, or pull interface, in which the data receiving module pops pixels from the data sourcing module, and generates the HSync, VSync, and VBI timing signals. A Dither module may be instantiated between the RGB Interface module and Display Port module to perform dithering. In some embodiments, the Dither module may use a “source is the master”, or push interface, in which the data-sourcing module issues data signals and data valid signals. Traditionally, to instantiate a push interface in the middle of a pull interface, a FIFO buffer with large storage capacity has to be incorporated into the design. In order to avoid having to use a large storage capacity FIFO, a clock gating scheme may be implemented with the Dither module to allow the data signals and data valid signals to properly interface with the RBG Interface module and Display Port module, and provide data flow.
One embodiment of a Display Controller may include a Dither module, and may also include a Prefetch module that generates pixel request signals to prefetch N cycles worth of data to the Dither module, where N is the pipeline depth of the Dither module (i.e., N corresponds to the number of stages in the Dither module). The Prefetch module may also control a Clock Gating module included in the Display Controller. Accordingly, data is provided through the RGB Interface module to the Dither module until the N stages within the Dither module are filled. Once the Dither module pipeline is full, the Prefetch module gates the Dither module via the Clock Gating module until the Display Port module issues a pixel request. When the Display Port module issues a pixel request, the pixel request results in the next pixel being popped from the Dither module to the Display Port module, while a next pixel is also popped into the Dither module through the RGB Interface module. When the Display Port module is requesting the last N pixels, a Mask module ensures that the pixel request results in a next pixel being popped from Dither module, but not from the RGB Interface module. This allows an uninterrupted flow when the Display Port module starts requesting pixels, without requiring a large FIFO to store all the dithered pixels.
The following detailed description makes reference to the accompanying drawings, which are now briefly described.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean including, but not limited to.
Various units, circuits, or other components may be described as “configured to” perform a task or tasks. In such contexts, “configured to” is a broad recitation of structure generally meaning “having circuitry that” performs the task or tasks during operation. As such, the unit/circuit/component can be configured to perform the task even when the unit/circuit/component is not currently on. In general, the circuitry that forms the structure corresponding to “configured to” may include hardware circuits and/or memory storing program instructions executable to implement the operation. The memory can include volatile memory such as static or dynamic random access memory and/or nonvolatile memory such as optical or magnetic disk storage, flash memory, programmable read-only memories, etc. Similarly, various units/circuits/components may be described as performing a task or tasks, for convenience in the description. Such descriptions should be interpreted as including the phrase “configured to.” Reciting a unit/circuit/component that is configured to perform one or more tasks is expressly intended not to invoke 35 U.S.C. §112, paragraph six interpretation for that unit/circuit/component.
Turning now to
System 100 illustrated in
In one embodiment, each port 108A-108E may be associated with a particular type of traffic. For example, in one embodiment, the traffic types may include RT traffic, Non-RT (NRT) traffic, and graphics traffic. Other embodiments may include other traffic types in addition to, instead of, or in addition to a subset of the above traffic types. Each type of traffic may be characterized differently (e.g. in terms of requirements and behavior), and memory controller 106 may handle the traffic types differently to provide higher performance based on the characteristics. For example, RT traffic requires servicing of each memory operation within a specific amount of time. If the latency of the operation exceeds the specific amount of time, erroneous operation may occur in RT peripherals 128. For example, image data may be lost in image processor 136 or the displayed image on the displays to which display pipes 134 are coupled may visually distort. RT traffic may be characterized as isochronous, for example. On the other hand, graphics traffic may be relatively high bandwidth, but not latency-sensitive. NRT traffic, such as from processors 116, is more latency-sensitive for performance reasons but survives higher latency. That is, NRT traffic may generally be serviced at any latency without causing erroneous operation in the devices generating the NRT traffic. Similarly, the less latency-sensitive but higher bandwidth graphics traffic may be generally serviced at any latency. Other NRT traffic may include audio traffic, which is relatively low bandwidth and generally may be serviced with reasonable latency. Most peripheral traffic may also be NRT (e.g. traffic to storage devices such as magnetic, optical, or solid state storage). By providing ports 108A-108E associated with different traffic types, memory controller 106 may be exposed to the different traffic types in parallel.
As mentioned above, RT peripherals 128 may include image processor 136 and display pipes 134. Display pipes 134 may include circuitry to fetch one or more image frames and to blend the frames to create a display image. Display pipes 134 may further include one or more video pipelines, and video frames may be blended with (relatively) static image frames to create frames for display at the video frame rate. The output of display pipes 134 may be a stream of pixels to be displayed on a display screen. The pixel values may be transmitted to a Display Controller for display on the display screen. Image processor 136 may receive camera data and process the data to an image to be stored in memory.
Both the display pipes 134 and image processor 136 may operate in virtual address space, and thus may use translations to generate physical addresses for the memory operations to read or write memory. Image processor 136 may have a somewhat random-access memory pattern, and may thus rely on translation unit 132 for translation. Translation unit 132 may employ a translation look-aside buffer (TLB) that caches each translation for a period of time based on how frequently the translation is used with respect to other cached translations. For example, the TLB may employ a set associative or fully associative construction, and a least recently used (LRU)-type algorithm may be used to rank recency of use of the translations among the translations in a set (or across the TLB in fully associative configurations). LRU-type algorithms may include, for example, true LRU, pseudo-LRU, most recently used (MRU), etc. Additionally, a fairly large TLB may be implemented to reduce the effects of capacity misses in the TLB.
The access patterns of display pipes 134, on the other hand, may be fairly regular. For example, image data for each source image may be stored in consecutive memory locations in the virtual address space. Thus, display pipes 134 may begin processing source image data from a virtual page, and subsequent virtual pages may be consecutive to the virtual page. That is, the virtual page numbers may be in numerical order, increasing or decreasing by one from page to page as the image data is fetched. Similarly, the translations may be consecutive to one another in a given page table in memory (e.g. consecutive entries in the page table may translate virtual page numbers that are numerically one greater than or less than each other). While more than one page table may be used in some embodiments, and thus the last entry of the page table may not be consecutive to the first entry of the next page table, most translations may be consecutive in the page tables. Viewed in another way, the virtual pages storing the image data may be adjacent to each other in the virtual address space. That is, there may be no intervening pages between the adjacent virtual pages in the virtual address space.
Display pipes 134 may implement translation units that prefetch translations in advance of the display pipes' reads of image data. The prefetch may be initiated when the processing of a source image is to start, and the translation unit may prefetch enough consecutive translations to fill a translation memory in the translation unit. The fetch circuitry in the display pipes may inform the translation unit as the processing of data in virtual pages is completed, and the translation unit may invalidate the corresponding translation, and prefetch additional translations. Accordingly, once the initial prefetching is complete, the translation for each virtual page may frequently be available in the translation unit as display pipes 134 begin fetching from that virtual page. Additionally, competition for translation unit 132 from display pipes 134 may be eliminated in favor of the prefetching translation units. Since translation units 132 in display pipes 134 fetch translations for a set of contiguous virtual pages, they may be referred to as “streaming translation units.”
In general, display pipes 134 may include one or more user interface units that are configured to fetch relatively static frames. That is, the source image in a static frame is not part of a video sequence. While the static frame may be changed, it is not changing according to a video frame rate corresponding to a video sequence. Display pipes 134 may further include one or more video pipelines configured to fetch video frames. These various pipelines (e.g. the user interface units and video pipelines) may be generally referred to as “image processing pipelines.”
Returning to the memory controller 106, generally a port may be a communication point on memory controller 106 to communicate with one or more sources. In some cases, the port may be dedicated to a source (e.g. ports 108A-108B may be dedicated to the graphics controllers 112A-112B, respectively). In other cases, the port may be shared among multiple sources (e.g. processors 116 may share CPU port 108C, NRT peripherals 120 may share NRT port 108D, and RT peripherals 128 such as display pipes 134 and image processor 136 may share RT port 108E. A port may be coupled to a single interface to communicate with the one or more sources. Thus, when sources share an interface, there may be an arbiter on the sources' side of the interface to select between the sources. For example, L2 cache 118 may serve as an arbiter for CPU port 108C to memory controller 106. Port arbiter 130 may serve as an arbiter for RT port 108E, and a similar port arbiter (not shown) may be an arbiter for NRT port 108D. The single source on a port or the combination of sources on a port may be referred to as an agent. Each port 108A-108E is coupled to an interface to communicate with its respective agent. The interface may be any type of communication medium (e.g. a bus, a point-to-point interconnect, etc.) and may implement any protocol. In some embodiments, ports 108A-108E may all implement the same interface and protocol. In other embodiments, different ports may implement different interfaces and/or protocols. In still other embodiments, memory controller 106 may be single ported.
In an embodiment, each source may assign a quality of service (QoS) parameter to each memory operation transmitted by that source. The QoS parameter may identify a requested level of service for the memory operation. Memory operations with QoS parameter values requesting higher levels of service may be given preference over memory operations requesting lower levels of service. Each memory operation may include a flow ID (FID). The FID may identify a memory operation as being part of a flow of memory operations. A flow of memory operations may generally be related, whereas memory operations from different flows, even if from the same source, may not be related. A portion of the FID (e.g. a source field) may identify the source, and the remainder of the FID may identify the flow (e.g. a flow field). Thus, an FID may be similar to a transaction ID, and some sources may simply transmit a transaction ID as an FID. In such a case, the source field of the transaction ID may be the source field of the FID and the sequence number (that identifies the transaction among transactions from the same source) of the transaction ID may be the flow field of the FID. In some embodiments, different traffic types may have different definitions of QoS parameters. That is, the different traffic types may have different sets of QoS parameters.
Memory controller 106 may be configured to process the QoS parameters received on each port 108A-108E and may use the relative QoS parameter values to schedule memory operations received on the ports with respect to other memory operations from that port and with respect to other memory operations received on other ports. More specifically, memory controller 106 may be configured to compare QoS parameters that are drawn from different sets of QoS parameters (e.g. RT QoS parameters and NRT QoS parameters) and may be configured to make scheduling decisions based on the QoS parameters.
In some embodiments, memory controller 106 may be configured to upgrade QoS levels for pending memory operations. Various upgrade mechanism may be supported. For example, the memory controller 106 may be configured to upgrade the QoS level for pending memory operations of a flow responsive to receiving another memory operation from the same flow that has a QoS parameter specifying a higher QoS level. This form of QoS upgrade may be referred to as in-band upgrade, since the QoS parameters transmitted using the normal memory operation transmission method also serve as an implicit upgrade request for memory operations in the same flow. The memory controller 106 may be configured to push pending memory operations from the same port or source, but not the same flow, as a newly received memory operation specifying a higher QoS level. As another example, memory controller 106 may be configured to couple to a sideband interface from one or more agents, and may upgrade QoS levels responsive to receiving an upgrade request on the sideband interface. In another example, memory controller 106 may be configured to track the relative age of the pending memory operations. Memory controller 106 may be configured to upgrade the QoS level of aged memory operations at certain ages. The ages at which upgrade occurs may depend on the current QoS parameter of the aged memory operation.
Memory controller 106 may be configured to determine the memory channel addressed by each memory operation received on the ports, and may be configured to transmit the memory operations to memory 102A-102B on the corresponding channel. The number of channels and the mapping of addresses to channels may vary in various embodiments and may be programmable in the memory controller. Memory controller 106 may use the QoS parameters of the memory operations mapped to the same channel to determine an order of memory operations transmitted into the channel.
Processors 116 may implement any instruction set architecture, and may be configured to execute instructions defined in that instruction set architecture. For example, processors 116 may employ any microarchitecture, including but not limited to scalar, superscalar, pipelined, superpipelined, out of order, in order, speculative, non-speculative, etc., or combinations thereof. Processors 116 may include circuitry, and optionally may implement microcoding techniques, and may include one or more level 1 caches, making cache 118 an L2 cache. Other embodiments may include multiple levels of caches in processors 116, and cache 118 may be the next level down in the hierarchy. Cache 118 may employ any size and any configuration (set associative, direct mapped, etc.).
Graphics controllers 112A-112B may be any graphics processing circuitry. Generally, graphics controllers 112A-112B may be configured to render objects to be displayed, into a frame buffer. Graphics controllers 112A-112B may include graphics processors that may execute graphics software to perform a part or all of the graphics operation, and/or hardware acceleration of certain graphics operations. The amount of hardware acceleration and software implementation may vary from embodiment to embodiment.
NRT peripherals 120 may include any non-real time peripherals that, for performance and/or bandwidth reasons, are provided independent access to memory 102A-102B. That is, access by NRT peripherals 120 is independent of CPU block 114, and may proceed in parallel with memory operations of CPU block 114. Other peripherals such as peripheral 126 and/or peripherals coupled to a peripheral interface controlled by peripheral interface controller 122 may also be non-real time peripherals, but may not require independent access to memory. Various embodiments of NRT peripherals 120 may include video encoders and decoders, scaler/rotator circuitry, image compression/decompression circuitry, etc.
Bridge/DMA controller 124 may comprise circuitry to bridge peripheral(s) 126 and peripheral interface controller(s) 122 to the memory space. In the illustrated embodiment, bridge/DMA controller 124 may bridge the memory operations from the peripherals/peripheral interface controllers through CPU block 114 to memory controller 106. CPU block 114 may also maintain coherence between the bridged memory operations and memory operations from processors 116/L2 Cache 118. L2 cache 118 may also arbitrate the bridged memory operations with memory operations from processors 116 to be transmitted on the CPU interface to CPU port 108C. Bridge/DMA controller 124 may also provide DMA operation on behalf of peripherals 126 and peripheral interface controllers 122 to transfer blocks of data to and from memory. More particularly, the DMA controller may be configured to perform transfers to and from memory 102A-102B through memory controller 106 on behalf of peripherals 126 and peripheral interface controllers 122. The DMA controller may be programmable by processors 116 to perform the DMA operations. For example, the DMA controller may be programmable via descriptors, which may be data structures stored in memory 102A-102B to describe DMA transfers (e.g. source and destination addresses, size, etc.). Alternatively, the DMA controller may be programmable via registers in the DMA controller (not shown).
Peripherals 126 may include any desired input/output devices or other hardware devices that are included on IC 101. For example, peripherals 126 may include networking peripherals such as one or more networking media access controllers (MAC) such as an Ethernet MAC or a wireless fidelity (WiFi) controller. An audio unit including various audio processing devices may be included in peripherals 126. Peripherals 126 may include one or more digital signal processors, and any other desired functional components such as timers, an on-chip secrets memory, an encryption engine, etc., or any combination thereof.
Peripheral interface controllers 122 may include any controllers for any type of peripheral interface. For example, peripheral interface controllers 122 may include various interface controllers such as a universal serial bus (USB) controller, a peripheral component interconnect express (PCIe) controller, a flash memory interface, general purpose input/output (I/O) pins, etc.
Memories 102A-102B may be any type of memory, such as dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate (DDR, DDR2, DDR3, etc.) SDRAM (including mobile versions of the SDRAMs such as mDDR3, etc., and/or low power versions of the SDRAMs such as LPDDR2, etc.), RAMBUS DRAM (RDRAM), static RAM (SRAM), etc. One or more memory devices may be coupled onto a circuit board to form memory modules such as single inline memory modules (SIMMs), dual inline memory modules (DIMMs), etc. Alternatively, the devices may be mounted with IC 101 in a chip-on-chip configuration, a package-on-package configuration, or a multi-chip module configuration.
Memory PHYs 104A-104B may handle the low-level physical interface to memory 102A-102B. For example, memory PHYs 104A-104B may be responsible for the timing of the signals, for proper clocking to synchronous DRAM memory, etc. In one embodiment, memory PHYs 104A-104B may be configured to lock to a clock supplied within IC 101 and may be configured to generate a clock used by memory 102A and/or memory 102B.
It is noted that other embodiments may include other combinations of components, including subsets or supersets of the components shown in
Turning now to
Overall, the operation of display pipe 212 may be characterized as fetching data from memory, processing that data, then presenting it to an external Display Controller 224 through a buffer 222, which may be an asynchronous FIFO (First-In-First-Out) buffer. The Display Controller 224 may control the timing of the display through a Vertical Blanking Interval (VBI) signal that may be activated at the beginning of each vertical blanking interval. This signal may cause display pipe 212 to initialize (Restart) and start (Go) the processing for a frame (more specifically, for the pixels within the frame). Between initializing and starting, configuration parameters unique to that frame may be modified. Any parameters not modified may retain their value from the previous frame. As the pixels are processed and put into output FIFO 222, the Display Controller may issue signals (referred to as pop signals) to remove the pixels at the Display Controller's clock frequency (indicated as VCLK in
Blend unit 218 may be situated at the backend of display pipe 212 as shown in
Display Controller's clock frequency, designated as VCLK (see also
In one set of embodiments, dithering may also be performed within Display Controller 224. One example of the signal connectivity of a Dither module that performs dithering is shown in
In one set of embodiments, Prefetch module 506 may generate pixel request signals to prefetch N cycles worth of data to the Dither module 502. N is a non-zero integer number corresponding to the pipeline depth of the Dither module 502 (i.e., N corresponds to the number of stages in the Dither module 502). As shown in
Once the Display Port module 304 has issued a pixel request via ‘DP PIX REQ’, Prefetch module 506 stops gating Dither module 502, and the next pixel is popped from Dither module 502 to the Display Port module 304 in through ‘DP DATA’ from ‘DATA OUT’. That is, pixel data is output through ‘DATA OUT’ by
Dither module 502, and is received by Display Port module 304 through ‘DP DATA’. At the same time, a next pixel is also popped into Dither module 502 through the RGB Interface module 302 via ‘RGB DATA’, with the corresponding ‘VALID IN’ signal expected by Dither module 502 provided via ‘RGB PIX REQ’. When Display Port module 304 is requesting the last N pixels, Mask module 504 ensures that the pixel request ‘DP PIX REQ’ is masked, and thus no more pixels are popped from the RGB Interface module 302 into Dither module 502, while a next pixel is being popped from Dither module 502 into Display Port module 304. This allows an uninterrupted flow when Display Port module 304 starts requesting pixels, without requiring a large FIFO to store all the dithered pixels.
As also shown in timing diagram 600, ‘RGB PIX REQ’ is deasserted before ‘DP PIX REQ’ is deasserted, to account for Dither Pipeline Depth. That is, during time period 804—which corresponds to the time period during which Display Port module 304 requests the final N pixels—while ‘DP PIX REQ’ remains asserted, Prefetch module 506 uses REQ MASK 504 to mask ‘DP PIX REQ’ to deassert ‘RGB PIX REQ’. This results in pixels no longer being popped into Dither module 502, while the clock gating of Dither module 502 is still lifted, and the final pixels are provided to Display Port module 304 from Dither module 502, as indicated by the data appearing on the ‘DP DATA’ line until the end of time period 804. Subsequently, Prefetch module 506 may again issue pixel requests at a later time to prefetch pixels to Dither module 502 in the manner described above.
Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.