1. Field of the Invention
The present invention relates generally to display systems and, more specifically, to efficient autostereo (autostereoscopic) support using display controller windows.
2. Description of the Related Art
Autostereoscopy is a method of displaying stereoscopic images (e.g., adding binocular perception of three-dimensional (3D) depth) without the use of special headgear or glasses on the part of the viewer. In contrast, monoscopic images are perceived by a viewer as being two-dimensional (2D). Because headgear is not required, autostereoscopy is also called “glasses-free 3D” or “glassesless 3D”. There are two broad approaches currently used to accommodate motion parallax and wider viewing angles: (1) eye-tracking and (2) multiple views so that the display does not need to sense where the viewers' eyes are located.
Examples of autostereoscopic display technologies include lenticular lens, parallax barrier, volumetric, holographic, and light field displays. Most flat-panel solutions employ parallax barriers or lenticular lenses that redirect imagery to several viewing regions. When the viewer's head is in a certain position, a different image is seen with each eye, giving a convincing illusion of 3D. Such displays can have multiple viewing zones, thereby allowing multiple users to view the image at the same time.
Autostereoscopy can achieve a 3D effect by performing interleaving operations on images that are to be displayed. Autostereoscopic images (a.k.a., “glassesless stereoscopic images” or “glassesless 3D images”) may be interleaved by using various formats. Example formats for interleaving autostereoscopic images include row interleave, column interleave, checkerboard interleave, and sub-pixel interleave. For each such interleaving format, software instructs a rendering engine to render images separately for a left frame (e.g., a frame for the left eye) and a right frame (e.g., a frame for the right eye). The software then instructs the rendering engine to send the separate frames to different memory surfaces in a memory.
In a conventional system, software uses an alternative engine (e.g., 3D engine, 2D engine, etc.) to fetch the left frame surface and the right frame surface from the memory, to pack the fetched frames into a corresponding autostereoscopic image format, and then to write the packed frames back to the memory. For example, in row-interleaved autostereo, software causes alternating left and right rows of the final autostereoscopic image to be written to the memory. Eventually, the display fetches the generated autostereoscopic image from memory and then scans out the autostereoscopic image on the display screen (e.g., display panel) for viewing.
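To make the extra pass concrete, the conventional software-managed packing step can be sketched as follows. This is a minimal Python/NumPy sketch of row-interleaved packing with invented function and array names; it is illustrative only and is not the code of any particular system.

```python
import numpy as np

def pack_row_interleaved(left, right):
    """Conventional software pass: fetch both rendered surfaces from
    memory, pack them into a row-interleaved autostereo image, and
    return the packed image to be written back to memory."""
    assert left.shape == right.shape
    packed = np.empty_like(left)
    packed[0::2] = left[0::2]   # even rows come from the left frame
    packed[1::2] = right[1::2]  # odd rows come from the right frame
    return packed  # written back to memory; the display re-reads it later
```

The packed image is written back to memory only to be read again at scanout time; that round trip is the overhead quantified below.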
Unfortunately, since software instructs the generation of the autostereoscopic image to be handled by a different unit than the original rendering engine, the scanning of the autostereoscopic image requires an additional memory pass (e.g., both an additional read from memory and an additional write to memory). The additional memory pass burdens the system with additional memory bandwidth consumption and memory input/output (I/O) power overhead. For example, a 1920 pixels×1200 pixels display at 60 frames/second at 4 bytes per pixel×2 operations (read and write)=approximately 1.1 gigabytes/second, or about 122 milliwatts of memory I/O power overhead (assuming 110 mW/GBps). Thus, the additional read and write operations that are required by such a display system, which is managed by software, add a significant amount of operational latency.
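As a check on the arithmetic above, a quick back-of-the-envelope computation in Python (the 110 mW/GBps figure is the stated assumption):

```python
width, height, fps = 1920, 1200, 60
bytes_per_pixel = 4
passes = 2                       # one extra read plus one extra write

bandwidth_gbps = width * height * fps * bytes_per_pixel * passes / 1e9
power_mw = bandwidth_gbps * 110  # assuming 110 mW per GB/s of memory I/O

print(f"{bandwidth_gbps:.3f} GB/s, {power_mw:.0f} mW")
# -> 1.106 GB/s, 122 mW
```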
Accordingly, what is needed is an approach for carrying out autostereoscopic operations for a display in a more efficient manner.
One implementation of the present approach includes a display controller for controlling a display screen of a display system. In one example, the display controller includes the following hardware components: an image receiver configured to receive image data from a source, wherein the image data includes a first image and a second image; a first window controller coupled to the image receiver and configured to receive the first image from the image receiver and to scale the first image according to parameters of the display screen in order to generate a scaled first image; a second window controller coupled to the image receiver and configured to receive the second image from the image receiver and to scale the second image according to the parameters of the display screen in order to generate a scaled second image; and a blender component coupled to the first and second window controllers and configured to interleave the scaled first image with the scaled second image in order to generate a stereoscopic composited image, wherein the blender component is further configured to scan out the stereoscopic composited image to the display screen without accessing a memory that stores additional data associated with the stereoscopic composited image.
The present approach provides advantages because the display system is configured with hardware components that save the display system from having to perform an additional memory pass before scanning the composited image to the display screen. Accordingly, the display system reduces the corresponding memory bandwidth issues and/or the memory input/output (I/O) power overhead issues that are suffered by conventional systems. Also, because the display system performs fewer passes to memory, the display system consumes less power. Accordingly, where the display system is powered by a battery, the display system draws less battery power, thereby extending the battery charge duration. By using hardware components, the display controller natively supports interleaving images of two hardware window controllers to generate a stereoscopic composited image. The display controller also supports blending the stereoscopic composited image with a monoscopic image and/or with a pre-composited image.
So that the manner in which the above recited features of the invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of skill in the art that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the present invention.
Among other things, embodiments of the present invention are directed towards a display controller for controlling a display screen of a display system. The display controller includes an image receiver configured to receive image data from a source, wherein the image data includes a first image and a second image. The display controller includes a first window controller coupled to the image receiver and configured to receive the first image from the image receiver and to scale the first image according to parameters of the display screen in order to generate a scaled first image. The display controller includes a second window controller coupled to the image receiver and configured to receive the second image from the image receiver and to scale the second image according to the parameters of the display screen in order to generate a scaled second image. The display controller includes a blender component coupled to the first and second window controllers and configured to interleave the scaled first image with the scaled second image in order to generate a stereoscopic composited image. The blender component is further configured to scan out the stereoscopic composited image to the display screen before obtaining additional data associated with the image data.
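Taken together, these components form a short pipeline. The following Python sketch is a minimal behavioral model of that pipeline, assuming nearest-neighbor scaling and row interleaving purely for illustration; the function and variable names are invented for the example and do not reflect the hardware design.

```python
import numpy as np

def scale_to_screen(image, screen_w, screen_h):
    """Stand-in for a window controller's scaler: nearest-neighbor
    resample of the image to the display screen's parameters."""
    h, w = image.shape[:2]
    ys = np.arange(screen_h) * h // screen_h
    xs = np.arange(screen_w) * w // screen_w
    return image[ys][:, xs]

def display_controller_scanout(first, second, screen_w, screen_h):
    """First and second window controllers scale their images; the
    blender interleaves the results (row interleave shown) and the
    composited image goes straight to scanout -- no memory round trip."""
    left = scale_to_screen(first, screen_w, screen_h)
    right = scale_to_screen(second, screen_w, screen_h)
    composited = np.where(
        (np.arange(screen_h) % 2 == 0)[:, None, None], left, right)
    return composited  # scanned out directly to the display screen
```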
The display system 100 includes a central processing unit (CPU) 102 and a system memory 104 that includes a device driver 103. CPU 102 and system memory 104 communicate via an interconnection path that may include a memory bridge 105. Memory bridge 105, which may be, for example, a Northbridge chip, is connected via a bus or other communication path 106 (e.g., a HyperTransport link, etc.) to an input/output (I/O) bridge 107. I/O bridge 107, which may be, for example, a Southbridge chip, receives user input from one or more user input devices 108 (e.g., touch screen, cursor pad, keyboard, mouse, etc.) and forwards the input to CPU 102 via path 106 and memory bridge 105. A parallel processing subsystem 112 is coupled to memory bridge 105 via a bus or other communication path 113 (e.g., peripheral component interconnect (PCI) express, Accelerated Graphics Port (AGP), and/or HyperTransport link, etc.). In one implementation, parallel processing subsystem 112 is a graphics subsystem that delivers pixels to a display screen 111 (e.g., a conventional cathode ray tube (CRT) and/or liquid crystal display (LCD) based monitor, etc.). A system disk 114 is also connected to I/O bridge 107. A switch 116 provides connections between I/O bridge 107 and other components such as a network adapter 118 and various add-in cards 120 and 121. Other components (not explicitly shown), including universal serial bus (USB) and/or other port connections, compact disc (CD) drives, digital video disc (DVD) drives, film recording devices, and the like, may also be connected to I/O bridge 107. Communication paths interconnecting the various components in
As further described below with reference to
In one implementation, the parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes a graphics processing unit (GPU). In another implementation, the parallel processing subsystem 112 incorporates circuitry optimized for general purpose processing, while preserving the underlying computational architecture, described in greater detail herein. In yet another implementation, the parallel processing subsystem 112 may be integrated with one or more other system elements, such as the memory bridge 105, CPU 102, and I/O bridge 107 to form a system-on-chip (SoC).
It will be appreciated that the system shown herein is illustrative and that variations and modifications are possible. The connection topology, including the number and arrangement of bridges, the number of CPUs 102, and the number of parallel processing subsystems 112, may be modified as desired. For instance, in some implementations, system memory 104 is connected to CPU 102 directly rather than through a bridge, and other devices communicate with system memory 104 via memory bridge 105 and CPU 102. In other alternative topologies, parallel processing subsystem 112 is connected to I/O bridge 107 or directly to CPU 102, rather than to memory bridge 105. In still other implementations, I/O bridge 107 and memory bridge 105 might be integrated into a single chip. Large implementations may include two or more CPUs 102 and two or more parallel processing subsystems 112. The particular components shown herein are optional; for instance, any number of add-in cards or peripheral devices might be supported. In some implementations, switch 116 is eliminated, and network adapter 118 and add-in cards 120, 121 connect directly to I/O bridge 107.
Referring again to
In operation, CPU 102 is the master processor of the display system 100, controlling and coordinating operations of other system components. In particular, CPU 102 issues commands that control the operation of PPUs 202. In some implementations, CPU 102 writes a stream of commands for each PPU 202 to a pushbuffer (not explicitly shown in either
Referring back now to
In one implementation, communication path 113 is a PCIe link, in which dedicated lanes are allocated to each PPU 202, as is known in the art. Other communication paths may also be used. As mentioned above, a contraflow interconnect may also be used to implement the communication path 113, as well as any other communication path within the display system 100, CPU 102, or PPU 202. An I/O unit 205 generates packets (or other signals) for transmission on communication path 113 and also receives all incoming packets (or other signals) from communication path 113, directing the incoming packets to appropriate components of PPU 202. For example, commands related to processing tasks may be directed to a host interface 206, while commands related to memory operations (e.g., reading from or writing to parallel processing memory 204) may be directed to a memory crossbar unit 210. Host interface 206 reads each pushbuffer and outputs the work specified by the pushbuffer to a front end 212.
Each PPU 202 advantageously implements a highly parallel processing architecture. As shown in detail, PPU 202(0) includes an arithmetic subsystem 230 that includes a number C of general processing clusters (GPCs) 208, where C≧1. Each GPC 208 is capable of executing a large number (e.g., hundreds or thousands) of threads concurrently, where each thread is an instance of a program. In various applications, different GPCs 208 may be allocated for processing different types of programs or for performing different types of computations. The allocation of GPCs 208 may vary depending on the workload arising for each type of program or computation.
GPCs 208 receive processing tasks to be executed via a work distribution unit 200, which receives commands defining processing tasks from front end unit 212. Front end 212 ensures that GPCs 208 are configured to a valid state before the processing specified by the pushbuffers is initiated.
When PPU 202 is used for graphics processing, for example, the processing workload can be divided into approximately equal-sized tasks to enable distribution of the operations to multiple GPCs 208. A work distribution unit 200 may be configured to produce tasks at a frequency capable of providing tasks to multiple GPCs 208 for processing. In one implementation, the work distribution unit 200 can produce tasks fast enough to keep multiple GPCs 208 busy simultaneously. By contrast, in conventional systems, processing is typically performed by a single processing engine, while the other processing engines remain idle, waiting for the single processing engine to complete tasks before beginning their processing tasks. In some implementations of the present invention, portions of GPCs 208 are configured to perform different types of processing. For example, a first portion may be configured to perform vertex shading and topology generation. A second portion may be configured to perform tessellation and geometry shading. A third portion may be configured to perform pixel shading in screen space to produce a rendered image. Intermediate data produced by GPCs 208 may be stored in buffers to enable the intermediate data to be transmitted between GPCs 208 for further processing.
Memory interface 214 includes a number D of partition units 215 that are each directly coupled to a portion of parallel processing memory 204, where D≧1. As shown, the number of partition units 215 generally equals the number of dynamic random access memories (DRAMs) 220. In other implementations, the number of partition units 215 may not equal the number of memory devices. DRAMs 220 may be replaced by other suitable storage devices and can be of generally conventional design. Render targets, such as frame buffers or texture maps, may be stored across DRAMs 220, enabling partition units 215 to write portions of each render target in parallel to efficiently use the available bandwidth of parallel processing memory 204.
Any one of GPCs 208 may process data to be written to any of the DRAMs 220 within parallel processing memory 204. Crossbar unit 210 is configured to route the output of each GPC 208 to the input of any partition unit 215 or to another GPC 208 for further processing. GPCs 208 communicate with memory interface 214 through crossbar unit 210 to read from or write to various external memory devices. In one implementation, crossbar unit 210 has a connection to memory interface 214 to communicate with I/O unit 205, as well as a connection to local parallel processing memory 204, thereby enabling the processing cores within the different GPCs 208 to communicate with system memory 104 or other memory that is not local to PPU 202. In the implementation shown in
Again, GPCs 208 can be programmed to execute processing tasks relating to a wide variety of applications, including but not limited to, linear and nonlinear data transforms, filtering of video and/or audio data, modeling operations (e.g., applying laws of physics to determine position, velocity and other attributes of objects), image rendering operations (e.g., tessellation shader, vertex shader, geometry shader, and/or pixel shader programs), and so on. PPUs 202 may transfer data from system memory 104 and/or local parallel processing memories 204 into internal (on-chip) memory, process the data, and write result data back to system memory 104 and/or local parallel processing memories 204, where such data can be accessed by other system components, including CPU 102 or another parallel processing subsystem 112.
A PPU 202 may be provided with any amount of local parallel processing memory 204, including no local memory, and may use local memory and system memory in any combination. For instance, a PPU 202 can be a graphics processor in a unified memory architecture (UMA) implementation. In such implementations, little or no dedicated graphics (parallel processing) memory would be provided, and PPU 202 would use system memory exclusively or almost exclusively. In UMA implementations, a PPU 202 may be integrated into a bridge chip or processor chip or provided as a discrete chip with a high-speed link (e.g., PCIe) connecting the PPU 202 to system memory via a bridge chip or other communication means.
As noted above, any number of PPUs 202 can be included in a parallel processing subsystem 112. For instance, multiple PPUs 202 can be provided on a single add-in card, or multiple add-in cards can be connected to communication path 113, or one or more of PPUs 202 can be integrated into a bridge chip. PPUs 202 in a multi-PPU system may be identical to or different from one another. For instance, different PPUs 202 might have different numbers of processing cores, different amounts of local parallel processing memory, and so on. Where multiple PPUs 202 are present, those PPUs may be operated in parallel to process data at a higher throughput than is possible with a single PPU 202. Systems incorporating one or more PPUs 202 may be implemented in a variety of configurations and form factors, including desktop, laptop, or handheld personal computers, servers, workstations, game consoles, embedded systems, and the like.
The display controller 305 is one implementation of the parallel processing subsystem 112 of
The image receiver 310 of
A “stereoscopic” (stereo) image is an image that provides a binocular perception of three-dimensional (3D) depth without the use of special headgear or glasses on the part of a viewer. When a viewer looks at objects in real life (not on a display screen), the viewer's two eyes see slightly different images because the two eyes are located at different viewpoints. The viewer's brain puts the images together to generate a stereoscopic viewpoint. Likewise, a stereoscopic image on a display screen is based on two independent channels, for example, the left input field and the right input field of the blender component 325. To achieve a 3D depth perception, a left image and a right image that are fed into the left input field and the right input field, respectively, of the blender component 325 are similar but not exactly the same. The blender component 325 uses the two input fields to receive the two slightly different images and to scan out a stereoscopic image that provides the viewer with a visual sense of depth.
In contrast, a “monoscopic” (mono) image is an image that is perceived by a viewer as being two-dimensional (2D). A monoscopic image has two related channels that are identical or at least intended to be identical. To achieve a 2D perception, the left image and the right image fed into the blender component 325 are the same or at least intended to be the same. The blender component 325 uses the two input fields to receive the two identical images, which gives the viewer no visual sense of depth. Accordingly, there is no sense of depth in a monoscopic image. When generating a monoscopic image for the display screen 111, the default calculations are based on an assumption that there is one eye centered between where two eyes would be. The result is a monoscopic image that lacks the depth of a stereoscopic image.
The first window controller 315 scales the first image (e.g., left-eye image) to the appropriate scaling parameters of the display screen 111. The second window controller 320 scales the second image (e.g., right-eye image) to the appropriate scaling parameters of the display screen 111. The third window controller 322 scales a monoscopic image to the appropriate scaling parameters of the display screen 111. The fourth window controller 324 is configured to receive a pre-composited image from a software module (not shown) that is external to the display controller 305. The first window controller 315, the second window controller 320, the third window controller 322, and/or the fourth window controller 324 each send respective scaled images to the blender component 325.
In one implementation, the blender component 325 is a multiplexer (mux). The blender component 325 is configured to interleave (e.g., composite, blend, etc.), among other things, the first image and the second image into a corresponding interleaving format (e.g., row interleave, column interleave, checkerboard interleave, or sub-pixel interleave, etc.), which is discussed below with reference to
The blender component 325 can scan out to the display screen 111 a combination of windows according to one or more selections of the blending format selector 332 (e.g., stereo, mono, and/or normal, etc.), which is discussed below with reference to
Advantageously, because the hardware components of the display system 300 do not need to perform an additional memory pass before scanning the composited image to the display screen 111, the display system 300 substantially eliminates the corresponding memory bandwidth issues and/or the memory input/output (I/O) power overhead issues that are suffered by conventional systems. By using hardware components, the display controller 305 natively supports interleaving images of two hardware window controllers to generate a composited image. Also, because the display system 300 performs fewer passes to memory, the display system 300 consumes less power. Accordingly, where the display system 300 is powered by a battery, the display system 300 draws less battery power, thereby extending the battery charge duration. The display controller 305 also supports blending the composited image with a monoscopic image and/or with a pre-composited image. The display system 300 also supports various selections of the interleaving format selector 330, selections of the blending format selector 332, and/or timing programming according to the clock CLK in order to scan out an appropriate image to the display screen 111.
The display system 300 may be implemented on a dedicated electronic visual display, a desktop computer, a laptop computer, a tablet computer, and/or a mobile phone, among other platforms. Implementations of various interleaving formats in the display system 300 are discussed below with reference to
Referring again to
The display controller 305 can either pre-decimate content meant for the autostereoscopic panel or deliver an image to the display screen 111 at full resolution, as shown below with reference to
As described above, the display system 300 utilizes a first window controller (e.g., for processing a first image) and a second window controller (e.g., for processing a second image) with a blender component 325 (e.g., smart mux) in the display controller 305 to implement interleaved stereoscopic support. The two windows (e.g., first image and second image) are treated as originating from the same image and having a common depth. The display controller 305 uses the two windows to generate a composite stereoscopic image. The blender component 325 is configured to receive pixels from the two post-scaled windows in a manner required to support at least one of the following interleaving formats: row interleave, column interleave, checkerboard interleave, or sub-pixel interleave.
Pre-decimated means the windows (415, 420) are filtered down to half the resolution of the screen (or half the resolution of the window in which the image is to be displayed) before the display controller receives the windows (415, 420). For example, if the screen has a resolution of 1920 pixels (width)×1200 pixels (height), then the first image 415 includes 960 columns of pixels, and the second image 420 includes 960 columns of pixels; each column of each window has 1200 pixels, which is the height of the screen. In another example, if a window that is a subset of the screen has a resolution of 800 pixels (width)×600 pixels (height), then the first image 415 includes 400 columns of pixels, and the second image 420 includes 400 columns of pixels; each column of each window has 600 pixels, which is the height of the window.
For explanatory purposes, only portions of the images (415, 420) and the composited image 425 are shown.
For pre-decimated images, as shown in
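A minimal behavioral sketch of this pre-decimated column interleave, in Python/NumPy with invented names (the hardware blender operates on pixel streams rather than whole arrays, so this is illustrative only):

```python
import numpy as np

def column_interleave_predecimated(left_half, right_half):
    """left_half and right_half are each half the screen width
    (e.g., 960 columns for a 1920-column screen). Even output
    columns come from the left window, odd from the right."""
    h, half_w = left_half.shape[:2]
    out = np.empty((h, half_w * 2) + left_half.shape[2:], left_half.dtype)
    out[:, 0::2] = left_half
    out[:, 1::2] = right_half
    return out
```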
Non-pre-decimated means the images (515, 520) are unfiltered at full resolution of the screen (and/or full resolution of the window in which the image is to be displayed) before the display controller receives the images (515, 520). For example, if the screen has a resolution of 1920 pixels (width)×1200 pixels (height), then the first image 515 includes 1920 columns of pixels, and the second image 520 includes 1920 columns of pixels; each column of each window has 1200 pixels, which is the height of the screen. In another example, if a window that is a subset of the screen has a resolution of 800 pixels (width)×600 pixels (height), then the first image 515 includes 800 columns of pixels, and the second image 520 includes 800 columns of pixels; each column of each window has 600 pixels, which is the height of the window.
For explanatory purposes, only portions of the images (515, 520) and the composited image 525 are shown. The example of
For non-pre-decimated images, as shown in
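For comparison, a behavioral sketch of the non-pre-decimated case, under the same illustrative assumptions as the previous sketch: the blender keeps only every other column of each full-resolution input, effectively decimating as it interleaves.

```python
def column_interleave_full_res(left_full, right_full):
    """Both inputs are full screen width; the blender keeps the even
    columns of the left image and the odd columns of the right image,
    discarding the remaining columns during compositing."""
    out = left_full.copy()
    out[:, 1::2] = right_full[:, 1::2]
    return out
```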
In another implementation, the display controller can carry out row interleaving (not shown), as opposed to column interleaving. The display controller typically performs row interleaving when the display system is set to a portrait mode; portrait and landscape modes describe the way in which the image is oriented for normal viewing on the screen, landscape being the more common orientation. To implement row interleaving and/or portrait mode, the display controller rotates images from a memory (e.g., a memory of the source or a memory of the display system). Procedures for row interleaving are substantially the same as those for column interleaving, except that rows of pixels, rather than columns, are interleaved.
In another implementation, the display controller can carry out checkerboard interleaving (not shown). Checkerboard interleaving is a subset of column interleaving and/or row interleaving. To implement checkerboard interleaving, the display controller alternates the beginning pixel of each row (or column) between a pixel of the first image and a pixel of the second image from one row (or column) to the next. For example, each pixel column of the composited image alternates between a pixel of the first image and a pixel of the second image. The resulting composited image thereby resembles a checkerboard pattern.
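A behavioral sketch of checkerboard interleaving under the same illustrative assumptions, selecting the source image by the parity of (row + column):

```python
import numpy as np

def checkerboard_interleave(left, right):
    """Pixels where (row + column) is even come from the left image,
    the rest from the right, forming a checkerboard pattern."""
    h, w = left.shape[:2]
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    mask = (rows + cols) % 2 == 0
    return np.where(mask[..., None], left, right)
```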
For explanatory purposes, only portions of the sub-images (615, 620) and the composited image 625 are shown. Pixels L0 and L1 of the first image 615 are shown, each pixel having a separate value for red, green, and blue. Likewise, pixels R0 and R1 of the second image 620 are shown, each pixel having a separate value for red, green, and blue. Pixels P0, P1, P2, and P3 are shown for the composited image 625.
For example, pixel P0 of the composited image 625 is a composite of the red value of pixel L0, the green value of pixel R0, and the blue value of pixel L0. Pixel P1 is a composite of the red value of pixel R0, the green value of pixel L0, and the blue value of pixel R0. Pixel P2 of the composited image 625 is a composite of the red value of pixel L1, the green value of pixel R1, and the blue value of pixel L1. Pixel P3 is a composite of the red value of pixel R1, the green value of pixel L1, and the blue value of pixel R1. Other combinations of interleaving sub-pixels are also within the scope of the present technology. The display controller then generates a composited image 625 based on the composited pixels and scans the composited image 625 onto the screen for viewing.
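The pattern just described (P0 takes red and blue from L0 and green from R0, P1 the complement, and so on) is equivalent to selecting each color channel by the parity of the output pixel index plus the channel index. A behavioral Python sketch, assuming half-width inputs as in the pre-decimated case; the names are invented for the example:

```python
import numpy as np

def subpixel_interleave(left_half, right_half):
    """left_half/right_half are half the output width. Output pixel x
    takes channel c from left if (x + c) is even, else from right,
    with both reading source pixel x // 2 -- matching P0..P3 above."""
    h, half_w, ch = left_half.shape
    w = half_w * 2
    out = np.empty((h, w, ch), left_half.dtype)
    for x in range(w):
        for c in range(ch):
            src = left_half if (x + c) % 2 == 0 else right_half
            out[:, x, c] = src[:, x // 2, c]
    return out
```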
Displaying a Stereoscopic Window with a Monoscopic Window
Referring again to
A software module (not shown) typically manages aligning the windows for the display screen 111 in
Referring back to
In an alternative embodiment, the display system 300 can scan out a stereoscopic window with a normal window. As described above with reference to
Accordingly, the implementation of the fourth window controller 324 configures the display controller to scan out multiple stereoscopic windows to the display screen 111. For example, a software module (not shown) manages the compositing of a second stereoscopic image and uses the fourth window controller 324 to display the second stereoscopic window. The display controller 305 can scan out that second stereoscopic window along with a first stereoscopic window that the display controller 305 composites in hardware by using the blender component 325. Accordingly, the blender component 325 is configured to blend normal, stereoscopic and/or monoscopic windows.
Operating parameters of the blender component 325 are set according to the interleaving format selector 330 and/or the blending format selector 332. The setting of a particular interleaving format selector 330 determines whether particular image data is to receive column interleave, row interleave, checkerboard interleave, and/or sub-pixel interleave, among other types of interleaving. The setting of a particular blending format selector 332 determines whether the blender component 325 is to treat particular image data as being stereo, mono, or normal.
In one implementation, the blender component 325 includes a multiplexer (mux) that includes circuitry for processing according to various selections of the interleaving format selector 330 and/or the blending format selector 332. The circuitry can include an arrangement of hardware gates (e.g., OR gates, NOR gates, XNOR gates, AND gates, and/or NAND gates, etc.) that configure the blender component 325 to interleave two or more data streams received from the first window controller 315, the second window controller 320, and/or the third window controller 322. The circuitry of the blender component 325 may also include an arrangement of electronic switches for setting the circuitry to process image data according to the interleaving format selectors 330 (e.g., column, row, checkerboard, sub-pixel, etc.) and/or the blending format selectors 332 (e.g., stereo, mono, normal, etc.). In light of the descriptions above with reference to
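Functionally, the selector-driven behavior can be pictured as a dispatch over the two selectors. The following sketch reuses the illustrative interleave helpers from the earlier sketches; it is a behavioral model of the multiplexer, not the gate-level circuitry, and the selector strings and input shapes are invented for the example.

```python
def blend(first, second, interleave_format, blend_format):
    """Behavioral model of the blender component 325: the blending
    format selector chooses stereo/mono/normal treatment, and the
    interleaving format selector chooses how stereo pixels are muxed.
    Input widths follow the conventions of each format sketched above."""
    if blend_format == "normal":
        return first               # pre-composited window passes through
    if blend_format == "mono":
        second = first             # both channels carry the same image
    interleavers = {
        "row": pack_row_interleaved,
        "column": column_interleave_full_res,
        "checkerboard": checkerboard_interleave,
        "sub-pixel": subpixel_interleave,
    }
    return interleavers[interleave_format](first, second)
```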
The invention has been described above with reference to specific embodiments and numerous specific details are set forth to provide a more thorough understanding of the invention. Persons skilled in the art, however, will understand that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The foregoing description and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.