This disclosure generally relates to stereoscopic displays and, more particularly, to a method and apparatus for encoding and decoding a stereoscopic video frame or data, so that the content can be identified as stereoscopic by a receiver while remaining compatible with existing receiver infrastructure.
Electronic stereoscopic displays offer benefits to viewers both for technical visualization and, increasingly, for entertainment. Cinema systems based on Texas Instruments Digital Light Processing (DLP) light-engine technology and RealD polarization control components are being deployed widely in North America. Similar DLP technology is used in, for example, the Mitsubishi WD65833 rear-projection television and the Samsung HL-T5676 RPTV. A different approach is used in the Hyundai E465S(3D) LCD television, which uses regularly arranged micro-polarizers bonded to an LCD panel, such that circularly polarizing material alternately polarizes horizontal rows of pixels on the display. Thus, the 3-D image is created by placing the left-eye image into odd-numbered rows and the right-eye image into even-numbered rows; the lenses of the 3-D glasses are polarized with matching material, ensuring that only the left eye sees the left image and vice versa. Yet another approach is used in the Samsung PN50A450P1D plasma television. Different eyewear is used for polarization-based versus time-sequential 3-D, but these details are not germane to this disclosure.
The examples given above are all televisions capable of displaying both 2-D and 3-D content, but the formatting of the image data used for 3-D content renders the images unwatchable if 2-D video data are incorrectly formatted as if they were 3-D data. In the products listed above, this is currently handled by the viewer manually switching the TV into “3-D mode” when 3-D content is to be played, typically through a menu selection. The specific formatting performed by the television itself or by a receiver depends on the technology used by the display device.
The present disclosure provides a method and apparatus for marking, encoding or tagging a video frame to indicate that the content should be interpreted by a receiver, or suitably equipped display/TV, as 3-D video content. The present disclosure also provides a method and apparatus for identifying or decoding the tagged video frame to detect whether the content should be interpreted as 3-D video content.
In an embodiment, the 3-D video image, which is encoded in a transportable format such as side-by-side, is modified by replacing lines of the image with a specific pattern of color bars that are robust to compression and highly improbable to occur within natural image content. When the receiver detects the presence of these color bars, it interprets them as a command to switch into 3-D mode.
It would be desirable for the television or receiver to determine automatically whether the incoming video data are intended to be displayed in 3-D or 2-D. The viewer would then not have to adjust menu items or fumble with remote controls at the start of a 3-D movie. There are also other benefits, such as allowing content producers to start a program in 2-D mode, display a banner prompting the viewer(s) to “put your glasses on now,” and then switch the television into 3-D mode by changing the content to 3-D content.
Furthermore, it is highly desirable that 3-D video content can be transmitted over the existing (2-D) video delivery infrastructure. Generally, content from delivery systems may come from streaming source(s) or from stored file(s). Such delivery systems may include, but are not limited to, DVD, Blu-ray Disc, digital video recorders, cable TV, satellite TV, Internet and IPTV, over-the-air broadcast, and the like. These delivery systems use various types of video compression, and for 3-D video content to be successfully transported over them, the 3-D data should be compatible with a number of compression schemes. One efficient scheme that has this property is the side-by-side encoding described in commonly-owned U.S. Pat. No. 5,193,000, entitled “Multiplexing technique for stereoscopic video system,” to Lipton et al., which is hereby incorporated by reference. In this scheme, the left and right stereo frames are re-sampled to a lower resolution so that they can be horizontally “squeezed” and placed side by side on a single 3-D frame. Because the resulting encoded image is itself an image (albeit one with a boundary running down its middle), it can be transported through any of the above-disclosed delivery systems.
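The side-by-side packing can be illustrated with a minimal sketch, assuming 8-bit RGB frames held as NumPy arrays. Simple column decimation stands in for the re-sampling step; a production encoder would low-pass filter before decimating. This is an illustrative sketch, not the implementation of the referenced patent.

```python
import numpy as np

def pack_side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Squeeze two full-resolution eye views into one frame of the same size.

    Each view is horizontally re-sampled to half width (naive decimation
    here, for brevity) and the two halves are placed side by side.
    """
    h, w, c = left.shape
    half = w // 2
    frame = np.empty((h, w, c), dtype=left.dtype)
    frame[:, :half] = left[:, ::2]    # left-eye view in the left half
    frame[:, half:] = right[:, ::2]   # right-eye view in the right half
    return frame
```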
Other related art in this field includes commonly-owned U.S. Pat. No. 5,572,250, entitled “Universal electronic stereoscopic display,” and U.S. Pat. No. 7,184,002, entitled “Above-and-below stereoscopic format with signifier,” both of which describe related systems and are herein incorporated by reference. Patent '250 describes a system in which a “tag” is embedded in time-sequential stereoscopic video fields to allow the system to determine whether the field that is being displayed at a given time is intended for the left or right eye. Patent '002 describes a system in which stereo fields are encoded in the top and bottom halves of a video image. A “tag” is included in the video data to help the system determine whether the field that is being displayed by a CRT should be sent to the left or right eye.
As disclosed herein, a “tagging” technique may be used to modify the image content of a frame to indicate whether the visual content is to be treated as 2-D or 3-D by a receiver, addressing the problems discussed above.
Encoding an Image Frame to Indicate 3-D Video Content
The encoding process starts at step 101. In step 102, 3-D video data are received in a transportable format, for example the side-by-side format. In other embodiments, the transportable format of the 3-D video data may be an up-and-down format, a temporally or spatially multiplexed format, or a quincunx multiplexed format. Various transportable formats are disclosed above, but others may alternatively be used; the particular transportable format is not germane to this disclosure.
Optionally, at least the bottom line of each frame is replaced with the 3-D tag data in step 104. In an embodiment, the bottom eight lines of each frame are replaced with the 3-D tag data. In another embodiment, the bottom two lines of each frame are replaced with the 3-D tag data. Other embodiments may vary the number of lines to be replaced. Replacing one or more lines of the frame is illustrative only, and step 104 may be replaced with a step in which any portion of the image is replaced with 3-D tag data. For example, in other embodiments, a watermark, a rectangular block, a circle, or any predetermined shape in each frame may be replaced with 3-D tag data.
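A minimal sketch of step 104 follows. The specific colors, block width, and two-line tag height below are illustrative assumptions, not the disclosure's actual tag values (those are specified in the referenced table); the block width is chosen as a multiple of 16 so block boundaries can align with MPEG-2 macroblocks, as discussed later.

```python
import numpy as np

# Illustrative tag pattern only: video-range primaries built from the 16/235
# stimulus values discussed below. The disclosure's actual pixel values are
# those given in its table.
TAG_COLORS = [(235, 16, 16), (16, 235, 16), (16, 16, 235), (235, 235, 16)]
BLOCK_WIDTH = 16   # multiple of the MPEG-2 block size, for compression robustness
TAG_LINES = 2      # two-line embodiment; the eight-line embodiment uses 8

def write_tag(frame: np.ndarray) -> np.ndarray:
    """Replace the bottom TAG_LINES rows of an RGB frame with color bars."""
    h, w, _ = frame.shape
    tagged = frame.copy()
    for x0 in range(0, w, BLOCK_WIDTH):
        color = TAG_COLORS[(x0 // BLOCK_WIDTH) % len(TAG_COLORS)]
        tagged[h - TAG_LINES:, x0:x0 + BLOCK_WIDTH] = color
    return tagged
```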
The most convenient way of adding the video tag depends on how the video data are created initially. The addition of the tag is a process that may be integrated into the generation of the video data, or it may be added subsequently by use of a stand-alone computer program.
Although this disclosure mostly discusses using the tag to identify whether the video data are 3-D, a tag may carry a number of unique pieces of information, not just whether the video is 3-D. In either case, the tags may be constant throughout the entire video data or may be dynamic, changing from frame to frame. The tag may be a predetermined specific color pattern, or it may be modified dynamically to convey other information (e.g., real-time information) that may affect the video conversion process. The simplest tag devotes its entire pattern to identifying whether the content is 3-D. Because the tag can be significantly redundant, it can carry more than a single piece of information; in other words, the tag can become a carrier of multiple pieces of information, and this information can change depending on the frame. In an embodiment, the information is changed on a frame-by-frame basis. This “real-time” information may include, but is not limited to, information about the characteristics of the content of a frame, such as color space, dynamic range, the screen size the content was mastered for, and so on. In effect, the tag may be used as a means to carry metadata and can carry a wide variety of information. In an embodiment, whether a predetermined specific color pattern or a dynamic tag is used, the tag is robust in that it is unlikely to appear in naturally occurring non-stereoscopic video data.
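One hypothetical way to use the tag as a metadata carrier, not specified in the disclosure, is to let each metadata bit select one of two widely separated colors for a run of blocks, so each bit is carried redundantly by many pixels and survives compression and chroma subsampling. A sketch under those assumptions:

```python
# Hypothetical payload scheme (illustrative only): one bit per run of blocks.
BIT_COLORS = {0: (16, 16, 235), 1: (235, 235, 16)}  # placeholder colors

def tag_colors_for_bits(bits, blocks_per_bit=8):
    """Expand a short bit string into a per-block color sequence."""
    colors = []
    for bit in bits:
        # Heavy redundancy: many identical blocks per bit.
        colors.extend([BIT_COLORS[bit]] * blocks_per_bit)
    return colors

# e.g., color-space/dynamic-range flags: tag_colors_for_bits([1, 0, 1])
```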
In an embodiment, exemplary pixel values of the video tag are specified in the accompanying table.
The tagged image may then optionally be compressed using conventional compression techniques in step 106. Once compressed, the tagged video data can be stored (step 108) and/or transmitted over video distribution channels (step 110). Transmitting the video data over standard video pathways (e.g., cable/satellite/terrestrial/broadband broadcast, streaming, DVD, Blu-ray discs, et cetera) typically involves compression and/or decompression and chroma subsampling, and may involve scaling.
In an embodiment, an advantage of the present disclosure is that the boundaries of the blocks of color in the video tag may be aligned with the boundaries of the blocks used by the popular MPEG-2 compression scheme. This helps to preserve the integrity of the blocks even under severe compression. It should be noted that the steps may be performed in another order, and that other steps may be incorporated into the process, without departing from the spirit of the disclosure.
One advantage of using the bottom eight or two lines (as opposed to a smaller tag) is that it allows the tag to survive image-corrupting processes (such as compression and decompression) with enough fidelity to be reliably detected. One advantage of using RGB stimulus values 16 and 235 (as opposed to 0 and 255) is more universal compatibility; additionally, the receiver may be able to detect whether color-range expansion occurred in the playback path, which may be useful if the receiver performs any color-space processing.
Although one embodiment teaches the use of the bottom eight lines and another the bottom two lines of an image to carry the 3-D tag, it should be apparent to a skilled artisan that alternative encoding schemes may be used, for instance using a different number of lines and/or placing the tag lines in another part of the frame (e.g., the top of the frame). The elements common to the disclosed embodiments are that the tag is present in the image data itself and that, after being decoded, it is masked with other pixels (e.g., black pixels).
Decoding an Image Frame to Detect 3-D Video Content
The decoding process starts at step 201. Conventional processing techniques, implemented in software, hardware, or a combination of the two (for example, a processor running a software program), may be used to perform the decoding process. Image data are received at step 202. The image data may arrive compressed or uncompressed, and compressed data are decompressed prior to the detection step 204. Once the image data are uncompressed, the values of the data near the center of the color blocks may be examined to determine whether they are close enough to the tag values.
In an embodiment, after the video data are decompressed, the receiver interrogates the pixel values. This can be done with a processor, logic inside a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC), for example. The receiver examines the values of the bottom line of pixels.
When a 3-D tag is detected at step 204, 3-D mode is indicated at step 206, triggering a switch into 3-D mode (or continuing 3-D operation if already in that mode). The tag pixels are optionally replaced with black pixels at step 208. Referring back to detection step 204, if enough of the pixel values fall outside the allowed range, the 3-D tag is not detected and 2-D mode is indicated at step 210, triggering a switch into 2-D mode (or continuing 2-D operation if already in that mode), and the bottom lines are allowed to pass through unaffected at step 212.
In an embodiment in which the bottom eight lines of an image carry the 3-D tag, the detection step 204 includes the receiver performing the following steps on the tag data residing in the last eight rows of the frame. In this embodiment, only the center part of the tagged data is examined. The first two rows and the final two rows of the eight lines of tag data are ignored. The center four rows are processed in the following manner.
For a frame, if the error count exceeds a predetermined threshold, that frame is deemed not to carry the 3-D tag. If the error counts for all of R, G, and B are below the predetermined threshold, the frame is deemed to carry the 3-D tag. The thresholds used in this exemplary embodiment are also given in the accompanying table.
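A sketch of this detection logic for the eight-line embodiment follows, reusing the illustrative TAG_COLORS and BLOCK_WIDTH from the encoding sketch above. The tolerance and error-count threshold are placeholders; the disclosure's actual values are those given in its tables.

```python
import numpy as np

VALUE_TOLERANCE = 40   # placeholder per-channel tolerance around tag values
MAX_ERRORS = 64        # placeholder per-channel error-count threshold

def detect_tag_8line(frame: np.ndarray) -> bool:
    """Check the center four of the bottom eight rows for the tag pattern."""
    h, w, _ = frame.shape
    center = frame[h - 6:h - 2]        # skip first two and last two tag rows
    errors = np.zeros(3, dtype=int)    # per-channel error counts (R, G, B)
    for x0 in range(0, w, BLOCK_WIDTH):
        expected = TAG_COLORS[(x0 // BLOCK_WIDTH) % len(TAG_COLORS)]
        # Sample near the center of each block, away from the block
        # boundaries that compression tends to blur.
        xc = min(x0 + BLOCK_WIDTH // 2, w - 1)
        sample = center[:, xc].astype(int)   # four rows x three channels
        errors += (np.abs(sample - expected) > VALUE_TOLERANCE).sum(axis=0)
    return bool((errors < MAX_ERRORS).all())
```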
In an embodiment in which the bottom two lines of an image carry the 3-D tag, the detection step 204 includes the receiver performing the following steps on the tag data residing in the last two rows of the frame. In this embodiment, only the second row of the tagged data is examined; the first row of tag data is ignored. The bottom row is processed in the following manner.
For a frame, if the error count exceeds a predetermined threshold, that frame is deemed not to carry the 3-D tag. If the error counts for all of R, G, and B are below the predetermined threshold, the frame is deemed to carry the 3-D tag. The thresholds used in this exemplary embodiment are also given in the accompanying table.
In an embodiment, the receiver can switch immediately into or out of 3-D (or 2-D) mode upon detecting the presence or absence of the tag or, optionally, can wait for a number of successive detections before making a change of state. Waiting provides more immunity to noise at the cost of some delay in changing modes. For example, consistent with the disclosed embodiment, two successive detections of the tag may suffice to switch into 3-D mode and, likewise, two successive detections of no tag may suffice to switch to 2-D mode.
To add further immunity to noise, mode-transition hysteresis may be used for the three qualification parameters mentioned above: error count, value thresholds, and successive frame count. If hysteresis is used, in an embodiment, once in 3-D mode, more tolerant values of each of these parameters are used for tag disqualification back into 2-D mode. These values are also given in the accompanying tables.
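The successive-detection and frame-count hysteresis behavior can be summarized as a small state machine. A sketch with placeholder qualification counts (the disclosure's actual values are in its tables):

```python
class ModeController:
    """Switch between 2-D and 3-D modes with frame-count hysteresis.

    Placeholder policy: two successive tagged frames qualify 3-D mode,
    while a larger run of untagged frames is required to disqualify it,
    giving the more tolerant in-3-D behavior described above.
    """
    QUALIFY = 2       # successive detections needed to enter 3-D mode
    DISQUALIFY = 4    # placeholder: successive misses needed to leave it

    def __init__(self):
        self.mode_3d = False
        self.streak = 0

    def update(self, tag_detected: bool) -> bool:
        wants_switch = tag_detected != self.mode_3d  # frame argues for a change
        self.streak = self.streak + 1 if wants_switch else 0
        needed = self.DISQUALIFY if self.mode_3d else self.QUALIFY
        if self.streak >= needed:
            self.mode_3d = not self.mode_3d
            self.streak = 0
        return self.mode_3d
```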
The details of the 3-D operation mode of the receiver (which may reside inside a television or display) depend on the details of the technology used and may employ conventional 3-D operation techniques known in the art. A decoder module may be used and may include, e.g., software code, a chip, a processor, a chip or processor in a television or DVD player, a hardware module with a processor, etc. For example, the Hyundai E465S(3D) television, which is currently commercially available in Japan, can accept a video stream in the side-by-side format and reformat it for display in the row-interlaced format required by the x-pol technology. The Hyundai E465S television is instructed manually, via a menu selection, to perform this formatting operation. If that TV were modified consistent with the disclosed embodiments, it could switch automatically on receipt of properly tagged content.
In an embodiment, after switching into 3-D mode, the receiving system removes the tag and replaces the tag with other pixels. For example, the tag may be replaced with all black pixels or pixels of another color (e.g., to match a border color). Other replacement methods may also be used including pixel replication.
In the exemplary embodiments illustrated in the accompanying figures, the tag dimensions scale with the image size. For a 720 pixel image size, the tag is still two lines high, but the width of each block is scaled down by a factor of 1.5 (e.g., as if the player had scaled down a 1080 pixel source image for a 720 pixel display).
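A hypothetical helper illustrating this scaling, assuming block widths are defined for a 1920-pixel-wide (1080-line) image:

```python
def scaled_block_widths(widths_1080, target_width, source_width=1920):
    """Scale nominal 1080-line tag block widths to another resolution.

    For a 1280x720 display the scale factor is 1920/1280 = 1.5, matching a
    player that downscales a 1080-line source for a 720-line display.
    (Hypothetical helper; the block widths themselves are illustrative.)
    """
    scale = source_width / target_width
    return [round(w / scale) for w in widths_1080]

# e.g., scaled_block_widths([16, 16, 32], 1280) -> [11, 11, 21]
```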
As discussed above, a decoder module may be used to decode any video data stream including at least one video frame and determine whether that video data includes a 3-D content identifier or tag.
In operation, the decoder module 1102 receives either 2-D or 3-D video data via input 1112. The analyzer module 1104 analyzes a portion of at least one frame of the video data and determines whether that data carries a 3-D content identifier. The analyzed portion of the image frame may include at least one line, multiple lines, or any other shape or block of pixels. The decoder module 1102 may output a signal or bit (or bits) indicating whether the 3-D content identifier is present 1116. The decoder module 1102 may also output the video data stream via video output 1114. In an embodiment, the decoder module 1102 removes a detected 3-D content identifier before outputting the video data stream. In another embodiment, the decoder module 1102 can output a signal or bit (or bits) for left/right image synchronization for 3-D data over sync output 1118. The decoder module 1102 may comprise, for example, software code, a system on a chip, a processor, a chip or processor in a television or DVD player, a set-top box, a personal computer, a hardware module with a processor, et cetera.
The decoder module 1102 may also include a receiving module (not shown in this figure) and an indicating module. The receiving module receives the 2-D or 3-D video data. The indicating module uses the information from the analyzer module 1104 (the determination of whether a 3-D content identifier is present) and may provide a signal or bit (or bits) indicating one of a 3-D mode or a 2-D mode. The decoder module 1102 may also include an image writing module (not shown in this figure) for replacing the pixels of the 3-D content identifier with other pixels. In an embodiment, the image writing module replaces the 3-D content identifier with black pixels, such that the viewer will be unable to see any tag information (however minimal) on the viewing screen, other than a barely noticeable thin black bar.
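The module structure described above might be sketched as follows, reusing the detection and mode-control helpers from the earlier sketches. The class and method names are illustrative assumptions, not the actual implementation behind the reference numerals.

```python
import numpy as np

class DecoderModule:
    """Illustrative decoder: analyze, indicate the mode, and mask the tag."""

    def __init__(self, tag_lines: int = 8):
        self.tag_lines = tag_lines
        self.controller = ModeController()   # indicating module (sketch above)

    def process(self, frame: np.ndarray):
        present = detect_tag_8line(frame)          # analyzer module
        mode_3d = self.controller.update(present)  # 3-D/2-D mode indication
        if present:
            # Image writing module: mask the tag so only a thin black bar
            # remains visible.
            frame = frame.copy()
            frame[-self.tag_lines:] = 16   # video-range black
        return frame, mode_3d, present
```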
As used herein, the term “transportable format” refers to a format in which 3-D image content for left- and right-eye images is transported via the 2-D delivery infrastructure, which includes transportation via communications links (e.g., Internet delivery of streaming media, video files, and the like) and/or storage media (e.g., DVD, Blu-ray Disc, hard drives, ROM, and the like). Examples of such “transportable formats” include, but are not limited to, side-by-side, top-bottom, quincunx multiplexing, temporal/spatial modulation, or a combination thereof. As used herein, the term “encoding” is used synonymously with “marking” and “tagging.” As used herein, the term “decoding” is used synonymously with “identifying.”
While various embodiments in accordance with the principles disclosed herein have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the invention(s) should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with any claims and their equivalents issuing from this disclosure. Furthermore, the above advantages and features are provided in described embodiments, but shall not limit the application of such issued claims to processes and structures accomplishing any or all of the above advantages.
Additionally, the section headings herein are provided for consistency with the suggestions under 37 CFR 1.77 or otherwise to provide organizational cues. These headings shall not limit or characterize the invention(s) set out in any claims that may issue from this disclosure. Specifically and by way of example, although the headings refer to a “Technical Field,” the claims should not be limited by the language chosen under this heading to describe the so-called field. Further, a description of a technology in the “Background” is not to be construed as an admission that certain technology is prior art to any invention(s) in this disclosure. Neither is the “Brief Summary” to be considered as a characterization of the invention(s) set forth in issued claims. Furthermore, any reference in this disclosure to “invention” in the singular should not be used to argue that there is only a single point of novelty in this disclosure. Multiple inventions may be set forth according to the limitations of the multiple claims issuing from this disclosure, and such claims accordingly define the invention(s), and their equivalents, that are protected thereby. In all instances, the scope of such claims shall be considered on their own merits in light of this disclosure, but should not be constrained by the headings set forth herein.
This application claims priority to U.S. Provisional Application Ser. No. 61/085,719, filed on Aug. 1, 2008 entitled “Method and Apparatus to Encode and Decode Stereoscopic Video Data,” and U.S. Provisional Application Ser. No. 61/150,218, filed on Feb. 5, 2009 entitled “Method and Apparatus to Encode and Decode Stereoscopic Video Data,” which are incorporated herein by reference for all purposes.