This invention relates to the field of digital image transmission and more specifically to a method and system for encoding and decoding frames of a digital image stream.
When transmitting digital image streams, some form of compression (also referred to as encoding) is often applied to the image streams in order to reduce data storage volume and bandwidth requirements. For instance, it is known to use a quincunx or checkerboard pixel decimation pattern in video compression. Such compression necessitates a corresponding decompression (or decoding) operation at the receiving end in order to retrieve the original image streams.
In commonly assigned US patent application publication 2003/0223499, stereoscopic image pairs of a stereoscopic video are compressed by removing pixels in a checkerboard pattern and then collapsing the checkerboard pattern of pixels horizontally. The two horizontally collapsed images are placed in a side-by-side arrangement within a single standard image frame, which is then subjected to conventional image compression (e.g. MPEG2) and, at the receiving end, conventional image decompression. The decompressed standard image frame is then further decoded, whereby it is expanded into the checkerboard pattern and the missing pixels are spatially interpolated.
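By way of illustration only, the following is a minimal sketch of such checkerboard decimation and horizontal collapsing (in Python with NumPy; the function name, array shapes and the parity convention are assumptions for illustration, not details taken from the cited publication):

```python
import numpy as np

def collapse_checkerboard(img: np.ndarray, even_parity: bool) -> np.ndarray:
    """Keep pixels in a checkerboard (quincunx) pattern and collapse the
    surviving pixels horizontally, halving the image width."""
    h, w, c = img.shape
    out = np.empty((h, w // 2, c), dtype=img.dtype)
    for y in range(h):
        # Alternate the starting column from row to row; the complementary
        # parity would be used for the other image of the stereoscopic pair.
        offset = y % 2 if even_parity else (y + 1) % 2
        out[y] = img[y, offset::2]
    return out

# The two collapsed images are then placed side by side in one standard frame.
left = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
right = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
frame = np.hstack([collapse_checkerboard(left, True),
                   collapse_checkerboard(right, False)])
assert frame.shape == (480, 640, 3)
```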
Although the various levels of compression/encoding and decompression/decoding that digital image streams undergo in the course of transmission are necessary given current standards for the storage and broadcast (transport) of video sequences, problems inevitably arise in the form of loss of information and/or distortion. Various techniques for these compression/encoding and decompression/decoding operations have been developed over the years and continue to be improved upon, a particular goal being to reduce the inherent data loss and/or image artifacts. However, there remains much room for improvement, particularly when it comes to increasing the quality level of the reconstructed image stream at the receiving end.
Consequently, there exists a need in the industry to provide an improved method and system for encoding and decoding digital image streams.
In accordance with a broad aspect, the present invention provides a method of encoding a digital image frame. The method includes generating metadata representative of a value of at least one component of at least one pixel of the frame in the course of applying an encoding operation to the frame.
In accordance with another broad aspect, the present invention provides a method of decoding an encoded digital image frame for reconstructing an original version of the frame. The method includes utilizing metadata in the course of applying a decoding operation to the encoded frame, wherein the metadata is representative of a value of at least one component of at least one pixel of the original version of the frame.
In accordance with yet another broad aspect, the present invention provides a system for processing frames of a digital image stream. The system includes a processor for receiving a frame of the image stream, the processor being operative to generate metadata representative of a value of at least one component of at least one pixel of the frame. The system also includes a compressor for receiving the frame and the metadata from the processor, the compressor being operative to apply a compression operation to the frame and to the metadata for generating a compressed frame and associated compressed metadata. The system includes an output for releasing the compressed frame and the compressed metadata.
In accordance with a further broad aspect, the present invention provides a system for processing compressed image frames. The system includes a decompressor for receiving a compressed frame and associated compressed metadata and for applying thereto a decompression operation in order to generate a decompressed frame and associated decompressed metadata. The system also includes a processor for receiving the decompressed frame and its associated decompressed metadata from the decompressor, the processor being operative to utilize the decompressed metadata in the course of applying a decoding operation to the decompressed frame for reconstructing an original version of the decompressed frame, wherein the decompressed metadata is representative of a value of at least one component of at least one pixel of the original version of the decompressed frame. The system further includes an output for releasing the reconstructed original version of the decompressed frame.
In accordance with another broad aspect, the present invention provides a processing unit for processing frames of a digital image stream, the processing unit operative to generate metadata representative of a value of at least one component of at least one pixel of at least one frame of the image stream in the course of applying an encoding operation to the frames of the image stream.
In accordance with yet another broad aspect, the present invention provides a processing unit for processing frames of a decompressed image stream, the processing unit operative to receive metadata associated with a decompressed frame and to utilize this metadata in the course of applying a decoding operation to the decompressed frame for reconstructing an original version of the decompressed frame, wherein the metadata is representative of a value of at least one component of at least one pixel of the original version of the decompressed frame.
The invention will be better understood by way of the following detailed description of embodiments of the invention with reference to the appended drawings, in which:
It should be understood that the expressions “decoded” and “decompressed” are used interchangeably within the present description, as are the expressions “encoded” and “compressed”. Furthermore, although examples of implementation of the invention will be described herein with reference to three-dimensional stereoscopic images, such as movies, it should be understood that the scope of the invention encompasses other types of video images as well.
Stored digital image sequences are then converted to an RGB format by processors such as 20 and 22 and fed to inputs of moving image mixer 24. Since the two original image sequences contain too much information to enable direct storage onto a conventional DVD or direct broadcast through a conventional channel using the MPEG2 or equivalent multiplexing protocol, the mixer 24 carries out a decimation process to reduce each picture's information. More specifically, the mixer 24 compresses or encodes the two planar RGB input signals into a single stereo RGB signal, which may then undergo another format conversion by a processor 26 before being compressed into a standard MPEG2 bit stream format by a typical compressor circuit 28. The resulting MPEG2 coded stereoscopic program can then be broadcast on a single standard channel through, for example, transmitter 30 and antenna 32, or recorded on a conventional medium such as a DVD. Alternative transmission media include, for instance, a cable distribution network or the Internet.
Turning now to
The video processor 106 is capable of performing various tasks, including for example some or all video playback tasks, such as scaling, color conversion, compositing, decompression and deinterlacing, among other possibilities. Typically, the video processor 106 would be responsible for processing the received compressed image stream 102, including submitting it to color conversion and compositing operations in order to fit a particular resolution.
Although the video processor 106 may also be responsible for decompressing and deinterlacing the received compressed image stream 102, this interpolation functionality may alternatively be performed by a separate, back-end processing unit. In a specific, non-limiting example, the compressed image stream 102 is a compressed stereoscopic image stream 102 and the above-discussed interpolation functionality is performed by a stereoscopic image processor 118 that interfaces between the video processor 106 and both the DVI 110 and display signal driver 112. This stereoscopic image processor 118 is operative to decompress and interpolate the compressed stereoscopic image stream 102 in order to reconstruct the original left and right image sequences. Obviously, the ability of the stereoscopic image processor 118 to successfully reconstruct the original left and right image sequences is greatly hampered by any data loss or distortion in the compressed image stream 102.
The present invention is directed to a method and system for encoding and decoding frames of a digital image stream, resulting in an improved quality of the reconstructed image stream after transmission. Broadly put, when encoding a frame of the image stream in preparation for transmission or recording, metadata is generated, where this metadata is representative of a value of at least one component of at least one pixel of the frame. The frame and its associated metadata both then undergo a respective standard compression operation (e.g. MPEG2 or MPEG4, among other possibilities), after which the compressed frame and the compressed metadata are ready for transmission to the receiving end or for recording on a conventional medium. At the receiving end, the compressed frame and associated compressed metadata undergo respective standard decompression operations, after which the frame is further decoded/interpolated at least in part on a basis of its associated metadata in order to reconstruct the original frame.
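In outline, this flow might be sketched as follows (Python; the function names are hypothetical, and the pass-through compress/decompress stubs merely stand in for standard codecs such as MPEG2):

```python
def compress(data):      # stand-in for a standard codec (e.g. MPEG2)
    return data          # pass-through; a real system would compress here

def decompress(data):
    return data

def transmit_frame(frame, encode_frame, make_metadata, decode_frame):
    encoded = encode_frame(frame)             # e.g. pixel decimation
    metadata = make_metadata(frame, encoded)  # values of decimated pixels
    channel = (compress(encoded), compress(metadata))
    # ... transmission over a channel, or recording on a medium ...
    received, meta = decompress(channel[0]), decompress(channel[1])
    return decode_frame(received, meta)       # metadata guides interpolation

# Trivial identity usage, for illustration only:
out = transmit_frame("frame", lambda f: f, lambda f, e: None, lambda r, m: r)
assert out == "frame"
```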
It is important to note that, upon encoding of the image frame, metadata may be generated for each pixel of the frame or for a subset of pixels of the frame. Any such subset is possible, down to a single pixel of the image frame. In a specific, non-limiting example of implementation of the present invention, metadata is generated for some or all of the pixels of the frame that are decimated (or removed) in the course of encoding the frame. In the case of generating metadata for only select ones of the decimated pixels of the frame, the decision to generate metadata for a particular decimated pixel may be taken on a basis of how much a standard interpolation of the particular decimated pixel deviates from the original value of the particular pixel. Thus, for a predefined maximum acceptable deviation, if a standard interpolation of the particular decimated pixel results in a deviation from the original pixel value that is greater than the predefined maximum acceptable deviation, metadata is generated for the particular decimated pixel. Conversely, if the standard interpolation of the particular decimated pixel results in a deviation that is smaller than the predefined maximum acceptable deviation, that is, if the quality of the standard interpolation of the particular decimated pixel is sufficiently high, no metadata need be generated for the particular decimated pixel.
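A minimal sketch of this decision rule follows (Python with NumPy; the neighbor-averaging interpolation and the threshold value are assumptions for illustration):

```python
import numpy as np

MAX_DEVIATION = 8  # hypothetical maximum acceptable deviation per component

def needs_metadata(original_pixel: np.ndarray, neighbors: np.ndarray) -> bool:
    """Decide whether a decimated pixel warrants metadata.

    `original_pixel` is the pixel's true (r, g, b) value; `neighbors` is an
    (N, 3) array of the adjacent pixels that survive decimation. The standard
    interpolation assumed here is a simple average of the neighbors."""
    interpolated = neighbors.mean(axis=0)
    deviation = np.abs(interpolated - original_pixel.astype(float)).max()
    return deviation > MAX_DEVIATION

# Example: a decimated pixel whose neighbors average far from its true value.
pixel = np.array([200, 40, 90], dtype=np.uint8)
adjacent = np.array([[60, 38, 88], [70, 42, 92], [65, 40, 90], [62, 41, 89]],
                    dtype=np.uint8)
print(needs_metadata(pixel, adjacent))  # True: red deviates far more than 8
```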
Advantageously, by generating and transmitting/recording along with an encoded image frame metadata characterizing at least certain pixels of the original frame, where this metadata is very easily compressible by standard compression schemes (e.g. techniques used in MPEG4), it is possible to increase a quality level of the reconstructed frame at the receiving end without adding a significant burden to the bandwidth of the transmission or recording medium. More specifically, when encoding of a frame results in certain pixels of the frame being removed from the frame and thus not transmitted or recorded, the metadata generated for some or all of these missing pixels and accompanying the encoded frame eases and improves the process of filling in the missing pixels and reconstructing the original frame at the receiving end.
Obviously, within an image stream, certain frames may benefit from having associated metadata while others may not require it. More specifically, if the standard interpolation applied at the time of decoding of an encoded version of a particular frame results in a deviation from the original particular frame that is considered acceptable (e.g. smaller than a predefined maximum acceptable deviation), then metadata need not be generated for the particular frame. Accordingly, within a compressed image stream that is transmitted or recorded with associated metadata, certain frames may have associated metadata while others may not, without departing from the scope of the present invention.
It is important to note however that the technique of the present invention is applicable to all types of digital image streams and is not limited in application to any one specific type of image frames. That is, the technique may be applied to digital image frames other than stereoscopic image frames. Furthermore, the technique may be applied regardless of the particular type of encoding operation that is applied to the frames, whether it be compression encoding or some other type of encoding. Finally, the technique may even be applied if the digital image frames are to be transmitted/recorded without undergoing any further type of encoding or compression (e.g. transmitted/recorded as uncompressed data rather than JPEG, MPEG2 or other), without departing from the scope of the present invention.
In
As shown in
Assume for example that the pixels of the frame are in an RGB format, such that each pixel has three components and is defined by a vector of 3 digital numbers, respectively indicative of the red, green and blue intensity. Furthermore, within the frame, each pixel has adjacent pixels 1, 2, 3 and 4, each of which also has a respective red, green and blue component. When generating the metadata for decimated pixel X, one bit of metadata is generated for each of components Xr, Xg and Xb. Thus, the metadata for pixel X could be, for example, “010”, in which case the metadata values for Xr, Xg and Xb are “0”, “1” and “0”, respectively. These metadata values for Xr, Xg and Xb are set on a basis of predefined combinations of adjacent pixel component values, where the particular metadata value chosen for a specific component of decimated pixel X is representative of the combination that is closest in value to the actual value of that specific component. Taking for example the predefined combinations shown in
Xr=([1r]+[2r])/2
Xg=([3g]+[4g])/2
Xb=([1b]+[2b])/2
In the non-limiting example of
In yet another possible variation of the technique shown in
It is important to note that, regardless of the number of bits of metadata available per component of each decimated pixel X, various different predefined combinations of the adjacent pixel component values are possible and may be used to generate the metadata for the image frame, without departing from the scope of the present invention. Furthermore, it is also possible that the metadata for each decimated pixel X may be generated on a basis of the component values of non-adjacent pixels in the frame, or the component values of a combination of adjacent and non-adjacent pixels in the frame, without departing from the scope of the present invention.
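The one-bit scheme exemplified above might be sketched as follows (Python with NumPy; the particular pairing of adjacent pixels into the two combinations mirrors the example equations, but the table itself is an assumption for illustration):

```python
import numpy as np

# Hypothetical one-bit mapping table, consistent with the example above:
# bit 0 -> average of adjacent pixels 1 and 2
# bit 1 -> average of adjacent pixels 3 and 4
COMBINATIONS = (
    lambda adj: (adj[0].astype(float) + adj[1]) / 2,
    lambda adj: (adj[2].astype(float) + adj[3]) / 2,
)

def metadata_bits(original: np.ndarray, adjacent: np.ndarray) -> str:
    """Return one metadata bit per component of a decimated pixel,
    picking the predefined combination closest to the true value."""
    candidates = [combo(adjacent) for combo in COMBINATIONS]
    bits = []
    for c in range(3):  # r, g, b
        errors = [abs(cand[c] - float(original[c])) for cand in candidates]
        bits.append(str(int(np.argmin(errors))))
    return "".join(bits)

original = np.array([100, 150, 50], dtype=np.uint8)
adjacent = np.array([[98, 90, 52], [104, 95, 46],
                     [60, 148, 200], [70, 152, 180]], dtype=np.uint8)
print(metadata_bits(original, adjacent))  # "010" with these values
```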
In the above examples of
In a specific, non-limiting example, the metadata is generated only for those decimated pixels for which it has been found that a standard interpolation at the receiving end results in a deviation from the original pixel value that is greater than a predefined maximum acceptable deviation (i.e. the standard interpolation degrades the quality of the reconstructed frame). Thus, in the case of a decimated pixel for which a standard interpolation results in a deviation from the original pixel value that is smaller than the predefined maximum acceptable deviation (i.e. a good quality interpolation is possible at the receiving end), metadata need not be generated.
In a variant example of implementation of the present invention, in the course of applying an encoding operation to an image frame, metadata is generated for only select components of select decimated pixels of the frame. Thus, for a particular decimated pixel, metadata may be generated for at least one component of the particular pixel, but not necessarily for all of the components of the particular pixel. Obviously, it is also possible that no metadata be generated for the particular decimated pixel, in the case where the standard interpolation of the particular decimated pixel is of sufficiently high quality. In a specific, non-limiting example, the decision to generate metadata for a particular component of a decimated pixel may be taken on a basis of how much a standard interpolation of the particular component of the decimated pixel deviates from the original value of the particular component. Thus, for a predefined maximum acceptable deviation, if a standard interpolation of the particular component of the decimated pixel results in a deviation from the original component value that is greater than the predefined maximum acceptable deviation, metadata is generated for the particular component of the decimated pixel. Conversely, if the standard interpolation of the particular component of the decimated pixel results in a deviation that is smaller than the predefined maximum acceptable deviation, that is, if the quality of the standard interpolation of the particular component is sufficiently high, no metadata need be generated for the particular component of the decimated pixel.
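A sketch of this per-component decision (Python with NumPy; the same hypothetical neighbor-averaging interpolation and threshold as in the earlier sketch):

```python
import numpy as np

MAX_DEVIATION = 8  # hypothetical maximum acceptable deviation, as above

def components_needing_metadata(original, neighbors):
    """Return the indices (0=r, 1=g, 2=b) of the components of a decimated
    pixel whose standard interpolation deviates beyond the threshold."""
    interpolated = np.asarray(neighbors, dtype=float).mean(axis=0)
    deviation = np.abs(interpolated - np.asarray(original, dtype=float))
    return [c for c in range(3) if deviation[c] > MAX_DEVIATION]

# Example: only the red component deviates enough to warrant metadata.
print(components_needing_metadata([200, 40, 90],
                                  [[60, 38, 88], [70, 42, 92],
                                   [65, 40, 90], [62, 41, 89]]))  # [0]
```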
In another variant example of implementation of the present invention, in the course of applying an encoding operation to an image frame, metadata is generated for each and every component of each and every pixel of the image frame that is decimated or removed from the frame during the encoding. The provision of this metadata in association with the encoded frame will thus provide for a simpler and more efficient interpolation of missing pixels upon decoding of the encoded frame at the receiving end. In a specific case of this variant example of implementation, when metadata is generated for each component of each decimated pixel of a frame, and the number of bits of metadata per component is equal to the actual number of bits of each pixel component in the frame, it is possible to obtain the greatest quality in the reconstructed image frame at the receiving end. This is because the metadata that accompanies the encoded frame and that is thus available at the receiving end represents the actual component values for every pixel that was decimated or removed from the frame upon compression encoding, without any approximation or interpolation.
In yet another variant example of implementation of the present invention, the generation of metadata for an image frame may include the generation of metadata presence indicator flags. Each flag would be associated with either the frame itself, a particular pixel of the frame or a specific component of a particular pixel of the frame and would indicate whether or not metadata exists for the frame, the particular pixel or the specific component. In the non-limiting example of a one-bit flag, the flag could be set to “1” to indicate the presence of associated metadata and to “0” to indicate the absence of associated metadata. In a specific, non-limiting example, upon generation of the metadata for a frame, a map of metadata presence indicator flags is also generated, where a flag may be provided for: 1) each pixel of the frame; 2) each one of a subset of pixels of the frame; 3) each one of a subset of components of each pixel of the frame; or 4) each one of a subset of components of a subset of pixels of the frame. A subset of pixels may include, for example, some or all of the pixels that are decimated from the frame during encoding. Upon decoding of an encoded frame having associated metadata, such metadata presence indicator flags would be particularly useful in the case where metadata was either only generated for certain ones of the pixels that were decimated from the frame during encoding or only generated for certain ones of the components of certain or all of the decimated pixels.
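One possible realization of such a flag map (Python with NumPy; the per-pixel granularity and the bit packing are assumptions for illustration):

```python
import numpy as np

def build_flag_map(height, width, metadata_positions):
    """One-bit metadata presence indicator flag per pixel: 1 where
    metadata exists for the pixel, 0 where it does not."""
    flags = np.zeros((height, width), dtype=np.uint8)
    for y, x in metadata_positions:
        flags[y, x] = 1
    return np.packbits(flags)  # pack to one bit per flag for transport

packed = build_flag_map(4, 4, [(0, 1), (2, 3)])  # 16 flags -> 2 bytes
assert packed.size == 2
```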
In a further variant example of implementation of the present invention, the generation of metadata for an image frame may include embedding in a header of this metadata an indication of the position of each pixel within the frame for which metadata has been generated. This header may further include, for each identified pixel position, an indication of the specific components for which metadata has been generated, as well as of the number of bits of metadata that is stored for each such component, among other possibilities.
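Such a header might be laid out as follows (Python; the field widths, ordering and byte order are purely hypothetical):

```python
import struct

def pack_metadata_header(entries):
    """Pack a hypothetical metadata header: an entry count followed by,
    for each pixel, its (row, column) position, a component mask naming
    which components carry metadata (bit 0=r, 1=g, 2=b), and the number
    of metadata bits stored per component."""
    header = struct.pack(">I", len(entries))
    for row, col, component_mask, bits_per_component in entries:
        header += struct.pack(">HHBB", row, col, component_mask,
                              bits_per_component)
    return header

blob = pack_metadata_header([(10, 20, 0b101, 1),   # r and b, 1 bit each
                             (11, 21, 0b010, 2)])  # g only, 2 bits
assert len(blob) == 4 + 2 * 6
```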
Once all of the metadata for the image frame has been generated, the encoded frame and its associated metadata can be compressed by a standard compression scheme in preparation for transmission or recording. Note that the type of standard compression that is best suited to the frame may differ from the type of standard compression that is best suited to the associated metadata. Accordingly, the frame and its associated metadata may undergo different types of standard compression in preparation for transmission, without departing from the scope of the present invention. In a specific, non-limiting example, the stream of image frames may be compressed into a standard MPEG2 bit stream, while the stream of associated metadata may be compressed into a standard MPEG4 bit stream.
Once the encoded frame and its associated metadata have been compressed, they can be transmitted via an appropriate transmission medium to a receiving end. Alternatively, the compressed frame and its associated compressed metadata can be recorded on a conventional medium, such as a DVD. The metadata generated for the frames of an image stream thus accompanies the image stream, whether the latter is sent over a transmission medium or recorded on a conventional medium, such as a DVD. In the case of transmission, a compressed metadata stream may be transmitted in a parallel channel of the transmission medium. In the case of recording, upon recording of the compressed image stream on a disk such as a DVD, the compressed metadata stream may be recorded in a supplementary track provided on the disk for storing proprietary data (e.g. a user-data track). Alternatively, whether destined for transmission or recording, the compressed metadata may be embedded in each frame of the compressed image stream (e.g. in the header). Yet another alternative is to take advantage of the color space format conversion process that each frame must typically undergo prior to compression, in order to embed the metadata into the image stream. In a specific example, assuming that each frame of a stereoscopic image stream is converted from an RGB format to a YCbCr 4:2:2 color space prior to compression and transmission/recording of the image stream, the image stream may be formatted as a 4:4:4 stream, with the associated metadata stored in the additional storage space (i.e. extra bandwidth) made available by switching from the 4:2:2 format to the 4:4:4 format (while the main video data is maintained as YCbCr 4:2:2). Obviously, whether destined for transmission or recording, the frames of an image stream and the associated metadata may be coupled or linked together (or simply interrelated) by any one of various different solutions, without departing from the scope of the present invention.
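The last-mentioned option might be sketched as follows (Python with NumPy; the placement of the real chroma in the even columns and of the metadata in the odd columns is an arbitrary illustrative layout, not a standardized mapping):

```python
import numpy as np

def embed_in_444(y, cb422, cr422, metadata_bytes):
    """Carry a YCbCr 4:2:2 image inside a 4:4:4 container, hiding the
    metadata in the chroma samples that 4:2:2 leaves unused."""
    h, w = y.shape                       # cb422/cr422 have shape (h, w // 2)
    spare = h * (w // 2)                 # unused samples per chroma plane
    payload = np.zeros(2 * spare, dtype=np.uint8)
    data = np.frombuffer(metadata_bytes, dtype=np.uint8)
    assert data.size <= payload.size     # metadata must fit the spare space
    payload[:data.size] = data
    cb444 = np.zeros((h, w), dtype=np.uint8)
    cr444 = np.zeros((h, w), dtype=np.uint8)
    cb444[:, 0::2], cr444[:, 0::2] = cb422, cr422        # real 4:2:2 chroma
    cb444[:, 1::2] = payload[:spare].reshape(h, w // 2)  # metadata bytes
    cr444[:, 1::2] = payload[spare:].reshape(h, w // 2)
    return y, cb444, cr444
```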
When the frames of a compressed image stream along with the accompanying compressed metadata are either received over a transmission medium at a receiving end or read from a conventional medium by a player (e.g. DVD drive), the compressed frames and associated metadata are processed in order to reconstruct the original frames for display. This processing includes the application of standard decompression operations, where a different decompression operation may be applied to the compressed frames than to the associated compressed metadata. After this standard decompression, the frames may require further decoding in order to reconstruct the original frames of the image stream. Assuming that the frames were encoded at the transmitting end, upon decoding of a particular frame of the image stream, the associated metadata, if any, is used to reconstruct the particular frame. In a specific, non-limiting example, the metadata associated with a particular frame (or with specific pixels of the particular frame) is used to determine the approximate or actual values of at least some of the missing pixels of the particular frame, by consulting at least one metadata mapping table (such as the tables shown in
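Decoding with the one-bit mapping table from the earlier encoding sketch might look as follows (Python with NumPy; the table is the same hypothetical one, repeated here so the sketch is self-contained):

```python
import numpy as np

# Same hypothetical one-bit mapping table as in the encoding sketch above.
COMBINATIONS = (
    lambda adj: (adj[0].astype(float) + adj[1]) / 2,  # bit 0
    lambda adj: (adj[2].astype(float) + adj[3]) / 2,  # bit 1
)

def reconstruct_pixel(bits: str, adjacent: np.ndarray) -> np.ndarray:
    """Each metadata bit selects, per component, which predefined
    combination of adjacent pixel values approximates the missing pixel."""
    candidates = [combo(adjacent) for combo in COMBINATIONS]
    return np.array([candidates[int(b)][c] for c, b in enumerate(bits)],
                    dtype=np.uint8)

adjacent = np.array([[98, 90, 52], [104, 95, 46],
                     [60, 148, 200], [70, 152, 180]], dtype=np.uint8)
print(reconstruct_pixel("010", adjacent))  # -> [101 150  49]
```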
As discussed above, in a specific, non-limiting example, the metadata technique of the present invention may be applied to a stereoscopic image stream, where each frame of the stream consists of a merged image including pixels from a left image sequence and pixels from a right image sequence. In one particular example, compression encoding of the stereoscopic image stream involves pixel decimation and results in encoded frames, each of which includes a mosaic of pixels formed of pixels from both image sequences. Upon decoding, a determination of the value of each missing pixel is required in order to reconstruct the original stereoscopic image stream from these left and right image sequences. Accordingly, the metadata that is generated and accompanies the encoded stereoscopic frames is used at the receiving end to fill in at least some of the missing pixels when decoding the left and right image sequences from each frame.
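A sketch of the first demultiplexing step (Python with NumPy; the assumption that left-image pixels occupy one checkerboard parity and right-image pixels the other is an illustrative convention):

```python
import numpy as np

def split_mosaic(frame: np.ndarray):
    """Split a merged checkerboard mosaic into half-filled left and right
    images; the holes flagged by the masks are the decimated pixels to be
    filled in by interpolation and/or the accompanying metadata."""
    h, w, _ = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    left_mask = (ys + xs) % 2 == 0       # left pixels on one parity
    left = np.zeros_like(frame)
    right = np.zeros_like(frame)
    left[left_mask] = frame[left_mask]
    right[~left_mask] = frame[~left_mask]
    return left, right, left_mask
```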
Continuing with the example of a stereoscopic image stream,
In terms of implementation, the functionality necessary for the metadata-based encoding and decoding techniques described above can easily be built into one or more processing units of existing transmission systems, or more specifically of existing encoding and decoding systems. Taking for example the system for generating and transmitting a stereoscopic image stream of
Advantageously, the metadata technique of the present invention allows for backward compatibility with existing video equipment.
Although various embodiments have been illustrated, this was for the purpose of describing, but not limiting, the present invention. Various possible modifications and different configurations will become apparent to those skilled in the art and are within the scope of the present invention, which is defined more particularly by the attached claims.