The present invention relates to encoding and decoding methods and apparatuses thereof; and, more particularly, to encoding and decoding methods for a single-view video or a multi-view video and apparatuses thereof.
This work was supported by the IT R&D program of MIC/IITA [2007-S-004-01, “Development of Glassless Single-User 3D Broadcasting Technologies”].
Single-view video coding is a method for encoding an image captured from one camera, and multi-view video coding (MVC) is a method for encoding images captured at the same time from a plurality of cameras disposed at different locations. The multi-view video encoding enables a user to interact with a system in order to enable the user to watch an image from a desired viewpoint. Therefore, the multi-view video encoding can support a next generation 3-D TV, a free viewpoint video, and a 3-D security system.
Effective compression has been receiving attention in both single-view video coding and multi-view video coding.
Particularly, a multi-view video image includes a much larger amount of data to process than a typical single-view video image, because the amount of data grows with the number of cameras capturing images and with the image size. Therefore, it is very important to effectively compress such a large amount of image data.
For example, terrestrial digital multimedia broadcasting (T-DMB) must provide an AV service at a limited bit rate such as 1.5 Mbps within a predetermined bandwidth. In T-DMB, each broadcasting station encodes video data at a bit rate of about 384 kbps for one AV program. Since each broadcasting station uses an optimized commercial encoder, an encoding method that provides a high compression rate at a low bit rate such as 5 to 600 kbps may be more suitable for a stereoscopic DMB video coding technology than a non-optimized reference SW-based encoder.
An embodiment of the present invention is directed to providing an encoding and decoding method for compressing data more effectively at a low bit rate.
Other objects and advantages of the present invention can be understood by the following description, and become apparent with reference to the embodiments of the present invention. Also, it is obvious to those skilled in the art of the present invention that the objects and advantages of the present invention can be realized by the means as claimed and combinations thereof.
In accordance with an aspect of the present invention, there is provided a single-view video encoding method including performing motion estimation based on a base image and a reference image, generating residual data using blocks of the base image and the motion estimated blocks, down-sampling the residual data, and transforming the down-sampled residual data through Discrete Cosine Transformation (DCT) and quantizing the transformed residual data.
In accordance with another aspect of the present invention, there is provided a single-view video encoder including a motion estimator for performing motion estimation based on a base image and a reference image, a residual data generator for generating residual data using blocks of the base image and the motion estimated blocks, a down-sampling unit for down-sampling the residual data, and a quantizing unit for transforming the down-sampled residual data through Discrete Cosine Transformation (DCT) and quantizing the transformed residual data.
In accordance with another aspect of the present invention, there is provided a single-view video decoding method including receiving a bit stream including base image information having residual data, up-sampling the residual data, and performing motion compensation based on a reference image and the up-sampled residual data and generating a base image.
In accordance with another aspect of the present invention, there is provided a single-view video decoder including a receiver for receiving a bit stream including base image information having residual data, an up-sampling unit for up-sampling the residual data, and a base image generator for performing motion compensation based on a reference image and the up-sampled residual data and generating a base image.
In accordance with another aspect of the present invention, there is provided a multi-view encoding method, including performing motion and disparity estimation based on a base image, a supplementary image, and a reference image, generating residual data using the reference image and the motion and disparity estimated data, down sampling the residual data, and transforming and quantizing the down sampled residual data using a discrete cosine transformation (DCT) method.
In accordance with another aspect of the present invention, there is provided a multi-view video encoder, including a motion and disparity estimator for performing motion and disparity estimation based on a base image, a supplementary image, and a reference image, a residual data generator for generating residual data using the base image and the motion and disparity estimated data, a down-sampling unit for down-sampling the residual data, and a quantizing unit for transforming the down-sampled residual data through Discrete Cosine Transformation (DCT) and quantizing the transformed residual data.
In accordance with another aspect of the present invention, there is provided a multi-view video decoding method including receiving a bit stream having base image information and supplementary image information, up-sampling the base image information, and performing motion and disparity compensation based on a reference image, the up-sampled base image information, and the supplementary image information, and generating a base image, wherein the base image information includes residual data.
In accordance with another aspect of the present invention, there is provided a multi-view video decoder, including a receiver for receiving a bit stream having base image information and supplementary image information, an up-sampling unit for up-sampling the base image information, and a base image generator for performing motion and disparity compensation based on a reference image, the up-sampled base image information, and the supplementary image information, and generating a base image, wherein the base image information includes residual data.
The advantages, features and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter. When it is considered that detailed description on a related art may obscure a point of the present invention, the description will not be provided herein. Hereafter, specific embodiments of the present invention will be described in detail with reference to the accompanying drawings.
An encoding and decoding method according to the present invention can compress and restore video more effectively at a low bit rate.
Hereafter, the present invention will be described by referring to the drawings.
The down-sampling of the residual data is performed according to a movement direction of an image. For example, if objects in an image make relatively little horizontal movement, the down-sampling is performed in a horizontal direction. In this manner, the deterioration of image quality can be further minimized. The down-sampling can be performed in any one of a horizontal direction, a vertical direction, and both directions (quarter down-sampling) according to an image. The down-sampling will be described in more detail later.
The single-view video encoder according to the present embodiment may further include an up-sampling unit and a reference image generator for compensating motions using up-sampled residual data and generating a reference image. The up-sampling unit may include an inverse quantizer 115 for inverse-quantizing the quantized residual data, an inverse discrete cosine transformation (IDCT) unit 117 for transforming the inverse-quantized residual data using the IDCT scheme, and an up-sampler 119 for up-sampling the transformed residual data. The motions of the reference image are compensated using the up-sampled residual data, and the motion compensated reference image may be used as a reference image for a next base image. The reference image may be stored in a memory 103. The up-sampling is used for restoring the down-sampled residual data and uses the same sampling method as the down-sampling. Therefore, if the down-sampling is performed in the horizontal direction, the up-sampling is also performed in the horizontal direction.
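The reference image update described above can be illustrated with a minimal Python sketch. It assumes that up-sampling simply repeats samples, which the text does not specify; the function names `up_sample` and `reconstruct` and the mode strings are hypothetical and serve only as an illustration.

```python
def up_sample(block, mode):
    """Restore the size of a down-sampled residual block.

    Assumption: samples are simply repeated; the text only requires
    that up-sampling uses the same direction as down-sampling.
    """
    if mode in ("horizontal", "quarter"):
        # Double the width by repeating every sample in each row.
        block = [[v for v in row for _ in range(2)] for row in block]
    if mode in ("vertical", "quarter"):
        # Double the height by repeating every row.
        block = [list(row) for row in block for _ in range(2)]
    return block


def reconstruct(predicted, small_residual, mode):
    """Add the up-sampled residual to the motion-compensated prediction."""
    residual = up_sample(small_residual, mode)
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)]


# Motion-compensated prediction (2 rows x 4 columns) and a residual
# that was down-sampled in the horizontal direction (2 rows x 2 columns).
predicted = [[10, 10, 10, 10],
             [20, 20, 20, 20]]
small_residual = [[1, -1],
                  [2, 0]]

reference = reconstruct(predicted, small_residual, "horizontal")
print(reference)   # [[11, 11, 9, 9], [22, 22, 20, 20]]
```

The reconstructed block plays the role of the reference image for the next base image; a decoder would have to run the same reconstruction so that both sides predict from identical data.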
The down-sampling of the residual data is performed according to a movement direction of an image. For example, if objects in an image make relatively little horizontal movement, the down-sampling is performed in a horizontal direction. In this manner, the deterioration of image quality can be further minimized. The down-sampling can be performed as any one of a horizontal down-sampling mode, a vertical down-sampling mode, and a quarter down-sampling mode according to an image. The down-sampling will be described in more detail later.
The multi-view video encoder according to the present embodiment further includes an up-sampling unit 409 and a reference image generator for compensating motions and disparities using the up-sampled residual data and generating a reference image. The up-sampling unit 409 includes an inverse quantization unit IQ for inverse-quantizing the quantized residual data, an Inverse Discrete Cosine Transformation unit IDCT for transforming the inverse-quantized residual data using the IDCT scheme, and an up-sampler for up-sampling the transformed residual data. The motion of the reference image is compensated using the up-sampled residual data. The motion compensated reference image may be used as a reference image for a next base image to encode. The supplementary images and the reference image may be stored in a memory 411. Since the up-sampling is performed for restoring the down-sampled residual data, the same sampling method is used. Therefore, if the down-sampling was performed in a horizontal direction, the up-sampling is also performed in the horizontal direction.
(1) Horizontal Down-sampling Mode: ½ down sampling in a horizontal direction
(2) Vertical Down-sampling Mode: ½ down sampling in a vertical direction
(3) Quarter Down-sampling Mode: ½ down sampling in both of a horizontal direction and a vertical direction
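The three modes listed above can be illustrated with a minimal Python sketch. It assumes that ½ down-sampling keeps every second sample (plain decimation without a low-pass filter), since the filter is not specified here; the function name `down_sample` and the mode strings are hypothetical.

```python
def down_sample(block, mode):
    """Down-sample a 2-D residual block (a list of rows) by 1/2.

    Assumption: plain decimation that keeps every second sample;
    the text does not specify which filter is used.
    """
    if mode == "horizontal":      # halve the width
        return [row[::2] for row in block]
    if mode == "vertical":        # halve the height
        return [list(row) for row in block[::2]]
    if mode == "quarter":         # halve both the width and the height
        return [row[::2] for row in block[::2]]
    raise ValueError("unknown down-sampling mode: " + mode)


# A 16x16 residual macro block whose samples encode their own position.
macroblock = [[16 * y + x for x in range(16)] for y in range(16)]

for mode in ("horizontal", "vertical", "quarter"):
    small = down_sample(macroblock, mode)
    print(mode, len(small), "rows x", len(small[0]), "columns")
# horizontal 16 rows x 8 columns
# vertical 8 rows x 16 columns
# quarter 8 rows x 8 columns
```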
Objects move in a horizontal direction or a vertical direction according to images or contents. Therefore, any one of the horizontal, vertical, and quarter down-sampling modes is applied according to a movement direction of an object included in images or contents. For example, the horizontal down-sampling mode is applied to an image or content including an object that makes relatively little motion in a horizontal direction. In this manner, the amount of bits to encode can be reduced, and the deterioration of image quality can be minimized.
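One possible way to choose a mode from the movement direction is sketched below in Python. The decision rule, the threshold value, the use of the quarter mode when movement is small along both axes, and the function name `select_mode` are all assumptions made for illustration; the text only states that the mode follows the movement direction of objects.

```python
def select_mode(motion_vectors, threshold=1.0):
    """Pick a residual down-sampling mode from motion vectors.

    motion_vectors: list of (dx, dy) pairs, for example one per macro block.
    threshold: hypothetical limit (in pixels) below which the average
               movement along an axis is treated as "little".
    Returns "horizontal", "vertical", "quarter", or None (no down-sampling).
    """
    if not motion_vectors:
        return None
    count = len(motion_vectors)
    mean_dx = sum(abs(dx) for dx, _ in motion_vectors) / count
    mean_dy = sum(abs(dy) for _, dy in motion_vectors) / count

    little_horizontal = mean_dx < threshold
    little_vertical = mean_dy < threshold
    if little_horizontal and little_vertical:
        return "quarter"          # assumed choice for nearly static content
    if little_horizontal:
        return "horizontal"
    if little_vertical:
        return "vertical"
    return None


# Content that moves mostly up and down has little horizontal movement.
print(select_mode([(0, 4), (1, 5), (0, 3)]))   # horizontal
```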
In the case of a stereoscopic DMB video displayed on a monitor that shows an image at a 320×240 resolution with a 3D display scheme by interlacing images in a horizontal direction, the deterioration of image quality in the horizontal direction can be prevented by performing the horizontal down-sampling mode because the monitor displays data with a horizontal resolution reduced by ½. In the case of a monitor displaying images by interlacing the images in a vertical direction, the deterioration of image quality in the vertical direction can be prevented by performing the vertical down-sampling mode because the monitor displays data with a vertical resolution reduced by ½.
The down-sampling mode according to the present embodiment can be applied to four inter estimation modes, 16×16, 8×16, 16×8, and 8×8, among the inter estimation modes of the joint multi-view video model (JMVM), as an estimation mode with down-sampling applied to residual data. Without down-sampling, each macro block is divided into 16 blocks of 4×4 pixels for luminance components, so the 4×4 DCT and quantization are performed 16 times. In the present embodiment, the 4×4 DCT and quantization are performed 8 times or 4 times because the residual data is down-sampled, as shown in
Meanwhile, a de-blocking filter may employ the de-blocking algorithm used in AVC. However, an indexing part may be modified so as not to refer to blocks that are not encoded because the residual data is down-sampled.
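The number of 4×4 blocks that are transformed for a 16×16 luminance macro block, and the number that are no longer encoded, can be checked with a short calculation. The Python sketch below is only an illustration; the function name `transform_block_count` and the mode strings are hypothetical.

```python
BLOCK = 4          # 4x4 transform block size
MACROBLOCK = 16    # 16x16 luminance macro block


def transform_block_count(mode):
    """Number of 4x4 DCT/quantization operations per luminance macro block."""
    width = MACROBLOCK // 2 if mode in ("horizontal", "quarter") else MACROBLOCK
    height = MACROBLOCK // 2 if mode in ("vertical", "quarter") else MACROBLOCK
    return (width // BLOCK) * (height // BLOCK)


full = transform_block_count("none")
for mode in ("none", "horizontal", "vertical", "quarter"):
    encoded = transform_block_count(mode)
    print(mode, "encoded:", encoded, "not encoded:", full - encoded)
# none encoded: 16 not encoded: 0
# horizontal encoded: 8 not encoded: 8
# vertical encoded: 8 not encoded: 8
# quarter encoded: 4 not encoded: 12
```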
Syntax for embodying a method for down-sampling residual data according to the present embodiment may add information (residual_dowmsampling_mode) on a down sampling mode for residual data of the present invention to sequence_paprameter_mvc_extension( )
Here, the information residual_dowmsampling_mode may include information of Table 1 with H.7.4.1 “sequence parameter set MVC extension semantics”.
The above-described method according to the present invention can be embodied as a program and stored on a computer readable recording medium. The computer readable recording medium is any data storage device that can store data which can thereafter be read by a computer system. The computer readable recording medium includes a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a floppy disk, a hard disk and a magneto-optical disk.
While the present invention has been described with respect to the specific embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.
The following description exemplifies only the principles of the present invention. Even if they are not described or illustrated clearly in the present specification, one of ordinary skill in the art can embody the principles of the present invention and invent various apparatuses within the concept and scope of the present invention. The conditional terms and embodiments presented in the present specification are intended only to make the concept of the present invention understood, and the invention is not limited to the embodiments and conditions mentioned in the specification.
Also, all the detailed description on the principles, viewpoints and embodiments and particular embodiments of the present invention should be understood to include structural and functional equivalents to them. The equivalents include not only currently known equivalents but also those to be developed in future, that is, all devices invented to perform the same function, regardless of their structures.
For example, block diagrams of the present invention should be understood to show a conceptual viewpoint of an exemplary circuit that embodies the principles of the present invention. Similarly, all the flowcharts, state conversion diagrams, pseudo codes and the like can be expressed substantially in a computer-readable media, and whether or not a computer or a processor is described distinctively, they should be understood to express various processes operated by a computer or a processor.
Functions of various devices illustrated in the drawings including a functional block expressed as a processor or a similar concept can be provided not only by using hardware dedicated to the functions, but also by using hardware capable of running proper software for the functions. When a function is provided by a processor, the function may be provided by a single dedicated processor, single shared processor, or a plurality of individual processors, part of which can be shared.
The apparent use of a term, ‘processor’, ‘control’ or similar concept, should not be understood to exclusively refer to a piece of hardware capable of running software, but should be understood to include a digital signal processor (DSP), hardware, and ROM, RAM and non-volatile memory for storing software, implicatively. Other known and commonly used hardware may be included therein, too.
In the claims of the present specification, an element expressed as a means for performing a function described in the detailed description is intended to include all methods for performing the function including all formats of software, such as combinations of circuits for performing the intended function, firmware/microcode and the like.
To perform the intended function, the element is cooperated with a proper circuit for performing the software. The present invention defined by claims includes diverse means for performing particular functions, and the means are connected with each other in a method requested in the claims. Therefore, any means that can provide the function should be understood to be an equivalent to what is figured out from the present specification.
Other objects and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter. The same reference numeral is given to the same element, although the element appears in different drawings. In addition, if further detailed description on the related prior arts is determined to obscure the point of the present invention, the description is omitted. Hereafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.
The present invention reduces the number of blocks to encode by down-sampling residual data. Therefore, the deterioration of image quality can be minimized and video data can be compressed more effectively at a low bit rate. The present invention can be applied not only to a single-view video but also to a multi-view video.
Single-view video coding is a method for encoding an image captured from one viewpoint, and multi-view video coding is a method for encoding images captured at the same time from two or more viewpoints, which are disposed at different spatial locations. Although the single-view video encoding and the multi-view video encoding use a similar encoding method, the multi-view video encoding uses a disparity vector (DV) together with a motion vector (MV), unlike the single-view video encoding, which uses a motion vector only. The motion vector denotes motion information of an object in an image captured from one camera, and the disparity vector denotes a location difference of an object among images captured from different cameras. Hereinafter, the single-view video encoding method and the multi-view video encoding method will be described in detail.
In case of the single-view video, motion estimation is performed for a base image using a reference image. The reference image is an image compared with the base image. For example, the reference image may be an image previously encoded. Residual data is generated using the motion-estimated blocks of the reference image and blocks of the base image, and the number of blocks to encode is reduced by down-sampling the generated residual data. The down-sampled residual data is encoded by transforming and quantizing the down-sampled residual data through discrete cosine transformation (DCT).
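The motion estimation and residual generation steps above can be sketched in Python as follows. Full-search block matching with a sum of absolute differences (SAD) cost is an assumption chosen for simplicity, since the search method is not specified here; the function names and the small test images are hypothetical.

```python
def get_block(image, top, left, size):
    """Cut a size x size block out of an image stored as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def estimate_motion(base, reference, top, left, size, search):
    """Full-search block matching; returns the (dy, dx) with the lowest SAD."""
    target = get_block(base, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would leave the reference image
            cost = sad(target, get_block(reference, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


def residual_block(base, reference, top, left, size, mv):
    """Base block minus the motion-estimated block of the reference image."""
    target = get_block(base, top, left, size)
    predicted = get_block(reference, top + mv[0], left + mv[1], size)
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, predicted)]


# Reference image, and a base image in which the content moved one pixel
# to the right and one sample became brighter by 5.
reference = [[10 * y + x for x in range(8)] for y in range(8)]
base = [[row[0]] + row[:-1] for row in reference]
base[3][3] += 5

mv = estimate_motion(base, reference, top=2, left=2, size=4, search=2)
residual = residual_block(base, reference, 2, 2, 4, mv)
print(mv)        # (0, -1)
print(residual)  # [[0, 0, 0, 0], [0, 5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Only the residual block, not the base block itself, would then be down-sampled, transformed, and quantized.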
Meanwhile, the quantized residual data is inverse quantized, transformed through inverse discrete cosine transformation (IDCT), and up-sampled. Using the up-sampled residual data, motion compensation is performed, and a motion-compensated image is generated. The motion-compensated image may be used as a reference image for a next image to encode. Here, the down-sampling may be performed according to a movement rate of an image. The movement rate includes a movement direction of an object included in an image. The down-sampling may use three methods: a horizontal down-sampling mode for down-sampling data in a horizontal direction, a vertical down-sampling mode for down-sampling data in a vertical direction, and a quarter down-sampling mode for down-sampling data in both a horizontal direction and a vertical direction. For example, in the case of contents having relatively little horizontal movement, the horizontal down-sampling is performed to reduce the amount of bits while minimizing the deterioration of image quality. Here, the down-sampling and the up-sampling use the same sampling method. For example, if the down-sampling is performed in the horizontal down-sampling mode, the up-sampling is performed in the horizontal up-sampling mode.
Decoding of the coded single-view video performs the encoding steps of the single-view video in a reverse order. That is, a base image can be restored by decoding the down-sampled and encoded data, up-sampling the decoded data, and performing motion compensation. Here, the down-sampling and the up-sampling use the same sampling method. For example, if the down-sampling is performed in the horizontal down-sampling mode, the up-sampling is performed in the horizontal up-sampling mode.
In case of the multi-view video, motion and disparity estimation is performed for a base image using a supplementary image and a reference image. For example, motion estimation is performed using the base image and a reference image, and disparity estimation is performed using the base image and the supplementary image. Here, the base image and the supplementary image are images of different viewpoints. For example, in case of two viewpoints captured from a left side and a right side, the base image and the supplementary image may be a left image and a right image or vice versa. The reference image is an image compared with the base image. For example, the reference image may be an image encoded at a previous stage. Residual data is generated using estimated blocks of the supplementary image and the reference image and blocks of the base image, and the number of blocks to encode is reduced by down sampling the generated residual data. The down-sampled residual data is encoded by transforming and quantizing the down-sampled residual data through discrete cosine transformation (DCT).
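The combined motion and disparity estimation above can be sketched in Python as follows. The sketch searches the reference image for a motion vector and the supplementary image for a disparity vector, then predicts from whichever gives the lower sum of absolute differences (SAD). This selection rule, the full-search matching, and all function names are assumptions made for illustration only.

```python
def block(image, top, left, size):
    """Cut a size x size block out of an image stored as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_match(target, image, top, left, search):
    """Full search around (top, left); returns (cost, (dy, dx))."""
    size = len(target)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > len(image) or x + size > len(image[0]):
                continue  # candidate block would leave the image
            cost = sad(target, block(image, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best


def estimate(base, reference, supplementary, top, left, size, search):
    """Return (kind, vector, residual) for one block of the base image."""
    target = block(base, top, left, size)
    motion_cost, mv = best_match(target, reference, top, left, search)
    disparity_cost, dv = best_match(target, supplementary, top, left, search)
    if disparity_cost < motion_cost:      # assumed selection rule
        kind, vector, source = "disparity", dv, supplementary
    else:
        kind, vector, source = "motion", mv, reference
    predicted = block(source, top + vector[0], left + vector[1], size)
    residual = [[t - p for t, p in zip(t_row, p_row)]
                for t_row, p_row in zip(target, predicted)]
    return kind, vector, residual


# Supplementary view, a base view shifted two pixels to the right, and a
# reference image (earlier base-view frame) with a brightness offset.
supplementary = [[10 * y + x for x in range(8)] for y in range(8)]
base = [[row[0], row[0]] + row[:-2] for row in supplementary]
reference = [[v + 3 for v in row] for row in base]

kind, vector, residual = estimate(base, reference, supplementary, 2, 2, 4, 2)
print(kind, vector)   # disparity (0, -2)
```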
Meanwhile, the quantized residual data is inverse quantized, transformed through the inverse discrete cosine transformation (IDCT), and up-sampled. The motion and disparity compensation is performed using the up-sampled residual data. The motion compensated image may be used as a reference image for a next image to encode. Here, the down-sampling may be performed according to the movement rate of an image. The movement rate includes a movement direction of an object included in an image. The down-sampling may use three methods: a horizontal down-sampling mode for down-sampling data in a horizontal direction, a vertical down-sampling mode for down-sampling data in a vertical direction, and a quarter down-sampling mode for down-sampling data in both a horizontal direction and a vertical direction. For example, in the case of contents having relatively little horizontal movement, the horizontal down-sampling is performed to reduce the amount of bits while minimizing the deterioration of image quality. Here, the down-sampling and the up-sampling use the same sampling method. For example, if the down-sampling is performed in the horizontal down-sampling mode, the up-sampling is performed in the horizontal up-sampling mode.
Decoding of the coded multi-view video performs the encoding steps of the multi-view video in a reverse order. That is, a base image may be restored by decoding the down-sampled and encoded data, up-sampling the decoded data, and performing motion and disparity compensation. Here, the down-sampling and the up-sampling use the same sampling method. For example, if the down-sampling is performed in the horizontal down-sampling mode, the up-sampling is performed in the horizontal up-sampling mode.
Hereinafter, the single-view video coding and the multi-view video coding will be described in detail with embodiments.
<Single-View Video Coding>
A single-view video encoding method according to an embodiment of the present invention includes performing motion estimation based on a base image and a reference image, generating residual data using blocks of the base image and the motion estimated blocks, down-sampling the residual data, and transforming the down-sampled residual data through Discrete Cosine Transformation (DCT) and quantizing the transformed residual data.
The single-view video encoding method may further include inverse-quantizing the quantized residual data and transforming the inverse-quantized residual data through Inverse Discrete Cosine Transformation (IDCT), and up-sampling the transformed residual data, and performing motion compensation using the up-sampled residual data and generating a reference image. The motion estimation may be performed in a macro block size of the base image. The residual data may be down-sampled along a movement direction of an image. For example, the residual data may be down-sampled using any one of a horizontal down-sampling mode, a vertical down-sampling mode, and a quarter down-sampling mode.
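The transform and quantization of one 4×4 block of down-sampled residual data, together with the inverse path, can be sketched in Python as follows. The sketch assumes a floating-point orthonormal DCT and a uniform quantizer with a hypothetical step size; practical codecs typically use an integer approximation of the transform, and none of the function names come from the specification.

```python
import math

N = 4  # 4x4 transform block

# Orthonormal DCT-II basis matrix: C[k][n] = a(k) * cos((2n + 1) * k * pi / (2N)).
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct(block):
    """Forward 2-D DCT: C * X * C^T."""
    return matmul(matmul(C, block), transpose(C))


def idct(coeffs):
    """Inverse 2-D DCT: C^T * Y * C."""
    return matmul(matmul(transpose(C), coeffs), C)


def quantize(coeffs, step):
    """Uniform quantizer (assumed); returns integer levels."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    return [[level * step for level in row] for row in levels]


# A flat 4x4 residual block has only a DC coefficient.
residual = [[8] * N for _ in range(N)]
levels = quantize(dct(residual), step=4)
restored = idct(dequantize(levels, step=4))

print(levels[0])                            # [8, 0, 0, 0]
print([round(v, 6) for v in restored[0]])   # [8.0, 8.0, 8.0, 8.0]
```

In the encoder, the restored block would then be up-sampled and used for motion compensation when generating the reference image.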
A single-view video encoder according to an embodiment of the present invention includes a motion estimator for performing motion estimation based on a base image and a reference image, a residual data generator for generating residual data using blocks of the base image and the motion estimated blocks, a down-sampling unit for down-sampling the residual data, and a quantizing unit for transforming the down-sampled residual data through Discrete Cosine Transformation (DCT) and quantizing the transformed residual data.
The single-view video encoder may further include an up-sampling unit for inverse-quantizing the quantized residual data and transforming the inverse-quantized residual data through Inverse Discrete Cosine Transformation (IDCT), and up-sampling the transformed residual data, and a reference image generator for performing motion compensation using the up-sampled residual data and generating a reference image. The motion estimation may be performed in a macro block size of the base image. The residual data may be down-sampled along a movement direction of an image. For example, the residual data may be down-sampled using any one of a horizontal down-sampling mode, a vertical down-sampling mode, and a quarter down-sampling mode.
<Single-View Video Decoding>
A single-view video decoding method according to an embodiment of the present invention includes receiving a bit stream including base image information having residual data, up-sampling the residual data, and performing motion compensation based on a reference image and the up-sampled residual data and generating a base image. The up-sampling the residual data may include decoding the residual data, inverse-quantizing the decoded residual data, and transforming the inverse-quantized residual data through Inverse Discrete Cosine Transformation (IDCT). The residual data may be up-sampled along a movement direction of an image. For example, the residual data may be up-sampled using any one of a horizontal up-sampling mode, a vertical up-sampling mode, and a quarter up-sampling mode.
A single-view video decoder according to an embodiment of the present invention includes a receiver for receiving a bit stream including base image information having residual data, an up-sampling unit for up-sampling the residual data, and a base image generator for performing motion compensation based on a reference image and the up-sampled residual data and generating a base image. The up-sampling unit may include a decoder for decoding the residual data, an inverse-quantizing unit for inverse-quantizing the decoded residual data, and an inverse discrete cosine transform (IDCT) unit for transforming the inverse-quantized residual data through IDCT. The residual data may be up-sampled along a movement direction of an image. For example, the residual data may be up-sampled using any one of a horizontal up-sampling mode, a vertical up-sampling mode, and a quarter up-sampling mode.
<Multi-View Video Encoding>
A multi-view encoding method according to an embodiment of the present invention includes performing motion and disparity estimation based on a base image, a supplementary image, and a reference image, generating residual data using the reference image and the motion and disparity estimated data, down sampling the residual data, and transforming and quantizing the down sampled residual data using a discrete cosine transformation (DCT) method.
The multi-view encoding method may further include inverse-quantizing the quantized residual data, transforming the inverse-quantized residual data through inverse discrete cosine transformation (IDCT), and up-sampling the transformed residual data, and performing motion and disparity compensation using the up-sampled residual data and generating a reference image. The motion and disparity estimation may be performed in a macro block size of the base image. The residual data may be down-sampled in a movement direction of an image. For example, the residual data is down-sampled using any one of a horizontal down-sampling mode, a vertical down-sampling mode, and a quarter down-sampling mode.
A multi-view video encoder according to an embodiment of the present invention includes a motion and disparity estimator for performing motion and disparity estimation based on a base image, a supplementary image, and a reference image, a residual data generator for generating residual data using the base image and the motion and disparity estimated data, a down-sampling unit for down-sampling the residual data, and a quantizing unit for transforming the down-sampled residual data through Discrete Cosine Transformation (DCT) and quantizing the transformed residual data.
The multi-view video encoder may further include an up-sampling unit for inverse-quantizing the quantized residual data, transforming the inverse-quantized residual data through Inverse Discrete Cosine Transformation (IDCT), and up-sampling the transformed residual data, and a reference image generator for performing motion and disparity compensation using the up-sampled residual data and generating a reference image. The motion and disparity estimation may be performed in a macro block size of the base image. The residual data may be down-sampled along a movement direction of an image. For example, the residual data is down-sampled using any one of a horizontal down sampling mode, a vertical down sampling mode, and a quarter down sampling mode.
<Multi-View Video Decoding>
A multi-view video decoding method according to an embodiment of the present invention includes receiving a bit stream having base image information and supplementary image information, up-sampling the base image information, and performing motion and disparity compensation based on a reference image, the up-sampled base image information, and the supplementary image information, and generating a base image. The base image information includes residual data. The up-sampling the base image information may include decoding the base image information, inverse-quantizing the decoded base image information, and transforming the inverse-quantized base image information through inverse discrete cosine transform (IDCT). The residual data may be up-sampled along a movement direction of an image. For example, the residual data is up-sampled using any one of a horizontal up-sampling mode, a vertical up-sampling mode, and a quarter up-sampling mode.
A multi-view video decoder according to an embodiment of the present invention includes a receiver for receiving a bit stream having base image information and supplementary image information, an up-sampling unit for up-sampling the base image information, and a base image generator for performing motion and disparity compensation based on a reference image, the up-sampled base image information, and the supplementary image information, and generating a base image. The base image information may include residual data. The up-sampling unit may include a decoder for decoding the base image information, an inverse quantizer for inverse-quantizing the decoded base image information, and an inverse discrete cosine transform (IDCT) unit for transforming the inverse-quantized base image information through IDCT. The up-sampling unit up-samples the residual data along a movement direction of an image. For example, the up-sampling unit up-samples the residual data using any one of a horizontal up-sampling mode, a vertical up-sampling mode, and a quarter up-sampling mode.
The present invention is applied to single-view video encoding and decoding and multi-view video encoding and decoding for compressing data more effectively at a low bit rate.
Number | Date | Country | Kind |
---|---|---|---|
10-2007-0100610 | Oct 2007 | KR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/KR2008/005739 | 9/29/2008 | WO | 00 | 4/2/2010 |