This application claims priority from Korean Patent Application No. 10-2005-0061190, filed on Jul. 7, 2005, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
1. Field of the Invention
Apparatuses and methods consistent with the present invention relate to video encoding and decoding, and more particularly, to video encoding and decoding effective for error correction and error concealment in an error-prone environment.
2. Description of the Related Art
The development of wireless Internet has rapidly increased the availability of bandwidth in a wireless environment. Accordingly, the amount of video image data that is provided in the wireless environment has consistently increased to meet the needs of users. Such an increase in the amount of video image data enhances visual quality and quality of service (QoS). However, the increased amount of image data is accompanied by various problems, such as reproduction errors or stoppage due to transmission failures and errors that may occur during wireless transmission.
In the case of MPEG encoding, when data loss occurs in the Internet, a terrestrial broadcast channel, or a wireless communications network due to a transmission error, the loss is not limited to the portion or frame where the error occurred. Rather, the error may propagate such that a number of frames or portions that refer to the lost portion or frame are also affected, thereby deteriorating visual quality. To recover from such loss, various techniques have been suggested, including data-restoration techniques on a transmission channel such as forward error correction (FEC) and automatic repeat request (ARQ), source-coding techniques such as multiple description coding and scalable video coding, and post-processing techniques such as error concealment.
Referring to
An enhancement-layer bit stream 127 is generated as follows. The raw video 101 is down-sampled, DCT transformed, and quantized. The quantized video is reconstructed by inverse quantization 113 and inverse discrete cosine transform (IDCT) 115. Then, up-sampling 117 is performed on the IDCT-transformed video. The up-sampled video is subtracted from the raw video 101, and the DCT 121 is performed on the residual image obtained after the subtraction 119. The DCT-transformed residual image is quantized 123 using a quantization parameter smaller than that used for the base layer. The quantized bits are coded by the VLC 125 to generate the enhancement-layer bit stream 127.
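For illustration only, the layered encoding described above can be sketched in simplified form. This is a toy model with hypothetical names (`encode_spatially_scalable`, `qp_base`, `qp_enh`); a real MPEG encoder adds motion estimation, quantization matrices, zig-zag scanning, and entropy coding, which are all omitted here:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix (rows are frequency basis vectors).
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix()

def dct2(block):
    return C @ block @ C.T

def idct2(coefs):
    return C.T @ coefs @ C

def encode_spatially_scalable(raw, qp_base=16.0, qp_enh=4.0):
    """Toy two-layer encoder for one 16x16 patch of raw video."""
    # Base layer: 2x down-sampling by averaging, DCT, coarse quantization.
    down = raw.reshape(8, 2, 8, 2).mean(axis=(1, 3))
    base_levels = np.round(dct2(down) / qp_base)
    # Reconstruct the base layer exactly as a decoder would, then up-sample.
    up = np.repeat(np.repeat(idct2(base_levels * qp_base), 2, axis=0), 2, axis=1)
    # Enhancement layer: residual against the raw video, finer quantization.
    residual = raw - up
    enh_levels = [np.round(dct2(residual[i:i + 8, j:j + 8]) / qp_enh)
                  for i in (0, 8) for j in (0, 8)]
    return base_levels, enh_levels, up
```

Dequantizing the enhancement levels with the same `qp_enh`, inverse transforming, and adding them to the up-sampled base reconstruction yields a higher-quality image than the base layer alone.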
Although not shown, for the base-layer encoding, motion estimation is performed between the down-sampling 103 and the DCT 105, and motion compensation is performed between the IDCT 115 and the up-sampling 117. For the enhancement-layer encoding, motion estimation is performed between the subtraction 119 and the DCT 121. An encoded frame is decoded and then stored in a frame buffer (not shown) to be referred to for motion estimation.
In the case of non-scalable video encoding, the raw video 101 is transformed into a compressed bit stream after the DCT 105, the quantization 107, and the VLC 109.
Referring to
The base-layer bit stream 201 is variable-length-decoded into 8×8 blocks of frequency-domain coefficients, and the coefficients are reconstructed through the inverse quantization 205. After the IDCT 207, the video of the frequency domain is reconstructed into the video of an image domain.
The video obtained from the base layer is up-sampled to its original size before being down-sampled, and the up-sampled video is combined with a video obtained after the enhancement-layer bit stream 221 is processed in the enhancement layer. The combined video is clipped, and the high visual quality video 219 is generated as a result. Although not shown, in the base-layer decoding, motion compensation is performed between the IDCT 207 and clipping 209. Motion compensation is also performed between the IDCT 227 and addition 215.
In the case of non-scalable decoding, a compressed video stream is decoded into a video after the VLD 203, the inverse quantization 205, and the IDCT 207.
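The combining-and-clipping step of the scalable decoder described above might be sketched as follows (`combine_and_clip` is a hypothetical name; an 8-bit sample range is assumed):

```python
import numpy as np

def combine_and_clip(up_sampled_base, enh_residual, bit_depth=8):
    # Add the decoded enhancement residual to the up-sampled base-layer
    # video, then clip to the valid sample range (0..255 for 8-bit video).
    out = up_sampled_base + enh_residual
    return np.clip(out, 0, (1 << bit_depth) - 1).astype(np.uint8)
```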
Referring to
In
In scalable coding, a base-layer frame may be used as a reference frame of the enhancement layer such that a transmission error that occurs in the enhancement layer is not spread. However, more bandwidth may be used in this case than when an enhancement-layer frame is referred to in the enhancement layer.
If a minimum temporal difference between a reference picture and a current picture is time t1 in a frame encoding method, the minimum temporal difference may be half the time t1 in the field encoding method. Therefore, for images with a lot of motion, it is effective to use the field encoding method for motion compensation prediction encoding, which, in turn, enhances encoding efficiency.
In the MPEG-2 standard, when a bit stream having the field structure is encoded using a single-layer encoding method or a spatially scalable encoding method, each field cannot be stored or transmitted as an independent stream. In addition, field pictures derived from a frame must be successively placed and transmitted. In this regard, if errors occur, they cannot be easily corrected.
As illustrated in
Accordingly, a field structure that is more effective for error concealment, or that achieves higher compression than the top/bottom field structure of the interlaced scanning method, needs to be developed in consideration of the texture structure of a screen. In addition, the following problems must be overcome. When a bit stream having the field structure which is applied to video encoding is transmitted, field pictures derived from a frame must be successively transmitted, thereby making error correction difficult. Further, in spatially scalable encoding, pictures of various types are arranged in the top and bottom fields of the base layer and the enhancement layer without considering bit-rate distribution. Thus, there is a high probability of a peak bit rate, which, in turn, increases the error rate. Finally, even when only one of the top field and the bottom field has an error, since the top and bottom fields are allowed to refer to each other, the error spreads to a picture in the other field which refers to the field having the error.
The present invention provides video encoding and decoding methods and apparatuses which can increase a compression rate and are effective for error correction and error concealment by adaptively applying an encoding method according to characteristics of a video.
The present invention also provides video encoding and decoding methods and apparatuses which are effective for error correction by improving a predictive reference method of a field structure.
The present invention also provides video encoding and decoding methods and apparatuses which minimize an amount of data of frames affected by a loss that occurs during transmission and are effective for error concealment.
According to an aspect of the present invention, there is provided a video encoding method including: determining an encoding operation to encode raw video among a frame encoding operation, a top/bottom field encoding operation, and a left/right field encoding operation based on at least one characteristic of the raw video; and generating a bit stream by adaptively performing the encoding operation that is determined to encode the raw video.
According to another aspect of the present invention, there is provided a video encoding method including: encoding a raw video in a base layer and generating a base-layer bit stream; determining an encoding operation to encode the raw video among a frame enhancement-layer encoding operation, a top/bottom field enhancement-layer encoding operation, and a left/right field enhancement-layer encoding operation based on at least one characteristic of the raw video; and generating an enhancement-layer bit stream by adaptively performing the enhancement-layer encoding operation that is determined to encode the raw video.
According to another aspect of the present invention, there is provided a video decoding method including: determining an encoding operation of encoding a bit stream that is received; and generating a decoded video by adaptively performing one of a frame decoding operation, a top/bottom field decoding operation, and a left/right field decoding operation based on the encoding operation that is determined.
According to another aspect of the present invention, there is provided a video decoding method including: decoding a base-layer bit stream and generating a base-layer video; determining an encoding operation of encoding an enhancement-layer bit stream that is received separately from the base-layer bit stream; and generating an enhancement-layer video by adaptively performing one of a frame enhancement-layer decoding operation, a top/bottom field enhancement-layer decoding operation, and a left/right field enhancement-layer decoding operation on the enhancement-layer bit stream that is received according to the encoding operation that is determined.
According to another aspect of the present invention, there is provided a video encoding apparatus including: an encoding operation determiner which determines an encoding operation to encode raw video among a frame encoding operation, a top/bottom field encoding operation, and a left/right field encoding operation based on at least one characteristic of the raw video; and an encoder which generates a bit stream by adaptively performing the encoding operation that is determined to encode the raw video.
According to another aspect of the present invention, there is provided a video encoding apparatus including: a base-layer encoder which encodes a raw video in a base layer and generates a base-layer bit stream; an encoding operation determiner which determines an encoding operation to encode the raw video among a frame enhancement-layer encoding operation, a top/bottom field enhancement-layer encoding operation, and a left/right field enhancement-layer encoding operation based on at least one characteristic of the raw video; and an enhancement-layer encoder which generates an enhancement-layer bit stream by adaptively performing the enhancement-layer encoding operation that is determined to encode the raw video.
According to another aspect of the present invention, there is provided a video decoding apparatus including: an encoding method determiner that determines an encoding operation of encoding a bit stream that is received; and a decoder which generates a decoded video by adaptively performing one of a frame decoding operation, a top/bottom field decoding operation, and a left/right field decoding operation according to the encoding operation that is determined.
According to another aspect of the present invention, there is provided a video decoding apparatus including: a base-layer decoder that decodes a base-layer bit stream and generates a base-layer video; an encoding method determiner that determines an encoding operation of encoding an enhancement-layer bit stream that is received separately from the base-layer bit stream; and an enhancement-layer decoder that generates an enhancement-layer video by adaptively performing one of a frame enhancement-layer decoding operation, a top/bottom field enhancement-layer decoding operation, and a left/right field enhancement-layer decoding operation on the enhancement-layer bit stream according to the encoding operation that is determined.
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings in which:
Exemplary embodiments of the present invention will be described more fully with reference to the accompanying drawings. The invention may, however, be embodied in many different forms and should not be construed as being limited to the exemplary embodiments set forth herein; rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concepts of the invention to those skilled in the art.
When an input video has more high-frequency components in the horizontal direction, the input video is divided into top and bottom fields and encoded accordingly. However, when the input video has more high-frequency components in the vertical direction, it is effective for error correction to divide the input video into left and right fields and encode it accordingly. Likewise, in the case of a video that has many horizontal (left/right) motion vectors, it is efficient to use the left/right field structure and adaptively encode the video according to its characteristics.
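One possible way to measure the direction of high-frequency content is sketched below, under the assumption that absolute differences between neighboring samples approximate high-frequency energy (`directional_detail` and `pick_field_structure` are hypothetical names, not part of the described apparatus):

```python
import numpy as np

def directional_detail(frame):
    """Crude measure of high-frequency energy along each direction."""
    horiz = np.abs(np.diff(frame, axis=1)).sum()  # variation across columns
    vert = np.abs(np.diff(frame, axis=0)).sum()   # variation across rows
    return horiz, vert

def pick_field_structure(frame):
    # More horizontal detail -> top/bottom fields; otherwise left/right.
    horiz, vert = directional_detail(frame)
    return "top/bottom" if horiz >= vert else "left/right"
```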
The video encoder 800 includes a frame encoder 820, a top/bottom field encoder 830 and a left/right field encoder 840. The video encoder 800 selects one of the frame encoder 820, the top/bottom field encoder 830 and the left/right field encoder 840 and performs adaptive encoding.
The encoding method determiner 810 may determine an encoding method in a number of ways. For instance, the encoding method determiner 810 may select the encoding method which achieves the highest compression rate. Alternatively, the encoding method determiner 810 may determine an encoding method based on the characteristics of the overall motion vectors. Specifically, the frame encoding method is used when there are few motion vectors, the left/right field encoding method is used when there are many horizontal motion vectors, and the top/bottom field encoding method is used when there are many vertical motion vectors.
To determine which method is to be used, motion vectors are measured and then added. If the horizontal component of the summed motion vectors exceeds a predetermined threshold, the left/right field encoding method is used. If the vertical component of the summed motion vectors exceeds a predetermined threshold, the top/bottom field encoding method is used.
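The threshold test described above can be sketched as follows (`choose_encoding_method` and the threshold value are hypothetical; the description does not specify how the threshold is chosen):

```python
def choose_encoding_method(motion_vectors, threshold=100.0):
    """motion_vectors: sequence of (dx, dy) pairs, one per macroblock."""
    mvs = list(motion_vectors)
    sum_dx = sum(abs(dx) for dx, _ in mvs)
    sum_dy = sum(abs(dy) for _, dy in mvs)
    if sum_dx > threshold:   # dominant horizontal motion
        return "left/right field"
    if sum_dy > threshold:   # dominant vertical motion
        return "top/bottom field"
    return "frame"           # little overall motion
```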
When the left/right field encoding method is used, even if there is an error in one of the left and right fields, the lost field can be more effectively restored using the other field than in the top/bottom field encoding method since there are many horizontal motion vectors. The same is true when the top/bottom field encoding method is used since there are many vertical motion vectors.
The frame encoder 820 encodes a video into a compressed bit stream through a discrete cosine transform (DCT), quantization, and variable length coding (VLC). The top/bottom field encoder 830 divides the frame 401 (see
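Assuming that top/bottom fields are formed from alternating lines, as in interlaced scanning, and that left/right fields are formed analogously from alternating columns, the division of a frame into fields might be sketched as:

```python
import numpy as np

def split_top_bottom(frame):
    # Top field: even-numbered lines; bottom field: odd-numbered lines.
    return frame[0::2, :], frame[1::2, :]

def split_left_right(frame):
    # Left field: even-numbered columns; right field: odd-numbered columns.
    return frame[:, 0::2], frame[:, 1::2]
```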
When a video is encoded using the field encoding method, two bit streams are generated. While field pictures derived from a frame are successively placed and transmitted in the conventional field encoding method, each bit stream is transmitted independently through a transmission channel in the field encoding method according to an exemplary embodiment of the present invention. Each bit stream may be independently transmitted using a different network device, or a different transmission control protocol (TCP) or user datagram protocol (UDP) port. Different priorities may be given to transmission packets of bit streams, and the bit streams may be transmitted accordingly. Since bit streams are transmitted independently through different transmission channels, error correction can be enhanced.
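A minimal sketch of transmitting the two field bit streams independently over different UDP ports follows (`send_fields` is a hypothetical name; a real system would add packetization, sequence numbers, and the packet priorities mentioned above):

```python
import socket

def send_fields(field_a, field_b, host="127.0.0.1", ports=(50000, 50001)):
    """Send two field bit streams (bytes) on separate UDP ports, so that
    the loss of one stream does not corrupt the other."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(field_a, (host, ports[0]))
        sock.sendto(field_b, (host, ports[1]))
    finally:
        sock.close()
```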
The video encoder 900 performs base-layer encoding and enhancement-layer encoding. For the base-layer encoding, the video encoder 900 performs operations illustrated in
An enhancement-layer encoding method determiner 930 determines an enhancement-layer encoding method using the methods used by the encoding method determiner 810 of
The frame enhancement-layer encoder 940 performs the operations illustrated in
The top/bottom field enhancement-layer encoder 950 divides the frame 401 (see
The base-layer encoder 910 generates a base-layer bit stream, whereas each of the frame enhancement-layer encoder 940, the top/bottom field enhancement-layer encoder 950, and the left/right field enhancement-layer encoder 960 generates one or two enhancement-layer bit streams. In other words, the frame enhancement-layer encoder 940 generates an enhancement-layer bit stream 945, the top/bottom field enhancement-layer encoder 950 generates bit streams 951 and 953 for top and bottom fields, respectively, and the left/right field enhancement-layer encoder 960 generates bit streams 961 and 963 for left and right fields.
The bit streams thus generated are independently transmitted through different transmission channels. As described above with reference to
The video decoder 1000 includes a frame decoder 1030, a top/bottom field decoder 1040, and a left/right field decoder 1050. The frame decoder 1030 variable-length-decodes, inversely quantizes, and IDCT transforms a received bit stream and generates a decoded video.
The top/bottom field decoder 1040 and the left/right field decoder 1050 receive bit streams for two fields from the transmission channel 1010 via the encoding method determiner 1020. The top/bottom field decoder 1040 and the left/right field decoder 1050 variable-length-decode, inversely quantize, and IDCT transform the bit streams. Then, the top/bottom field decoder 1040 generates two field-based videos, combines them into one video, and transmits the video to a display unit 1060, which displays it.
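Assuming fields interleave by alternating lines or columns, recombining two decoded fields into one video might look like the following sketch (function names are hypothetical):

```python
import numpy as np

def merge_top_bottom(top, bottom):
    frame = np.empty((top.shape[0] * 2, top.shape[1]), dtype=top.dtype)
    frame[0::2, :] = top       # top field -> even-numbered lines
    frame[1::2, :] = bottom    # bottom field -> odd-numbered lines
    return frame

def merge_left_right(left, right):
    frame = np.empty((left.shape[0], left.shape[1] * 2), dtype=left.dtype)
    frame[:, 0::2] = left      # left field -> even-numbered columns
    frame[:, 1::2] = right     # right field -> odd-numbered columns
    return frame
```

If one field is lost, the correctly received field can simply be passed for both arguments as a crude form of the error concealment described above.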
Referring to
When the enhancement-layer bit stream 1120 is transmitted to an encoding method determiner 1130 separately from the base-layer bit stream 1115, the encoding method determiner 1130 determines an encoding method of the enhancement-layer bit stream 1120. The encoding method determiner 1130 may determine the encoding method by interpreting information regarding an encoding method which is added to each bit stream transmitted, or by using other methods. The enhancement-layer bit stream 1120 is transmitted to one of the frame enhancement-layer decoder 1140, the top/bottom field enhancement-layer decoder 1150 and the left/right field enhancement-layer decoder 1160 and is adaptively decoded according to the determined encoding method.
The down-sampled video 1171, which was decoded by the base-layer decoder 1170, is up-sampled 1175 to provide an up-sampled video 1177. Then, the up-sampled video 1177 is transmitted to the frame enhancement-layer decoder 1140, the top/bottom field enhancement-layer decoder 1150 and the left/right field enhancement-layer decoder 1160.
The enhancement-layer decoder 1100 includes the frame enhancement-layer decoder 1140, the top/bottom field enhancement-layer decoder 1150, and the left/right field enhancement-layer decoder 1160. As illustrated in
The top/bottom field enhancement-layer decoder 1150 or the left/right field enhancement-layer decoder 1160 receives and decodes each of the enhancement-layer bit streams for two fields, and combines the decoded fields into one video. This combining is performed after the IDCT 227 is performed on the enhancement-layer bit streams and before the combined enhancement-layer video is added to the up-sampled video 1177 generated by the base-layer decoder 1170.
An encoding method of the received enhancement-layer bit stream is determined (S1530). Operation S1530 may be performed simultaneously with operation S1510 in which the base-layer bit stream is decoded. The received enhancement-layer bit stream is decoded adaptively using one of the frame enhancement-layer decoding method, the top/bottom field enhancement-layer decoding method, or the left/right field enhancement-layer decoding method according to the determined encoding method, and a decoded enhancement-layer video is generated (S1550).
In
I frames 1601 and 1604 in the base layer may be independently encoded and decoded, and P frames 1602, 1603, and 1605 are encoded by encoding respective differences between them and their previous frames. In spatially scalable encoding, since the screen size of the base layer is smaller than that of the raw video, only I and P frames need to be used, thereby generating a lower bit rate. Therefore, when pictures are arranged in the base layer in order of I, P, P, I, P, P, I, P, and P as illustrated in
In the case of the top/bottom field enhancement-layer encoder 950, the left/right field enhancement-layer encoder 960, the top/bottom field enhancement-layer decoder 1150, and the left/right field enhancement-layer decoder 1160, when pictures are arranged in the enhancement layer as illustrated in
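The staggering of I pictures between the two fields flattens the instantaneous bit rate, since I pictures are much larger than P pictures. A sketch of such an arrangement follows (`field_gop_pattern`, `gop`, and `offset` are hypothetical names; the actual picture arrangement is given by the figures):

```python
def field_gop_pattern(n, gop=3, offset=1):
    """Picture-type sequences for the two fields of one layer, with the
    I pictures of the second field offset from those of the first so
    that two large I pictures never coincide in time."""
    first = ["I" if i % gop == 0 else "P" for i in range(n)]
    second = ["I" if (i - offset) % gop == 0 else "P" for i in range(n)]
    return first, second
```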
Referring to
Left and right field pictures in the B field picture also do not refer to one another, to prevent the spread of an error such as the loss of a frame. Therefore, as in the P field, right field pictures refer only to other right field pictures, and left field pictures refer only to other left field pictures. This rule applies to a case where a channel has many errors. However, the rule does not apply where a channel has few errors, or errors so minor that a user cannot perceive them, or in an environment where errors can be reduced through channel error correction coding such as forward error correction (FEC). Thus, in such cases, left field and right field pictures can refer to one another. The same concept applies to the top/bottom field structure in top/bottom field encoding and decoding.
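The same-field reference rule can be sketched as a reference-picture selection, with an `error_prone` flag standing in for the channel-condition decision (all names here are hypothetical illustrations, not the claimed apparatus):

```python
def reference_candidates(pictures, idx, error_prone=True):
    """pictures: list of dicts like {"field": "left", "type": "P"}.
    Returns indices of earlier pictures that picture `idx` may reference.
    In an error-prone channel, a field picture may reference only earlier
    pictures of the same field, so an error in one field cannot spread
    into the other field."""
    current = pictures[idx]
    refs = []
    for i in range(idx - 1, -1, -1):
        if error_prone and pictures[i]["field"] != current["field"]:
            continue  # cross-field references are forbidden
        refs.append(i)
    return refs
```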
As described above with reference to the exemplary embodiments of the present invention, a video may be encoded using a left/right field encoding method. Thus, when an error occurs in an image having many horizontal motion vectors, the error can be effectively corrected.
In addition, an image may be divided into frame pictures, top/bottom field pictures and left/right field pictures, and encoded accordingly. Therefore, a compression rate can be increased, and error correction and concealment can be effectively performed on a lost field using a normally received field.
Furthermore, bit rates are flattened such that errors that may occur during transmission are not concentrated in important frames such as an I frame. In a field encoding method according to an exemplary embodiment of the present invention, top and bottom fields do not refer to each other, and an enhancement-layer bit stream and a base-layer bit stream which are generated using a spatially scalable encoding method are independently transmitted. Hence, errors that may occur during transmission can be effectively corrected and concealed.
The exemplary embodiments of the present invention can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer-readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet).
The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Also, functional programs, code, and code segments for accomplishing the present invention can be easily construed by programmers skilled in the art to which the present invention pertains.
The above description is illustrative and not restrictive. While exemplary embodiments of the present invention have been particularly shown and described, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims along with their full scope of equivalents.
Number | Date | Country | Kind |
---|---|---|---
10-2005-0061190 | Jul 2005 | KR | national |