The following co-pending U.S. patent applications relate to the present application and are hereby incorporated herein by reference: 1) U.S. patent application Ser. No. 10/622,378, entitled, “Advanced Bi-Directional Predictive Coding of Video Frames,” filed concurrently herewith; and 2) U.S. patent application Ser. No. 10/622,841, entitled, “Coding of Motion Vector Information,” filed concurrently herewith.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
Techniques and tools for interlace coding and decoding in interframes and intraframes are described. For example, a video encoder encodes macroblocks in an interlaced video frame in a 4:1:1 format.
Digital video consumes large amounts of storage and transmission capacity. A typical raw digital video sequence includes 15 or 30 frames per second. Each frame can include tens or hundreds of thousands of pixels (also called pels). Each pixel represents a tiny element of the picture. In raw form, a computer commonly represents a pixel with 24 bits. Thus, the number of bits per second, or bit rate, of a typical raw digital video sequence can be 5 million bits/second or more.
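The arithmetic behind these figures is straightforward; the sketch below checks it for an illustrative case (the 320×240 frame size, 30 frames/second, and 24 bits per pixel are assumptions chosen for the example, not values fixed by the text).

```python
# Raw bit-rate arithmetic for an uncompressed video sequence.
# Frame size and frame rate are illustrative assumptions.
WIDTH, HEIGHT = 320, 240      # pixels per frame
BITS_PER_PIXEL = 24           # raw representation of one pixel
FRAMES_PER_SECOND = 30

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
bit_rate = bits_per_frame * FRAMES_PER_SECOND

print(bits_per_frame)   # 1843200 bits per frame
print(bit_rate)         # 55296000 bits/second -- well above 5 million
```

Even at this modest resolution, the raw bit rate is an order of magnitude beyond the 5 million bits/second figure cited above, which is why compression is indispensable.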
Most computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression can be lossless, in which quality of the video does not suffer but decreases in bit rate are limited by the complexity of the video. Or, compression can be lossy, in which quality of the video suffers but decreases in bit rate are more dramatic. Decompression reverses compression.
In general, video compression techniques include intraframe compression and interframe compression. Intraframe compression techniques compress individual frames, typically called I-frames or key frames. Interframe compression techniques compress frames with reference to preceding and/or following frames, which are typically called predicted frames, P-frames, or B-frames.
Microsoft Corporation's Windows Media Video, Version 8 [“WMV8”] includes a video encoder and a video decoder. The WMV8 encoder uses intraframe and interframe compression, and the WMV8 decoder uses intraframe and interframe decompression.
A. Intraframe Compression in WMV8
The encoder then quantizes 120 the DCT coefficients, resulting in an 8×8 block of quantized DCT coefficients 125. For example, the encoder applies a uniform, scalar quantization step size to each coefficient. Quantization is lossy. Since low frequency DCT coefficients tend to have higher values, quantization results in loss of precision but not complete loss of the information for the coefficients. On the other hand, since high frequency DCT coefficients tend to have values of zero or close to zero, quantization of the high frequency coefficients typically results in contiguous regions of zero values. In addition, in some cases high frequency DCT coefficients are quantized more coarsely than low frequency DCT coefficients, resulting in greater loss of precision/information for the high frequency DCT coefficients.
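A minimal sketch of the uniform scalar quantization described above (the step size of 8 is an arbitrary illustrative choice, not a WMV8 parameter): a large low-frequency coefficient loses only precision, while small high-frequency coefficients collapse to zero.

```python
def quantize(coeff, step):
    """Uniform scalar quantization: map a DCT coefficient to an integer level."""
    # Truncating division toward zero, a common choice for scalar quantizers.
    return int(coeff / step)

def dequantize(level, step):
    """Reconstruct an approximate coefficient from its quantized level."""
    return level * step

step = 8  # illustrative quantization step size
# A large (low-frequency) coefficient survives with reduced precision...
assert dequantize(quantize(100, step), step) == 96
# ...while small (high-frequency) coefficients become zero, creating the
# contiguous zero regions the text describes.
assert quantize(3, step) == 0
```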
The encoder then prepares the 8×8 block of quantized DCT coefficients 125 for entropy encoding, which is a form of lossless compression. The exact type of entropy encoding can vary depending on whether a coefficient is a DC coefficient (lowest frequency), an AC coefficient (other frequencies) in the top row or left column, or another AC coefficient.
The encoder encodes the DC coefficient 126 as a differential from the DC coefficient 136 of a neighboring 8×8 block, which is a previously encoded neighbor (e.g., top or left) of the block being encoded. (
The entropy encoder can encode the left column or top row of AC coefficients as a differential from a corresponding column or row of the neighboring 8×8 block.
The encoder scans 150 the 8×8 block 145 of predicted, quantized AC DCT coefficients into a one-dimensional array 155 and then entropy encodes the scanned AC coefficients using a variation of run length coding 160. The encoder selects an entropy code from one or more run/level/last tables 165 and outputs the entropy code.
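A simplified model of the run/level/last step (not the actual WMV8 code tables): each nonzero coefficient in the scanned array becomes a (run, level, last) triple, where run counts the zeros that precede it and last marks the final nonzero coefficient, so trailing zeros cost nothing to code.

```python
def run_level_last(scanned):
    """Convert a scanned coefficient array into (run, level, last) triples.

    `run` is the number of zero coefficients preceding a nonzero `level`;
    `last` is True only for the final nonzero coefficient in the block.
    """
    triples = []
    run = 0
    for coeff in scanned:
        if coeff == 0:
            run += 1
        else:
            triples.append([run, coeff, False])
            run = 0
    if triples:
        triples[-1][2] = True  # mark the last nonzero coefficient
    return [tuple(t) for t in triples]

# Coefficients ordered low to high frequency; the trailing zeros vanish:
assert run_level_last([9, 0, 0, 4, 1, 0, 0, 0]) == [
    (0, 9, False), (2, 4, False), (0, 1, True)
]
```

In a real encoder each triple is then mapped through the run/level/last tables to a variable-length entropy code.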
B. Interframe Compression in WMV8
Interframe compression in the WMV8 encoder uses block-based motion compensated prediction coding followed by transform coding of the residual error.
For example, the WMV8 encoder splits a predicted frame into 8×8 blocks of pixels. Groups of four 8×8 blocks form macroblocks. For each macroblock, a motion estimation process is performed. The motion estimation approximates the motion of the macroblock of pixels relative to a reference frame, for example, a previously coded, preceding frame. In
The encoder then prepares the 8×8 block 355 of quantized DCT coefficients for entropy encoding. The encoder scans 360 the 8×8 block 355 into a one-dimensional array 365 with 64 elements, such that coefficients are generally ordered from lowest frequency to highest frequency, which typically creates long runs of zero values.
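The scan can be sketched as a zigzag traversal that visits coefficients diagonal by diagonal, roughly from lowest to highest frequency (a standard zigzag pattern; the actual scan table a given codec uses may differ):

```python
def zigzag_order(n=8):
    """Scan order for an n x n block: visit coefficients diagonal by
    diagonal, alternating direction, so low frequencies come first."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def scan(block):
    """Flatten an n x n block into a one-dimensional array in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# A 4x4 block whose entries are their own zigzag positions scans to 0..15:
block = [[0, 1, 5, 6],
         [2, 4, 7, 12],
         [3, 8, 11, 13],
         [9, 10, 14, 15]]
assert scan(block) == list(range(16))
```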
The encoder entropy encodes the scanned coefficients using a variation of run length coding 370. The encoder selects an entropy code from one or more run/level/last tables 375 and outputs the entropy code.
In summary of
The amount of change between the original and reconstructed frame is termed the distortion and the number of bits required to code the frame is termed the rate for the frame. The amount of distortion is roughly inversely proportional to the rate. In other words, coding a frame with fewer bits (greater compression) will result in greater distortion, and vice versa.
C. Bi-Directional Prediction
Bi-directionally coded images (e.g., B-frames) use two images from the source video as reference (or anchor) images. For example, referring to
Some conventional encoders use five prediction modes (forward, backward, direct, interpolated and intra) to predict regions in a current B-frame. In intra mode, an encoder does not predict a macroblock from either reference image, and therefore calculates no motion vectors for the macroblock. In forward and backward modes, an encoder predicts a macroblock using either the previous or future reference frame, and therefore calculates one motion vector for the macroblock. In direct and interpolated modes, an encoder predicts a macroblock in a current frame using both reference frames. In interpolated mode, the encoder explicitly calculates two motion vectors for the macroblock. In direct mode, the encoder derives implied motion vectors by scaling the co-located motion vector in the future reference frame, and therefore does not explicitly calculate any motion vectors for the macroblock.
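Direct mode, as described above, derives its two motion vectors by temporally scaling the co-located motion vector from the future reference frame. A hedged sketch follows (the generic MPEG-style formulation with integer floor division; the exact scaling any particular codec uses may differ):

```python
def direct_mode_vectors(colocated_mv, trb, trd):
    """Derive forward/backward motion vectors for a direct-mode macroblock.

    colocated_mv : (x, y) motion vector of the co-located macroblock in the
                   future reference frame.
    trb : temporal distance from the past reference to the current B-frame.
    trd : temporal distance between the past and future reference frames.
    """
    mvx, mvy = colocated_mv
    forward = (trb * mvx // trd, trb * mvy // trd)
    backward = ((trb - trd) * mvx // trd, (trb - trd) * mvy // trd)
    return forward, backward

# B-frame midway between references that are two intervals apart:
fwd, bwd = direct_mode_vectors((8, -4), trb=1, trd=2)
assert fwd == (4, -2)
assert bwd == (-4, 2)
```

No motion vector is transmitted for such a macroblock; both vectors are implied by the co-located vector and the frame timing.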
D. Interlace Coding
A typical interlaced video frame consists of two fields scanned at different times. For example, referring to
E. Standards for Video Compression and Decompression
Aside from WMV8, several international standards relate to video compression and decompression. These standards include the Motion Picture Experts Group [“MPEG”] 1, 2, and 4 standards and the H.261, H.262, and H.263 standards from the International Telecommunication Union [“ITU”]. Like WMV8, these standards use a combination of intraframe and interframe compression. The MPEG 4 standard describes coding of macroblocks in 4:2:0 format using, for example, frame DCT coding, where each luminance block is composed of lines from two fields alternately, and field DCT coding, where each luminance block is composed of lines from only one of two fields.
Given the critical importance of video compression and decompression to digital video, it is not surprising that video compression and decompression are richly developed fields. Whatever the benefits of previous video compression and decompression techniques, however, they do not have the advantages of the following techniques and tools.
In summary, the detailed description is directed to various techniques and tools for encoding and decoding video images (e.g., interlaced frames). The various techniques and tools can be used in combination or independently.
In one aspect, macroblocks (e.g., in an interlaced video image) in a 4:1:1 format are processed. The 4:1:1 macroblocks comprise four 8×8 luminance blocks and four 4×8 chrominance blocks. The processing (e.g., video encoding or decoding) includes intra-frame and inter-frame processing. The macroblocks can be frame-coded macroblocks, or field-coded macroblocks having a top field and a bottom field.
In another aspect, a video encoder classifies a macroblock in an interlaced video image as a field-coded macroblock with a top field and a bottom field. The encoder encodes the top field and the bottom field using either an intra-coding mode or an inter-coding mode for each field. The coding modes used for encoding the top and bottom fields are selected independently of one another.
In another aspect, a video encoder sends encoded blocks in field order for a first field (e.g., an inter-coded field) and a second field (e.g., an intra-coded field) in a field-coded macroblock. The acts of sending encoded blocks in field order facilitate encoding the first field and the second field independently from one another. Intra-coded fields can be encoded using DC/AC prediction.
In another aspect, a video decoder receives encoded blocks in field order for a first encoded field and a second encoded field in a field-coded macroblock, and decodes the encoded fields. Receiving encoded blocks in field order facilitates decoding the first and second encoded fields independently from one another.
In another aspect, a video decoder finds a DC differential for a current block in the intra-coded field, finds a DC predictor for the current block, and obtains a DC value for the current block by adding the DC predictor to the DC differential. The intra-coded field is decoded independently from the second field.
In another aspect, a video decoder finds a DC differential for a current block in an intra-coded field and selects a DC predictor from a group of candidate DC predictors. The group of candidate DC predictors comprises DC values from blocks (e.g., previously decoded blocks) adjacent to the current block (e.g., the top, top-left, or left adjacent blocks). A candidate DC predictor is considered missing if it is not intra-coded, or if it is outside a picture boundary. The selected DC predictor is a non-missing candidate DC predictor.
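A simplified model of the selection rule described in this aspect (the preference order among non-missing candidates is an illustrative assumption; an actual decoder also chooses a prediction direction, which is elided here):

```python
# Each neighbor is either None (missing: outside the picture boundary or
# not intra-coded) or an integer DC value from a previously decoded block.

def select_dc_predictor(left, top_left, top):
    """Return a non-missing candidate DC predictor, or 0 if all are missing.

    The preference order (left, then top, then top-left) is an illustrative
    assumption, not the codec's actual direction rule.
    """
    for candidate in (left, top, top_left):
        if candidate is not None:
            return candidate
    return 0

def decode_dc(dc_differential, left, top_left, top):
    """DC value = DC predictor + DC differential, as in the text."""
    return select_dc_predictor(left, top_left, top) + dc_differential

# Left neighbor missing (e.g., inter-coded), top neighbor available:
assert decode_dc(5, left=None, top_left=120, top=128) == 133
```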
In another aspect, a video encoder performs DC prediction for a current block in an interlaced macroblock and selectively enables AC prediction for blocks in the macroblock. When AC prediction is enabled, AC coefficients can be selected for differential coding based on the selected DC predictor for the current block. AC prediction can be signaled in a bit stream (e.g., with flags indicating whether AC prediction is performed for all blocks in a frame macroblock, or whether AC prediction is performed for blocks in a field in a field macroblock).
In another aspect, a video encoder finds a motion vector for an inter-coded field in a macroblock and encodes the macroblock using the motion vector for the first field, where the second field in the macroblock is an intra-coded field.
In another aspect, a video encoder finds a motion vector predictor for predicting a motion vector for a first field from among a group of candidate predictors. The candidate predictors are motion vectors for neighboring macroblocks, and the motion vector predictor is a motion vector for one corresponding field in a neighboring field-coded macroblock comprising two fields. The encoder calculates a motion vector for the first field using the motion vector predictor, and encodes the macroblock using the calculated motion vector. For example, the first field is a top field, and the one corresponding field in the neighboring field-coded macroblock is a top field.
In another aspect, a 4:1:1 macroblock in an interlaced video image is processed (e.g., in an encoder or decoder) by finding a luminance motion vector for the macroblock and deriving a chrominance motion vector for the macroblock from the luminance motion vector. The deriving can include scaling down the luminance motion vector by a factor of four. The chrominance motion vector can be rounded (e.g., to quarter-pixel resolution) and can be pulled back if it references an out-of-frame region in a reference frame.
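A sketch of this derivation (the quarter-pixel units, the rounding convention, and the simplified pull-back clamp are all illustrative assumptions; since 4:1:1 subsamples chrominance only horizontally, only the horizontal component is scaled here):

```python
def derive_chroma_mv(luma_mv, max_displacement):
    """Derive a chrominance motion vector from a luminance motion vector
    for a 4:1:1 macroblock.

    Vectors are in quarter-pixel units (an assumed convention). The
    chrominance plane is a quarter as wide as the luminance plane, so the
    horizontal component is scaled down by four; the clamp below is a
    simplified stand-in for the pull-back of out-of-frame references.
    """
    lx, ly = luma_mv
    # Scale down horizontally by a factor of four, rounding to the nearest
    # quarter-pixel position.
    cx = round(lx / 4)
    cy = ly
    # Pull the vector back if it would reference an out-of-frame region.
    cx = max(-max_displacement, min(cx, max_displacement))
    return (cx, cy)

assert derive_chroma_mv((17, 6), max_displacement=2816) == (4, 6)
```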
In another aspect, a video decoder decodes a motion vector for a current interlaced macroblock (e.g., a frame or field macroblock) and obtains a prediction macroblock for the current macroblock using the decoded motion vector. The obtaining includes performing bi-cubic interpolation to obtain sub-pixel displacement for the current macroblock.
In another aspect, a 4:1:1 macroblock in a bi-directionally predicted video image (e.g., an interlaced image) is processed. The macroblock can be a frame-coded macroblock (having up to two associated motion vectors) or a field-coded macroblock (having up to four associated motion vectors). Direct mode macroblocks can also be classified as frame-type or field-type macroblocks.
Additional features and advantages will be made apparent from the following detailed description of different embodiments that proceeds with reference to the accompanying drawings.
The present application relates to techniques and tools for efficient compression and decompression of interlaced video. In various described embodiments, a video encoder and decoder incorporate techniques for encoding and decoding interlaced video frames, and signaling techniques for use in a bit stream format or syntax comprising different layers or levels (e.g., sequence level, frame/picture/image level, macroblock level, and/or block level).
The various techniques and tools can be used in combination or independently. Different embodiments implement one or more of the described techniques and tools.
I. Computing Environment
With reference to
A computing environment may have additional features. For example, the computing environment 700 includes storage 740, one or more input devices 750, one or more output devices 760, and one or more communication connections 770. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment 700. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 700, and coordinates activities of the components of the computing environment 700.
The storage 740 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 700. The storage 740 stores instructions for the software 780 implementing the video encoder or decoder.
The input device(s) 750 may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 700. For audio or video encoding, the input device(s) 750 may be a sound card, video card, TV tuner card, or similar device that accepts audio or video input in analog or digital form, or a CD-ROM or CD-RW that reads audio or video samples into the computing environment 700. The output device(s) 760 may be a display, printer, speaker, CD-writer, or another device that provides output from the computing environment 700.
The communication connection(s) 770 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
The techniques and tools can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment 700, computer-readable media include memory 720, storage 740, communication media, and combinations of any of the above.
The techniques and tools can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.
For the sake of presentation, the detailed description uses terms like “indicate,” “choose,” “obtain,” and “apply” to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.
II. Generalized Video Encoder and Decoder
The relationships shown between modules within the encoder and decoder indicate the main flow of information in the encoder and decoder; other relationships are not shown for the sake of simplicity. In particular,
The encoder 800 and decoder 900 are block-based and use a 4:1:1 macroblock format. Each macroblock includes four 8×8 luminance blocks and four 4×8 chrominance blocks. Further details regarding the 4:1:1 format are provided below. The encoder 800 and decoder 900 also can use a 4:2:0 macroblock format with each macroblock including four 8×8 luminance blocks (at times treated as one 16×16 macroblock) and two 8×8 chrominance blocks. Alternatively, the encoder 800 and decoder 900 are object-based, use a different macroblock or block format, or perform operations on sets of pixels of different size or configuration.
Depending on implementation and the type of compression desired, modules of the encoder or decoder can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. In alternative embodiments, encoders or decoders with different modules and/or other configurations of modules perform one or more of the described techniques.
A. Video Encoder
The encoder system 800 compresses predicted frames and key frames. For the sake of presentation,
A predicted frame (also called P-frame, B-frame, or inter-coded frame) is represented in terms of prediction (or difference) from one or more reference (or anchor) frames. A prediction residual is the difference between what was predicted and the original frame. In contrast, a key frame (also called I-frame, intra-coded frame) is compressed without reference to other frames.
If the current frame 805 is a forward-predicted frame, a motion estimator 810 estimates motion of macroblocks or other sets of pixels of the current frame 805 with respect to a reference frame, which is the reconstructed previous frame 825 buffered in a frame store (e.g., frame store 820). If the current frame 805 is a bi-directionally-predicted frame (a B-frame), a motion estimator 810 estimates motion in the current frame 805 with respect to two reconstructed reference frames. Typically, a motion estimator estimates motion in a B-frame with respect to a temporally previous reference frame and a temporally future reference frame. Accordingly, the encoder system 800 can comprise separate stores 820 and 822 for backward and forward reference frames. For more information on bi-directionally predicted frames, see U.S. patent application Ser. No. 10/622,378, entitled, “Advanced Bi-Directional Predictive Coding of Video Frames,” filed concurrently herewith.
The motion estimator 810 can estimate motion by pixel, ½ pixel, ¼ pixel, or other increments, and can switch the resolution of the motion estimation on a frame-by-frame basis or other basis. The resolution of the motion estimation can be the same or different horizontally and vertically. The motion estimator 810 outputs as side information motion information 815 such as motion vectors. A motion compensator 830 applies the motion information 815 to the reconstructed frame(s) 825 to form a motion-compensated current frame 835. The prediction is rarely perfect, however, and the difference between the motion-compensated current frame 835 and the original current frame 805 is the prediction residual 845. Alternatively, a motion estimator and motion compensator apply another type of motion estimation/compensation.
A frequency transformer 860 converts the spatial domain video information into frequency domain (i.e., spectral) data. For block-based video frames, the frequency transformer 860 applies a discrete cosine transform [“DCT”] or variant of DCT to blocks of the pixel data or prediction residual data, producing blocks of DCT coefficients. Alternatively, the frequency transformer 860 applies another conventional frequency transform such as a Fourier transform or uses wavelet or subband analysis. If the encoder uses spatial extrapolation (not shown in
A quantizer 870 then quantizes the blocks of spectral data coefficients. The quantizer applies uniform, scalar quantization to the spectral data with a step-size that varies on a frame-by-frame basis or other basis. Alternatively, the quantizer applies another type of quantization to the spectral data coefficients, for example, a non-uniform, vector, or non-adaptive quantization, or directly quantizes spatial domain data in an encoder system that does not use frequency transformations. In addition to adaptive quantization, the encoder 800 can use frame dropping, adaptive filtering, or other techniques for rate control.
If a given macroblock in a predicted frame has no information of certain types (e.g., no motion information for the macroblock and no residual information), the encoder 800 may encode the macroblock as a skipped macroblock. If so, the encoder signals the skipped macroblock in the output bit stream of compressed video information 895.
When a reconstructed current frame is needed for subsequent motion estimation/compensation, an inverse quantizer 876 performs inverse quantization on the quantized spectral data coefficients. An inverse frequency transformer 866 then performs the inverse of the operations of the frequency transformer 860, producing a reconstructed prediction residual (for a predicted frame) or a reconstructed key frame. If the current frame 805 was a key frame, the reconstructed key frame is taken as the reconstructed current frame (not shown). If the current frame 805 was a predicted frame, the reconstructed prediction residual is added to the motion-compensated current frame 835 to form the reconstructed current frame. A frame store (e.g., frame store 820) buffers the reconstructed current frame for use in predicting another frame. In some embodiments, the encoder applies a deblocking filter to the reconstructed frame to adaptively smooth discontinuities in the blocks of the frame.
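The reconstruction path for a predicted frame (inverse quantize, inverse transform, add the motion-compensated prediction) can be sketched per block; an identity transform stands in for the inverse DCT to keep the sketch self-contained:

```python
def reconstruct_block(quantized_levels, step, prediction):
    """Sketch of the encoder's reconstruction path for one predicted block.

    Inverse quantization, then the inverse frequency transform (identity
    here, standing in for the inverse DCT), yields the reconstructed
    residual, which is added to the motion-compensated prediction. All
    arrays are flat lists for simplicity.
    """
    residual = [level * step for level in quantized_levels]  # inverse quantize
    # An inverse frequency transform would be applied to `residual` here.
    return [p + r for p, r in zip(prediction, residual)]

# Prediction plus reconstructed residual gives the reconstructed samples:
assert reconstruct_block([2, 0, -1], step=4, prediction=[100, 100, 100]) == [108, 100, 96]
```

Because the encoder reconstructs exactly as the decoder will, subsequent frames are predicted from the same reference the decoder will have, avoiding drift.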
The entropy coder 880 compresses the output of the quantizer 870 as well as certain side information (e.g., motion information 815, spatial extrapolation modes, quantization step size). Typical entropy coding techniques include arithmetic coding, differential coding, Huffman coding, run length coding, LZ coding, dictionary coding, and combinations of the above. The entropy coder 880 typically uses different coding techniques for different kinds of information (e.g., DC coefficients, AC coefficients, different kinds of side information), and can choose from among multiple code tables within a particular coding technique.
The entropy coder 880 puts compressed video information 895 in the buffer 890. A buffer level indicator is fed back to bit rate adaptive modules.
The compressed video information 895 is depleted from the buffer 890 at a constant or relatively constant bit rate and stored for subsequent streaming at that bit rate. Therefore, the level of the buffer 890 is primarily a function of the entropy of the filtered, quantized video information, which affects the efficiency of the entropy coding. Alternatively, the encoder system 800 streams compressed video information immediately following compression, and the level of the buffer 890 also depends on the rate at which information is depleted from the buffer 890 for transmission.
Before or after the buffer 890, the compressed video information 895 can be channel coded for transmission over the network. The channel coding can apply error detection and correction data to the compressed video information 895.
B. Video Decoder
The decoder system 900 decompresses predicted frames and key frames. For the sake of presentation,
A buffer 990 receives the information 995 for the compressed video sequence and makes the received information available to the entropy decoder 980. The buffer 990 typically receives the information at a rate that is fairly constant over time, and includes a jitter buffer to smooth short-term variations in bandwidth or transmission. The buffer 990 can include a playback buffer and other buffers as well. Alternatively, the buffer 990 receives information at a varying rate. Before or after the buffer 990, the compressed video information can be channel decoded and processed for error detection and correction.
The entropy decoder 980 entropy decodes entropy-coded quantized data as well as entropy-coded side information (e.g., motion information 915, spatial extrapolation modes, quantization step size), typically applying the inverse of the entropy encoding performed in the encoder. Entropy decoding techniques include arithmetic decoding, differential decoding, Huffman decoding, run length decoding, LZ decoding, dictionary decoding, and combinations of the above. The entropy decoder 980 frequently uses different decoding techniques for different kinds of information (e.g., DC coefficients, AC coefficients, different kinds of side information), and can choose from among multiple code tables within a particular decoding technique.
A motion compensator 930 applies motion information 915 to one or more reference frames 925 to form a prediction 935 of the frame 905 being reconstructed. For example, the motion compensator 930 uses a macroblock motion vector to find a macroblock in a reference frame 925. A frame buffer (e.g., frame buffer 920) stores previously reconstructed frames for use as reference frames. Typically, B-frames have more than one reference frame (e.g., a temporally previous reference frame and a temporally future reference frame). Accordingly, the decoder system 900 can comprise separate frame buffers 920 and 922 for backward and forward reference frames.
The motion compensator 930 can compensate for motion at pixel, ½ pixel, ¼ pixel, or other increments, and can switch the resolution of the motion compensation on a frame-by-frame basis or other basis. The resolution of the motion compensation can be the same or different horizontally and vertically. Alternatively, a motion compensator applies another type of motion compensation. The prediction by the motion compensator is rarely perfect, so the decoder 900 also reconstructs prediction residuals.
When the decoder needs a reconstructed frame for subsequent motion compensation, a frame buffer (e.g., frame buffer 920) buffers the reconstructed frame for use in predicting another frame. In some embodiments, the decoder applies a deblocking filter to the reconstructed frame to adaptively smooth discontinuities in the blocks of the frame.
An inverse quantizer 970 inverse quantizes entropy-decoded data. In general, the inverse quantizer applies uniform, scalar inverse quantization to the entropy-decoded data with a step-size that varies on a frame-by-frame basis or other basis. Alternatively, the inverse quantizer applies another type of inverse quantization to the data, for example, a non-uniform, vector, or non-adaptive quantization, or directly inverse quantizes spatial domain data in a decoder system that does not use inverse frequency transformations.
An inverse frequency transformer 960 converts the quantized, frequency domain data into spatial domain video information. For block-based video frames, the inverse frequency transformer 960 applies an inverse DCT [“IDCT”] or variant of IDCT to blocks of the DCT coefficients, producing pixel data or prediction residual data for key frames or predicted frames, respectively. Alternatively, the inverse frequency transformer 960 applies another conventional inverse frequency transform such as an inverse Fourier transform or uses wavelet or subband synthesis. If the decoder uses spatial extrapolation (not shown in
When a skipped macroblock is signaled in the bit stream of information 995 for a compressed sequence of video frames, the decoder 900 reconstructs the skipped macroblock without using the information (e.g., motion information and/or residual information) normally included in the bit stream for non-skipped macroblocks.
III. Interlace Coding
Interlaced content (such as the interlaced content prevalent in the television industry) is an important consideration in video encoding and decoding applications. Accordingly, described embodiments include techniques and tools for efficient compression and decompression of interlaced video.
As explained above, a typical interlaced video frame consists of two fields (e.g., a top field and a bottom field) scanned at different times. Described embodiments exploit this property and perform efficient compression by selectively compressing different regions of the image using different techniques. Typically, it is more efficient to encode stationary regions as a whole (frame coding). On the other hand, it is often more efficient to code moving regions by fields (field coding). Therefore, in described embodiments, macroblocks in an image can be encoded either as frame macroblocks or field macroblocks. Frame macroblocks are typically more suitable for stationary regions. Field macroblocks are typically more suitable for moving regions because the two fields in the macroblock tend to have different motion, and each field tends to have a higher correlation with itself than with the other field. Some described embodiments focus on field macroblock encoding for both intra-coded frames and inter-coded frames.
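Separating a macroblock into its two fields, as field coding requires, amounts to regrouping even and odd lines so that each field can be coded against itself (a sketch; rows are represented by labels rather than pixel data):

```python
def split_fields(macroblock_rows):
    """Separate a frame macroblock's rows into top and bottom fields.

    In an interlaced frame the top field occupies the even lines and the
    bottom field the odd lines; field coding groups each field's lines
    together because each field correlates better with itself than with
    the other field when there is motion.
    """
    top = macroblock_rows[0::2]     # even lines: top field
    bottom = macroblock_rows[1::2]  # odd lines: bottom field
    return top, bottom

rows = [f"line{i}" for i in range(8)]  # an 8-line slice for illustration
top, bottom = split_fields(rows)
assert top == ["line0", "line2", "line4", "line6"]
assert bottom == ["line1", "line3", "line5", "line7"]
```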
The features of the described embodiments include:
In some embodiments, a video encoder/decoder processes macroblocks in a 4:1:1 macroblock format.
The 4:1:1 format differs from the 4:2:0 format in the arrangement of the chrominance samples. Both 4:1:1 and 4:2:0 macroblocks have four 8×8 luminance blocks. A 4:2:0 macroblock has two 8×8 chrominance blocks, one for each of the U and V channels. The U and V channels are therefore sub-sampled by a factor of two in both the vertical and horizontal dimensions. However, a 4:1:1 macroblock has four 4×8 chrominance blocks, two for each of the U and V channels. The U and V channels are sub-sampled by a factor of four horizontally but retain full vertical resolution. The 4:1:1 format thus preserves the field structure in the chrominance domain, which results in more accurate reconstruction of moving color regions in interlaced video.
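The block inventories of the two formats (four 8×8 luminance blocks in both cases; 4:1:1 carrying two 4×8 chrominance blocks per channel, per the macroblock description earlier in the text) can be summarized as a bookkeeping sketch, with dimensions given as (width, height):

```python
def macroblock_blocks(fmt):
    """Block inventory of a 16x16 macroblock in each chroma format."""
    if fmt == "4:2:0":
        # Four 8x8 luminance blocks, one 8x8 chrominance block per channel.
        return {"Y": [(8, 8)] * 4, "U": [(8, 8)], "V": [(8, 8)]}
    if fmt == "4:1:1":
        # Four 8x8 luminance blocks, two 4x8 chrominance blocks per channel:
        # chrominance is sub-sampled 4:1 horizontally but not vertically,
        # preserving the interlaced field structure in the chroma domain.
        return {"Y": [(8, 8)] * 4, "U": [(4, 8)] * 2, "V": [(4, 8)] * 2}
    raise ValueError(fmt)

def chroma_samples(fmt):
    blocks = macroblock_blocks(fmt)
    return sum(w * h for ch in ("U", "V") for (w, h) in blocks[ch])

# Both formats carry the same number of chrominance samples per macroblock;
# only their spatial arrangement differs.
assert chroma_samples("4:2:0") == chroma_samples("4:1:1") == 128
```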
Macroblocks in interlaced frames can be classified as frame macroblocks or field macroblocks.
As explained above, in interlaced frames, the top field lines and the bottom field lines are scanned at different times. Referring again to
After a 4:1:1 macroblock is classified as a frame macroblock or a field macroblock, it is subdivided into blocks. For example,
B. Independent Coding of Macroblock Fields
In some embodiments, one field in a field-coded macroblock is capable of being inter-coded or intra-coded regardless of how the other field in the macroblock was encoded. This allows the macroblock to contain one inter-coded field and one intra-coded field, rather than being restricted to being entirely intra-coded or inter-coded. This flexibility is helpful, for example, in scene transitions where the two fields of an interlaced frame are from different scenes. One field (e.g., a field in a macroblock corresponding to a newly introduced scene) can be intra-coded while the other field (e.g., a field corresponding to a previous scene) can be inter-coded (i.e., predicted from other frames).
For example,
Finer encoding granularity (in terms of allowing for different kinds of motion in different fields) can be achieved when fields can be encoded independently from one another. To help achieve this finer granularity, some embodiments employ DC/AC prediction techniques for encoding an intra field independently from the other field in the macroblock.
1. DC/AC Prediction
In some embodiments, DC/AC prediction techniques facilitate the co-existence of inter- and intra-coded fields in the same macroblock.
For example, when coding an interlaced video frame, an encoder encodes 4:1:1 macroblocks (which have been classified as either field macroblocks or frame macroblocks) in raster scan order from left to right. Referring again to
For both the luminance and chrominance blocks, the encoder always encodes DC coefficients differentially, using the DC coefficients of neighboring blocks as predictors. AC coefficients, in contrast, are predictively encoded only when the encoder decides to do so during encoding, and that decision is signaled with flags (e.g., the ACPREDMB, ACPREDTFIELD, and/or ACPREDBFIELD flags described below). For a chrominance block, if row AC prediction is chosen, then the four coefficients of the first row are differentially coded.
In
a. DC Prediction
In DC/AC prediction, the quantized DC value for the current block is obtained by adding the DC predictor to the DC differential. The DC predictor is obtained from one of the previously decoded adjacent blocks. For example,
In some cases, one or more of the adjacent candidate predictor blocks with values A, B, and C are considered missing. For example, a candidate predictor block is considered missing if it is outside the picture boundary. Or, when finding a predictor for a current intra block in an interlaced inter-frame (e.g., an interlaced P-frame), the candidate predictor block is considered missing if it is not intra-coded. Only values from non-missing predictor blocks are used for DC prediction.
In some embodiments, if all three candidate blocks are present, the encoder/decoder selects the predictor value based on the following rule:
If an adjacent candidate block is missing, then the following rules apply:
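The exact selection rules appear in the application itself; as an illustrative stand-in, a conventional gradient-based rule (of the kind used in MPEG-4-style DC prediction, and not necessarily this application's exact rule) predicts from the direction of least change:

```python
def select_dc_predictor(A, B, C):
    # A: DC of left block, B: DC of top-left block, C: DC of top block
    # (illustrative layout). Missing candidates are assumed to have been
    # handled (substituted or excluded) before this point.
    if abs(A - B) < abs(B - C):
        return C   # values change more from top-left to top -> predict from the top
    return A       # values change more from left to top-left -> predict from the left

def reconstruct_dc(dc_differential, A, B, C):
    # quantized DC for the current block = predictor + decoded differential
    return select_dc_predictor(A, B, C) + dc_differential
```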
If AC prediction is enabled for the current block, then the AC coefficients on either the top row or the left column of the current block may be differentially encoded. This decision is based on the DC predictor. For example, in some embodiments, AC prediction proceeds according to the following rules:
The AC coefficients in a predicted row or column are added to the corresponding decoded AC coefficients (prior to adding 128) in the current block to produce a reconstructed, quantized DCT coefficient block.
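The row/column reconstruction can be sketched as follows (blocks as nested lists; the "+128" level shift and any quantization-scale alignment between neighboring blocks are omitted for brevity, and names are illustrative):

```python
def reconstruct_ac(decoded, predictor, direction):
    """Add the predicted top row or left column of AC coefficients."""
    out = [row[:] for row in decoded]
    if direction == "row":          # AC prediction across the top row
        for j in range(1, len(out[0])):
            out[0][j] += predictor[0][j]
    elif direction == "column":     # AC prediction down the left column
        for i in range(1, len(out)):
            out[i][0] += predictor[i][0]
    return out                      # position [0][0] (the DC coefficient) is untouched
```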
2. Signaling for DC/AC Prediction
In some embodiments, an encoder/decoder uses signals in a bit stream at macroblock level to indicate whether AC prediction is active for a macroblock or for individual fields in a macroblock. For example, for frame macroblocks, an encoder indicates whether AC prediction will be performed for all blocks in the macroblock with the one-bit flag ACPREDMB. For field macroblocks, the encoder uses two one-bit flags to independently indicate whether AC prediction will be performed for blocks in the top field (ACPREDTFIELD) and bottom field (ACPREDBFIELD). Specifically, referring again to
C. Motion Vector Information in Inter-coded Interlaced Frames
As explained above, macroblocks are classified as frame macroblocks or field macroblocks and can be intra-coded or inter-coded. Thus, macroblocks can be one of four types: inter-coded frame macroblocks, inter-coded field macroblocks, intra-coded frame macroblocks, or intra-coded field macroblocks. Inter-coded macroblocks are motion compensated using motion vectors. For example, in P-frames, inter-coded frame macroblocks are motion compensated using one motion vector.
In some embodiments, inter-coded field macroblocks can have either one motion vector or two motion vectors. For example, when an inter-coded field macroblock has two motion vectors, each of the two fields in the macroblock has its own motion vector. On the other hand, when an inter-coded field macroblock has one motion vector, one of the two fields is intra-coded (not motion compensated) while the other field is inter-coded (motion compensated).
1. Motion Vector Predictors in Interlaced P-Frames
In general, motion vectors are computed by adding the motion vector differential to a motion vector predictor. In some embodiments, the motion vector predictor is computed using motion vectors from three neighboring macroblocks. For example, an encoder/decoder computes the motion vector predictor for a current macroblock by analyzing motion vector predictor candidates of the left, top, and top-right macroblocks. The motion vector predictor candidates are computed based on the current macroblock type.
In both cases, if there are no motion vectors for the candidate neighboring field or macroblock (e.g., the field or macroblock is intra-coded), the motion vector for that candidate is set to zero.
The predictor is calculated by taking the component-wise median of the three candidate motion vectors. For more information on median-of-three prediction, see U.S. patent application Ser. No. 10/622,841, entitled, “Coding of Motion Vector Information,” filed concurrently herewith. Alternatively, the predictor is calculated using some other method.
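A component-wise median of three candidates can be sketched as follows (candidate ordering and the zero substitution for intra-coded neighbors follow the description above; function names are illustrative):

```python
def median3(a, b, c):
    # median of three scalars: the sum minus the minimum and the maximum
    return a + b + c - min(a, b, c) - max(a, b, c)

def mv_predictor(left, top, top_right):
    # each candidate is an (x, y) pair; intra-coded or absent neighbors
    # are assumed to have been replaced by (0, 0) already
    return (median3(left[0], top[0], top_right[0]),
            median3(left[1], top[1], top_right[1]))
```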
2. Derivation of Chrominance Motion Vectors from Luminance Motion Vectors
In some embodiments, an encoder/decoder derives chrominance motion vectors from luminance motion vectors. For example, an encoder/decoder reconstructs a chrominance motion vector for a macroblock from the corresponding frame/field luminance motion vector. For frame-coded macroblocks, there will be one chrominance motion vector corresponding to the single luminance motion vector for the macroblock. On the other hand, for field-coded macroblocks, there will be two chrominance motion vectors corresponding to the two luminance motion vectors for the macroblock (e.g., one motion vector for the top field and one motion vector for the bottom field).
An encoder/decoder can use the same rules for deriving chrominance motion vectors for both field and frame macroblocks; the derivation is only dependent on the luminance motion vector, and not the type of macroblock. In some embodiments, chrominance motion vectors are derived according to the following pseudo-code:
frac_x4 = (lmv_x << 2) % 16;
int_x4 = (lmv_x << 2) - frac_x4;
ChromaMvRound[16] = {0, 0, 0, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5, 0.5, 0.5, 0.75, 0.75, 0.75, 1, 1};
cmv_y = lmv_y;
cmv_x = Sign(lmv_x) * (int_x4 >> 2) + ChromaMvRound[frac_x4];
cmv_x and cmv_y are chrominance motion vector components and lmv_x and lmv_y are corresponding luminance motion vector components. cmv_x is scaled by four while cmv_y is not scaled. The 4:1:1 format of the macroblock requires no scaling in the y dimension. This derivation technique is therefore well-suited for a 4:1:1 macroblock format. The scaled cmv_x is also rounded to a quarter-pixel location. Rounding leads to lower implementation costs by favoring less complicated positions for interpolation (e.g., integer and half-integer locations).
After cmv_x and cmv_y are computed, the encoder/decoder can check whether the components should be pulled back (e.g., if the components map to an out-of-frame macroblock). For more information on motion vector pull-back techniques, see U.S. patent application Ser. No. 10/622,841, entitled, “Coding of Motion Vector Information,” filed concurrently herewith.
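A runnable version of the derivation might look like the following. Several details are interpretations rather than the application's exact arithmetic: luminance components are taken to be in quarter-pixel units, the computation runs on the magnitude with the sign re-applied afterwards (matching the Sign() factor in the pseudo-code), and the shift widths are chosen so that cmv_x comes out scaled by four, as the surrounding text states:

```python
CHROMA_MV_ROUND = [0, 0, 0, 0.25, 0.25, 0.25, 0.5, 0.5,
                   0.5, 0.5, 0.5, 0.75, 0.75, 0.75, 1, 1]

def derive_chroma_mv(lmv_x, lmv_y):
    sign = -1 if lmv_x < 0 else 1
    mag = abs(lmv_x)
    frac_x4 = (mag << 2) % 16      # sub-pixel remainder in 1/16-pel units
    int_x4 = (mag << 2) - frac_x4  # integer part, still in 1/16-pel units
    # x is divided by four (4:1:1 chrominance is quarter-width) and the
    # fraction is rounded to a quarter-pixel position via the table;
    # y needs no scaling in the 4:1:1 format
    cmv_x = sign * ((int_x4 >> 4) + CHROMA_MV_ROUND[frac_x4])
    cmv_y = lmv_y
    return cmv_x, cmv_y
```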
3. Motion Compensation
A decoder uses a decoded motion vector to obtain a prediction macroblock (or field within a macroblock, etc.) in a reference frame. The horizontal and vertical motion vector components represent the displacement between the macroblock currently being decoded and the corresponding location in the reference frame. For example, positive values can represent locations that are below and to the right of the current location, while negative values can represent locations that are above and to the left of the current location.
If a current macroblock is frame-coded, one motion vector is used to obtain a prediction macroblock. In some embodiments, a decoder uses bi-cubic interpolation to obtain sub-pixel displacement. On the other hand, if the current macroblock is field-coded, the top field and bottom field have their own corresponding motion vectors. Accordingly, in some embodiments, given a field motion vector that points to a starting location in the reference frame, a decoder uses bi-cubic interpolation, taking alternating lines starting from the starting location, to compute the prediction field.
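At integer-pixel displacements, gathering the prediction field by taking alternating lines can be sketched as follows (bi-cubic sub-pixel interpolation is omitted; the function name and the row-list frame layout are illustrative):

```python
def predict_field(ref, x0, y0, width=16, height=8):
    """Collect `height` alternating lines of `width` samples from the
    reference frame `ref` (a list of rows), starting at (x0, y0)."""
    return [ref[y][x0:x0 + width] for y in range(y0, y0 + 2 * height, 2)]
```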
D. Interlaced B-frames
In some embodiments, a video encoder/decoder uses interlaced B-frames. For example, a video encoder/decoder encodes/decodes interlaced B-frames comprising macroblocks in a 4:1:1 format.
As explained above, in some embodiments, an encoder encodes macroblocks either as frame type or field type. For interlaced P-frames, an inter-coded field macroblock can have either one motion vector or two motion vectors. When an inter-coded field macroblock in a P-frame has two motion vectors, each of the two fields in the macroblock has its own motion vector and is compensated to form the residual. On the other hand, when an inter-coded field macroblock contains only one motion vector, one of the two fields is intra-coded while the other field is inter-coded.
In progressive B-frames, a macroblock can have from zero to two motion vectors, depending on the prediction mode for the macroblock. For example, in an encoder using five prediction modes (forward, backward, direct, interpolated and intra), forward and backward mode macroblocks each have one motion vector, predicting motion from a previous or a future reference frame, respectively. Direct mode macroblocks have zero motion vectors because in direct mode an encoder derives implied forward and backward pointing motion vectors; no actual motion vectors are sent for direct macroblocks. Intra mode macroblocks also have zero motion vectors. Interpolated mode macroblocks have two motion vectors (e.g., a backward motion vector and a forward motion vector).
For interlaced B-frames, an inter-coded field macroblock can have from zero to four motion vectors because each field can have from zero to two motion vectors, depending on the prediction mode of the field. For example:
The set of possible motion vector combinations for a frame type B-frame macroblock is identical to the set of possible motion vector combinations for a progressive B-frame macroblock.
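Under the five prediction modes above, the number of signaled motion vectors for a field macroblock in an interlaced B-frame can be tallied per field (a sketch; mode names follow the text, and direct and intra fields send zero actual motion vectors):

```python
MV_COUNT = {"forward": 1, "backward": 1, "interpolated": 2,
            "direct": 0, "intra": 0}

def field_mb_mv_count(top_mode, bottom_mode):
    # each field contributes its own mode's count, giving 0..4 in total
    return MV_COUNT[top_mode] + MV_COUNT[bottom_mode]
```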
Although no motion vectors are sent for macroblocks that use direct mode prediction, direct mode macroblocks in interlaced frames are still designated as either frame type (using one motion vector for motion compensation) or field type (using two motion vectors for motion compensation), followed by the appropriate motion vector scaling and motion compensation in each case. This enables direct mode macroblocks in interlaced frames to be processed differently under different motion scenarios for better compression.
Having described and illustrated the principles of our invention with reference to various embodiments, it will be recognized that the various embodiments can be modified in arrangement and detail without departing from such principles. It should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computing environment, unless indicated otherwise. Various types of general purpose or specialized computing environments may be used with or perform operations in accordance with the teachings described herein. Elements of embodiments shown in software may be implemented in hardware and vice versa.
In view of the many possible embodiments to which the principles of our invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.
Number | Name | Date | Kind |
---|---|---|---|
4454546 | Mori | Jun 1984 | A |
4661849 | Hinman | Apr 1987 | A |
4661853 | Roeder et al. | Apr 1987 | A |
4691329 | Juri et al. | Sep 1987 | A |
4695882 | Wada et al. | Sep 1987 | A |
4796087 | Guichard et al. | Jan 1989 | A |
4800432 | Barnett et al. | Jan 1989 | A |
4849812 | Borgers et al. | Jul 1989 | A |
4862267 | Gillard et al. | Aug 1989 | A |
4864393 | Harradine et al. | Sep 1989 | A |
5021879 | Vogel | Jun 1991 | A |
5068724 | Krause et al. | Nov 1991 | A |
5091782 | Krause et al. | Feb 1992 | A |
5103306 | Weiman et al. | Apr 1992 | A |
5157490 | Kawai et al. | Oct 1992 | A |
5175618 | Ueda | Dec 1992 | A |
5223949 | Honjo | Jun 1993 | A |
5258836 | Murata | Nov 1993 | A |
5298991 | Yagasaki et al. | Mar 1994 | A |
5412430 | Nagata | May 1995 | A |
RE34965 | Sugiyama | Jun 1995 | E |
5422676 | Herpel et al. | Jun 1995 | A |
5424779 | Odaka | Jun 1995 | A |
5428396 | Yagasaki | Jun 1995 | A |
5442400 | Sun | Aug 1995 | A |
5448297 | Alattar et al. | Sep 1995 | A |
5461421 | Moon | Oct 1995 | A |
RE35093 | Wang et al. | Nov 1995 | E |
5467086 | Jeong | Nov 1995 | A |
5467136 | Odaka | Nov 1995 | A |
5477272 | Zhang | Dec 1995 | A |
RE35158 | Sugiyama | Feb 1996 | E |
5510840 | Yonemitsu et al. | Apr 1996 | A |
5539466 | Igarashi et al. | Jul 1996 | A |
5544286 | Laney | Aug 1996 | A |
5552832 | Astle | Sep 1996 | A |
5565922 | Krause | Oct 1996 | A |
5594504 | Ebrahimi | Jan 1997 | A |
5623311 | Phillips et al. | Apr 1997 | A |
5659365 | Wilkinson | Aug 1997 | A |
5666461 | Igarashi et al. | Sep 1997 | A |
5668932 | Laney | Sep 1997 | A |
5673370 | Laney | Sep 1997 | A |
5692063 | Lee et al. | Nov 1997 | A |
5699476 | Van Der Meer | Dec 1997 | A |
5701164 | Kato | Dec 1997 | A |
5748789 | Lee et al. | May 1998 | A |
5764814 | Chen et al. | Jun 1998 | A |
5784175 | Lee | Jul 1998 | A |
5787203 | Lee et al. | Jul 1998 | A |
5796855 | Lee | Aug 1998 | A |
5799113 | Lee | Aug 1998 | A |
RE35910 | Nagata et al. | Sep 1998 | E |
5825929 | Chen et al. | Oct 1998 | A |
5835145 | Ouyang et al. | Nov 1998 | A |
5844613 | Chaddha | Dec 1998 | A |
5847776 | Khmelnitsky | Dec 1998 | A |
5874995 | Naimpally et al. | Feb 1999 | A |
5901248 | Fandrianto et al. | May 1999 | A |
5929940 | Jeannin | Jul 1999 | A |
5946042 | Kato | Aug 1999 | A |
5946043 | Lee et al. | Aug 1999 | A |
5946419 | Chen et al. | Aug 1999 | A |
5949489 | Nishikawa et al. | Sep 1999 | A |
5959673 | Lee | Sep 1999 | A |
5959674 | Jang et al. | Sep 1999 | A |
5963258 | Nishikawa et al. | Oct 1999 | A |
5963673 | Kodama et al. | Oct 1999 | A |
5970173 | Lee et al. | Oct 1999 | A |
5970175 | Nishikawa et al. | Oct 1999 | A |
5973743 | Han | Oct 1999 | A |
5973755 | Gabriel | Oct 1999 | A |
5974184 | Eifrig et al. | Oct 1999 | A |
5982437 | Okazaki et al. | Nov 1999 | A |
5982438 | Lin et al. | Nov 1999 | A |
5990960 | Murakami et al. | Nov 1999 | A |
5991447 | Eifrig et al. | Nov 1999 | A |
6002439 | Murakami et al. | Dec 1999 | A |
6005980 | Eifrig et al. | Dec 1999 | A |
RE36507 | Iu | Jan 2000 | E |
6011596 | Burl | Jan 2000 | A |
6026195 | Eifrig et al. | Feb 2000 | A |
6040863 | Kato | Mar 2000 | A |
6067322 | Wang | May 2000 | A |
6094225 | Han | Jul 2000 | A |
RE36822 | Sugiyama | Aug 2000 | E |
6097759 | Murakami et al. | Aug 2000 | A |
6130963 | Uz et al. | Oct 2000 | A |
6148109 | Boon et al. | Nov 2000 | A |
6154495 | Yamaguchi et al. | Nov 2000 | A |
6188725 | Sugiyama | Feb 2001 | B1 |
6188794 | Nishikawa et al. | Feb 2001 | B1 |
6201927 | Comer | Mar 2001 | B1 |
6205176 | Sugiyama | Mar 2001 | B1 |
6208761 | Passagio et al. | Mar 2001 | B1 |
6215905 | Lee et al. | Apr 2001 | B1 |
6219070 | Baker et al. | Apr 2001 | B1 |
6219464 | Greggain et al. | Apr 2001 | B1 |
6233017 | Chaddha | May 2001 | B1 |
RE37222 | Yonemitsu | Jun 2001 | E |
6243418 | Kim | Jun 2001 | B1 |
6259810 | Gill et al. | Jul 2001 | B1 |
6263024 | Matsumoto | Jul 2001 | B1 |
6271885 | Sugiyama | Aug 2001 | B2 |
6275531 | Li | Aug 2001 | B1 |
6281942 | Wang | Aug 2001 | B1 |
6282243 | Kazui et al. | Aug 2001 | B1 |
6292585 | Yamaguchi et al. | Sep 2001 | B1 |
6295376 | Nakaya | Sep 2001 | B1 |
6304928 | Mairs et al. | Oct 2001 | B1 |
6307887 | Gabriel | Oct 2001 | B1 |
6307973 | Nishikawa et al. | Oct 2001 | B2 |
6320593 | Sachs et al. | Nov 2001 | B1 |
6324216 | Igarashi | Nov 2001 | B1 |
6337881 | Chaddha | Jan 2002 | B1 |
6347116 | Haskell et al. | Feb 2002 | B1 |
6377628 | Schultz et al. | Apr 2002 | B1 |
6381279 | Taubman | Apr 2002 | B1 |
6404813 | Haskell et al. | Jun 2002 | B1 |
6418166 | Wu et al. | Jul 2002 | B1 |
6430316 | Wilkinson | Aug 2002 | B1 |
6441842 | Fandrianto et al. | Aug 2002 | B1 |
6496601 | Migdal et al. | Dec 2002 | B1 |
6529632 | Nakaya et al. | Mar 2003 | B1 |
6539056 | Sato et al. | Mar 2003 | B1 |
6563953 | Lin et al. | May 2003 | B2 |
6571019 | Kim et al. | May 2003 | B1 |
6573905 | MacInnis et al. | Jun 2003 | B1 |
6647061 | Panusopone et al. | Nov 2003 | B1 |
6650781 | Nakaya | Nov 2003 | B2 |
6728317 | Demos | Apr 2004 | B1 |
20020168066 | Li | Nov 2002 | A1 |
20020186890 | Lee et al. | Dec 2002 | A1 |
20030099292 | Wang et al. | May 2003 | A1 |
20030112864 | Karczewicz et al. | Jun 2003 | A1 |
20030113026 | Srinivasan et al. | Jun 2003 | A1 |
20030142748 | Tourapis | Jul 2003 | A1 |
20030152146 | Lin et al. | Aug 2003 | A1 |
20030156646 | Hsu et al. | Aug 2003 | A1 |
20040042549 | Huang et al. | Mar 2004 | A1 |
20040136457 | Funnell et al. | Jul 2004 | A1 |
20040141654 | Jeng | Jul 2004 | A1 |
20050013497 | Hsu et al. | Jan 2005 | A1 |
20050013498 | Srinivasan | Jan 2005 | A1 |
20050036759 | Lin et al. | Feb 2005 | A1 |
20050053156 | Lin et al. | Mar 2005 | A1 |
20050100093 | Holcomb | May 2005 | A1 |
20060013307 | Olivier et al. | Jan 2006 | A1 |
Number | Date | Country |
---|---|---|
0 279 053 | Aug 1988 | EP |
0397402 | Nov 1990 | EP |
0526163 | Feb 1993 | EP |
0535746 | Apr 1993 | EP |
0 830 029 | Mar 1998 | EP |
0863675 | Sep 1998 | EP |
0884912 | Dec 1998 | EP |
2343579 | May 2000 | GB |
61205086 | Sep 1986 | JP |
62 213 494 | Sep 1987 | JP |
3001688 | Jan 1991 | JP |
3 129 986 | Mar 1991 | JP |
6 078 298 | Mar 1994 | JP |
6078295 | Mar 1994 | JP |
7274171 | Oct 1995 | JP |
10 056 644 | Feb 1998 | JP |
6292188 | Oct 2004 | JP |
1003538510000 | Jan 2002 | KR |
WO 0033581 | Aug 2000 | WO |
WO 03026296 | Mar 2003 | WO |
Number | Date | Country | |
---|---|---|---|
20050013497 A1 | Jan 2005 | US |