The present invention relates to digital video processing, and more specifically to artifact and noise reduction in MPEG video.
MPEG is a name given to a set of international standards used for compressing and encoding digital audiovisual information. MPEG stands for Motion Picture Experts Group, the group who originally formulated the standards. Several standards have emerged and been promulgated by the International Standards Organization (ISO), including MPEG-1, MPEG-2, and MPEG-4, more formally known as ISO/IEC-11172, ISO/IEC-13818, and ISO/IEC-14496 respectively. For the purposes of this disclosure, “MPEG” means any image coding scheme meeting any of these standards or operating in a similar way. In general, MPEG algorithms perform block transforms (usually a discrete cosine transform or “DCT”) on blocks selected from frames of digital video, quantize each resulting coefficient set, and efficiently encode the coefficients for storage. An MPEG video sequence can be replayed by reversing the steps used for compression and rendering the resulting decompressed video.
Because MPEG performs “lossy” compression, the sequence recovered after compression and decompression differs from the original uncompressed sequence. These differences are sometimes called distortion. Generally, the amount of distortion introduced increases with increasing compression ratio, and artifacts of the distortion are often visible in the decompressed video sequence. For example, the edges of the blocks selected for the block transforms may be visible, and the decompressed sequence may appear “noisy”, often because visual edges within a frame have “ringing” or halo artifacts. More information about MPEG can be found in MPEG Video Compression Standard, edited by Joan L. Mitchell, William B. Pennebaker, Chad E. Fogg, and Didier J. LeGall, and published by Chapman & Hall, ISBN 0-412-08771-5.
Similar distortion issues arise in compressing and decompressing still images using the JPEG standard, named for the Joint Photographic Experts Group, the committee that developed the specifications for standard use of the technique and for the standard file format of JPEG image files. Various techniques have been devised for improving the quality of images reconstructed from JPEG files. For example, Nosratinia describes an algorithm in which a decompressed JPEG image is further processed by repeatedly shifting it spatially with respect to the block grid used for performing the block transforms, performing JPEG compression and decompression on each of the shifted images, shifting each back to its nominal position, and then averaging the resulting images. (See A. Nosratinia. “Enhancement of J PEG-compressed images by re-application of JPEG,” Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol. 27, pp. 69-79, February 2001.)
However, these techniques devised for still images generally perform poorly on some MPEG video frames, especially those predicted or interpolated from other frames.
Three different kinds of encoded frames may be used in constructing an MPEG video sequence. An “I-frame” is said to be intracoded. That is, the compressed frame is derived entirely from a single uncompressed frame of digital video, without regard to any other frames.
A “P-frame” is said to be predictively coded. In a P-frame, particular macroblocks of data are encoded differentially based on the most recent previous I- or P-frame. Some motion estimation is also encoded into a P-frame. To encode a particular macroblock in a P-frame, a region of the most recent previous I- or P-frame is searched to locate a macroblock that is similar to the current macroblock to be compressed and encoded. An array of pixel differences between that previous macroblock and the current block is computed, and that difference array is then quantized and encoded for storage. Motion vectors pointing to the location of the previous macroblock are also stored, so the current macroblock can be reconstructed.
A “B-frame” is said to be bi-directionally coded. That is, a B-frame is defined in reference to another I- or P-frame, but the reference frame may come temporally before or after the current frame being coded as a B-frame. Alternatively, a B-frame may be defined in reference to both a past I- or P-frame and a future I- or P-frame.
Various parameters may be specified for controlling the frame encoding. The size of the area to search in another frame for locating a similar macroblock for differential coding may be specified as well as the resolution with which to search. For example, a search may cover an area including all macroblocks within a specified distance from the location of the current macroblock in the current frame, in full-pixel increments, half-pixel increments, or quarter-pixel increments. The specified distance may be, for example ±16 pixels in each orthogonal direction, or some other distance. These specified parameters may be called a motion vector search range and a motion vector resolution. Additionally, the sequence of frame types, a bitrate, and rate control parameter settings may be specified. Optimal settings will depend on the particular application, and the desired tradeoff between compression ratio, speed, and image quality.
An MPEG video sequence is an interleaved set of frames, almost any of which may be I-frames, P-frames, or B-frames. For example, a video sequence could be stored using only I-frames. However, improved compression is possible if some P-frames are used, and still better compression is possible if B-frames are used as well. No particular ordering of I-, P-, and B-frames is specified. One commonly-used arrangement is to group fifteen frames together in the sequence IBBPBBPBBPBBPBB, and then repeat the sequence throughout the MPEG file. Each of these groups including an I-frame and the subsequent B- and P-frames occurring before the next I-frame is called a “group of pictures”, or GOP. A GOP that can be decompressed without referring to any frame outside the GOP is called a “closed GOP”. A GOP that includes I-frames as the first and last frames in the GOP is an example of a closed GOP. A GOP that begins or ends with a B-frame is an example of an “open GOP”, because frames outside the GOP are referred to in decompressing the GOP. Preferably, but not necessarily, a method in accordance with an example embodiment of the invention operates on a closed GOP.
In step 103, the initial decompressed GOP is further processed as follows. For at least two shift positions, the initial decompressed GOP is spatially shifted in relation to the grid used to define macroblocks. MPEG compression and decompression are applied to the GOP in each shift position. This results in a resulting decompressed GOP for each shift position.
In step 104, each resulting decompressed GOP is spatially shifted back to the initial position. In step 105, the resulting decompressed GOPs are combined into an improved GOP. In optional step 106, the improved GOP is displayed. (At least some optional steps are indicated in
Several of the steps in the method of
Original frame 200 may be captured by a digital camera or other imaging device in RGB format, wherein each pixel is described by three numerical values, one each representing the red, green, and blue components of the image at that pixel. An early step in MPEG compression converts the digital data to YCrCb format, which includes a luminance channel Y and two chrominance channels Cr and Cb. The two chrominance channels are downsampled so that the macroblock is represented by four 8×8 pixel luminance blocks and two 8×8 pixel chrominance blocks. In
In
While
Unfilled macroblocks that overlap edges of the frame, for example macroblocks 206 and 301 in
Also in step 103 of method 100, compression and decompression are applied to the GOP in each of the shifted positions. The result is a resulting decompressed GOP for each shift position. The compression and decompression may be full MPEG processing, or may be a subset chosen for computational efficiency.
Full MPEG compression and decompression of a GOP comprises several steps. In one example embodiment, the steps may be summarized by the table below.
Full MPEG decompression comprises complementary steps, performed in approximately reverse order:
In one example embodiment of the present invention, each shifted GOP is subjected to full MPEG compression and decompression. This may be a preferred implementation when an MPEG engine is available but not readily modifiable. For example, an MPEG engine may be implemented in hardware or in a software library routine.
If a custom implementation is possible, then preferably some portions of MPEG processing are omitted for computational efficiency. For example, the color space conversion and downsampling of the chrominance channels for compression need only be performed once for each frame in the GOP, as the result will be the same for each shifted position. All of the compression steps after quantization and all decompression steps before array multiplication may be omitted entirely. These steps are computationally expensive and are lossless, having no effect on the end result after decompression. For the purposes of this disclosure, “full” MPEG compression and decompression include all of the steps listed above. When some of those redundant or lossless steps are omitted, the resulting process is still MPEG compression and decompression for the purposes of this disclosure, but not “full” MPEG compression and decompression. In either case, MPEG compression comprises choosing a sequence of I-, P-, and B-frames (the sequence need not include all three frame types) and computing residual blocks and motion vectors for any P- and B-frames.
Preferably, but not necessarily, during the MPEG compression and decompression in accordance with an example embodiment of the invention the parameters controlling the MPEG processing are the same as were used to create the original MPEG video sequence. For example, the sequence of frame types, motion vector search range, motion vector resolution, bit rate, and rate control settings may be set to match the original MPEG video sequence. Alternatively, one or more settings may be altered. For example, the MPEG compression and decompression of step 103 of method 104 may be performed with a larger motion vector search range and a finer motion vector resolution than was the MPEG compression used to create the original MPEG video sequence.
In step 104 of method 100, each resulting decompressed GOP is shifted back to its nominal position. As with the original shifts, these shifts may be entirely conceptual, being accomplished by adjustments to indexing values used to read the arrays of data making up the frames.
In step 105, the resulting decompressed GOPs are combined to form a single improved GOP. In a preferred embodiment, the improved GOP is formed by averaging the resulting decompressed GOPs frame by frame and pixel by pixel. Other methods of combination may be used as well. For example, a weighted average may be used, wherein GOPs with smaller shift amounts are weighted differently in the weighted average than are GOPs with larger shift amounts.
In the example of
Once the improved GOP has been obtained, it may be used for any purpose for which any decompressed rendition of the original MPEG GOP may be used. For example, it may be used as part of a display of the original MPEG video sequence. Preferably in this application, all GOPs in the video sequence would be processed in accordance with an embodiment of the invention. In such an application, the quality of the video display will be improved over a display formed by simply decompressing the original MPEG video sequence.
In another useful application, a user of a camera, computer, or other imaging device may be able to select a particular frame from the GOP to be used as a still photograph. This application is particularly appropriate for users of digital cameras. Many modern digital cameras can take still photographs having five or more megapixels per photograph. Such digital photographs can be used for making enlarged prints up to 16 by 20 inches or more with excellent quality.
Many modern digital cameras also enable a camera user to use the same camera to capture video clips or sequences. Due to the processing and storage requirements of digital video, many cameras can record video only at resolutions considerably lower than the resolution at which they can take still photographs. For example, a five megapixel digital camera may limit its video frames to the “VGA” size of 640×480 pixels, or about one third of a megapixel per frame. Such cameras often also enable the user to extract a particular frame of digital video for use as a still photograph. While each frame of digital video is a digital photograph, it is a much lower resolution photograph than the camera is otherwise capable of, and the user may be disappointed that the photograph does not appear sharp when it is enlarged for printing. In this application, improvement of the quality of digital video is especially valuable.
Often, a frame that is extracted from a video sequence for use as a still photograph is upsampled so that the resulting still photograph has a number of pixels comparable to the number in a still photograph taken directly by the camera. The upsampling is usually accomplished by interpolating between the existing pixels. Any of many different well-known interpolation methods may be used. This process upsampling is sometimes referred to as increasing the resolution of the photograph, even though no additional spatial details are actually revealed in the photograph.
A method in accordance with an example embodiment of the invention may include upsampling of a frame extracted from a video sequence for use as a still photograph. Preferably, the upsampling is performed before the MPEG compression and decompression of step 103 of
A method in accordance with an example embodiment of the invention may be performed in a digital camera, computer, video phone, or other electronic imaging device capable of processing MPEG video.
A method according to an example embodiment of the invention may also be performed by a computer, the computer executing instructions stored on a computer-readable storage medium. The computer-readable storage medium may be a floppy disk, a compact disk read only memory (CD-ROM), a digital versatile disk (DVD), read only memory (ROM), random access memory (RAM), flash memory, or another kind of computer-readable memory.