This invention relates generally to image encoding and decoding techniques, and more particularly to methods and systems for generating a partial image from a larger compressed image.
The Moving Picture Experts Group (MPEG) video compression standards (ISO/IEC 11172-2 or MPEG-1 and ISO/IEC 13818-2 or MPEG-2) define encoding protocols for the compression of motion video sequences. MPEG video compression can also be used to flexibly encode individual still images or image sequences. For example, a single still image can be encoded as a video sequence consisting of a single Group-of-Pictures containing a single picture encoded as an intra-coded image (I-frame). Multiple still images can be encoded together as a video sequence consisting of a single Group-of-Pictures containing multiple pictures, at least the first of which is encoded as an intra-coded image (I-frame) and the remainder of which can be encoded as I-frames, prediction-coded images (P-frames), or bidirectionally-predicted images (B-frames).
One advantage of encoding still images in this manner is that a hardware MPEG decoder can be used to create the video image content from a given MPEG data stream. This reduces the software requirements in the decoding/display system. An example of such a system is an integrated receiver/decoder or set-top box used for the reception, decoding and display of digitally-transmitted television signals. In such a system, a common hardware video decoder can be used to decode and present for display both conventional streaming MPEG video data, and individual MPEG-encoded still images or still image sequences.
Using a hardware MPEG decoder to decode and display still image data presents one significant limitation on the image content, namely that the still image must match the size of the standard video format for the decoder. This means, for example, that an image larger than the standard video format cannot be decoded by the hardware decoder. One exception to this rule exists, in that a high-definition television (HDTV) digital MPEG decoder must be capable of decoding a full-size HDTV image while displaying only a standard-definition television (SDTV) image. This exception is controlled by special data in the HDTV sequence, which provides pan-and-scan data to determine which portion of the (larger) HDTV image is output to the (smaller) SDTV display. However, even in this case the encoded image must match the HDTV image format.
A number of inventions address the desire to encode a large image, using one of the MPEG standard compression protocols, and decode only a portion of the image for display. The concept is shown in
Civanlar et al. (U.S. Pat. No. 5,623,308) describe the use of an MPEG encoder to compress a large image which consists of a multiplicity of smaller sub-images. The large image is divided into slices according to the MPEG encoding standard, and each slice is divided into sub-slices, each of which corresponds to a macroblock row within a sub-image. The encoded data for a sub-image is extracted from the input data by searching for slice start codes corresponding to the desired sub-image, then recoding the slice start code and the macroblock address increment for the data, bit shifting the data for the remainder of the sub-slice, and padding the sub-slice to a byte boundary. Each such large image must be encoded as a P-frame, so that each sub-image is encoded independently of every other sub-image. This method has significant disadvantages:
McLaren (U.S. Pat. No. 5,867,208) presents a different method of extracting a sub-image from a larger MPEG-encoded image. Each row of macroblocks in the full image is encoded using standard I-frame encoding. If the full image is wider than the desired sub-image, each macroblock row must be broken into multiple slices, each of which contains at least two macroblocks. This limits horizontal offsets to pre-determined two-macroblock increments. By selecting the correct sequence of slices, a sub-image can be constructed from the full image. The resulting sub-image corresponds to the desired input image size for the hardware decoder, so a single header suffices for all sub-images from the full image. However, if multiple slices are encoded in a given row, the slices must be recoded to insert the proper macroblock address increment. Macroblock address increments are encoded using Huffman codes, so the modification of macroblock address increments requires bit-shifting, which as noted can be prohibitively slow on low-power processors.
Boyce et al. (U.S. Pat. No. 6,246,801) extract a sub-image from a larger MPEG-encoded image by modification of the MPEG decoding process. Undisplayed macroblocks at the beginning of each slice are decoded, but only the DC coefficients are retained for discarded macroblocks. This technique modifies the decoding process, and so does not solve the problem of providing a conforming MPEG sequence for use with a hardware decoder.
Boyer et al. (U.S. Pat. No. 6,337,882) modify the technique of U.S. Pat. No. 6,246,801 by encoding each macroblock independently, so that each macroblock can be decoded independently. This technique modifies both the encoding and decoding processes (for example, by using JPEG encoding), making the resulting data non-compliant with the MPEG encoding standards.
Zdepski et al. (US Patent application 2004/0096002) describe a technique for repositioning a sub-image within a larger image. This technique generates a P-frame image. In this technique, slices which do not contain any sub-image data are encoded as empty slices, while slices containing sub-image data require the generation of empty macroblocks, and the modification of the content of the sub-image data. As with U.S. Pat. No. 5,623,308, the resulting data do not constitute a valid MPEG video sequence, and so cannot be passed independently to a hardware video decoder.
What is desired is a method of extracting, from a full image, data constituting a sub-image that can then be assembled into a valid MPEG I-frame sequence without requiring bit-shift operations.
The current invention defines methods, systems and computer-program products for encoding an image so that a portion of the image can be extracted without requiring modification of any of the encoded data, thereby generating a valid MPEG I-frame sequence that can be fed directly to a hardware or software MPEG decoder for decompression and display.
Preferred and alternative embodiments of the present invention are described in detail below with reference to the following drawings.
The STB 36 receives input from the network 32 via an input/output controller 218, which directs signals to and from a video controller 220, an audio controller 224, and a central processing unit (CPU) 226. In one embodiment, the input/output controller 218 is a demultiplexer for routing video data blocks received from the network 32 to a video controller 220 in the nature of a video decoder, routing audio data blocks to an audio controller 224 in the nature of an audio decoder, and routing other data blocks to a CPU 226 for processing. In turn, the CPU 226 communicates through a system controller 228 with input and storage devices such as ROM 230, system memory 232, system storage 234, and input device controller 236.
The system 36 can thus receive incoming data files of various kinds from the network 32, and can process changed data files as they are received.
After the STB 36 has identified the sub-image, the sub-image is encoded into MPEG format at a block 264. At a block 268, the MPEG formatted sub-image is decoded using a standard MPEG decoder. The decoded sub-image is then displayed at a block 270.
MPEG Video Encoding
Briefly, MPEG video encoding for intra-coded images is accomplished by dividing the image into a sequence of macroblocks, each of which is 16 columns by 16 rows of pixels. By convention, an MPEG-1 macroblock consists of 16×16 luminance (Y) values, and 8×8 sub-sampled blue (Cb) and red (Cr) chrominance difference values. These data are first divided into six 8×8 blocks (four luminance, one blue chrominance, and one red chrominance). Each block is transformed by a discrete cosine transform, and the resulting coefficients are quantized using a fixed quantization multiplier and a matrix of coefficient-specific values. The resulting non-zero quantized coefficients are encoded using run/level encoding over a zig-zag scan pattern. A fixed bit sequence marks the end of the run/level encoding of a block, and the end of all encoded blocks signals the end of the macroblock.
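For illustration only (not part of the claimed invention, and using hypothetical helper names), the following Python sketch shows the zig-zag scan and run/level reduction applied to one quantized 8×8 block:

# Zig-zag scan order for an 8x8 block: diagonals of constant (row + column),
# alternating direction, as used by MPEG-1 intra coding.
ZIGZAG = sorted(
    ((r, c) for r in range(8) for c in range(8)),
    key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
)

def run_level_encode(block):
    """Reduce a quantized 8x8 block (list of 8 rows of 8 ints) to (run, level) pairs.

    The DC coefficient is coded separately as a differential, so it is skipped.
    Each run counts the zero coefficients preceding a non-zero level in zig-zag
    order; the bitstream then terminates the block with an end-of-block code.
    """
    pairs = []
    run = 0
    for r, c in ZIGZAG[1:]:          # skip the DC coefficient at (0, 0)
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs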
The MPEG standard describes a hierarchy of elements which constitute a valid MPEG video stream.
Note that the size of a box in
Conventional slice headers comprise 38 bits (a 32-bit start code, a 5-bit quantization value, and a 1-bit extra-information flag), so the start of the data of the first macroblock in each conventional slice does not lie on a byte boundary, and the start of the data of each subsequent macroblock will typically fall at an arbitrary bit position within its starting byte. This trait complicates the creation of sub-image files from larger MPEG image files, since the boundary of a macroblock can typically be found only by decoding the variable-length codes that constitute the various elements of the macroblock and block coding structures.
The present invention uses a special encoded image file format (the Extractable MPEG Image or EMI format) that incorporates MPEG compressed data, but is not directly compliant with the MPEG video standard. Desired portions of the EMI file content are combined with minimal amounts of newly-generated data to create a new file or data buffer which is fully compliant with the MPEG-1 video standard, and thus can be directly decoded by any compliant MPEG video decoder.
Generating a Valid MPEG Sequence
For any given decoding/display system, the video image format will be the same from sequence to sequence. For example, in the NTSC standard, all video images are 720 columns by 480 rows, while for the PAL standard, all video images are 720 columns by 576 rows. Similarly, the other parameters of the video display, namely pixel aspect ratio and frame rate, are set by the video standard. Therefore, many elements of the sequence shown in
Conceptually the content of a single-frame video sequence can be described by the structure shown in
The EMI file format of the current invention incorporates MPEG-encoded data for every macroblock in the original image, encoded as an intra-coded macroblock. Using techniques described in detail below, the encoded data for each macroblock is made to occupy an integer number of bytes, that is, the bit count for each macroblock is a multiple of 8 and the data for each macroblock is byte-aligned. By this means, data from multiple sequential macroblocks can be extracted from the EMI file content and concatenated to yield a valid MPEG slice, without requiring bit shifts of the macroblock data.
Encoding Techniques to Force Byte Alignment
At the slice header level, extra slice information can be included within the slice header. Each extra slice information unit occupies 9 bits (a signal bit and 8 data bits). Multiple information units can therefore be used to extend the size of a slice header to ensure that the first macroblock after the slice header commences on a byte boundary.
At the macroblock header level, macroblock stuffing allows the encoder to modify the bit rate. Each instance of macroblock stuffing occupies 11 bits. Thus, multiple macroblock stuffing codes can be used to pad the number of bits in a single macroblock to any desired bit boundary. Additionally, for an intra-coded picture (I-frame), each macroblock type can be encoded in two ways, with and without the quantization value. Since the quantization value does not typically change during the course of encoding a still image, the quantization value need not be specified for each macroblock. However, in one embodiment, this technique is used to set the size of the macroblock type field to either 1 or 7 bits.
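As a simple illustration of the alignment arithmetic these two mechanisms provide (the function names below are illustrative, not taken from the specification): a 9-bit unit advances a bit count by 1 modulo 8, and an 11-bit stuffing code advances it by 3 modulo 8, so either can bring a bit count to a byte boundary.

def extra_info_units_needed(bits):
    """Number of 9-bit extra-information units that byte-align a slice header."""
    return (8 - bits % 8) % 8               # each unit advances the residuum by 1

def stuffing_codes_needed(bits):
    """Number of 11-bit stuffing codes that byte-align a macroblock."""
    return (3 * ((8 - bits % 8) % 8)) % 8   # 3 is its own inverse modulo 8

# Example: a conventional 38-bit slice header has residuum 6, so
# extra_info_units_needed(38) == 2, and 38 + 2 * 9 = 56 bits = 7 bytes.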
At the block level, coefficient run/level codes fall into two categories, those for which explicit codes are supplied, and those which must use the escape mechanism. Explicit run/level codes occupy from 2 to 17 bits, whereas escape sequences occupy 20 or 28 bits (depending on the level value). However, explicit run/level codes need not be used for a given run/level sequence, so an encoder can optionally substitute escape encoding for explicit run/level codes, thereby modifying the number of bits required to express a particular run/level code.
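For example (an illustration only, using code lengths quoted elsewhere in this document), replacing a 7-bit explicit code with a 20-bit escape sequence adds 13 bits, shifting the macroblock's residuum by 5; choosing which codes to substitute therefore gives the encoder fine control over alignment.

def residuum_shift(explicit_bits, escape_bits=20):
    """Change (mod 8) in a macroblock's bit count when one explicit run/level
    code is replaced by an escape sequence of escape_bits (20 or 28) bits."""
    return (escape_bits - explicit_bits) % 8

print(residuum_shift(7))    # the 7-bit code for run/level (1, 2) -> shift of 5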
These four mechanisms can be used in concert to create encoded MPEG data which has the format shown in
The structure of the data produced using the method described in this invention is shown in
Byte-Aligning Slice Headers
A conventional slice header occupies 38 bits. The addition of two extra information units (9 bits each) to the slice header raises the total bit count to 56 (= 7 × 8), resulting in a byte-aligned slice header.
Byte-Aligning Macroblocks
When encoding a macroblock, variable length codes are used to express the macroblock type, the macroblock address increment, the DC coefficient differential, and the run-level codes for the quantized coefficients. When attempting to byte-align the encoded data for a macroblock, the residuum of a given code (that is, the number of bits in the last partial byte of the pattern) is of importance, since modifying the macroblock type or encoding a run-level pair using an escape sequence can change the residuum of the corresponding code, and therefore change the residuum of the encoded data for the entire macroblock. As an example, the run-level pair 1,2 can be expressed using the Huffman code 0001 100 (a residuum of 7), or using the escape sequence 0000 01 0000 01 0000 0010 (a count of 20, a residuum of 4).
In the preferred embodiment of this invention, when encoding a single macroblock, the number of encoding bits for the entire macroblock will have a residuum of 0 (that is, be an exact multiple of 8 bits), and the total number of bytes will be less than 256, so the byte count can be expressed as a single 8-bit unsigned value.
To achieve an encoded data length with a residuum of zero, the macroblock is first encoded using the given quantization value, employing the minimal code words defined by the MPEG-1 video encoding specification. A count is kept of the number of run-level code words used for each residuum from 0 to 7.
The residuum of the total number of bits in the encoded data for the macroblock is determined. Based on this residuum, and the count of code words for each residuum, the macroblock is re-encoded. During the re-encoding, three possible changes can be made to the encoding process:
Table 1 gives the rules for how these changes are applied, based on the residuum for the original encoding and the count of code words. The macroblock is re-encoded using the rules, and the total number of bytes of data is determined. If the total number of bytes exceeds 255, the encoding process starts over. In this case, the number of the last possible encoded coefficient (which starts at 64) is decremented. This process repeats until the total number of bytes for the macroblock is less than 256. Note that this requirement can always be met, since in the limit of only one coefficient being encoded (the DC coefficient), the maximum macroblock size is 1 bit for the type, 1 bit for the address increment, and 20 bits for each block, plus no more than 60 bits added by the modification process defined by Table 1.
At a block 306, coding rules are determined from Table 1 based on the residuum of the size of the encoded data. The macroblock is re-encoded using the determined coding rules, see block 308. At a decision block 312, the process 300 determines whether the encoded data is less than 256 bytes. If the decision at the decision block 312 is true, then the process 300 is complete. If the decision at the decision block 312 is false, then the coefficient limit is reduced by 1 and the process 300 returns to the block 304.
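The following Python sketch outlines the shape of this re-encoding loop. It deliberately simplifies the alignment step: where Table 1 combines escape substitution, the alternate macroblock-type code, and stuffing (adding no more than 60 bits), the sketch cancels the residuum with 11-bit stuffing codes alone, and the helper encode_macroblock (returning the minimal-code bit count for a given coefficient limit) is hypothetical.

def byte_align_macroblock(macroblock, quant, encode_macroblock):
    """Re-encode a macroblock until its data is byte-aligned and under 256 bytes."""
    coeff_limit = 64                       # last coefficient eligible for encoding
    while True:
        bits = encode_macroblock(macroblock, quant, coeff_limit)
        residuum = bits % 8
        # Simplification: pad with 11-bit stuffing codes, using the same
        # arithmetic sketched earlier (11 = 3 mod 8, and 3 * 3 = 1 mod 8).
        stuffing = (3 * ((8 - residuum) % 8)) % 8
        total_bytes, rem = divmod(bits + 11 * stuffing, 8)
        assert rem == 0                    # the macroblock is now byte-aligned
        if total_bytes < 256:              # byte count fits in one unsigned byte
            return total_bytes, stuffing, coeff_limit
        coeff_limit -= 1                   # drop the highest remaining coefficient and retry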
Creating an EMI File for an Image
In the preferred embodiment of this invention, the encoded data for an image is gathered into an EMI file with the format shown in
Generating an MPEG I-Frame Sequence from an EMI File
Given the data structure shown in
However, in constructing each slice one final factor is to be taken into account. The MPEG video encoding scheme takes advantage of spatial homogeneity when encoding the DC coefficients (Y, Cb and Cr) for each macroblock as a differential from the DC coefficient for the previous block of the same type. In other words, the DC coefficient is encoded as a difference value, rather than as an absolute value. In order for the decoding of a given macroblock to be performed properly, it is necessary that the DC coefficients be adjusted to the proper predictive values.
Recall the process of decoding the macroblocks for a single slice in an I-frame image. Before the first macroblock of the slice is decoded, the Y, Cb and Cr DC predictors are set to the nominal value of 128. The first macroblock is decoded; the data for each block includes a differential on the DC coefficient, which is added to the corresponding predictor (the DC predictor for the Y component accumulates for the four blocks of the macroblock). After the macroblock is decoded, the new values of the DC predictors are applied to the next sequential macroblock to be decoded.
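A short sketch of this predictor bookkeeping (the per-macroblock data layout is hypothetical) makes the dependency between consecutive macroblocks explicit:

def track_dc_predictors(macroblocks):
    """Follow the Y, Cb and Cr DC predictors across one slice of an I-frame.

    Each macroblock is assumed to be a dict with 'y_diffs' (four differentials,
    one per luminance block) plus 'cb_diff' and 'cr_diff'.  Returns the
    predictor values in effect just before each macroblock is decoded.
    """
    y = cb = cr = 128                  # predictors reset at the start of the slice
    before = []
    for mb in macroblocks:
        before.append((y, cb, cr))     # the values the next macroblock relies on
        for d in mb['y_diffs']:        # the Y predictor accumulates over four blocks
            y += d
        cb += mb['cb_diff']
        cr += mb['cr_diff']
    return before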
The DC predictors for each macroblock are stored in the macroblock data table in the EMI file. Using these predictors, a guard macroblock can be created as the first block of each slice. The purpose of this macroblock is to establish the proper DC coefficients for the next sequential macroblock in the slice, which is the first valid image macroblock to be displayed in the slice.
Generating a Byte-Aligned Slice Header and Guard Macroblock
The process of generating a proper byte-aligned slice header and guard macroblock is controlled by the DC predictor coefficients required for the second macroblock in the slice. The required DC coefficient offsets are computed by subtracting the DC predictors for the second macroblock from the nominal predictor value of 128. The resulting Y, Cb and Cr DC coefficients will be encoded using the conventional MPEG-1 dct_dc_size_luminance and dct_dc_size_chrominance tables, Tables 2 and 3 (ISO/IEC 11172-2).
When the guard macroblock is decoded, the Y, Cb and Cr DC predictors will have the required values for the second macroblock in the slice.
Once the required DC coefficients are calculated, the total number of bits required to encode these three coefficients is determined, then added to the number of bits required to encode the slice start code (32), slice extra information bit (1), quantizer (5), and the remainder of the macroblock (address increment 1 bit, macroblock type 1 bit, six EOB codes at 2 bits each, and three zero luminance coefficients at 3 bits each). Depending on the residuum of the total number of bits required for the slice header and guard macroblock, the encoding process is modified to produce an integral number of bytes of data when producing the final encoding of the slice header and guard macroblock. The rules for the encoding process are given in Table 4.
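The bit accounting just described can be sketched as follows. The dct_dc_size code lengths are those of the standard MPEG-1 tables cited above; the padding rules of Table 4 are not reproduced, so the function returns the pre-padding bit count from which the residuum is taken.

# Code lengths (bits) of dct_dc_size_luminance / dct_dc_size_chrominance, sizes 0..8.
DC_SIZE_BITS_LUMA = [3, 2, 2, 3, 3, 4, 5, 6, 7]
DC_SIZE_BITS_CHROMA = [2, 2, 2, 3, 4, 5, 6, 7, 8]

def dc_bits(diff, chroma):
    """Bits for one DC differential: the size code plus `size` magnitude bits."""
    size = 0 if diff == 0 else abs(diff).bit_length()   # |diff| <= 255 in MPEG-1
    table = DC_SIZE_BITS_CHROMA if chroma else DC_SIZE_BITS_LUMA
    return table[size] + size

def slice_header_and_guard_bits(y_diff, cb_diff, cr_diff):
    """Pre-padding bit count for a slice header plus its guard macroblock."""
    bits = 32 + 1 + 5                  # slice start code, extra-information bit, quantizer
    bits += 1 + 1                      # macroblock address increment, macroblock type
    bits += dc_bits(y_diff, False)     # DC differential for the first luminance block
    bits += 3 * 3                      # zero DC differentials for the other three Y blocks
    bits += dc_bits(cb_diff, True) + dc_bits(cr_diff, True)
    bits += 6 * 2                      # end-of-block code for each of the six blocks
    return bits                        # Table 4 then pads this to a whole number of bytes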
Once the slice header and guard macroblock are generated, the data for the remaining macroblocks of the slice is concatenated onto the guard macroblock data. Since all data elements are byte-aligned, no bit shifting is required.
Note that in the case where the sub-image is aligned with the left boundary of the full image (that is, the first displayed macroblock is the first macroblock of the slice), a byte-aligned slice header can be generated without a guard macroblock, since the Y, Cb and Cr DC predictors for the first macroblock will already hold their expected nominal values. In this case, the slice header comprises the 32-bit slice start code, the 5-bit quantizer, two 9-bit extra information slice entries, and the extra information slice bit (for a total of 56 bits or 7 bytes). The slice macroblocks, including the first in the slice, can then be copied without modification.
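Assembling one slice of the output sequence therefore reduces to byte concatenation. A minimal sketch, assuming a hypothetical macroblock data table that records the (offset, length) of each macroblock's byte-aligned data within the EMI file:

def extract_slice(emi_data, mb_table, row, first_col, width_mbs, header_and_guard):
    """Build one MPEG slice for a sub-image row; no bit shifting is needed.

    header_and_guard holds the byte-aligned slice header (and guard macroblock,
    when one is required); mb_table[row][col] gives (offset, length) for the
    macroblock's data within emi_data.
    """
    pieces = [header_and_guard]
    for col in range(first_col, first_col + width_mbs):
        offset, length = mb_table[row][col]
        pieces.append(emi_data[offset:offset + length])
    return b"".join(pieces)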
Once the required number of slices is generated (30 for NTSC, 36 for PAL, one per macroblock row of the 480-line or 576-line frame), an MPEG sequence end code is appended onto the data. The MPEG I-frame sequence can then be fed to any MPEG-compliant decoder for processing.
Note that the total size of the output MPEG I-frame sequence can be determined before the sequence is generated. The sizes of the sequence, GOP and picture headers are known (28 bytes), as is the size of the sequence end code (4 bytes). The number of bytes required for each macroblock can be determined from the macroblock data table. The number of bytes required for each slice header and guard macroblock can be determined by examining the required Y, Cb and Cr DC predictors for the guard macroblock. Alternatively, the worst-case size of the slice header with guard block (19 bytes) can be used, leading to a conservative estimate for the final size. The computed size can be used to pre-allocate a buffer sufficiently large to hold the contents of the generated MPEG I-frame sequence.
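A conservative version of this size computation, using the worst-case 19-byte slice header with guard macroblock and the same hypothetical macroblock data table as above, might look like this:

SEQUENCE_GOP_PICTURE_HEADERS = 28     # bytes, per the text above
SEQUENCE_END_CODE = 4                 # bytes
WORST_CASE_SLICE_HEADER = 19          # bytes: slice header plus guard macroblock

def estimate_sequence_size(mb_table, rows, first_col, width_mbs):
    """Upper bound on the size of the generated MPEG I-frame sequence."""
    total = SEQUENCE_GOP_PICTURE_HEADERS + SEQUENCE_END_CODE
    for row in rows:
        total += WORST_CASE_SLICE_HEADER
        total += sum(mb_table[row][col][1]            # per-macroblock byte counts
                     for col in range(first_col, first_col + width_mbs))
    return total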
In some cases, a desirable feature of sub-image extraction and display is the ability to extract a sub-image that overlaps the boundary of the full image.
In
To accomplish this, an Extractable MPEG Image Extended or EMIX format is created that contains additional data for black slices and empty macroblocks. This format is shown in
The data for the black slice consists of the following elements:
for a total of 171 bytes, while the data for an empty macroblock consists of the following elements:
for a total of 10 bytes.
The black slice and empty macroblock data can be used to generate padding in any case where the position of the sub-image is such that encoded data from the full image is not available to fill a given slice or macroblock. If an entire slice must be filled, the black slice is simply copied from the EMIX file into the generated MPEG I-frame sequence. If one or more padding macroblocks are required (to the left or right of existing full-image data), the empty macroblock is copied from the EMIX file the required number of times to fill the space. The empty macroblock(s) are inserted after the guard macroblock for left padding, or after the sub-image macroblock data for right padding.
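A sketch of this padding logic, under the same hedged conventions as the earlier examples (black_slice and empty_mb are the pre-encoded elements read from the EMIX file):

def build_padded_slice(slice_in_image, guard, image_mbs, black_slice, empty_mb,
                       left_pad, right_pad):
    """Assemble one slice when the sub-image overlaps the full-image boundary.

    left_pad and right_pad count the macroblocks of the sub-image row that fall
    outside the full image; image_mbs holds the byte-aligned macroblock data
    copied from the EMIX file for the part that falls inside it.
    """
    if not slice_in_image:                 # the whole slice lies outside the full image
        return black_slice
    return (guard
            + empty_mb * left_pad          # padding to the left of the image data
            + b"".join(image_mbs)          # macroblocks copied without modification
            + empty_mb * right_pad)        # padding to the right of the image data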
While the preferred embodiment of the invention has been illustrated and described, as noted above, many changes can be made without departing from the spirit and scope of the invention. Accordingly, the scope of the invention is not limited by the disclosure of the preferred embodiment. Instead, the invention should be determined entirely by reference to the claims that follow.