This invention relates generally to bandwidth reduction in the transmittal of images using digital communications techniques.
The MPEG video compression standards (ISO/IEC 11172-2 and 13818-2) provide powerful and flexible mechanisms for conveying full-color image data over a digital transmission mechanism. The standards use discrete cosine transform techniques, along with coefficient quantization and Huffman binary encoding, to define compact and highly efficient representations of video images. A number of systems utilize MPEG video compression to convey video and still image content, including digital satellite and cable television (TV) and high-definition TV (HDTV).
Of particular interest in the current invention is the use of MPEG video compression for transmittal to integrated receiver/decoders (IRDs, sometimes called set-top boxes or STBs). Such systems typically have limited hardware and software capability, and so make use of particular aspects of the MPEG video standard to provide functionality. STBs often are used to receive and execute interactive television (iTV) applications, small pieces of software that create specialized audio, video and graphics presentations for the viewer, typically in response to a remote control device supplied with the STB.
Normally, an STB receives an MPEG transport stream, which contains various meta-data, one or more elementary video streams, and one or more elementary audio streams. The STB hardware selectively processes one of the elementary video streams, in real time, decoding and displaying the sequence of video images encoded in the video stream. The STB may also possess the capability to create and display a graphical pattern overlaying the video image, including text, icons and graphical images. Due to the low cost and limited capability of the STB hardware, graphical content is often limited to only a few colors, typically 16 or 256. This limits the quality and variety of static content that can be displayed to the viewer by the STB when executing an iTV application.
To overcome the graphical limitations of the STB, the MPEG video decoding hardware can be used to decode and display static full-color still images, possibly in place of the display of the streaming video content. This can be accomplished by transmitting to the STB a data file containing the MPEG-encoded still image, then passing the MPEG data through a standard MPEG decoder in place of the elementary video stream. The decoder parses the MPEG image data and creates the corresponding full-size, full-color still image, which is then displayed on the screen of the attached television receiver.
One useful feature of the MPEG video compression scheme is the ability to encode a predictive, or P-frame, image. In a P-frame image, any desired portion of the image can be predicted from the previous reference image processed by the decoder. This feature can be particularly useful when only a portion of the previous image is to be updated. The portions of the image not altered can be left un-encoded, and only the new portions of the image need be specified.
A sample product display is shown in
The use of P-frame MPEG files to save bandwidth is well known in the art. For example, OpenTV, the creators of the OpenTV middleware for STBs, distributes a software tool called OpenFrame that enables the composition of a P-frame image, and encoding of the resulting data into a compliant MPEG data file.
While P-frame images represent a significant savings in bandwidth over I-frame images, there are several inherent limitations to using P-frame images. First, the entire content of the video frame must be encoded in the P-frame image, including those portions of the background which are not altered. This is done by including slices with empty content in the MPEG data file. Each slice defines a row of macroblocks across the screen, each macroblock encoding a block of 16 columns by 16 lines of video.
In
To show the encoding technique more clearly,
When P-frame encoding is used to update only a portion of the video image, the data file is typically created on a server, then broadcast to the STB. As described above, the OpenTV OpenFrame application is one tool for creating such an MPEG data file.
A second and much more significant limitation is the fact that any one P-frame data file represents a unique position for the sub-image content. This means that each different placement of the sub-frame content on the video screen requires a unique MPEG file. In one application, a series of sub-images is to be shown in a line across the screen. By using the navigational keys of the remote control, the viewer can scroll the list left or right to examine a set of images larger than can fit on the screen at one time. With each navigational move, an image moves one position to the left or right. This repositioning of the image requires a different MPEG image. Using the conventional technique of encoding a P-frame image, this application would require that each small image be provided in four different forms, one for each possible position of the small image on the video screen.
Boucher et al., in U.S. Pat. No. 6,675,387, describe a system that attempts to overcome some of the limitations of the MPEG file format. In particular, '387 addresses the issue that a single macroblock, the smallest increment of picture area that can be represented in an MPEG file, comprises a number of bits that is not predictable; furthermore, the bit count for a macroblock need not be a multiple of 8 bits. Therefore, within a given MPEG P-frame file, denotation of the bits encoding a single macroblock requires both start and end byte and bit locations. To accommodate this limitation, Boucher et al. define a “fat macroblock” encoding technique, whereby headers in the image file contain pointers to the beginning of each macroblock strip (slice), as well as pointers to the beginning of each macroblock within the strip. '387 describes the usage of fat macroblock data within an image server, wherein the specially encoded fat macroblock image is used to create a conforming MPEG P- or B-frame image file, which is then transmitted to the client for decoding and display. This approach requires communication from the client to the server for each image to be displayed.
Therefore, there exists a need for systems and methods that permit the efficient encoding of an image that is smaller than a full-sized image, coupled with efficient repositioning of the sub-image within the full-sized image.
The current invention describes a technique for encoding a sub-image, using conventional MPEG encoding techniques, to generate a special image file that is smaller than the equivalent full-frame image file. This special image file can be used to regenerate any of a multiplicity of full-sized encoded image files, each of which results in the display of the sub-frame image at one of a multiplicity of positions within the full-sized image frame.
Preferred and alternative embodiments of the present invention are described in detail below with reference to the following drawings.
The current invention defines a special format for MPEG video data, which results in an intermediate sub-frame image data file (the ‘Q-frame’ format) which is not compliant with the MPEG standard, but which can be used efficiently to create a compliant MPEG P-frame data file, with the sub-frame positioned at any desired (macroblock) position within the full video frame.
The STB 36 receives input from the network 32 via an input/output controller 218, which directs signals to and from a video controller 220, an audio controller 224, and a central processing unit (CPU) 226. In one embodiment, the input/output controller 218 is a demultiplexer for routing video data blocks received from the network 32 to a video controller 220 in the nature of a video decoder, routing audio data blocks to an audio controller 224 in the nature of an audio decoder, and routing other data blocks to a CPU 226 for processing. In turn, the CPU 226 communicates through a system controller 228 with input and storage devices such as ROM 230, system memory 232, system storage 234, and input device controller 236.
The system 36 thus can receive incoming data files of various kinds. The system 36 can react to the files by receiving and processing changed data files received from the network 32.
When a P-frame data file is passed to the MPEG hardware decoder, the decoder will have a reference image created by previously decoding some desired MPEG video data, either streaming video or an MPEG data file. When the P-frame data file is passed through the decoder, the sub-frame image will replace the corresponding portions of the reference image, and be displayed on the television.
Q-Frame Data Format
The Q-frame data file contains correctly-formatted MPEG bit sequences for all of the macroblocks in the sub-frame image. To conserve space in the Q-frame data file, and thus in the broadcast stream, all unnecessary data are eliminated from the file.
The only special feature of the slice encoding in a Q-frame file is that the first macroblock address increment is 1 (encoded as a single 1 bit), and the first macroblock is encoded as type Intra with Quant (with the type field encoded as the six-bit sequence 000001), followed by the five-bit quantizer value. Following this are the encoded DC and AC coefficients of the six blocks in the macroblock. The remaining macroblocks in the slice are encoded with macroblock address increments of 1 and macroblock encoding type Intra or Intra with Quant (for details, see ISO/IEC 11172-2).
Note several aspects of the Q-frame MPEG header. First, the Q-frame MPEG header is fixed for any video frame size, and the only useful data in the header is the video width and height. As an alternative to including the header data in the Q-frame, the data could be generated on the STB 36 and prepended to the generated P-frame. Also, the Q-frame data file could be encoded as an MPEG-2 file, in which case the Q-frame header includes a sequence header, a sequence extension, a sequence display extension, a picture header, a picture coding extension and a picture display extension. Again, all of these data are constant for any given display system, and could be generated in the STB 36 rather than transferred as part of the Q-frame file.
Padding the Slice Data to Position the Sub-Frame Data
The macroblock data for any one slice includes a sequence of bit fields. The length and value of a field code depend on the type of field, and the decoding process proceeds by successively interpreting each field to determine the type of the following field. Because Huffman coding is used for most fields, the length of a field can only be determined from its content, and is not fixed a priori. The only exceptions to this rule are the quantizer field, which is always 5 bits, and the extra slice bit, which is always 1 bit.
The Q-frame format 320 specifies the sizes of a few additional fields, namely the first macroblock address increment field (1 bit), the first macroblock encoding type field (6 bits), and the presence and size (5 bits) of the quantizer field following the type field. Together, these stipulations produce the following bit pattern at the beginning of each slice:
qqqqq010 00001qqq qqxxxxxx xxxxxxxx (1)
where qqqqq is the quantizer, and xxx . . . represents the additional bits encoding the remainder of the macroblock, and the other macroblocks in the slice after the first.
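The fixed layout of pattern (1) can be illustrated with a short sketch that splits the first 18 bits of a slice into their fields. The function name, the dictionary keys, and the assumption that the input begins directly at the slice quantizer (after any slice-length field) are illustrative choices, not part of the Q-frame definition.

```python
def parse_qframe_slice_start(slice_data: bytes) -> dict:
    """Split the first 18 bits of a Q-frame slice into the fields of pattern (1)."""
    if len(slice_data) < 3:
        raise ValueError("need at least three bytes of slice data")
    bits = "".join(f"{byte:08b}" for byte in slice_data[:3])
    fields = {
        "slice_quantizer": int(bits[0:5], 2),         # qqqqq
        "extra_slice_bit": bits[5],                   # always 0
        "address_increment": bits[6],                 # increment of 1, a single 1 bit
        "macroblock_type": bits[7:13],                # 000001 = Intra with Quant
        "macroblock_quantizer": int(bits[13:18], 2),  # qqq + qq across two bytes
        "payload_bit_offset": 18,                     # first bit of the xxx... data
    }
    fixed = (fields["extra_slice_bit"], fields["address_increment"],
             fields["macroblock_type"])
    if fixed != ("0", "1", "000001"):
        raise ValueError("slice does not start with the Q-frame pattern")
    return fields


# Quantizer 8 (01000) in both positions, followed by arbitrary payload bits.
example = parse_qframe_slice_start(bytes([0b01000010, 0b00001010, 0b00101010]))
print(example["slice_quantizer"], example["macroblock_quantizer"])
```

The payload offset of 18 bits (two bits into the third byte) is the quantity that the padding techniques described below are designed to preserve modulo 8.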
In order to reposition the sub-frame macroblocks within the (larger) slice of the full-frame video image, additional data may be inserted into the beginning of the slice data, and further additional data may be appended to the end of the slice. These data encode additional macroblock padding in the image in order to satisfy the MPEG rules for slice encoding, namely that the first and last macroblocks of each slice be encoded (even if the encoding merely copies the corresponding macroblock from the previous reference picture).
One way to insert initial padding is to start the macroblock sequence with an empty macroblock (encoded with the bit sequence 1 001 1 1, corresponding to a macroblock address increment of 1, a type of motion-compensated with no data, and two zero motion vectors). The six-bit sequence specifies that the corresponding macroblock from the reference image is copied into the macroblock location, which is equivalent to skipping the macroblock. Then if necessary, the macroblock address increment field can be changed from a value of 1 to any desired value, by substituting the appropriate bit sequence into the output data file. The appropriate bit sequences for various macroblock address increments are given in ISO/IEC 11172-2.
For example, moving the left-most macroblock of a sub-frame from the left edge of the video image to a position three macroblocks (48 columns) from the left edge, would require the following sequence of bits at the beginning of the slice data:
qqqqq010 01110100 00001qqq qqxxxxxx xxxxxxxx (2)
with the inserted bits encoding a macroblock type of motion-compensated, no data (001), two zero motion vectors (two single 1 bits), and a macroblock address increment of 3 (010). The original single-bit address increment of 1 now introduces the empty macroblock, so the net insertion is exactly 8 bits. The encoded data for the sub-frame follows this insertion. This pattern can be compared with the original Q-frame data for the slice
qqqqq010 00001qqq qqxxxxxx xxxxxxxx (3)
to observe that in the new P-frame data, the first byte is identical to the original and a single new byte of data has been inserted. All remaining bytes of the original data are copied without modification.
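The construction of pattern (2) can be sketched as string concatenation of the individual bit fields. The names below are hypothetical, only the address-increment codes for 1 through 4 are included, and the quantizer value 8 is an arbitrary choice for the example.

```python
# Address-increment codes for 1..4 only (values as given in ISO/IEC 11172-2).
MBA_INCREMENT = {1: "1", 2: "011", 3: "010", 4: "0011"}
EMPTY_MB = "001" + "1" + "1"      # motion-compensated, no data; two zero vectors
INTRA_WITH_QUANT = "000001"


def naive_slice_prefix(margin_mbs: int, quantizer: int) -> str:
    """Bits from the slice quantizer through the first sub-frame quantizer."""
    q = f"{quantizer:05b}"
    bits = q + "0"                            # slice quantizer, extra slice bit
    bits += MBA_INCREMENT[1]                  # first macroblock of the slice
    if margin_mbs > 0:
        bits += EMPTY_MB                      # empty macroblock at the left edge
        bits += MBA_INCREMENT[margin_mbs]     # jump to the sub-frame position
    return bits + INTRA_WITH_QUANT + q


def as_bytes(bits: str) -> list:
    """Group a bit string into 8-character chunks for display."""
    return [bits[i:i + 8] for i in range(0, len(bits), 8)]


original = naive_slice_prefix(0, 8)
moved = naive_slice_prefix(3, 8)
print(as_bytes(original))
print(as_bytes(moved))
print(len(moved) - len(original), "bits inserted")
```

With a margin of three macroblocks the second printed list contains the byte 01110100 between the unchanged first byte and the original second byte.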
Note a particular feature of this insertion, namely that the length of the insertion is exactly 8 bits. This means that the remainder of the encoded macroblock data for the slice is already properly byte-aligned. Suppose for comparison that this technique were used to position the sub-frame data 64 columns from the left edge. This would result in the bit pattern
qqqqq010 01110011 000001qq qqqxxxxx xxxxxxxx (4)
In this case, the macroblock data for the sub-frame would have to be shifted by one bit position right from its original byte location. Creating the new P-frame data in this case would require bit shift operations for every byte of the macroblock data in each slice, as opposed to the simpler byte copy operation that would suffice for the case described above.
Table 1 shows the size of the insertion required to specify various numbers of empty macroblocks when repositioning a sub-frame within a conventional (720-column) video image, using the procedure described above. Only the bold cases would result in byte alignment of the following encoded macroblock data.
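The relationship between margin width and insertion size can be sketched by computing the length of the address-increment code for each margin. The code lengths below are transcribed from the macroblock address increment table of ISO/IEC 11172-2 and should be checked against the standard; the function names are illustrative, and the output is a derived calculation rather than a reproduction of Table 1.

```python
def mba_increment_bits(increment: int) -> int:
    """Length in bits of the address-increment code, including escape codes."""
    if increment < 1:
        raise ValueError("increment must be at least 1")
    bits = 0
    while increment > 33:          # each 11-bit escape code adds 33
        bits += 11
        increment -= 33
    for upper, length in ((1, 1), (3, 3), (5, 4), (7, 5), (9, 7),
                          (15, 8), (21, 10), (33, 11)):
        if increment <= upper:
            return bits + length
    raise AssertionError("unreachable")


def insertion_bits(margin_mbs: int) -> int:
    """Net bits added by one empty macroblock plus the new increment field."""
    return 5 + mba_increment_bits(margin_mbs)


# 720 columns = 45 macroblocks, so the margin ranges from 1 to 44.
for margin in range(1, 45):
    size = insertion_bits(margin)
    flag = "byte-aligned" if size % 8 == 0 else ""
    print(f"{margin:2d} macroblocks: {size:2d} bits {flag}")
```

Under these assumptions only a handful of margins yield an insertion that is a whole number of bytes, which motivates the alternative padding techniques described below.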
The MPEG-1 video encoding standard allows a technique for padding macroblock data, namely, the insertion of one or more copies of the macroblock stuffing pattern at each occurrence of the macroblock address increment field. This 11-bit pattern can be repeated any number of times, and is ignored by a compliant decoder. Thus the second case described above could have been encoded with the following bit pattern
qqqqq010 01110000 00011110 00000011 11000000 01111000 00001111 00000001 11100110 00001qqq qqxxxxxx xxxxxxxx (5)
This approach is undesirable. While inserting macroblock stuffing results in proper byte alignment for the following macroblock data, a total of seven extra bytes of data is required for each slice in this example. An additional complication with this technique is that macroblock stuffing is not supported in the MPEG-2 video standard, so encoding a P-frame as an MPEG-2 video data file could not use this approach. The current invention circumvents this limitation.
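The cost of the stuffing approach can be checked with a short sketch that finds the number of 11-bit stuffing codes needed to restore byte alignment and then rebuilds pattern (5). The names are hypothetical and the quantizer value 8 is arbitrary.

```python
STUFFING = "00000001111"       # 11-bit macroblock stuffing code
EMPTY_MB = "00111"             # type 001 plus two zero motion-vector codes
INC_1, INC_4 = "1", "0011"     # address increments of 1 and 4
INTRA_WITH_QUANT = "000001"


def stuffing_copies(insert_bits: int) -> int:
    """Smallest count of stuffing codes that rounds the insertion to whole bytes."""
    # 11 is odd, so some count in 0..7 always satisfies the congruence.
    return next(k for k in range(8) if (insert_bits + 11 * k) % 8 == 0)


def stuffed_prefix(quantizer: int) -> list:
    """Byte groups for a 64-column margin padded with macroblock stuffing."""
    q = f"{quantizer:05b}"
    copies = stuffing_copies(len(EMPTY_MB) + len(INC_4))   # 9 bits unstuffed
    bits = (q + "0" + INC_1 + EMPTY_MB + STUFFING * copies
            + INC_4 + INTRA_WITH_QUANT + q)
    return [bits[i:i + 8] for i in range(0, len(bits), 8)]


print(stuffing_copies(9), "stuffing codes")
print(stuffed_prefix(8))
```

Five stuffing codes bring the insertion to 64 bits, that is, eight inserted bytes where the 48-column case needed one.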
Optimized Left-Margin Padding for P-Frames
Thus, an important aspect of the current invention is a technique for inserting left-margin padding when generating the P-frame data from a Q-frame data file, such that the encoded macroblock data for each slice can be copied in a byte-aligned manner, that is, without bit shifting, from the Q-frame data source into the corresponding P-frame data buffer. This section describes how this is accomplished.
Two features of MPEG macroblock video encoding are used to accomplish this. The first feature is the interchange of the “Intra with Quant” and “Intra” macroblock encoding types for a P-frame. When a macroblock is encoded as “Intra with Quant”, the type field is 6 bits, and the quantizer field is 5 bits. When the same macroblock is encoded as “Intra”, the type field is 5 bits, and no quantizer field appears.
The second feature is the ability to insert “zero-motion-vector” macroblocks at any padding position. Encoding a macroblock as type “motion-compensated, no data” and motion vectors of (0,0), which is done in the above examples with the first macroblock in each slice, simply copies the macroblock from the reference image. Encoding such a macroblock requires 5 bits, plus the macroblock address increment field.
These two features can be combined in appropriate ways to permit the insertion of any (left-hand) margin by a combination of bits that permits the following encoded macroblock data to be copied from the Q-frame data buffer into the P-frame data buffer without any bit shifting. How this is accomplished can be illustrated with a few examples.
First, suppose the sub-image data is to be positioned 16 columns from the left edge of the frame. This requires one padding macroblock to be inserted into the P-frame data stream at the beginning of each slice containing sub-frame image data. This can be accomplished by inserting a zero-motion-vector macroblock as the first macroblock in the slice (encoded as 1 001 1 1), then changing the encoding type of the first macroblock in the encoded sub-image data from type “Intra with Quant” (000001) to “Intra” (00011). The resulting data stream is
qqqqq010 01111000 11xxxxxx xxxxxxxx (6)
which can be compared with the original few bytes [from (1) above] of the Q-frame slice data
qqqqq010 00001qqq qqxxxxxx xxxxxxxx (7)
to observe that in the new P-frame data, the first byte is identical to the original, the second byte has been changed, and the third byte has had the leading two bits set to 11. All remaining bytes of the original data are copied without modification.
As another example, consider the case where the sub-image data is placed 64 columns from the left edge of the video frame. Rather than using the previous encoding depicted in (4) above, the initial portion of the slice is encoded using two consecutive zero-motion-vector macroblocks, a macroblock address increment of 3, and an encoding type of “Intra”. The resulting bit pattern is
qqqqq010 01111001 11010000 11xxxxxx xxxxxxxx (8)
which changes the second byte of the original data, inserts a single new byte, and changes the leading bits of the third byte of the original data, leaving the remainder of the original data unchanged.
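The two examples above can be sketched as a prefix builder followed by a pure byte copy. The function names and the sample slice bytes are hypothetical; the sketch assumes the slice quantizer equals the quantizer of the first sub-frame macroblock, since the “Intra” type drops the macroblock quantizer field.

```python
MBA_INCREMENT = {1: "1", 2: "011", 3: "010"}   # subset of the increment codes
EMPTY_MB = "00111"                             # zero-motion-vector macroblock
INTRA = "00011"                                # 5-bit "Intra" type code


def intra_prefix(zmb_increments, final_increment, quantizer):
    """Slice header, padding macroblocks, and the re-typed first macroblock."""
    bits = f"{quantizer:05b}" + "0"
    for inc in zmb_increments:
        bits += MBA_INCREMENT[inc] + EMPTY_MB
    return bits + MBA_INCREMENT[final_increment] + INTRA


def build_pframe_slice(qframe_slice: bytes, prefix_bits: str) -> bytes:
    """Replace the first 18 bits of a Q-frame slice and copy the rest unshifted."""
    if len(prefix_bits) % 8 != 2:
        raise ValueError("prefix would force a bit shift of the payload")
    whole, tail = prefix_bits[:-2], prefix_bits[-2:]
    out = bytearray(int(whole[i:i + 8], 2) for i in range(0, len(whole), 8))
    # The third Q-frame byte keeps its low six payload bits.
    out.append((int(tail, 2) << 6) | (qframe_slice[2] & 0x3F))
    out += qframe_slice[3:]                    # plain byte copy
    return bytes(out)


q_slice = bytes([0x42, 0x0A, 0x2A, 0xFF])      # pattern (1) with quantizer 8
print(build_pframe_slice(q_slice, intra_prefix([1], 1, 8)).hex())     # as in (6)
print(build_pframe_slice(q_slice, intra_prefix([1, 1], 3, 8)).hex())  # as in (8)
```

Because the prefix length is congruent to 2 modulo 8 in both cases, every byte after the third Q-frame byte is copied without modification.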
Using these two techniques (inserting one or more zero-motion-vector blocks, and modifying the encoding type of the first macroblock in the sub-frame data), every possible border width can be encoded such that in creating the P-frame data for the macroblock slice, the first byte of the original macroblock data from the Q-frame data buffer is unchanged; the second byte is modified; zero or more bytes are inserted; the third byte is either unchanged, or has the leading two bits set to 11; and the remainder of the Q-frame data for the slice is copied unchanged. Table 2 depicts the appropriate encoding technique for each possible border width. In the contents listing, inc signals the macroblock address increment, ZMB signals a zero-motion-vector macroblock, Intra signals a macroblock encoding type of “Intra”, and Quant signals a macroblock encoding type of “Intra with Quant”.
Note that the encodings given in Table 2 are not the only encodings that produce the desired characteristic that most of the macroblock encoding data can be copied directly from the Q-frame data buffer to the P-frame data buffer without bit shifts. For example, if the left margin is 48 columns (three macroblocks), the left margin could be encoded by any of the following patterns:
inc 1, ZMB, inc 3, Quant
inc 1, ZMB, inc 1, ZMB, inc 2, Intra
inc 1, ZMB, inc 2, ZMB, inc 1, Intra
The encodings in Table 2 were selected for minimal number of inserted bytes, and the preferential use of “Intra with Quant” encoding.
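The selection criteria just stated can be sketched as a small search over placements of zero-motion-vector macroblocks. This is an illustrative enumeration rather than the procedure used to build Table 2; the function names, the limit of three padding macroblocks, and the tie-breaking order are assumptions, and the increment code lengths are transcribed from ISO/IEC 11172-2.

```python
from itertools import combinations


def mba_increment_bits(increment: int) -> int:
    """Length in bits of the address-increment code, including escape codes."""
    bits = 0
    while increment > 33:
        bits += 11
        increment -= 33
    for upper, length in ((1, 1), (3, 3), (5, 4), (7, 5), (9, 7),
                          (15, 8), (21, 10), (33, 11)):
        if increment <= upper:
            return bits + length
    raise ValueError("increment must be at least 1")


def best_left_padding(margin_mbs: int, max_zmbs: int = 3):
    """Return (inserted_bytes, zmb_positions, type) or None if nothing aligns."""
    candidates = []
    for count in range(1, min(max_zmbs, margin_mbs) + 1):
        # The first macroblock of a slice must be encoded, so position 0 is fixed.
        for rest in combinations(range(1, margin_mbs), count - 1):
            positions = (0,) + rest
            bits, previous = 6, -1             # slice quantizer + extra slice bit
            for position in positions:
                bits += mba_increment_bits(position - previous) + 5
                previous = position
            bits += mba_increment_bits(margin_mbs - previous)
            for kind, type_bits in (("Quant", 11), ("Intra", 5)):
                total = bits + type_bits
                if total % 8 == 2:             # payload keeps its 18-bit phase
                    rank = (total - 18) // 8, kind != "Quant", positions
                    candidates.append((rank, kind))
    if not candidates:
        return None
    (inserted, _, positions), kind = min(candidates)
    return inserted, positions, kind


for margin in (1, 3, 4):
    print(margin, best_left_padding(margin))
```

For margins of one and four macroblocks the search selects the “Intra” encodings of examples (6) and (8); for three macroblocks it selects the single padding macroblock with “Intra with Quant”.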
Right-Side Padding
When creating the P-frame slice content from the Q-frame data, each slice containing Q-frame sub-image data must conclude on the right-most macroblock of the video frame. This may require the insertion of additional macroblock data at the end of the original Q-frame encoded macroblock data. Since the data for any given sequence of encoded macroblocks may end at an arbitrary bit boundary, the trailing data for the slice must be bit-aligned with the data from the Q-frame. For example, if a given Q-frame macroblock slice contains 0x103 bits, the last byte of the Q-frame slice data will contain three encoding bits, and 5 (zero) padding bits, as follows:
qqqqq010 . . . xxxxxxxx xxx00000 (9)
When additional macroblock data is appended to this data, the first new bit of the added data must be placed at the first padding position (the fourth bit of the final byte), with subsequent bits following.
The data appended to the end of each slice is constructed by inserting the appropriate macroblock address increment, followed by a macroblock encoded as type “motion-compensated, no data” with zero motion vectors. The required bit pattern can be constructed from the macroblock address increment table (ISO/IEC 11172-2), and is shown in the following table.
Appending the data patterns from Table 3 requires bit shifts and bit-wise OR operations to combine the original Q-frame encoded data with the padding macroblock data.
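The shift-and-OR step can be sketched as follows. The function name is hypothetical, the padding bits of the final Q-frame byte are assumed to be zero as described above, and the appended pattern (an address increment of 3 followed by a zero-motion-vector macroblock) is only one example.

```python
def append_bits(slice_bytes: bytes, bit_length: int, extra_bits: str):
    """OR extra_bits into the slice starting at bit offset bit_length."""
    out = bytearray(slice_bytes[: (bit_length + 7) // 8])
    position = bit_length
    for bit in extra_bits:
        if position // 8 == len(out):
            out.append(0)                      # start a new output byte
        if bit == "1":
            # Shift a single bit to its place within the byte and OR it in.
            out[position // 8] |= 0x80 >> (position % 8)
        position += 1
    return bytes(out), position


# A slice of 0x103 bits: 32 full bytes plus three bits in the 33rd byte.
q_slice = bytes(33)
padded, new_length = append_bits(q_slice, 0x103, "010" + "00111")
print(len(padded), new_length, f"{padded[32]:08b}", f"{padded[33]:08b}")
```

A production implementation would more likely shift whole bytes of a precomputed pattern; the bit-at-a-time loop is used here only to keep the alignment arithmetic visible.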
An additional observation can be made about the process for converting Q-frame data into the corresponding P-frame MPEG data file. Each empty slice in the P-frame requires the same amount of data (nine bytes, four for the slice start code and five to encode the quantizer, the first ZMB, the address increment of 44, and the last ZMB). The padding at the beginning of each slice containing sub-image data will be at most four bytes more than the original Q-frame data, while the padding at the end of the slice will be at most three bytes more than the original Q-frame data. The slice start code for each slice containing sub-image data will be four bytes, two more than the size of the slice-length field in the Q-frame data. A four-byte sequence end code is appended to the P-frame data. The size of each Q-frame slice is known at encoding time, so the size of the entire P-frame data buffer can be computed as:
This size is pre-computed and placed in the header of the Q-frame data file. When a P-frame data buffer is created from the Q-frame contents, the maximum possible size of the P-frame buffer can be used to pre-allocate the required memory. The contents of the Q-frame header are given in Table 4.
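The byte counts enumerated above can be combined into a worst-case allocation size, as in the following sketch. It is assembled only from the stated per-slice figures; the function names, the `header_bytes` parameter, and the assumption that each Q-frame slice size excludes its two-byte slice-length field are illustrative. The first function also rebuilds an empty slice for a 720-column frame to confirm the nine-byte figure.

```python
ESCAPE = "00000001000"      # macroblock escape: adds 33 to the increment
INC_11 = "00001010"         # increment of 11, so escape + 11 gives 44
EMPTY_MB = "00111"          # zero-motion-vector macroblock


def empty_slice(row: int, quantizer: int) -> bytes:
    """An empty slice for a 45-macroblock row: first ZMB, increment 44, last ZMB."""
    bits = f"{quantizer:05b}" + "0" + "1" + EMPTY_MB + ESCAPE + INC_11 + EMPTY_MB
    bits += "0" * (-len(bits) % 8)             # pad to a byte boundary
    body = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return bytes([0x00, 0x00, 0x01, row]) + body   # slice start code, row from 1


def max_pframe_bytes(header_bytes: int, slice_rows: int, qframe_slice_bytes) -> int:
    """Upper bound on the P-frame buffer size from the figures in the text."""
    empty_rows = slice_rows - len(qframe_slice_bytes)
    # Per filled slice: data + 4-byte start code + left (4) and right (3) padding.
    filled = sum(size + 4 + 4 + 3 for size in qframe_slice_bytes)
    return header_bytes + 9 * empty_rows + filled + 4   # 4-byte sequence end code


print(len(empty_slice(1, 8)))
print(max_pframe_bytes(12, 30, [100, 120]))
```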
MPEG-2 Image Encoding
As stated above, the Q-frame file could utilize an MPEG-2 header, and the resulting P-frame file would be MPEG-2 compliant.
Support for Various Macroblock Encoding Types
The encoding of the sub-frame image content need not be constrained to any particular encoding type. For instance, the quantizer value can change from macroblock to macroblock. In this case, the quantizer value for each slice should equal the quantizer for the first encoded macroblock, so that if the macroblock quantizer is eliminated, the correct slice quantizer value is used when decoding the first macroblock.
The preferred embodiment describes the case where a Q-frame is encoded independent of the contents of any previous reference frame. This is not an essential limitation. A series of sub-frame images could be encoded, in which each image is coded relative to the previous reference sub-frame image. In this case, the encoding could include motion-compensated encoding for a macroblock. The first (left-most) macroblock in each macroblock row must be encoded using Intra coding only, but the remaining macroblocks in each row could utilize any MPEG encoding technique valid for a macroblock in a P-frame image.
Ensuring Byte-Alignment of the End of Slice Macroblock Data
The Q-frame encoder could manipulate the encoding of the last (right-most) macroblock in each row, by incorporating additional coefficients, or using alternate escape encoding of run-level values, to ensure that the number of bits in the row is an exact multiple of eight, that is, that the data for the macroblock row exactly fills an integer number of bytes. In this case, the padding applied to the right border would also be byte-aligned, further simplifying the creation of the P-frame data.
Combining Q-Frame Sub-Images in a P-Frame Image
As a further alternative, the Q-frame encoder could ensure that each macroblock row filled an integer number of bytes, plus 6 bits, by incorporating additional coefficients into one or more macroblocks, or by using alternate escape encoding of run-level values. In this case, a second Q-frame image could be appended to the first image, optionally with padding between the images, using the techniques described above.
Combining two sub-images in this way would require mask and bit-OR operations on the last byte of the first sub-image and the first byte of the second sub-image, but the remainder of the operations on the second sub-image data would be carried out as described in the preferred embodiment.
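The mask and bit-OR step can be sketched for the simplest case, in which the second sub-image follows the first with no padding between them. The function name and sample bytes are hypothetical. The sketch assumes the first row ends six bits into its last byte and that the second row starts with the Q-frame pattern, so that its macroblock data begins six bits into its first byte.

```python
def join_rows(first_row: bytes, second_row: bytes) -> bytes:
    """Append second_row's macroblock data to first_row at bit offset 6."""
    joined = bytearray(first_row)
    # Keep the six used bits of the last byte; take the two macroblock bits
    # that follow the slice quantizer and extra slice bit of the second row.
    joined[-1] = (joined[-1] & 0xFC) | (second_row[0] & 0x03)
    joined += second_row[1:]                   # remaining bytes copied unshifted
    return bytes(joined)


first = bytes([0x42, 0b10101100])              # row ending with six used bits
second = bytes([0b01001010, 0x0A, 0x6A])       # qqqqq010 00001qqq qqxxxxxx
print(join_rows(first, second).hex())
```

When padding is required between the two images, the same boundary byte would instead receive the leading bits of the padding pattern.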
Incorporating Motion of the Background Image
In some cases, such as the example shown in
Encoding Motion Video Using Sub-Frames
In the preferred embodiment, the Q-frame technique is described in conjunction with the use of single still-frame images. However, the technique could be extended to create streaming video representations of sub-frame video. Each frame of the sub-frame video would be encoded as a Q-frame; the Q-frame would then be transferred to the display system, and the corresponding P-frame reconstructed. In this case, motion-based forward prediction could be used (except for the left-most macroblock in each row, as noted above) to reduce the total bit count for the image as desired. Provided the first macroblock in each row is encoded as type “Intra with Quant”, the generated P-frame would still be a valid MPEG video frame. The remaining macroblocks in each row after the first would simply be decoded according to the normal rules for MPEG (or MPEG-2) decoding, with the result that the sub-frame video images would appear on the display in sequence. Necessarily, this would require sufficient processing power in the STB to regenerate each P-frame as required, but the current invention reduces the computational steps required to accomplish this task.
While the preferred embodiment of the invention has been illustrated and described, as noted above, many changes can be made without departing from the spirit and scope of the invention. For example, the present invention may be operable with video or image compression schemes other than MPEG or MPEG-2. Accordingly, the scope of the invention is not limited by the disclosure of the preferred embodiment. Instead, the invention should be determined entirely by reference to the claims that follow.
This application claims priority to provisional patent application Ser. No. 60/682,030, filed May 16, 2005, which is incorporated herein by reference.
Number | Date | Country
---|---|---
60682030 | May 2005 | US