1. Field of the Invention
This invention pertains in general to using video compression technology to encode and decode information, and in particular, to providing an enhanced hybrid video coder using bitmap-mode coding.
2. Description of the Related Art
Video compression is useful for transmission of digital video over a variety of bandwidth-limited networks, or for storage-constrained applications. For example, the broadcast transmission of digital video at 24 bits per pixel, sampled at 720 by 480 spatial resolution and 30 frames per second (fps) temporal resolution, would require a bit rate above 248 Mbps. As another example, consider supporting web browser applications with rich media content in a client-server architecture within a wireless network: the bandwidth limitations of the wireless network itself may be one of the major limiting factors in fully utilizing the client-server architecture. Client devices, such as mobile phones, may additionally be resource-constrained with respect to the device's capabilities, including processing power, memory and battery life. Compounding this, web browser applications are continually embracing rich media content, such as digital video and audio, which in turn poses further challenges for a client-server architecture. For applications such as digital television broadcasting, satellite television, Internet video streaming, video conferencing and video security over a variety of networks, limited transmission bandwidth or storage capacity increases the demand for higher video compression ratios.
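The raw bit rate quoted above follows directly from the sampling parameters. As a quick check, the arithmetic can be sketched in Python (the function name is illustrative and does not appear in the original description):

```python
def raw_bit_rate_bps(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


# 720 x 480 pixels, 24 bits per pixel, 30 frames per second
rate = raw_bit_rate_bps(720, 480, 24, 30)
print(f"{rate / 1e6:.1f} Mbps")  # prints "248.8 Mbps"
```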
To improve compression efficiency, currently available coding standards, such as MPEG-1, MPEG-2, MPEG-4 and H.264/AVC, remove information redundancy spatially within a video frame and temporally between video frames. The goal of video compression systems is to achieve the best fidelity (or the lowest distortion D) given the capacity of a transmission channel, subject to the coding rate constraint R(D). However, this optimization task is complicated by the fact that various coding options show varying efficiency with different scene content and at different bit rates.
One limitation of conventional hybrid video coders, such as an H.264 video coder, is inefficiency at removing encoding noise, especially around sharp edges. Such encoding noise, for example mosquito artifacts, is easily noticeable to human eyes, especially in browsing images, which often contain text on a simple background, such as black text on a white background. For browsing images, the text embedded in the images needs to be encoded with high fidelity, but the simple background can tolerate more compression since it contains little data. To allow regions of an input picture to be represented without any loss of fidelity, the H.264 video coding standard includes a “PCM” macroblock mode, in which the values of input pixels are sent directly from an encoder to a decoder without prediction, transformation or quantization. An additional motivation for this macroblock mode is to impose a minimum upper bound on the number of bits that can be used to represent a macroblock with sufficient accuracy. However, the PCM mode is not efficient at dealing with encoding noise, because an encoder can only choose either to use PCM for high quality at the expense of a higher bit rate, or not to use PCM and accept lower quality with a controlled bit rate.
Hence, there is, inter alia, a lack of a system and method that provides an enhanced hybrid encoder within a video processing system.
The needs described above are addressed by a method and system for compressing video frames with optimization. In one embodiment, the system comprises an encoding unit for encoding a video frame in bitmap mode, a decoder for decoding in bitmap mode and a bitmap filtering unit. The encoding unit encodes data for transmission in response to a signal at the filtering unit indicating bitmap-mode encoding. The bitmap filtering unit is configured to determine the bitmap-mode encoding by filtering the video frame. The bitmap filtering unit generates a bitmap of the video frame by extracting the most significant bit of each pixel of the video frame, and generates DCT-type data of the frame by further manipulating the pixels of the frame. The encoding unit is configured to losslessly encode the bitmap data using a bitmap encoder and to lossily encode the DCT-type data of the frame using a DCT encoder, such as an H.264 video encoder. A bitstream generator of the encoding unit combines the encoded bitmap data and DCT-type data into an encoded bitstream for transmission and decoding. The decoder for decoding in bitmap mode decodes the encoded bitstream generated in bitmap-mode encoding using a bitmap extractor, a bitmap decoder, a DCT decoder, and an inverse bitmap filtering unit. The present invention also includes methods for encoding and decoding corresponding to the encoding unit and the decoder of the system.
The figures depict an embodiment for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
In one embodiment, the video input 110 comprises a sequence of video frames, and each video frame includes blocks of raw video signals/samples in an uncompressed format. The video input 110 may be received from a variety of video sources, such as a television station, a camcorder, a CD, a DVD, a network, a video database, or a volatile or non-volatile memory. Further, the video input 110 may be received in an analog format and converted to a digital format by an analog-to-digital converter before being processed by the encoding unit 350. In another embodiment, the video input 110 comprises a plurality of pixel-based browsing images of a video source displayed on a display screen or stored in a frame buffer.
The communications means 400 enables communications between the encoding unit 350 and the decoding unit 500. In one embodiment, the communications means 400 uses standard communications technologies and/or protocols. Thus, the communications means 400 may include fixed links using technologies such as Ethernet, integrated services digital network (ISDN), digital subscriber line (DSL), asynchronous transfer mode (ATM), or other fixed-link technologies. The communications means 400 may also support mobile access using technologies such as Wideband Code Division Multiple Access (W-CDMA), CDMA2000, Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), or similar technologies. Further, the communications means 400 may include wireless access using technologies such as Wireless Local Area Network (W-LAN), Worldwide Interoperability for Microwave Access (WiMAX), or other wireless technologies.
Similarly, the networking protocols used on the communications means 400 may include multiprotocol label switching (MPLS), the transmission control protocol/Internet protocol (TCP/IP), the User Datagram Protocol (UDP), the session initiation protocol (SIP), the session description protocol (SDP), the hypertext transfer protocol (HTTP), the simple mail transfer protocol (SMTP), the file transfer protocol (FTP), or any other suitable protocol. The data exchanged over the communications means 400 may be represented using technologies and/or formats including the hypertext markup language (HTML), the extensible markup language (XML), or any other suitable format. In addition, all or some of the links may be encrypted using conventional encryption technologies, such as the secure sockets layer (SSL), Secure HTTP and/or virtual private networks (VPNs) or Internet Protocol security (IPsec). For example, for encoding sensitive data such as a user's personal bank statement displayed by the user's on-line banking system, the encoding unit 350 may encrypt the video channel that carries the encoded bitstream before sending the bitstream over that channel. In one embodiment, an encryption unit may reside in the encoding unit 350 to encrypt the encoded bitstream. In another embodiment, the communications between the encoding unit 350 and the decoding unit 500 may use custom and/or dedicated data communications technologies instead of, or in addition to, the ones described above.
To enhance the encoding performance of conventional video encoders, the video processing system 100 uses bitmap-mode coding, which optimizes the usage of the conventional PCM mode. In contrast to the conventional PCM mode operation, where a macroblock of a video frame is either encoded losslessly using the PCM mode or encoded lossily without the PCM mode for a higher compression ratio, the bitmap-encoding mode enables the encoding unit 350 to flexibly encode part of the video frame data losslessly and the rest of the frame data lossily. As such, the lossless encoding preserves high quality for the important data of the frame, while the lossy encoding maintains a high compression ratio on the less important data of the frame.
In one embodiment, compression is handled by dedicated hardware with a very low latency encoder. In another embodiment, an image of a static web page without embedded or complex text/graphics can be compressed by a software encoder as a video frame. Other embodiments may implement the encoding unit 350 in hardware, software, or both. Other embodiments perform different functions and/or include additional modules than the ones described here. For example, the encoding unit 350 may comprise a frame buffer for reference frame management.
Turning now to the individual entities illustrated in
In response to a video frame to be encoded in bitmap mode, the bitmap filtering unit 326 filters the video frame in various ways to generate: 1) a bitmap for the video frame; and 2) filtered frame sample data of the video frame for further encoding. For the purpose of illustration of one embodiment, the filtered frame sample data of the video frame will be referred to as DCT-type data hereinafter. After filtering the video frame, the bitmap filtering unit 326 sends an indication of bitmap-mode encoding, the bitmap and the DCT-type data of the video frame to the video encoder 356 for further processing. The bitmap data path from the bitmap filtering unit 326 to the video encoder 356 is represented by the dashed line 370 and the DCT-type data path by the solid line 372 in
In one embodiment of the bitmap-mode encoding, for each pixel of the video frame, the bitmap filtering unit 326 extracts the most significant bit (MSB) of the pixel and generates the bitmap of the video frame by collecting these MSBs in the same order as the pixels are extracted. In other embodiments, a variable-size portion of the frame pixels is used for bitmap generation. To generate the DCT-type data of the video frame, the bitmap filtering unit 326, in one embodiment, replaces the MSB of each pixel of the video frame with a zero. In another embodiment, the bitmap filtering unit 326 picks two integer values corresponding to the two dominant colors of the video frame, and replaces one value with the other. In this case, the bitmap data is bi-level run data, that is, a series of runs between replaced and static pixels. Each time the filter of the bitmap filtering unit 326 toggles between replacing a pixel and not replacing a pixel, the run between such states is encoded. For example, for a video frame of black text on a white background, the bitmap filtering unit 326 may pick value 1 for white pixels and value 2 for black pixels, and replace the black pixels with white pixels after extracting the bitmap of the video frame. As such, the DCT-type data after filtering are all white pixels, which can be compressed with a higher compression ratio. It is also noted that, in other embodiments, more than two colors (i.e., more than black and white) can be used to generate bitmap data of the video frame.
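The MSB-based embodiment can be illustrated with a minimal Python sketch. It assumes a frame flattened into a list of unsigned samples of a fixed bit depth; the function name bitmap_filter_msb and its parameters are illustrative and are not taken from the description.

```python
def bitmap_filter_msb(pixels, bit_depth=8):
    """Split samples into an MSB bitmap and MSB-cleared (DCT-type) samples.

    Illustrative only: `pixels` is assumed to be a flat sequence of
    unsigned integers in the range [0, 2**bit_depth).
    """
    shift = bit_depth - 1
    low_mask = (1 << shift) - 1                   # all bits below the MSB
    bitmap = [p >> shift for p in pixels]         # one bit per pixel, in order
    dct_type = [p & low_mask for p in pixels]     # MSB replaced with zero
    return bitmap, dct_type


pixels = [0, 127, 128, 200, 255]
bitmap, dct_type = bitmap_filter_msb(pixels)
print(bitmap)    # [0, 0, 1, 1, 1]
print(dct_type)  # [0, 127, 0, 72, 127]
```

In this sketch the bitmap preserves the order in which pixels are visited, so a decoder that walks the frame in the same order can put each MSB back without side information.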
In another embodiment, the bitmap filtering unit 326 may invert the pixel value for values above 127 after taking its MSB, to avoid creating unnecessary energy in the pixel. Taking 8-bit pixels as an example, blindly setting the MSB of a pixel to zero may result in a sawtooth-like pattern in terms of pixel energy as pixel values cross from 127 to 128. For an image that has smooth gradients around 128, unnecessary energy is thus created by the filtering process. By inverting the pixel value as described, the bitmap filtering unit 326 enables the filtered data after bitmap extraction to be compressed with a high compression ratio but without creating unnecessary energy. In another embodiment, there can be more than one bi-level region in a video frame. For each region of interest, a decoder just needs to know the foreground color to replace back into the decoded frame. More specifically, an encoder can send a pixel value to use as the foreground color as often as needed. If there is only one value for the whole frame, the encoder only has to send the foreground color pixel once.
In response to receiving, from the bitmap filtering unit 326, the indication of bitmap-mode encoding together with the bitmap and the DCT-type data of the video frame, the video encoder 356 encodes the bitmap and DCT-type data of the video frame into a video stream and sends it to its corresponding decoder over the network.
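The sawtooth effect and its remedy can be seen on a short 8-bit gradient. The sketch below is one possible reading of the inversion step, assuming that "inverting" means complementing the seven low-order bits of pixels above 127 (equivalently, mapping p to 255 - p); the function names are hypothetical.

```python
def filter_clear_msb(pixels):
    """Baseline: simply replace the MSB of each 8-bit pixel with zero."""
    return [p & 0x7F for p in pixels]


def filter_with_inversion(pixels):
    """Assumed reading of the inversion embodiment for 8-bit pixels.

    Pixels above 127 have their MSB removed and their remaining seven
    bits complemented, which equals 255 - p.
    """
    bitmap = [p >> 7 for p in pixels]
    filtered = [255 - p if p > 127 else p for p in pixels]
    return bitmap, filtered


def inverse_filter_with_inversion(bitmap, filtered):
    """Undo filter_with_inversion using the losslessly coded bitmap."""
    return [255 - f if b else f for b, f in zip(bitmap, filtered)]


gradient = [125, 126, 127, 128, 129, 130]
print(filter_clear_msb(gradient))       # [125, 126, 127, 0, 1, 2]
bitmap, filtered = filter_with_inversion(gradient)
print(filtered)                         # [125, 126, 127, 127, 126, 125]
```

With plain MSB clearing the filtered samples jump from 127 to 0 at the crossing, while the inverted samples stay continuous, which is the property the paragraph above relies on for efficient transform coding.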
To encode the video frame in non-bitmap mode, the video frame is processed by an encoding module 1120 of the video encoder 356. In one embodiment, the encoding module 1120 employs a conventional encoding algorithm, such as H.264 encoding, to encode the video frame. For example, in one embodiment, the encoding unit 350 employs the major components used in the H.264 video compression standard. More specifically, the video encoder 356 uses DCT-like forward and reverse transforms on prediction residuals. The video encoder 356 also uses flexible macroblock sizes ranging from 16×16 to 4×4. The various macroblock and sub-macroblock sizes allow fine-grained tuning of the blocks to the content being encoded. Other H.264 components, such as the logarithmic quantizer, intra and inter prediction, and context-adaptive entropy encoding, may also be used to perform the compression. In one embodiment, the encoding unit 350 also includes modules such as a conventional rate controller and an error recovery module used in conventional video coding standards as part of an optimized encoder control. Those skilled in the art will recognize that H.264 is used only by way of example and that a variety of other encoding and compression schemes may be used. In another embodiment, the video frame is encoded by the DCT encoder 1106. In yet another embodiment, the DCT encoder 1106 and the encoding module 1120 may be combined into one encoding function entity to encode DCT-type data in bitmap-mode encoding and/or the original input video frame in non-bitmap-mode encoding.
For a video frame to be encoded in bitmap mode, the bitmap encoder 1104 receives the bitmap data of the video frame from the bitmap filtering unit 326 (the dashed line) and the DCT encoder 1106 receives the DCT-type data of the video frame from the bitmap filtering unit 326 (the solid line). In one embodiment, the bitmap encoder 1104 encodes the bitmap using a run-length encoding algorithm, such as one based on unsigned Exp-Golomb codes, to process the data in scanline order. In other embodiments, other encoding algorithms known to those skilled in the art may be used to encode the bitmap. The DCT encoder 1106 encodes the DCT-type data of the video frame in the same way as a conventional video encoder, such as an H.264 video encoder. More specifically, the DCT encoder 1106 applies a DCT-like transformation to the DCT-type data, quantization of the DCT transform coefficients, and entropy encoding of the quantized coefficients. In another embodiment, the DCT encoder 1106 and the encoding module 1120 may be combined into one encoding function entity to encode DCT-type data in bitmap-mode encoding and/or the original input video frame in non-bitmap-mode encoding.
The bitstream generator 1108 combines the encoded bitmap data from the bitmap encoder 1104 and the encoded DCT-type data from the DCT encoder 1106 to generate the output video stream. In one embodiment, the bitstream generator 1108 extends the conventional H.264 encoded bitstream format to combine the encoded bitmap and DCT-type data. More specifically, a video bitstream after encoding in H.264 comprises a sequence of network abstraction layer (NAL) units. A typical sequence of H.264 NALs may look like the following:
NAL1=SPS (sequence parameter set), NAL2=PPS (picture parameter set), and NAL3=Slice Data (picture data).
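As an illustration of run-length coding with unsigned Exp-Golomb codes, the following Python sketch encodes a scanline-ordered bitmap as a bit string. The conventions used here are assumptions made for the example and are not specified in the description: runs alternate starting with a run of zeros (which may have length zero), each run length is written directly as an Exp-Golomb code number, and the output is a string of '0' and '1' characters rather than packed bytes.

```python
def exp_golomb_ue(value):
    """Unsigned Exp-Golomb code of a non-negative integer, as a bit string."""
    binary = bin(value + 1)[2:]                  # value + 1 in binary
    return "0" * (len(binary) - 1) + binary      # leading zeros, then binary


def run_lengths(bits):
    """Lengths of alternating runs in scanline order, starting with zeros."""
    runs, current, count = [], 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)                   # close the finished run
            current, count = b, 1
    runs.append(count)
    return runs


def encode_bitmap(bits):
    """Concatenate the Exp-Golomb codes of all run lengths."""
    return "".join(exp_golomb_ue(r) for r in run_lengths(bits))


bits = [0, 0, 0, 1, 1, 0, 0, 0, 0]
print(run_lengths(bits))    # [3, 2, 4]
print(encode_bitmap(bits))  # 0010001100101
```

Short runs receive short codes (a run of 0 costs one bit, runs of 1 or 2 cost three bits), which suits bi-level text content where long background runs are interleaved with short strokes.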
The bitstream generator 1108 may extend the above sequence into one like this:
NAL0=bitmap, NAL1=SPS, NAL2=PPS, and NAL3=Slice Data (picture data),
where NAL0 is typically not used by a conventional H.264 codec, “bitmap” is the encoded bitmap data of the video frame, and the rest of the NALs store the encoded DCT-type data as a typical H.264 encoder does.
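The extended sequence can be modeled with a small Python sketch. This is a simplified model under stated assumptions, not a real H.264 byte stream: each NAL is represented as a (nal_unit_type, payload) tuple with no start codes or emulation prevention, the "NAL0" label is read as nal_unit_type 0, and the function name generate_bitstream is hypothetical. The type values 7, 8 and 5 are the H.264 nal_unit_type codes for an SPS, a PPS and an IDR slice.

```python
from typing import List, Tuple

Nal = Tuple[int, bytes]        # (nal_unit_type, payload); simplified model

BITMAP_NAL_TYPE = 0            # assumption: "NAL0" means nal_unit_type 0
SPS_TYPE, PPS_TYPE, IDR_SLICE_TYPE = 7, 8, 5


def generate_bitstream(encoded_bitmap: bytes, h264_nals: List[Nal]) -> List[Nal]:
    """Prepend the encoded bitmap as an extra NAL ahead of the H.264 NALs."""
    return [(BITMAP_NAL_TYPE, encoded_bitmap)] + list(h264_nals)


h264_nals = [
    (SPS_TYPE, b"sps"),          # placeholder payloads, not real syntax
    (PPS_TYPE, b"pps"),
    (IDR_SLICE_TYPE, b"slice"),
]
stream = generate_bitstream(b"bitmap", h264_nals)
print([nal_type for nal_type, _ in stream])  # [0, 7, 8, 5]
```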
Taking the example of the encoded bitstream generated by the video encoder 1100 above, the bitmap-mode video decoder 1200 receives the encoded bitstream 1202 in the following sequence of NALs from the encoder 1100,
NAL0=bitmap, NAL1=SPS, NAL2=PPS, and NAL3=Slice Data (picture data).
The bitmap extractor 1204 strips out NAL0 and sends it to the bitmap decoder 1206. The bitmap extractor 1204 sends the rest of the NALs, i.e., NAL1, NAL2, and NAL3, to the DCT decoder 1208. In one embodiment, the bitmap decoder 1206 recreates the original bitmap using a run-length decoding algorithm. The DCT decoder 1208 decodes the DCT-type data embedded in NAL1 through NAL3 using conventional H.264 decoding procedures, such as entropy decoding and inverse quantization, followed by an inverse DCT transform. The inverse bitmap filtering unit 1210 receives the recreated bitmap and DCT-type data of the video frame from the bitmap decoder 1206 and the DCT decoder 1208, and combines the received data by reversing the operation of the bitmap filtering unit 326 at the video encoder 356. For example, if the MSB of each pixel is replaced by a zero at the bitmap filtering unit, the inverse bitmap filtering unit 1210 replaces the zero with the recreated MSB. The reconstruction unit 1212 reconstructs the video capture using the data from the inverse bitmap filtering unit. Due to the flexible hybrid lossless and lossy encoding and decoding of the video capture described above, the reconstructed video capture closely resembles the original input video frame.
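The extraction and inverse filtering steps can be sketched in Python as follows. The sketch reuses a simplified model in which each NAL is a (nal_unit_type, payload) tuple and the bitmap travels in type 0; the function names are hypothetical. Clamping the lossily decoded samples to the range below the MSB before restoring it is an added assumption, intended to keep quantization error from spilling into the restored bit.

```python
def extract_bitmap(nals, bitmap_type=0):
    """Separate bitmap NAL payloads from the NALs meant for the DCT decoder."""
    bitmap_payloads = [payload for t, payload in nals if t == bitmap_type]
    dct_nals = [(t, payload) for t, payload in nals if t != bitmap_type]
    return bitmap_payloads, dct_nals


def inverse_bitmap_filter(bitmap, decoded, bit_depth=8):
    """Put the recreated MSB back into each decoded sample."""
    shift = bit_depth - 1
    low_max = (1 << shift) - 1
    restored = []
    for b, d in zip(bitmap, decoded):
        d = min(max(d, 0), low_max)      # assumption: clamp lossy samples
        restored.append((b << shift) | d)
    return restored


stream = [(0, b"bitmap"), (7, b"sps"), (8, b"pps"), (5, b"slice")]
bitmap_payloads, dct_nals = extract_bitmap(stream)
print(bitmap_payloads)                   # [b'bitmap']
print([t for t, _ in dct_nals])          # [7, 8, 5]

# Decoded bitmap and DCT-type samples for five pixels (illustrative values).
print(inverse_bitmap_filter([0, 0, 1, 1, 1], [0, 127, 0, 72, 127]))
# [0, 127, 128, 200, 255]
```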
The above description is included to illustrate the operation of the preferred embodiments and is not meant to limit the scope of the invention. The scope of the invention is to be limited only by the following claims. From the above discussion, many variations will be apparent to one skilled in the relevant art that would yet be encompassed by the spirit and scope of the invention.
The present application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application No. 60/863,888, filed on Nov. 1, 2006, entitled “CONTENT ACCESS USING COMPRESSION” which is incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
60863888 | Nov 2006 | US |