This disclosure relates to video encoding and decoding.
Digital video streams may represent video using a sequence of frames or still images. Digital video can be used for various applications including, for example, video conferencing, high definition video entertainment, video advertisements, or sharing of user-generated videos. A digital video stream can contain a large amount of data and consume a significant amount of computing or communication resources of a computing device for processing, transmission or storage of the video data. Various approaches have been proposed to reduce the amount of data in video streams, including compression and other encoding techniques.
Implementations of systems, methods, and apparatuses for encoding and decoding a video signal using transform-domain intra prediction are disclosed herein.
One aspect of the disclosed implementations is a method for encoding a video including a plurality of frames having a plurality of blocks including a current block. The method includes generating, using a two-dimensional transform, a set of transform coefficients for the current block; generating, using a one-dimensional transform, a set of transform coefficients for a plurality of previously coded pixel values in the frame; determining, using the set of transform coefficients for the previously coded pixel values, a set of transform coefficients for a prediction block; determining a residual based on the difference between the set of transform coefficients for the current block and the set of coefficients for the prediction block; and encoding the residual.
Another aspect of the disclosed implementations is a method for decoding a frame in an encoded video stream, the frame having a plurality of blocks including a current block. The method includes decoding a residual; generating, using a one-dimensional transform, a set of transform coefficients for a plurality of previously decoded pixel values in the frame; generating, using the set of transform coefficients for the previously decoded pixel values, a set of transform coefficients for a prediction block; determining a set of transform coefficients for the current block based on a sum of the residual and the set of transform coefficients for the prediction block; and inverse transforming the set of transform coefficients for the current block.
Another aspect of the disclosed implementations is an apparatus for encoding a frame in a video stream having a plurality of blocks including a current block. The apparatus includes a memory and a processor configured to execute instructions stored in the memory to generate, using a two-dimensional transform, a set of transform coefficients for the current block; generate, using a one-dimensional transform, a set of transform coefficients for a plurality of previously coded pixel values in the frame; determine, using the set of transform coefficients for the previously coded pixel values, a set of transform coefficients for a prediction block; determine a residual based on the difference between the set of transform coefficients for the current block and the set of coefficients for the prediction block; and encode the residual.
Variations in these and other aspects will be described in additional detail hereafter.
The description herein makes reference to the accompanying drawings wherein like reference numerals refer to like parts throughout the several views, and wherein:
Digital video is used for various purposes, including, for example, remote business meetings via video conferencing, high definition video entertainment, video advertisements, or sharing of user-generated videos. Video encoding and decoding (codec) can use various compression schemes. These compression schemes may include breaking a video image into blocks and generating a digital video output bitstream using one or more techniques to limit the information included in the output. A received bitstream can be decoded to re-create the blocks and the source images from the limited information.
Encoding a video stream, or a portion thereof, such as a frame or a block, can include using temporal and spatial similarities in the video stream to improve coding efficiency. For example, a current block of a video stream may be encoded based on a previously encoded block in the video stream by predicting motion and color information for the current block based on the previously encoded block and identifying a difference (residual) between the predicted values and the current block.
Intra prediction can include using a previously encoded block from the current frame to predict a block. In some instances, intra prediction, such as spatial-domain intra prediction, may be based on directional features, such as horizontal or vertical features within a block; however, intra prediction based on directional features can be inefficient or imprecise for predicting objects within a block. For example, intra prediction can produce sub-optimal predictions for blocks including pixel values that increase or decrease along directional lines.
Instead of, or in addition to, performing spatial-domain intra prediction, intra prediction can be performed in the transform domain, in which the blocks of pixel values may be transformed into transform coefficients and intra prediction may be performed on the transform coefficients. In some implementations, transform-domain intra prediction may produce more accurate prediction results, may incur a lower overhead by, for example, using a reduced set of prediction modes, or may both increase accuracy and lower overhead.
In some implementations, transform-domain intra prediction may include transforming a block of pixel values in a video stream into a block of transform coefficients. Pixel values in previously encoded blocks, such as a row in a block immediately above the current block, or a column in a block immediately to the left of the current block, can be transformed into transform coefficients and may be referred to as a set of “transform-domain predictors”. The transform-domain predictors can be used to determine a transform-domain prediction block for the current block. A residual can be calculated as the difference of the transform-domain prediction block and the transform coefficient block, and can be encoded in the output bitstream.
These and other examples are now described with reference to the accompanying drawings.
A network 108 connects transmitting station 102 and a receiving station 110 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in transmitting station 102 and the encoded video stream can be decoded in receiving station 110. Network 108 can be, for example, the Internet. Network 108 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), a cellular telephone network or any other means of transferring the video stream from transmitting station 102 to, in this example, receiving station 110.
Receiving station 110 can, in one example, be a computer having an internal configuration of hardware including a processor such as a CPU 112 and a memory 114. CPU 112 is a controller for controlling the operations of receiving station 110. CPU 112 can be connected to memory 114 by, for example, a memory bus. Memory 114 can be ROM, RAM or any other suitable memory device. Memory 114 can store data and program instructions that are used by CPU 112. Other suitable implementations of receiving station 110 are possible. For example, the processing of receiving station 110 can be distributed among multiple devices.
A display 116 configured to display a video stream can be connected to receiving station 110. Display 116 can be implemented in various ways, including by a liquid crystal display (LCD), a cathode-ray tube (CRT), or a light emitting diode display (LED), such as an OLED display. Display 116 is coupled to CPU 112 and can be configured to display a rendering 118 of the video stream decoded in receiving station 110.
Other implementations of the encoder and decoder system 100 are also possible. For example, one implementation can omit network 108 and/or display 116. In another implementation, a video stream can be encoded and then stored for transmission at a later time to receiving station 110 or any other device having memory. In one implementation, receiving station 110 receives (e.g., via network 108, a computer bus, or some communication pathway) the encoded video stream and stores the video stream for later decoding. In another implementation, additional components can be added to the encoder and decoder system 100. For example, a display or a video camera can be attached to transmitting station 102 to capture the video stream to be encoded.
At the next level, single frame 208 can be divided into a set of blocks 210, which can contain data corresponding to, in some of the examples described below, a 4×4 pixel group in frame 208. Block 210 can also be of any other suitable size such as a block of 16×8 pixels, a block of 8×8 pixels, a block of 16×16 pixels or of any other size. Depending on the application, block 210 can also refer to a subblock, which is a subdivision of a macroblock. Unless otherwise noted, the term ‘block’ can include a subblock, a macroblock, a segment, a slice, a residual block or any other portion of a frame. A frame, a block, a pixel, or a combination thereof can include display information, such as luminance information, chrominance information, or any other information that can be used to store, modify, communicate, or display the video stream or a portion thereof.
When video stream 200 is presented for encoding, each frame 208 within video stream 200 can be processed in units of blocks. Referring to
At intra/inter prediction stage 306, each block can be encoded using either intra prediction (i.e., within a single frame) or inter prediction (i.e., from frame to frame). In either case, a prediction block can be formed. The prediction block is then subtracted from the block of transform coefficients to produce a residual block (also referred to herein as a residual).
Intra prediction (also referred to herein as intra-prediction or intra-frame prediction) and inter prediction (also referred to herein as inter-prediction or inter-frame prediction) are techniques used in modern image/video compression schemes. In the case of intra-prediction, a prediction block can be formed from samples in the current frame that have been previously encoded and reconstructed. In the case of inter-prediction, a prediction block can be formed from samples in one or more previously constructed reference frames.
The prediction block is then subtracted from the block of transform coefficients; the difference, i.e., the residual, is then encoded and transmitted to decoders. Image or video codecs may support many different intra and inter prediction modes; each image block can use one of the prediction modes to provide a prediction block that is most similar to the block of transform coefficients to minimize the information to be encoded in the residual. The prediction mode for each block of transform coefficients can also be encoded and transmitted, so a decoder can use the same prediction mode(s) to form prediction blocks in the decoding and reconstruction process.
Quantization stage 308 converts the residual into discrete quantum values, which are referred to as quantized transform coefficients, using a quantizer value or quantization levels. The quantized transform coefficients are then entropy encoded by entropy encoding stage 310. The entropy-encoded coefficients, together with other information used to decode the block, which can include for example the type of prediction used, motion vectors, and quantization value, are then output to compressed bitstream 320. Compressed bitstream 320 can be formatted using various techniques, such as variable length encoding (VLC) and arithmetic coding. Compressed bitstream 320 can also be referred to as an encoded video stream and the terms will be used interchangeably herein.
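As an illustration of the quantization described above, the following is a minimal sketch that assumes a uniform scalar quantizer with a single step size. The function names quantize and dequantize and the parameter step are hypothetical and are not taken from this disclosure; an actual quantization stage 308 may use different quantization levels per coefficient.

```python
def quantize(residual, step):
    """Map each residual coefficient to an integer level.

    Assumption: a uniform quantizer that divides by one step size and rounds.
    """
    return [[int(round(c / step)) for c in row] for row in residual]


def dequantize(levels, step):
    """Approximate the original coefficients from the integer levels."""
    return [[q * step for q in row] for row in levels]


# Small transform-domain residual used only for illustration.
residual = [[9.0, -7.0], [3.0, 0.4]]
levels = quantize(residual, step=4)
restored = dequantize(levels, step=4)
```

The restored values differ from the original residual, which is the information loss that a larger step size trades for a smaller bitstream.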
The reconstruction path in
Other variations of encoder 300 can be used to encode compressed bitstream 320. For example, a non-transform based encoder 300 can quantize the residual block directly without transform stage 304. In another implementation, an encoder 300 can have quantization stage 308 and dequantization stage 312 combined into a single stage.
Decoder 400, similar to the reconstruction path of encoder 300 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 416 from compressed bitstream 320: an entropy decoding stage 402, a dequantization stage 404, an intra/inter prediction stage 406, a reconstruction stage 408, an inverse transform stage 410, a loop filtering stage 412, and a deblocking filtering stage 414. Other structural variations of decoder 400 can be used to decode compressed bitstream 320.
When compressed bitstream 320 is presented for decoding, the data elements within compressed bitstream 320 can be decoded by the entropy decoding stage 402 (using, for example, Context Adaptive Binary Arithmetic Decoding) to produce a set of quantized transform coefficients. Dequantization stage 404 dequantizes the quantized transform coefficients (i.e., derivative residual). Using header information decoded from compressed bitstream 320, decoder 400 can use intra/inter prediction stage 406 to create the same prediction block as was created in encoder 300, e.g., at intra/inter prediction stage 306. At reconstruction stage 408, the prediction block can be added to the dequantized transform coefficients (i.e., derivative residual) to create reconstructed transform coefficients. At inverse transform stage 410, the reconstructed transform coefficients can be inverse transformed to produce a reconstructed block that can be identical to the block created by inverse transform stage 316 in encoder 300. Loop filtering stage 412 can be applied to the reconstructed block to reduce blocking artifacts. Deblocking filtering stage 414 can be applied to the reconstructed block to reduce blocking distortion. Output video stream 416 can also be referred to as a decoded video stream and the terms will be used interchangeably herein.
Other variations of decoder 400 can be used to decode compressed bitstream 320. For example, decoder 400 can produce output video stream 416 without deblocking filtering stage 414.
Method of operation 500 can be implemented using specialized hardware or firmware. Some computing devices can have multiple memories, multiple processors, or both. The steps of method of operation 500 can be distributed using different processors, memories, or both. Use of the terms “processor” or “memory” in the singular encompasses computing devices that have one processor or one memory as well as devices that have multiple processors or multiple memories that can each be used in the performance of some or all of the recited steps.
Implementations of method of operation 500 can include, for example, receiving a frame of video data including a current block at a step 502, generating a set of transform coefficients for the current block at a step 504, generating a set of transform coefficients for previously coded pixel values at a step 506, determining a set of transform coefficients for a prediction block at a step 508, determining a residual based on a difference between the set of transform coefficients for the current block and the set of transform coefficients for the prediction block at a step 510, and encoding the residual at a step 512.
At step 502, a frame of video data having multiple blocks, including a current block, can be received by a computing device, such as transmitting station 102. Received, as used herein, includes acquired, obtained, read, or received in any manner whatsoever. The video data or stream can be received in any number of ways, such as by receiving the video data over a network, over a cable, or by reading the video data from a primary memory or other storage device, including a disk drive or removable media such as a CompactFlash (CF) card, Secure Digital (SD) card, or any other device capable of communicating video data. In some implementations, video data can be received from a video camera connected to the computing device.
At step 504, a set of transform coefficients can be generated for the current block using a two-dimensional transform. The set of transform coefficients can be arranged in, for example, a 4×4 block of transform coefficients. The two-dimensional transform can use any transform technique such as the examples described at transform stage 304 in
In some implementations, the two-dimensional transform can be applied in a row-column transform order, where the transform technique may be applied to at least one row of pixel values of the current block to determine an intermediate transform block, and to at least one column of the intermediate transform block to determine the set of transform coefficients.
In some implementations, the two-dimensional transform can be applied in a column-row transform order, where the transform technique may be applied to at least one column of pixel values of the current block to determine the intermediate transform block, and to at least one row of the intermediate transform block to determine the set of transform coefficients.
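The two transform orders described above can be sketched as follows. This is an illustrative example only: it assumes an orthonormal DCT as the transform technique and floating-point arithmetic, and the function names are hypothetical. Under those assumptions both orders produce the same coefficients up to rounding error; an implementation that rounds the intermediate transform block to integers may see small differences between the two orders.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(x)))
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d_row_column(block):
    # Rows first: the intermediate block holds row-transformed values.
    intermediate = [dct_1d(row) for row in block]
    return transpose([dct_1d(col) for col in transpose(intermediate)])


def dct_2d_column_row(block):
    # Columns first: the intermediate block holds column-transformed values.
    intermediate = transpose([dct_1d(col) for col in transpose(block)])
    return [dct_1d(row) for row in intermediate]


block = [[52, 55, 61, 66],
         [63, 59, 55, 90],
         [62, 59, 68, 113],
         [63, 58, 71, 122]]
a = dct_2d_row_column(block)
b = dct_2d_column_row(block)
same = all(math.isclose(a[i][j], b[i][j], abs_tol=1e-9)
           for i in range(4) for j in range(4))
```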
At step 506, a set of transform coefficients can be generated for previously coded pixel values using a one-dimensional transform. Data available for use during intra prediction can include previously coded pixel values. In some codec schemes, such as the schemes that use raster scanned coding, data available for use during intra prediction can include data in previously coded blocks in the frame. The previously coded blocks used for intra prediction can include, for example, blocks in rows above the current block and blocks to the left of the current block in the same row. For simplicity, the following examples are described using data in the row immediately above and the column immediately to the left of the current block. In other implementations, data from rows or columns not immediately adjacent to the current block, including data from blocks that are not adjacent to the current block, can be used to generate the set of transform coefficients for intra prediction.
The previously coded pixel values for use during intra prediction can be transformed into a set of transform coefficients (which may be referred to herein as “transform-domain predictors”) using the one-dimensional transform. The one-dimensional transform can use any transform technique such as the examples described at transform stage 304 in
At step 508, a set of transform coefficients for a prediction block can be determined using the set of transform coefficients for the previously coded pixel values. The prediction block, which may be referred to as the “transform-domain prediction block”, can be generated using a set of transform coefficients, such as the set of transform coefficients (i.e., transform-domain predictors) generated at step 506 for the row immediately above and the column immediately left of the current block.
The set of transform coefficients for the transform-domain prediction block can be generated using the examples shown in
At step 510, a residual (i.e., a transform-domain residual) can be determined based on the difference between the set of transform coefficients for the current block and the set of transform coefficients for the prediction block.
At step 512, the residual can be encoded. For example, the residual can be quantized at quantization stage 308, entropy coded at entropy encoding stage 310, and may be stored or transmitted in the encoded video stream 320.
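Steps 504 through 510 can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes an orthonormal DCT, a 4×4 block, and one example prediction rule in which the DC coefficient is the sum of the two DC predictors and the AC predictors are weighted by scalars kr and kc. The function and parameter names are hypothetical, and quantization and entropy coding (step 512) are omitted.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(x)))
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    rows = [dct_1d(row) for row in block]
    return transpose([dct_1d(col) for col in transpose(rows)])


def transform_domain_residual(block, top, left, kr=1.0, kc=1.0):
    n = len(block)
    cd = dct_2d(block)                      # step 504: 2-D transform of the block
    ct, cl = dct_1d(top), dct_1d(left)      # step 506: 1-D transform of neighbors
    cp = [[0.0] * n for _ in range(n)]      # step 508: prediction coefficients
    cp[0][0] = ct[0] + cl[0]                # assumed DC rule for this sketch
    for k in range(1, n):
        cp[0][k] = kr * ct[k]
        cp[k][0] = kc * cl[k]
    # step 510: residual is the coefficient-wise difference
    return [[cd[i][j] - cp[i][j] for j in range(n)] for i in range(n)]


# A flat block with matching neighbors is predicted exactly.
flat = [[100] * 4 for _ in range(4)]
residual = transform_domain_residual(flat, top=[100] * 4, left=[100] * 4)
```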
Method of operation 500 is depicted and described as a series of steps. However, steps in accordance with this disclosure can occur in various orders or concurrently. For example, the transform-domain predictors in step 506 can be generated before or concurrently with the set of transform coefficients for the current block in step 504. Additionally, steps in accordance with this disclosure may occur with other steps not presented and described herein. In one example, a group of coefficients (e.g., DC coefficients) from the transform-domain residual can be further transformed using a transform technique, which can be the same as or different from the two-dimensional transform used in step 504. Furthermore, not all illustrated steps may be required to implement a method of transform-domain intra prediction. For example, the two-dimensional transform used in step 504 can be replaced by any other transform such as, for example, a one-dimensional transform.
In this example, the current block 600 and the corresponding transform coefficient block 602 each have 4×4 elements (pixel values and transform coefficients, respectively), and each can be represented by a 4×4 matrix.
In the spatial domain, the current block 600 can be represented by a 4×4 matrix Dblk as follows:
Dblk=[Dblk(i,j)],i=0,1,2,3;j=0,1,2,3; (1)
where Dblk(i,j) is the pixel value for data element (i,j) in matrix Dblk.
A prediction block of the current block can be represented by a 4×4 matrix Pblk in the spatial domain as follows:
Pblk=[Pblk(i,j)],i=0,1,2,3;j=0,1,2,3; (2)
where Pblk(i,j) is the predicted pixel value for data element (i,j). Pblk(0,0) may be a DC component of prediction block Pblk. Pblk(0,1)-Pblk(3,3) may be AC components of prediction block Pblk.
A residual block can be represented by a 4×4 residual error matrix Rblk in the spatial domain as follows:
Rblk=[Rblk(i,j)]=[Dblk(i,j)−Pblk(i,j)],i=0,1,2,3;j=0,1,2,3; (3)
where Rblk(i,j) is the residual error value for data element (i,j) in matrix Rblk. In this example, the residual error value is the difference of pixel values between Dblk(i,j) of the current block and Pblk(i,j) of the prediction block.
Still in the spatial domain, an array DT of 4 pixels can be used to represent the data in a row immediately above the current block as follows:
DT=[Ti],i=0,1,2,3. (4)
An array DL of 4 pixels can be used to represent the data in a column immediately left of the current block as follows:
DL=[Lj],j=0,1,2,3. (5)
In addition, DTLC can be used to represent the pixel above and to the left of the current block. DTLC and arrays DT and DL, which may be previously coded pixel values, can be used during intra prediction for predicting the current block. When transform-domain intra prediction is used, the previously coded pixel values DTLC, DT, and DL, which may be referred to as “spatial predictors”, can be transformed using the one-dimensional transform described at step 506 into a set of transform-domain predictors CT, CL, TLC, an example of which is discussed below in equation (9).
In the transform domain, Dblk can be transformed into a transform coefficient block Cd (e.g., block 602 in
Cd=[Cd(i,j)]=DCT2(Dblk),i=0,1,2,3;j=0,1,2,3; (6)
where DCT2( ) is a two-dimensional DCT function.
In the transform domain, Pblk can be transformed into a 4×4 matrix Cp (i.e., transform-domain prediction block) as follows:
Cp=[Cp(i,j)]=DCT2(Pblk),i=0,1,2,3;j=0,1,2,3. (7)
In the transform domain, Rblk can be transformed into a 4×4 matrix Cr as follows:
Cr=[Cr(i,j)]=[Cd(i,j)−Cp(i,j)],i=0,1,2,3;j=0,1,2,3. (8)
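Equation (8) follows from the linearity of the transform: transforming the spatial residual Rblk of equation (3) gives the same result as subtracting Cp from Cd. The following sketch checks this numerically, assuming an orthonormal DCT for DCT2( ). The pixel values are arbitrary illustration data and the helper names are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(x)))
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    rows = [dct_1d(row) for row in block]
    return transpose([dct_1d(col) for col in transpose(rows)])


d_blk = [[52, 55, 61, 66],
         [63, 59, 55, 90],
         [62, 59, 68, 113],
         [63, 58, 71, 122]]
p_blk = [[50, 56, 60, 70],
         [50, 56, 60, 70],
         [50, 56, 60, 70],
         [50, 56, 60, 70]]

# Equation (3): spatial-domain residual.
r_blk = [[d_blk[i][j] - p_blk[i][j] for j in range(4)] for i in range(4)]

cd = dct_2d(d_blk)          # equation (6)
cp = dct_2d(p_blk)          # equation (7)
cr = dct_2d(r_blk)          # transform of the spatial residual

# Largest deviation between DCT2(Rblk) and Cd - Cp, per equation (8).
max_err = max(abs(cr[i][j] - (cd[i][j] - cp[i][j]))
              for i in range(4) for j in range(4))
```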
As shown in
CT=DCT(DT)=[CT0,CT1,CT2,CT3];
CL=DCT(DL)=[CL0,CL1,CL2,CL3]; (9)
where DCT( ) represents a one-dimensional 4-point DCT function; CT0 is a DC coefficient of array CT; CT1, CT2, CT3 are AC coefficients of array CT; CL0 is a DC coefficient of array CL; and CL1, CL2, CL3 are AC coefficients of array CL.
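As a worked illustration of equation (9), the following sketch computes CT and CL for two example arrays. It assumes that DCT( ) is an orthonormal 4-point DCT, in which case the DC coefficient equals half the sum of the four pixel values. The pixel values and the function name dct_1d are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(x)))
    return out


DT = [10, 20, 30, 40]   # row immediately above the current block
DL = [12, 12, 12, 12]   # column immediately left of the current block

CT = dct_1d(DT)         # [CT0, CT1, CT2, CT3]
CL = dct_1d(DL)         # [CL0, CL1, CL2, CL3]

# With this normalization, CT0 is sum(DT) / 2 and CL0 is sum(DL) / 2.
# A constant column such as DL has no AC energy, so CL1..CL3 are zero.
```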
Transform-domain predictor TLC can take the value of DTLC (the pixel value above and to the left of the current block) or a different value, such as a multiple of a scalar value. The transform-domain predictors CT, CL and TLC can be used for intra prediction of the transform coefficient block 602 represented by matrix Cd, as shown in
An array RAC can be used to represent scaled AC coefficients for transform coefficient array CT as follows:
RAC=[0,RAC1,RAC2,RAC3]=Kr*[0,CT1,CT2,CT3]; (10)
where Kr is a scalar value. For example, Kr can be set as −1, 0, 1, or 2.
An array CAC can be used to represent scaled AC coefficients for transform coefficient array CL as follows:
CAC=[0,CAC1,CAC2,CAC3]=Kc*[0,CL1,CL2,CL3]; (11)
where Kc is a scalar value. For example, Kc can be set as −1, 0, 1, or 2. In some implementations, Kr and Kc can be determined as the values that minimize prediction errors.
As shown in
Cp=[DC,Kr*CT1,Kr*CT2,Kr*CT3;Kc*CL1,0,0,0;Kc*CL2,0,0,0;Kc*CL3,0,0,0]. (12)
DC in equation (12) may indicate a DC coefficient of transform-domain prediction matrix Cp, such as Cp(0,0), which can be predicted using a combination of adjacent transform-domain predictors, such as TLC, CT0, and CL0. In one implementation, DC can be predicted by the following equation:
DC=CT0+CL0. (13)
In other implementations, a weighted combination of the transform-domain predictors can be used for generating the DC value.
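Equations (10) through (13) can be combined into a short routine that lays out the transform-domain prediction matrix Cp. The following is a sketch only; the function name prediction_block and its optional dc argument are hypothetical, and the coefficient values are arbitrary illustration data rather than the output of an actual transform.

```python
def prediction_block(ct, cl, kr, kc, dc=None):
    """Build Cp from transform-domain predictors CT and CL.

    ct, cl: 1-D transform coefficients of the row above and column to the left.
    kr, kc: scalar weights from equations (10) and (11).
    dc: optional DC value; defaults to CT0 + CL0 as in equation (13).
    """
    if dc is None:
        dc = ct[0] + cl[0]
    rac = [0.0] + [kr * c for c in ct[1:]]   # equation (10)
    cac = [0.0] + [kc * c for c in cl[1:]]   # equation (11)
    n = len(ct)
    cp = [[0.0] * n for _ in range(n)]
    cp[0] = [dc] + rac[1:]                   # first row: DC, then scaled row ACs
    for i in range(1, n):
        cp[i][0] = cac[i]                    # first column: scaled column ACs
    return cp


cp = prediction_block(ct=[50.0, -22.0, 0.0, -1.5],
                      cl=[24.0, 3.0, -2.0, 1.0],
                      kr=1, kc=2)
```

Only the first row and first column of Cp are populated, so a prediction mode in this family is described by the scalars Kr and Kc and the rule used for the DC coefficient.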
In some implementations, transform-domain intra prediction modes can correspond to spatial-domain intra prediction modes. Spatial-domain intra prediction modes can include, for example, DC prediction mode, horizontal prediction mode, vertical prediction mode, and TrueMotion prediction mode. In one implementation of DC prediction mode, a single value using the average of the pixels in a row above a current block and a column to the left of the current block can be used to predict the current block. In one implementation of horizontal prediction, each column of a current block can be filled with a copy of a column to the left of the current block. In one implementation of vertical prediction, each row of a current block can be filled with a copy of a row above the current block. In one implementation of TrueMotion prediction, in addition to the row above the current block and the column to the left of the current block, the pixel P above and to the left of the block may be used. Horizontal differences between pixels in the row above the current block (starting from P) can be propagated using the pixels from the column to the left of the current block to start each row. Other spatial-domain intra prediction modes can include, for example, a diagonal-down-left prediction mode, a diagonal-down-right prediction mode, a vertical-right prediction mode, a horizontal-down prediction mode, a vertical-left prediction mode, or a horizontal-up prediction mode.
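The four spatial-domain modes described above can be sketched as follows. This is an illustrative simplification with hypothetical function names: the DC mode uses an unrounded average, and the TrueMotion mode omits the clamping of predicted values to the valid pixel range that a practical codec may apply.

```python
def dc_prediction(top, left):
    """Fill the block with the average of the row above and the left column."""
    n = len(top)
    avg = (sum(top) + sum(left)) / (2 * n)
    return [[avg] * n for _ in range(n)]


def horizontal_prediction(left):
    """Fill each column of the block with a copy of the left column."""
    n = len(left)
    return [[left[i]] * n for i in range(n)]


def vertical_prediction(top):
    """Fill each row of the block with a copy of the row above."""
    return [list(top) for _ in top]


def truemotion_prediction(top, left, corner):
    """Start each row from the left pixel and add the horizontal
    differences of the row above, measured from the corner pixel P."""
    return [[left[i] + top[j] - corner for j in range(len(top))]
            for i in range(len(left))]


top = [10, 20, 30, 40]
left = [12, 14, 16, 18]
corner = 8

dc = dc_prediction(top, left)
hor = horizontal_prediction(left)
ver = vertical_prediction(top)
tm = truemotion_prediction(top, left, corner)
```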
When implementing transform-domain intra prediction, a transform-domain intra prediction mode can be implemented to correspond to a spatial-domain intra prediction mode, such as horizontal prediction mode, vertical prediction mode, TrueMotion prediction mode, or any other spatial-domain intra prediction mode.
For example, if Kr and Kc in (12) are both set to zero, transform-domain prediction block Cp may provide a prediction result equivalent to that of a DC prediction mode.
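The DC-mode equivalence can be checked numerically. The sketch below assumes orthonormal DCTs, an unrounded average for the spatial DC prediction, and the DC rule of equation (13); the pixel values and helper names are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(x)))
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    rows = [dct_1d(row) for row in block]
    return transpose([dct_1d(col) for col in transpose(rows)])


top = [10, 20, 30, 40]
left = [12, 14, 16, 18]

# Spatial-domain DC prediction: every pixel is the average of the 8 neighbors.
avg = (sum(top) + sum(left)) / 8
spatial = dct_2d([[avg] * 4 for _ in range(4)])

# Transform-domain prediction with Kr = Kc = 0 and DC = CT0 + CL0.
ct, cl = dct_1d(top), dct_1d(left)
cp = [[0.0] * 4 for _ in range(4)]
cp[0][0] = ct[0] + cl[0]

equivalent = all(math.isclose(spatial[i][j], cp[i][j], abs_tol=1e-9)
                 for i in range(4) for j in range(4))
```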
In another example, the transform-domain prediction block Cp may be equivalent to a prediction block generated using the vertical prediction mode, and may be expressed in the following equations:
Kr=2;
Kc=0;
DC=2*CT0;
Cp=2*[CT0,CT1,CT2,CT3;0,0,0,0;0,0,0,0;0,0,0,0]. (14)
In another example, the transform-domain prediction block Cp may be equivalent to a prediction block generated using the horizontal prediction mode, and may be expressed in the following equations:
Kr=0;
Kc=2;
DC=2*CL0;
Cp=2*[CL0,0,0,0;CL1,0,0,0;CL2,0,0,0;CL3,0,0,0]. (15)
In another example, the transform-domain prediction block Cp may be equivalent to a prediction block generated using the TrueMotion prediction mode, and may be expressed in the following equations:
Kr=2;
Kc=2;
DC=2*(CT0+CL0−2*TLC);
Cp=2*[(CT0+CL0−2*TLC),CT1,CT2,CT3;CL1,0,0,0;CL2,0,0,0;CL3,0,0,0]. (16)
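The matrices in (14), (15), and (16) can be checked against the two-dimensional transform of the corresponding spatial-domain prediction blocks. The sketch below assumes orthonormal DCTs, TLC equal to the pixel value DTLC, and a TrueMotion prediction without clamping; the pixel values and helper names are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(x)))
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    rows = [dct_1d(row) for row in block]
    return transpose([dct_1d(col) for col in transpose(rows)])


def cp_matrix(dc, row_ac, col_ac):
    """Lay out Cp as in (12): DC, then AC terms in the first row and column."""
    n = len(row_ac) + 1
    cp = [[0.0] * n for _ in range(n)]
    cp[0] = [dc] + list(row_ac)
    for i, v in enumerate(col_ac, start=1):
        cp[i][0] = v
    return cp


def matches(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


top, left, tlc = [10, 20, 30, 40], [12, 14, 16, 18], 8
ct, cl = dct_1d(top), dct_1d(left)
zeros = [0.0, 0.0, 0.0]

# Spatial-domain prediction blocks.
vertical = [list(top) for _ in range(4)]
horizontal = [[left[i]] * 4 for i in range(4)]
truemotion = [[left[i] + top[j] - tlc for j in range(4)] for i in range(4)]

# Transform-domain prediction blocks.
cp_v = cp_matrix(2 * ct[0], [2 * c for c in ct[1:]], zeros)            # (14)
cp_h = cp_matrix(2 * cl[0], zeros, [2 * c for c in cl[1:]])            # (15)
cp_tm = cp_matrix(2 * (ct[0] + cl[0] - 2 * tlc),
                  [2 * c for c in ct[1:]], [2 * c for c in cl[1:]])    # (16)

ok_v = matches(dct_2d(vertical), cp_v)
ok_h = matches(dct_2d(horizontal), cp_h)
ok_tm = matches(dct_2d(truemotion), cp_tm)
```

The factor of 2 in each matrix comes from the orthonormal 4-point transform, whose DC term scales a constant run of four values by 2; a different normalization would change that factor.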
In some implementations, a transform-domain prediction mode can be a prediction mode that is not an equivalent of any spatial-domain prediction mode. Six more examples of transform-domain prediction modes, each represented by a different transform-domain prediction block Cp, are shown below from (17) to (22):
Kr=2;
Kc=1;
DC=2*CT0;
Cp=[2*CT0,2*CT1,2*CT2,2*CT3;CL1,0,0,0;CL2,0,0,0;CL3,0,0,0]. (17)
Kr=1;
Kc=2;
DC=2*CL0;
Cp=[2*CL0,CT1,CT2,CT3;2*CL1,0,0,0;2*CL2,0,0,0;2*CL3,0,0,0]. (18)
Kr=1;
Kc=1;
DC=CT0+CL0;
Cp=[CT0+CL0,CT1,CT2,CT3;CL1,0,0,0;CL2,0,0,0;CL3,0,0,0]. (19)
Kr=−1;
Kc=1;
DC=CT0+CL0;
Cp=[CT0+CL0,−CT1,−CT2,−CT3;CL1,0,0,0;CL2,0,0,0;CL3,0,0,0]. (20)
Kr=1;
Kc=−1;
DC=CT0+CL0;
Cp=[CT0+CL0,CT1,CT2,CT3;−CL1,0,0,0;−CL2,0,0,0;−CL3,0,0,0]. (21)
Kr=−1;
Kc=−1;
DC=CT0+CL0;
Cp=[CT0+CL0,−CT1,−CT2,−CT3;−CL1,0,0,0;−CL2,0,0,0;−CL3,0,0,0]. (22)
Using arrays RAC and CAC in (10) and (11), the transform-domain prediction matrix Cp can also be represented as follows:
Cp=[DC,RAC1,RAC2,RAC3;CAC1,0,0,0;CAC2,0,0,0;CAC3,0,0,0]. (23)
In some implementations, a data item in matrix Cp, such as RAC1, RAC2, CAC1, can be adjusted by an independent scalar factor to provide better predictions under different scenarios. During implementation, the overhead incurred by using multiple transform-domain prediction modes can be balanced with quality of prediction results to achieve a desirable outcome.
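One way to choose among several transform-domain prediction modes is to evaluate each candidate and keep the one with the smallest residual. The sketch below is an assumption-laden illustration, not the method of this disclosure: it uses the sum of absolute residual coefficients as the cost, assumes orthonormal DCTs, and tabulates only modes that do not need TLC. The mode names, the MODES table layout, and choose_mode are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(x)))
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    rows = [dct_1d(row) for row in block]
    return transpose([dct_1d(col) for col in transpose(rows)])


# name: (Kr, Kc, rule for the DC coefficient given CT0 and CL0)
MODES = {
    "dc":         (0, 0, lambda ct0, cl0: ct0 + cl0),
    "vertical":   (2, 0, lambda ct0, cl0: 2 * ct0),     # (14)
    "horizontal": (0, 2, lambda ct0, cl0: 2 * cl0),     # (15)
    "mode_17":    (2, 1, lambda ct0, cl0: 2 * ct0),
    "mode_18":    (1, 2, lambda ct0, cl0: 2 * cl0),
    "mode_19":    (1, 1, lambda ct0, cl0: ct0 + cl0),
    "mode_20":    (-1, 1, lambda ct0, cl0: ct0 + cl0),
    "mode_21":    (1, -1, lambda ct0, cl0: ct0 + cl0),
    "mode_22":    (-1, -1, lambda ct0, cl0: ct0 + cl0),
}


def choose_mode(block, top, left):
    """Return the lowest-cost mode name and the cost of every mode."""
    n = len(block)
    cd = dct_2d(block)
    ct, cl = dct_1d(top), dct_1d(left)
    costs = {}
    for name, (kr, kc, dc_rule) in MODES.items():
        cp = [[0.0] * n for _ in range(n)]
        cp[0][0] = dc_rule(ct[0], cl[0])
        for k in range(1, n):
            cp[0][k] = kr * ct[k]
            cp[k][0] = kc * cl[k]
        # Assumed cost: sum of absolute transform-domain residuals.
        costs[name] = sum(abs(cd[i][j] - cp[i][j])
                          for i in range(n) for j in range(n))
    return min(costs, key=costs.get), costs


top = [10, 20, 30, 40]
left = [50, 60, 70, 80]
block = [list(top) for _ in range(4)]   # every row repeats the row above
best, costs = choose_mode(block, top, left)
```

A practical encoder would typically also account for the bits needed to signal the chosen mode, which is the overhead trade-off noted above.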
Implementations of decoding the encoded video stream can include, for example, receiving an encoded video stream at a step 802, identifying an encoded block from an encoded frame at a step 804, decoding a residual from the encoded block at a step 806, generating a set of transform coefficients for previously decoded pixel values at a step 808, generating, using the set of transform coefficients for the previously decoded pixel values, a set of transform coefficients for a prediction block at a step 810, determining a set of transform coefficients for a derived current block based on a sum of the residual and the set of transform coefficients for the prediction block at a step 812, and inverse transforming the set of transform coefficients to generate a derived current block at a step 814.
At step 802, a computing device such as receiving station 110 may receive an encoded video stream, such as compressed bitstream 320. The encoded video stream (which may be referred to herein as the encoded video data) can be received in any number of ways, such as by receiving the video data over a network, over a cable, or by reading the video data from a primary memory or other storage device, including a disk drive or removable media such as a DVD, CompactFlash (CF) card, Secure Digital (SD) card, or any other device capable of communicating a video stream.
At step 804, an encoded block may be identified from an encoded frame in the encoded video stream. The encoded frame can be identified in the encoded video data. The terms “identifies”, “identify”, or “identified” as used herein include to select, construct, determine, or specify in any manner whatsoever.
The encoded block can be, for example, a block that has been encoded at encoder 300 using any of the prediction techniques described herein, such as a prediction mode in the spatial domain or the transform domain. In one example, the block can be encoded using vertical prediction mode. In another example, the block can be encoded using any of the transform-domain prediction modes shown in
At step 806, a residual can be decoded from the encoded block. The residual can be decoded using decoding stages such as entropy decoding stage 402 and dequantization stage 404 shown in
At step 808, a set of transform coefficients (i.e., transform-domain predictors) can be generated for previously decoded pixel values. The previously decoded pixel values can include data in previously decoded blocks in the frame, which can include, for example, blocks in rows above the current block and blocks to the left of the current block in the same row. In some implementations, data in the row immediately above and the column immediately left of the current block, such as DT and DL shown in
The transform-domain predictors can be generated using, for example, a one-dimensional transform, which can use any transform technique such as, for example, DCT or WHT. The one-dimensional transform used at step 808 can be similar to the one-dimensional transform described at step 506.
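As an illustration of step 808, the following is a minimal sketch of a one-dimensional DCT and a one-dimensional WHT applied to previously decoded neighboring pixels. The function names, the orthonormal scaling, the 4-sample block size, and the pixel values are assumptions chosen for illustration and are not specified by this disclosure.

```python
import math

def dct_1d(x):
    """Orthonormal 1D DCT-II of a sequence of samples (assumed scaling)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out

def wht_1d(x):
    """Orthonormal 1D Walsh-Hadamard transform in natural order.
    The length of x must be a power of two."""
    n = len(x)
    out = list(x)
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return [v / math.sqrt(n) for v in out]

# Hypothetical previously decoded neighbors of a 4x4 current block.
top_row = [100, 102, 104, 106]   # row immediately above the block
left_col = [100, 100, 100, 100]  # column immediately left of the block

# Transform-domain predictors for the two neighbor vectors.
top_coeffs = dct_1d(top_row)
left_coeffs = dct_1d(left_col)
```

In this sketch a flat neighbor vector such as `left_col` produces only a DC coefficient, which is one reason a transform-domain predictor can be compact.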
At step 810, a set of transform coefficients can be generated for a prediction block using the set of transform coefficients for the previously decoded pixel values. The transform coefficients for the prediction block can be determined from the transform-domain predictors using any transform-domain prediction technique, such as the transform-domain prediction modes shown in
At step 812, a set of transform coefficients can be determined for a derived current block based on a sum of the decoded residual, such as the decoded residual from step 806, and the set of transform coefficients for the prediction block, such as the set of transform coefficients generated at step 810.
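Steps 810 and 812 can be sketched as follows for one assumed case. The sketch assumes a vertical-style mode in which every row of the prediction block repeats the row above the block, so that the two-dimensional DCT of the prediction has nonzero values only in its first coefficient row, equal to the one-dimensional DCT of the top row scaled by the square root of the block size. This mode, the orthonormal scaling, the helper names, and the residual values are illustrative assumptions and do not reproduce the specific transform-domain prediction modes of this disclosure.

```python
import math

def dct_1d(x):
    """Orthonormal 1D DCT-II of a sequence of samples (assumed scaling)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out

def vertical_prediction_coeffs(top_row):
    """2D DCT coefficients of a block whose every row repeats top_row.
    Each column is constant, so only vertical frequency 0 is nonzero."""
    n = len(top_row)
    coeffs = [[0.0] * n for _ in range(n)]
    coeffs[0] = [math.sqrt(n) * c for c in dct_1d(top_row)]
    return coeffs

def add_coeffs(residual, prediction):
    """Element-wise sum of residual and prediction coefficients (step 812)."""
    return [[r + p for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(residual, prediction)]

# Hypothetical inputs: decoded neighbors and a dequantized residual.
top_row = [100, 102, 104, 106]
residual = [[0.0] * 4 for _ in range(4)]
residual[0][0] = 8.0    # hypothetical DC correction
residual[1][0] = -4.0   # hypothetical low-frequency vertical correction

prediction = vertical_prediction_coeffs(top_row)
current_coeffs = add_coeffs(residual, prediction)
```

Because the sum is formed in the transform domain, no inverse transform of the prediction is needed before the residual is added.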
At step 814, the set of transform coefficients for the derived current block may be inverse transformed. The derived current block can be generated using, for example, a two-dimensional inverse transform, which can include any transform technique, such as inverse DCT or inverse WHT. The two-dimensional inverse transform can be applied in the row-column order, the column-row order, or a combination thereof.
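The ordering choice at step 814 can be illustrated with a separable inverse DCT. The sketch below is a minimal example assuming an orthonormal DCT and a 4x4 block; the function names and coefficient values are hypothetical. It applies the one-dimensional inverse transform in row-column order and in column-row order so that the two results can be compared.

```python
import math

def idct_1d(c):
    """Inverse of the orthonormal 1D DCT-II (assumed scaling)."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c[k] * math.cos(
                math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out

def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def idct_2d_row_column(coeffs):
    """Inverse transform each row first, then each column."""
    rows_done = [idct_1d(row) for row in coeffs]
    return transpose([idct_1d(col) for col in transpose(rows_done)])

def idct_2d_column_row(coeffs):
    """Inverse transform each column first, then each row."""
    cols_done = transpose([idct_1d(col) for col in transpose(coeffs)])
    return [idct_1d(row) for row in cols_done]

# Hypothetical coefficients for a derived current block.
coeffs = [[0.0] * 4 for _ in range(4)]
coeffs[0][0] = 400.0  # DC term
coeffs[0][1] = 10.0   # a horizontal-frequency term
coeffs[2][0] = -6.0   # a vertical-frequency term

block_rc = idct_2d_row_column(coeffs)
block_cr = idct_2d_column_row(coeffs)
```

In floating-point arithmetic the two orders agree up to rounding error. A fixed-point implementation may round intermediate results, in which case an encoder and a decoder would generally need to use the same order to stay in sync.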
A frame can be reconstructed from the blocks derived from the inverse transformed coefficients, or the predicted values by intra or inter prediction, or both. For example, the frame can be reconstructed from the current block derived at step 814 using decoding stages such as loop filtering stage 412 and deblocking filtering stage 414 shown in
Method of operation 800 is depicted and described as a series of steps. However, steps in accordance with this disclosure can occur in various orders or concurrently. Additionally, steps in accordance with this disclosure may occur with other steps not presented and described herein. Furthermore, not all illustrated steps may be required to implement a method in accordance with the disclosed subject matter.
The implementations of encoding and decoding described above illustrate some exemplary encoding and decoding techniques. However, “encoding” and “decoding”, as those terms are used herein, could mean compression, decompression, transformation, or any other processing or change of data.
The words “example” or “exemplary” are used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the words “example” or “exemplary” is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X includes A or B” is intended to mean any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then “X includes A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form. Moreover, use of the term “an implementation” or “one implementation” throughout is not intended to mean the same implementation unless described as such.
A computing device implementing the techniques disclosed herein (and the algorithms, methods, instructions, etc. stored thereon and/or executed thereby) can be realized in hardware, software, or any combination thereof including, for example, IP cores, ASICs, programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors or any other suitable circuit or other information processing device, now existing or hereafter developed. In the claims, the term “processor” should be understood as encompassing any of the foregoing hardware, either singly or in combination. The terms “signal” and “data” are used interchangeably.
Further, in some implementations, for example, the techniques described herein can be implemented using a general purpose computer/processor with a computer program that, when executed, carries out any of the respective methods, algorithms and/or instructions described herein. In addition or alternatively, for example, a special purpose computer/processor can be utilized which can contain specialized hardware for carrying out any of the methods, algorithms, or instructions described herein.
In some implementations, transmitting station 102 and receiving station 110 can, for example, be implemented on computers in a screencasting system. Alternatively, transmitting station 102 can be implemented on a server and receiving station 110 can be implemented on a device separate from the server, such as a hand-held communications device (e.g., a cell phone). In this instance, transmitting station 102 can encode content using an encoder 300 into an encoded video signal and transmit the encoded video signal to the communications device. In turn, the communications device can then decode the encoded video signal using a decoder 400. Alternatively, the communications device can decode content stored locally on the communications device, i.e., content that was not transmitted by transmitting station 102. Other suitable implementation schemes for transmitting station 102 and receiving station 110 are available. For example, receiving station 110 can be a generally stationary personal computer rather than a portable communications device, and/or a device including an encoder 300 may also include a decoder 400.
Further, all or a portion of implementations of the present invention can take the form of a computer program product accessible from, for example, a computer-usable or computer-readable medium. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or a semiconductor device. Other suitable mediums are also available.
The above-described embodiments, implementations and aspects have been described in order to allow easy understanding of the present invention and do not limit the present invention. On the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which scope is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structure as is permitted under the law.
Other embodiments or implementations may be within the scope of the following claims.
Number | Date | Country | |
---|---|---|---|
20140044166 A1 | Feb 2014 | US |