The disclosure relates generally to video and image compression.
Digital data, such as digital images, moving picture images or videos, contain a large amount of information (represented by pixels) that may need to be transformed. Transforming this information reduces the size of the digital data, such as images and videos, so that the digital data may be compressed. Thus, it is desirable to have a way in which that compressed digital data may be transformed.
The disclosure is particularly applicable to a transform system and method for compressed digital data, such as image data or video data, and it is in this context that the disclosure will be described. It will be appreciated, however, that the transform has greater utility since it may be used for other types of data that need to be transformed so that the digital data can be compressed more efficiently.
The transform system 104 may be implemented in hardware or software or a combination of hardware and software. For example, the transform system 104 may be implemented in a processor that executes a plurality of lines of instructions or microcode to implement the processes described below in
The transform information model may also be a matrix of values (as described below) that may be used to perform a mathematical function, such as division or multiplication, on the coefficients in each block of the incoming data. A transform information model for re-quantization is one example of such a model. For example, the transform information model for re-quantization may be used to divide the DCT coefficients and may further select a new quantization and scaling matrix; quantization and scaling are both part of the information necessary in the H.264 stream to reconstruct a video when it is decoded.
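As a rough illustration of a matrix-valued transform information model used for re-quantization, the sketch below divides each coefficient of a block by the matching entry of a model matrix. The function name `requantize_block`, the rounding toward zero, and the sample values are assumptions made for illustration; the disclosure only states that the coefficients may be divided or multiplied.

```python
def requantize_block(block, model):
    """Divide each coefficient by the matching model entry.

    `block` and `model` are lists of rows with identical shapes.
    Rounding toward zero is an assumption; the text only says "dividing".
    """
    if len(block) != len(model) or any(
        len(row) != len(mrow) for row, mrow in zip(block, model)
    ):
        raise ValueError("block and model must have the same shape")
    return [
        [int(c / q) for c, q in zip(row, mrow)]  # int() truncates toward zero
        for row, mrow in zip(block, model)
    ]


# Hypothetical 2x2 values, chosen only to keep the example short.
example_block = [[52, -10], [7, 3]]
example_model = [[2, 4], [4, 8]]
requantized = requantize_block(example_block, example_model)
```

Larger divisors for higher frequencies push more coefficients to zero, which is what makes the re-encoded block smaller.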
In the transform method implementation, the values in the transform information model may be used to modify the values in the incoming compressed data stream. In some embodiments, the incoming compressed data stream may be partially decoded into a plurality of blocks of data values and the values in the transform information model may be used to modify the values in each block of data values. In one implementation, the numerical values stored in the transform information model may be subtracted from a given block in the compressed incoming data stream, such as an 8×8 DCT block in a DCT compressed incoming data stream, and the resulting image is visually indistinguishable from the image or video in the incoming data stream. The transform information model may have many different sets of values, each applicable under different circumstances as described in detail below.
Once the incoming data stream is partially decoded, the method selects the most applicable transform information model for each block in the partially decoded data stream (304), as described below in more detail. For example, in a transform in which the frequencies in a DCT block are being modified, if the incoming DCT block has a relatively flat visual characteristic (so-called “low gradience”), a specific transform information model may be selected for it such as the example of the transform information model in
The system may have a number of transform information models that are stored and available or generated and available. The transform information models may include transforms that subtract a certain class of transform information models from the DCT coefficients in each block and transforms that apply a range of re-quantizations to the DCT coefficients in each block. Other possible transforms may be non-linear transformations that test the values (of the DCT coefficients in the block) against certain thresholds and modify the values by adding or subtracting a constant. Furthermore, a combination of the above transforms may be used to output a modified set of values in the block. In an example of the operation, a block from the incoming data may have a slowly changing gradient in the background along with other features. In this example, the system decides, based on statistical properties of the block, that no re-quantization is required, that a particular transform information model needs to be subtracted from the values in the block, and that some of the values in the block need to be modified non-linearly by adding or subtracting a constant. The choice of transforms is typically made to minimize the size of the block while minimizing artifacts visible to the human eye.
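The non-linear transform described above can be sketched as follows. This is a minimal illustration, assuming that a coefficient whose magnitude exceeds a threshold is moved toward zero by a fixed constant; the function name `nonlinear_adjust`, the direction of the adjustment, and the sample numbers are hypothetical and are not specified by the disclosure.

```python
def nonlinear_adjust(block, threshold, constant):
    """Move coefficients whose magnitude exceeds `threshold` toward zero.

    Positive values have `constant` subtracted and negative values have it
    added; coefficients at or below the threshold are left unchanged.
    """
    adjusted = []
    for row in block:
        new_row = []
        for c in row:
            if abs(c) > threshold:
                new_row.append(c - constant if c > 0 else c + constant)
            else:
                new_row.append(c)
        adjusted.append(new_row)
    return adjusted


# Hypothetical 2x2 block: only the two large coefficients are touched.
adjusted_block = nonlinear_adjust([[40, -35], [5, -2]], threshold=30, constant=4)
```

A transform of this kind could be chained with model subtraction or re-quantization, in line with the combination of transforms described in the text.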
Once the transform information model for each block is selected, the transform information model data is used to modify the block in the incoming partially decoded data stream (306). For example, in an implementation in which DCT coefficient values are being modified, the transform information model data may be removed from the block. In one embodiment in which the DCT block values are modified, if the absolute value of a DCT coefficient in the incoming data stream is smaller than the absolute value of the corresponding entry in the transform information model, the corresponding DCT coefficient in the output data stream is set to 0. In the same embodiment, if the absolute value of the DCT coefficient in the incoming data stream is not smaller than the absolute value in the transform information model, the absolute value of the DCT coefficient in the incoming data stream is reduced to the corresponding value in the transform information model, while maintaining the sign of the DCT coefficient in the original data stream.
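The per-coefficient rule in this embodiment can be sketched as below. The sketch reads the reduction as subtracting the model magnitude from the coefficient magnitude, in line with the subtraction described earlier for the transform method implementation; that reading, the function name `apply_model`, and the sample values are assumptions.

```python
def apply_model(block, model):
    """Apply a transform information model to one block of coefficients.

    A coefficient whose magnitude is smaller than the model magnitude is set
    to 0. Otherwise its magnitude is reduced by the model magnitude (an
    assumed reading of the text) and its original sign is kept.
    """
    output = []
    for row, mrow in zip(block, model):
        new_row = []
        for c, m in zip(row, mrow):
            magnitude, limit = abs(c), abs(m)
            if magnitude < limit:
                new_row.append(0)
            else:
                reduced = magnitude - limit
                new_row.append(reduced if c >= 0 else -reduced)
        output.append(new_row)
    return output


# Hypothetical 2x2 block and model.
modified_block = apply_model([[120, -3], [-20, 2]], [[0, 4], [5, 2]])
```

Small coefficients collapse to zero, which shortens the entropy-coded representation, while large coefficients keep their sign and most of their magnitude.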
Once the values have been modified, such as by removing the redundancy in the DCT coefficients, the method restores the frequency (308). Specifically, for coefficients whose redundancy removal has introduced an unacceptable visual difference in the block (assessed using, for example, the method described below), the method restores those coefficients in the DCT block to their original values from the incoming DCT block. Restoration of the values may also occur with other transform information models. For example, if the quantization parameters of the values in the block have changed, then the restoration is performed to account for this change.
Once the frequency restoration has been completed, the data stream may be re-encoded (310) using the data values modified based on the transform information model. In the implementation in which the DCT coefficients are modified, the data stream may be re-encoded with the DCT coefficients modified by the method (including possibly some original coefficients as well as some coefficients from the transform information model).
Selecting the Most Applicable Transform Information Model
The appropriate transform information model, such as the RIM in the case of frequency reduction, for each block may be picked using one or more of the following criteria that may be assessed, for example, by the data stream processor 202 shown in
1. Size of the DCT block.
2. Statistical properties of the DCT block (AC energy, variance, etc.).
3. Statistical properties of the fully decoded pixels represented by the DCT block (average value of the pixels, variance of the pixels).
4. Other properties of the data stream (such as whether it is a Standard Definition (SD) or a High Definition (HD) image).
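The criteria listed above could be gathered per block along the following lines. This is a sketch under stated assumptions: AC energy is taken as the sum of squared coefficients excluding the DC term at position (0, 0), which is a common convention rather than a definition given in the disclosure, and the function and key names are hypothetical.

```python
def block_criteria(dct_block, is_hd):
    """Collect selection criteria for one square DCT block."""
    flat = [c for row in dct_block for c in row]
    # AC energy: all squared coefficients minus the squared DC term (assumed).
    ac_energy = sum(c * c for c in flat) - dct_block[0][0] ** 2
    mean = sum(flat) / len(flat)
    variance = sum((c - mean) ** 2 for c in flat) / len(flat)
    return {
        "size": len(dct_block),   # criterion 1: side length of the block
        "ac_energy": ac_energy,   # criterion 2: statistical properties
        "variance": variance,
        "is_hd": is_hd,           # criterion 4: SD versus HD stream
    }


# Hypothetical 2x2 block from an HD stream.
criteria = block_criteria([[10, 2], [-1, 0]], is_hd=True)
```

Criterion 3 (statistics of the fully decoded pixels) would need an inverse DCT and is left out to keep the sketch short.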
For example, if the incoming DCT block indicates a very high AC energy but is low on luminance, a collection of transforms (collectively called a transform model, which will include RIM subtraction, re-quantization, and non-linear modification) will be selected that is more aggressive in what is subtracted from the block. In other cases, a transform model that makes minimal changes might be selected. Moreover, for an HD stream, a typical transform model will have different values for the higher frequency components than for an SD stream because the human eye is expected to behave differently. When it is known that an SD stream is going to be displayed on a High Definition device, the transform model will anticipate the scaling-up of the image and be less aggressive on the mid-range and high frequencies.
In the system, one or more different transform information models may be constructed for various combinations of these criteria. For each different type of transformation (frequency reduction, noise reduction, re-quantization values, etc.), the criteria above may be computed when a block of the partially decoded incoming data stream is processed, and the one or more criteria for the particular transformation use case may be used to select the transform information model for the particular block. For example, for frequency reduction using the DCT coefficients, the appropriate criteria are computed when a DCT block is processed (after partial decoding), and the combination of the criteria in the particular block points to the RIM that needs to be selected for that block. For instance, when a DCT block with low variance and low gradience (see below, where ‘low’ means less than certain pre-determined thresholds) is processed, the RIM that was constructed for this type of instance is used. The transform information models can vary depending upon whether the streams are Standard Definition or High Definition.
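One way to picture the selection step is a lookup keyed by the thresholded criteria. The sketch below is only an illustration: the threshold values, the key layout, the model names, and the fallback entry are all hypothetical, and the disclosure does not prescribe a lookup table.

```python
def select_model(variance, gradience, is_hd, models,
                 variance_threshold=50.0, gradience_threshold=6.0):
    """Pick a model name from `models` using thresholded block criteria.

    'low' means below the given threshold, as in the text; the default
    threshold values here are placeholders, not values from the disclosure.
    """
    key = (
        "low" if variance < variance_threshold else "high",
        "low" if gradience < gradience_threshold else "high",
        "HD" if is_hd else "SD",
    )
    # Fall back to a model that makes minimal changes when no entry matches.
    return models.get(key, models["default"])


# Hypothetical table of pre-built models, identified by name only.
available_models = {
    ("low", "low", "SD"): "rim_flat_sd",
    ("low", "low", "HD"): "rim_flat_hd",
    "default": "rim_minimal",
}
chosen = select_model(10.0, 2.0, True, available_models)
```

In practice each entry would hold a matrix of values rather than a name, and separate tables could exist for frequency reduction, noise reduction, and re-quantization.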
Frequency Restoration Criteria for Frequency Reduction Transformation
The human eye is very sensitive to certain types of compression artifacts (such as blockiness and contouring). For certain types of DCT blocks, the selected RIM may cause a worsening of those compression artifacts. For example, image blocks that have slowly changing gradients may be particularly susceptible to an immediate worsening of compression artifacts.
The identification of these types of DCT blocks may be accomplished by computing the variance of the first derivative of the pixels represented by the DCT block, a quantity that may be known as ‘gradience.’ The gradience may be calculated, for example, by computing the variance of the values derived by subtracting the value of each pixel from the values of its left neighbor and its top neighbor, respectively.
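The gradience calculation can be sketched as follows. The sketch pools the horizontal and vertical differences into a single list and uses the population variance; both choices are assumptions, since the text does not say whether the two directions are pooled or which variance estimator is used.

```python
def gradience(pixels):
    """Variance of left-neighbor and top-neighbor differences in a pixel block."""
    diffs = []
    for y, row in enumerate(pixels):
        for x, value in enumerate(row):
            if x > 0:
                diffs.append(row[x - 1] - value)         # left neighbor minus pixel
            if y > 0:
                diffs.append(pixels[y - 1][x] - value)   # top neighbor minus pixel
    if not diffs:
        return 0.0  # a single pixel has no neighbors to compare against
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)


# A perfectly flat block has zero gradience.
flat_gradience = gradience([[5, 5], [5, 5]])
```

Blocks with a slowly changing background produce small, similar differences and therefore a low gradience, which is what flags them for frequency restoration.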
When the gradience is less than a certain threshold (which is treated as a parameter to the system and method), the DCT block is selected for Frequency Restoration. The gradience of the block may range from 0 to 255 and may be clipped to 255 if the gradience value is greater than 255. The gradience thresholds may also be determined empirically, may be application dependent, and may range between 0 and 12. Depending upon the specific value of the gradience and the size of the DCT block, one or more frequencies in the DCT block are restored to their original incoming values. For example, applications requiring a high amount of visual fidelity will use very low values of gradience, and applications requiring lower visual fidelity will allow for higher values of gradience. In low gradience DCT blocks, for example, the frequencies in the lower range are restored if the visual impact of the RIM is unacceptable. Visual impact is usually assessed by determining whether the frequency in that range has been set to 0, but any other reasonable metric can also be used.
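The restoration rule for low gradience blocks might look like the sketch below. It assumes that the "lower range" means coefficients whose row index plus column index does not exceed a parameter `low_range`, and that the visual impact is unacceptable when a nonzero coefficient in that range was set to 0; the function name and both parameters are hypothetical.

```python
def restore_low_frequencies(original, modified, gradience, threshold, low_range):
    """Restore zeroed low-frequency coefficients in a low gradience block."""
    restored = [row[:] for row in modified]   # never mutate the caller's block
    clipped = min(gradience, 255)             # gradience is clipped to 255
    if clipped >= threshold:
        return restored                       # block not selected for restoration
    for u, row in enumerate(modified):
        for v, value in enumerate(row):
            in_low_range = u + v <= low_range  # assumed meaning of "lower range"
            if in_low_range and value == 0 and original[u][v] != 0:
                restored[u][v] = original[u][v]
    return restored


# Hypothetical 2x2 block whose AC coefficients were all zeroed by the model.
incoming = [[100, 3], [2, 1]]
after_model = [[100, 0], [0, 0]]
result = restore_low_frequencies(incoming, after_model, 1.5, threshold=4, low_range=1)
```

With a gradience above the threshold the modified block is returned unchanged, so the size reduction is kept wherever the block is not at risk of contouring.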
Noise Reduction Transforms and Re-Quantization Transforms
Similar to the frequency reduction embodiment described above, the transform system may also be used to modify the values of a block of the partially decoded incoming data stream for noise reduction and re-quantization. The transform information models for noise reduction or re-quantization may similarly be a block of values, namely a square matrix in which each side is an integer power of 2, whose values are used to modify the values of each block in the partially decoded incoming data stream. As with frequency reduction, the transform information models for noise reduction or re-quantization may be selected based on some or all of the same criteria and then used to modify the values in the blocks of the data stream, and the data stream may then be re-compressed.
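The shape constraint on these models (a square matrix whose side is an integer power of 2) can be checked with a few lines. The helper name `is_valid_model_shape` is hypothetical, and treating a 1×1 matrix as valid (2 to the power 0) is an assumption.

```python
def is_valid_model_shape(model):
    """Return True when `model` is square and its side is a power of 2."""
    side = len(model)
    # A positive integer is a power of 2 when it has exactly one bit set.
    is_power_of_two = side > 0 and (side & (side - 1)) == 0
    return is_power_of_two and all(len(row) == side for row in model)


# An 8x8 model, matching the 8x8 DCT block size mentioned in the text.
eight_by_eight = [[0] * 8 for _ in range(8)]
shape_ok = is_valid_model_shape(eight_by_eight)
```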
While the foregoing has been with reference to a particular embodiment of the invention, it will be appreciated by those skilled in the art that changes in this embodiment may be made without departing from the principles and spirit of the disclosure, the scope of which is defined by the appended claims.
This application claims the benefit under 35 USC 119(e) and priority under 35 USC 120 to U.S. Provisional Patent Application No. 61/825,487 filed on May 20, 2013 and titled “Frequency Reduction and Restoration System and Method in Video and Image Compression”, the entirety of which is incorporated herein by reference.
Number | Date | Country |
---|---|---|
20140341304 A1 | Nov 2014 | US |
Number | Date | Country |
---|---|---|
61825487 | May 2013 | US |