This disclosure relates to transforms in image and video applications and, more particularly, to discrete cosine transforms (DCTs).
Data compression is widely used in a variety of applications to reduce consumption of data storage space, transmission bandwidth, or both. Example applications of data compression include digital video coding, image coding, speech coding, and audio coding. Digital video coding, for example, is used in a wide range of devices, including digital televisions, digital direct broadcast systems, wireless communication devices, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, cellular or satellite radio telephones, or the like. Digital video devices implement video compression techniques, such as MPEG-2, MPEG-4, or H.264/MPEG-4 Advanced Video Coding (AVC), to transmit and receive digital video more efficiently.
In general, video compression techniques perform predictive coding, such as intra-coding and/or inter-coding to reduce or remove redundancy inherent in video data. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames. For inter-coding, a video encoder performs motion estimation to track the movement of matching video blocks between two or more adjacent frames. Motion estimation generates motion vectors, which indicate the displacement of video blocks relative to corresponding video blocks in one or more reference frames. Motion compensation uses the motion vector to generate a prediction video block from a reference frame. After motion compensation, a residual video block is formed by subtracting the prediction video block from the original video block.
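The motion compensation and residual formation steps described above can be sketched in Python. This is an illustrative sketch only; the frame contents, block position, and motion vector below are hypothetical values, not drawn from any standard or from the disclosed apparatus:

```python
def motion_compensate(ref_frame, mv, top, left, size):
    """Fetch the size-by-size prediction block from a reference frame,
    displaced from (top, left) by the motion vector mv = (dy, dx)."""
    dy, dx = mv
    return [row[left + dx:left + dx + size]
            for row in ref_frame[top + dy:top + dy + size]]

def residual(current, prediction):
    """Residual block = current block minus the prediction block."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]

# Hypothetical 4x4 reference frame and a 2x2 block to be coded at (0, 0).
ref = [[4 * r + c for c in range(4)] for r in range(4)]
pred = motion_compensate(ref, (1, 1), 0, 0, 2)
res = residual([[6, 6], [9, 11]], pred)
```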
For intra-coding, a video encoder may perform spatial estimation to track continuations or replications of certain textures or patterns within the same frame. Spatial estimation generates prediction syntax, which indicates the manner in which a predictive block is generated based on, e.g., adjacent pixels within the same frame. The process of generating the predictive block during intra-coding is often called intra-prediction. After intra-prediction, a residual image is formed by subtracting the prediction block from the original block to be coded.
Following predictive coding, a video encoder applies discrete cosine transform (DCT), quantization, and entropy coding processes to further reduce the bit rate of the residual block produced by the video coding process. The DCT process is a transform process that converts a set of pixel values into transform coefficients, which represent the energy of the pixel values in the frequency domain. Quantization is then applied to the transform coefficients, and generally involves a process that limits the number of bits associated with any given transform coefficient. Entropy coding comprises one or more processes that collectively compress a sequence of quantized transform coefficients. Entropy encoding generally involves the application of arithmetic codes or variable length codes (VLCs) to further compress residual coefficients produced by the transform and quantization operations. Examples of entropy coding techniques include context-adaptive binary arithmetic coding (CABAC) and context-adaptive variable length coding (CAVLC), which may be used as alternative entropy coding modes in some encoders. A video decoder performs entropy decoding to decompress residual information for each of the blocks, and reconstructs the encoded video using the prediction syntax and the residual information.
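As a rough sketch of the transform and quantization stages described above (entropy coding omitted), the following uses an unnormalized 1-D DCT-II and a simple rounding quantizer. The residual values and the step size of 4 are hypothetical illustrations:

```python
import math

def dct_ii(block):
    """Unnormalized 1-D DCT-II: pixel values -> frequency-domain coefficients."""
    n = len(block)
    return [sum(block[x] * math.cos(math.pi * (x + 0.5) * k / n) for x in range(n))
            for k in range(n)]

def quantize(coeffs, step):
    """Limit the precision of each coefficient by rounding to a step size."""
    return [round(c / step) for c in coeffs]

residual = [12, 10, 8, 8, 9, 11, 13, 14]  # hypothetical residual pixel values
coeffs = dct_ii(residual)                  # coeffs[0] is the DC term (sum of pixels)
levels = quantize(coeffs, 4)               # fewer bits per coefficient
```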
In general, this disclosure describes the use of particular (non-dyadic) discrete cosine transform (DCT) sizes for performing a DCT in an encoder for image and video compression. This disclosure also describes the use of particular (non-dyadic) inverse discrete cosine transform (IDCT) sizes when performing an IDCT in a decoder for image and video decompression. In this disclosure, DCT refers to any variant of a DCT. For example, there exist at least four variants of the DCT, referred to as DCT-I, DCT-II, DCT-III, and DCT-IV transforms. Such transforms can work with distinct, non-overlapped blocks of pixels, or may be executed over partially overlapped blocks, thereby forming lapped variants of the DCT. Similarly, IDCT refers to any variant of an IDCT, such as IDCT-I, IDCT-II, IDCT-III, or IDCT-IV transforms, executed over separate blocks or possibly executed with respect to overlapped blocks of data.
In one example, this disclosure describes an apparatus comprising a discrete cosine transform (DCT) unit that receives digital image data and generates DCT coefficients. The DCT unit applies a transform size of at least one of 6, 10, 11, 12, 13, 14, and 15.
In another example, this disclosure describes a method comprising receiving digital image data in a discrete cosine transform (DCT) unit and generating DCT coefficients via the DCT unit. The DCT unit applies a transform size of at least one of 6, 10, 11, 12, 13, 14, and 15.
In another example, this disclosure describes a computer readable storage medium comprising instructions that upon execution cause one or more processors to receive digital image data and generate discrete cosine transform (DCT) coefficients. The instructions cause the one or more processors to apply a transform size of at least one of 6, 10, 11, 12, 13, 14, and 15 to generate the DCT coefficients.
In another example, this disclosure describes an apparatus comprising means for receiving digital image data, and means for generating discrete cosine transform (DCT) coefficients. The means for generating DCT coefficients applies a transform size of at least one of 6, 10, 11, 12, 13, 14, and 15.
In another example, this disclosure describes a device comprising a discrete cosine transform (DCT) unit that receives digital image data and generates DCT coefficients. The DCT unit applies a transform size of at least one of 6, 10, 11, 12, 13, 14, and 15. The device further comprises a wireless transmitter that transmits an encoded bitstream that includes the DCT coefficients.
In another example, this disclosure describes a device comprising a wireless receiver that receives an encoded bitstream comprising an encoded unit of video data including a plurality of video blocks, discrete cosine transform (DCT) coefficients, and prediction syntax, an entropy decoding unit that receives the encoded bitstream from the wireless receiver and decodes the bitstream to generate the plurality of video blocks, the DCT coefficients, and the prediction syntax, and an inverse quantization unit that performs inverse quantization on the DCT coefficients. The device further comprises an inverse discrete cosine transform (IDCT) unit that performs an inverse DCT on the inverse quantized DCT coefficients by employing a transform size of at least one of 6, 10, 11, 12, 13, 14, and 15 to generate a residual block, and a prediction unit that receives the prediction syntax and generates a prediction block. The device further comprises a summer that sums the residual block and the prediction block to generate a reconstructed block, and a storage unit that stores the reconstructed block.
The details of one or more aspects of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the disclosure will be apparent from the description and drawings, and from the claims.
This disclosure describes the use of non-dyadic discrete cosine transform (DCT) sizes to perform a DCT. Similarly, this disclosure describes the use of non-dyadic inverse discrete cosine transform (IDCT) sizes to perform an IDCT. A DCT expresses a vector of data points in terms of a sum of weighted vectors of cosine functions oscillating at different frequencies. Similarly, a two-dimensional (2D) DCT expresses a matrix of data points in terms of a sum of weighted matrices of cosine functions oscillating at different frequencies. In image and video coding, the matrix of data points may correspond to pixels within blocks of the video sequence. In this case, the DCT generates a frequency domain representation of pixel values within a block in a video sequence. The output values following a DCT are referred to as DCT coefficients, which may be viewed as the weights of each cosine function. Inversely, an IDCT may receive DCT coefficients and generate a pixel domain representation of the block in the video sequence.
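The forward/inverse relationship described above can be illustrated with an orthonormal 1-D DCT-II and its inverse (the DCT-III). The 15-point block below is a hypothetical example, chosen only to match the non-dyadic sizes discussed in this disclosure:

```python
import math

def dct(x):
    """Orthonormal DCT-II: expresses a block as weights of cosine functions."""
    n = len(x)
    s = lambda k: math.sqrt((1 if k == 0 else 2) / n)
    return [s(k) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                       for i in range(n)) for k in range(n)]

def idct(X):
    """Orthonormal DCT-III (inverse): maps the cosine weights back to pixels."""
    n = len(X)
    s = lambda k: math.sqrt((1 if k == 0 else 2) / n)
    return [sum(s(k) * X[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n)) for i in range(n)]

# Hypothetical 1-D block with the non-dyadic size 15.
pixels = [52, 55, 61, 66, 70, 61, 64, 73, 64, 60, 58, 55, 57, 60, 63]
recovered = idct(dct(pixels))   # round trip recovers the pixel values
```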
Conventional encoders are limited to dyadic sized blocks of data when performing a DCT. To perform the DCT, a DCT unit employs a transform size that is the same as the size of the blocks of data. As used herein, the size of a transform refers to the size of the block of data that is to be transformed. However, for ease of description, the terms “transform size,” “DCT of size,” and the like will be used. For example, the DCT of size 8 has served as the transform of choice in H.261, JPEG, MPEG-1, MPEG-2, H.263, and MPEG-4 (P.2). Consequently, H.261, JPEG, MPEG-1, MPEG-2, H.263, and MPEG-4 (P.2) employ data block sizes of 8×8. More recent standards, such as MPEG-4 AVC/H.264, VC-1, and AVS, have adopted integer approximations of the DCT with transform sizes 4, 8, and 16. Consequently, MPEG-4 AVC/H.264, VC-1, and AVS employ data block sizes of 4×4, 8×8, and 16×16 (VC-1 additionally uses rectangular sizes, such as 4×8, 8×4, etc.). The emerging JPEG-XR image compression algorithm uses overlapping transforms, which are also based on a DCT of size 4.
It should also be noted that the term dyadic as used herein refers to an integer that can be factored completely into factors of two. Stated another way, the base-two logarithm of a dyadic number is an integer. All other numbers are referred to as non-dyadic. Examples of dyadic numbers are 2, 4, 8, 16, 32, 64, and so on. Examples of non-dyadic numbers are 3, 5, 6, 7, 9, 10, and so on.
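The dyadic test described above (an integer whose base-two logarithm is an integer, i.e., a power of two) can be expressed with a standard bit trick. This is an illustrative sketch, not part of the disclosed apparatus:

```python
def is_dyadic(n):
    """True iff n is a power of two, i.e., log2(n) is an integer.
    A power of two has a single set bit, so n & (n - 1) clears it to zero."""
    return n > 0 and (n & (n - 1)) == 0
```

For example, `is_dyadic(16)` is true while `is_dyadic(15)` is false.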
DCT sizes need not be limited to dyadic sizes, i.e., a DCT may be performed with respect to a block of data that defines a non-dyadic block size, nor do DCT sizes need to be limited to even numbers. For example, DCT sizes of 3, 5, 6, 9, and 15 may be used. Stated another way, a block of data may comprise a non-dyadic size such as 3, 5, 6, 9, or 15, and may be transformed using a non-dyadic transform size. In some aspects, using a non-dyadic DCT size for a DCT may provide computational or power savings compared to using a dyadic DCT size. In some instances, using non-dyadic DCT sizes may provide twice as much power saving as using dyadic DCT sizes.
System 10 may be configured to apply techniques for low complexity discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) of digital image data. In particular, the low complexity DCT techniques may be used for transforming residual block coefficients produced by a predictive video coding process on non-dyadic block sizes. For example, system 10 may employ a DCT that uses a non-dyadic transform size.
In the example of
Source device 12 generates video for transmission to destination device 14. In some cases, however, devices 12, 14 may operate in a substantially symmetrical manner. For example, each of devices 12, 14 may include video encoding and decoding components. Hence, system 10 may support one-way or two-way video transmission between video devices 12, 14, e.g., for video streaming, video broadcasting, or video telephony. For other data compression and coding applications, devices 12, 14 could be configured to send and receive, or exchange, other types of data, such as image, speech or audio data, or combinations of two or more of video, image, speech and audio data. Accordingly, discussion of video applications is provided for purposes of illustration and should not be considered limiting of the various aspects of the disclosure as broadly described herein.
Video source 18 may include a video capture device, such as one or more video cameras, a video archive containing previously captured video, or a live video feed from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video and computer-generated video. In some cases, if video source 18 is a camera, source device 12 and destination device 14 may form so-called camera phones or video phones. In each case, the captured, pre-captured or computer-generated video may be encoded by video encoder 20 using techniques described in more detail below.
Video encoder 20 operates on blocks of pixels within individual video frames in order to encode the video data. The video blocks may have fixed or varying sizes, and may differ in size according to design considerations such as memory availability. Each video frame includes a series of slices. Each slice may include a series of macroblocks, which may be arranged into blocks. The blocks are blocks of pixels of the video frame. In accordance with the disclosure, the size of the blocks, i.e., blocks of pixels, may be non-dyadic, e.g., a block size of 15×15 or 6×6. Video encoder 20 performs a DCT of the blocks via a DCT unit to generate DCT coefficients as described in more detail below with respect to
Once the video data is encoded by video encoder 20, the encoded video information may then be modulated by modem 21 according to a communication standard, e.g., such as code division multiple access (CDMA) or another communication standard or technique, and transmitted to destination device 14 via transmitter 22. Modem 21 may include various mixers, filters, amplifiers or other components designed for signal modulation. Transmitter 22 may include circuits designed for transmitting data, including amplifiers, filters, and one or more antennas.
Receiver 24 of destination device 14 receives information over channel 16, and modem 25 demodulates the information. Video decoder 26 operates on the encoded blocks to decompress the encoded video sequence. Similar to video encoder 20, video decoder 26 may employ a non-dyadic IDCT transform size. In some aspects, video decoder 26 may employ the same transform size as video encoder 20. Display device 28 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 26 may be configured to support scalable video coding for spatial, temporal and/or signal-to-noise ratio (SNR) scalability. In some aspects, video encoder 20 and video decoder 26 may be configured to support fine granularity SNR scalability (FGS) coding for SVC. Encoder 20 and decoder 26 may support various degrees of scalability by supporting encoding, transmission and decoding of a base layer and one or more scalable enhancement layers. For scalable video coding, a base layer carries video data with a minimum level of quality. One or more enhancement layers carry additional bitstream data to support higher spatial, temporal and/or SNR levels.
Although not shown in
Video encoder 20 and video decoder 26 each may be implemented as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. In some aspects, system 10 may include one or more computer readable storage mediums that comprise instructions that upon execution cause the one or more processors to perform various functions described with respect to
Video encoder 20 and/or video decoder 26 of system 10 of
Although
Video source 32 may be substantially similar to video source 18 (
Filter 36 is coupled to DCT unit 34 and receives the DCT coefficients from DCT unit 34. Filter 36 may be a low pass, high pass, band pass, or any type of filter known in the art. For example, filter 36 may comprise a low pass filter with a certain cut-off frequency, such as, for example, 10 MHz. In this example, filter 36 may simply zero the values of the DCT coefficients that designate cosine frequencies that are greater than or equal to the specified cut-off frequency, e.g., greater than or equal to 10 MHz. Stated another way, filter 36 may replace the DCT coefficients that designate cosine frequencies that are greater than or equal to the specified cut-off frequency with zero.
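Because DCT coefficient indices correspond to cosine frequencies, the zeroing operation described for filter 36 can be sketched as follows. The cut-off is expressed here as a hypothetical coefficient index rather than a frequency in MHz; this is an illustration, not the disclosed filter:

```python
def lowpass_coeffs(coeffs, cutoff):
    """Zero every DCT coefficient at or above the cut-off index, a stand-in
    for zeroing coefficients at or above a cut-off frequency."""
    return [c if k < cutoff else 0 for k, c in enumerate(coeffs)]
```

For example, `lowpass_coeffs([5, 3, 2, 1], 2)` keeps the two lowest-frequency coefficients and zeroes the rest.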
In some examples, system 30 includes a quantization unit (not shown) that resides between DCT unit 34 and filter 36. The quantization unit may quantize the DCT coefficients generated by DCT unit 34. For example, if a DCT coefficient is 4.8, the quantization unit may quantize that DCT coefficient to 5. Filter 36 receives the quantized DCT coefficients in examples that include a quantization unit.
IDCT unit 38 is coupled to filter 36, and performs an inverse discrete cosine transform (IDCT) on the filtered DCT coefficients to generate a filtered video sequence. To perform the IDCT, IDCT unit 38 may employ non-dyadic transform sizes. The transform size of IDCT unit 38 may be the same size as the transform size of DCT unit 34. In some examples, system 30 includes a quantization unit (not shown) that resides between filter 36 and IDCT unit 38. The quantization unit may quantize the filtered DCT coefficients, and IDCT unit 38 performs inverse discrete cosine transforms on the quantized filtered DCT coefficients.
Display device 40 is coupled to IDCT unit 38 and receives the filtered video sequence to display the video sequence to a user. Display device 40 may include any of a variety of display devices such as an LCD, plasma display or OLED display. Generally, display device 40 may be any display capable of displaying video.
During the encoding process, video encoder 20 receives a video block to be coded, and prediction unit 42 performs predictive coding techniques. For inter-coding, prediction unit 42 compares the video block to be encoded to various blocks in one or more video reference frames or slices in order to define a predictive block. For intra-coding, prediction unit 42 generates a predictive block based on neighboring data within the same coded unit. Prediction unit 42 outputs the prediction block and adder 54 subtracts the prediction block from the video block being coded in order to generate a residual block.
For inter-coding, prediction unit 42 may comprise motion estimation and motion compensation units that identify a motion vector that points to a prediction block and generates the prediction block based on the motion vector. Typically, motion estimation is considered the process of generating the motion vector, which estimates motion. For example, the motion vector may indicate the displacement of a predictive block within a predictive frame relative to the current block being coded within the current frame. Motion compensation is typically considered the process of fetching or generating the predictive block based on the motion vector determined by motion estimation. For intra coding, prediction unit 42 generates a predictive block based on neighboring data within the same coded unit. One or more intra-prediction modes may define how an intra prediction block can be defined. Consistent with the DCT techniques of this disclosure, the predictive coding techniques may take place with respect to non-dyadic block sizes of pixels such as 15×15, 10×10, 5×5, 3×3, 3×5, or 3×2 blocks of pixels. Other non-dyadic block sizes described herein with respect to DCT may also be used for block sizes of pixels used during predictive coding.
After prediction unit 42 outputs the prediction block and adder 54 subtracts the prediction block from the video block being coded in order to generate a residual block, DCT unit 46 applies a DCT to the residual block. In some aspects, the residual block may be referred to as digital image data. DCT unit 46 applies the transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from a pixel domain to a frequency domain. In accordance with the disclosure, DCT unit 46 applies non-dyadic transform sizes such as 15×15 or 10×10 based on the non-dyadic block size of the residual block. In some aspects, DCT unit 46 may divide the transform size and perform multiple transforms on the residual block. For example, if a residual block comprises a block size of 15×15, DCT unit 46 may subdivide the residual block into nine sub-blocks, each comprising a sub-block size of 5×5. DCT unit 46 may then perform nine transforms where the DCT sub-transform size is 5×5 for each of the nine sub-blocks. As another example, again assuming a residual block size of 15×15, DCT unit 46 may subdivide the residual block into twenty-five sub-blocks, each comprising a block size of 3×3. DCT unit 46 may then perform twenty-five transforms where the DCT sub-transform size is 3×3 for each of the twenty-five sub-blocks. As yet another example, again assuming a residual block size of 15×15, DCT unit 46 may subdivide the residual block into fifteen sub-blocks, each comprising a block size of 3×5 or 5×3. DCT unit 46 may then perform fifteen transforms where the DCT sub-transform size is 3×5 or 5×3 for each of the fifteen sub-blocks. Other permutations may also be possible. For example, assuming a residual block size of 15×15, DCT unit 46 may subdivide the residual block into five sub-blocks of 3×15 or 15×3, or into three sub-blocks of 5×15 or 15×5.
Furthermore, assuming a residual block size of 10×10, DCT unit 46 may subdivide the residual block into ten sub-blocks of 5×2 or 2×5. DCT unit 46 may then perform ten transforms where the DCT sub-transform size is 5×2 or 2×5 for each of the ten blocks.
DCT unit 46 may subdivide the residual block size based on the video or image content of the residual block. For example, if the residual block includes image data corresponding to many sharp edges, DCT unit 46 may subdivide a residual block of size 15 into nine residual sub-blocks of size 5, twenty-five residual sub-blocks of size 3, fifteen residual sub-blocks of size 3×5 or 5×3, or three residual sub-blocks of size 5×15 or 15×5.
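The subdivision of a 15×15 residual block into 5×5, 3×3, or 3×5 sub-blocks described above can be sketched as follows. The block contents are hypothetical, and this is an illustration rather than the operation of DCT unit 46 itself:

```python
def subdivide(block, sub_h, sub_w):
    """Split an H-by-W block (a list of rows) into (H/sub_h)*(W/sub_w)
    sub-blocks in raster order; H and W are assumed to be multiples of
    the sub-block dimensions."""
    h, w = len(block), len(block[0])
    return [[row[c:c + sub_w] for row in block[r:r + sub_h]]
            for r in range(0, h, sub_h)
            for c in range(0, w, sub_w)]

block15 = [[15 * r + c for c in range(15)] for r in range(15)]  # hypothetical 15x15
subs = subdivide(block15, 5, 5)   # nine 5x5 sub-blocks, one DCT each
```

Applying `subdivide(block15, 3, 3)` instead yields the twenty-five 3×3 sub-blocks, and `subdivide(block15, 3, 5)` the fifteen 3×5 sub-blocks.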
Quantization unit 48 then quantizes the residual transform coefficients to further reduce bit rate. Quantization unit 48, for example, may limit the number of bits used to code each of the coefficients. Following the quantization process, entropy encoding unit 54 encodes the quantized transform coefficients (along with any syntax elements) according to an entropy coding methodology, such as CAVLC or CABAC, to further compress the data. Syntax elements included in the entropy coded bitstream may include prediction syntax from prediction unit 42, such as motion vectors for inter-coding or prediction modes for intra-coding.
CAVLC is one type of entropy coding technique which may be applied on a vectorized basis by entropy encoding unit 54. CAVLC uses variable length coding (VLC) tables in a manner that effectively compresses serialized “runs” of transform coefficients and/or syntax elements. CABAC is another type of entropy coding technique which may be applied on a vectorized basis by entropy encoding unit 54. CABAC may involve several stages, including binarization, context model selection, and binary arithmetic coding. In this case, entropy encoding unit 54 codes transform coefficients and syntax elements according to CABAC. Many other types of entropy coding techniques also exist, and new entropy coding techniques will likely emerge in the future. This disclosure is not limited to any specific coding technique.
Following the entropy coding by entropy encoding unit 54, the encoded video may be transmitted to another device or archived for later transmission or retrieval. Again, the encoded video may comprise the entropy coded vectors and various syntax, which can be used by the decoder to properly configure the decoding process. Inverse quantization unit 50 and IDCT 52 apply inverse quantization and inverse DCT, respectively, to reconstruct the residual block in the pixel domain. Summer 56 adds the reconstructed residual block to the prediction block produced by prediction unit 42 to produce a reconstructed video block for storage in reference frame store 44.
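The operation of summer 56 (and of the corresponding summer on the decoder side) reduces to an element-wise addition of the reconstructed residual block and the prediction block. A minimal sketch, with hypothetical block contents:

```python
def reconstruct(residual, prediction):
    """Summer: reconstructed block = reconstructed residual + prediction block."""
    return [[r + p for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(residual, prediction)]
```

For example, adding a residual of `[[1, 0], [0, 1]]` back to a prediction of `[[5, 6], [9, 10]]` recovers the original block.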
Video decoder 26 includes an entropy decoding unit 58, which performs the reciprocal decoding function of the encoding performed by entropy encoding unit 54 of
Video decoder 26 also includes a prediction unit 60, an inverse quantization unit 62, an IDCT unit 64, a reference frame store 66, and a summer 68. In accordance with the disclosure, IDCT unit 64 performs an IDCT based on non-dyadic transform sizes such as 15×15 or 10×10. The transform size of IDCT unit 64 may be the same as DCT unit 46 (
Prediction unit 60 receives prediction syntax (such as motion vectors) from entropy decoding unit 58. Using the prediction syntax, prediction unit 60 generates the prediction blocks that were used to code the video blocks. The blocks of data that prediction unit 60 operates with may comprise non-dyadic block sizes of pixels. Inverse quantization unit 62 performs inverse quantization, and IDCT unit 64 performs an IDCT to change the coefficients of the residual video blocks back to the pixel domain. In some aspects, the coefficients of the residual video blocks may be referred to as digital image data. As noted above, IDCT unit 64 may employ non-dyadic transform sizes to change the DCT coefficients of the residual video blocks back to the pixel domain. Summer 68 combines each prediction block with the corresponding residual block output by IDCT unit 64 in order to reconstruct the video block. The reconstructed video blocks are accumulated in reference frame store 66 in order to reconstruct decoded frames (or other decodable units) of video information. The decoded units may be output from video decoder 26.
In accordance with the disclosure, a DCT unit such as DCT unit 34 (
The complexity of a DCT can be calculated in at least two ways. The first technique computes the complexity of a DCT by splitting multiplications by simple rational factors, e.g., 0.5, 1.5 or 1.25, into additions and shifts, and counting as multiplications only multiplications by irrational constants. The first technique is referred to as complexity metric I. The second technique computes the complexity of a DCT by counting all multiplications, including multiplications by simple rational factors, and counting all additions; no shift operations are required. The second technique is referred to as complexity metric II.
A weighted complexity of a DCT or IDCT can be calculated based on the number of multiplications, additions, and shifts the DCT unit or IDCT unit needs to perform. Multiplications may take three cycles of operation, while additions and shifts may take only one cycle each. The weighted complexity is calculated by multiplying the number of multiplications by 3 and summing the resulting value with the number of additions and shifts.
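The weighted complexity calculation described above can be expressed directly. The values 174 and 115 below are the weighted complexities cited in this disclosure (per Tables 1 and 3) for transform sizes 16 and 15:

```python
def weighted_complexity(mults, adds, shifts):
    """Multiplications weighted at 3 cycles; additions and shifts at 1 each."""
    return 3 * mults + adds + shifts

# Weighted complexities cited for DCTs of size 16 and of size 15.
size16, size15 = 174, 115
ratio = size16 / size15   # approximately 1.5
```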
Tables 1 and 2 are exemplary lists of the complexity of conventional DCTs of various conventional dyadic transform sizes. The column labeled “N” defines the size of the transform. The column labeled “Complexity Metric I” in Table 1 defines the number of multiplications, additions, and shifts that need to be performed. The column labeled “Complexity Metric II” in Table 2 defines the number of multiplications and additions that need to be performed. In the “Complexity Metric I” and “Complexity Metric II” columns, the letter “m” denotes multiplication, “a” denotes addition, and “s” denotes shift. For example, 1m+2a means that one multiplication and two additions are performed to compute the discrete cosine transform. The column labeled “Weighted Complexity Metric I” in Table 1 defines the weighted complexity of the transform based on the Complexity Metric I calculation. The column labeled “Weighted Complexity Metric II” in Table 2 defines the weighted complexity of the transform based on the Complexity Metric II calculation. In instances where the DCT requires no multiplications by simple rational numbers, and only multiplications by irrational numbers, complexity metric I and complexity metric II yield the same value. Accordingly, as can be seen in Tables 1 and 2, Table 2 only includes two transform sizes (3 and 5) because the results under Complexity Metric I and Complexity Metric II are the same for the other transform sizes.
In accordance with the disclosure, Tables 3 and 4 are exemplary lists of the complexity of DCTs of various non-conventional, non-dyadic transform sizes. Tables 3 and 4 follow the same nomenclature as Tables 1 and 2.
As can be seen from Tables 1 and 3, a discrete cosine transform with transform size 16 is approximately 1.5 times as computationally expensive as a transform of size 15. The value 1.5 is derived by dividing the weighted complexity for a transform size of 16, i.e., 174, by the weighted complexity for a transform size of 15, i.e., 115. In accordance with the disclosure, as one example, a DCT unit employing a non-dyadic transform size of 15 provides more computationally efficient DCTs than a DCT unit employing a conventional dyadic transform size of 16.
A computer program simulation was generated to demonstrate the complexity of a DCT at various dyadic and non-dyadic transform sizes. The computer program used two standard images, each at two different resolutions, as inputs. The first image was a standard image, identified as Kodak's standard image number four. The second image was another standard image, identified as Kodak's standard image number five. The resolutions for Kodak's standard images number four and five were 384×256 and 3072×2048. The image detail of image number four is less than that of image number five.
The computer program simulation (“the program”) first split the image into N×N blocks. The program next performed DCTs with N×N transform sizes on each N×N block to produce DCT coefficients. The program quantized the DCT coefficients, and collected statistics of the quantized coefficients. Next, the program reconstructed the image via an inverse discrete cosine transform based on the DCT coefficients. The program then measured the average distortions of the reconstructed blocks by comparing the original image to the reconstructed image. Finally, the program estimated the total number of bits needed to encode the image based on the collected statistics of the coefficients.
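A simplified, 1-D version of that simulation loop might look like the following. The synthetic signal, the non-dyadic block size of 15, and the quantization step are hypothetical stand-ins for the test images and parameters described above, not the program itself:

```python
import math

def dct(x, inverse=False):
    """Orthonormal DCT-II (forward) or DCT-III (inverse) on a 1-D block."""
    n = len(x)
    s = lambda k: math.sqrt((1 if k == 0 else 2) / n)
    if not inverse:
        return [s(k) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                           for i in range(n)) for k in range(n)]
    return [sum(s(k) * x[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n)) for i in range(n)]

def simulate(signal, n, step):
    """Split -> DCT -> quantize -> dequantize -> IDCT, then return the MSE."""
    total, count = 0.0, 0
    for start in range(0, len(signal) - n + 1, n):
        block = signal[start:start + n]
        levels = [round(c / step) for c in dct(block)]          # quantize
        recon = dct([v * step for v in levels], inverse=True)   # dequantize + IDCT
        total += sum((a - b) ** 2 for a, b in zip(block, recon))
        count += n
    return total / count

signal = [128 + 60 * math.sin(i / 7) for i in range(150)]  # synthetic "scanline"
err = simulate(signal, 15, 8)   # non-dyadic block size 15, quantizer step 8
```

Because the transform is orthonormal, the reconstruction MSE is bounded by (step/2)^2.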
The program output peak signal-to-noise ratio (PSNR) values as a function of bits per pixel for various DCT transform sizes. The PSNR values were calculated by comparing the reconstructed image to the original image and determining the distortion between the reconstructed image and the original image. Example PSNR values are shown in
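PSNR is conventionally computed from the mean squared error between the original and reconstructed samples. A minimal sketch, assuming 8-bit samples with a peak value of 255:

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between equal-size sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return float('inf') if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```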
It should be noted that for standard image number four at resolutions of 384×256 and 3072×2048, the PSNR values for a transform of size 16 were substantially similar to the PSNR values for a transform of size 15. Stated another way, a DCT unit employing a transform size of 15 performs substantially similarly to a DCT unit employing a transform size of 16. However, as seen in Tables 1 and 3, a transform of size 16 is approximately one and a half times as complex as a transform of size 15. Accordingly, it may be beneficial for a DCT unit to employ a transform size of 15 instead of a transform size of 16, because a transform of size 15 is approximately one and a half times less computationally expensive and provides substantially similar PSNR values.
As noted above, Kodak's standard image number five contains much more detail than Kodak's standard image number four. Due to the high-detail nature of image number five, saturation of the DCT PSNR values occurred when N equals 3 for a resolution of 384×256. For a resolution of 3072×2048, saturation occurred when N equals 7. Since the PSNR values saturated for N equal to 3 and higher at a resolution of 384×256, instead of performing one DCT with a transform size of 15×15 on each 15×15 block, it may be beneficial to divide each 15×15 block into twenty-five sub-blocks of size 3×3 and perform twenty-five DCTs with a sub-transform size of 3×3.
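The division of a 15×15 block into twenty-five 3×3 sub-blocks can be sketched with simple slicing. The helper name is hypothetical; the only requirement, as in the 15-into-3 case above, is that the sub-block size evenly divides the block size.

```python
def split_into_subblocks(block, sub):
    """Split an N x N block into (N/sub)^2 sub-blocks of size sub x sub."""
    n = len(block)
    assert n % sub == 0, "sub-block size must evenly divide the block size"
    subs = []
    # Walk the block in raster order, sub rows/columns at a time.
    for r in range(0, n, sub):
        for c in range(0, n, sub):
            subs.append([row[c:c + sub] for row in block[r:r + sub]])
    return subs
```

For a 15×15 block with `sub=3`, this yields twenty-five 3×3 sub-blocks, each of which would then be transformed with a 3×3 DCT.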
As with standard image number four, for standard image number five at resolutions of 384×256 and 3072×2048, the PSNR values for a transform of size 16 were substantially similar to the PSNR values for a transform of size 15. Accordingly, it may be beneficial for a DCT unit to employ a transform size of 15 instead of a transform size of 16, because a transform size of 15 is one and a half times less computationally expensive while providing substantially similar PSNR values.
Notably, factors c1, c6, and c7 are rational numbers with values −5/4, −3/2, and 15/8, respectively. As described above, in some examples, multiplication by a rational number may be easily computed by a series of addition and shift operations. The 67 additions are denoted at nodes a1 through a67. Nodes connecting junctions of pairs of lines denote an addition. For example, node a2 indicates the summation of x(12) and x(2), and node a1 indicates the summation of x(7), x(12), and x(2). The additions can be counted by following the flow of computation (either from left to right for a DCT, or from right to left for an IDCT). The constant factors, e.g., c1 through c17, are multiplied with intermediate values at corresponding lines in the flow diagram shown in
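The shift-and-add realization of these rational factors can be sketched as follows. Each multiplication by −5/4, −3/2, or 15/8 becomes one or two additions/subtractions plus bit shifts; the function names are hypothetical, and the results are exact only when the input is a multiple of the factor's denominator (a real fixed-point implementation would choose rounding behavior carefully).

```python
def mul_c1(x):
    """x * (-5/4) via shift-and-add: -((4x + x) / 4). Exact for x % 4 == 0."""
    return -(((x << 2) + x) >> 2)


def mul_c6(x):
    """x * (-3/2) via shift-and-add: -((2x + x) / 2). Exact for x % 2 == 0."""
    return -(((x << 1) + x) >> 1)


def mul_c7(x):
    """x * (15/8) via shift-and-add: (16x - x) / 8. Exact for x % 8 == 0."""
    return ((x << 4) - x) >> 3
```

Replacing each rational-factor multiplication with such a short shift-and-add sequence is what makes multiplication by rational constants inexpensive relative to general multiplication.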
DCT unit 46 receives a residual block where the residual block is the prediction block subtracted from the video block that is to be encoded. The residual block may be referred to as digital image data, and may also comprise a non-dyadic block size, as discussed herein. DCT unit 46 performs DCT on the residual image frame based on a transform size of at least one of 6, 10, 11, 12, 13, 14, and 15 to generate DCT coefficients (94). Quantization unit 48 quantizes the DCT coefficients (96). Entropy coding unit 54 entropy codes the quantized DCT coefficients to generate a bitstream that may be subsequently decoded or stored (98).
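The residual formation feeding DCT unit 46 can be sketched as an element-wise subtraction of the prediction block from the block to be encoded. The function name is hypothetical; the quantization and entropy-coding stages that follow in steps (96) and (98) are omitted here.

```python
def residual_block(original, prediction):
    """Residual: the prediction block subtracted, element-wise, from the
    video block that is to be encoded."""
    return [[o - p for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(original, prediction)]
```

The resulting residual block, which may have a non-dyadic size such as 15×15, is what DCT unit 46 transforms before quantization and entropy coding.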
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable medium comprising instructions that, when executed, perform one or more of the methods described above. The computer-readable medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a computer-readable communication medium that carries or communicates code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer.
The code may be executed by one or more processors, such as one or more DSPs, general purpose microprocessors, ASICs, field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated in a combined video encoder-decoder (CODEC). Hence, the disclosure also contemplates any of a variety of integrated circuit devices that include circuitry to implement one or more of the techniques described in this disclosure. Such circuitry may be provided in a single integrated circuit chip or in multiple, interoperable integrated circuit chips.
Various examples have been described. These and other examples are within the scope of the following claims.