Embodiments of the present invention generally relate to modeling of transform coefficients such as DCT coefficients, and in particular to methods and systems having a transparent composite model for transform coefficients.
From its early adoption in JPEG to its recent application in HEVC (High Efficiency Video Coding), the newest video coding standard [3], the discrete cosine transform (DCT) has been widely applied in digital signal processing, particularly in lossy image and video coding. Over the past few decades it has thus attracted considerable interest in understanding the statistical distribution of DCT coefficients (see, for example, [1], [4], [7], [9], and references therein). A deep and accurate understanding of the distribution of DCT coefficients would be useful for quantization design [12], entropy coding, rate control [7], image understanding and enhancement [1], and image and video analytics [13] in general.
In the literature, Laplacian distributions, Cauchy distributions, Gaussian distributions, mixtures thereof, and generalized Gaussian (GG) distributions have all been suggested to model the distribution of DCT coefficients (see, for example, [2], [4], [9], and references therein). Depending on the actual image data sources used and the need to balance modeling accuracy against the model's simplicity and practicality, each of these models may be justified to some degree for a specific application. In general, GG distributions with a shape parameter and a scale parameter are believed to achieve the best modeling accuracy [2][9]. However, parameter estimation for GG distributions is difficult, and hence the applicability of the GG model, particularly to online applications, may be limited. On the other hand, the Laplacian model has been found to strike a good balance between complexity and modeling accuracy; it has been widely adopted in image and video coding [12], although its modeling accuracy is significantly inferior to that of the GG model [2].
To better handle the flat-tail phenomenon commonly seen in DCT coefficients, a system and method are provided that include a model dubbed a transparent composite model (TCM). Given a sequence of DCT coefficients, a TCM first separates the tail of the sequence from its main body. Then, a uniform distribution is used to model DCT coefficients in the flat tail, while a different parametric distribution (such as a truncated Laplacian, generalized Gaussian (GG), or geometric distribution) is used to model data in the main body. The TCM is continuous if each DCT coefficient is regarded as continuous (i.e., analog), and discrete if each DCT coefficient is discrete. The separation boundary and other parameters of the TCM can be estimated via maximum likelihood (ML) estimation. Efficient online algorithms with global convergence are developed to compute the ML estimates of these parameters. Analysis and experimental results show that for real-valued continuous AC coefficients, (1) the TCM with a truncated GG distribution as its parametric distribution (GGTCM) offers the best modeling accuracy among pure Laplacian models, pure GG models, and the TCM with a truncated Laplacian distribution as its parametric distribution (LPTCM), at the cost of extra complexity; and (2) the LPTCM offers modeling accuracy comparable to that of pure GG models, but with lower complexity. On the other hand, for discrete/integer DCT coefficients, which are the kind most often encountered in real-world applications of DCT, extensive experiments using both the divergence test and the chi-square test show that the discrete TCM with a truncated geometric distribution as its parametric distribution (GMTCM) models AC coefficients more accurately than pure Laplacian models and GG models in the majority of cases, while having simplicity and practicality similar to those of pure Laplacian models.
In addition, it is demonstrated that the GMTCM also has a good feature-extraction capability: DCT coefficients in the flat tail identified by the GMTCM are truly outliers, and these outliers across all AC frequencies of an image form an outlier image revealing some unique global features of the image. This, together with the low complexity of the GMTCM, makes the GMTCM a desirable choice for modeling discrete/integer DCT coefficients in real-world applications, such as image and video coding, image understanding, and image enhancement.
To further improve modeling accuracy, the concept of TCM can be extended by further separating the main portion into multiple sub-portions and modeling each sub-portion by a different parametric distribution (such as truncated Laplacian, generalized Gaussian (GG), and geometric distributions). The resulting model is dubbed a multiple-segment TCM (MTCM). For general MTCMs based on truncated Laplacian and geometric distributions (referred to as MLTCM and MGTCM, respectively), a greedy algorithm is developed for determining a desired number of segments and for estimating the corresponding separation boundaries and other MTCM parameters. For bi-segment TCMs, an efficient online algorithm is further presented for computing the maximum likelihood (ML) estimates of the separation boundary and other parameters. Experiments based on Kullback-Leibler (KL) divergence and the χ² test show that (1) for real-valued continuous AC coefficients, the bi-segment TCM based on the truncated Laplacian distribution (BLTCM) models AC coefficients more accurately than the LPTCM and the GG model, while having simplicity and practicality similar to those of the LPTCM and the pure Laplacian model; and (2) for discrete (integer or quantized) DCT coefficients, the bi-segment TCM based on the truncated geometric distribution (BGTCM) significantly outperforms the GMTCM and the GG model in terms of modeling accuracy, while having simplicity and practicality similar to those of the GMTCM. It is also shown that the MGTCM derived by the greedy algorithm further improves the modeling accuracy over the BGTCM, at the cost of more parameters and a slight increase in complexity.
In accordance with an example embodiment, there is provided a method for modelling a set of transform coefficients, the method being performed by a device and including: determining at least one boundary coefficient value; determining one or more parameters of a first distribution model for transform coefficients of the set the magnitudes of which are greater than one of the boundary coefficient values; determining parameters of at least one further distribution model for transform coefficients of the set the magnitudes of which are less than the one of the boundary coefficient values; and performing a device operation on at least part of a composite distribution model which is a composite of the first distribution model and the at least one further distribution model having the respective determined parameters.
In accordance with an example embodiment, there is provided a method for a set of transform coefficients, the method being performed by a device and including: determining at least one boundary coefficient value which satisfies a maximum likelihood estimation between the set of transform coefficients and a composite distribution model which is a composite of a plurality of distribution models each for a subset of transform coefficients of the set bounded by each of the at least one boundary coefficient values; and performing a device operation on at least one of the subsets of transform coefficients.
In accordance with an example embodiment, there is provided a method for modelling a set of transform coefficients, the method being performed by a device and including: determining a boundary coefficient value; determining one or more parameters of a uniform distribution model for transform coefficients of the set the magnitudes of which are greater than the boundary coefficient value; determining parameters of a parametric distribution model for transform coefficients of the set the magnitudes of which are less than the boundary coefficient value; and performing a device operation on at least part of a composite distribution model which is a composite of the uniform distribution model and the parametric distribution model having the respective determined parameters.
In accordance with an example embodiment, there is provided a device, including memory, a component configured to access a set of transform coefficients, and a processor configured to execute instructions stored in the memory in order to perform any or all of the described methods.
In accordance with an example embodiment, there is provided a non-transitory computer-readable medium containing instructions executable by a processor for performing any or all of the described methods.
Reference will now be made, by way of example, to the accompanying drawings which show example embodiments, in which:
Similar reference numerals may be used in different figures to denote similar components.
1 Introduction to TCM
Both Laplacian and GG distributions decay exponentially fast. However, in many cases it is observed herein that DCT coefficients have a relatively flat tail, which cannot be effectively modeled by an exponentially decaying function (see
To better handle the flat-tail phenomenon in DCT coefficients, in this disclosure we develop a model dubbed the transparent composite model (TCM), in which the tail portion of DCT coefficients is modeled by a first distribution, separately from the main portion, and the main portion is modeled instead by a different parametric distribution such as a truncated Laplacian, GG, or geometric distribution. This composite model introduces a boundary parameter that controls which model is used for any given DCT coefficient; it is called transparent because, once the TCM is determined, there is no ambiguity regarding which model (the first distribution model or at least one further distribution model) a given DCT coefficient falls into. The TCM is continuous if each DCT coefficient is regarded as continuous (i.e., analog), and discrete if each DCT coefficient is discrete.
The separation boundary and other parameters of the TCM can be estimated via maximum likelihood (ML) estimation. We further propose efficient online algorithms with global convergence to compute the ML estimates of these parameters. Analysis and experimental results show that for real-valued continuous AC coefficients, (1) the TCM with a truncated GG distribution as its parametric distribution (GGTCM) offers the best modeling accuracy among pure Laplacian models, pure GG models, and the TCM with a truncated Laplacian distribution as its parametric distribution (LPTCM), at the cost of extra complexity; and (2) the LPTCM matches pure GG models in terms of modeling accuracy, but with simplicity and practicality similar to those of pure Laplacian models, hence combining the strengths of both pure GG and Laplacian models. On the other hand, for discrete/integer DCT coefficients, which are the kind most often encountered in real-world applications of DCT, extensive experiments using both the divergence test and the chi-square test show that the discrete TCM with a truncated geometric distribution as its parametric distribution (GMTCM) models AC coefficients more accurately than pure Laplacian models and GG models in the majority of cases, while having simplicity and practicality similar to those of pure Laplacian models. In addition, it is demonstrated that the GMTCM also has a good feature-extraction capability: DCT coefficients in the flat tail identified by the GMTCM are truly outliers, and these outliers across all AC frequencies of an image form an outlier image revealing some unique global features of the image. This, together with the simplicity of the model and the low complexity of computing the ML estimates of its parameters online, makes the GMTCM a desirable choice for modeling discrete/integer DCT coefficients in real-world applications, such as image and video coding, image understanding, and image enhancement.
2 DCT Models and the Flat Tail Phenomenon
This section first reviews briefly some relevant studies in the literature for modeling DCT coefficients. We then discuss the flat tail phenomenon in DCT coefficients.
2.1 Models in the Literature for DCT Coefficients
2.1.1 Gaussian Distributions
As Gaussian distributions are widely used in the natural and social sciences for real-valued random variables, they have naturally been applied to model DCT coefficients [1]. The justification for the Gaussian model may come from the central limit theorem (CLT) [11], which states that the mean of a sufficiently large number of independent random variables is approximately normally distributed. Since each DCT coefficient is a linear weighted sum of input samples, the CLT provides meaningful guidance for modeling DCT coefficients with Gaussian distributions. A comprehensive collection of distributions based on the Gaussian probability density function was studied in [8].
Although the Gaussian model is supported by the CLT, it has been observed that DCT coefficients of natural images/video usually possess a tail heavier than that of Gaussian distributions [2]. Consequently, generalized Gaussian distributions have been suggested for modeling DCT coefficients.
2.1.2 Generalized Gaussian Distributions
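The DCT coefficients may be modeled with a zero-mean generalized Gaussian distribution, as follows:
$$f(y|\alpha,\beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\exp\left\{-\left(\frac{|y|}{\alpha}\right)^{\beta}\right\} \qquad (1)$$
where α is a positive scale parameter, β is a positive shape parameter, and Γ(•) denotes the gamma function.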
It is easy to see that when β=1, the above GG distribution reduces to a Laplacian distribution, and when β=2, it becomes the Gaussian distribution with variance α²/2. With the free choice of the scale parameter α and the shape parameter β, the GG distribution has proven to be an effective way to parameterize a family of symmetric distributions spanning from Gaussian to uniform densities, and a family of symmetric distributions spanning from Laplacian to Gaussian distributions. As mentioned above, DCT coefficient distributions are observed to possess flat tails. In this regard, the GG distribution allows for heavier-than-Gaussian tails with β<2, heavier-than-Laplacian tails with β<1, or lighter-than-Gaussian tails with β>2. With this flexibility, the GG model in general outperforms both the Gaussian and Laplacian models in terms of modeling accuracy for DCT coefficients.
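As a quick numerical check of the two special cases, the following sketch evaluates the GG density (1) directly and compares it with a Laplacian density written as e^{−|y|/λ}/(2λ) and with a Gaussian density of variance α²/2. The helper names (`gg_pdf`, `laplacian_pdf`, `gaussian_pdf`) are illustrative only.

```python
import math


def gg_pdf(y: float, alpha: float, beta: float) -> float:
    """Zero-mean generalized Gaussian density (1) with scale alpha and shape beta."""
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(y) / alpha) ** beta))


def laplacian_pdf(y: float, lam: float) -> float:
    """Zero-mean Laplacian density with scale lam."""
    return math.exp(-abs(y) / lam) / (2.0 * lam)


def gaussian_pdf(y: float, var: float) -> float:
    """Zero-mean Gaussian density with variance var."""
    return math.exp(-y * y / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
```

For example, `gg_pdf(y, alpha, 1.0)` equals `laplacian_pdf(y, alpha)`, and `gg_pdf(y, alpha, 2.0)` equals `gaussian_pdf(y, alpha ** 2 / 2)`, up to rounding.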
Nevertheless, the accuracy of the GG model comes with some inevitable drawbacks. For example, the lack of a closed-form cumulative distribution function (cdf) makes the GG model difficult to apply in practice. Another main drawback is the high complexity of its parameter estimation. For example, given a sequence of samples $Y_i$, $i=1,\ldots,n$, the ML estimate of the shape parameter β is the root of the following equation [2],
and γ=0.577... denotes the Euler constant. Clearly, the terms $\sum_{i=1}^{n}|Y_i|^{\beta}\log|Y_i|$ and $\beta\sum_{i=1}^{n}|Y_i|^{\beta}$ require a significant amount of computation, since they must be re-evaluated over all n samples at every step of a numerical iterative solution for β.
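To illustrate where this cost comes from, the sketch below solves a commonly used form of the ML equation for β, obtained by substituting the ML scale α̂(β) = ((β/n)Σ|Y_i|^β)^{1/β} into the log-likelihood of (1) and differentiating with respect to β. It is written with the digamma function ψ instead of the Euler constant γ, so it may differ in form from (2). Bisection on [0.1, 5] assumes that the profile likelihood is unimodal on that interval; all function names are hypothetical.

```python
import math


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0 via upward recurrence and the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def shape_score(beta: float, mags) -> float:
    """beta times the per-sample derivative of the GG profile log-likelihood in beta."""
    s0 = sum(m ** beta for m in mags)                 # sum |Y_i|^beta
    s1 = sum(m ** beta * math.log(m) for m in mags)   # sum |Y_i|^beta log|Y_i|
    return (1.0 + digamma(1.0 / beta) / beta - s1 / s0
            + math.log(beta * s0 / len(mags)) / beta)


def estimate_gg_shape(samples, lo=0.1, hi=5.0, iters=40):
    """Bisection for the root of shape_score; every step re-scans all samples."""
    mags = [abs(y) for y in samples if y != 0.0]  # log|y| is undefined at y = 0
    score_lo = shape_score(lo, mags)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        score_mid = shape_score(mid, mags)
        if (score_mid > 0.0) == (score_lo > 0.0):
            lo, score_lo = mid, score_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Each of the 40 bisection steps costs two passes over the data with a power and a logarithm per sample, which is the per-iteration cost the text refers to.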
2.1.3 Laplacian Distributions
Due to its ability to balance modeling accuracy against simplicity and practicality, the Laplacian model is the most popular model in use for DCT coefficients [10], [9]. A zero-mean Laplacian density function is given as follows,
$$f(y|\lambda) = \frac{1}{2\lambda}e^{-|y|/\lambda} \qquad (3)$$
where λ denotes a positive scale parameter. Given a sequence of samples $Y_i$, $i=1,\ldots,n$, the ML estimate of λ can be easily computed as
$$\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n}|Y_i|.$$
In addition, under the Laplacian distribution, the probability of an interval [L, H] with H>L≧0 can also be computed easily as
$$\Pr\{L\le Y\le H\} = \frac{1}{2}\left(e^{-L/\lambda}-e^{-H/\lambda}\right).$$
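Both closed forms are cheap to evaluate. A minimal sketch, assuming the scale parameterization of (3); the function names are illustrative only:

```python
import math


def laplacian_ml_scale(samples) -> float:
    """ML estimate of the Laplacian scale in (3): the mean absolute sample value."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(abs(y) for y in samples) / len(samples)


def laplacian_interval_prob(low: float, high: float, lam: float) -> float:
    """P{low <= Y <= high} for 0 <= low < high under the Laplacian density (3)."""
    return 0.5 * (math.exp(-low / lam) - math.exp(-high / lam))


lam_hat = laplacian_ml_scale([-1.0, 2.0, -3.0, 2.0])           # mean of |y| = 2.0
half_mass = laplacian_interval_prob(0.0, math.inf, lam_hat)    # mass of y >= 0 = 0.5
```

Both operations are single passes or single evaluations, which is why the Laplacian model is attractive for online use.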
2.1.4 Other Distributions
Other distributions have also been investigated in the literature for modeling DCT coefficients [5], [6], [7], [8]. In [5], alpha-stable distributions were used to model DCT coefficients for watermark detection. As a special case of alpha-stable distributions, the Cauchy distribution was used in [7] for modeling DCT coefficients in video coding. The alpha-stable distributions were reported to provide satisfactory modeling accuracy for the corresponding image processing goals in [5] and [7]. Yet the lack of a closed-form density for the alpha-stable family usually makes parameter estimation difficult and limits its applicability for modeling DCT coefficients. In [6], a symmetric normal inverse Gaussian distribution was studied for modeling DCT coefficients, as follows:
This model was tested using the Kolmogorov-Smirnov test and reported to offer improved modeling accuracy over generalized Gaussian and Laplacian distributions. Yet its complexity is still significantly higher than that of a Laplacian model. Moreover, the Kolmogorov-Smirnov test is generally regarded as less suitable than the χ² test for measuring modeling accuracy [2], and by the χ² test, the best modeling accuracy is achieved by GG distributions. The χ² test statistic is defined as
$$\chi^2 = \sum_{i=1}^{I}\frac{(n_i - n p_i)^2}{n p_i}$$
where I is the number of intervals into which the sample space is partitioned, n is the total number of samples, ni denotes the number of samples in the ith interval, and pi is the probability under the underlying theoretical model that a sample falls into the interval i.
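A direct transcription of the χ² statistic, as a sketch (the function name is ours). In a modeling experiment, the p_i would come from the fitted model, for example from the Laplacian interval probabilities given earlier.

```python
def chi_square_statistic(counts, probs) -> float:
    """Pearson chi-square statistic: sum over intervals of (n_i - n*p_i)^2 / (n*p_i)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)  # total number of samples
    stat = 0.0
    for n_i, p_i in zip(counts, probs):
        expected = n * p_i
        if expected <= 0.0:
            raise ValueError("every interval needs a positive model probability")
        stat += (n_i - expected) ** 2 / expected
    return stat
```

For instance, 100 samples falling as [40, 40, 20] into three intervals with model probabilities [0.3, 0.5, 0.2] give 100/30 + 100/50 + 0 = 16/3.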
As in [2], this disclosure prefers the χ² test over the Kolmogorov-Smirnov test for measuring modeling accuracy. Besides the justification provided in [2] for using the χ² test, our preference is also rooted in the flat-tail phenomenon of DCT coefficients. Specifically, the χ² test better characterizes a statistically insignificant tail portion of a distribution, while the Kolmogorov-Smirnov test, which depends on the sample distribution function, tends to overlook the tail. Moreover, the flat-tail phenomenon has been widely observed for DCT coefficients, as in [5], [7]. The flat-tail phenomenon is discussed in more detail below.
2.2 Flat Tails
Laplacian, Gaussian, and GG distributions all decay exponentially fast. As illustrated in
The flat-tail phenomenon seen in the Lenna image is widely observed in other images as well. As shown in [2], the estimated shape parameter β of the GG distribution for various images is less than 1 in most cases, indicating that the data distribution possesses a tail heavier than that of the Laplacian distribution. In [7], it was also observed that the tail of DCT coefficients in video coding is much heavier than that of the Laplacian distribution, and a Cauchy distribution was used instead for deriving rate and distortion models for DCT coefficients. However, the Cauchy model may not model the main portion of DCT coefficients effectively, and is in general inferior to the GG model in terms of overall modeling accuracy [4]. Therefore, it is advantageous to have a model that handles both the main portion and the tail portion of DCT coefficients well, while offering both simplicity and superior modeling accuracy.
3 Continuous Transparent Composite Model
To better handle the flat-tail phenomenon in DCT coefficients, we now separate the tail portion of DCT coefficients from the main portion and use a different model for each. Since DCT coefficients in the tail portion are statistically insignificant, each such value often appears only once or a few times in an entire image or video frame. Hence it makes sense to model them separately by a uniform distribution, while modeling the main portion by a parametric distribution such as a truncated Laplacian, GG, or geometric distribution, yielding a model we call a transparent composite model. In this section, we assume that DCT coefficients are continuous (i.e., can take any real value), and consider continuous TCMs.
3.1 Description of General Continuous TCMs
Consider a probability density function (pdf) f(y|θ) with parameters θ∈Θ, where θ could be a vector and Θ is the parameter space. Let F(y|θ) be the corresponding cdf, i.e.,
$$F(y|\theta) \triangleq \int_{-\infty}^{y} f(u|\theta)\,du.$$
Assume that f(y|θ) is symmetric in y with respect to the origin, and that F(y|θ) is concave as a function of y in the region y≧0. It is easy to verify that Laplacian, Gaussian, and GG distributions all satisfy this assumption. The TCM based on F(y|θ) is defined as
$$p(y|y_c,b,\theta) \triangleq \begin{cases} \dfrac{b}{2F(y_c|\theta)-1}\,f(y|\theta), & |y|<y_c \\ \dfrac{1-b}{2(a-y_c)}, & y_c<|y|\le a \\ 0, & |y|>a \end{cases} \qquad (7)$$
where 0≦b≦1, 0<d≦yc<a, and a represents the largest magnitude a sample y can take. Here both a and d are assumed to be known. It is not hard to see that, given (yc, b, θ), p(y|yc, b, θ) as a function of y is indeed a pdf, and is also symmetric with respect to the origin.
According to the TCM defined in (7), a sample y is generated according to the truncated distribution
$$\frac{f(y|\theta)}{2F(y_c|\theta)-1}, \quad |y|<y_c$$
with probability b, and according to the uniform distribution
$$\frac{1}{2(a-y_c)}, \quad y_c<|y|\le a$$
(also called the outlier distribution) with probability 1−b. The composite model is transparent since, given parameters (yc, b, θ), there is no ambiguity regarding which distribution a sample y≠±yc comes from. At y=±yc, p(y|yc, b, θ) can be defined arbitrarily, since the value of a pdf can be modified arbitrarily over a set of zero Lebesgue measure without changing its cdf. As shown later, selecting p(y|yc, b, θ) at y=±yc to be the maximum of
$$\frac{b\,f(y_c|\theta)}{2F(y_c|\theta)-1} \quad\text{and}\quad \frac{1-b}{2(a-y_c)}$$
will facilitate our subsequent argument for ML estimation. Hereafter, samples from the outlier distribution will be referred to as outliers.
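The generative description above can be made concrete. The sketch below specializes (7) to a Laplacian f(y|λ) of scale λ, for which 2F(yc|λ)−1 = 1−e^{−yc/λ}, and draws a sample from the truncated distribution with probability b and from the outlier distribution with probability 1−b. The function names and the inverse-cdf sampling of the truncated part are our own illustrative choices, not part of the described method.

```python
import math
import random


def lptcm_pdf(y, yc, b, lam, a):
    """TCM density (7) with a Laplacian f(y|lam); at |y| == yc the larger branch is used."""
    m = abs(y)
    inlier = b * math.exp(-m / lam) / (2.0 * lam * -math.expm1(-yc / lam))
    outlier = (1.0 - b) / (2.0 * (a - yc))
    if m < yc:
        return inlier
    if m == yc:
        return max(inlier, outlier)  # value on a null set, chosen as in the text
    return outlier if m <= a else 0.0


def lptcm_sample(yc, b, lam, a, rng=random):
    """One draw: truncated Laplacian with probability b, uniform outlier otherwise."""
    if rng.random() < b:
        # inverse cdf of |Y| for the Laplacian truncated to [0, yc)
        m = -lam * math.log1p(rng.random() * math.expm1(-yc / lam))
    else:
        m = rng.uniform(yc, a)
    return m if rng.random() < 0.5 else -m
```

Integrating `lptcm_pdf` numerically over [−a, a] returns approximately 1, and the fraction of draws with |y| > yc approaches 1−b, matching the two statements above.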
3.2 ML Estimate of TCM Parameters
In practice, the parameters yc, b, θ are often unknown and hence have to be estimated, for example through ML estimation. Let $Y_1^n = Y_1, Y_2, \ldots, Y_n$ be a sequence of DCT coefficients in an image or in a large coding unit (such as a block, a slice or a frame in video coding) at a particular frequency or across frequencies of interest. Assume that $Y_1^n$ behaves according to the TCM defined in (7) with $Y_{\max} \triangleq \max\{|Y_i| : 1\le i\le n\} < a$ and $Y_{\max} \ge d$. (When $Y_{\max} < d$, there are no outliers, and the ML estimates of yc and b are d and 1, respectively.) We next investigate how to compute the ML estimates of yc, b and θ.
Given $Y_1^n$ with $d\le Y_{\max}<a$, let
$$N_1(y_c) \triangleq \{i : |Y_i|<y_c\}, \quad N_2(y_c) \triangleq \{i : y_c<|Y_i|\}, \quad N_3(y_c) \triangleq \{i : |Y_i|=y_c\}.$$
Then the log-likelihood function $g(y_c,b,\theta|Y_1^n)$ according to (7) is equal to
where |S| denotes the cardinality of a finite set S, and the equality 1) is due to (7) and the fact that ln z is strictly increasing in the region z>0. Since F(y|θ) is nondecreasing with respect to y, it follows from (8) that for any $Y_{\max}<y_c<a$,
To continue, we now sort |Y1|, |Y2|, ..., |Yn| in ascending order into $W_1\le W_2\le\cdots\le W_n$. Note that $W_n=Y_{\max}$. Let m be the smallest integer i such that $W_i\ge d$. Define
$$I_m = (d, W_m)$$
and, for any $m<i\le n$,
$$I_i = (W_{i-1}, W_i).$$
Then it is easy to see that the interval [d, Ymax] can be decomposed as
$$[d, Y_{\max}] = \{d, W_m, W_{m+1}, \ldots, W_n\} \cup \left(\bigcup_{i=m}^{n} I_i\right)$$
which, together with (9), implies that
Note that for any nonempty $I_i$ with i>m, $N_1(y_c)$ and $N_2(y_c)$ remain the same and $N_3(y_c)$ is empty for all $y_c\in I_i$. Since by assumption F(y|θ) is concave as a function of y, it is not hard to verify that, as a function of yc,
$$-|N_2(y_c)|\ln 2(a-y_c) - |N_1(y_c)|\ln[2F(y_c|\theta)-1]$$
is convex over $y_c\in I_i$ (−ln(a−yc) is convex, and −ln[2F(yc|θ)−1] is convex because the logarithm of a positive concave function is concave). Hence its value over $y_c\in I_i$ is upper bounded by the maximum of its values at $y_c=W_{i-1}$ and $y_c=W_i$, i.e., the endpoints of $I_i$. Therefore, in view of (8), we have
When $I_m$ is nonempty, a similar argument leads to
Putting (10) to (12) together yields
Therefore, the ML estimate of yc is equal to one of d, Wm, Wm+1, . . . , Wn.
We are now led to investigating
Further define
Note that the difference between $g_+(y_c,b,\theta|Y_1^n)$ and $g_-(y_c,b,\theta|Y_1^n)$ lies in whether or not we regard yc itself as an outlier when yc is equal to some $W_i$. Comparing (8) with (14) and (15), we have
Then from (14) and (15), it is not hard to see that
and θ+(yc) and θ−(yc) are the ML estimates of θ for the truncated distribution
over the sample sets {Yi:i∈N1+(yc)} and {Yi:i∈N1(yc)}, respectively. In view of (17), one can then determine (b(yc), θ(yc)) by setting
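Finally, the ML estimate of (yc, b, θ) can be determined as
$$y_c^* = \arg\max_{y_c\in\{d, W_m, \ldots, W_n\}} g(y_c, b(y_c), \theta(y_c)|Y_1^n), \quad b^* = b(y_c^*), \quad \theta^* = \theta(y_c^*). \qquad (20)$$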
Summarizing the above derivations into Algorithm 1 (
Theorem 1: The vector (yc*, b*, θ*) computed by Algorithm 1 is indeed the ML estimate of (yc, b, θ) in the TCM specified in (7).
Remark 1: When implementing Algorithm 1 for a sequence $\{Y_i\}_{i=1}^n$ of DCT coefficients with a flat tail as shown in
Depending on whether or not Step 6 in Algorithm 1 can be implemented efficiently, the computation complexity of Algorithm 1 varies from one parametric family f(y|θ) to another. For some parametric family f(y|θ) such as Laplacian distributions, Step 6 can be easily solved and hence Algorithm 1 can be implemented efficiently. On the other hand, when f(y|θ) is the GG family, Step 6 is quite involved. In the next two subsections, we will examine Step 6 in two cases: (1) f(y|θ) is the Laplacian family, and the corresponding TCM is referred to as the LPTCM; and (2) f(y|θ) is the GG family, and the corresponding TCM is referred to as the GGTCM.
3.3 LPTCM
Plugging the Laplacian density function in (3) into (7), we get the LPTCM given by
With reference to Step 6 in Algorithm 1, let S be either N1+(yc) or N1(yc). Then Step 6 in Algorithm 1 is equivalent to determining the ML estimate (denoted by λy
from the sample set {Yi:i∈S}. Since |Yi|≦yc for any i∈S, the log-likelihood function of the sample set {Yi:i∈S} with respect to p(y|λ) is equal to
It is not hard to verify that L(1/t) as a function of t>0 is strictly concave. Computing the derivative of L(λ) with respect to λ and setting it to 0 yields
It can be shown (see the proof of Theorem 2 below) that
is a strictly increasing function of λ>0, and
Then it follows that (1) when C=0, λy
We are now led to solving (23) when 0<C<yc/2. To this end, we developed the iterative procedure described in Algorithm 2 (
Theorem 2 below shows that Algorithm 2 converges exponentially fast when 0<C<yc/2.
Theorem 2: Assume that 0<C<yc/2. Then λi computed in Step 9 of Algorithm 2 strictly increases and converges exponentially fast to λy
Proof: Define
It is not hard to verify that the derivative of r(λ) with respect to λ is
for any λ>0. Therefore, r(λ) is strictly increasing over λ>0.
Since λ0=C>0, it follows from (25) that λ1>λ0. In general, for any i≧1, we have
which implies that $\lambda_{i+1}-\lambda_i>0$ whenever $\lambda_i-\lambda_{i-1}>0$. By mathematical induction, it then follows that $\lambda_i$ strictly increases with i.
We next show that all λi, i≧1, are bounded. Indeed, it follows from (25) that
which, together with (26) and the fact that r(λy
It remains to show that the convergence in (28) is exponentially fast. To this end, let
Then it follows from (26) that δ<1. This, together with (27), implies that
$$\lambda_{i+1}-\lambda_i \le \delta(\lambda_i-\lambda_{i-1})$$
for any i≧1, and hence λi converges to λy
Plugging Algorithm 2 into Step 6 in Algorithm 1, one then gets an efficient algorithm for computing the ML estimate of (yc, b, λ) in the LPTCM. To illustrate the effectiveness of the LPTCM, the resulting algorithm was applied to the same DCT coefficients shown in
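The combination of Algorithms 1 and 2 for the LPTCM can be sketched as follows. Because the displayed forms of (23) and (25) are not reproduced here, the sketch assumes they read λ − yc/(e^{yc/λ}−1) = C and λ_{i+1} = C + yc/(e^{yc/λ_i}−1) with λ_0 = C, where C is the mean magnitude of the inlier samples. This is the stationarity condition of the truncated-Laplacian log-likelihood and is consistent with the range 0<C<yc/2 in Theorem 2. For each candidate yc in {d, W_m, ..., W_n}, both treatments of samples with |Y_i| = yc (outlier, as in N_1, or inlier, as in N_1^+) are tried, b is set to the inlier fraction (its ML value for a fixed split), and the candidate with the largest log-likelihood is kept. All function names are hypothetical.

```python
import bisect
import math
from itertools import accumulate


def truncated_laplacian_scale(c, yc, tol=1e-10, max_iter=200000):
    """Solve lam - yc/(exp(yc/lam) - 1) = c by the assumed fixed-point update."""
    if c <= 0.0:
        return 0.0       # all inliers at zero: degenerate case
    if c >= yc / 2.0:
        return math.inf  # likelihood increases without bound: uniform limit
    lam = c
    for _ in range(max_iter):
        x = yc / lam
        nxt = c + (yc / math.expm1(x) if x < 700.0 else 0.0)
        if nxt - lam <= tol * nxt:  # iterates increase monotonically
            return nxt
        lam = nxt
    return lam


def fit_lptcm(samples, d, a):
    """Return (yc, b, lam) maximizing the LPTCM log-likelihood over the candidate set."""
    w = sorted(abs(y) for y in samples)
    n = len(w)
    if w[-1] >= a:
        raise ValueError("need Ymax < a")
    prefix = [0.0] + list(accumulate(w))  # prefix[k] = W_1 + ... + W_k
    best_ll, best = -math.inf, None
    for yc in sorted({d, *(x for x in w if x >= d)}):
        # k inliers: bisect_left treats |Y_i| == yc as outliers, bisect_right as inliers
        for k in {bisect.bisect_left(w, yc), bisect.bisect_right(w, yc)}:
            if k == 0:
                continue
            lam = truncated_laplacian_scale(prefix[k] / k, yc)
            if lam == 0.0:
                continue
            b, n2 = k / n, n - k
            if math.isinf(lam):
                ll = -k * math.log(2.0 * yc)
            else:
                ll = -prefix[k] / lam - k * math.log(-2.0 * lam * math.expm1(-yc / lam))
            ll += k * math.log(b)
            if n2:
                ll += n2 * (math.log1p(-b) - math.log(2.0 * (a - yc)))
            if ll > best_ll:
                best_ll, best = ll, (yc, b, lam)
    return best
```

Prefix sums over the sorted magnitudes make each candidate evaluation O(log n) plus a few fixed-point iterations, so after the initial sort the search is close to linear in the number of candidates.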
3.4 GGTCM
Plugging the GG density function in (1) into (7), we get the GGTCM given by
where γ(s,x) denotes the lower incomplete gamma function,
$$\gamma(s,x) \triangleq \int_0^x t^{s-1}e^{-t}\,dt.$$
With reference to Algorithm 1, in this case, Step 6 in Algorithm 1 is equivalent to determining the ML estimate (denoted by (αy
from the sample set {Yi:i∈S}. Since |Yi|≦yc for any i∈S, the log-likelihood function of the sample set {Yi:i∈S} with respect to p(y|α,β) is equal to
Computing the partial derivatives of L(α,β) with respect to α and β and setting them to zero yields
where $t=(y_c/\alpha)^{\beta}$. One can then take a solution to (31) as (αy
Unlike the case of the LPTCM, however, solving (31) does not appear to be easy. In particular, at this point, we do not know whether (31) admits a unique solution, nor do we have an algorithm with global convergence to compute such a solution even if it is unique. As such, Step 6 in Algorithm 1 is much more complicated for the GGTCM than for the LPTCM.
Suboptimal alternatives are to derive approximate solutions to (31). One approach is to solve the two equations in (31) iteratively, starting with an initial value of β given by (2): (1) fix β and solve the first equation in (31); (2) fix α and solve the second equation in (31); and (3) repeat these two steps until no noticeable improvement can be made. Together with this suboptimal solution to (31), Algorithm 1 was applied to the same DCT coefficients shown in
4 Discrete Transparent Composite Model
Though the DCT in theory maps a real-valued space to another real-valued space and generates continuous DCT coefficients, in practice (particularly in lossy image and video coding) the DCT is often designed and implemented as a mapping from an integer-valued space (e.g., 8-bit pixels) to another integer-valued space, giving rise to integer DCT coefficients (e.g., 12-bit DCT coefficients in H.264). In addition, since most images and video are stored in a compressed format such as JPEG or H.264, for applications based on compressed images and video (e.g., image enhancement, image retrieval, image annotation), DCT coefficients are available only as quantized values. Therefore, it is desirable to establish a good model for discrete (integer or quantized) DCT coefficients as well.
Following the idea of the continuous TCM, in this section we develop a discrete TCM that partitions discrete DCT coefficients into main and tail portions, and models the main portion by a discrete parametric distribution and the tail portion by a discrete uniform distribution. The particular discrete parametric distribution we consider is a truncated geometric distribution, and the resulting discrete TCM is referred to as the GMTCM. To provide a uniform treatment of both integer and quantized DCT coefficients, we introduce a quantization step size q. Then both integer and quantized DCT coefficients can be regarded as integers multiplied by a properly chosen step size.
4.1 GMTCM
Uniform quantization with dead zone is widely used in image and video coding (see, for example, H.264 and HEVC). Mathematically, a uniform quantizer with dead zone and step size q is given by
where q/2≦Δ<q. Its input-output relationship is shown in
is distributed as follows
With the help of q, discrete (integer or quantized) DCT coefficients then take values of integers multiplied by q. (Hereafter, these integers will be referred to as DCT indices.) Note that pi in (32) is essentially a geometric distribution. Using a geometric distribution to model the main portion of discrete DCT coefficients, we then get the GMTCM given by
where 0≦p≦1 is the probability of the zero coefficient, 0≦b≦1, 1≦K≦a, and a is the largest index in a given sequence of DCT indices. Here a is assumed known, and b, p, λ and K are model parameters.
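The quantizer formula and the distribution (32) are not reproduced above, so the sketch below assumes the common dead-zone form in which the index is 0 for |y| < Δ and ±k for Δ + (k−1)q ≤ |y| < Δ + kq (Δ = q/2 gives ordinary rounding). Under a Laplacian source of scale λ, the probabilities of the nonzero indices then decay by the constant factor e^{−q/λ}, which is the geometric behaviour referred to in the text. The function names are hypothetical.

```python
import math


def deadzone_index(y: float, q: float, delta: float) -> int:
    """Dead-zone quantizer index: 0 if |y| < delta, else sign(y)*(floor((|y|-delta)/q) + 1)."""
    if not q / 2.0 <= delta < q:
        raise ValueError("expected q/2 <= delta < q")
    mag = abs(y)
    if mag < delta:
        return 0
    k = math.floor((mag - delta) / q) + 1
    return k if y > 0 else -k


def laplacian_index_pmf(i: int, lam: float, q: float, delta: float) -> float:
    """P{index = i} when Y follows a zero-mean Laplacian density with scale lam."""
    if i == 0:
        return -math.expm1(-delta / lam)  # P{|Y| < delta}
    low = delta + (abs(i) - 1) * q        # one-sided cell [low, low + q)
    return 0.5 * (math.exp(-low / lam) - math.exp(-(low + q) / lam))
```

The ratio `laplacian_index_pmf(i + 1, ...) / laplacian_index_pmf(i, ...)` equals e^{−q/λ} for every i ≥ 1, so the nonzero indices follow a geometric law.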
4.2 ML Estimate of GMTCM Parameters
4.2.1 Algorithms
Let $u^n = (u_1, u_2, \ldots, u_n)$ be a sequence of DCT indices. Assume that $u^n$ behaves according to the GMTCM defined by (33) with $u_{\max} \triangleq \max\{|u_i| : 1 \leq i \leq n\} \leq a$. We now investigate how to compute the ML estimate $(b^*, p^*, \lambda^*, K^*)$ of $(b, p, \lambda, K)$ from $u^n$.
Let $N_0 = \{j : u_j = 0\}$, $N_1(K) = \{j : 0 < |u_j| \leq K\}$, and $N_2(K) = \{j : |u_j| > K\}$.
The log-likelihood function of $u^n$ according to (33) is equal to
For any 1≦K≦a, let
In view of (34), one can verify that
When $K = 1$, $G(K, \lambda, b, p)$ does not depend on $\lambda$, and hence $\lambda_1$ can be selected arbitrarily.
We are now led to determining $\lambda_K$ for each $1 < K \leq a$. At this point, we invoke the following lemma, which is proved in Appendix A (below).
Lemma 1: Let
Then for any
as a function of t>0 is strictly concave, and for any K>1, g(t) is strictly decreasing over t∈(0,∞), and
Computing the derivative of L(K,λ) with respect to λ and setting it to 0 yields
In view of Lemma 1, it follows that (1) when $C = 0$, $\lambda_K = 0$; (2) when
and (3) when
is the unique solution to (37). In Case (3), the iterative procedure described in Algorithm 3 (
Combining the above derivations, we get a complete procedure for computing the ML estimate $(b^*, p^*, \lambda^*, K^*)$ of $(b, p, \lambda, K)$ in the GMTCM, which is described in Algorithm 4 (
Remark 2: When implementing Algorithm 4 for actual DCT indices $\{u_i : 1 \leq i \leq n\}$ with a flat tail, there is no need to start Algorithm 4 with $K = 1$. Instead, one can first choose $K_0$ such that $|N_2(K_0)|$ is a fraction of n and then run Algorithm 4 for $K \in [K_0, a]$. In our experiments, we have found that choosing $K_0$ such that $|N_2(K_0)|$ is around 20% of n works well.
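A compact sketch of the overall search in Algorithm 4 follows, under several stated assumptions. It takes the GMTCM (33) to have the form $p(0) = p$, $p(u) = \frac{b(1-p)}{2} \cdot \frac{(1-r)\,r^{|u|-1}}{1-r^K}$ for $0 < |u| \leq K$ with $r = e^{-1/\lambda}$, and $p(u) = \frac{(1-b)(1-p)}{2(a-K)}$ for $K < |u| \leq a$. It takes C to be the average of $|u_j| - 1$ over $N_1(K)$, which matches the boundary cases stated above ($\lambda_K = 0$ when $C = 0$, and a uniform main portion as C approaches $(K-1)/2$). For $\lambda_K$ it substitutes plain bisection for the iterative procedure of Algorithm 3, and it reads Remark 2 as starting from the smallest $K_0$ whose tail holds at most 20% of the n indices. All function names are hypothetical, and the histogram bookkeeping is omitted for clarity.

```python
import math
from typing import List, Tuple

def xlogy(x: float, y: float) -> float:
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def trunc_geom_mean(r: float, K: int) -> float:
    """Mean of (k - 1) when k follows a truncated geometric law on {1, ..., K} with ratio r."""
    weights = [r ** j for j in range(K)]
    return sum(j * w for j, w in enumerate(weights)) / sum(weights)

def solve_lambda(C: float, K: int) -> float:
    """Scale lambda_K solving trunc_geom_mean(exp(-1/lambda), K) == C (bisection on r)."""
    if K == 1 or C <= 0:
        return 0.0
    if C >= (K - 1) / 2:
        return math.inf
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if trunc_geom_mean(mid, K) < C:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    return math.inf if r >= 1.0 else -1.0 / math.log(r)

def trunc_geom_loglik(n1: int, C: float, K: int, lam: float) -> float:
    """Sum of log-probabilities of the n1 main-portion magnitudes (mean offset C)."""
    if n1 == 0 or K == 1 or lam == 0.0:
        return 0.0  # all main-portion mass sits on |u| = 1
    if math.isinf(lam):
        return -n1 * math.log(K)  # uniform over {1, ..., K}
    r = math.exp(-1.0 / lam)
    return n1 * (math.log1p(-r) - math.log1p(-r ** K) + C * math.log(r))

def fit_gmtcm(u: List[int], a: int, tail_frac: float = 0.2) -> Tuple[int, float, float, float, float]:
    """Return (K*, b*, p*, lambda*, log-likelihood) for DCT indices u with max |u| <= a."""
    n = len(u)
    mags = [abs(x) for x in u]
    assert max(mags) <= a, "a must bound every |u_j|"
    n0 = mags.count(0)
    nz = n - n0
    p = n0 / n                      # ML estimate of the zero probability
    nonzero = [m for m in mags if m > 0]
    K0 = 1                          # Remark 2: skip K whose tail is still large
    while K0 < a and sum(1 for m in nonzero if m > K0) > tail_frac * n:
        K0 += 1
    best = None
    for K in range(K0, a + 1):
        main = [m for m in nonzero if m <= K]
        n1, n2 = len(main), nz - len(main)
        C = sum(m - 1 for m in main) / n1 if n1 else 0.0
        lam = solve_lambda(C, K)
        b = n1 / nz if nz else 1.0  # ML estimate of b for this K
        ll = (xlogy(n0, p) + xlogy(nz, 1.0 - p) + xlogy(n1, b) + xlogy(n2, 1.0 - b)
              - nz * math.log(2.0) - xlogy(n2, a - K) + trunc_geom_loglik(n1, C, K, lam))
        if best is None or ll > best[-1]:
            best = (K, b, p, lam, ll)
    return best
```

Because $p^*$ and $b^*$ have closed forms for each K, the only per-K numerical work is the one-dimensional solve for $\lambda_K$.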
4.2.2 Convergence and Complexity Analysis
In parallel with Algorithm 2, Algorithm 3 also converges exponentially fast when 0<C<(K−1)/2. In particular, we have the following result, which is proved in Appendix B (below).
Theorem 3: Assume that $0 < C < (K-1)/2$. Then $\lambda^{(i)}$ computed in Step 12 of Algorithm 3 strictly increases and converges exponentially fast to $\lambda_K$ as $i \to \infty$.
The complexity of computing the ML estimate of the GMTCM parameters comes from two parts. The first part is to evaluate the cost of (34) over a set of values of K. The second part is to compute $\lambda_K$ for every K using Algorithm 3. Note that C in Algorithm 3 can easily be pre-computed for the values of K of interest. Thus, the main complexity of Algorithm 3 lies in evaluating the two simple equations in (38) a small number of times, which, in light of the exponential convergence, is generally negligible. Essentially, the major complexity of parameter estimation by Algorithms 3 and 4 lies in collecting the data histogram $\{h_j, j = 1, \ldots, a\}$ once. Compared with solving (2) for GG parameters, where the data samples and the parameters to be estimated are closely tied together, as in the $\sum_{i=1}^{n} |x_i|^\beta \log |x_i|$ term and the $\beta \sum_{i=1}^{n} |x_i|^\beta$ term, parameter estimation for the GMTCM is significantly less complex.
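To illustrate why the histogram dominates the cost, the sketch below computes $|N_0|$, $|N_1(K)|$, $|N_2(K)|$, and the statistic C for every $K = 1, \ldots, a$ from a single histogram pass plus prefix sums, $O(n + a)$ in total. Taking $C(K)$ to be the mean of $|u_j| - 1$ over $N_1(K)$ is an assumption consistent with the boundary cases of Algorithm 3 ($\lambda_K = 0$ when $C = 0$); the function name is hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple

def gmtcm_stats(u: List[int], a: int) -> Tuple[int, List[int], List[int], List[float]]:
    """One histogram pass, then prefix sums give, for every K = 1..a,
    |N1(K)|, |N2(K)| and the mean offset C(K) of the main portion."""
    h = [0] * (a + 1)                 # h[j] = number of indices with |u| = j
    for x in u:
        h[abs(x)] += 1
    n0 = h[0]
    nz = len(u) - n0
    n1 = list(accumulate(h[1:]))      # n1[K-1] = |N1(K)| = h[1] + ... + h[K]
    off = list(accumulate((j - 1) * h[j] for j in range(1, a + 1)))
    n2 = [nz - c for c in n1]         # |N2(K)| = nonzero indices beyond K
    C = [o / c if c else 0.0 for o, c in zip(off, n1)]
    return n0, n1, n2, C
```

Each candidate K then needs only constant-time bookkeeping plus a few iterations of the $\lambda_K$ solver.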
Remark 3: In our discussion of TCMs for DCT coefficients so far, DCT coefficients are separated into two portions: the main portion and the tail portion. As will be apparent to one skilled in the art, the main portion could be further separated into multiple sub-portions, with each sub-portion modeled by a different parametric distribution. The resulting TCM would be called a multiple segment TCM (MTCM), described in greater detail below. In addition, the tail portion could also be modeled by another parametric distribution, such as a truncated Laplacian, GG, or geometric distribution, since a uniform distribution is a degenerate case of a truncated Laplacian, GG, or geometric distribution.
Remark 4: Although we have used both continuous and discrete DCT coefficients as our data examples, all TCM models discussed so far apply equally well to other types of data, such as wavelet transform coefficients, prediction residuals arising in predictive coding and other prediction applications, and data traditionally modeled by Laplacian distributions.
5 Experimental results on Tests of Modeling Accuracy
This section presents experimental results obtained by applying TCMs to both continuous and discrete DCT coefficients and compares them with those of the Laplacian and GG models.
5.1 Test Materials and Performance Metric
Two criteria are applied in this disclosure to test modeling accuracy: the χ2 test, as defined in (6), and the divergence distance test defined as follows
where I is the number of intervals into which the sample space is partitioned in the continuous case or the alphabet size of a discrete source, pi represents probabilities observed from the data, and qi stands for probabilities obtained from a given model. Note that pi=0 is dealt with by defining 0 ln 0=0.
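The displayed formula for the divergence distance is not reproduced above; a standard reading consistent with the surrounding definitions is the Kullback-Leibler form $D = \sum_{i=1}^{I} p_i \ln(p_i / q_i)$, which the sketch below assumes. It shows how the $0 \ln 0 = 0$ convention enters; the function name is hypothetical.

```python
import math
from typing import Sequence

def divergence_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """sum_i p_i * ln(p_i / q_i) with 0 ln 0 = 0 (p: observed, q: model)."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same I cells")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue           # 0 ln 0 = 0 convention
        if qi == 0:
            return math.inf    # the model rules out an observed cell
        total += pi * math.log(pi / qi)
    return total

print(divergence_distance([0.5, 0.5], [0.25, 0.75]))
```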
Three sets of testing images are deliberately selected to cover a variety of image content. The first set, as shown in
Tests for continuous DCT coefficients were conducted by computing the 8×8 DCT using floating-point matrix multiplication. In our tests for discrete DCT coefficients, a raw image was first compressed using a MATLAB JPEG codec with quality factors (QF) of 100, 90, 80, and 70; the resulting quantized DCT coefficients and the corresponding quantization step sizes were then read from the resulting JPEG files.
Tests were carried out for five different models: the Laplacian model, GG model, GGTCM, LPTCM, and GMTCM. Due to its high computational complexity, GGTCM was applied only to continuous DCT coefficients. On the other hand, GMTCM is applicable only to discrete coefficients. The Laplacian and GG models were applied to both continuous and discrete DCT coefficients; the same parameter estimation algorithms, (4) for the Laplacian model and (2) for the GG model, were used for both continuous and discrete DCT coefficients.
5.2 Overall Comparisons for Each Image
In the continuous case, the GGTCM outperforms the GG model, the LPTCM outperforms the Laplacian model, and the GG model generally outperforms the Laplacian model, as one would expect. An interesting comparison in this case is between the GG model and the LPTCM. Table 1 (
In the discrete case, comparisons were conducted among the GMTCM, GG model, and Laplacian model in terms of both the divergence distance and χ2 value. As expected, the GMTCM is always better than the Laplacian model according to both the divergence distance and χ2 value, and hence the corresponding results are not included here. For the comparison between the GMTCM and GG model, results are shown in Tables 2, 3, 4, and 5 for quantized DCT coefficients from JPEG coded images with various QFs, where wd stands for the percentage of frequencies among all tested AC positions that are in favor of the GMTCM over the GG model in terms of the divergence distance, and wχ
5.3 Comparisons of χ2 Among Three Models for Individual Frequencies
In the above overall comparisons, Table 2 (
From Tables 6, 7, and 8, it is fair to say that (1) the GMTCM dramatically improves the modeling accuracy over the Laplacian model; (2) when the GMTCM is better than the GG model, $\chi^2_{\mathrm{GMTCM}}$ is often much smaller than $\chi^2_{\mathrm{GGD}}$, up to 15658 times smaller; and (3) when the GG model is better than the GMTCM, the difference between $\chi^2_{\mathrm{GMTCM}}$ and $\chi^2_{\mathrm{GGD}}$ is not as significant as in Case (2). For example, in Table 8, $\chi^2_{\mathrm{GGD}}$ is at most 9 times smaller than $\chi^2_{\mathrm{GMTCM}}$.
Another interesting result is observed in Table 9 (
6 Applications
This section briefly discusses applications of TCM in various areas such as data compression and image understanding. For example, as shown in
6.1 Data Compression
As DCT is widely used in image/video compression (e.g., in JPEG, H.264, and HEVC), an accurate model for DCT coefficients would help further improve compression efficiency, complexity, or both in image/video coding.
6.1.1 Lossless Coding Algorithm Design
Entropy coding design in image and video coding standards such as JPEG, H.264, and HEVC is closely related to understanding DCT coefficient statistics, owing to the wide application of the DCT in image and video compression. We have used the superior modeling accuracy of TCM to design an entropy coding scheme for discrete DCT coefficients (such as those in JPEG images). Specifically, GMTCM parameters are calculated and coded for each frequency. Then, a bit-mask is coded to identify outliers, so that outliers and DCT coefficients within the main portion can be further coded separately with their respective context modeling. For DCT coefficients within the main portion, parameters of the truncated geometric distributions are encoded and then used to further improve coding efficiency. In spite of the overhead of coding outlier flags, the new entropy codec shows an average rate saving of 25% compared with a standard JPEG entropy codec for high-fidelity JPEG images (with a quantization step size of 1 for most low-frequency AC positions), which is significantly better than other state-of-the-art lossless coding methods for DCT coefficients [15] and for gray-scale images [16]. A suitable decoder can implement at least some or all of the functions of the encoder, as an inverse.
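The separation step of such a scheme can be sketched as follows. This is only the outlier/inlier split and its inverse at the decoder; the context modeling and arithmetic coding are not shown. It assumes the cut point K for the frequency has already been estimated and that an index is an outlier when $|u| > K$; the function names are hypothetical.

```python
from typing import List, Tuple

def split_outliers(indices: List[int], K: int) -> Tuple[List[int], List[int], List[int]]:
    """Split the DCT indices of one frequency into a bit-mask, inliers, and outliers.

    mask[j] == 1 marks an outlier (|index| > K, i.e. the tail portion).
    """
    mask = [1 if abs(x) > K else 0 for x in indices]
    inliers = [x for x, m in zip(indices, mask) if not m]
    outliers = [x for x, m in zip(indices, mask) if m]
    return mask, inliers, outliers

def merge_outliers(mask: List[int], inliers: List[int], outliers: List[int]) -> List[int]:
    """Decoder side: rebuild the original index order from the three streams."""
    it_in, it_out = iter(inliers), iter(outliers)
    return [next(it_out) if m else next(it_in) for m in mask]
```

Each stream can then be coded with its own statistics: the inliers with the truncated geometric model, the outliers with a model suited to the flat tail.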
6.1.2 Lossy Coding Algorithm Design
Quantization design, as the core of lossy coding, is rooted in rate-distortion theory, which generally requires a statistical model to guide practical designs. Quantization design in DCT-based image and video coding usually assumes a Laplacian distribution due to its simplicity and fair modeling accuracy [12]. Since the LPTCM improves dramatically upon the Laplacian model in modeling accuracy while having similar simplicity, we have applied it to design quantizers for DCT coefficients and a DCT-based non-predictive image compression system, which is significantly better than JPEG and the state-of-the-art DCT-based non-predictive image codec [14] in compression efficiency, and compares favorably with state-of-the-art DCT-based predictive codecs such as H.264/AVC intra coding and HEVC intra coding in high-rate cases in terms of the trade-off between compression efficiency and complexity.
For example, as shown in
6.2 Image Understanding
Image understanding is another application of DCT coefficient modeling. It is interesting to observe that in natural images, the statistically insignificant outliers detected by the GMTCM carry perceptually important information, which sheds light on DCT-based image analysis.
6.2.1 Featured Outlier Images Based on GMTCM
One important parameter in the GMTCM model is the cutting point yc=Kq between a parametric distribution for the main portion and the uniform distribution for the flat tail portion. Statistically, the outlier coefficients that fall beyond yc into the tail portion are not significant—although the actual number of outliers varies from one frequency to another and from one image to another, it typically ranges in our experiments from less than 0.1% of the total number of AC coefficients to 19% with an average around 1.2%. However, from the image quality perception perspective, the outliers carry very important information, as demonstrated by
As the inlier image contains all DC components and inlier AC components, a down-sizing operation would impact our perception on the difference between the original image and the inlier image. Hence,
It is interesting to examine the information rate of outliers, i.e., how many bits are needed to represent outlier images. We have also applied TCM to enhance entropy coding design for DCT coefficients, where outliers are encoded separately from inliers. It is observed that outliers consume only about 5% of the total bits.
Finally, it is worth pointing out that outlier images are related to, but different from, conventional edge detection. An outlier image captures some global uniqueness in an image, while edges are usually detected based on local irregularity in the pixel domain. For example, the large area of vertical patterns in the top-left corner of
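A minimal sketch of how an outlier image could be formed block by block follows. It assumes an orthonormal 8×8 DCT (JPEG level shift and clipping omitted), per-frequency step sizes q[u][v], and per-frequency GMTCM cut points K[u][v], so that $y_c = Kq$ at each frequency. Following the description of the inlier image above, the DC coefficient and all inlier AC coefficients are set to zero before the inverse DCT. The function names are hypothetical.

```python
import math
from typing import List

N = 8
Block = List[List[float]]

def _basis(k: int, x: int) -> float:
    """Orthonormal DCT-II basis value for frequency k at sample position x."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * x + 1) * k * math.pi / (2 * N))

def idct2(coef: Block) -> Block:
    """2-D orthonormal inverse DCT of an 8x8 coefficient block."""
    return [[sum(_basis(u, x) * _basis(v, y) * coef[u][v]
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def outlier_block(indices: List[List[int]], q: Block, K: List[List[int]]) -> Block:
    """Pixel-domain block rebuilt from AC outliers only (|index| > K at that frequency)."""
    kept = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            if (u, v) != (0, 0) and abs(indices[u][v]) > K[u][v]:
                kept[u][v] = indices[u][v] * q[u][v]  # dequantize the outlier
    return idct2(kept)
```

Tiling such blocks over the image gives the outlier image; because the inverse DCT is linear, the inlier image and the outlier image add up to the full decoded image.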
6.2.2 Image Similarity
Similarity measurement among images plays a key role in image management, which has attracted increasing attention in industry due to the fast growth of digital photography over the past decade. One application of DCT models is to measure the similarity among images by estimating the model parameters of different images and calculating a distribution distance. Because DCT coefficients capture spatial patterns in the pixel domain well (e.g., AC1 reflects a vertical pattern and AC8 a horizontal pattern), the distribution distance between DCT coefficient models represents the similarity between two images well. This type of similarity measurement is ultimately rooted in data histograms. In practice, however, histograms are not a good choice, as they require a fixed overhead that is far from negligible. This is particularly problematic for a large-scale image management system. On the other hand, model-based distribution distances use only a few parameters with negligible overhead, thus providing a good similarity measure between digital images, particularly when the modeling accuracy is high. The inventors have pursued this line of study, using the GMTCM for image similarity, with promising performance.
The outlier images shown and discussed in Subsection 6.2.1 can be used to further enhance image similarity testing based on model-based distribution distances. Since outliers are statistically insignificant, their impact on model-based distribution distances may not be significant. And yet, if two images look similar, their respective outlier images must look similar too. As such, one can build other metrics based on outlier images to further enhance image similarity testing. In addition, an outlier image can also be used to detect whether a given image is scenic and to help improve face detection. These and other applications using the GMTCM are contemplated as being within the scope of the present disclosure.
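As an illustration of how few numbers a model-based distance needs per frequency, the sketch below builds a GMTCM pmf from $(K, b, p, \lambda)$ and compares two such pmfs. The pmf form (geometric main portion with ratio $e^{-1/\lambda}$, flat tail over $K < |u| \leq a$) and the choice of the symmetric Kullback-Leibler divergence as the distance are assumptions; the text does not state which distribution distance was used. Names and parameter values are hypothetical.

```python
import math
from typing import Dict

def gmtcm_pmf(a: int, K: int, b: float, p: float, lam: float) -> Dict[int, float]:
    """Assumed GMTCM pmf over the indices -a..a (needs 1 <= K < a and lam > 0)."""
    r = math.exp(-1.0 / lam)
    norm = sum(r ** (k - 1) for k in range(1, K + 1))  # truncated-geometric normalizer
    pmf = {0: p}
    for k in range(1, a + 1):
        if k <= K:
            mass = b * (1.0 - p) / 2.0 * r ** (k - 1) / norm   # main portion
        else:
            mass = (1.0 - b) * (1.0 - p) / (2.0 * (a - K))    # flat tail
        pmf[k] = pmf[-k] = mass
    return pmf

def symmetric_kl(P: Dict[int, float], Q: Dict[int, float]) -> float:
    """KL(P||Q) + KL(Q||P) for two strictly positive pmfs on the same alphabet."""
    return sum((P[x] - Q[x]) * math.log(P[x] / Q[x]) for x in P)

img1 = gmtcm_pmf(a=50, K=12, b=0.99, p=0.6, lam=2.0)
img2 = gmtcm_pmf(a=50, K=8, b=0.98, p=0.5, lam=3.0)
print(symmetric_kl(img1, img2))
```

A full comparison could sum such distances over the AC positions of interest, each of which needs only four parameters per image rather than a full histogram.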
Reference is now made to
From the method 2600, a composite distribution model can be defined as a composite of the first distribution model (e.g. uniform distribution model) and the at least one further distribution model having the respective determined parameters. At event 2608, the method 2600 includes performing a device operation on at least part of the composite distribution model. For example, the device operation may be implemented on one of the distribution models but not the others. In an example embodiment, the device operation is performed on the entire composite distribution model.
In an example embodiment, the at least one parametric distribution model includes at least one of: a Laplacian distribution model, a generalized Gaussian model, and a geometric distribution model.
Referring to event 2608, in some example embodiments, the device operation includes at least one of storing on a memory, transmitting to a second device, transmitting to a network, outputting to an output device, displaying on a display screen, improving data compression of the set of transform coefficients using the composite distribution model, determining image similarity between different images by comparing at least part of the composite distribution model, determining a goodness-of-fit between the composite distribution model and the set of transform coefficients, and generating an identifier which associates the composite distribution model with the set of transform coefficients.
The set of transform coefficients includes: discrete cosine transform coefficients, Laplace transform coefficients, Fourier transform coefficients, wavelet transform coefficients, prediction residuals arising from prediction in predictive coding and other prediction applications, or data which is traditionally modeled by Laplacian distributions. The set of transform coefficients can be generated in real-time (e.g. from a source image or media file), obtained from the memory 2504 (
Reference is still made to
At event 2610, the method 2600 includes determining at least one boundary coefficient value which satisfies a maximum likelihood estimation between the set of transform coefficients and a composite distribution model which is a composite of a plurality of distribution models each for a subset of transform coefficients of the set bounded by each of the at least one boundary coefficient values. This can include determining one or more parameters of a first distribution model, for example at least one uniform distribution model, for transform coefficients of the set the magnitudes of which are greater than one of the boundary coefficient values. This can include determining parameters of at least one further distribution model, such as at least one parametric distribution model, for transform coefficients of the set the magnitudes of which are less than the one of the boundary coefficient values. The maximum likelihood estimation at event 2610 is illustrated with double-arrows because of the co-dependence between the at least one boundary coefficient value and the parameters of the distribution models.
At event 2608, the method 2600 includes performing a device operation on at least one of the subsets of transform coefficients. In some example embodiments, the device operation on the at least one of the subsets of transform coefficients includes at least one of: encoding, storing on a memory, transmitting to a second device, transmitting to a network, outputting to an output device, decoding, displaying a decoded version on a display screen, determining image similarity between different images by comparison of the at least one of the subsets of discrete transform coefficients, and generating an identifier which associates the composite distribution model with the at least one of the subsets of discrete transform coefficients.
Still referring to event 2608, the device operation on the subset of coefficients can be used to filter an image using the boundary coefficient value, for example maintaining at least one subset bounded by the boundary coefficient value and setting the remaining subsets of coefficient values to a zero value. The remaining subset(s) can then be decoded and displayed on a display, for example. This has been illustrated in detail with respect to
7 Conclusions to TCM
Motivated by the flat tail phenomenon in DCT coefficients and its perceptual importance, this disclosure has developed a model dubbed transparent composite model (TCM) for modeling DCT coefficients, which separates the tail portion of DCT coefficients from the main portion of DCT coefficients and uses a different distribution to model each portion: a uniform distribution for the tail portion and a parametric distribution, such as a truncated Laplacian, generalized Gaussian (GG), or geometric distribution, for the main portion. Efficient online algorithms with global convergence have been developed to compute the ML estimates of the TCM parameters. It has been shown that among the Laplacian model, GG model, GGTCM, and LPTCM, the GGTCM offers the best modeling accuracy for real-valued DCT coefficients, at the cost of large extra complexity. On the other hand, for discrete DCT coefficients, tests over a wide variety of images based on both the divergence distance and the χ2 test have shown that the GMTCM outperforms both the Laplacian and GG models in modeling accuracy in the majority of cases while having simplicity and practicality similar to those of the Laplacian model, making the GMTCM a desirable choice for modeling discrete DCT coefficients in real-world applications. In addition, it has been demonstrated that the tail portion identified by the GMTCM gives rise to an image called an outlier image, which, on the one hand, achieves dramatic dimension reduction in comparison with the original image and, on the other hand, preserves perceptually important unique global features of the original image. It has further been suggested that applications of the TCM, in particular the LPTCM and GMTCM, include image and video coding, quantization design, entropy coding design, and image understanding and management (image similarity testing, blind detection of scenic images, face detection, etc.).
In this appendix, we prove Lemma 1.
First note that g(t) can be rewritten as
Its derivative is equal to
It is not hard to verify that
whenever K>1, where
This, together with (41), implies that $g'(t) < 0$ for any $t > 0$ whenever $K > 1$. Hence $g(t)$ is strictly decreasing over $t \in (0, \infty)$.
Next we have
Finally, the strict concavity of
as a function of t follows from (41) and the fact that
This completes the proof of Lemma 1.
In this appendix, we prove Theorem 3.
First, arguments similar to those in the proof of Theorem 2 can be used to show that $\lambda^{(i)}$ is upper bounded by $\lambda_K$, strictly increases, and converges to $\lambda_K$ as $i \to \infty$. Therefore, what remains is to show that the convergence is exponentially fast. To this end, let
In view of (38), it follows that
In view of Lemma 1 and its proof (particularly (41)), it is not hard to verify that $0 < \delta < 1$. Therefore, as $i \to \infty$, $h(\lambda^{(i)})$ converges to $h(\lambda_K)$ exponentially fast. Since the derivative of $h(\lambda)$ is positive over $\lambda \in [\lambda^{(0)}, \lambda_K]$ and bounded away from 0, it follows that $\lambda^{(i)}$ also converges to $\lambda_K$ exponentially fast. This completes the proof of Theorem 3.
8 Introduction to MTCM
The above example embodiments have shown that (1) for real-valued continuous AC coefficients, LPTCM offers a superior trade-off between modeling accuracy and complexity; and (2) for discrete (integer or quantized) DCT coefficients, which are mostly seen in real-world applications of DCT, GMTCM models AC coefficients more accurately than the Laplacian model and GG model in the majority of cases while having simplicity and practicality similar to those of the Laplacian model. When limited to AC coefficients at low frequencies, however, GMTCM only ties with the GG model in modeling accuracy. Since DCT coefficients at low frequencies are generally more important to human perception than those at high frequencies, it would be advantageous to further improve the modeling accuracy of LPTCM and GMTCM for low-frequency DCT coefficients without sacrificing modeling simplicity and practicality.
In accordance with at least some example embodiments, we extend the concept of TCM by further separating the main portion of DCT coefficients into multiple sub-portions and modeling each sub-portion by a different parametric distribution (such as truncated Laplacian, GG, and geometric distributions). The resulting model is dubbed a multiple segment TCM (MTCM). In the case of general MTCMs based on truncated Laplacian and geometric distributions (referred to as MLTCM and MGTCM, respectively), a greedy algorithm is developed for determining a desired number of segments and for estimating the corresponding separation boundaries and other MTCM parameters. For bi-segment TCMs, an efficient online algorithm is further presented for computing the maximum likelihood (ML) estimate of their parameters. Experiments based on Kullback-Leibler (KL) divergence and the χ2 test show that (1) for real-valued continuous AC coefficients, the bi-segment TCM based on truncated Laplacian distributions (BLTCM) models AC coefficients more accurately than LPTCM and the GG model while having simplicity and practicality similar to those of LPTCM and the pure Laplacian model; and (2) for discrete DCT coefficients, the bi-segment TCM based on truncated geometric distributions (BGTCM) significantly outperforms GMTCM and the GG model in modeling accuracy, while having simplicity and practicality similar to those of GMTCM. It is also shown that the MGTCM derived by the greedy algorithm further improves the modeling accuracy over BGTCM at the cost of more parameters and a slight increase in complexity. Therefore, BLTCM/MLTCM and BGTCM/MGTCM represent the state of the art in modeling accuracy for continuous and discrete DCT coefficients (or similar types of data), respectively, which, together with their simplicity and practicality, makes them a desirable choice for modeling DCT coefficients (or similar types of data) in real-world image/video applications.
9 Review of TCM
In this section, we briefly review the concept of TCM for continuous DCT coefficients, as described in detail above.
Let f(y|θ) be a probability density function (pdf) with parameter θ∈Θ, where θ could be a vector, and Θ is the parameter space. Let F(y|θ) be the corresponding cumulative distribution function (cdf), i.e.,
$F(y|\theta) \triangleq \int_{-\infty}^{y} f(u|\theta)\,du$.
Equation numbers will re-start from (1) for convenience of reference.
Assume that f(y|θ) is symmetric in y with respect to the origin, and F(y|θ) is concave as a function of y in the region y≦0. It is easy to verify that Laplacian, Gaussian, and GG distributions all satisfy this assumption. The continuous TCM based on F(y|θ) is defined as
where 0≦b≦1, 0<d≦yc<a, and a represents the largest magnitude a sample y can take. Here both a and d are assumed to be known. It is not hard to see that given (yc, b, θ), as a function of y, p(y|yc, b, θ) is indeed a pdf, and also symmetric with respect to the origin.
According to the TCM defined in (42), a sample y is generated according to the truncated distribution
with probability b, and according to the uniform distribution
(also called the outlier distribution) with probability 1−b. The composite model is transparent since, given parameters (yc, b, θ), there is no ambiguity regarding which distribution a sample y≠±yc comes from. The ML estimates of the separation boundary yc and parameters (θ, b) can be computed efficiently through the online algorithm with global convergence developed in Sections 3 and 4 above, especially when f(y|θ) is Laplacian. As described in Sections 3 to 6 above, the value of b is on average around 0.99. As such, the portions modeled by the truncated distribution
and the outlier distribution are referred to as the main and tail portions, respectively.
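The generative description above can be made concrete for a Laplacian f(y|θ), i.e., the LPTCM case. The sketch assumes that the truncated part is a Laplacian with scale λ restricted to $|y| < y_c$, and that the outlier distribution is uniform over $y_c < |y| \leq a$; the function name and parameter values are hypothetical. Because the two supports do not overlap, each sample can be attributed to its component simply by comparing |y| with $y_c$, which is what makes the composite model transparent.

```python
import math
import random
from typing import List

def sample_lptcm(n: int, yc: float, a: float, b: float, lam: float, seed: int = 0) -> List[float]:
    """Draw n samples from a Laplacian-based TCM (assumed form, see lead-in)."""
    rng = random.Random(seed)
    main_mass = 1.0 - math.exp(-yc / lam)  # P(|Y| < yc) for an untruncated Laplacian
    out = []
    for _ in range(n):
        sign = 1.0 if rng.random() < 0.5 else -1.0
        if rng.random() < b:
            # Main portion: invert the CDF of |Y| truncated to [0, yc).
            mag = -lam * math.log(1.0 - rng.random() * main_mass)
        else:
            # Tail portion: uniform over (yc, a].
            mag = rng.uniform(yc, a)
        out.append(sign * mag)
    return out

ys = sample_lptcm(20000, yc=20.0, a=200.0, b=0.99, lam=4.0)
tail_fraction = sum(abs(y) > 20.0 for y in ys) / len(ys)
print(tail_fraction)  # close to 1 - b = 0.01
```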
10 Continuous Multiple Segment Transparent Composite Model
To improve the modeling accuracy of TCM, especially for AC coefficients at low frequencies, we now further separate the main portion of DCT coefficients into multiple sub-portions and model each sub-portion independently by a different parametric distribution, yielding a model we call a multiple segment transparent composite model. Assuming DCT coefficients are continuous (i.e. can take any real value), in this section we describe and analyze continuous MTCMs.
10.1 Description of General Continuous MTCMs
Separate the main portion further into l sub-portions. The MTCM based on F(y|θ) with l+1 segments is defined as
whenever y=±|yc
Note that in the MTCM defined in (43), the tail portion is also modeled by a truncated distribution based on f(y|θ). This deviation from the TCM defined in (42) is motivated by the observation that given
as some parameter in θl+1 goes to ∞ for most parametric distributions f(y|θ) such as the Laplacian, Gaussian, and GG distributions. Therefore, leaving θl+1 to be determined by ML estimation would improve modeling accuracy in general.
Depending on f(y|θ), estimating the MTCM parameters
10.2 ML Estimates of Bi-Segment TCM Parameters
In the case of bi-segment TCM, we have l=1 and the parameters to be estimated are yc
$F''(y|\theta)\,[1 - F(y|\theta)] + [F'(y|\theta)]^2 \geq 0 \qquad (44)$
for any y≧0. It is not hard to verify that the Laplacian, Gaussian, and GG distributions with the shape parameter β≦1 all satisfy (44).
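As a check on (44), the Laplacian case can be worked out directly; we assume the scale parameterization $f(y|\lambda) = e^{-|y|/\lambda}/(2\lambda)$. For $y \geq 0$, $F(y|\lambda) = 1 - \tfrac{1}{2} e^{-y/\lambda}$, so

```latex
\begin{aligned}
F'(y|\lambda) &= \tfrac{1}{2\lambda}\, e^{-y/\lambda}, \qquad
F''(y|\lambda) = -\tfrac{1}{2\lambda^{2}}\, e^{-y/\lambda}, \\
F''(y|\lambda)\,[1 - F(y|\lambda)] + [F'(y|\lambda)]^{2}
  &= -\tfrac{1}{4\lambda^{2}}\, e^{-2y/\lambda} + \tfrac{1}{4\lambda^{2}}\, e^{-2y/\lambda} = 0 .
\end{aligned}
```

Hence (44) holds with equality for the Laplacian. For a zero-mean Gaussian (taking unit variance without loss of generality), $F''(y) = -y\,\phi(y)$ and $1 - F(y) = Q(y)$, so (44) reduces to $y\,Q(y) \leq \phi(y)$, the classical Mills-ratio bound.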
Let $Y_1^n = (Y_1, Y_2, \ldots, Y_n)$ be a sequence of DCT coefficients in an image or in a large coding unit (such as a block, a slice, or a frame in video coding) at a particular frequency or across frequencies of interest. Assume that $Y_1^n$ behaves according to the MTCM defined in (43) with $l = 1$ and with $Y_{\max} \triangleq \max\{|Y_i| : 1 \leq i \leq n\} < a$ and $Y_{\max} \geq d$. (When $Y_{\max} < d$, the ML estimates of yc
Given $Y_1^n$ with $d \leq Y_{\max} < a$, define
N1(yc
N2(yc
N3(yc
Then the log-likelihood function g(yc
where |S| denotes the cardinality of a finite set S. In view of (44) and the assumption that F(y|θ) is concave, one can verify that given |N1(yc
−|N1(yc
as a function of yc
Therefore, the ML estimate of yc
For any yc
(b1(yc
Given yc
Summarizing the above derivations into Algorithm 5 (
Theorem 4: The vector (yc
Remark 5: When f(y|θ) is Laplacian, the distribution of the tail in the bi-segment TCM specified in (43) with l=1 approaches the uniform distribution over (yc
10.3 Estimates of MLTCM Parameters
Suppose now that f (y|θ) is Laplacian. Plugging the Laplacian density function
into (43), we get the MLTCM with l+1 segments given by
where
0<yc
whenever |y|=yc
In practice, neither l nor (
Let $V^T = (V_1, V_2, \ldots, V_T)$ be a sequence of samples drawn independently according to the generic truncated Laplacian distribution given in (49). From the proof of Theorem 2, the ML estimate $\lambda(V^T)$ of $\lambda$ from $V^T$ is the unique solution to
and by convention, the solution to (50) is equal to 0 if C≦0, and +∞ if C≧yc/2. From the central limit theorem and strong law of large numbers, we have
Therefore, we have
In particular,
Let $\lambda^+(V^T)$ be the unique solution to
and $\lambda^-(V^T)$ be the unique solution to
as a function of λ is strictly increasing over λ>0, it follows from (51), (52), and (53) that (51) is equivalent to
In other words, $[\lambda^-(V^T), \lambda^+(V^T)]$ is a confidence interval for estimating $\lambda$ with asymptotic confidence level $1 - \alpha$.
The above derivation provides a theoretical basis for a greedy method for determining l and for estimating
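Equations (50) to (53) are not reproduced above, so the sketch below fills them in under explicit assumptions: the $V_t$ are magnitudes of a Laplacian with scale $\lambda$ truncated to $[0, y_c]$, whose mean is $\lambda - y_c/(e^{y_c/\lambda} - 1)$ (consistent with the stated conventions: 0 when $C \leq 0$ and $+\infty$ when $C \geq y_c/2$); $\lambda(V^T)$ matches that mean to the sample mean C; and $\lambda^{\pm}(V^T)$ are obtained by moving C by $\pm z_{1-\alpha/2}\, s/\sqrt{T}$, with s the sample standard deviation, before solving. This plug-in interval is our reading of the CLT argument, not necessarily the exact form of (52) and (53). Names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple

def trunc_lap_mean(lam: float, yc: float) -> float:
    """Mean of a Laplacian magnitude with scale lam, truncated to [0, yc]."""
    t = yc / lam
    if t > 700.0:                     # exp(t) would overflow; correction term is ~0
        return lam
    return lam - yc / math.expm1(t)

def solve_trunc_lap(C: float, yc: float) -> float:
    """Scale lam with trunc_lap_mean(lam, yc) == C (0 if C <= 0, inf if C >= yc/2)."""
    if C <= 0:
        return 0.0
    if C >= yc / 2:
        return math.inf
    lo, hi = yc * 1e-12, yc
    while trunc_lap_mean(hi, yc) < C:  # the mean increases with lam
        hi *= 2.0
        if hi > yc * 1e12:
            return math.inf
    for _ in range(200):
        mid = math.sqrt(lo * hi)       # bisection on a log scale
        if trunc_lap_mean(mid, yc) < C:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def lambda_interval(v: Sequence[float], yc: float, alpha: float = 0.05) -> Tuple[float, float, float]:
    """(lambda_minus, lambda_hat, lambda_plus) from a CLT interval on the sample mean."""
    c = mean(v)
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * stdev(v) / math.sqrt(len(v))
    return (solve_trunc_lap(c - half, yc), solve_trunc_lap(c, yc),
            solve_trunc_lap(c + half, yc))
```

Since the mean is strictly increasing in $\lambda$, mapping the endpoints of an interval for the mean through the solver yields an interval for $\lambda$ with the same asymptotic coverage.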
$\Pr\{\lambda^-(V^T) \leq \lambda \leq \lambda^+(V^T)\}$
can be well approximated by $1 - \alpha$. As before, sort $|Y_1|, |Y_2|, \ldots, |Y_n|$ in ascending order into $W_1 \leq W_2 \leq \cdots \leq W_n$. Then let
$W = (W_1, W_2, \ldots, W_n)$
and write $(W_i, W_{i+1}, \ldots, W_j)$, for any $1 \leq i \leq j \leq n$, as $W_i^j$, and $(W_1, W_2, \ldots, W_j)$ simply as $W^j$. Pick a $T$ such that $T \geq T^*$, $W_T > 0$, and $W_{T+1} > W_T$. Let
$\Delta_T \triangleq |\{i : W_i = W_{T+1}\}|$.
Compute $\lambda^+(W^T)$ and $\lambda^-(W^T)$ as in (52) and (53), respectively, with $y_c = W_T$, replacing $V_i$ with $W_i$. Compute $\lambda(W^{T+\Delta_T})$ as in (50) with $y_c = W_{T+\Delta_T}$, again replacing $V_i$ with $W_i$. In view of the derivations from (50) to (54), $W^T$ and $W_{T+1}^{T+\Delta_T}$ would be deemed to come from the same Laplacian model if $\lambda(W^{T+\Delta_T}) \in [\lambda^-(W^T), \lambda^+(W^T)]$, and from different models otherwise. Using this criterion, one can then grow each segment recursively by adding to it each sample immediately adjacent to it, until that sample and the segment are deemed to come from different models. This is the underlying idea behind the greedy method described as Algorithm 6 (
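The segment-growing test can be sketched as follows for the first segment (starting at the origin). It uses the same assumed truncated-Laplacian estimates: the ML scale matches the mean $\lambda - y_c/(e^{y_c/\lambda} - 1)$ to the sample mean, and $\lambda^{\pm}$ come from a plug-in CLT interval on that mean. The starting length t_min stands in for $T^*$, whose definition is not reproduced here. All names are hypothetical, and the restart on the remaining samples for later segments (Algorithm 7) is not shown.

```python
import bisect
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence

def trunc_lap_mean(lam: float, yc: float) -> float:
    """Mean of a Laplacian magnitude with scale lam truncated to [0, yc]."""
    t = yc / lam
    return lam if t > 700.0 else lam - yc / math.expm1(t)

def solve_trunc_lap(C: float, yc: float) -> float:
    """Scale matching the sample mean C (0 if C <= 0, inf if C >= yc/2)."""
    if C <= 0:
        return 0.0
    if C >= yc / 2:
        return math.inf
    lo, hi = yc * 1e-12, yc
    while trunc_lap_mean(hi, yc) < C:
        hi *= 2.0
        if hi > yc * 1e12:
            return math.inf
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        lo, hi = (mid, hi) if trunc_lap_mean(mid, yc) < C else (lo, mid)
    return math.sqrt(lo * hi)

def grow_first_segment(w: Sequence[float], t_min: int, alpha: float = 0.05) -> int:
    """Greedy length T of the first segment of ascending magnitudes w."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n = len(w)
    T = max(t_min, 2)
    while T < n and (w[T - 1] <= 0 or w[T] == w[T - 1]):
        T += 1                                  # need W_T > 0 and W_{T+1} > W_T (1-based)
    while T < n:
        dT = bisect.bisect_right(w, w[T]) - T   # Delta_T: copies of W_{T+1}
        head = w[:T]
        c, half = mean(head), z * stdev(head) / math.sqrt(T)
        lam_minus = solve_trunc_lap(c - half, w[T - 1])  # y_c = W_T
        lam_plus = solve_trunc_lap(c + half, w[T - 1])
        lam_next = solve_trunc_lap(mean(w[:T + dT]), w[T + dT - 1])
        if not lam_minus <= lam_next <= lam_plus:
            break                               # next samples look like a different model
        T += dT
    return T
```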
Denote the value of $T$ at the end of Algorithm 6 in response to $W$ as $T(W)$. After the first segment with length $T_1 = T(W)$ is identified and the values of yc
(WT
with ys=yc
T2=T(WT
yc=yc+Yc(WT
b2=B(WT
λ2=Λ(WT
This procedure can be repeated until no samples remain, yielding a greedy method described as Algorithm 7 (
Let J denote the value of j at the end of Algorithm 7 in response to $Y_1^n$. Then yc
11 Discrete Multiple Segment Transparent Composite Model
Though DCT in theory provides a mapping from a real-valued space to another real-valued space and generates continuous DCT coefficients, in practice (particularly in lossy image and video coding), DCT is often designed and implemented as a mapping from an integer-valued space (e.g., 8-bit pixels) to another integer-valued space and gives rise to integer DCT coefficients (e.g., 12-bit DCT coefficients in H.264). In addition, since most images and video are stored in a compressed format such as JPEG, H.264, etc., for applications (e.g., image enhancement, image retrieval, image annotation, etc.) based on compressed images and video, DCT coefficients are available only in their quantized values. Therefore, it is desirable to further improve the modeling accuracy of GMTCM for discrete (integer or quantized) DCT coefficients in practice by considering the discrete counterpart of continuous MTCMs, i.e., discrete MTCMs.
The particular discrete MTCM we shall consider and analyze in this section is the one where each segment is modeled by a truncated geometric distribution. The resulting discrete MTCM is broadly referred to as MGTCM in general and as BGTCM in the special case of two segments. To provide a unified treatment for both integer and quantized DCT coefficients, we introduce a quantization step size q. Both integer and quantized DCT coefficients can then be regarded as integers multiplied by a properly chosen step size.
11.1 MGTCM
Consider uniform quantization with dead zone, which is widely used in image and video coding (see, for example, H.264 and HEVC). Mathematically, the output of the uniform quantizer with dead zone Δ and step size q in response to an input X is given by
where q/2≦Δ<q. Assume that the input X is distributed according to the Laplacian distribution. Then the quantized index
is distributed as follows
With q introduced, discrete (integer or quantized) DCT coefficients take values that are integers multiplied by q. (Hereafter, these integers are referred to as DCT indices.) Note that $p_i$ in (55) is essentially a geometric distribution. Modeling each segment by a geometric distribution, we obtain the MGTCM with l+1 segments given by
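The quantizer and index-distribution formulas did not survive in this copy. As a hedged sketch, assuming the common dead-zone form $u = \operatorname{sign}(X)\lfloor (|X| + q - \Delta)/q \rfloor$ and the Laplacian density $e^{-|x|/\lambda}/(2\lambda)$, the index probabilities work out to $p_0 = 1 - e^{-\Delta/\lambda}$ and $p_i = \tfrac{1}{2} e^{-\Delta/\lambda}(1 - e^{-q/\lambda}) e^{-(|i|-1)q/\lambda}$ for $i \neq 0$, which is geometric in $|i|$. The document's own (55) may be parameterized differently, and the function names below are illustrative.

```python
import math


def deadzone_index(x: float, q: float, delta: float) -> int:
    """Dead-zone uniform quantizer index (assumed form, q/2 <= delta < q):
    |x| < delta maps to 0, then one index per step of width q beyond it."""
    mag = math.floor((abs(x) + q - delta) / q)
    return mag if x >= 0 else -mag


def laplace_index_pmf(i: int, q: float, delta: float, lam: float) -> float:
    """P(index = i) when X has the Laplacian density exp(-|x|/lam) / (2*lam)."""
    if i == 0:
        return 1.0 - math.exp(-delta / lam)          # P(|X| < delta)
    k = abs(i)
    # mass of [delta + (k-1)q, delta + kq) on one side of the origin
    return (0.5 * math.exp(-delta / lam) * (1.0 - math.exp(-q / lam))
            * math.exp(-(k - 1) * q / lam))
```

The ratio $p_{i+1}/p_i = e^{-q/\lambda}$ for $i \ge 1$ is what makes each nonzero tail geometric, the property the MGTCM segments build on.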
11.2 ML Estimates of MGTCM Parameters
Let $u^n = u_1, u_2, \ldots, u_n$ be a sequence of DCT indices. Assume that $u^n$ behaves according to the MGTCM defined by (56) with $u_{\max} = \max\{|u_i| : 1 \le i \le n\} \le a$. When the number l+1 of segments is given, the parameters
Let $N_0 = \{j : u_j = 0\}$. For any $1 \le i \le l+1$, let
Given l, the log-likelihood function of $u^n$ according to (56) is then equal to
Given
(
Then it follows from (57) that
and for any 1≦i≦l+1,
Finally, the ML estimate of
G(
Accordingly, the ML estimates of
λi*=λi(
for any 1≦i≦l+1 with
Given
and by convention, the solution to (60) is equal to 0 if $C \le 0$, and $+\infty$ if $C \ge (K_i - K_{i-1} - 1)/2$. Therefore, the complexity of computing
11.3 Greedy Estimation of l and Other MGTCM Parameters
When the number l+1 of segments in the MGTCM defined by (56) is unknown, it has to be estimated as well along with other parameters
Consider a generic truncated geometric distribution
Let $V^T = (V_1, V_2, \ldots, V_T)$ be a sequence of samples drawn independently according to the generic truncated geometric distribution given by (61). As shown in Algorithm 3 (
and by convention, the solution to (62) is equal to 0 if $C \le 0$, and $+\infty$ if $C \ge (K-1)/2$. In parallel with (52) and (53), let $\lambda^+(V^T)$ and $\lambda^-(V^T)$ be, respectively, the unique solutions to
Then (54) remains valid. Note that $b_0 = |N_0|/n$. Let $\hat{n} = n - |N_0|$. Sort $|u_i|$, $i \notin N_0$, in ascending order into $W_1 \le W_2 \le \cdots \le W_{\hat{n}}$, and let
$W = (W_1, W_2, \ldots, W_{\hat{n}})$.
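In code, the quantities $b_0$, $\hat{n}$ and W defined above can be computed directly from the DCT indices. A minimal sketch; `split_zeros` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def split_zeros(u: Sequence[int]) -> Tuple[float, int, List[int]]:
    """Return (b0, n_hat, W): zero fraction, nonzero count, sorted nonzero magnitudes."""
    n = len(u)
    w = sorted(abs(x) for x in u if x != 0)   # W_1 <= W_2 <= ... <= W_{n_hat}
    n_hat = len(w)                            # n_hat = n - |N_0|
    b0 = (n - n_hat) / n                      # b_0 = |N_0| / n
    return b0, n_hat, w
```

For example, `split_zeros([0, 3, -1, 0, 2, -3])` gives $b_0 = 1/3$, $\hat{n} = 4$ and $W = (1, 2, 3, 3)$.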
Then a greedy algorithm similar to Algorithm 6 can be used to estimate $K_1$, $b_1$, and $\lambda_1$ from W; it is described in detail in Algorithm 8 (
Denote the value of T at the end of Algorithm 8 in response to W as T(W). Applying Algorithm 8 repeatedly to the translated remaining samples until none remain yields a greedy method described as Algorithm 9 (
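The repeated application can be sketched as a driver loop. Here `segment_length` is a hypothetical stand-in for $T(\cdot)$ returned by Algorithm 8, whose steps are not reproduced in this section. Two further choices are our assumptions: the remaining samples are translated by the last value of the segment just found, and samples equal to that value are kept in the same segment, since segment i covers the range $(K_{i-1}, K_i]$.

```python
from typing import Callable, List, Sequence


def greedy_boundaries(w: Sequence[int],
                      segment_length: Callable[[List[int]], int]) -> List[int]:
    """Split sorted nonzero magnitudes W into consecutive segments; return K_1, K_2, ..."""
    boundaries: List[int] = []
    offset = 0                                # K_{i-1} in original (untranslated) units
    rest = list(w)
    while rest:
        t = max(1, min(segment_length(rest), len(rest)))
        # samples equal to the boundary value must stay in this segment
        while t < len(rest) and rest[t] == rest[t - 1]:
            t += 1
        top = rest[t - 1]                     # boundary in translated units
        offset += top
        boundaries.append(offset)             # K_i = K_{i-1} + top
        rest = [x - top for x in rest[t:]]    # translate the remaining samples
    return boundaries
```

With a stand-in that always proposes two samples, `[1, 1, 2, 3, 5, 5, 8]` is split at the boundaries 1, 3, 5 and 8.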
In a practical implementation of Algorithms 8 and 9, the step of sorting $|u_i|$ can be avoided; instead, one can equivalently collect the data histogram $\{h_j, j = 0, 1, \ldots, a\}$ from $u^n$. Since solutions to (62) to (64) can be computed efficiently by the exponentially fast convergent Algorithm 3 (
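The histogram route and the per-segment estimates can be sketched together. Assumptions, labeled as such: (61) is taken as $p(j) \propto \theta^j$ on $j = 0, \ldots, K-1$ with $\theta = e^{-1/\lambda}$, which agrees with the convention that the solution is $+\infty$ once $C \ge (K-1)/2$ (the mean of the uniform limit); segment i covers magnitudes $K_{i-1}+1, \ldots, K_i$; $b_i$ is the fraction of all n samples falling in segment i; and bisection on θ replaces Algorithm 3. All names are hypothetical.

```python
import math
from itertools import accumulate
from typing import List, Sequence, Tuple


class MagnitudeHistogram:
    """Histogram h_j of |u_i| for j = 0..a, with prefix sums for O(1) range queries."""

    def __init__(self, u: Sequence[int], a: int) -> None:
        self.h: List[int] = [0] * (a + 1)
        for x in u:
            self.h[abs(x)] += 1                      # requires |u_i| <= a
        # cnt[j] / tot[j]: number / sum of magnitudes strictly below j
        self.cnt = [0] + list(accumulate(self.h))
        self.tot = [0] + list(accumulate(j * c for j, c in enumerate(self.h)))

    def range_stats(self, lo: int, hi: int) -> Tuple[int, int]:
        """Count and sum of |u_i| over lo <= |u_i| <= hi."""
        return self.cnt[hi + 1] - self.cnt[lo], self.tot[hi + 1] - self.tot[lo]


def trunc_geom_mean(theta: float, k: int) -> float:
    """Mean of p(j) proportional to theta**j on j = 0..k-1 (assumed form of (61))."""
    weights = [theta ** j for j in range(k)]
    return sum(j * w for j, w in enumerate(weights)) / sum(weights)


def trunc_geom_ml(c: float, k: int) -> float:
    """lambda with theta = exp(-1/lambda) whose mean equals c; bisection on theta."""
    if c <= 0.0:
        return 0.0                                   # boundary convention
    if c >= (k - 1) / 2.0:
        return math.inf                              # uniform limit
    lo, hi = 0.0, 1.0                                # the mean increases with theta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if trunc_geom_mean(mid, k) < c:
            lo = mid
        else:
            hi = mid
    return -1.0 / math.log(0.5 * (lo + hi))


def segment_estimates(hist: MagnitudeHistogram, k_prev: int, k_cur: int) -> Tuple[float, float]:
    """(b_i, lambda_i) for the segment of magnitudes k_prev+1..k_cur."""
    n = hist.cnt[-1]
    count, total = hist.range_stats(k_prev + 1, k_cur)
    if count == 0:
        return 0.0, 0.0                              # empty segment: convention of this sketch
    c = total / count - (k_prev + 1)                 # sample mean shifted to start at 0
    return count / n, trunc_geom_ml(c, k_cur - k_prev)
```

Each count and sum query costs O(1) after one pass over $u^n$, so candidate boundaries can be tried without re-sorting.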
12 Experimental Results on Tests of Modeling Accuracy
This section presents experimental results on (1) applying BLTCM to model continuous DCT coefficients, compared with the GG model and LPTCM from the non-multiple TCM described above, and (2) applying MGTCM and BGTCM to model DCT indices, compared with the GG model, the Laplacian model, and GMTCM from the non-multiple TCM described above. Since DCT coefficients in real-world applications are often quantized or integer-valued (e.g., an integer approximation of the DCT is used in H.264 and HEVC), this section focuses mostly on the modeling performance of the discrete models MGTCM and BGTCM.
12.1 Tests of Modeling Accuracy
Two criteria are again applied in this section to test the modeling accuracy of the developed models and to compare them with other models in the literature. The first is the $\chi^2$ statistic, defined as

$$\chi^2 = \sum_{i=1}^{l} \frac{(n_i - n q_i)^2}{n q_i},$$
where l is the number of intervals into which the sample space is partitioned, n is the total number of samples, $n_i$ denotes the number of samples falling into the ith interval, and $q_i$ is the probability, estimated by the underlying theoretical model, that a sample falls into the ith interval. The second criterion is the Kullback-Leibler (KL) divergence, defined as

$$D(p \| q) = \sum_{i=1}^{l} p_i \ln \frac{p_i}{q_i},$$
where l is the alphabet size of the discrete source, $p_i$ is the probability observed from the data, and $q_i$ is the probability given by the model. Terms with $p_i = 0$ are handled by the convention $0 \ln 0 = 0$.
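Both criteria are straightforward to compute. A minimal sketch, assuming the standard Pearson form for $\chi^2$ and natural logarithms for the KL divergence; the model probabilities $q_i$ must be positive, and the function names are illustrative.

```python
import math
from typing import Sequence


def chi_square(counts: Sequence[int], q: Sequence[float]) -> float:
    """Pearson chi-square of observed interval counts n_i against model probabilities q_i."""
    n = sum(counts)
    return sum((ni - n * qi) ** 2 / (n * qi) for ni, qi in zip(counts, q))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """sum_i p_i ln(p_i / q_i); terms with p_i = 0 are skipped (0 ln 0 = 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For example, 60 and 40 samples observed against a fair two-interval model give $\chi^2 = 4$.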
When two models are compared, a factor $w_d$ is calculated as the percentage of tested AC positions (DCT frequencies) at which one model, rather than the other, has the smaller KL divergence from the data distribution. Another factor wχ
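The factor $w_d$ reduces to counting wins across the tested positions. A sketch; treating ties as no win is our assumption, and `win_percentage` is a hypothetical name.

```python
from typing import Sequence


def win_percentage(div_a: Sequence[float], div_b: Sequence[float]) -> float:
    """Percentage of tested positions where model A has strictly smaller divergence than B."""
    if len(div_a) != len(div_b) or not div_a:
        raise ValueError("need two equal-length, non-empty lists")
    wins = sum(1 for a, b in zip(div_a, div_b) if a < b)
    return 100.0 * wins / len(div_a)
```

Passing per-frequency KL divergences gives $w_d$; the same function applied to per-frequency $\chi^2$ values gives the corresponding $\chi^2$-based factor.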
To illustrate the improvement of BGTCM over GMTCM in modeling low-frequency DCT coefficients, experimental results are collected for low-frequency coefficients only. Specifically, a zig-zag scan is performed, and only the first 15 AC coefficients along the scanning order are used to test modeling accuracy.
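For reference, the zig-zag scan positions can be generated as follows. This sketch assumes 8×8 blocks and the JPEG zig-zag convention, with (row, column) indexing and the scan stepping first to (0, 1); the first 15 AC positions are entries 1 to 15 of the scan.

```python
from typing import List, Tuple


def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n block in JPEG-style zig-zag order."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):                    # anti-diagonal: row + col == s
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # even diagonals run bottom-left to top-right, odd ones the other way
        seq = reversed(rows) if s % 2 == 0 else rows
        order.extend((r, s - r) for r in seq)
    return order


def first_ac_positions(count: int = 15, n: int = 8) -> List[Tuple[int, int]]:
    """First `count` AC positions along the zig-zag scan (DC at index 0 skipped)."""
    return zigzag_order(n)[1:count + 1]
```

In raster order the first 16 scan entries are 0, 1, 8, 16, 9, 2, 3, 10, 17, 24, 32, 25, 18, 11, 4, 5, matching the standard JPEG scan.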
Three sets of test images, the same as those used in Section 5 above, were deliberately selected to cover a variety of image content. The first set, as shown in
12.2 Overall Comparisons for Each Image
For modeling continuous DCT coefficients, experiments were conducted to compare BLTCM, LPTCM, and the GG model overall. For modeling DCT indices, comparative experiments were conducted for two pairs of models: BGTCM versus the GG model, and BGTCM versus MGTCM. For the other pairs of models, the outcome of the overall comparison can be inferred without experimental data; for example, BGTCM always outperforms GMTCM, and GMTCM always has better modeling accuracy than the Laplacian model.
Table 10 (
Table 12 (
Table 14 (
12.3 Comparisons of Modeling Accuracy for Individual Frequencies
While Tables 12-14 show comparative results for each image over all frequencies, it is also of interest to compare the models at individual frequencies. Due to space limits, results for only four images are shown in
As
13 Conclusions to MTCM
Motivated by the need to improve modeling accuracy, especially for low-frequency DCT coefficients, while retaining simplicity and practicality similar to those of the Laplacian model, the second part of this disclosure has extended the transparent composite model (TCM) concept disclosed in the first part (i.e., sections 1 to 7) by further separating DCT coefficients into multiple segments and modeling each segment by a different parametric distribution, such as a truncated Laplacian or geometric distribution, yielding a model dubbed a multiple segment TCM (MTCM). For bi-segment TCMs, an efficient online algorithm has been developed for computing the maximum likelihood (ML) estimates of their parameters. For general MTCMs based on truncated Laplacian and geometric distributions (referred to as MLTCM and MGTCM, respectively), a greedy algorithm has further been presented for determining a desired number of segments and for estimating the other corresponding MTCM parameters. It has been shown that (1) the bi-segment TCM based on the truncated Laplacian distribution (BLTCM) and the MLTCM derived by the greedy algorithm offer the best modeling accuracy for continuous DCT coefficients while having simplicity and practicality similar to those of the Laplacian model; and (2) the bi-segment TCM based on the truncated geometric distribution (BGTCM) and the MGTCM derived by the greedy algorithm offer the best modeling accuracy for discrete DCT coefficients while having simplicity and practicality similar to those of the geometric distribution. This makes them desirable choices in real-world applications for modeling continuous and discrete DCT coefficients (or other similar types of data), respectively.
In accordance with an example embodiment, there is provided a non-transitory computer-readable medium containing instructions executable by a processor for performing any or all of the described methods.
In any or all of the described methods, the boxes or algorithm lines may represent events, steps, functions, processes, modules, state-based operations, etc. While some of the above examples have been described as occurring in a particular order, it will be appreciated by persons skilled in the art that some of the steps or processes may be performed in a different order provided that the result of the changed order of any given step will not prevent or impair the occurrence of subsequent steps. Furthermore, some of the messages or steps described above may be removed or combined in other embodiments, and some of the messages or steps described above may be separated into a number of sub-messages or sub-steps in other embodiments. Even further, some or all of the steps may be repeated, as necessary. Elements described as methods or steps similarly apply to systems or subcomponents, and vice-versa. Reference to such words as “sending” or “receiving” could be interchanged depending on the perspective of the particular device.
While some example embodiments have been described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that some example embodiments are also directed to the various components for performing at least some of the aspects and features of the described processes, be it by way of hardware components, software or any combination of the two, or in any other manner. Moreover, some example embodiments are also directed to a pre-recorded storage device or other similar computer-readable medium including program instructions stored thereon for performing the processes described herein. The computer-readable medium includes any non-transient storage medium, such as RAM, ROM, flash memory, compact discs, USB sticks, DVDs, HD-DVDs, or any other such computer-readable memory devices.
Although not specifically illustrated, it will be understood that the devices described herein include one or more processors and associated memory. The memory may include one or more application programs, modules, or other programming constructs containing computer-executable instructions that, when executed by the one or more processors, implement the methods or processes described herein.
The various embodiments presented above are merely examples and are in no way meant to limit the scope of this disclosure. Variations of the innovations described herein will be apparent to persons of ordinary skill in the art, such variations being within the intended scope of the present disclosure. In particular, features from one or more of the above-described embodiments may be selected to create alternative embodiments comprised of a sub-combination of features which may not be explicitly described above. In addition, features from one or more of the above-described embodiments may be selected and combined to create alternative embodiments comprised of a combination of features which may not be explicitly described above. Features suitable for such combinations and sub-combinations would be readily apparent to persons skilled in the art upon review of the present disclosure as a whole. The subject matter described herein is intended to cover and embrace all suitable changes in technology.
All patent references and publications described or referenced herein are hereby incorporated by reference in their entirety into the Detailed Description of Example Embodiments.
This application claims the benefit of priority to U.S. Provisional Patent Application Ser. No. 61/827,321 filed May 24, 2013 entitled TRANSPARENT COMPOSITE MODEL FOR DCT COEFFICIENTS: DESIGN AND ANALYSIS, the contents of which are hereby incorporated by reference into the Detailed Description of Example Embodiments.
Number | Date | Country
---|---|---
61827321 | May 2013 | US