1. Field of the Invention
The present invention relates to algorithms and more particularly to algorithms useful in image compression.
2. Brief Description of Prior Developments
The number of systems generating multiple images of the same scene at different frequency bands and their applications have rapidly increased in the last several years. Remote sensing systems, thematic mapper sensors, and multispectral Forward Looking Infrared (FLIR) systems are only a few examples of such systems. The data product of a multispectral sensor is a stack of images for each scene referred to as a multispectral image or data cube. Each band or component of a multispectral image can be considered as a monochrome (scalar) image. Multispectral imagery provides signal diversity that can lead to more precise and accurate results than otherwise possible with traditional single spectral imagery. A major limitation of multispectral imaging systems in practice is the required manipulation and delivery of the associated massive data files. In order to store and transmit such files, very efficient compression tools are needed. Because of this need, this disclosure shall focus on a new method for lossy compression. True lossless compression is limited in performance; here, we allow the compression to be lossy but implicitly maintain control over the fidelity of the compressed imagery so we may trade off rate vs. distortion.
To date, many of the approaches for lossy multispectral image compression rely on two stages of data processing assuming that the various bands are perfectly registered. Typically, these methods first apply a 1-Dimensional Karhunen-Loève transform (KLT) along the spectral dimension leading to a set of spectrally uncorrelated eigen-images, and subsequently each of the resultant eigen-planes is compressed using a scalar image compression method such as JPEG or wavelet based compression as is disclosed in J. A. Saghri, A. G. Tescher & J. T. Reagan, “Practical transform codeing of multispectral imagery,” IEEE Signal Processing Magazine, pp. 32–43, January, 1995. The compression rate for each eigen-image is determined through a bit allocation strategy. Other 2-stage compression methods first decorrelate the data by applying a 2D frequency transform along the spatial dimensions and a 1-dimensional KLT is subsequently applied across the spectral dimension as is disclosed in D. Tretter, C. A. Bouman, “Optical transforms for multispectral and multilayer image coding,” IEEE TIP:4:296–308, March, 1995, J. N. Bradley & C. M. Brislawn; “Spectrum analysis of multispectral imagery in conjunction with wavelet/kit data compression,” Proc. ASIMOLAR Conf. 1:26–30, November, 1993; and J. Vaisey, M. Barlaud & M. Anonini, “Multispectral Image Coding Using Lattice VQ and the Wavelet Transform,” Proc. ICIP 98, Chicago, Ill., 1998. Quantization and coding is then used to encode the transform coefficients. As is also reported in “Spectrum analysis of multispectral imagery in conjunction with wavelet/klt data compression,” Proc. ASIMOLAR Conf. 1:26–30, November, 1993; and J. Vaisey, M. Barlaud & M. Anonini, “Multispectral Image Coding Using Lattice VQ and the Wavelet Transform,” Proc. ICIP 98, Chicago, Ill., 1998 at high compression rates (less than 0.2 bit/pixel/image), both methods seem to have the same performance. The two-stage approaches mentioned above, however, have three disadvantages:
The following example illustrates these issues. Consider a vector-valued M1×M2 multispectral image f(m))=[f1(m), f2(m), . . . , fL(m)]T with L bands. m=(m1,m2) represents a pixel's spatial position with m1=1, . . . , M1 and m2=1, . . . , M2, and 1≦k≦L is the spectral band index. The multispectral image is thus composed of M1×M2 f-vectors of length L. The image is then processed by the two-stage compression method described above, where a 1-Dimension KLT is applied along the spectral dimension. Let R=E(f(m)fT(m)) be the spectral correlation matrix and let qi and λi, i=1, . . . , L, be the eigenvectors and eigenvalues of R respectively. The KLT of f(m) along the spectral dimension is g(m)=QTf(m) where Q=[q1, q2, . . . , qL] is the L×L matrix whose columns are the eigenvectors of R. The resultant KLT coefficients form a set of images, {g1(m), g2(m), . . . , gL(M)} referred here to as eigen-images.
An approach to overcome the shortcomings of the two-stage compression algorithm would be to design a coder that takes advantage of the intra-band and cross-band spatial similarities. A 3D (3D) zero-tree coder, for instance, can be designed to jointly exploit the intra-band and cross-band correlations inherent in a data cube as is disclosed in A. Bilgin, G. Zweig & M. W. Marcellin, “Three-dimensional Image Compression using Integer Wavelets,” Appl. Optics, 39#11:1799–1814, April, 2000. This, however, can become very complex since the number bands to be jointly encoded can be quite large. Also, the bands are typically separable in three-dimensions and the zero-tree coder spawns children at a cubic rate having implications for the production of zero-trees. Or three-dimensional data may be processed via some three-dimensional extension of a well-known transform such as the DCT G. Abousleman, M. W. Marcellin & B. Hunt, “Compression of Hyperspectral Imagery using 3-D DCT and Hybrid DPCM/DCT,” IEEE TGRS 33#1:26–34, January, 1995. Here, again, there is the question of complexity and the constraints imposed by block transformation. In reference 6, the authors back off from processing using a 3D DCT in favor of a compromise using 2D DCT and DPCM in the spectral domain.
The present invention is an approach to multispectral image compression where the intra- and cross-band correlations are jointly exploited in a surprisingly simple yet very effective manner. The method is based on the fact that it is more efficient to jointly encode all the components of a vector source than to encode each component separately as is disclosed in A. Segall, “Bit allocation and encoding for vector sources,” IEEE Trans. Inf. Thy. IT-22:162–169, March 1976. The key component of the algorithm is a simple transformation, a bijection mapping of the original multispectral image into a virtual two-dimensional (2D) scalar image. In this research, we combine the bijection mapping with a very efficient scalar image-coding algorithm, the SPIHT codec as is disclosed in A. Bilgin, G. Zweig & M. W. Marcellin, “Three-dimensional Image Compression using Integer Wavelets,” Appl. Optics, 39#11:1799–1814, April, 2000. The SPIHT codec need not be modified for 3D operation. The choice of the bijection mapping will determine the way in which spatial and spectral information are combined: there is no intrinsic limit in number of bands processed at a time nor in the way spectral and spatial pixels are combined. Combining the bijection mapping with the SPIHT codec gives a no frills, low complexity method for processing multispectral data. This may be improved by selecting the optimal bijection mapping which is equivalent to the problem of finding a permutation map which reduces activity across pixels to effect pixel prediction. This latter optimization problem has been investigated largely in the context of lossless compression, as is disclosed in S. Tate, “Band Ordering in Lossless Compression of Multispectral Images,” IEEE TCOM 46#4:477–483, April, 1997; J. Wang, K. Zhang & S. Tang, “Spectral and Saptial Decorrelation of Landsat-M Data for Lossless Compression,” IEEE GRS 33#5:1277–1285, September, 1995; N. Sidiropoulos, M. Pattichis, A. Bovik & J. Havlicek, “COPERM: Transform-Domain Energy Compaction by Optimal Permutation,” IEEE TSP 47#6:1679–1688, June, 1999; and Z. Arnavut, “Applications for inversions to lossless image compression,”, Opt. Eng. 36#4:1028–1034, April, 1997. Here we shall propose another optimization approach with the goal of optimizing lossy compression. Activity from pixel to pixel is reduced for the purpose of introducing zero-trees into the coded stream thereby increasing compression without increasing distortion. By optimally mapping the multispectral image set into a single 2D-array and by subsequently applying a 2D-decomposition, the spatial correlation as well as the spectral correlation are jointly exploited. In this manner, the self-similarities across the different scales of the virtual image are exploited. Based on the statistical characteristics of the multispectral data, the bijection mapping can be optimized to minimize the distortion introduced by the compression algorithm. The optimization proposed here reduces to the maximization of a function that depends on the second-order statistics of the multispectral data.
The present invention is further described with reference to the accompanying drawings wherein:
a) and 4(b) are flow charts, wherein
a) and (b) are schematic drawings showing two different bijection mappings of the bispectral FLIR image;
a)–6(d) are graphs of lines of images ouputted by bijection mapping in
a)–8(d) show the second band of the reconstructed images at 0.08 bit/pixel/band wherein
a)–10(d) show the first band of the non-training image at 0.1 bit/pixel/band, wherein
a)–11(d) show band VH of the polarimetric SAR data at 0.4 bit/pixel/band, wherein
1.1 Bijection Mappings
A bijection mapping is a one-to-one onto function that maps set A to set B. The set A is called the domain of φ(x) and the set of all elements φ(x)εB is called the range of φ(x). The one-to-one correspondence between an element in A and an element in B assures that the function is invertible. That is, if εB, φ−1(y) is the set of all xεA such that φ(x)=y. φ−1(x) is called the inverse of the function φ(x). In our application, a special type of bijection mapping is used. Let RM
where f(m1,m2)=[f1(m1,m2),f2(m1,m2) . . . fL(m1,m2)]T is an L-component vector and where each element of f(m1,m2) is defined over the set of real numbers, i.e. fk(m1,m2)εR. A M1×M2 multispectral image with L bands is an example of an element in RM
B:RM
m1=1, 2, . . . ,M1; m2=1, 2, . . . , M2; k=1, 2, . . . , L; i=1, 2, . . . , L1M1, j=1, 2, . . . , L2M2
where GεRL
B−1:RL
In summary, RM
1.2 Bijection Mappings onto Zero-Trees
For an array fεRM
and Ell is the energy compacted in the approximation of the wavelet decomposition. In compression, higher Ec generally translates into smaller distortion in the reconstructed data. The answer to the first question reduces to how much computational complexity can we tolerate? If N1>Ni, i=1,2, and letting {B(N1,N2)} be the set of all possible bijection mappings of a N1×N2 cell, then {B(N1, N2)}⊂{B(N′1, N′2). Hence, we will always do equal or better as N1 and N2 are incremented. The real issue is, how much gain does one attain as Ni is incremented to Ni+1, Ni+2, because the size of the search space increases as (LN1N2)!. It is conceivable that an optimum may be achieved for small values of N1, N2 in which case picking a larger cell adds computation. As N1 is incremented to N1+1, an additional (LN2) possible bijection mapping are added to the search space. A related and perhaps more important question is: if only one parameter is incremented, either N1 or N2, which one should be chosen. In summary, the performance gain as the size of N1 and N2 are monotonically increased, is somewhat reminiscent to the order of an AR model. In fact, some of the same issues that arise on selecting the order of an AR model also arise here such as overfitting, complexity, etc. The second question is perhaps the most important since it is directly related to energy compaction. Having fixed N1 and N2 which one of the (LN1N2)! bijection mappings will maximize the energy compaction? Answers to these questions will be given in Section 3. Table 1 shows through a simple numerical example the performance gain that is achieved using bijection mapping.
The approach to the design of the image partition size N1×N2 is through heuristics, although other more elaborated approaches can be used. In the heuristic approach, the type of the 2D transform used in the encoding stage determines the shape of the partition cell. To decorrelate the virtual image, SPIHT uses a two-dimensional discrete wavelet transform that exploits the correlation by filtering the image first along the rows and then along the columns. This fact provides a reliable guideline defining N1 and N2 as well as the shape of the two-dimensional array, i.e. L1 and L2, inherent in the bijection mapping. We have found experimentally for the data to be considered that the correlation existing in a 1×N2×L cell is better captured by the wavelet decomposition if each 1×N2×L cell is mapped into a 1×N2L array. Thus, N2 consecutive vector-pixels in the same row are mapped into a 1×N2L array. This leads to a bijection mapping B:R1×N
M1×M2 multispectral image f(m) with L bands reduces to B:RM
where
where f(m1,m2) is the vector-valued pixel at (m1,m2), μ=[μ1, μ2, . . . , μL]T and σ=diag(σ1, σ2, . . . , σL) normalize the data set, where σk and μk are chosen such that the resultant normalized spectral components have the same dynamic range. It is shown in Table 2 that σk and μk are given respectively by:
where cov(x,y), var(x) and
is the image obtained by averaging the different spectrum components. Note that in the above equations the spatial index m=(m1,m2) has been dropped for simplicity.
P in (5) is an LN2×LN2 matrix that contains exactly one 1 in each row and column. P is referred here to as a permutation matrix since it defines the re-ordering of the pixels into the array G.
where IN is an N×N identity matrix, 0N is an N×N all zero matrix and JN is an N×N right-to-left diagonal matrix. Note also that
gives re-ordering of the shaded samples into the two-dimensional array. Each permutation matrix defines a different re-ordering and therefore a different bijection mapping. The permutation matrix P defining the re-ordering of the samples thus characterizes a bijection mapping. A fundamental problem is thus the design of P that leads to the highest compression attainable. For a 1×N2×L cell, there exists (LN2)! different re-orderings of the samples inside a cell.
This, consequently, leads to (LN2)! different M1×LM2 virtual images. Some of them are more suitable for compression than others. We expect those mappings with the pixels arranged to maximize low frequency correlation to be the best for compression. The question that naturally arises is which one of the (LN2)! bijection mappings generates virtual images that are most suitable for compression. Under the energy compaction viewpoint, the optimal bijection mapping is the one that compacts the most energy onto the low frequency components in the transform domain. Thus, the optimal bijection mapping, B(P) is selected such that for a fixed wavelet transform, where Ec is the energy compaction
B0=maxpEc(B(P)), (8)
defined in (4). Since the total energy of a multispectral image does not depend on the bijection mapping nor on the wavelet transform used, (8) reduces to maximizing the energy compacted in the low-frequency components of the wavelet domain. If the wavelet transform is applied to an image that is smooth, having most of its energy in the lowpass band of the spectrum, the resultant energy compaction is high. On the other hand, if the energy of an image is concentrated in the highpass band of the spectrum, the wavelet transform will have more of the energy distributed on the high-frequency components of the wavelet domain, resulting in poor energy compaction.
To illustrate this point better,
Let {tilde over (f)} be the vector formed by concatenating the normalized vector-valued pixels in a cell of size 1×N2×L, i.e.
{tilde over (f)}=[σf(m1,m2)T+μT,σf(m1,m2+1)T+μT, . . . ,σf(m1,m2+N2)T+μT]T.
Furthermore, let g=P{tilde over (f)} be a re-ordered vector of {tilde over (f)} produced by the permutation matrix P. We want to find the best re-ordering of the components of {tilde over (f)} such that the sum of correlations between consecutive components of the re-ordered vector is maximized. Let ri,j be the correlation between the i th and j th components of gp. Since {tilde over (f)} is formed by concatenating N2 consecutive normalized vector-valued pixels, ri,j represents the spectral correlation, or the spatial correlation, or the cross-correlation between pixels. The objecting function to be maximized is then given by
where rLN
Since P{tilde over (f)}, the objective function (9) can be rewritten as a function of the permutation matrix P as follows:
maxPεΩ(E({tilde over (f)}TPTAP{tilde over (f)})), (11)
where Ω is the set of all permutation matrix of size LN2. Thus, finding the best re-ordering of the components of {tilde over (f)} that produces a smooth virtual image reduces to finding the permutation matrix P that maximizes (11). The optimal permutation matrix P is determined using a heuristic search of the space Ω.
2.1 Sub-Optimal Bijection Mapping
Searching for the optimal solution of (11) can be time consuming since the search space is quite large. In fact, equation (11) has to be computed for each element in the search space Ω, and the permutation matrix that yields the maximum value is chosen as the optimal solution. This can become computational expensive since the search space increases as (LN2)!. To overcome this limitation, we propose a sub-optimal bijection mapping that has been experimentally proved to yield good results. This sub-optimal bijection mapping is described next.
Suppose that we have a 1×N2×L 3D-cell to be mapped into a 1×N2L array. Without loss of generality, let us assume that L=4 and N2 is an even number. Denote the four normalized bands as A, B, C and D, and the pixels belonging to those bands as ai, bi, ci and di respectively where i denotes the spatial position of the pixel in the cell. Let ra, rb, rc and rd be the spatial correlations between two consecutive pixels in the bands A, B, C and D respectively. Furthermore, let rX,Y be the spectral correlation between the band X and Y where X □{A, B, C, D} and Y □{A, B, C, D}. The suboptimal bijection mapping follows the four-step procedure:
Repeat the previous step up to all the bands are multiplexed into a single band. Figure [REF:fig:bijection](a) shows, for this example, the sub-optimal bijection mapping for N2=2. This four-step procedure is repeated for each 3D cell and the resultant 2D array is tiled in the way described above for the optimal solution. Note that the key in the sub-optimal bijection mapping is to keep, as close as possible, those pixels that have a high correlation. Moreover, it takes into account both the spectral and spatial correlations in defining the re-ordering of the samples. For both optimal and sub-optimal solutions, N2 has to be set to a small number (between 2 and 6). The reason for doing so lies in the fact that the spatial correlation decays rapidly as the distance between pixels increases.
The proposed compression algorithm was applied to 4-band airborne multispectral data collected using the AISA Airborne Imaging Spectrometer. Each band is a monochrome 8 bit image of size 384×700. A 256×256 pixel area in the upper left part of each band is used as training data to compute the correlation matrix needed for the optimization algorithm. The optimal bijection mapping, B0, is determined following the algorithm described in Section 3. The data is partitioned into 1×2×4 m cells. The bijection mapping B0:R1×2×4→R1×8 is then applied to each cell to form the virtual image that is compressed by the SPIHT coder at a compression rate from 0.05 bit/pixel/band to 1 bit/pixel/band. The sub-optimal bijection mapping is also determined and used here for comparison.
In order to illustrate the efficiency of the proposed compression algorithm, we include the comparison of this approach with two additional compression algorithms. The first algorithm compresses each spectral band separately using the SPIHT coder, and thus only takes advantage of the spatial redundancy in the image set. We use this compression algorithm, referred here to as the separable approach, as a benchmark to see the advantage of exploiting all the redundancy existing in a multispectral data. The second algorithm uses the two-stage approach described in the introduction. That is, the KLT is computed along the spectral dimension to obtain the 4 eigen-planes. The SPIHT coder is then used to compress each eigen-plane at a compression rate that is proportional to the logarithm of the energy of the eigen-plane. In order to compute the covariance matrix needed for the KLT, the same 256×256 area used in the proposed algorithm is used here as training data. We use the Peak Signal-to-Noise Ration (PSNR) as a distortion measure of the quality of the reconstructed image. For 8-bit images of size M1×M2, PSNR is defined as:
where xi,j is the pixel at the i th row and j th column in the image and {circumflex over (x)}i,j is its corresponding reconstructed pixel. PSNR is separately computed for each band.
Note that for high compression rates (less than 0.25 bit/pixel/band), the proposed compression algorithm outperforms the two-stage compression algorithm. Note also that the sub-optimal bijection mapping yields similar results than those found with the optimal bijection mapping.
In order to see the performance of the various compression approaches acting over unobserved data, a section of the four-band multispectral data that was not used as training data is compressed using the different algorithms. The same optimal and sub-optimal bijection mappings determined using the training data are used in the compression of the unobserved data. Similarly, the same eigenvectors of the covariance matrix determined using the training data are used to compute the KLT of the unobserved data for the two-stage compression algorithm.
As a third example, we tested the algorithms on a four-component polarimetric Synthetic Aperture Radar (SAR) data which contains signal-dependent multiplicative noisei. The SAR data was collected by the EROS data center. 11(a) shows the Vertical-Horizontal (VH) polarization of the test data.
It will be understood by those skilled in the art that we have described a compression method for multispectral images that jointly exploits the spectral and spatial correlation inherent in a multispectral image in a simple yet efficient way. The proposed method uses a bijection mapping to project the 3D data cube into a two-dimensional array. A scalar image compression coder then compresses the obtained two-dimensional image. An optimization algorithm is developed to find the optimal parameters of the bijection mapping that leads to the most suitable virtual image for compression. The performance of the proposed compression algorithm was compared with those found using the separable approach and the traditional two-stage approach. In general, at very high compression rates, the bijection compression approach yields superior results, both perceptually and in the PSNR distortion measure.
While the present invention has been described in connection with the preferred embodiments of the various figures, it is to be understood that other similar embodiments may be used or modifications and additions may be made to the described embodiment for performing the same function of the present invention without deviating therefrom. Therefore, the present invention should not be limited to any single embodiment, but rather construed in breadth and scope in accordance with the recitation of the appended claims.
Assuming a simple mathematical model for the spatial and spectral correlations, it is shown here that energy compaction can be increased by combining multispectral planes. While the model does not accurately model the correlations of multispectral imagery, it does indicate how the bijection map can improve compression. Consider a bispectral sensor measuring the reflectance of the same scene at two different frequencies. Assume that the received signals are stationary and have been sampled at the same rate. Let f(n)=[f1(n), f2(n)]T n=1, . . . N, be the bispectral image where the one-dimensional spatial indexing variable “n” is used to denote pixel location for notational simplicity. For a given spectral location, the signal forms the vector fi=[fi(1), fi(2), . . . , fi(N)]T for i=1,2. The spatial correlation matrices for each spectral signal is given by
where a simple spatial correlation model is assumed for mathematical tractability. The correlation structure of the autocorrelation matrix may not be particularly accurate for multispectral imagery; however, the assumption of the more or less the same autocorrelation is a valid assumption as the multispectral images will be, more or less, similar. Since both f1 and f2 view the same scene, cross-spectral correlation between them exists. For mathematical tractability, assume that the cross-spectral correlation C=E(f1f2T) is given by
where ρ0>ρ is assumed. Again, the particular correlations may not be reasonable, but the cross-correlation structure is described exactly for this simple case i=1,2. Now, consider the two encoding strategies: (a) each signal is coded individually and (b) a virtual signal is formed by a bijection mapping followed by scalar encoding. For simplicity, the energy preserved in the reconstructed signals after encoding is used as a performance measure. Recall that a signal's energy is given by the sum of the eigenvalues of its correlation matrix. The eigenvalues of Ri, i=1,2, can be computed as:
Note that if each signal is individually compressed at a rate 1:N, the maximum energy preserved in each signal is given by the highest eigenvalue, namely, λ1=r0+Nr. Now, consider the case where both signals are mapped into one larger virtual signal, g=[f1T, f2T]T. The correlation matrix of g is given by
whose eigenvalues are
For a compression rate of 1:N, the maximum energy preserved is determined by the two highest eigenvalues, namely:
From (4) and (7), the compression of the virtual image always provides higher energy preservation suggesting the potential the advantage of bijection mappings.
The normalization factors given in Equation (6) are derived here. Let fk be k-th spectral component of a multispectral image. For notational simplicity, the pixel's spatial position index has been dropped. We want to normalize this component such that the resultant normalized component, denoted by {tilde over (f)}k, has the same dynamic range than the other normalized components. Let
Note that the average is performed for each spatial position m=(m1,m2). We want to normalize fk with respect to
minσk,μkE({tilde over (f)}k−
It can be easily shown that the values of σk and μk that minimize the MSE between the k th normalized component and the spectral average image are given by:
where cov(x,y), var(x) and
This application claims rights under U.S. Patent Application Ser. No. 60/360,457, filed Feb. 28, 2002 under 35 U.S.C. § 19(e).
The U.S. Government may have rights in this application as a result of activities conducted under Government Contract No. DAAL 01-96-2-0002 awarded by the Department of the Army.
Number | Name | Date | Kind |
---|---|---|---|
5737448 | Gardos | Apr 1998 | A |
5963935 | Ozbutun et al. | Oct 1999 | A |
6081800 | Ozbutun et al. | Jun 2000 | A |
6879716 | Ishibashi | Apr 2005 | B1 |
7035457 | Ishibashi | Apr 2006 | B2 |
Number | Date | Country | |
---|---|---|---|
60360457 | Feb 2002 | US |