Not Applicable
Not Applicable
1. Field of Invention
This invention relates to methods for processing image data collected over a body of water to minimize or reduce noise and glints thereby increasing the detection of objects below the waters surface.
2. Description of Related Art
Images acquired through 3-chip cameras are processed in such a way as to enable target detection either with naked eye or with machine. The acquired raw images usually have a large amount of glints, sensor noise, and electronic circuit noise that make the detection difficult. Unfortunately without processing of these images to minimize the noise and glints objects below the waters surface cannot be clearly detected.
Consequently, there is a need in the imaging industry for method of processing data that allow increased detection of objects below the surface of the water.
One aspect of the present invention is a processing method to increase the detection of objects below the surface of the water comprising the steps of:
In another aspect of the present invention a methods is provided that is a processing method to increase the detection of objects below the surface of the water comprising the steps of:
In still another aspect of the present invention a methods is provided that is a processing method to increase the detection of objects below the surface of the water comprising the steps of:
In yet another aspect of the present invention a methods is provided that is a processing method to increase the detection of objects below the surface of the water comprising the steps of:
In another embodiment the processing method may further comprise applying post-processing to enhance the subtracted image components or applying detection algorithm to the post-processed image components.
Unless defined otherwise, all terms used herein have the same meaning as are commonly understood by one of skill in the art to which this invention belongs. All patents, patent applications and publications referred to throughout the disclosure herein are incorporated by reference in their entirety. In the event that there is a plurality of definitions for a term herein, those in this section prevail.
The present invention is a processing method used for the detection of targets below the surface of the water using color images. These processing methods work with either two or three component images. In either case, the processing methods can be classified as optimal or suboptimal. Optimal processing methods use the statistics of the component images while the suboptimal ones use ad hoc parameters.
In optimal processing, the camera sensors produce intensity images with differing intensity values in the red, green, and blue spectral regions depending on the reflective properties of the waters surface, water column, and target. The main idea behind target detection, therefore, is to subtract one component image from another so that the difference image retains only the target since the light penetrates water with differing diffusion depth at the three spectral regions. Preferably the signal from the surface reflection has been completely removed. It should be pointed out that the contrast due to the target between the component images is very small and depends on the penetration depth of light in the spectral regions. Thus, the process methods discriminate a very small contrast from image components. The ability to detect such a miniscule contrast depends on 1) the sensor dynamic range, 2) effectiveness of the preprocessing methods in eliminating glint and surface reflections, and 3) detection algorithm itself. Our processing methods address the third element primarily because the first is a limitation of the sensor and the second can be eliminated with standard methods known to those skilled in the art.
The processing methods of the present invention may be applied to two channel and three channel imaging. The processing methods for two channel imaging may be used to minimize the mean square error, error variance, correlation and/or covariance. To minimize the mean square error a scaled image component is subtracted from another to enhance the low contrast of the target. Consider the red and green channels. Let R(m,n) and G(m,n) be the red and green channel pixels at location(m,n). Then, define the difference pixel eR(m,n) at the same pixel location as
eR(m,n)=R(m,n)−αRG(m,n) Equation 1
We want to determine the scale factor αR so as to minimize the mean square error
The optimal coefficient α*R is obtained by setting the derivative of εR with respect to αR, to zero. This results in the optimal coefficient value as given by
Note that both the numerator and denominator have the same normalizing factor 1/MN and can, therefore, be omitted. The numerator of Equation 2 is a measure of the correlation between the red and green channels.
In a similar manner we can obtain the optimal scaling coefficients for the green-blue and blue-red channels and are given, respectively, by
Minimization of the error variance εR instead of minimizing the mean square error, is given by
where μR is given by
This results in the optimal coefficient, which is expressed as:
The numerator of Equation 6 is a measure of the covariance rather than correlation and the denominator is the variance rather than sum of squares. Following the same argument for the red-green channel, we obtain the optimal coefficients for the green-blue and blue-red channels as:
While the above procedures determine the optimal coefficients by minimizing the mean square error or the error variance, there exists an alternative minimization criterion to compute the optimal weights. This is the minimum correlation between the difference channels. Consider the two error images corresponding to the red and green channels, as given by
eR(m,n)=R(m,n)−αR1G(m,n), eG(m,n)=G(m,n)−αG1B(m,n) Equation 8
Determination of the two coefficients, αR1 and αG1, by minimizing the cross correlation is expressed as
Using the standard procedure for the minimization, the following normal equations are obtained:
Since Equation 11 and Equation 14 are distinct as are the other equations, by adding them α*R1 can be resolved by
In a like manner, the optimal coefficients for the green and blue channels are given by:
These are listed in Table 1.
By minimizing the covariance instead of the correlation between the channels, we obtain another set of optimal coefficients. These are identical in form to the coefficients in Equation 11 and Equation 14 except that each variable is replaced by its mean-removed value. These are listed in the Table 1.
The processing methods for three channel imaging may also be used to minimize the mean square error, error variance, correlation and/or covariance. In the three-channel imaging, all the three channels are considered to generate the difference image. Consequently, Equation 1 is rewritten as:
eR(m,n)=R(m,n)−αRG(m,n)−βRB(m,n) Equation 19
Similarly, the other two difference channels are given by
eG(m,n)=G(m,n)−αGB(m,n)−βGR(m,n) Equation 20
eB(m,n)=B(m,n)−αBR(m,n)−βBG(m,n) Equation 21
Similar to the two-channel case, the minimization of the mean square error gives us the normal equations for the optimal coefficients:
The optimal coefficients will be identical to those in Equation 22 through Equation 24, for the three-channel case, that yield the minimum error variance except that each term must be replaced by its mean removed value. These are listed in Table 2.
The difference channels are as defined in Equation 19 to Equation 21. Minimization of the pair-wise correlations is as follows.
By setting the derivatives of the three terms in Equation 25 with respect to the three coefficients the following equations are obtained.
Since equations (29) and (34) are identical we retain one of them. Equations (27) and (36) involve the same set of variables but are distinct. Therefore, we form a new equation by subtracting (27) from (36) to get
Now equations (29) and (38) can be consolidated as:
In Equation (39), the double subscripts are omitted for want of space. In a similar manner, we arrive at the rest of the equations:
The optimal coefficients for the three-channel minimum correlation case are obtained by solving the normal equations (39) through (41).
For the minimum covariance case the optimal coefficients will be identical to those corresponding to the minimum correlation case, except that each entry in the Equation 39 through Equation 41 must be replaced by its mean-removed value. These are listed in Table 2.
In suboptimal processing there is no cost function involved, the coefficient value is chosen on an ad hoc basis and the resulting mean square error or error variance is not necessarily the minimum. Further, only two components are used in the differencing process.
The difference images are obtained by selecting the weights as the ratio of the mean values of the respective image components involved, as given by:
In, the coefficients are chosen as the ratio of the standard deviations of the component images and the weights are given by:
For a third case we choose the ratio of the maximum component values as the coefficient values. Thus,
The image may be processed in whole or in sub-blocks. The pre-detection processing methods just described can be applied to the whole image or they can be applied to sub-blocks of an image. Images in general and video sequence images in particular, are non-stationary. This implies that the statistics change over regions within an image. To exploit non-stationary of images, it is advantageous to apply the pre-detection processing methods to subimages. As shown in
Quad-tree decomposition is an efficient procedure to decompose an image into sub-blocks. The only requirement is that the image height and width be integer powers of 2. In this procedure one starts with a given image as the root node of a tree and a suitable metric for the whole image is calculated. Next the metrics for the four sub-blocks, each of size ¼th the size of the whole image are computed. A decision is made to split each sub-block sequentially depending on whether that sub-block metric is greater than the metric of its parent node. This process is continued until the specified sub-block size is reached. The minimum number of sub-blocks is 4.
The contrast of the processed component images can be improved by contrast stretching. The contrast regions are automatically selected using the image statistics such as the image histogram.
Once the images are processed we need to draw the boundaries of the targets present in the scene for further data collection such as target classification. This involves first smoothing of the processed images. Smoothing generally tends to blur the image. Hence we use special, non-blurring filtering such as the diffusion filtering to smooth the processed images. Next an edge detection processing method, typically Canny's method, is used followed by morphological processing to calculate properties of the object boundaries.
The acquired raw images usually contain a number of pixels that are saturated. These may occur as isolated single pixels or as clusters of pixels scattered over the image. These clusters of saturated pixels are known as glints. The presence of glints reduces the contrast of areas of interest, especially if the glints occupy a sizeable fraction of the image area. Moreover, glints dominate the statistical quantities of interest due to their very large values. Therefore, images must be preprocessed to remove the presence of glints. Preprocessing is also necessary to reduce image noise. In addition to glints the glare or surface illumination, which is due to the illumination source must be reduced.
Preprocessing images in the wavelet domain is effective in removing glints and noise from the images. It also has the advantage of computational savings. Specifically, applications of two-dimensional Discrete Wavelet Transform (DWT) once to the image decomposes it into one approximation and three detail coefficients, each ¼th the size of the original image. It has been found that biorthogonal wavelet performs better than orthogonal wavelet. The approximation coefficients contain the information in the original image at a lower resolution. Hence glint removal operation is performed on the approximation coefficients. As pointed out, since the approximation component of the DWT coefficients is only ¼th the size of the original image, computational saving is achieved. As the detail coefficients carry edge information in the original image at lower resolutions, image denoising (a term used to imply the removal of noise) is effected in these components. Once glint removal and denoising operations are performed, the image is reconstructed by applying inverse 2D DWT (IDWT) to the DWT coefficients.
The detail components of the 2D DWT of each channel image contain edge information as well as noise. Noise coefficients have amplitudes according to a Laplacian distribution and are usually large while edges have smaller amplitudes. Thus noise is reduced by thresholding the detail coefficients in the wavelet domain. These thresholds are determined based on the statistics of the detail coefficients.
Glints are specular reflections of the light source by the surface being imaged. Therefore, glints produce highly saturated pixels in all the channels whose size depend on various properties of the reflecting surface and it is random. Consequently, the approximation coefficients of the 2D DWT must be inspected. In one approach, glints may be removed or reduced using morphological processing to determine the boundaries of the glint regions, which in turn are determined by hard or soft thresholding. The thresholds are determined from the statistics of the approximation coefficients of the 2D DWT. Thresholding can be performed on each channel or on the dominant channel. Thresholding yields a binary image from which the boundaries of the glints are calculated. Glint regions are eliminated by replacing them with corresponding mean values in all the channels.
The surface and target reflections appear as multiplicative components in the acquired images. In order to selectively filter out the surface reflection from that of the target, image filtering may be performed in the log domain, as logarithm of product of signals equals sum of signal components. This type of filtering is called homomorphic filtering. Since the surface reflection is a lowpass signal while the target reflection is highpass, these two components can be selectively reduced using a suitable homomorphic filter. Thus, a homomorphic filter is a band-pass filter. After homomorphic filtering, the resulting image is smoothed and the pixels are exponentiated to obtain images in the pixel domain.
A. Acquiring an Image
A 24-inch diameter disc was submerged under the surface of the water at a depth of approximately 23 feet. An image is obtained of a desired area of water using C0610004 digital camera (Micro USA, Poway, Calif.), that captures the image as intensity values in the separate red, green and blue spectral regions. These data sets are stored in a portable Personal Computer as data points.
B. Converting Raw Data into Arrays
The raw data intensity values obtained in A above for the red, green and blue spectral regions are processed into arrays. The raw intensity values obtained in A above for the red, green and blue spectral regions are transferred from the camera storage buffer into the computer memory, one for each of the red, green and blue arrays. A variety of software is available that can perform this function, in this example Lab View software (National Instruments Corp., Austin, Tex.) was utilized to transfer the data. For example, the red array transferred to the computer memory is labeled R, the green array G and the blue array B. Each numerical value in each array in the computer memory is called a pixel or picture element and its numerical value can have 3, 4 or 5 decimal digits (e.g. the maximum value can vary from 255 to 65531). A pixel in the red array can be located by its row and column address as R(m,n), where m refers to the row and n to the column. Similarly, G(m,n) refers to the pixel in the green array with row address m and column address n. In a similar fashion, B(m,n) denotes a pixel in the blue array with row address m and column address n.
C. Converting Data Arrays into 2-Dimensional Discrete Wavelet Transform Arrays (2-D DWT Arrays).
Data arrays in the red, green and blue spectral regions may be converted into 2-D DWT arrays in two steps. For example, assume that the number of rows is M and number of columns N for the red array. In the first step each row of the red array is transformed into a corresponding row of DWT coefficients retaining every other point as given in Equations 1a and 1b obtaining two intermediate arrays of size M rows and N/2 columns:
In the second step, each of the arrays in (1a) and (1b) is again considered sequentially one column at a time and the corresponding DWT coefficients are computed retaining every other point as given in (2a), (2b), (2c) and (2d) obtaining a total of four arrays LL, LH, HL and HH of size each M/2 rows and N/2 columns:
This process is repeated on the green and blue arrays obtaining corresponding 2D DWT coefficient arrays similar to the red DWT coefficients.
D. Removing Glints and Electronic Noise.
The 2-D DWT array coefficients are examined one array at a time. For each subarray of DWT coefficients, the standard deviation is computed first as given by Equations (2e) and (2f):
Equations (2e) and (2f) are repeated using the DWT coefficient values corresponding to LH, HL and HH subarrays. The same procedure is used for the green and blue DWT coefficient arrays. For example, if a DWT coefficient value in the LL component of the red DWT coefficient array exceeds a value equal to three times the standard deviation of the LL coefficient array, then that coefficient value is replaced by a value equal to the average of coefficients within a 3×3 neighborhood of the coefficient being considered. This is repeated until all coefficients in the red LL DWT coefficient array are considered. The procedure is repeated for the green and blue arrays sequentially.
Data arrays are restored following glint and electronic noise reduction by performing a 2D-inverse DWT. For example, the 2D inverse DWT (IDWT) of the red coefficient array can be computed in two steps. In the first step each row of the red DWT coefficient array is doubled in size by inserting zero values between every pair of coefficients and then inverse transforming into a corresponding row of pixels as given in Equations 3a obtaining an intermediate array of size M/2 rows and N columns:
In the second step, the array in (3a) is again considered sequentially one column at a time, first doubling the number of rows to M by inserting zero values between every pair of DWT coefficients and then computing the IDWT, thus obtaining the corresponding pixels in the glint removed red array of size each M rows and N columns as given in (3b:
This process is repeated on the green and blue arrays obtaining corresponding modified pixel arrays similar to the red IDWT calculations.
E. Removal of Overall Surface Reflection.
The process used to remove the overall surface reflection requires that the logarithm of the data obtained in F be calculated, that this data be filtered and restoring the data by computing the inverse logarithm of the filtered images in the pixel domain. For the sake of illustration, assume the red pixel array after glint removal from step D to be R(m,n). Logarithm of this array is computed by computing the logarithm of each pixel using the logarithm math function built into the computer yielding equation (4a):
R1(m,n)=ln(R(m,n)+1) (4a)
In equation (4a), the pixel values are greater than or equal to zero. The logarithm array obtained as in equation (4a) is filtered by a filter whose impulse response is denoted h(m,n), resulting in a filtered array R2(m,n) as given by (4b):
The filtered pixel array R2(m,n) is then exponentiated obtaining a red image with surface reflections removed as given by equation (4c):
R3(m,n)=exp(R2(m,n))−1 (4c)
F. Computing the Optimal Multiplication Coefficient αR αG and αB.
The optimal multiplication coefficient αR in equation (1) to form the red-green difference image is calculated as follows:
eR(m,n)
eR(m,n)=R(m,n)−αRG(m,n) (1)
minimizing the Mean Square Error,
where M is the number of rows and N is the number of columns of the image producing the image of the object below the surface as shown in
The optimal multiplication coefficient αG in equation (2) to form the green-blue difference image is calculated as follows:
eG(m,n)
eG(m,n)=G(m,n)−αGB(m,n) (2)
minimizing the Mean Square Error,
where M is the number of rows and N is the number of columns of the image producing the image of the object below the surface as shown in
The optimal multiplication coefficient αB in equation (3) to form the blue-red difference image is calculated as follows:
eB(m,n)
eB(m,n)=B(m,n)−αBR(m,n) (3)
minimizing the Mean Square Error,
where M is the number of rows and N is the number of columns of the image producing the image of the object below the surface as shown in
This application is a continuation-in-part of provisional patent application Ser. No. 60/839,613 filed 23 Aug. 2006.
Number | Name | Date | Kind |
---|---|---|---|
5243541 | Ulich | Sep 1993 | A |
7474334 | Patterson | Jan 2009 | B1 |
Number | Date | Country | |
---|---|---|---|
20080049974 A1 | Feb 2008 | US |
Number | Date | Country | |
---|---|---|---|
60839613 | Aug 2006 | US |